Text model · Llama 4

Llama 4 Scout requirements

Llama 4 family · 109B params (Mixture-of-Experts, 17B active) · released Apr 2025 · 1.7M Ollama pulls. Minimum to run at Q4_K_M: Apple M4 Max (128GB).

License: Llama 4 Community · Conditional · ↓ 368.5K/mo · ♥ 1.3K on HuggingFace

Q4_K_M

60.87 GB

Q8_0

115.8 GB est

Total @ Q4 (4k)

~64.2 GB

Context

128 k

Quantization sizes

GGUF quants on disk

Quantization	Size on disk
Q2_K	45.6 GB est
Q3_K_M	53.3 GB est
Q4_K_M (default)	60.87 GB
Q5_K_M	77.7 GB est
Q6_K	89.4 GB est
Q8_0	115.8 GB est
FP16	218 GB est

Lower-bit quants are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
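As a rough cross-check on the table, each size can be converted to an average number of bits stored per parameter. This is a minimal sketch, assuming 1 GB means 1e9 bytes and that the whole file is weights for the 109B total parameters; the function name is illustrative only.

```python
# Sizes from the quantization table, in GB.
# Assumption: 1 GB = 1e9 bytes, and file size is dominated by weights.
SIZES_GB = {
    "Q2_K": 45.6,
    "Q3_K_M": 53.3,
    "Q4_K_M": 60.87,
    "Q5_K_M": 77.7,
    "Q6_K": 89.4,
    "Q8_0": 115.8,
    "FP16": 218.0,
}
TOTAL_PARAMS_B = 109  # total parameters, in billions


def bits_per_weight(size_gb: float, params_b: float = TOTAL_PARAMS_B) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 8 / params_b


for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:7.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions FP16 comes out at 16 bits per weight, which suggests the table's sizes are consistent with a 109B parameter count.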

Run it

Ollama

$ ollama run llama4:scout

llama.cpp

$ llama-cli -hf unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:Q4_K_M

LM Studio

$ lms get unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF

Which devices can run Llama 4 Scout?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB): No

FAQ

How much VRAM or RAM does Llama 4 Scout need?

At Q4_K_M, Llama 4 Scout needs about 64.2 GB (weights ~60.87 GB + KV cache + overhead) at a 4k context. At Q8_0, budget ~119.1 GB.
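The budget above can be sketched as weights plus a fixed allowance for KV cache and overhead. This is a rough illustration, not the page's methodology: the ~3.33 GB allowance is inferred here as 64.2 minus 60.87, it applies only to a 4k context, and the check ignores memory that the operating system or other processes reserve on a device.

```python
# Weight sizes from this page, in GB.
WEIGHTS_GB = {"Q4_K_M": 60.87, "Q8_0": 115.8}

# KV cache + runtime overhead at a 4k context.
# Assumption: inferred as (64.2 - 60.87); longer contexts need more.
EXTRA_4K_GB = 64.2 - 60.87


def total_memory_gb(quant: str) -> float:
    """Estimated memory needed at a 4k context for the given quant."""
    return WEIGHTS_GB[quant] + EXTRA_4K_GB


def fits(quant: str, device_gb: float) -> bool:
    """True if the estimate fits in the device's memory (simplified check)."""
    return total_memory_gb(quant) <= device_gb


print(f"Q4_K_M: ~{total_memory_gb('Q4_K_M'):.1f} GB")
print(f"Q8_0:   ~{total_memory_gb('Q8_0'):.1f} GB")
print("24 GB GPU at Q4_K_M:", fits("Q4_K_M", 24))
print("128 GB Mac at Q4_K_M:", fits("Q4_K_M", 128))
```

With the same allowance, the Q8_0 estimate lands near the ~119.1 GB figure, and a single 24 GB card falls well short at Q4_K_M.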

Can Llama 4 Scout run on a laptop?

Only with a lot of memory. Llama 4 Scout is large: at Q4_K_M you need a high-memory Mac or a multi-GPU setup.

Is Llama 4 Scout cheaper to run because it is a MoE model?

It is faster, not lighter. Llama 4 Scout activates only 17B of 109B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 109B.
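The distinction can be put in numbers with a small sketch. It assumes the Q4_K_M file size of 60.87 GB, 1 GB = 1e9 bytes, and that per-token compute scales roughly with active parameters; routing cost and parameters shared across experts are ignored, so treat the output as an approximation.

```python
TOTAL_PARAMS_B = 109   # all experts, must be resident in memory
ACTIVE_PARAMS_B = 17   # parameters used per token

# Average bits per weight at Q4_K_M, derived from the 60.87 GB file size.
Q4_BITS_PER_WEIGHT = 60.87 * 8 / TOTAL_PARAMS_B


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight storage in GB for a parameter count given in billions."""
    return params_b * bits_per_weight / 8


active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Active per token: {active_fraction:.1%} of parameters")
print(f"Memory for all experts: ~{weights_gb(TOTAL_PARAMS_B, Q4_BITS_PER_WEIGHT):.1f} GB")
# Hypothetical comparison: what only the active 17B would occupy.
print(f"Active weights alone:   ~{weights_gb(ACTIVE_PARAMS_B, Q4_BITS_PER_WEIGHT):.1f} GB")
```

The last line is a hypothetical comparison only: the memory requirement follows the first figure, while per-token work is closer to the second.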

Can I use Llama 4 Scout commercially?

Conditionally. The Llama 4 Community License is free for organizations under 700M monthly active users (MAU), with use restrictions.

Llama 4 Scout is a 109B MoE model (17B active), natively multimodal with a very long context. The Q4_K_M build is ~61GB across a 2-file series. Sizes come from the unsloth GGUF repo (Ollama's llama4:scout is a comparable ~63GB build).

Sources

Memory figures are estimates. See methodology.