localmodel.run

Text model · Llama 4

Llama 4 Scout requirements

Llama 4 family · 109B params (Mixture-of-Experts, 17B active) · released Apr 2025 · 1.7M Ollama pulls. Minimum to run at Q4_K_M: Apple M4 Max (128GB).

License: Llama 4 Community · Conditional · ↓ 368.5K/mo · ♥ 1.3K on HuggingFace
Q4_K_M: 60.87 GB
Q8_0: -
Total @ Q4 (4k): ~64.2 GB
Context: 128k

Quantization sizes

GGUF quants on disk

Quantization        Size on disk
Q2_K                45.6 GB (est.)
Q3_K_M              53.3 GB (est.)
Q4_K_M (default)    60.87 GB
Q5_K_M              77.7 GB (est.)
Q6_K                89.4 GB (est.)
Q8_0                115.8 GB (est.)
FP16                218 GB (est.)

Lower-bit quantizations are smaller and faster, at the cost of slightly lower quality. Q4_K_M is the common default.
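To make the size/quality trade-off concrete, the sizes listed above can be converted into an average number of bits stored per parameter. This is a rough sketch, not the page's methodology: it assumes 1 GB = 1e9 bytes and divides by the 109B total parameter count, and the helper name `bits_per_weight` is made up for illustration.

```python
# Sizes on disk in GB, copied from the quantization list above.
# All rows except Q4_K_M are marked as estimates on the page.
QUANT_SIZES_GB = {
    "Q2_K": 45.6,
    "Q3_K_M": 53.3,
    "Q4_K_M": 60.87,
    "Q5_K_M": 77.7,
    "Q6_K": 89.4,
    "Q8_0": 115.8,
    "FP16": 218.0,
}

TOTAL_PARAMS = 109e9  # 109B total parameters, as stated on the page


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits per parameter, assuming 1 GB = 1e9 bytes (an assumption)."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for name, size in QUANT_SIZES_GB.items():
        print(f"{name:8s} {size:7.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions the FP16 row works out to 16 bits per weight, which is a useful sanity check on the listed sizes, and Q4_K_M lands a little above 4 bits because the format stores scaling metadata alongside the weights.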

Run it

Ollama
$ ollama run llama4:scout
llama.cpp
$ llama-cli -hf unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:Q4_K_M
LM Studio
$ lms get unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF

FAQ

How much VRAM or RAM does Llama 4 Scout need?

At Q4_K_M, Llama 4 Scout needs about 64.2 GB at a 4k context: roughly 60.87 GB of weights plus KV cache and runtime overhead. At Q8_0, budget about 119.1 GB.
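The arithmetic behind these figures can be sketched as weights plus a fixed allowance for KV cache and overhead. The allowance below is inferred from the page's own numbers (64.2 GB total minus 60.87 GB of weights) and is assumed to stay the same across quantizations at a 4k context; the function names are hypothetical, and the sketch does not model how much of a device's memory is actually usable.

```python
# Figures taken from the page.
Q4_WEIGHTS_GB = 60.87   # Q4_K_M size on disk
Q4_TOTAL_GB = 64.2      # stated total at Q4_K_M, 4k context
Q8_WEIGHTS_GB = 115.8   # Q8_0 size on disk (estimate)

# Assumption: KV cache + runtime overhead at 4k context is total minus weights,
# and does not change with the weight quantization.
OVERHEAD_4K_GB = Q4_TOTAL_GB - Q4_WEIGHTS_GB  # about 3.33 GB


def budget_gb(weights_gb: float, overhead_gb: float = OVERHEAD_4K_GB) -> float:
    """Estimated memory needed: weights plus KV cache and overhead."""
    return weights_gb + overhead_gb


def fits(weights_gb: float, available_gb: float) -> bool:
    """True if the estimated budget fits in the given amount of memory."""
    return budget_gb(weights_gb) <= available_gb


if __name__ == "__main__":
    print(f"Q4_K_M budget: ~{budget_gb(Q4_WEIGHTS_GB):.1f} GB")
    print(f"Q8_0 budget:   ~{budget_gb(Q8_WEIGHTS_GB):.1f} GB")
    print("Q4_K_M fits in 64 GB:", fits(Q4_WEIGHTS_GB, 64))
```

With these inputs the Q8_0 estimate comes out at about 119.1 GB, matching the figure quoted above, and Q4_K_M narrowly misses a 64 GB budget. Longer contexts need a larger KV cache, so the allowance should be treated as a floor.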

Can Llama 4 Scout run on a laptop?

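Only on a high-memory machine. At Q4_K_M, Llama 4 Scout needs about 64.2 GB, so the practical options are a high-memory Mac (the listed minimum is an Apple M4 Max with 128GB) or a multi-GPU setup.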

Is Llama 4 Scout cheaper to run because it is a MoE model?

It is faster, not lighter. Llama 4 Scout activates only 17B of 109B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 109B.
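The split between memory cost and per-token compute can be illustrated with the two parameter counts from the page. This is a simplified sketch: it treats memory as proportional to total parameters and per-token compute as proportional to active parameters, which ignores routing overhead and other runtime effects. The function name is invented for the example.

```python
TOTAL_PARAMS_B = 109   # all experts, must be resident in memory
ACTIVE_PARAMS_B = 17   # parameters used for any single token


def moe_profile(total_b: float, active_b: float) -> dict:
    """Summarize what drives memory vs. per-token compute in an MoE model."""
    return {
        "memory_scales_with_b": total_b,      # every expert stays loaded
        "compute_scales_with_b": active_b,    # only routed experts run
        "active_fraction": active_b / total_b,
    }


if __name__ == "__main__":
    profile = moe_profile(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Memory sized for {profile['memory_scales_with_b']}B params")
    print(f"Per-token compute sized for {profile['compute_scales_with_b']}B params")
    print(f"Active fraction: {profile['active_fraction']:.1%}")
```

Roughly one sixth of the parameters are touched per token, which is why generation speed resembles a much smaller dense model while the memory requirement does not shrink.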

Can I use Llama 4 Scout commercially?

Conditionally. The Llama 4 Community License allows free use below 700M monthly active users (MAU), subject to use restrictions.

Llama 4 Scout is a 109B-parameter MoE model (17B active) that is natively multimodal and supports very long context. The Q4_K_M build is about 61 GB, split across a 2-file series. Sizes are taken from the unsloth GGUF repo; the Ollama llama4:scout build is comparable at about 63 GB.

Memory figures are estimates. See methodology.