Llama 4 Scout requirements
Llama 4 family · 109B params (Mixture-of-Experts, 17B active) · released Apr 2025 · 1.7M Ollama pulls. Minimum to run at Q4_K_M: Apple M4 Max (128GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 45.6 GB est |
| Q3_K_M | 53.3 GB est |
| Q4_K_M (default) | 60.87 GB |
| Q5_K_M | 77.7 GB est |
| Q6_K | 89.4 GB est |
| Q8_0 | 115.8 GB est |
| FP16 | 218 GB est |
Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
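The estimated sizes in the table are consistent with a simple rule: parameters times bits per weight, divided by 8. The sketch below reproduces three of the rows under that assumption. The bits-per-weight values for Q6_K and Q8_0 are taken from the GGUF block layouts and are not stated on this page, and `estimated_size_gb` is a hypothetical helper name.

```python
# Rough size estimate: parameters * bits per weight / 8.
PARAMS_BILLION = 109  # total parameters, as listed on this page

# Assumed bits per weight (not stated on this page):
#   Q6_K: 210 bytes per 256 weights = 6.5625 bits
#   Q8_0: 34 bytes per 32 weights  = 8.5 bits
#   FP16: 16 bits
BITS_PER_WEIGHT = {"Q6_K": 6.5625, "Q8_0": 8.5, "FP16": 16.0}


def estimated_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return an approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: {estimated_size_gb(PARAMS_BILLION, bpw):.1f} GB est")
```

Real GGUF files mix tensor types, so measured sizes (such as the 60.87 GB Q4_K_M figure) can differ from this estimate.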
Run it
ollama run llama4:scout
llama-cli -hf unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:Q4_K_M
lms get unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
Which devices can run Llama 4 Scout?
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM or RAM does Llama 4 Scout need?
At Q4_K_M, Llama 4 Scout needs about 64.2 GB at a 4k context (weights ~60.87 GB, plus KV cache and overhead). At Q8_0, budget about 119.1 GB.
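The budget above can be sketched as weights plus a fixed allowance for KV cache and overhead. The allowance here is simply the difference between the two Q4_K_M figures quoted in this answer; treating it as constant across quantizations is an assumption, and `memory_budget_gb` and `fits` are hypothetical helper names.

```python
# Figures quoted in the answer above (4k context).
WEIGHTS_Q4_K_M_GB = 60.87
TOTAL_Q4_K_M_GB = 64.2

# Implied KV cache + overhead allowance, about 3.33 GB.
# Assumption: the same allowance applies to every quantization at 4k context.
EXTRA_GB = TOTAL_Q4_K_M_GB - WEIGHTS_Q4_K_M_GB


def memory_budget_gb(weights_gb: float, extra_gb: float = EXTRA_GB) -> float:
    """Weights plus the KV cache and overhead allowance."""
    return weights_gb + extra_gb


def fits(weights_gb: float, available_gb: float) -> bool:
    """True if the budget fits in the memory actually available to the model."""
    return memory_budget_gb(weights_gb) <= available_gb


print(f"Q8_0 budget: {memory_budget_gb(115.8):.1f} GB")
print("Q4_K_M in 64 GB:", fits(WEIGHTS_Q4_K_M_GB, 64))
print("Q4_K_M in 128 GB:", fits(WEIGHTS_Q4_K_M_GB, 128))
```

Longer contexts grow the KV cache, so the allowance would need to be raised for anything beyond 4k.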
Can Llama 4 Scout run on a laptop?
Not on a typical laptop. Llama 4 Scout is large: at Q4_K_M you need a high-memory Mac (the stated minimum is an Apple M4 Max with 128GB) or a multi-GPU setup.
Is Llama 4 Scout cheaper to run because it is a MoE model?
It is faster, not lighter. Llama 4 Scout activates only 17B of 109B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 109B.
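As a rough illustration of the point above, the arithmetic below separates what must be resident in memory (all 109B parameters) from what is used per token (17B). It assumes the Q4_K_M file size scales evenly across parameters, which is a simplification, and the variable names are hypothetical.

```python
TOTAL_PARAMS_B = 109       # all experts, must stay in memory
ACTIVE_PARAMS_B = 17       # parameters used per token
WEIGHTS_Q4_K_M_GB = 60.87  # Q4_K_M size from the table

# Share of the model that does work for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Assumption: size is spread evenly over parameters, so this approximates
# the weight data read per token rather than the memory needed.
active_weights_gb = WEIGHTS_Q4_K_M_GB * active_fraction

print(f"Active per token: {active_fraction:.1%} of parameters")
print(f"Resident weights: {WEIGHTS_Q4_K_M_GB:.2f} GB")
print(f"Weights touched per token: ~{active_weights_gb:.1f} GB")
```

The per-token figure explains the speed; the resident figure is what determines whether the model fits at all.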
Can I use Llama 4 Scout commercially?
Conditionally. The Llama 4 Community License allows free use below 700M monthly active users (MAU), subject to use restrictions.
Llama 4 Scout is a 109B MoE model (17B active), natively multimodal, with very long context. The Q4_K_M build is ~61GB across a 2-file series. This size comes from the unsloth GGUF repo (Ollama's llama4:scout is a comparable ~63GB build).
Sources
Memory figures are estimates. See methodology.