localmodel.run

Text model · llama

Llama 3.3 70B requirements

llama family · 70B params · released Dec 2024 · 3.4M Ollama pulls · LMArena Elo 1318. Minimum to run at Q4_K_M: Apple M4 Max (64GB).

License: Llama 3.3 Community · Conditional · ↓ 447.1K/mo · ♥ 2.8K on HuggingFace
Q4_K_M: 42.52 GB
Q8_0: 74.98 GB
Total @ Q4 (4k): ~45.3 GB
Context: 128k

Quantization sizes

GGUF quants on disk

Quantization        Size on disk
Q2_K                29.3 GB (est.)
Q3_K_M              34.2 GB (est.)
Q4_K_M (default)    42.52 GB
Q5_K_M              49.9 GB (est.)
Q6_K                57.4 GB (est.)
Q8_0                74.98 GB
FP16                140 GB (est.)

Lower-bit quants are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
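
To make the size differences concrete, the sketch below converts each listed file size into an average number of bits stored per weight. It assumes a nominal 70e9 parameters (taken from the "70B params" label, not an exact count) and decimal gigabytes; the names `QUANT_SIZES_GB` and `bits_per_weight` are illustrative only.

```python
# File sizes in decimal GB, copied from the quantization table on this page.
# Rows marked "est" there are the page's own estimates.
QUANT_SIZES_GB = {
    "Q2_K": 29.3,
    "Q3_K_M": 34.2,
    "Q4_K_M": 42.52,
    "Q5_K_M": 49.9,
    "Q6_K": 57.4,
    "Q8_0": 74.98,
    "FP16": 140.0,
}

# Assumption: a nominal 70e9 parameters, from the "70B params" label.
NOMINAL_PARAMS = 70e9


def bits_per_weight(size_gb: float, params: float = NOMINAL_PARAMS) -> float:
    """Average bits stored per parameter for a file of size_gb decimal GB."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for name, size_gb in QUANT_SIZES_GB.items():
        print(f"{name:8s} {size_gb:7.2f} GB  ~{bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions Q4_K_M works out to somewhat more than 4 bits per weight, which is expected because the file also carries scaling data and a few tensors kept at higher precision.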

Run it

Ollama
$ ollama run llama3.3:70b
llama.cpp
$ llama-cli -hf bartowski/Llama-3.3-70B-Instruct-GGUF:Q4_K_M
LM Studio
$ lms get bartowski/Llama-3.3-70B-Instruct-GGUF
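
Once the Ollama command above has pulled the model and the Ollama server is running, the model can also be called from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming the default endpoint `http://localhost:11434/api/generate` and a non-streaming request; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        return json.loads(response.read())["response"]


# Example call, left commented out because it needs the model loaded locally:
# print(generate("Say hello in one sentence."))
```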

FAQ

How much VRAM or RAM does Llama 3.3 70B need?

At Q4_K_M, Llama 3.3 70B needs about 45.3 GB at a 4k context (weights ~42.52 GB, plus KV cache and overhead). At Q8_0, budget about 77.8 GB.
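
The breakdown above (weights + KV cache + overhead) can be sketched as follows. This is not the page's methodology, which is not shown here. It assumes an unquantized FP16 KV cache and the published Llama 3.x 70B architecture (80 layers, 8 KV heads, head dimension 128); `overhead_gb` is a hypothetical free parameter for runtime buffers.

```python
# Architecture values for Llama 3.x 70B (grouped-query attention).
N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_gb(context_tokens: int, bytes_per_element: int = 2) -> float:
    """KV cache size in decimal GB, assuming FP16 (2-byte) cache entries."""
    # The factor 2 covers keys and values, stored per layer and per token.
    per_token_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_element
    return per_token_bytes * context_tokens / 1e9


def total_gb(weights_gb: float, context_tokens: int, overhead_gb: float) -> float:
    """Weights plus KV cache plus a caller-supplied overhead allowance."""
    return weights_gb + kv_cache_gb(context_tokens) + overhead_gb


if __name__ == "__main__":
    kv = kv_cache_gb(4096)
    # Overhead implied by the page's 45.3 GB figure under these assumptions.
    implied_overhead = 45.3 - 42.52 - kv
    print(f"KV cache at 4k: {kv:.2f} GB, implied overhead: {implied_overhead:.2f} GB")
    print(f"Total at 4k: {total_gb(42.52, 4096, implied_overhead):.1f} GB")
```

Under the same assumptions the KV cache grows linearly with context, so a context close to the full 128k would add tens of GB on top of the weights.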

Can Llama 3.3 70B run on a laptop?

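Only on a high-memory one. Llama 3.3 70B needs about 45.3 GB at Q4_K_M with a 4k context, so you need a high-memory Mac (at minimum an Apple M4 Max with 64GB) or a multi-GPU setup.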
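
As a rough way to check a specific machine, the sketch below picks the largest quant whose estimated total fits a given memory budget. It assumes the same ~2.8 GB of KV cache plus overhead at a 4k context for every quant (the 45.3 GB total minus the 42.52 GB of Q4_K_M weights), which is a simplification; `usable_memory_gb` means memory actually available to the model, which is less than the installed amount.

```python
from typing import Optional

# Weights on disk in decimal GB, from the quantization table on this page.
QUANT_SIZES_GB = {
    "Q2_K": 29.3,
    "Q3_K_M": 34.2,
    "Q4_K_M": 42.52,
    "Q5_K_M": 49.9,
    "Q6_K": 57.4,
    "Q8_0": 74.98,
    "FP16": 140.0,
}

# Assumption: KV cache plus overhead at 4k context is identical for every quant.
EXTRA_GB_AT_4K = 45.3 - 42.52


def largest_quant_that_fits(usable_memory_gb: float) -> Optional[str]:
    """Return the largest quant whose estimated total fits, or None if none do."""
    fitting = [
        (size_gb, name)
        for name, size_gb in QUANT_SIZES_GB.items()
        if size_gb + EXTRA_GB_AT_4K <= usable_memory_gb
    ]
    # max() on (size, name) tuples selects the biggest file that still fits.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (24, 48, 64, 96):
        print(f"{budget} GB usable -> {largest_quant_that_fits(budget)}")
```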

Can I use Llama 3.3 70B commercially?

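Conditionally. The Llama 3.3 Community License allows free commercial use as long as your products and services stay under 700 million monthly active users (MAU); above that threshold you must request a license from Meta.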

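Released December 6, 2024. Meta positions it as delivering performance close to Llama 3.1 405B at the cost of running a 70B model. The Q4_K_M and Q8_0 sizes come from the bartowski Hugging Face repo and were cross-validated against the Ollama tags page, which displays 43GB and 75GB.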

Sources

Memory figures are estimates. See methodology.