Text model · llama

Llama 3.3 70B requirements

llama family · 70B params · released Dec 2024 · 3.4M Ollama pulls · LMArena Elo 1318. Minimum to run at Q4_K_M: Apple M4 Max (64GB).

License: Llama 3.3 Community · Conditional · ↓ 447.1K/mo · ♥ 2.8K on HuggingFace

Q4_K_M: 42.52 GB
Q8_0: 74.98 GB
Total @ Q4 (4k): ~45.3 GB
Context: 128k

Quantization sizes

GGUF quants on disk

Quantization	Size on disk
Q2_K	29.3 GB est
Q3_K_M	34.2 GB est
Q4_K_M (default)	42.52 GB
Q5_K_M	49.9 GB est
Q6_K	57.4 GB est
Q8_0	74.98 GB
FP16	140 GB est

Lower-bit quantizations are smaller and faster but lose some quality, and the loss grows as the bit width drops. Q4_K_M is the common default.
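To make the table easier to compare, the sizes can be converted to an average number of bits stored per parameter. The sketch below is only an approximation: it assumes exactly 70e9 parameters (the page's rounded "70B") and decimal gigabytes, and the helper name `bits_per_weight` is made up for illustration.

```python
# Effective bits per weight implied by the on-disk sizes in the table above.
# Assumptions: exactly 70e9 parameters (the page's rounded "70B") and decimal GB.
PARAMS = 70e9

SIZES_GB = {
    "Q2_K": 29.3,
    "Q3_K_M": 34.2,
    "Q4_K_M": 42.52,
    "Q5_K_M": 49.9,
    "Q6_K": 57.4,
    "Q8_0": 74.98,
    "FP16": 140.0,
}

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in decimal GB to average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params

for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:7.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions FP16 works out to 16 bits per weight and Q4_K_M to a little under 5, which is why "Q4" files are somewhat larger than a naive 4-bit estimate would suggest.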

Run it

Ollama

$ ollama run llama3.3:70b
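Once the model is pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming the server is running on its default port (11434) and using the non-streaming `/api/generate` endpoint; the helper names `build_payload` and `generate` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama3.3:70b") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")

def generate(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running server with the model already pulled):
# print(generate("Summarize GGUF quantization in one sentence."))
```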

llama.cpp

$ llama-cli -hf bartowski/Llama-3.3-70B-Instruct-GGUF:Q4_K_M

LM Studio

$ lms get bartowski/Llama-3.3-70B-Instruct-GGUF

Which devices can run Llama 3.3 70B?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB): No

FAQ

How much VRAM or RAM does Llama 3.3 70B need?

At Q4_K_M, Llama 3.3 70B needs about 45.3 GB at a 4k context (weights ~42.52 GB, plus KV cache and overhead). At Q8_0, budget about 77.8 GB.
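The breakdown of that total can be sketched with a little arithmetic. The architecture constants below are assumptions not stated on this page (80 layers, 8 KV heads with grouped-query attention, head dimension 128, and a 16-bit KV cache), and the "implied overhead" is simply whatever the page's 45.3 GB total leaves after weights and KV cache.

```python
# Rough memory budget for Llama 3.3 70B at Q4_K_M and a 4k context.
# Assumed architecture (not stated on this page): 80 layers, 8 KV heads (GQA),
# head dimension 128, and a 16-bit (2-byte) KV cache.
N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 128
KV_BYTES = 2

def kv_cache_gb(context_tokens: int) -> float:
    """Size of the key + value cache in decimal GB for a given context length."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES  # 2 = keys and values
    return per_token * context_tokens / 1e9

WEIGHTS_GB = 42.52  # Q4_K_M size on disk, from this page
TOTAL_GB = 45.3     # this page's total at a 4k context

kv = kv_cache_gb(4096)
overhead = TOTAL_GB - WEIGHTS_GB - kv  # remainder attributed to runtime overhead
print(f"KV cache @ 4k: {kv:.2f} GB, implied overhead: {overhead:.2f} GB")
```

Under the same assumptions the KV cache grows linearly with context, so at the full 131,072-token (128k) window it would be roughly as large as the Q4_K_M weights themselves. The 4k figure is therefore a floor, not a budget for long-context use.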

Can Llama 3.3 70B run on a laptop?

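Only on a high-memory one. At Q4_K_M, Llama 3.3 70B needs about 45.3 GB, so the practical minimum is an Apple M4 Max with 64GB of unified memory; otherwise you need a multi-GPU setup.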
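A simple way to reason about whether a device qualifies is to compare its total memory against the page's estimates. This is a deliberately naive sketch: it ignores memory used by the operating system, limits on how much unified memory a GPU may claim, and the cost of splitting a model across cards. The `fits` helper and the two-GPU entry are hypothetical.

```python
# Required memory figures taken from this page (GB, 4k context).
REQUIRED_GB = {"Q4_K_M": 45.3, "Q8_0": 77.8}

def fits(device_memory_gb: float, quant: str = "Q4_K_M") -> bool:
    """True if the device's total memory covers the page's estimate for this quant."""
    return device_memory_gb >= REQUIRED_GB[quant]

# Devices named on this page, plus a hypothetical 2 x 24GB multi-GPU rig.
DEVICES = [
    ("Apple M4 Max (64GB)", 64),
    ("AMD Radeon RX 7900 XTX (24GB)", 24),
    ("2 x 24GB GPUs (hypothetical)", 48),
]

for name, mem in DEVICES:
    print(f"{name}: {'Yes' if fits(mem) else 'No'}")
```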

Can I use Llama 3.3 70B commercially?

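Conditionally. The Llama 3.3 Community License allows commercial use free of charge for organizations under 700 million monthly active users; above that threshold you must request a license from Meta.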

Released December 6, 2024. Delivers performance close to the 405B Llama model at the cost of a 70B model. The Q4_K_M and Q8_0 sizes come from the bartowski HuggingFace repo and were cross-validated against the Ollama tags page, which displays 43GB and 75GB.

Sources

Memory figures are estimates. See methodology.