Text model · gpt-oss
gpt-oss 120B requirements
gpt-oss family · 117B params (Mixture-of-Experts, 5.1B active) · released Aug 2025. Minimum to run at Q4_K_M: Apple M4 Max (128GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 49 GB (est.) |
| Q3_K_M | 57.2 GB (est.) |
| Q4_K_M (default) | 59.03 GB |
| Q5_K_M | 83.4 GB (est.) |
| Q6_K | 95.9 GB (est.) |
| Q8_0 | 124.3 GB (est.) |
| FP16 | 234 GB (est.) |
Lower-bit quantizations are smaller and usually faster, at some cost in output quality that becomes more noticeable at the lowest bit widths. Q4_K_M is the common default.
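The estimated rows follow a simple rule: size ≈ parameter count × bits per weight ÷ 8. The sketch below reproduces them. The bits-per-weight values are assumptions back-solved from the table, not official llama.cpp figures. The Q4_K_M row is left out because the table lists its measured file size.

```python
# Sketch: estimate GGUF size on disk from parameter count and bits per weight.
# ASSUMPTION: the bits-per-weight (bpw) values are back-solved from the table's
# estimated rows; they are not official llama.cpp figures.
TOTAL_PARAMS = 117e9  # every MoE expert is stored, so use the total count

ASSUMED_BPW = {
    "Q2_K": 3.35,
    "Q3_K_M": 3.91,
    "Q5_K_M": 5.70,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def size_gb(params: float, bits_per_weight: float) -> float:
    """Size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for quant, bpw in ASSUMED_BPW.items():
    print(f"{quant:7s} ~{size_gb(TOTAL_PARAMS, bpw):6.1f} GB")
```

Sizes here are in decimal gigabytes. Tools that report GiB will show numbers about 7% smaller.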
Run it
ollama run gpt-oss:120b
llama-cli -hf ggml-org/gpt-oss-120b-GGUF:Q4_K_M
lms get ggml-org/gpt-oss-120b-GGUF

Which devices can run gpt-oss 120B?
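To call the model from code rather than the interactive prompt, you can use Ollama's local REST endpoint. The minimal Python sketch below makes several assumptions:
- The Ollama server is running at its default address, `http://localhost:11434`.
- The `gpt-oss:120b` tag has already been pulled.
- `build_request` and `generate` are hypothetical helper names used only for illustration.

```python
# Sketch: send one prompt to a locally running Ollama server.
# ASSUMPTION: Ollama listens on its default address and gpt-oss:120b is pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gpt-oss:120b") -> urllib.request.Request:
    """Build a non-streaming /api/generate request (nothing is sent yet)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the reply text from the 'response' field."""
    # Generous timeout: loading ~60 GB of weights on first use can be slow.
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        return json.loads(resp.read())["response"]
```

Once the server is up, call `generate("Summarize mixture-of-experts in one sentence.")`. With `"stream": False`, the endpoint returns a single JSON object rather than a stream of partial chunks.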
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM or RAM does gpt-oss 120B need?
At Q4_K_M, gpt-oss 120B needs about 62.4 GB at a 4k context: roughly 59.03 GB of weights plus KV cache and runtime overhead. At Q8_0, budget about 127.7 GB.
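Both budgets exceed their weight sizes by the same ~3.4 GB. This suggests a fixed allowance for KV cache and runtime buffers at 4k context. The minimal sketch below shows that arithmetic. It assumes the allowance is constant across quantizations, which is back-solved from the figures above rather than measured.

```python
# Sketch: total memory budget = weights + (KV cache + runtime overhead).
# ASSUMPTION: the ~3.4 GB allowance at 4k context is back-solved from the
# page's figures (62.4 - 59.03 and 127.7 - 124.3); it is not measured.
WEIGHTS_GB = {"Q4_K_M": 59.03, "Q8_0": 124.3}
OVERHEAD_4K_GB = 3.4


def budget_gb(weights_gb: float, overhead_gb: float = OVERHEAD_4K_GB) -> float:
    """Estimated memory needed to load the weights and serve a 4k context."""
    return weights_gb + overhead_gb


for quant, weights in WEIGHTS_GB.items():
    print(f"{quant}: ~{budget_gb(weights):.1f} GB at 4k context")
```

The KV-cache part grows linearly with context length, so the 3.4 GB constant only holds near 4k tokens. Longer contexts need a larger allowance.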
Can gpt-oss 120B run on a laptop?
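Only a high-memory one. At Q4_K_M it needs about 62.4 GB, so a MacBook Pro with a large unified-memory configuration (such as an M4 Max with 128 GB) can run it, but typical laptops cannot. On the GPU side, that means a single 80 GB card or a multi-GPU setup.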
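Whether a given machine qualifies comes down to comparing usable memory with the ~62.4 GB Q4_K_M budget. The minimal sketch below makes that comparison. `usable_fraction` is a hypothetical parameter. The 0.75 used for Macs assumes macOS lets the GPU use roughly three quarters of unified memory by default, which is a rule of thumb rather than a guaranteed limit.

```python
# Sketch: does the Q4_K_M budget fit a given memory pool?
# ASSUMPTION: usable_fraction is a hypothetical knob; 0.75 for Macs is a rule
# of thumb for the default GPU share of unified memory, not a published limit.
REQUIRED_Q4_GB = 62.4  # Q4_K_M budget at 4k context, from the FAQ above


def fits(memory_gb: float, required_gb: float = REQUIRED_Q4_GB,
         usable_fraction: float = 1.0) -> bool:
    """True if the usable share of memory covers the required budget."""
    return memory_gb * usable_fraction >= required_gb


print(fits(80))                        # 80 GB GPU -> True
print(fits(96, usable_fraction=0.75))  # 96 GB Mac, 72 GB usable -> True
print(fits(64, usable_fraction=0.75))  # 64 GB Mac, 48 GB usable -> False
```

Under this assumption, a 64 GB Mac falls short even though 64 GB exceeds the 62.4 GB budget on paper.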
Is gpt-oss 120B cheaper to run because it is a MoE model?
It is faster, not lighter. gpt-oss 120B activates only 5.1B of its 117B parameters per token, so it runs quickly. However, every expert must stay in memory, so you still need room for all 117B parameters.
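A little arithmetic makes the distinction concrete. The minimal sketch below assumes the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass.

```python
# Sketch: MoE memory vs per-token compute for gpt-oss 120B.
# Memory scales with TOTAL parameters; per-token compute scales with ACTIVE ones.
# ASSUMPTION: ~2 FLOPs per active parameter per token (forward-pass rule of thumb).
TOTAL_PARAMS = 117e9
ACTIVE_PARAMS = 5.1e9

active_share = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS  # a hypothetical dense 117B model

print(f"weights used per token: {active_share:.1%}")
print(f"~{flops_per_token / 1e9:.1f} GFLOPs/token vs "
      f"~{dense_flops_per_token / 1e9:.0f} for a dense 117B model")
```

Only about 4.4% of the weights do work on any given token. However, the router can pick different experts for the next token, so all of them must remain loaded.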
Can I use gpt-oss 120B commercially?
Yes. gpt-oss 120B is licensed Apache-2.0, which permits commercial use.
OpenAI's gpt-oss 120B is a MoE model (5.1B active parameters) with native MXFP4 weights. The official ggml-org GGUF is about 59 GB, so it fits on a single 80 GB GPU or on a Mac with 96 GB or more of unified memory.
Memory figures are estimates.