gpt-oss 20B requirements
gpt-oss family · 21B params (Mixture-of-Experts, 3.6B active) · released Aug 2025 · 10.2M Ollama pulls. Minimum to run at Q4_K_M: Nvidia GeForce RTX 4060 Ti (16GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 8.8 GB est |
| Q3_K_M | 10.3 GB est |
| Q4_K_M (default) | 11.28 GB |
| Q5_K_M | 15 GB est |
| Q6_K | 17.2 GB est |
| Q8_0 | 22.3 GB est |
| FP16 | 42 GB est |
Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
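The sizes in the table can be cross-checked with simple arithmetic: file size is roughly the parameter count times the average bits stored per weight, divided by 8. The following is a minimal sketch, assuming 1 GB = 1e9 bytes and the 21B parameter count stated on this page; the names `SIZES_GB` and `bits_per_weight` are illustrative, not part of any tool.

```python
# Sizes in GB copied from the table above; 21B is the parameter count
# stated on this page. Most rows are the page's own estimates.
TOTAL_PARAMS = 21e9
SIZES_GB = {
    "Q2_K": 8.8,
    "Q3_K_M": 10.3,
    "Q4_K_M": 11.28,
    "Q5_K_M": 15.0,
    "Q6_K": 17.2,
    "Q8_0": 22.3,
    "FP16": 42.0,
}


def bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:6.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

The FP16 row works out to 16 bits per weight, which matches 2 bytes per parameter for 21B parameters. The lower rows store fewer bits per weight on average, which is why they are smaller.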
Run it
ollama run gpt-oss:20b
llama-cli -hf ggml-org/gpt-oss-20b-GGUF:Q4_K_M
lms get ggml-org/gpt-oss-20b-GGUF
Which devices can run gpt-oss 20B?
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM or RAM does gpt-oss 20B need?
At Q4_K_M, gpt-oss 20B needs about 13.2 GB at a 4k context (weights ~11.28 GB, plus KV cache and overhead). At Q8_0, budget about 24.2 GB.
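The budget above is a sum of weights plus KV cache and overhead. The following is a rough sketch of that arithmetic, assuming the KV cache and overhead at a 4k context equal the gap between the stated total (13.2 GB) and the Q4_K_M weights (11.28 GB), and that the same gap applies to other quantizations. The function names are hypothetical, and the check ignores memory used by the operating system or other applications.

```python
# Assumption: KV cache + runtime overhead at a 4k context is the difference
# between the page's Q4_K_M total (13.2 GB) and its weight size (11.28 GB).
OVERHEAD_4K_GB = 13.2 - 11.28  # about 1.9 GB


def memory_budget_gb(weights_gb, overhead_gb=OVERHEAD_4K_GB):
    """Estimated memory needed: weights plus KV cache and overhead."""
    return weights_gb + overhead_gb


def fits(weights_gb, device_gb):
    """True if the estimated budget fits in the device's memory."""
    return memory_budget_gb(weights_gb) <= device_gb


print(round(memory_budget_gb(11.28), 1))  # Q4_K_M -> 13.2
print(round(memory_budget_gb(22.3), 1))   # Q8_0   -> 24.2
print(fits(11.28, 16))                    # Q4_K_M on a 16 GB GPU -> True
print(fits(22.3, 16))                     # Q8_0 on a 16 GB GPU   -> False
```

Longer contexts grow the KV cache, so the fixed overhead here should be read as a lower bound for anything beyond 4k.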
Can gpt-oss 20B run on a laptop?
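Yes, if the laptop has enough memory. At Q4_K_M, gpt-oss 20B needs about 13.2 GB at a 4k context, so it fits on a laptop with a 16 GB GPU or on a Mac with 24 GB+ of unified memory.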
Is gpt-oss 20B cheaper to run because it is a MoE model?
It is faster, not lighter. gpt-oss 20B activates only 3.6B of 21B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 21B.
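The distinction between compute and memory can be put in numbers. The following is a small sketch using only the parameter counts stated on this page; the variable names are illustrative.

```python
# Parameter counts (in billions) as stated on this page.
TOTAL_PARAMS_B = 21.0
ACTIVE_PARAMS_B = 3.6

# Share of parameters used for each token: this drives speed.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of parameters that must be resident in memory: all experts stay
# loaded, so this remains the full model.
resident_fraction = 1.0

print(f"active per token:   {active_fraction:.1%}")
print(f"resident in memory: {resident_fraction:.0%}")
```

Roughly one sixth of the parameters do work on any given token, but memory must still be sized for the full 21B.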
Can I use gpt-oss 20B commercially?
Yes. gpt-oss 20B is licensed Apache-2.0, which permits commercial use.
gpt-oss 20B is OpenAI's Mixture-of-Experts model with 3.6B active parameters. It ships in native MXFP4 4-bit; the official ggml-org GGUF is ~11.3 GB (this is the real default, not a Q4_K_M requant). It runs on a 16 GB GPU, or on a Mac with 24 GB+ of unified memory for working headroom.
Sources
Memory figures are estimates. See methodology.