gpt-oss 20B requirements

gpt-oss family · 21B params (Mixture-of-Experts, 3.6B active) · released Aug 2025 · 10.2M Ollama pulls. Minimum to run at Q4_K_M: NVIDIA GeForce RTX 4060 Ti (16GB).

License: Apache-2.0 · Commercial OK · ↓ 4.9M/mo · ♥ 4.7K on HuggingFace

Q4_K_M: 11.28 GB
Q8_0: 22.3 GB est
Total @ Q4 (4k): ~13.2 GB
Context: 128k

Quantization sizes

GGUF quants on disk

Quantization	Size on disk
Q2_K	8.8 GB est
Q3_K_M	10.3 GB est
Q4_K_M (default)	11.28 GB
Q5_K_M	15 GB est
Q6_K	17.2 GB est
Q8_0	22.3 GB est
FP16	42 GB est

Lower quant = smaller and faster, slightly lower quality. Q4_K_M is the common default.
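As a rough cross-check of the table, the sizes can be converted to average bits per weight. This is a minimal sketch, assuming 21e9 total parameters (the page's 21B figure) and 1 GB = 1e9 bytes; the name `bits_per_weight` is illustrative and not part of any tool.

```python
# Effective bits per weight implied by each on-disk size in the table.
# Assumptions: 21e9 total parameters, 1 GB = 1e9 bytes.
TOTAL_PARAMS = 21e9

SIZES_GB = {
    "Q2_K": 8.8,
    "Q3_K_M": 10.3,
    "Q4_K_M": 11.28,
    "Q5_K_M": 15.0,
    "Q6_K": 17.2,
    "Q8_0": 22.3,
    "FP16": 42.0,
}


def bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Average bits stored per parameter for a file of size_gb gigabytes."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:6.2f} GB  {bits_per_weight(size):5.2f} bits/weight")
```

Under these assumptions the FP16 row works out to 16 bits per weight and Q4_K_M to roughly 4.3, which suggests the table is internally consistent with a 21B-parameter model.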

Run it

Ollama

$ ollama run gpt-oss:20b
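Once the model is pulled, it can also be queried from a script through Ollama's local HTTP API. The sketch below assumes the server is listening on Ollama's default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled; `build_payload` and `generate` are illustrative helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="gpt-oss:20b"):
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, url=OLLAMA_URL):
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama server with gpt-oss:20b pulled):
# print(generate("Explain Mixture-of-Experts in one sentence."))
```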

llama.cpp

$ llama-cli -hf ggml-org/gpt-oss-20b-GGUF

LM Studio

$ lms get ggml-org/gpt-oss-20b-GGUF

Which devices can run gpt-oss 20B?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB): Yes

FAQ

How much VRAM or RAM does gpt-oss 20B need?

At Q4_K_M, gpt-oss 20B needs about 13.2 GB at a 4k context (weights ~11.28 GB + KV cache + overhead). At Q8_0, budget ~24.2 GB.
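The two totals above imply roughly 1.9 GB of KV cache and overhead on top of the weights at a 4k context. The following sketch applies that same overhead to every quant to pick the largest one that fits a given memory budget. Treating the overhead as constant across quants is an assumption inferred from the page's figures, and `largest_quant_that_fits` is a hypothetical helper; longer contexts will need more.

```python
# Overhead (KV cache + runtime) at 4k context, inferred from the page's
# figures: 13.2 GB total - 11.28 GB weights. Assumed constant across quants.
OVERHEAD_GB = 13.2 - 11.28

WEIGHTS_GB = {
    "Q2_K": 8.8,
    "Q3_K_M": 10.3,
    "Q4_K_M": 11.28,
    "Q5_K_M": 15.0,
    "Q6_K": 17.2,
    "Q8_0": 22.3,
    "FP16": 42.0,
}


def total_gb(quant):
    """Estimated memory needed at 4k context: weights plus fixed overhead."""
    return WEIGHTS_GB[quant] + OVERHEAD_GB


def largest_quant_that_fits(memory_gb):
    """Return the largest quant whose estimated total fits, or None."""
    fitting = [q for q in WEIGHTS_GB if total_gb(q) <= memory_gb]
    return max(fitting, key=WEIGHTS_GB.get) if fitting else None


for budget in (12, 16, 24, 48):
    print(f"{budget} GB usable -> {largest_quant_that_fits(budget)}")
```

The budget passed in should be the memory actually available to the model, which is usually less than the device's nominal capacity.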

Can gpt-oss 20B run on a laptop?

Yes, if the laptop has enough memory. At Q4_K_M, gpt-oss 20B needs about 13.2 GB at a 4k context, so a 16 GB GPU or a Mac with 24 GB+ of unified memory is enough.

Is gpt-oss 20B cheaper to run because it is a MoE model?

It is faster, not lighter. gpt-oss 20B activates only 3.6B of 21B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 21B.
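The arithmetic behind this answer can be sketched as follows. It assumes the Q4_K_M file size of 11.28 GB, 1 GB = 1e9 bytes, and a uniform bits-per-weight across all parameters, which is a simplification; `weights_gb` is an illustrative name.

```python
TOTAL_PARAMS_B = 21.0   # total parameters, in billions
ACTIVE_PARAMS_B = 3.6   # parameters activated per token, in billions

# Average bits per weight implied by the 11.28 GB Q4_K_M file.
BITS_PER_WEIGHT = 11.28e9 * 8 / (TOTAL_PARAMS_B * 1e9)


def weights_gb(params_b):
    """Weight size in GB for params_b billion parameters at BITS_PER_WEIGHT."""
    return params_b * 1e9 * BITS_PER_WEIGHT / 8 / 1e9


active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Active per token: {active_fraction:.1%} of parameters")
print(f"Must stay resident: {weights_gb(TOTAL_PARAMS_B):.2f} GB")
print(f"Touched per token (approx.): {weights_gb(ACTIVE_PARAMS_B):.2f} GB")
```

Only about a sixth of the weights are read for each token, which is where the speed comes from, while the memory requirement is still set by the full 21B.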

Can I use gpt-oss 20B commercially?

Yes. gpt-oss 20B is licensed Apache-2.0, which permits commercial use.

OpenAI gpt-oss 20B, MoE (3.6B active). Ships native MXFP4 4-bit; the official ggml-org GGUF is ~11.3GB (the real default, not a Q4_K_M requant). Runs on a 16GB GPU, or a Mac with 24GB+ of unified memory for working headroom.

Sources

Memory figures are estimates. See methodology.