localmodel.run

Text model · gpt-oss

gpt-oss 20B requirements

gpt-oss family · 21B params (Mixture-of-Experts, 3.6B active) · released Aug 2025 · 10.2M Ollama pulls. Minimum to run at Q4_K_M: Nvidia GeForce RTX 4060 Ti (16GB).

License: Apache-2.0 · Commercial OK · ↓ 4.9M/mo · ♥ 4.7K on HuggingFace
Q4_K_M: 11.28 GB
Q8_0: -
Total @ Q4 (4k): ~13.2 GB
Context: 128k

Quantization sizes

GGUF quants on disk

Quantization         Size on disk
Q2_K                 8.8 GB (est.)
Q3_K_M               10.3 GB (est.)
Q4_K_M (default)     11.28 GB
Q5_K_M               15 GB (est.)
Q6_K                 17.2 GB (est.)
Q8_0                 22.3 GB (est.)
FP16                 42 GB (est.)

Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
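
The estimated sizes in the table are consistent with a simple bits-per-weight calculation: parameter count times average bits per weight, divided by 8. The following is a minimal sketch under that assumption, using decimal gigabytes and the 21B parameter count quoted on this page. The function name is illustrative, the bits-per-weight values are the nominal figures for those GGUF types, and this is not necessarily how the page's estimates were produced.

```python
# Rough on-disk size estimate for a quantized model (illustrative sketch).
# Assumption: size ~= parameters * average bits per weight / 8, in decimal GB.

PARAMS = 21e9  # total parameters for gpt-oss 20B, as quoted on this page

# Nominal average bits per weight. Other quant types are left out because
# their effective averages depend on the tensor mix of the model.
BITS_PER_WEIGHT = {
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def estimate_size_gb(params: float, bits_per_weight: float) -> float:
    """Return the estimated file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: {estimate_size_gb(PARAMS, bpw):.1f} GB")
```

For these three types the sketch reproduces the table's 17.2 GB, 22.3 GB, and 42 GB rows. The Q4_K_M row is not marked as an estimate, so it is left out here.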

Run it

Ollama
$ ollama run gpt-oss:20b
llama.cpp
$ llama-cli -hf ggml-org/gpt-oss-20b-GGUF
LM Studio
$ lms get ggml-org/gpt-oss-20b-GGUF

Which devices can run gpt-oss 20B?

FAQ

How much VRAM or RAM does gpt-oss 20B need?

At Q4_K_M, gpt-oss 20B needs about 13.2 GB at a 4k context: roughly 11.28 GB of weights plus KV cache and runtime overhead. At Q8_0, budget about 24.2 GB.
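
The arithmetic behind these totals can be sketched as follows. The allowance for KV cache plus overhead is backed out of the page's own Q4_K_M figures (13.2 GB minus 11.28 GB). Reusing that allowance for Q8_0, the `budget_gb` and `fits` helpers, and the 16 GB device size are assumptions for illustration; the allowance grows with longer contexts.

```python
# Memory budget sketch: weights + (KV cache + runtime overhead) at a 4k context.
WEIGHTS_Q4_GB = 11.28  # Q4_K_M weights, from this page
TOTAL_Q4_GB = 13.2     # Q4_K_M total at 4k context, from this page
WEIGHTS_Q8_GB = 22.3   # Q8_0 weights (estimate), from this page

# Allowance implied by the Q4_K_M figures, about 1.92 GB.
EXTRA_GB = TOTAL_Q4_GB - WEIGHTS_Q4_GB

def budget_gb(weights_gb: float, extra_gb: float = EXTRA_GB) -> float:
    """Total memory to budget: weights plus the fixed 4k-context allowance."""
    return weights_gb + extra_gb

def fits(total_gb: float, device_gb: float) -> bool:
    """True if the budget fits in the device's memory (hypothetical helper)."""
    return total_gb <= device_gb

for name, weights in [("Q4_K_M", WEIGHTS_Q4_GB), ("Q8_0", WEIGHTS_Q8_GB)]:
    total = budget_gb(weights)
    print(f"{name}: {total:.1f} GB, fits in 16 GB: {fits(total, 16)}")
```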

Can gpt-oss 20B run on a laptop?

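Yes, if the laptop has enough memory. At Q4_K_M, gpt-oss 20B needs about 13.2 GB at a 4k context, so it runs on a laptop with a 16 GB GPU or on a Mac with 24 GB+ of unified memory.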

Is gpt-oss 20B cheaper to run because it is a MoE model?

It is faster, not lighter. gpt-oss 20B activates only 3.6B of 21B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 21B.
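
As a rough illustration of that split, memory tracks the total parameter count while per-token compute tracks the active count. The variable names below are illustrative, and treating compute as proportional to active parameters is a simplifying assumption.

```python
# MoE sketch: all experts stay resident, but only some are used per token.
TOTAL_PARAMS = 21e9    # must all be held in memory
ACTIVE_PARAMS = 3.6e9  # used for each generated token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Active per token: {active_fraction:.1%} of the weights")
print(f"Resident in memory: all {TOTAL_PARAMS / 1e9:.0f}B parameters")
```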

Can I use gpt-oss 20B commercially?

Yes. gpt-oss 20B is licensed Apache-2.0, which permits commercial use.

OpenAI gpt-oss 20B is a MoE model (3.6B active). It ships in native MXFP4 4-bit; the official ggml-org GGUF is ~11.3 GB and is the real default, not a Q4_K_M requant. It runs on a 16 GB GPU, or on a Mac with 24 GB+ of unified memory for working headroom.

Sources

Memory figures are estimates. See methodology.