localmodel.run

Text model · gpt-oss

gpt-oss 120B requirements

gpt-oss family · 117B params (Mixture-of-Experts, 5.1B active) · released Aug 2025. Minimum to run at Q4_K_M: Apple M4 Max (128GB).

License: Apache-2.0 · Commercial use OK · 2.8M downloads/month · 4.9K likes on Hugging Face
Q4_K_M: 59.03 GB
Q8_0: -
Total @ Q4 (4k context): ~62.4 GB
Context: 128k

Quantization sizes

GGUF quants, size on disk (est. = estimated):

Quantization       Size on disk
Q2_K               49 GB (est.)
Q3_K_M             57.2 GB (est.)
Q4_K_M (default)   59.03 GB
Q5_K_M             83.4 GB (est.)
Q6_K               95.9 GB (est.)
Q8_0               124.3 GB (est.)
FP16               234 GB (est.)

Lower-bit quants are smaller and faster but slightly lower in quality. Q4_K_M is the common default.
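The estimated rows appear consistent with a simple rule: size ≈ parameters × bits per weight ÷ 8. This sketch is our reconstruction, not the page's documented methodology. It checks the rule only for FP16 (16 bits per weight) and Q8_0 (8.5 bits per weight), whose block layouts are fixed in llama.cpp; the K-quant rows mix tensor types, so no single bits-per-weight value is assumed for them here. The function name is hypothetical.

```python
# Sketch: reproduce two "est." rows from parameter count and bits per weight.
# Assumption: size_on_disk ~= params * bits_per_weight / 8, in decimal GB.
# This is a reconstruction, not the page's documented method.

PARAMS = 117e9  # total parameters (all experts), from the page

# Effective bits per weight for two formats with a fixed layout:
# FP16 stores 16 bits per weight; Q8_0 stores blocks of 32 int8 values plus
# one fp16 scale, i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
}


def estimate_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate file size in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name}: {estimate_size_gb(PARAMS, bpw):.1f} GB")
```

Running it prints 234.0 GB for FP16 and 124.3 GB for Q8_0, matching those two rows of the table.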

Run it

Ollama
$ ollama run gpt-oss:120b
llama.cpp
$ llama-cli -hf ggml-org/gpt-oss-120b-GGUF:Q4_K_M
LM Studio
$ lms get ggml-org/gpt-oss-120b-GGUF
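If you run the model through Ollama, you can also call it from code over Ollama's local REST API. A minimal sketch, assuming the server is listening on its default port 11434 and the gpt-oss:120b tag has already been pulled. The helper names build_generate_request and generate are hypothetical; the code uses only the Python standard library and the /api/generate endpoint with streaming turned off.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port and the model
# was pulled with `ollama run gpt-oss:120b` (or `ollama pull gpt-oss:120b`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gpt-oss:120b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        # With "stream": false, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

With the server running, generate("Explain mixture-of-experts in one sentence.") returns the reply as a string. Building the request in a separate function keeps the payload checkable without a live server.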

FAQ

How much VRAM or RAM does gpt-oss 120B need?

At Q4_K_M with a 4k context, gpt-oss 120B needs about 62.4 GB: roughly 59.03 GB of weights plus the KV cache and runtime overhead. At Q8_0, budget about 127.7 GB.
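The budget is weights plus KV cache plus overhead. The KV-cache term follows the standard formula 2 × layers × KV heads × head dimension × context length × bytes per element (the 2 covers keys and values). A sketch, assuming a gpt-oss-120b attention shape of 36 layers, 8 KV heads, and head dimension 64 (check the model's config.json) and an fp16 cache. The function names are hypothetical, and the overhead is left as a free parameter because the page does not break it out.

```python
# Sketch of the memory budget: weights + KV cache + runtime overhead.
# Assumed gpt-oss-120b attention shape (verify against config.json):
# 36 layers, 8 KV heads, head_dim 64. Every layer is treated as caching the
# full context, so this overestimates the cache for sliding-window layers.


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Bytes for keys and values (the factor 2); fp16 = 2 bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def total_memory_gb(weights_gb: float, kv_bytes: int, overhead_gb: float) -> float:
    """Total in decimal GB; overhead_gb is a hypothetical allowance for buffers."""
    return weights_gb + kv_bytes / 1e9 + overhead_gb


kv_4k = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=64, n_ctx=4096)
kv_128k = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=64, n_ctx=131072)
print(f"KV cache @ 4k:   {kv_4k / 1e9:.2f} GB")
print(f"KV cache @ 128k: {kv_128k / 1e9:.2f} GB")
```

Under these assumptions the KV cache is about 0.30 GB at 4k and about 9.66 GB at the full 128k context. The 128k figure is an upper bound, since gpt-oss reportedly alternates full attention with short sliding-window layers. Set against the page's ~3.4 GB allowance beyond the weights (62.4 - 59.03), most of that allowance would then be runtime overhead rather than cache.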

Can gpt-oss 120B run on a laptop?

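Only a high-memory one. At Q4_K_M it needs roughly 62 GB, so you need a high-memory Mac (the minimum listed above is an Apple M4 Max with 128 GB), a single 80 GB GPU, or a multi-GPU setup.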

Is gpt-oss 120B cheaper to run because it is a MoE model?

It is faster, not lighter. gpt-oss 120B activates only 5.1B of its 117B parameters per token, so each token needs far less compute and fewer weight reads. All experts must still stay resident in memory, though, so you need room for the full 117B parameters.
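One rough way to see "faster, not lighter" is the common bandwidth-bound estimate for token generation: memory must hold every parameter, but each generated token reads only the active ones, so decode speed is capped near memory bandwidth ÷ bytes read per token. A sketch under loud assumptions: a uniform 4.25 bits per weight (MXFP4's 4-bit values plus one 8-bit scale per 32-value block), a hypothetical 500 GB/s device, and no accounting for attention, KV-cache reads, or compute. The results are ceilings for comparison, not benchmarks.

```python
# Sketch: bandwidth-bound ceiling on decode speed, MoE vs. a dense model.
#   tokens/s <= memory_bandwidth / bytes_read_per_token
# Assumptions: uniform 4.25 bits per weight (real gpt-oss keeps some tensors
# at higher precision) and a hypothetical 500 GB/s of memory bandwidth.

TOTAL_PARAMS = 117e9   # must all be resident in memory
ACTIVE_PARAMS = 5.1e9  # read per generated token
BITS_PER_WEIGHT = 4.25
BANDWIDTH_GB_S = 500.0  # hypothetical device


def weights_gb(params: float, bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Decimal GB occupied by `params` weights at the given precision."""
    return params * bits_per_weight / 8 / 1e9


def decode_ceiling_tok_s(params_read_per_token: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s if weight reads were the only cost."""
    return bandwidth_gb_s / weights_gb(params_read_per_token)


# Memory footprint is set by TOTAL_PARAMS in both cases; only speed differs.
print(f"MoE (5.1B active) ceiling: {decode_ceiling_tok_s(ACTIVE_PARAMS):.0f} tok/s")
print(f"Dense 117B ceiling:        {decode_ceiling_tok_s(TOTAL_PARAMS):.0f} tok/s")
```

Under these assumptions the MoE ceiling (about 185 tok/s) is roughly 23× that of a dense 117B model (about 8 tok/s), while both need the same memory to hold their weights.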

Can I use gpt-oss 120B commercially?

Yes. gpt-oss 120B is licensed Apache-2.0, which permits commercial use.

OpenAI gpt-oss 120B, MoE (5.1B active). Native MXFP4; the official ggml-org GGUF is ~59 GB, so it fits on an 80 GB GPU or a Mac with 96 GB or more of memory.
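The fit reasoning in this note comes down to comparing required memory against usable device memory. A sketch with hypothetical assumptions: a discrete GPU can give nearly all of its VRAM to the model, while a Mac's GPU can use only part of unified memory by default. The 0.75 fraction here is an assumed figure, not a documented constant for every Mac, and real runtimes need some headroom beyond the page's 62.4 GB estimate. The function name is hypothetical.

```python
# Sketch of a "does it fit?" check.
# Assumptions (hypothetical; adjust for your hardware and runtime):
#  - a discrete GPU can devote essentially all of its VRAM to the model;
#  - a Mac's GPU can use only part of unified memory by default (0.75 assumed).
REQUIRED_GB = 62.4  # page's Q4_K_M total at a 4k context


def fits(device_gb: float, usable_fraction: float,
         required_gb: float = REQUIRED_GB) -> bool:
    """True if the usable share of device memory covers the requirement."""
    return device_gb * usable_fraction >= required_gb


devices = {
    "80 GB GPU": (80, 1.0),
    "96 GB Mac": (96, 0.75),
    "64 GB Mac": (64, 0.75),
}
for name, (mem_gb, frac) in devices.items():
    print(f"{name}: {'fits' if fits(mem_gb, frac) else 'does not fit'}")
```

With these assumptions, an 80 GB GPU (80 GB usable) and a 96 GB Mac (72 GB usable) clear the 62.4 GB requirement, while a 64 GB Mac (48 GB usable) does not.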

Sources

Memory figures are estimates. See methodology.