Text model · Qwen3

Qwen3 235B A22B requirements

Qwen3 family · 235B params (Mixture-of-Experts, 22B active) · released Apr 2025. Minimum to run at Q4_K_M: Apple M3 Ultra (256GB).

LicenseApache-2.0· Commercial OK↓ 428.4K/mo♥ 1.1Kon HuggingFace

Q4_K_M

132.39 GB

Q8_0

Total @ Q4 (4k)

~136.9 GB

Context

128 k

Quantization sizes

GGUF quantson disk

Quantization	Size on disk
Q2_K	98.4 GB est
Q3_K_M	114.9 GB est
Q4_K_M (default)	132.39 GB
Q5_K_M	167.4 GB est
Q6_K	192.7 GB est
Q8_0	249.7 GB est
FP16	470 GB est

Lower quant = smaller and faster, slightly lower quality. Q4_K_M is the common default.

Run it

Ollama

$ ollama run qwen3:235b

llama.cpp

$ llama-cli -hf unsloth/Qwen3-235B-A22B-GGUF:Q4_K_M

LM Studio

$ lms get unsloth/Qwen3-235B-A22B-GGUF

Which devices can run Qwen3 235B A22B?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB)No

FAQ

How much VRAM or RAM does Qwen3 235B A22B need?

At Q4_K_M, Qwen3 235B A22B needs about 136.9 GB (weights ~132.39 GB + KV cache + overhead) at a 4k context. At Q8_0 budget ~254.2 GB.

Can Qwen3 235B A22B run on a laptop?

Qwen3 235B A22B is large; you need a high-memory Mac or multi-GPU setup at Q4_K_M.

Is Qwen3 235B A22B cheaper to run because it is a MoE model?

It is faster, not lighter. Qwen3 235B A22B activates only 22B of 235B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 235B.

Can I use Qwen3 235B A22B commercially?

Yes. Qwen3 235B A22B is licensed Apache-2.0, which permits commercial use.

Qwen3 235B sparse MoE (22B active). Q4_K_M ~132GB, fits a 192GB Mac. Size from the unsloth GGUF repo, exact-matched to the Ollama 235B tag.

Sources

Memory figures are estimates. See methodology.