Qwen3 30B-A3B requirements
Qwen3 family · 30.5B params (Mixture-of-Experts, 3.3B active) · released Apr 2025 · 23M Ollama pulls · LMArena Elo 1383. Minimum to run at Q4_K_M: NVIDIA GeForce RTX 4090 (24 GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 12.8 GB (est.) |
| Q3_K_M | 14.9 GB (est.) |
| Q4_K_M (default) | 18.6 GB |
| Q5_K_M | 21.7 GB (est.) |
| Q6_K | 25 GB (est.) |
| Q8_0 | 32.5 GB |
| FP16 | 61 GB |
Lower-bit quantizations are smaller and faster, at the cost of slightly lower quality. Q4_K_M is the common default.
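To make the size/quality trade-off concrete, the sketch below derives the effective bits per weight implied by each row of the table. It assumes "GB" means 10^9 bytes and that the file size is dominated by the 30.5B weights; `bits_per_weight` is an illustrative helper, not part of any tool.

```python
# Sizes copied from the table above, in GB (assumed to mean 10**9 bytes).
SIZES_GB = {
    "Q2_K": 12.8,
    "Q3_K_M": 14.9,
    "Q4_K_M": 18.6,
    "Q5_K_M": 21.7,
    "Q6_K": 25.0,
    "Q8_0": 32.5,
    "FP16": 61.0,
}
TOTAL_PARAMS = 30.5e9  # total parameter count stated for the model


def bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:5.1f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

The FP16 row works out to 16 bits per weight, which is a useful sanity check on the units. Q4_K_M lands a little under 5 bits per weight rather than exactly 4, because K-quants mix precisions and store scaling metadata.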
Run it
ollama run qwen3:30b-a3b
llama-cli -hf unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M
lms get unsloth/Qwen3-30B-A3B-GGUF
Which devices can run Qwen3 30B-A3B?
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
FAQ
How much VRAM or RAM does Qwen3 30B-A3B need?
At Q4_K_M with a 4k context, Qwen3 30B-A3B needs about 20.7 GB: ~18.6 GB of weights plus KV cache and runtime overhead. At Q8_0, budget about 34.6 GB.
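Both figures above are the file size plus roughly 2.1 GB (20.7 - 18.6 and 34.6 - 32.5). As a rough sketch, the same margin can be applied to the other rows of the quantization table to pick the largest quantization that fits a given memory budget. Applying 2.1 GB to every quantization is an assumption, and `largest_fitting_quant` is a hypothetical helper.

```python
# File sizes from the quantization table, in GB.
SIZES_GB = {
    "Q2_K": 12.8,
    "Q3_K_M": 14.9,
    "Q4_K_M": 18.6,
    "Q5_K_M": 21.7,
    "Q6_K": 25.0,
    "Q8_0": 32.5,
    "FP16": 61.0,
}

# KV cache + overhead at a 4k context, derived from the two stated totals.
# Assumption: the same margin applies to every quantization.
OVERHEAD_GB = 2.1


def required_gb(quant):
    """Estimated memory needed to run a quantization at a 4k context."""
    return SIZES_GB[quant] + OVERHEAD_GB


def largest_fitting_quant(available_gb):
    """Return the largest quantization that fits, or None if none does."""
    fitting = [q for q in SIZES_GB if required_gb(q) <= available_gb]
    return max(fitting, key=SIZES_GB.get) if fitting else None


for budget in (16, 21, 48):
    print(budget, "GB ->", largest_fitting_quant(budget))
```

Pass the memory that is actually free rather than the nominal capacity: the operating system, display, and other applications take a share, and longer contexts need a larger KV cache than the 4k figure used here.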
Can Qwen3 30B-A3B run on a laptop?
Only on a high-memory one. At Q4_K_M, Qwen3 30B-A3B needs a 24 GB+ GPU or a Mac with 32-48 GB of unified memory.
Is Qwen3 30B-A3B cheaper to run because it is a MoE model?
It is faster, not lighter. Qwen3 30B-A3B activates only 3.3B of its 30.5B parameters per token, so it runs quickly, but all experts must stay in memory, so it still needs room for the full 30.5B parameters.
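The distinction between memory held and memory touched can be put in numbers. The sketch below uses the parameter counts from this answer and the Q4_K_M file size from the quantization table; it assumes bits per weight are uniform across the file, which is only approximately true.

```python
TOTAL_PARAMS_B = 30.5    # total parameters, in billions
ACTIVE_PARAMS_B = 3.3    # parameters activated per token, in billions
Q4_K_M_SIZE_GB = 18.6    # file size from the quantization table

# Share of the model that does work for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Everything must be resident, because the router may pick any expert.
resident_gb = Q4_K_M_SIZE_GB

# Rough amount of weight data read per token (assumes uniform bits/weight).
touched_gb = Q4_K_M_SIZE_GB * active_fraction

print(f"active fraction: {active_fraction:.1%}")
print(f"resident weights: {resident_gb:.1f} GB")
print(f"weights read per token: ~{touched_gb:.2f} GB")
```

Under these assumptions about a ninth of the weights are read per token, which is why generation speed is closer to that of a much smaller dense model while the memory footprint is not.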
Can I use Qwen3 30B-A3B commercially?
Yes. Qwen3 30B-A3B is licensed Apache-2.0, which permits commercial use.
MoE: 30.5B total / 3.3B active parameters (128 experts, 8 activated per token). The Q4_K_M (18.6 GB) and Q8_0 (32.5 GB) sizes come from the unsloth/Qwen3-30B-A3B-GGUF repo on Hugging Face, cross-checked against Qwen/Qwen3-30B-A3B-GGUF. Native context is 32K, extendable to 128K via YaRN. Despite the large Q4 size, inference is fast because only 3.3B parameters are active per token.
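Longer contexts raise the memory requirement because the KV cache grows linearly with the number of tokens. The sketch below uses the standard formula for a grouped-query-attention transformer. The layer count, KV head count, and head dimension are not stated on this page; the values shown are assumptions that should be checked against the model's `config.json`, and the cache is assumed to be stored in 16-bit precision.

```python
def kv_cache_gb(context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """KV cache size in GB (10**9 bytes) for one sequence.

    The leading factor of 2 covers keys and values.
    """
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens
    )
    return total_bytes / 1e9


# ASSUMED architecture values, not taken from this page; verify before relying on them.
ASSUMED_ARCH = dict(n_layers=48, n_kv_heads=4, head_dim=128)

# 4k, the native 32K context, and the 128K YaRN-extended context.
for ctx in (4096, 32768, 131072):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gb(ctx, **ASSUMED_ARCH):.2f} GB")
```

Whatever the exact architecture values, the cache at the full extended context is 32 times the size of the cache at 4k, so a memory budget that fits at 4k may not fit at 128K.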
Sources
Memory figures are estimates. See methodology.