localmodel.run

Text model · Qwen3

Qwen3 30B-A3B requirements

Qwen3 family · 30.5B params (Mixture-of-Experts, 3.3B active) · released Apr 2025 · 23M Ollama pulls · LMArena Elo 1383. Minimum to run at Q4_K_M: a 24 GB GPU such as the NVIDIA GeForce RTX 4090.

License: Apache-2.0 · Commercial use OK · 1.5M downloads/mo · 900 likes on Hugging Face
Q4_K_M: 18.6 GB
Q8_0: 32.5 GB
Total @ Q4_K_M (4k context): ~20.7 GB
Context: 32K

Quantization sizes

GGUF quants, size on disk ("est." = estimated)
Quantization | Size on disk
Q2_K | 12.8 GB (est.)
Q3_K_M | 14.9 GB (est.)
Q4_K_M (default) | 18.6 GB
Q5_K_M | 21.7 GB (est.)
Q6_K | 25 GB (est.)
Q8_0 | 32.5 GB
FP16 | 61 GB

Lower-bit quants are smaller and faster but lose some quality, and the loss becomes more noticeable at the lowest bit widths. Q4_K_M is the common default.
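As a rough cross-check of the table, file size is approximately parameter count × average bits per weight ÷ 8. The sketch below assumes that simple model (it ignores GGUF metadata and the tokenizer) and is not necessarily how this site computes its "est." rows. Q8_0 stores blocks of 32 int8 weights plus one fp16 scale, i.e. 8.5 bits per weight; Q4_K_M mixes quant types across tensors, so its average is backed out from the listed 18.6 GB.

```python
# Sketch: approximate GGUF file size from parameter count and bits per weight.
# Assumption: size ~= params * bpw / 8 bytes; metadata and tokenizer are ignored.
PARAMS = 30.5e9  # total parameters (all experts), from this page


def gguf_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size on disk in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def effective_bpw(size_gb: float, params: float) -> float:
    """Back out the average bits per weight from a published file size."""
    return size_gb * 1e9 * 8 / params


# FP16 stores every weight in 16 bits.
fp16 = gguf_size_gb(PARAMS, 16)
# Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights = 8.5 bpw.
q8_0 = gguf_size_gb(PARAMS, 8.5)
# Q4_K_M mixes quant types per tensor, so derive its average from the listed 18.6 GB.
q4_k_m_bpw = effective_bpw(18.6, PARAMS)

print(f"FP16   ~{fp16:.1f} GB")
print(f"Q8_0   ~{q8_0:.1f} GB")
print(f"Q4_K_M ~{q4_k_m_bpw:.2f} bits/weight on average")
```

With 30.5B parameters this gives 61.0 GB for FP16 and about 32.4 GB for Q8_0, close to the listed 61 GB and 32.5 GB, and an average of roughly 4.9 bits per weight for Q4_K_M.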

Run it

Ollama
$ ollama run qwen3:30b-a3b
llama.cpp
$ llama-cli -hf unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M
LM Studio
$ lms get unsloth/Qwen3-30B-A3B-GGUF
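Once Ollama is serving the model, you can also call it from code. The sketch below uses only the Python standard library and Ollama's local REST endpoint (`/api/generate` on the default port 11434). The 4096-token `num_ctx` is an illustrative choice, and the request fields reflect our understanding of Ollama's API, so check its documentation if a request fails.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "qwen3:30b-a3b",
                  num_ctx: int = 4096) -> urllib.request.Request:
    """Build a non-streaming /api/generate request; num_ctx sets the context window."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},  # 4k context, as in the memory figures on this page
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("Explain Mixture-of-Experts in one sentence.")` returns the model's reply as a string, assuming the Ollama server is running and `qwen3:30b-a3b` has already been pulled.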

FAQ

How much VRAM or RAM does Qwen3 30B-A3B need?

At Q4_K_M, Qwen3 30B-A3B needs about 20.7 GB at a 4k context: ~18.6 GB of weights plus KV cache and runtime overhead. At Q8_0, budget about 34.6 GB for the same context.
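To see where the extra ~2.1 GB goes, and how it grows with context, here is a back-of-envelope sketch. The architecture constants (48 layers, 4 KV heads, head dimension 128, fp16 cache) are our assumptions about Qwen3-30B-A3B's `config.json`, not figures from this page, and the fixed overhead is simply calibrated so the 4k total matches the 20.7 GB above.

```python
# Sketch: memory budget = weights + KV cache + fixed runtime overhead.
# ASSUMED architecture values (verify against the model's config.json):
N_LAYERS = 48    # transformer layers
N_KV_HEADS = 4   # grouped-query attention KV heads
HEAD_DIM = 128   # per-head dimension
KV_BYTES = 2     # fp16 K/V cache entries


def kv_cache_gb(context_tokens: int) -> float:
    """K and V tensors for every layer, KV head and token, in decimal GB."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES  # bytes per token
    return per_token * context_tokens / 1e9


# Calibrate the fixed overhead from this page: 20.7 GB total vs 18.6 GB weights at 4k.
OVERHEAD_GB = (20.7 - 18.6) - kv_cache_gb(4096)


def total_gb(weights_gb: float, context_tokens: int = 4096) -> float:
    """Estimated total memory for given weights and context length."""
    return weights_gb + kv_cache_gb(context_tokens) + OVERHEAD_GB


print(f"KV cache @4k: {kv_cache_gb(4096):.2f} GB")
print(f"Q4_K_M @4k:   {total_gb(18.6):.1f} GB")
print(f"Q8_0 @4k:     {total_gb(32.5):.1f} GB")
print(f"Q4_K_M @32K:  {total_gb(18.6, 32768):.1f} GB")
```

It reproduces the 20.7 GB and 34.6 GB figures by construction and puts a full 32K context at roughly 23.5 GB for Q4_K_M. Real runtimes differ (for example with a quantized KV cache), so treat these numbers as estimates.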

Can Qwen3 30B-A3B run on a laptop?

Only a high-memory one. At Q4_K_M it needs about 20.7 GB of GPU or unified memory, so plan on a 24 GB+ GPU or a Mac with 32-48 GB of unified memory.

Is Qwen3 30B-A3B cheaper to run because it is a MoE model?

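It is faster, not lighter. Qwen3 30B-A3B activates only 3.3B of its 30.5B parameters per token, so each token needs far less compute than a dense 30B model. But the router can pick any expert for the next token, so all experts must stay in memory, and the memory requirement is set by the full 30.5B.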
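A toy router makes the compute/memory split concrete. This is a generic top-k Mixture-of-Experts sketch with made-up tiny dimensions and single-matrix experts (Qwen3's real experts are larger feed-forward blocks); only the 128-expert / 8-active counts come from this page. All 128 experts are allocated, but each token runs through just 8 of them.

```python
import math
import random

# Toy top-k MoE router: illustrative sizes, not Qwen3's real dimensions.
N_EXPERTS = 128  # experts held in memory (count from this page)
TOP_K = 8        # experts activated per token (count from this page)
D = 16           # toy hidden size

rng = random.Random(0)
# Every expert's weights must exist in memory, even though few run per token.
experts = [[[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]
router = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(N_EXPERTS)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def moe_forward(x):
    """Route one token: score all experts, run only the top-k, mix by softmax weight."""
    scores = matvec(router, x)  # one logit per expert
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]  # softmax over selected experts
    total = sum(weights)
    out = [0.0] * D
    for w, i in zip(weights, top):
        y = matvec(experts[i], x)  # only TOP_K experts do any compute
        out = [o + (w / total) * yi for o, yi in zip(out, y)]
    return out, top


token = [rng.gauss(0, 1) for _ in range(D)]
out, used = moe_forward(token)
print(f"experts used for this token: {len(used)} of {N_EXPERTS}")
print(f"expert params touched: {TOP_K * D * D} of {N_EXPERTS * D * D} stored")
```

It reports 8 of 128 experts used and 2,048 of 32,768 expert parameters touched. The real model's active share (3.3B of 30.5B, about 11%) is higher than 8/128 because non-expert weights such as the attention layers are shared by every token.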

Can I use Qwen3 30B-A3B commercially?

Yes. Qwen3 30B-A3B is licensed Apache-2.0, which permits commercial use.

MoE layout: 30.5B total / 3.3B active parameters (128 experts, 8 activated per token). The Q4_K_M (18.6 GB) and Q8_0 (32.5 GB) sizes come from the unsloth/Qwen3-30B-A3B-GGUF repository on Hugging Face and were cross-checked against Qwen's own Qwen/Qwen3-30B-A3B-GGUF repository. Native context is 32K, extendable to 128K with YaRN. Despite the large Q4 file, inference is fast because only 3.3B parameters are active per token.
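The YaRN extension is a configuration change, not a different model. Qwen's model card describes adding a `rope_scaling` entry to `config.json`; the sketch below builds that entry in Python. The field names and the `qwen3_moe` model type reflect our understanding of that guidance and should be checked against the model card. The scaling factor is the target context divided by the native 32K: 131072 / 32768 = 4.0.

```python
import json

NATIVE_CTX = 32_768   # native context from this page (32K)
TARGET_CTX = 131_072  # extended context (128K) via YaRN


def with_yarn(config: dict, target_ctx: int = TARGET_CTX,
              native_ctx: int = NATIVE_CTX) -> dict:
    """Return a copy of a Hugging Face-style config with YaRN RoPE scaling enabled."""
    factor = target_ctx / native_ctx  # 131072 / 32768 = 4.0
    updated = dict(config)
    updated["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_ctx,
    }
    return updated


# Placeholder base config; a real one would be loaded from the model's config.json.
cfg = with_yarn({"model_type": "qwen3_moe"})
print(json.dumps(cfg["rope_scaling"], indent=2))
```

As we understand Qwen's guidance, this scaling is static (the factor does not adapt to input length), so it may hurt performance on short prompts; enable it only when you need contexts beyond 32K.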

Sources

Memory figures are estimates; see the methodology page on localmodel.run for details.