Qwen3 30B-A3B requirements

Qwen3 family · 30.5B params (Mixture-of-Experts, 3.3B active) · released Apr 2025 · 23M Ollama pulls · LMArena Elo 1383. Minimum to run at Q4_K_M: Nvidia GeForce RTX 4090 (24GB).

License: Apache-2.0 · Commercial OK · ↓ 1.5M/mo · ♥ 900 on HuggingFace

Q4_K_M: 18.6 GB

Q8_0: 32.5 GB

Total @ Q4 (4k context): ~20.7 GB

Context: 32k

Quantization sizes

GGUF quants on disk

Quantization	Size on disk
Q2_K	12.8 GB est
Q3_K_M	14.9 GB est
Q4_K_M (default)	18.6 GB
Q5_K_M	21.7 GB est
Q6_K	25 GB est
Q8_0	32.5 GB
FP16	61 GB

Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
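As a rough sanity check on the table, the on-disk sizes can be converted into an effective number of bits per weight. This is only a sketch: it assumes 1 GB = 1e9 bytes and that the whole file is weights for the 30.5B parameters quoted above (metadata is ignored). The function name `bits_per_weight` is made up for this example.

```python
# Effective bits per weight implied by the GGUF sizes listed above.
# Assumptions (not from the source): 1 GB = 1e9 bytes, file size is all weights.
TOTAL_PARAMS = 30.5e9  # total parameter count quoted on this page

SIZES_GB = {
    "Q2_K": 12.8,
    "Q3_K_M": 14.9,
    "Q4_K_M": 18.6,
    "Q5_K_M": 21.7,
    "Q6_K": 25.0,
    "Q8_0": 32.5,
    "FP16": 61.0,
}


def bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Convert a file size in GB into average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in SIZES_GB.items():
    print(f"{name:8s} {size_gb:5.1f} GB  ~{bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions FP16 works out to 16 bits per weight, which matches the format, and Q4_K_M lands a little under 5 bits per weight rather than exactly 4, since the K-quant formats also store scaling data.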

Run it

Ollama

$ ollama run qwen3:30b-a3b
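Once the model is running under Ollama, it can also be queried from code through Ollama's local HTTP API. The sketch below assumes a default local install listening on port 11434 and uses only the Python standard library; the helper names `build_generate_request` and `generate` are invented for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="qwen3:30b-a3b"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen3:30b-a3b"):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running Ollama server with the model already pulled):
# print(generate("Explain Mixture-of-Experts in one sentence."))
```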

llama.cpp

$ llama-cli -hf unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M

LM Studio

$ lms get unsloth/Qwen3-30B-A3B-GGUF

Which devices can run Qwen3 30B-A3B?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB): Tight

FAQ

How much VRAM or RAM does Qwen3 30B-A3B need?

At Q4_K_M, Qwen3 30B-A3B needs about 20.7 GB at a 4k context (weights ~18.6 GB, plus KV cache and overhead). At Q8_0, budget ~34.6 GB.
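The two totals above follow the same pattern: weights plus roughly 2.1 GB for KV cache and overhead at a 4k context. A minimal sketch of that arithmetic, assuming the same allowance applies to every quantization (the page only confirms it for Q4_K_M and Q8_0); `total_memory_gb` and `fits` are illustrative names:

```python
# Allowance for KV cache + runtime overhead at 4k context,
# derived from the Q4_K_M figures on this page (20.7 GB total, 18.6 GB weights).
OVERHEAD_GB = 20.7 - 18.6


def total_memory_gb(weights_gb, overhead_gb=OVERHEAD_GB):
    """Estimated memory to load the model and run it at a 4k context."""
    return weights_gb + overhead_gb


def fits(device_gb, weights_gb):
    """True if the estimated total fits in the device's memory."""
    return total_memory_gb(weights_gb) <= device_gb


print(f"Q4_K_M: ~{total_memory_gb(18.6):.1f} GB")
print(f"Q8_0:   ~{total_memory_gb(32.5):.1f} GB")
print("24 GB card, Q4_K_M:", fits(24, 18.6))
print("24 GB card, Q8_0:  ", fits(24, 32.5))
```

A 24 GB card fits Q4_K_M with only about 3 GB to spare, so longer contexts or other GPU workloads can push it over the limit.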

Can Qwen3 30B-A3B run on a laptop?

Only on a high-memory one. Qwen3 30B-A3B is large: at Q4_K_M you need a 24 GB+ GPU or a Mac with 32-48 GB of memory.

Is Qwen3 30B-A3B cheaper to run because it is a MoE model?

It is faster, not lighter. Qwen3 30B-A3B activates only 3.3B of 30.5B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 30.5B.
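The split between memory cost and compute cost can be made concrete with the numbers on this page. The sketch below is illustrative only: the "2 FLOPs per active parameter per token" figure is a common rule of thumb, not something the source states, and the variable names are made up.

```python
# Parameter counts quoted on this page, in billions.
TOTAL_PARAMS_B = 30.5   # must all be resident in memory
ACTIVE_PARAMS_B = 3.3   # used for any single token

# Memory scales with total parameters; per-token compute scales with active ones.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Rule-of-thumb assumption: about 2 FLOPs per active parameter per generated token.
gflops_per_token_moe = 2 * ACTIVE_PARAMS_B
gflops_per_token_dense = 2 * TOTAL_PARAMS_B  # hypothetical dense model of the same size

print(f"Active fraction: {active_fraction:.1%}")
print(f"Compute per token: ~{gflops_per_token_moe:.1f} GFLOPs "
      f"vs ~{gflops_per_token_dense:.1f} GFLOPs for a dense 30.5B model")
print(f"Memory footprint is still set by all {TOTAL_PARAMS_B}B parameters")
```

So per-token compute is roughly a tenth of what a dense model of the same size would need, while the weights that must be loaded are unchanged.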

Can I use Qwen3 30B-A3B commercially?

Yes. Qwen3 30B-A3B is licensed Apache-2.0, which permits commercial use.

MoE: 30.5B total / 3.3B active (128 experts in total, 8 activated per token). Q4_K_M = 18.6 GB and Q8_0 = 32.5 GB are taken from the official unsloth/Qwen3-30B-A3B-GGUF HF repo and cross-confirmed with Qwen/Qwen3-30B-A3B-GGUF. Native context is 32K, extendable to 128K via YaRN. Despite the large Q4 size, inference is fast because only 3.3B params are active per token.
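Extending the context from 32K to 128K also grows the KV cache, which scales linearly with context length. The sketch below estimates that growth. The architecture values (48 layers, 4 key/value heads, head dimension 128) and the 16-bit cache are assumptions not stated on this page, so check them against the model's config before relying on the numbers.

```python
# Assumed architecture values (NOT from this page; verify against the model config).
NUM_LAYERS = 48
NUM_KV_HEADS = 4
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # assumes a 16-bit (FP16) KV cache


def kv_cache_bytes(context_tokens):
    """Bytes for keys and values across all layers at a given context length."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


for label, tokens in [("4k", 4096), ("32k", 32768), ("128k", 131072)]:
    print(f"{label:>4s} context: ~{kv_cache_bytes(tokens) / 1e9:.2f} GB KV cache")
```

Whatever the exact per-token cost, the 128K cache is four times the 32K one, so a device that is already tight at 4k will not have room for long contexts at the same quantization.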

Sources

Memory figures are estimates. See methodology.