Text model · mistral
Mixtral 8x7B requirements
mistral family · 46.7B params (Mixture-of-Experts, 12.9B active) · released Dec 2023 · 2.7M Ollama pulls. Minimum to run at Q4_K_M: NVIDIA GeForce RTX 5090 (32 GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 19.6 GB est |
| Q3_K_M | 22.8 GB est |
| Q4_K_M (default) | 26.49 GB |
| Q5_K_M | 33.3 GB est |
| Q6_K | 38.3 GB est |
| Q8_0 | 46.22 GB |
| FP16 | 93.4 GB est |
Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
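To make the size column easier to compare, the sketch below converts each size into an approximate average number of bits stored per weight. It uses only the sizes in the table and the 46.7B parameter count from the page header. It assumes 1 GB = 1e9 bytes and ignores file metadata, so the results are rough indicators rather than exact format specifications.

```python
# Sizes in GB copied from the table above; parameter count from the page header.
TOTAL_PARAMS = 46.7e9

SIZES_GB = {
    "Q2_K": 19.6,
    "Q3_K_M": 22.8,
    "Q4_K_M": 26.49,
    "Q5_K_M": 33.3,
    "Q6_K": 38.3,
    "Q8_0": 46.22,
    "FP16": 93.4,
}


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits per parameter, assuming 1 GB = 1e9 bytes (an assumption)."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for name, size in SIZES_GB.items():
        print(f"{name:8s} {size:6.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

The FP16 row works out to 16 bits per weight, which is a quick sanity check that the table and the parameter count agree.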
Run it
ollama run mixtral:8x7b
llama-cli -hf MaziyarPanahi/Mixtral-8x7B-Instruct-v0.1-GGUF:Q4_K_M
lms get MaziyarPanahi/Mixtral-8x7B-Instruct-v0.1-GGUF
Which devices can run Mixtral 8x7B?
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM or RAM does Mixtral 8x7B need?
At Q4_K_M, Mixtral 8x7B needs about 28.9 GB (weights ~26.49 GB + KV cache + overhead) at a 4k context. At Q8_0, budget about 48.6 GB.
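The budget above can be broken down with a short sketch. The weight and total figures come from this answer. The architecture values (32 layers, 8 KV heads, head dimension 128) and the fp16 KV cache are assumptions not stated on this page, and the page's own methodology may compute the split differently.

```python
# Figures from the FAQ answer above (GB).
WEIGHTS_GB = {"Q4_K_M": 26.49, "Q8_0": 46.22}
TOTAL_GB = {"Q4_K_M": 28.9, "Q8_0": 48.6}

# ASSUMPTIONS (not from this page): Mixtral 8x7B architecture and cache dtype.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
KV_BYTES_PER_VALUE = 2  # fp16 cache


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache size in GB (1 GB = 1e9 bytes) for a given context length."""
    # Factor 2 covers keys and values, stored for every layer and KV head.
    bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES_PER_VALUE
    return context_tokens * bytes_per_token / 1e9


if __name__ == "__main__":
    kv = kv_cache_gb(4096)
    for quant, weights in WEIGHTS_GB.items():
        remainder = TOTAL_GB[quant] - weights  # KV cache + overhead per the page
        print(f"{quant}: weights {weights:.2f} GB, KV cache + overhead {remainder:.2f} GB")
    print(f"Estimated KV cache at 4k context: {kv:.2f} GB")
```

Under these assumptions the KV cache at a 4k context is roughly 0.54 GB, so most of the ~2.4 GB added on top of the weights would be runtime overhead. The KV cache grows linearly with context length, so longer contexts need a larger budget.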
Can Mixtral 8x7B run on a laptop?
Only on a high-memory machine. At Q4_K_M, Mixtral 8x7B needs about 28.9 GB, so in practice you need a high-memory Mac or a multi-GPU setup.
Is Mixtral 8x7B cheaper to run because it is a MoE model?
It is faster, not lighter. Mixtral 8x7B activates only 12.9B of 46.7B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 46.7B.
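The difference between what must stay resident and what is used per token can be shown with the page's own numbers. This is a minimal sketch that uses FP16 (16 bits per weight) so no quantization format has to be assumed. The helper name `weights_gb` is hypothetical, and 1 GB is taken as 1e9 bytes.

```python
# Parameter counts in billions, from the page.
TOTAL_PARAMS_B = 46.7
ACTIVE_PARAMS_B = 12.9

# Share of the weights that take part in computing a single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Size in GB of `params_b` billion parameters at the given bit width."""
    return params_b * bits_per_weight / 8


if __name__ == "__main__":
    resident = weights_gb(TOTAL_PARAMS_B, 16)    # must fit in memory
    per_token = weights_gb(ACTIVE_PARAMS_B, 16)  # used for each token
    print(f"Resident weights at FP16:   {resident:.1f} GB")
    print(f"Weights used per token:     {per_token:.1f} GB")
    print(f"Active fraction per token:  {active_fraction:.1%}")
```

Only about 28% of the weights are used for each token, which is where the speed comes from, but the memory requirement follows the full 46.7B.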
Can I use Mixtral 8x7B commercially?
Yes. Mixtral 8x7B is licensed Apache-2.0, which permits commercial use.
Mixtral 8x7B is a sparse MoE model: 46.7B total parameters, 12.9B active (2 of 8 experts per token). All experts must fit in memory. The Q4_K_M and Q8_0 sizes are from the MaziyarPanahi GGUF repo.
Sources
Memory figures are estimates. See methodology.