Mixtral 8x7B requirements

mistral family · 46.7B params (Mixture-of-Experts, 12.9B active) · released Dec 2023 · 2.7M Ollama pulls. Minimum to run at Q4_K_M: NVIDIA GeForce RTX 5090 (32GB).

License: Apache-2.0 · Commercial OK · ↓ 673.7K/mo · ♥ 4.7K on HuggingFace

Q4_K_M: 26.49 GB
Q8_0: 46.22 GB
Total @ Q4 (4k): ~28.9 GB
Context: 32k

Quantization sizes

GGUF quants on disk

Quantization	Size on disk
Q2_K	19.6 GB est
Q3_K_M	22.8 GB est
Q4_K_M (default)	26.49 GB
Q5_K_M	33.3 GB est
Q6_K	38.3 GB est
Q8_0	46.22 GB
FP16	93.4 GB est

Lower-bit quants are smaller and faster, at a slight cost in quality. Q4_K_M is the common default.
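To pick a quant for a given memory budget, the sizes in the table can be compared against available memory. The sketch below is illustrative only: the ~2.4 GB overhead is an assumption taken from the gap between this page's Q4_K_M total at 4k context (~28.9 GB) and its weights (26.49 GB), and the function name is hypothetical.

```python
# Sizes in GB copied from the table above; "est" rows are the page's estimates.
QUANT_SIZES_GB = {
    "Q2_K": 19.6,
    "Q3_K_M": 22.8,
    "Q4_K_M": 26.49,
    "Q5_K_M": 33.3,
    "Q6_K": 38.3,
    "Q8_0": 46.22,
    "FP16": 93.4,
}

# Assumption: ~2.4 GB for KV cache and runtime overhead at a 4k context,
# i.e. the page's Q4_K_M total (~28.9 GB) minus its weights (26.49 GB).
OVERHEAD_GB = 2.4


def largest_quant_that_fits(memory_gb, overhead_gb=OVERHEAD_GB):
    """Return the largest quant whose weights plus overhead fit, or None."""
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + overhead_gb <= memory_gb
    ]
    return max(fitting)[1] if fitting else None


print(largest_quant_that_fits(32))  # prints "Q4_K_M"
```

A longer context needs a larger overhead value, so treat the result as a first guess rather than a guarantee.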

Run it

Ollama

$ ollama run mixtral:8x7b

llama.cpp

$ llama-cli -hf MaziyarPanahi/Mixtral-8x7B-Instruct-v0.1-GGUF:Q4_K_M

LM Studio

$ lms get MaziyarPanahi/Mixtral-8x7B-Instruct-v0.1-GGUF
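Once the model is running under Ollama, it can also be called from a script through Ollama's local HTTP API. This is a minimal sketch, assuming Ollama is serving on its default address (localhost:11434) and that the `mixtral:8x7b` tag has already been pulled; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "mixtral:8x7b"  # same tag as the `ollama run` command above


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt):
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain mixture-of-experts in one sentence.")` returns the completion as a string. It fails with a connection error if the Ollama server is not running.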

Which devices can run Mixtral 8x7B?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB): No

FAQ

How much VRAM or RAM does Mixtral 8x7B need?

At Q4_K_M, Mixtral 8x7B needs about 28.9 GB (weights ~26.49 GB + KV cache + overhead) at a 4k context. At Q8_0, budget ~48.6 GB.
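The split between KV cache and other overhead can be approximated from the model's attention shape. The sketch below is not this page's methodology: it assumes Mixtral 8x7B's published configuration (32 layers, 8 key/value heads, head dimension 128) and a 16-bit KV cache, none of which is stated here, and it treats 1 GB as 1e9 bytes.

```python
# Assumptions (not stated on this page): Mixtral 8x7B's published config of
# 32 layers, 8 key/value heads (grouped-query attention), head dimension 128,
# and a 16-bit KV cache.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2


def kv_cache_gb(context_tokens):
    """KV cache size in GB (1e9 bytes) for a single sequence."""
    # The factor 2 covers keys and values.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens / 1e9


weights_gb = 26.49  # Q4_K_M weights, from this page
total_gb = 28.9     # this page's total at a 4k context
kv_gb = kv_cache_gb(4096)
other_gb = total_gb - weights_gb - kv_gb  # implied runtime overhead

print(f"KV cache @ 4k:  {kv_gb:.2f} GB")                 # 0.54 GB
print(f"Other overhead: {other_gb:.2f} GB")              # 1.87 GB
print(f"KV cache @ 32k: {kv_cache_gb(32768):.2f} GB")    # 4.29 GB
```

Under these assumptions the KV cache is a small share of the total at 4k but grows linearly with context, so using the full 32k window adds several GB on top of the 4k figure.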

Can Mixtral 8x7B run on a laptop?

Not on a typical laptop. At Q4_K_M, Mixtral 8x7B needs about 28.9 GB, so you need a high-memory Mac or a multi-GPU setup.

Is Mixtral 8x7B cheaper to run because it is a MoE model?

It is faster, not lighter. Mixtral 8x7B activates only 12.9B of 46.7B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 46.7B.
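The distinction can be put in numbers using only the figures on this page. The dense comparison below is a hypothetical: it simply scales the Q4_K_M weight size by the active fraction, assuming the same average bits per weight.

```python
TOTAL_PARAMS_B = 46.7     # all experts, must be resident in memory
ACTIVE_PARAMS_B = 12.9    # used per token (2 of 8 experts)
Q4_K_M_WEIGHTS_GB = 26.49

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Hypothetical comparison: a dense 12.9B model at the same bits per weight.
dense_equivalent_gb = Q4_K_M_WEIGHTS_GB * active_fraction

print(f"Active per token: {active_fraction:.1%}")           # 27.6%
print(f"Weights in memory: {Q4_K_M_WEIGHTS_GB} GB")
print(f"Dense 12.9B equivalent: ~{dense_equivalent_gb:.1f} GB")  # ~7.3 GB
```

The active share is above 2/8 because the non-expert weights, such as attention, are used for every token. Per-token compute tracks the 12.9B figure, while the memory requirement tracks the full 46.7B.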

Can I use Mixtral 8x7B commercially?

Yes. Mixtral 8x7B is licensed Apache-2.0, which permits commercial use.

Mixtral 8x7B is a sparse MoE: 46.7B total parameters, 12.9B active (2 of 8 experts per token). All experts must fit in memory. Q4_K_M and Q8_0 sizes are from the MaziyarPanahi GGUF repo.

Sources

Memory figures are estimates. See methodology.