localmodel.run

Text model · mistral

Mixtral 8x7B requirements

mistral family · 46.7B params (Mixture-of-Experts, 12.9B active) · released Dec 2023 · 2.7M Ollama pulls. Minimum to run at Q4_K_M: Nvidia GeForce RTX 5090 (32GB).

License: Apache-2.0 · Commercial OK · ↓ 673.7K/mo · ♥ 4.7K on HuggingFace
Q4_K_M: 26.49 GB
Q8_0: 46.22 GB
Total @ Q4 (4k): ~28.9 GB
Context: 32k

Quantization sizes

GGUF quants on disk

Quantization        Size on disk
Q2_K                19.6 GB (est.)
Q3_K_M              22.8 GB (est.)
Q4_K_M (default)    26.49 GB
Q5_K_M              33.3 GB (est.)
Q6_K                38.3 GB (est.)
Q8_0                46.22 GB
FP16                93.4 GB (est.)

Lower-bit quants are smaller and faster, at the cost of some quality. Q4_K_M is the common default.
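One way to compare the rows above is effective bits per weight: file size times 8, divided by the parameter count. The sketch below is illustrative only. It assumes "GB" means 10^9 bytes and uses the 46.7B total stated on this page; the function name is made up for the example and is not part of any tool.

```python
# Effective bits per weight for each GGUF quant listed on this page.
# Assumption: "GB" means 10**9 bytes; sizes are copied from the table above.
TOTAL_PARAMS = 46.7e9  # total parameters, as stated on this page

SIZES_GB = {
    "Q2_K": 19.6,
    "Q3_K_M": 22.8,
    "Q4_K_M": 26.49,
    "Q5_K_M": 33.3,
    "Q6_K": 38.3,
    "Q8_0": 46.22,
    "FP16": 93.4,
}

def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average number of bits stored per parameter for a file of size_gb."""
    return size_gb * 1e9 * 8 / params

if __name__ == "__main__":
    for name, size in SIZES_GB.items():
        print(f"{name:7s} {size:6.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions Q4_K_M works out to roughly 4.5 bits per weight and FP16 to 16, which is a quick sanity check that the table and the parameter count agree.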

Run it

Ollama
$ ollama run mixtral:8x7b
llama.cpp
$ llama-cli -hf MaziyarPanahi/Mixtral-8x7B-Instruct-v0.1-GGUF:Q4_K_M
LM Studio
$ lms get MaziyarPanahi/Mixtral-8x7B-Instruct-v0.1-GGUF
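
Once the model has been pulled with Ollama, it can also be called from a script over Ollama's local HTTP API. This is a minimal sketch, assuming the server is listening on its default address (localhost:11434); the helper names build_request and generate are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "mixtral:8x7b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Explain mixture-of-experts in one sentence.") should return the model's reply as a string. The first call may be slow while the weights load into memory.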

FAQ

How much VRAM or RAM does Mixtral 8x7B need?

At Q4_K_M, Mixtral 8x7B needs about 28.9 GB at a 4k context: ~26.49 GB of weights plus KV cache and runtime overhead. At Q8_0, budget about 48.6 GB.
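
The total above can be broken down roughly as weights + KV cache + overhead. The sketch below is an approximation, not this page's methodology. It assumes Mixtral's published architecture (32 layers, 8 KV heads of dimension 128) and a 16-bit KV cache, none of which is stated here, and it treats whatever remains of the ~28.9 GB figure as flat overhead.

```python
# Rough memory budget: weights + KV cache + runtime overhead.
# Assumptions (not stated on this page): 32 layers, 8 KV heads of dimension 128
# (grouped-query attention), and a 16-bit KV cache.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
KV_BYTES = 2  # bytes per cached value (fp16)

def kv_cache_gb(context_tokens: int) -> float:
    """Size of the key and value caches in GB (10**9 bytes)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES  # keys + values
    return per_token * context_tokens / 1e9

def total_gb(weights_gb: float, context_tokens: int, overhead_gb: float) -> float:
    """Weights plus KV cache plus a flat allowance for buffers and runtime."""
    return weights_gb + kv_cache_gb(context_tokens) + overhead_gb

if __name__ == "__main__":
    kv = kv_cache_gb(4096)
    # Overhead is whatever is left of the page's ~28.9 GB total at Q4_K_M.
    overhead = 28.9 - 26.49 - kv
    print(f"KV cache @ 4k: {kv:.2f} GB, implied overhead: {overhead:.2f} GB")
    print(f"Q8_0 @ 4k: {total_gb(46.22, 4096, overhead):.1f} GB")
```

Under these assumptions the KV cache is about 0.54 GB at 4k tokens and about 4.3 GB at the full 32k context, so long prompts need noticeably more headroom than the 4k figure suggests.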

Can Mixtral 8x7B run on a laptop?

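Only on a laptop with a lot of memory. At Q4_K_M, Mixtral 8x7B needs about 28.9 GB, so in practice that means a high-memory Mac; otherwise plan on a 32 GB GPU or a multi-GPU setup.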

Is Mixtral 8x7B cheaper to run because it is a MoE model?

It is faster, not lighter. Mixtral 8x7B activates only 12.9B of 46.7B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 46.7B.
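
The split between memory and compute can be put in numbers. This is a back-of-the-envelope sketch using only the parameter counts and the Q4_K_M size from this page. It assumes every tensor is quantized at the same average bit width, which is a simplification, and the function names are illustrative.

```python
# Memory follows total parameters; per-token compute follows active parameters.
TOTAL_B = 46.7   # billions of parameters held in memory (from this page)
ACTIVE_B = 12.9  # billions of parameters used per token (from this page)

def active_fraction(active_b: float = ACTIVE_B, total_b: float = TOTAL_B) -> float:
    """Share of the model's parameters that are used for a single token."""
    return active_b / total_b

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight size in GB for params_b billion parameters."""
    return params_b * bits_per_weight / 8

if __name__ == "__main__":
    # Average bits per weight implied by the 26.49 GB Q4_K_M file (assumed uniform).
    bpw = 26.49 * 8 / TOTAL_B
    print(f"active fraction: {active_fraction():.1%}")
    print(f"memory for all experts: {weights_gb(TOTAL_B, bpw):.2f} GB")
    print(f"weights read per token: {weights_gb(ACTIVE_B, bpw):.2f} GB")
```

Only a bit over a quarter of the weights are read for each token, which is where the speed comes from, but the full file still has to be resident because the router can pick any expert on the next token.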

Can I use Mixtral 8x7B commercially?

Yes. Mixtral 8x7B is licensed Apache-2.0, which permits commercial use.

Note: Mixtral 8x7B is a sparse MoE model with 46.7B total parameters, of which 12.9B are active per token (2 of 8 experts). All experts must fit in memory. The Q4_K_M and Q8_0 sizes come from the MaziyarPanahi GGUF repo.

Sources

Memory figures are estimates. See methodology.