localmodel.run

Text model · mistral

Mistral Nemo 12B requirements

mistral family · 12.2B params · released Jul 2024 · 4.9M Ollama pulls. Minimum to run at Q4_K_M: Nvidia GeForce RTX 3060 (12GB).

License: Apache-2.0 · Commercial OK · 374.6K downloads/mo · 1.7K likes on HuggingFace

Q4_K_M: 6.96 GB
Q8_0: 12.13 GB
Total @ Q4 (4k context): ~8.6 GB
Context: 128k

Quantization sizes

GGUF quants on disk

Quantization        Size on disk
Q2_K                5.1 GB (est.)
Q3_K_M              6 GB (est.)
Q4_K_M (default)    6.96 GB
Q5_K_M              8.7 GB (est.)
Q6_K                10 GB (est.)
Q8_0                12.13 GB
FP16                24.4 GB (est.)

Lower-bit quantizations are smaller and faster, at the cost of slightly lower quality. Q4_K_M is the common default.
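To make the size and quality trade-off concrete, the file sizes above can be converted into an approximate number of bits stored per weight. This is a rough sketch: it assumes the sizes are in decimal GB (1 GB = 1e9 bytes) and that the whole file is weights, which slightly overstates the figure because GGUF files also carry metadata. The helper name bits_per_weight is hypothetical.

```python
# Rough bits-per-weight implied by each GGUF file size on this page.
PARAMS = 12.2e9  # parameter count quoted on this page

# Sizes in GB; "est" rows on the page are estimates, not measured files.
SIZES_GB = {
    "Q2_K": 5.1,
    "Q3_K_M": 6.0,
    "Q4_K_M": 6.96,
    "Q5_K_M": 8.7,
    "Q6_K": 10.0,
    "Q8_0": 12.13,
    "FP16": 24.4,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Approximate bits per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:6.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

FP16 comes out at about 16 bits per weight, as expected, while Q4_K_M lands a little above 4.5, which is why it is roughly 3.5 times smaller than the unquantized weights.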

Run it

Ollama
$ ollama run mistral-nemo:12b
llama.cpp
$ llama-cli -hf bartowski/Mistral-Nemo-Instruct-2407-GGUF:Q4_K_M
LM Studio
$ lms get bartowski/Mistral-Nemo-Instruct-2407-GGUF

FAQ

How much VRAM or RAM does Mistral Nemo 12B need?

At Q4_K_M, Mistral Nemo 12B needs about 8.6 GB at a 4k context: roughly 6.96 GB of weights plus about 1.6 GB for the KV cache and runtime overhead. At Q8_0, budget about 13.7 GB.
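The breakdown above (weights + KV cache + overhead) can be sketched as a small calculation. The weight and total figures come from this page; the architecture values (40 layers, 8 KV heads, head dimension 128) and the 16-bit KV cache are assumptions not stated here and should be checked against the model's config. The page's own methodology may differ, so treat the overhead figure as back-solved, not official.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Size of the K and V tensors across all layers, in GB (1 GB = 1e9 bytes)."""
    values = 2 * n_layers * n_kv_heads * head_dim * context_tokens  # 2 = K and V
    return values * bytes_per_value / 1e9


# Figures quoted on this page (GB).
WEIGHTS_Q4_GB, TOTAL_Q4_GB = 6.96, 8.6
WEIGHTS_Q8_GB, TOTAL_Q8_GB = 12.13, 13.7

# ASSUMED architecture values, not taken from this page.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 8, 128

kv_4k = kv_cache_gb(N_LAYERS, N_KV_HEADS, HEAD_DIM, 4096)

# Whatever is left after the weights is KV cache plus runtime overhead.
non_weight_q4 = TOTAL_Q4_GB - WEIGHTS_Q4_GB
non_weight_q8 = TOTAL_Q8_GB - WEIGHTS_Q8_GB
implied_overhead_q4 = non_weight_q4 - kv_4k

print(f"KV cache @ 4k (fp16, assumed arch): {kv_4k:.2f} GB")
print(f"Non-weight memory @ Q4: {non_weight_q4:.2f} GB, @ Q8: {non_weight_q8:.2f} GB")
print(f"Implied overhead @ Q4: {implied_overhead_q4:.2f} GB")

# The KV cache grows linearly with context length, so long contexts dominate.
for ctx in (4096, 32768, 131072):
    total = WEIGHTS_Q4_GB + kv_cache_gb(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) + implied_overhead_q4
    print(f"ctx={ctx:>6}: ~{total:.1f} GB total at Q4_K_M")
```

Under these assumptions the KV cache is the only term that changes with context length, which is why the 8.6 GB figure applies to a 4k context and not to the full 128k window.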

Can Mistral Nemo 12B run on a laptop?

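Yes, if the laptop has enough memory. At Q4_K_M the model needs about 8.6 GB at a 4k context, so a laptop GPU with 12 GB of VRAM (the same class as the RTX 3060 12GB minimum above) or a Mac with enough free unified memory can run it. Q8_0 needs about 13.7 GB, and longer contexts need more.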
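A quick way to check a specific machine is to compare its memory against the page's 4k-context budgets (about 8.6 GB at Q4_K_M, about 13.7 GB at Q8_0). The sketch below is illustrative only: the function names and the 1 GB reserve for the display and other processes are assumptions, not part of any tool.

```python
from typing import Optional

# Memory budgets at a 4k context, as estimated on this page (GB).
BUDGET_GB = {"Q4_K_M": 8.6, "Q8_0": 13.7}


def fits(quant: str, device_gb: float, reserve_gb: float = 1.0) -> bool:
    """True if the quant's budget plus an assumed reserve fits in device memory."""
    return BUDGET_GB[quant] + reserve_gb <= device_gb


def best_quant(device_gb: float, reserve_gb: float = 1.0) -> Optional[str]:
    """Highest-quality listed quant that fits, or None if neither does."""
    for quant in ("Q8_0", "Q4_K_M"):  # try the larger quant first
        if fits(quant, device_gb, reserve_gb):
            return quant
    return None


for device_gb in (8, 12, 16, 24):
    print(f"{device_gb:>2} GB device -> {best_quant(device_gb)}")
```

On a Mac the relevant number is the unified memory actually available to the GPU, which is less than the installed total, so a larger reserve_gb is a safer assumption there.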

Can I use Mistral Nemo 12B commercially?

Yes. Mistral Nemo 12B is licensed Apache-2.0, which permits commercial use.

Mistral Nemo 12B was built with NVIDIA and supports a 128K context. The Q4_K_M and Q8_0 sizes are taken from the bartowski GGUF repo.

Sources

Memory figures are estimates. See methodology.