localmodel.run

Text model · Sarvam

Sarvam-M 24B requirements

Sarvam family · 24B params · released May 2025. Minimum to run at Q4_K_M: Nvidia GeForce RTX 4090 (24GB).

License: Apache-2.0 · Commercial OK · ↓ 4.1K/mo · ♥ 344 on HuggingFace

Q4_K_M: 14.3 GB
Q8_0: 25.1 GB
Total @ Q4 (4k): ~16.3 GB
Context: 32k

Quantization sizes

GGUF quants on disk

Quantization        Size on disk
Q2_K                10.1 GB (est.)
Q3_K_M              11.7 GB (est.)
Q4_K_M (default)    14.3 GB
Q5_K_M              17.1 GB (est.)
Q6_K                19.7 GB (est.)
Q8_0                25.1 GB
FP16                47.2 GB

Lower-bit quants are smaller and usually faster, at some cost in quality; the loss grows as the bit width drops. Q4_K_M is the common default.
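To make the size differences concrete, the sketch below converts each file size in the table into an approximate average number of bits per weight. It rests on two assumptions that the page does not state: that "GB" means 10^9 bytes, and that the FP16 file stores exactly 2 bytes per parameter, so the parameter count can be inferred from the 47.2 GB figure (about 23.6 billion, in line with the "24B" label). The results are rough because GGUF files also hold metadata and mix tensor types.

```python
# Sizes on disk from the table above, in GB (assumed to mean 10^9 bytes).
SIZES_GB = {
    "Q2_K": 10.1,
    "Q3_K_M": 11.7,
    "Q4_K_M": 14.3,
    "Q5_K_M": 17.1,
    "Q6_K": 19.7,
    "Q8_0": 25.1,
    "FP16": 47.2,
}

# Assumption: FP16 stores 2 bytes per parameter, so its file size implies
# the parameter count (about 23.6 billion).
PARAMS = SIZES_GB["FP16"] * 1e9 / 2


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name:8s} {size:5.1f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions Q4_K_M works out to a little under 5 bits per weight and Q8_0 to about 8.5, which is why the file sizes do not scale exactly with the number in the quant name.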

Run it

llama.cpp
$ llama-cli -hf lmstudio-community/sarvam-m-GGUF:Q4_K_M
LM Studio
$ lms get lmstudio-community/sarvam-m-GGUF

FAQ

How much VRAM or RAM does Sarvam-M 24B need?

At Q4_K_M, Sarvam-M 24B needs about 16.3 GB at a 4k context (~14.3 GB of weights plus KV cache and overhead). At Q8_0, budget about 27.1 GB.
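Both totals quoted here are the weight size plus 2.0 GB (16.3 - 14.3 and 27.1 - 25.1), so the estimate appears to treat KV cache and overhead at a 4k context as a flat allowance. The sketch below reproduces the arithmetic under that assumption. The constant `OVERHEAD_4K_GB` is inferred from the two figures, not taken from the site's methodology, and it says nothing about longer contexts, where the KV cache grows.

```python
# Inferred from the page's own figures: 16.3 - 14.3 = 2.0 and 27.1 - 25.1 = 2.0.
# Assumption: KV cache plus runtime overhead at a 4k context is a flat 2.0 GB.
OVERHEAD_4K_GB = 2.0


def total_memory_gb(weights_gb: float, overhead_gb: float = OVERHEAD_4K_GB) -> float:
    """Estimated memory needed: weights plus the flat 4k-context allowance."""
    return round(weights_gb + overhead_gb, 1)


print("Q4_K_M:", total_memory_gb(14.3), "GB")
print("Q8_0:  ", total_memory_gb(25.1), "GB")
```

The same function can be applied to the other rows of the size table, with the caveat that those sizes are themselves marked as estimates.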

Can Sarvam-M 24B run on a laptop?

Only on a high-memory one. Sarvam-M 24B is large: at Q4_K_M you need a GPU with 24 GB or more of VRAM, or a Mac with 32-48 GB of memory.
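As a rough illustration of why 24 GB is the practical floor, the sketch below compares the two 4k-context totals quoted on this page against a few memory sizes. The device sizes are hypothetical values chosen for illustration, and the check is purely arithmetic: it ignores memory already claimed by the operating system, the display, or other applications.

```python
# Totals at a 4k context as quoted on this page, in GB.
TOTAL_4K_GB = {"Q4_K_M": 16.3, "Q8_0": 27.1}

# Hypothetical device memory sizes, chosen only for illustration.
DEVICE_MEMORY_GB = [16, 24, 32, 48]


def quants_that_fit(memory_gb: float) -> list[str]:
    """Return the quants whose quoted total is within the given memory."""
    return [q for q, total in TOTAL_4K_GB.items() if total <= memory_gb]


for mem in DEVICE_MEMORY_GB:
    fitting = quants_that_fit(mem)
    print(f"{mem:>2} GB: {', '.join(fitting) if fitting else 'none'}")
```

A 16 GB device falls about 0.3 GB short of the Q4_K_M total, which is consistent with a 24 GB card being named as the minimum. Usable memory is normally below the nominal figure, so a fit close to the limit should be treated as marginal.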

Can I use Sarvam-M 24B commercially?

Yes. Sarvam-M 24B is licensed Apache-2.0, which permits commercial use.

Dense 24B model fine-tuned from Mistral-Small-3.1-24B-Base, with a hybrid thinking mode. The Q4_K_M (14.3 GB) and Q8_0 (25.1 GB) sizes are confirmed from two independent GGUF repos (lmstudio-community, Mungert) plus the official sarvamai Q8 repo. The 32K context length is taken from config.json.

Memory figures are estimates. See methodology.