Text model · Sarvam
Sarvam-M 24B requirements
Sarvam family · 24B params · released May 2025. Minimum GPU to run at Q4_K_M: NVIDIA GeForce RTX 4090 (24 GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 10.1 GB est |
| Q3_K_M | 11.7 GB est |
| Q4_K_M (default) | 14.3 GB |
| Q5_K_M | 17.1 GB est |
| Q6_K | 19.7 GB est |
| Q8_0 | 25.1 GB |
| FP16 | 47.2 GB |
Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
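To make the size and quality trade-off concrete, the table can be turned into approximate bits per weight. This is a rough sketch, not the site's methodology: it assumes each file is almost entirely weights and that the FP16 file stores exactly 16 bits per weight, so GGUF metadata and mixed-precision tensors are ignored. The names `SIZES_GB` and `bits_per_weight` are illustrative only.

```python
# Sizes on disk in GB, copied from the quantization table above.
SIZES_GB = {
    "Q2_K": 10.1,
    "Q3_K_M": 11.7,
    "Q4_K_M": 14.3,
    "Q5_K_M": 17.1,
    "Q6_K": 19.7,
    "Q8_0": 25.1,
    "FP16": 47.2,
}


def bits_per_weight(size_gb: float, fp16_size_gb: float = SIZES_GB["FP16"]) -> float:
    """Approximate bits per weight, scaling from the FP16 file.

    Assumption: the FP16 file holds 16 bits per weight and file size is
    proportional to bits per weight (metadata is ignored).
    """
    return 16.0 * size_gb / fp16_size_gb


if __name__ == "__main__":
    for name, size in SIZES_GB.items():
        print(f"{name:8s} {size:5.1f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions, Q4_K_M works out to a little under 5 bits per weight rather than exactly 4, because K-quants store scaling data alongside the weights and keep some tensors at higher precision.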
Run it
llama-cli -hf lmstudio-community/sarvam-m-GGUF:Q4_K_M
lms get lmstudio-community/sarvam-m-GGUF
Which devices can run Sarvam-M 24B?
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM or RAM does Sarvam-M 24B need?
At Q4_K_M, Sarvam-M 24B needs about 16.3 GB (weights ~14.3 GB + KV cache + overhead) at a 4k context. At Q8_0, budget about 27.1 GB.
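The arithmetic behind these figures can be sketched as follows. Both numbers imply roughly 2.0 GB of KV cache plus overhead at a 4k context (16.3 - 14.3 and 27.1 - 25.1). That allowance is inferred from the figures here, not a published constant, and it grows with longer contexts. The function names are hypothetical.

```python
# Assumption: KV cache + runtime overhead at a 4k context, inferred from
# the figures above (16.3 - 14.3 = 2.0 GB and 27.1 - 25.1 = 2.0 GB).
KV_AND_OVERHEAD_GB_4K = 2.0


def estimate_memory_gb(weights_gb: float,
                       extra_gb: float = KV_AND_OVERHEAD_GB_4K) -> float:
    """Estimated total memory: weights on disk plus a fixed allowance."""
    return weights_gb + extra_gb


def fits(weights_gb: float, available_gb: float) -> bool:
    """True if the estimate fits within the available VRAM or RAM."""
    return estimate_memory_gb(weights_gb) <= available_gb


if __name__ == "__main__":
    for name, weights in [("Q4_K_M", 14.3), ("Q8_0", 25.1)]:
        need = estimate_memory_gb(weights)
        print(f"{name}: ~{need:.1f} GB needed, fits in 24 GB: {fits(weights, 24.0)}")
```

With a 24 GB budget, this estimate has Q4_K_M fitting and Q8_0 not, which matches the RTX 4090 being listed as the minimum for Q4_K_M.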
Can Sarvam-M 24B run on a laptop?
Only on a high-end one. Sarvam-M 24B is large; at Q4_K_M you need a GPU with 24 GB or more of VRAM, or a Mac with 32-48 GB of memory.
Can I use Sarvam-M 24B commercially?
Yes. Sarvam-M 24B is licensed Apache-2.0, which permits commercial use.
Dense 24B model fine-tuned from Mistral-Small-3.1-24B-Base, with a hybrid thinking mode. The Q4_K_M (14.3 GB) and Q8_0 (25.1 GB) sizes are confirmed from two independent GGUF repos (lmstudio-community, Mungert) plus the official sarvamai Q8 repo. The 32K context length comes from config.json.
Sources
Memory figures are estimates. See methodology.