Text model · Sarvam
Sarvam-105B requirements
Sarvam family · 105B params (Mixture-of-Experts, 10.3B active) · released Mar 2026. Minimum to run at Q4_K_M: Apple M4 Max (128GB).
Quantization sizes
| Quantization | Size on disk |
|---|---|
| Q2_K | 44 GB est |
| Q3_K_M | 51.3 GB est |
| Q4_K_M (default) | 64.2 GB |
| Q5_K_M | 74.8 GB est |
| Q6_K | 86.1 GB est |
| Q8_0 | 111.6 GB est |
| FP16 | 210 GB est |
Lower-bit quantizations are smaller and faster, at slightly lower quality. Q4_K_M is the common default.
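As a rough cross-check of the table, each size can be converted to an average number of bits stored per parameter. This is a minimal sketch, assuming "GB" means 10^9 bytes and that the whole file is weights (metadata ignored); the function name `bits_per_weight` is illustrative, not part of any tool.

```python
# Sizes on disk from the table above, in GB (assumed to mean 10**9 bytes).
SIZES_GB = {
    "Q2_K": 44.0,
    "Q3_K_M": 51.3,
    "Q4_K_M": 64.2,
    "Q5_K_M": 74.8,
    "Q6_K": 86.1,
    "Q8_0": 111.6,
    "FP16": 210.0,
}
TOTAL_PARAMS = 105e9  # 105B total parameters


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in SIZES_GB.items():
    print(f"{name:8s} {size_gb:6.1f} GB  ~{bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions FP16 works out to 16 bits per weight and Q4_K_M to roughly 4.9, which is why Q4_K_M is less than a third the size of FP16.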
Run it
llama-cli -hf sarvamai/sarvam-105b-gguf:Q4_K_M
lms get sarvamai/sarvam-105b-gguf
Which devices can run Sarvam-105B?
Apple Silicon Macs
RAM-only laptops
iPhone & iPad
Android
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM or RAM does Sarvam-105B need?
At Q4_K_M, Sarvam-105B needs about 67.5 GB (weights ~64.2 GB + KV cache + overhead) at a 4k context. At Q8_0, budget about 114.9 GB.
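The two totals above imply the same allowance on top of the weights: 67.5 - 64.2 = 114.9 - 111.6 = 3.3 GB for KV cache and overhead at a 4k context. The sketch below reuses that figure to check whether a given memory budget is enough. It is an estimate only: the 3.3 GB constant is inferred from the numbers on this page, applies to a 4k context, and the helper names are hypothetical.

```python
# Weight sizes in GB, taken from the quantization table.
WEIGHTS_GB = {"Q4_K_M": 64.2, "Q8_0": 111.6}

# Inferred from this page: 67.5 - 64.2 = 114.9 - 111.6 = 3.3 GB.
# Covers KV cache + overhead at a 4k context; longer contexts need more.
KV_AND_OVERHEAD_GB_AT_4K = 3.3


def required_memory_gb(quant: str) -> float:
    """Estimated memory needed to run the given quantization at 4k context."""
    return round(WEIGHTS_GB[quant] + KV_AND_OVERHEAD_GB_AT_4K, 1)


def fits(quant: str, available_gb: float) -> bool:
    """True if the estimate fits in the memory actually available to the model."""
    return required_memory_gb(quant) <= available_gb


for quant in WEIGHTS_GB:
    print(quant, required_memory_gb(quant), "GB; fits in 128 GB:", fits(quant, 128))
```

Pass `fits` the memory that is really free for the model, which is usually less than the machine's installed RAM or VRAM.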
Can Sarvam-105B run on a laptop?
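Only on a very high-memory one. At Q4_K_M, Sarvam-105B needs about 67.5 GB, so you need a high-memory Mac (the listed minimum is an Apple M4 Max with 128GB) or a multi-GPU setup.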
Is Sarvam-105B cheaper to run because it is a MoE model?
It is faster, not lighter. Sarvam-105B activates only 10.3B of 105B params per token (so it runs quickly), but all experts must stay in memory, so it still needs memory for the full 105B.
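The "faster, not lighter" point can be put in numbers. This is a back-of-the-envelope sketch, assuming the Q4_K_M bits are spread evenly across all parameters (a simplification; real files mix precisions per tensor), so the per-token figure is only indicative.

```python
TOTAL_PARAMS_B = 105.0      # total parameters, in billions
ACTIVE_PARAMS_B = 10.3      # parameters activated per token, in billions
Q4_K_M_WEIGHTS_GB = 64.2    # full weights at Q4_K_M

# Share of the model that does work for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Memory that must stay resident: every expert, so the full file.
resident_weights_gb = Q4_K_M_WEIGHTS_GB

# Weights actually read per token, assuming uniform bits per parameter.
weights_read_per_token_gb = Q4_K_M_WEIGHTS_GB * active_fraction

print(f"active fraction:        {active_fraction:.1%}")
print(f"resident weights:       {resident_weights_gb:.1f} GB")
print(f"weights read per token: ~{weights_read_per_token_gb:.1f} GB")
```

Only about a tenth of the weights are read for each token, which helps speed, while the memory requirement is still set by the full 64.2 GB.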
Can I use Sarvam-105B commercially?
Yes. Sarvam-105B is licensed Apache-2.0, which permits commercial use.
MoE with 128 experts, top-8 routing plus one shared expert, 10.3B active params. Released 2026-03 under Apache 2.0. Q4_K_M 64.2GB confirmed by summing the 9 shards in the official GGUF repo. Server-class; no Q8_0 GGUF exists.
Sources
Memory figures are estimates. See methodology.