Device profile · Windows
Best local LLMs for Nvidia GeForce RTX 5090 (32GB)
Nvidia GeForce RTX 5090 (32GB) has ~31 GB usable for model weights and runs 59 of 67 popular models. Best tool: LM Studio.
- Usable memory
- ~31 GB
- Models run
- 59
- Too large
- 8
- Top pick
- 46.7B
Fits at Q4_K_M (~28.9 GB of ~31 GB usable) but with little headroom, close other apps.
Runs on Nvidia GeForce RTX 5090 (32GB)
- TightMixtral 8x7B46.7B MoE · ~28.9 GB at Q4_K_M
- CR YesCommand R 35B35B · ~22.3 GB at Q4_K_M
- Yi YesYi 1.5 34B34B · ~21.4 GB at Q4_K_M
- YesQwen2.5 32B32B · ~22.1 GB at Q4_K_M
- YesQwen3 32B32B · ~22 GB at Q4_K_M
- YesDeepSeek-R1-Distill-Qwen 32B32B · ~22.1 GB at Q4_K_M
- YesQwen2.5 Coder 32B32B · ~20.7 GB at Q4_K_M
- YesGranite 4.0 H Small32B MoE · ~20.4 GB at Q4_K_M
- YesQwen3 30B-A3B30.5B MoE · ~20.7 GB at Q4_K_M
- S YesSarvam-30B30B MoE · ~21.7 GB at Q4_K_M
- YesGemma 2 27B27B · ~18.7 GB at Q4_K_M
- YesGemma 3 27B27B · ~18.6 GB at Q4_K_M
- YesMistral Small 3 24B24B · ~16.3 GB at Q4_K_M
- S YesSarvam-M 24B24B · ~16.3 GB at Q4_K_M
- Yesgpt-oss 20B21B MoE · ~13.2 GB at Q4_K_M
- YesDeepSeek-V2-Lite16B MoE · ~12.2 GB at Q4_K_M
- YesPhi-4 14B14B · ~10.8 GB at Q4_K_M
- YesQwen2.5 14B14B · ~10.7 GB at Q4_K_M
- YesQwen3 14B14B · ~10.7 GB at Q4_K_M
- YesDeepSeek-R1-Distill-Qwen 14B14B · ~10.7 GB at Q4_K_M
- YesQwen2.5 Coder 14B14B · ~10.1 GB at Q4_K_M
- YesMistral Nemo 12B12.2B · ~8.6 GB at Q4_K_M
- YesGemma 3 12B12B · ~8.9 GB at Q4_K_M
- YesLlama 3.2 Vision 11B10.7B · ~9 GB at Q4_K_M
- FN YesFalcon3 10B10B · ~7.5 GB at Q4_K_M
- YesGemma 2 9B9B · ~7.3 GB at Q4_K_M
- GL YesGLM-4 9B9B · ~7.3 GB at Q4_K_M
- YesQwen2.5-VL 7B8.29B · ~7.1 GB at Q4_K_M
- YesLlama 3.1 8B8B · ~6.4 GB at Q4_K_M
- YesQwen3 8B8B · ~6.5 GB at Q4_K_M
- YesDeepSeek-R1-Distill-Llama 8B8B · ~6.4 GB at Q4_K_M
- YesMistral 7B7B · ~5.8 GB at Q4_K_M
- YesQwen2.5 7B7B · ~6.1 GB at Q4_K_M
- YesDeepSeek-R1-Distill-Qwen 7B7B · ~6.1 GB at Q4_K_M
- YesQwen2.5 Coder 7B7B · ~5.8 GB at Q4_K_M
- YesGemma 3 4B4B · ~3.8 GB at Q4_K_M
- YesQwen3 4B4B · ~3.8 GB at Q4_K_M
- YesPhi-3.5-mini 3.8B3.82B · ~3.7 GB at Q4_K_M
- YesPhi-4-mini 3.8B3.8B · ~3.8 GB at Q4_K_M
- YesQwen2.5-VL 3B3.75B · ~4.4 GB at Q4_K_M
- YesQwen2.5 3B3.09B · ~3.3 GB at Q4_K_M
- YesQwen2.5 Coder 3B3.09B · ~3 GB at Q4_K_M
- YesLlama 3.2 3B3B · ~3.2 GB at Q4_K_M
- YesSmolLM3 3B3B · ~3 GB at Q4_K_M
- YesGemma 2 2B2.61B · ~2.9 GB at Q4_K_M
- YesGranite 3.1 2B2.53B · ~2.8 GB at Q4_K_M
- S YesSarvam-1 2B2B · ~2.7 GB at Q4_K_M
- YesSmolLM2 1.7B1.7B · ~2.2 GB at Q4_K_M
- YesQwen3 1.7B1.7B · ~2.4 GB at Q4_K_M
- YesQwen2.5 1.5B1.54B · ~2.2 GB at Q4_K_M
- YesQwen2.5 Coder 1.5B1.54B · ~2 GB at Q4_K_M
- TL YesTinyLlama 1.1B1.1B · ~1.8 GB at Q4_K_M
- YesLlama 3.2 1B1B · ~1.8 GB at Q4_K_M
- YesGemma 3 1B1B · ~1.8 GB at Q4_K_M
- YesQwen3 0.6B0.6B · ~1.5 GB at Q4_K_M
- YesQwen2.5 0.5B0.494B · ~1.5 GB at Q4_K_M
- YesQwen2.5 Coder 0.5B0.494B · ~1.4 GB at Q4_K_M
- YesSmolLM2 360M0.362B · ~1.2 GB at Q4_K_M
- YesSmolLM2 135M0.135B · ~1 GB at Q4_K_M
Too large for this device
Best way to run models on Windows
Beginner: LM Studio, Best GUI on Windows, auto-detects CUDA/Vulkan backends.
Power user: Ollama (CUDA), Scriptable server; CUDA path is fastest on NVIDIA.
AMD GPUs run via Vulkan/ROCm at roughly half CUDA throughput. NVIDIA is the smooth path on Windows.
Full Windows tool guide →FAQ
What is the best local LLM for Nvidia GeForce RTX 5090 (32GB)?
Mixtral 8x7B is the strongest model that runs comfortably, using ~28.9 GB at Q4_K_M of the ~31 GB usable on Nvidia GeForce RTX 5090 (32GB).
How much of Nvidia GeForce RTX 5090 (32GB)'s memory can I use for a model?
About 31 GB. On a discrete GPU, leave ~1 GB of VRAM for the driver and display.
Which tool should I use on Windows?
LM Studio (Best GUI on Windows, auto-detects CUDA/Vulkan backends.) or Ollama (CUDA) for speed. AMD GPUs run via Vulkan/ROCm at roughly half CUDA throughput. NVIDIA is the smooth path on Windows.
Sources
Memory figures are estimates. See methodology.