Device profile · Windows

Best local LLMs for Nvidia GeForce RTX 4090 (24GB)

The Nvidia GeForce RTX 4090 (24GB) has ~23 GB usable for model weights and runs 58 of 67 popular models. Best tool: LM Studio.

Usable memory: ~23 GB
Models run: 58
Too large: 9
Top pick: 35B

Top pick: Command R 35B at Q4_K_M (tight fit)

Fits at Q4_K_M (~22.3 GB of ~23 GB usable) but with little headroom, so close other apps before loading it.

Runs on Nvidia GeForce RTX 4090 (24GB)

Compatible models: 58 total

Too large for this device

DeepSeek R1, DeepSeek V3, Qwen3 235B A22B, gpt-oss 120B, Llama 4 Scout, Sarvam-105B, Qwen2.5 72B, Llama 3.3 70B, Mixtral 8x7B

Best way to run models on Windows

Runtime guide Windows

Beginner: LM Studio. Best GUI on Windows; auto-detects CUDA/Vulkan backends.

Power user: Ollama (CUDA). Scriptable server; the CUDA path is fastest on NVIDIA.

AMD GPUs run via Vulkan/ROCm at roughly half CUDA throughput. NVIDIA is the smooth path on Windows.
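As a rough illustration of what "scriptable server" can mean in practice, the sketch below sends one prompt to a locally running Ollama server over its HTTP API using only the Python standard library. It assumes Ollama is listening on its default address (http://localhost:11434) and that the model has already been pulled; the model tag `command-r:35b` mentioned in the usage note is an assumption, and the helper names (`build_payload`, `build_request`, `generate`) are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    """Body for a single non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request (supplying data makes it a POST)."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, a call such as `generate("command-r:35b", "Say hello in one sentence.")` should return the model's reply as a string. Substitute whatever tag your local model list actually shows.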

FAQ

What is the best local LLM for Nvidia GeForce RTX 4090 (24GB)?

Command R 35B is the strongest model that fits, using ~22.3 GB at Q4_K_M of the ~23 GB usable on Nvidia GeForce RTX 4090 (24GB). The fit is tight, so close other apps before loading it.

How much of Nvidia GeForce RTX 4090 (24GB)'s memory can I use for a model?

About 23 GB. On a discrete GPU, leave ~1 GB of VRAM for the driver and display.
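The arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration, assuming the ~1 GB reserve stated above and the page's ~22.3 GB estimate for Command R 35B at Q4_K_M; the function names are hypothetical, and the sketch ignores context (KV cache) memory, which this page does not quantify.

```python
def usable_vram_gb(total_gb: float, reserve_gb: float = 1.0) -> float:
    """VRAM left for model weights after reserving some for the driver and display."""
    return total_gb - reserve_gb


def headroom_gb(model_gb: float, total_gb: float, reserve_gb: float = 1.0) -> float:
    """Spare VRAM once the model is loaded; negative means it does not fit."""
    return round(usable_vram_gb(total_gb, reserve_gb) - model_gb, 2)


def fits(model_gb: float, total_gb: float, reserve_gb: float = 1.0) -> bool:
    """True when the model's estimated size is within the usable VRAM."""
    return headroom_gb(model_gb, total_gb, reserve_gb) >= 0


# RTX 4090: 24 GB total, ~1 GB reserved, ~22.3 GB model estimate from this page.
print(usable_vram_gb(24.0))     # 23.0
print(headroom_gb(22.3, 24.0))  # 0.7
print(fits(22.3, 24.0))         # True
```

Under these assumptions the top pick leaves about 0.7 GB spare, which is why it is labeled a tight fit.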

Which tool should I use on Windows?

LM Studio for the best GUI on Windows (it auto-detects CUDA/Vulkan backends), or Ollama (CUDA) for speed. AMD GPUs run via Vulkan/ROCm at roughly half CUDA throughput. NVIDIA is the smooth path on Windows.

Sources

Memory figures are estimates. See methodology.