# localmodel.run

> Free tool to check whether a computer or phone can run a given local AI model, and which runtime to use, across macOS, Windows, Linux, iOS and Android. Covers text LLMs plus image, video and audio generation models. Data is validated against vendor specs, Ollama and Hugging Face.

Dataset: 95 models (67 text, 6 image, 11 video, 11 audio), 39 devices. Last validated 2026-06-15. Memory figures are estimates; see https://localmodel.run/methodology.

## Text LLMs
- [Llama 3.1 8B](https://localmodel.run/model/llama-3.1-8b): 8B, 4.92GB Q4_K_M, 128k context.
- [Llama 3.3 70B](https://localmodel.run/model/llama-3.3-70b): 70B, 42.52GB Q4_K_M, 128k context.
- [Llama 3.2 3B](https://localmodel.run/model/llama-3.2-3b): 3B, 2.02GB Q4_K_M, 128k context.
- [Llama 3.2 1B](https://localmodel.run/model/llama-3.2-1b): 1B, 0.81GB Q4_K_M, 128k context.
- [Mistral 7B](https://localmodel.run/model/mistral-7b): 7B, 4.37GB Q4_K_M, 32k context.
- [Mistral Small 3 24B](https://localmodel.run/model/mistral-small-3-24b): 24B, 14.33GB Q4_K_M, 128k context.
- [Phi-4 14B](https://localmodel.run/model/phi-4-14b): 14B, 9.05GB Q4_K_M, 16k context.
- [Phi-4-mini 3.8B](https://localmodel.run/model/phi-4-mini-3.8b): 3.8B, 2.49GB Q4_K_M, 128k context.
- [Gemma 2 9B](https://localmodel.run/model/gemma-2-9b): 9B, 5.76GB Q4_K_M, 8k context.
- [Gemma 2 27B](https://localmodel.run/model/gemma-2-27b): 27B, 16.65GB Q4_K_M, 8k context.
- [Gemma 3 4B](https://localmodel.run/model/gemma-3-4b): 4B, 2.49GB Q4_K_M, 128k context.
- [Gemma 3 12B](https://localmodel.run/model/gemma-3-12b): 12B, 7.3GB Q4_K_M, 128k context.
- [Gemma 3 27B](https://localmodel.run/model/gemma-3-27b): 27B, 16.55GB Q4_K_M, 128k context.
- [Qwen2.5 7B](https://localmodel.run/model/qwen2.5-7b): 7B, 4.68GB Q4_K_M, 128k context.
- [Qwen2.5 14B](https://localmodel.run/model/qwen2.5-14b): 14B, 8.99GB Q4_K_M, 128k context.
- [Qwen2.5 32B](https://localmodel.run/model/qwen2.5-32b): 32B, 19.85GB Q4_K_M, 128k context.
- [Qwen2.5 72B](https://localmodel.run/model/qwen2.5-72b): 72B, 47.42GB Q4_K_M, 128k context.
- [Qwen3 8B](https://localmodel.run/model/qwen3-8b): 8B, 5.03GB Q4_K_M, 32k context.
- [Qwen3 14B](https://localmodel.run/model/qwen3-14b): 14B, 9GB Q4_K_M, 32k context.
- [Qwen3 32B](https://localmodel.run/model/qwen3-32b): 32B, 19.8GB Q4_K_M, 32k context.
- [Qwen3 30B-A3B](https://localmodel.run/model/qwen3-30b-a3b): 30.5B MoE (3.3B active), 18.6GB Q4_K_M, 32k context.
- [DeepSeek-R1-Distill-Qwen 7B](https://localmodel.run/model/deepseek-r1-distill-qwen-7b): 7B, 4.68GB Q4_K_M, 128k context.
- [DeepSeek-R1-Distill-Qwen 14B](https://localmodel.run/model/deepseek-r1-distill-qwen-14b): 14B, 8.99GB Q4_K_M, 128k context.
- [DeepSeek-R1-Distill-Llama 8B](https://localmodel.run/model/deepseek-r1-distill-llama-8b): 8B, 4.92GB Q4_K_M, 128k context.
- [DeepSeek-R1-Distill-Qwen 32B](https://localmodel.run/model/deepseek-r1-distill-qwen-32b): 32B, 19.85GB Q4_K_M, 128k context.
- [DeepSeek-V2-Lite](https://localmodel.run/model/deepseek-v2-lite): 16B MoE (2.4B active), 10.4GB Q4_K_M, 32k context.
- [SmolLM2 1.7B](https://localmodel.run/model/smollm2-1.7b): 1.7B, 1.06GB Q4_K_M, 8k context.
- [Qwen2.5 0.5B](https://localmodel.run/model/qwen2.5-0.5b): 0.494B, 0.491GB Q4_K_M, 128k context.
- [Qwen2.5 1.5B](https://localmodel.run/model/qwen2.5-1.5b): 1.54B, 1.12GB Q4_K_M, 128k context.
- [Qwen2.5 3B](https://localmodel.run/model/qwen2.5-3b): 3.09B, 2.1GB Q4_K_M, 128k context.
- [Qwen3 0.6B](https://localmodel.run/model/qwen3-0.6b): 0.6B, 0.48GB Q4_K_M, 32k context.
- [Qwen3 1.7B](https://localmodel.run/model/qwen3-1.7b): 1.7B, 1.28GB Q4_K_M, 32k context.
- [Qwen3 4B](https://localmodel.run/model/qwen3-4b): 4B, 2.5GB Q4_K_M, 32k context.
- [Gemma 2 2B](https://localmodel.run/model/gemma-2-2b): 2.61B, 1.71GB Q4_K_M, 8k context.
- [Gemma 3 1B](https://localmodel.run/model/gemma-3-1b): 1B, 0.81GB Q4_K_M, 32k context.
- [SmolLM2 135M](https://localmodel.run/model/smollm2-135m): 0.135B, 0.105GB Q4_K_M, 2k context.
- [SmolLM2 360M](https://localmodel.run/model/smollm2-360m): 0.362B, 0.271GB Q4_K_M, 2k context.
- [TinyLlama 1.1B](https://localmodel.run/model/tinyllama-1.1b): 1.1B, 0.669GB Q4_K_M, 2k context.
- [Granite 3.1 2B](https://localmodel.run/model/granite-3.1-2b): 2.53B, 1.55GB Q4_K_M, 128k context.
- [Phi-3.5-mini 3.8B](https://localmodel.run/model/phi-3.5-mini): 3.82B, 2.39GB Q4_K_M, 128k context.
- [Sarvam-M 24B](https://localmodel.run/model/sarvam-m-24b): 24B, 14.3GB Q4_K_M, 32k context.
- [Sarvam-1 2B](https://localmodel.run/model/sarvam-1-2b): 2B, 1.55GB Q4_K_M, 8k context.
- [Sarvam-30B](https://localmodel.run/model/sarvam-30b): 30B MoE (2.4B active), 19.6GB Q4_K_M, 64k context.
- [Sarvam-105B](https://localmodel.run/model/sarvam-105b): 105B MoE (10.3B active), 64.2GB Q4_K_M, 128k context.
- [Qwen2.5 Coder 0.5B](https://localmodel.run/model/qwen2.5-coder-0.5b): 0.494B, 0.37GB Q4_K_M, 32k context.
- [Qwen2.5 Coder 1.5B](https://localmodel.run/model/qwen2.5-coder-1.5b): 1.54B, 0.92GB Q4_K_M, 32k context.
- [Qwen2.5 Coder 3B](https://localmodel.run/model/qwen2.5-coder-3b): 3.09B, 1.8GB Q4_K_M, 32k context.
- [Qwen2.5 Coder 7B](https://localmodel.run/model/qwen2.5-coder-7b): 7B, 4.36GB Q4_K_M, 32k context.
- [Qwen2.5 Coder 14B](https://localmodel.run/model/qwen2.5-coder-14b): 14B, 8.37GB Q4_K_M, 32k context.
- [Qwen2.5 Coder 32B](https://localmodel.run/model/qwen2.5-coder-32b): 32B, 18.49GB Q4_K_M, 32k context.
- [Mistral Nemo 12B](https://localmodel.run/model/mistral-nemo-12b): 12.2B, 6.96GB Q4_K_M, 128k context.
- [Mixtral 8x7B](https://localmodel.run/model/mixtral-8x7b): 46.7B MoE (12.9B active), 26.49GB Q4_K_M, 32k context.
- [DeepSeek R1](https://localmodel.run/model/deepseek-r1): 671B MoE (37B active), 376.66GB Q4_K_M, 128k context.
- [DeepSeek V3](https://localmodel.run/model/deepseek-v3): 671B MoE (37B active), 376.66GB Q4_K_M, 128k context.
- [Qwen3 235B A22B](https://localmodel.run/model/qwen3-235b-a22b): 235B MoE (22B active), 132.39GB Q4_K_M, 128k context.
- [Llama 4 Scout](https://localmodel.run/model/llama-4-scout): 109B MoE (17B active), 60.87GB Q4_K_M, 128k context.
- [gpt-oss 20B](https://localmodel.run/model/gpt-oss-20b): 21B MoE (3.6B active), 11.28GB Q4_K_M, 128k context.
- [gpt-oss 120B](https://localmodel.run/model/gpt-oss-120b): 117B MoE (5.1B active), 59.03GB Q4_K_M, 128k context.
- [Yi 1.5 34B](https://localmodel.run/model/yi-1.5-34b): 34B, 19.24GB Q4_K_M, 32k context.
- [Command R 35B](https://localmodel.run/model/command-r-35b): 35B, 20.05GB Q4_K_M, 128k context.
- [GLM-4 9B](https://localmodel.run/model/glm-4-9b): 9B, 5.82GB Q4_K_M, 128k context.
- [Falcon3 10B](https://localmodel.run/model/falcon3-10b): 10B, 5.86GB Q4_K_M, 32k context.
- [Granite 4.0 H Small](https://localmodel.run/model/granite-4.0-h-small): 32B MoE (9B active), 18.23GB Q4_K_M, 128k context.
- [SmolLM3 3B](https://localmodel.run/model/smollm3-3b): 3B, 1.78GB Q4_K_M, 128k context.
- [Qwen2.5-VL 3B](https://localmodel.run/model/qwen2.5-vl-3b): 3.75B, 3.05GB Q4_K_M, 32k context.
- [Qwen2.5-VL 7B](https://localmodel.run/model/qwen2.5-vl-7b): 8.29B, 5.62GB Q4_K_M, 32k context.
- [Llama 3.2 Vision 11B](https://localmodel.run/model/llama-3.2-vision-11b): 10.7B, 7.36GB Q4_K_M, 128k context.
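
Every entry in this list follows the same line shape (link, parameter count, optional MoE note, Q4_K_M size, context length), so it can be read mechanically. The sketch below is one way to do that in Python; the `parse_entry` function and its field names are illustrative assumptions, not part of the site's API, and the pattern only targets the text LLM entries as written here.

```python
import re
from typing import Optional

# Pattern for one text-LLM entry in this file, e.g.
# "- [Name](url): 8B, 4.92GB Q4_K_M, 128k context."
# The MoE group "MoE (3.3B active)" is optional.
ENTRY = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\): "
    r"(?P<params>[\d.]+)B(?: MoE \((?P<active>[\d.]+)B active\))?, "
    r"(?P<size_gb>[\d.]+)GB (?P<quant>[^,\s]+), "
    r"(?P<context_k>\d+)k context\.$"
)

def parse_entry(line: str) -> Optional[dict]:
    """Return the fields of one list entry, or None if the line does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "name": d["name"],
        "url": d["url"],
        "params_b": float(d["params"]),        # total parameters, billions
        "active_b": float(d["active"]) if d["active"] else None,  # MoE only
        "size_gb": float(d["size_gb"]),        # quantized weight size
        "quant": d["quant"],
        "context_k": int(d["context_k"]),      # context length in thousands of tokens
    }
```

Lines that are not entries (headings, blank lines, the image, video and audio lists) return `None`, so the function can be mapped over the whole file and the results filtered.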

## Image models
- [Stable Diffusion 1.5](https://localmodel.run/model/stable-diffusion-1-5): 860M, ~3.7GB VRAM at fp16, license CreativeML OpenRAIL-M.
- [Stable Diffusion XL 1.0](https://localmodel.run/model/sdxl-1-0): 2.6B, ~7.5GB VRAM at fp16, license CreativeML OpenRAIL++-M.
- [Stable Diffusion 3.5 Large](https://localmodel.run/model/stable-diffusion-3-5-large): 8.1B, ~7GB VRAM at Q4 GGUF, license Stability Community License.
- [FLUX.1 dev](https://localmodel.run/model/flux-1-dev): 12B, ~6.5GB VRAM at Q4 GGUF, license FLUX.1-dev Non-Commercial License.
- [FLUX.1 schnell](https://localmodel.run/model/flux-1-schnell): 12B, ~6.5GB VRAM at Q4 GGUF, license Apache-2.0.
- [Qwen-Image](https://localmodel.run/model/qwen-image): 20B, ~14GB VRAM at Q4_K_M GGUF, license Apache-2.0.

## Video models
- [Wan 2.1 T2V 1.3B](https://localmodel.run/model/wan-2-1-t2v-1-3b): 1.3B, ~6GB VRAM at Q4 GGUF, license Apache-2.0.
- [Wan 2.1 T2V 14B](https://localmodel.run/model/wan-2-1-t2v-14b): 14B, ~12GB VRAM at Q4 GGUF, license Apache-2.0.
- [Wan 2.2 TI2V 5B](https://localmodel.run/model/wan-2-2-ti2v-5b): 5B, ~8GB VRAM at Q4 GGUF, license Apache-2.0.
- [Wan 2.2 T2V A14B](https://localmodel.run/model/wan-2-2-t2v-a14b): 27B, ~16GB VRAM at Q4 GGUF, license Apache-2.0.
- [LTX-Video 2B](https://localmodel.run/model/ltx-video-2b): 2B, ~10GB VRAM at fp8 + offload, license LTX-Video Open Weights (OpenRAIL-M).
- [LTX-Video 13B](https://localmodel.run/model/ltx-video-13b): 13B, ~20GB VRAM at fp8, license LTX-Video Open Weights (OpenRAIL-M).
- [CogVideoX-5B](https://localmodel.run/model/cogvideox-5b): 5B, ~16GB VRAM at INT8 / fp8, license CogVideoX License.
- [CogVideoX-2B](https://localmodel.run/model/cogvideox-2b): 2B, ~8GB VRAM at fp16 + offload, license Apache-2.0.
- [HunyuanVideo](https://localmodel.run/model/hunyuanvideo): 13B, ~16GB VRAM at Q4 GGUF, license Tencent Hunyuan Community License.
- [Mochi 1](https://localmodel.run/model/mochi-1): 10B, ~20GB VRAM at fp8 + offload, license Apache-2.0.
- [Stable Video Diffusion (img2vid-XT)](https://localmodel.run/model/stable-video-diffusion): 1.5B, ~8GB VRAM at fp16 + offload, license Stable Video Diffusion Community License.

## Audio & voice models
- [Whisper large-v3](https://localmodel.run/model/whisper-large-v3): 1.55B, ~2.5GB memory at int8, license MIT.
- [Whisper large-v3-turbo](https://localmodel.run/model/whisper-large-v3-turbo): 809M, ~1.5GB memory at int8, license MIT.
- [Whisper small](https://localmodel.run/model/whisper-small): 244M, ~0.85GB memory at fp16 (whisper.cpp), license MIT.
- [Kokoro-82M](https://localmodel.run/model/kokoro-82m): 82M, ~1GB memory at fp32, license Apache-2.0.
- [Orpheus 3B](https://localmodel.run/model/orpheus-3b): 3B, ~4GB memory at Q4_K_M GGUF, license Apache-2.0.
- [Bark](https://localmodel.run/model/bark): 900M, ~5GB memory at fp32, license MIT.
- [Dia 1.6B](https://localmodel.run/model/dia-1-6b): 1.6B, ~10GB memory at fp16, license Apache-2.0.
- [MusicGen small](https://localmodel.run/model/musicgen-small): 300M, ~3GB memory at fp32, license CC-BY-NC-4.0.
- [MusicGen medium](https://localmodel.run/model/musicgen-medium): 1.5B, ~14GB memory at fp32, license CC-BY-NC-4.0.
- [MusicGen large](https://localmodel.run/model/musicgen-large): 3.3B, ~20GB memory at fp32, license CC-BY-NC-4.0.
- [Stable Audio Open 1.0](https://localmodel.run/model/stable-audio-open): 1.3B, ~15GB memory at fp32, license Stability Community License.

## Devices
- [Apple M1 (8GB)](https://localmodel.run/best-llm-for/apple-m1-8gb): 8GB unified, ~5.5GB usable for weights.
- [Apple M2 (16GB)](https://localmodel.run/best-llm-for/apple-m2-16gb): 16GB unified, ~10.5GB usable for weights.
- [Apple M3 Pro (18GB)](https://localmodel.run/best-llm-for/apple-m3-18gb): 18GB unified, ~12GB usable for weights.
- [Apple M4 (16GB)](https://localmodel.run/best-llm-for/apple-m4-16gb): 16GB unified, ~10.5GB usable for weights.
- [Apple M4 (24GB)](https://localmodel.run/best-llm-for/apple-m4-24gb): 24GB unified, ~16GB usable for weights.
- [Apple M4 Pro (24GB)](https://localmodel.run/best-llm-for/apple-m4-pro-24gb): 24GB unified, ~16GB usable for weights.
- [Apple M4 Pro (48GB)](https://localmodel.run/best-llm-for/apple-m4-pro-48gb): 48GB unified, ~32GB usable for weights.
- [Apple M4 Max (64GB)](https://localmodel.run/best-llm-for/apple-m4-max-64gb): 64GB unified, ~48GB usable for weights.
- [Apple M4 Max (128GB)](https://localmodel.run/best-llm-for/apple-m4-max-128gb): 128GB unified, ~96GB usable for weights.
- [Apple M3 Ultra (256GB)](https://localmodel.run/best-llm-for/apple-m3-ultra-256gb): 256GB unified, ~192GB usable for weights.
- [Nvidia GeForce RTX 3060 (12GB)](https://localmodel.run/best-llm-for/nvidia-rtx-3060-12gb): 12GB VRAM, ~11GB usable for weights.
- [Nvidia GeForce RTX 4060 Ti (16GB)](https://localmodel.run/best-llm-for/nvidia-rtx-4060-ti-16gb): 16GB VRAM, ~15GB usable for weights.
- [Nvidia GeForce RTX 4070 (12GB)](https://localmodel.run/best-llm-for/nvidia-rtx-4070-12gb): 12GB VRAM, ~11GB usable for weights.
- [Nvidia GeForce RTX 4080 (16GB)](https://localmodel.run/best-llm-for/nvidia-rtx-4080-16gb): 16GB VRAM, ~15GB usable for weights.
- [Nvidia GeForce RTX 4090 (24GB)](https://localmodel.run/best-llm-for/nvidia-rtx-4090-24gb): 24GB VRAM, ~23GB usable for weights.
- [Nvidia GeForce RTX 5090 (32GB)](https://localmodel.run/best-llm-for/nvidia-rtx-5090-32gb): 32GB VRAM, ~31GB usable for weights.
- [Nvidia GeForce RTX 3090 (24GB)](https://localmodel.run/best-llm-for/nvidia-rtx-3090-24gb): 24GB VRAM, ~23GB usable for weights.
- [AMD Radeon RX 7900 XTX (24GB)](https://localmodel.run/best-llm-for/amd-rx-7900-xtx-24gb): 24GB VRAM, ~23GB usable for weights.
- [8GB RAM Laptop (CPU/iGPU only)](https://localmodel.run/best-llm-for/laptop-8gb): 8GB RAM, ~5GB usable for weights.
- [16GB RAM Laptop (CPU/iGPU only)](https://localmodel.run/best-llm-for/laptop-16gb): 16GB RAM, ~12GB usable for weights.
- [32GB RAM Laptop (CPU/iGPU only)](https://localmodel.run/best-llm-for/laptop-32gb): 32GB RAM, ~28GB usable for weights.
- [iPhone 15 Pro](https://localmodel.run/best-llm-for/iphone-15-pro): 8GB unified, ~4.5GB usable for weights.
- [iPhone 16](https://localmodel.run/best-llm-for/iphone-16): 8GB unified, ~4.5GB usable for weights.
- [iPhone 16 Pro](https://localmodel.run/best-llm-for/iphone-16-pro): 8GB unified, ~4.5GB usable for weights.
- [iPad Pro M4 (16GB, 1TB/2TB config)](https://localmodel.run/best-llm-for/ipad-pro-m4-16gb): 16GB unified, ~12GB usable for weights.
- [Google Pixel 9 Pro](https://localmodel.run/best-llm-for/pixel-9-pro): 16GB RAM, ~10.5GB usable for weights.
- [Samsung Galaxy S24 Ultra](https://localmodel.run/best-llm-for/samsung-s24-ultra): 12GB RAM, ~8.5GB usable for weights.
- [Samsung Galaxy S25 Ultra (16GB, 1TB config only)](https://localmodel.run/best-llm-for/samsung-s25-ultra-16gb): 16GB RAM, ~12GB usable for weights.
- [Generic Android Phone (8GB RAM)](https://localmodel.run/best-llm-for/android-generic-8gb): 8GB RAM, ~4.5GB usable for weights.
- [Generic Android Phone (12GB RAM)](https://localmodel.run/best-llm-for/android-generic-12gb): 12GB RAM, ~8.5GB usable for weights.
- [Apple M5 (16GB)](https://localmodel.run/best-llm-for/apple-m5-16gb): 16GB unified, ~10.5GB usable for weights.
- [Apple M5 (32GB)](https://localmodel.run/best-llm-for/apple-m5-32gb): 32GB unified, ~21GB usable for weights.
- [Apple M5 Pro (48GB)](https://localmodel.run/best-llm-for/apple-m5-pro-48gb): 48GB unified, ~32GB usable for weights.
- [Apple M5 Max (128GB)](https://localmodel.run/best-llm-for/apple-m5-max-128gb): 128GB unified, ~96GB usable for weights.
- [iPhone 17](https://localmodel.run/best-llm-for/iphone-17): 8GB unified, ~4.5GB usable for weights.
- [iPhone 17 Pro](https://localmodel.run/best-llm-for/iphone-17-pro): 12GB unified, ~8GB usable for weights.
- [iPhone Air](https://localmodel.run/best-llm-for/iphone-air): 12GB unified, ~8GB usable for weights.
- [Google Pixel 10 Pro](https://localmodel.run/best-llm-for/pixel-10-pro): 16GB RAM, ~10.5GB usable for weights.
- [Samsung Galaxy S26 Ultra (16GB, 1TB config)](https://localmodel.run/best-llm-for/samsung-s26-ultra): 16GB RAM, ~12GB usable for weights.
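
The "usable for weights" figures above can be compared directly with the Q4_K_M sizes in the text LLM list for a rough fit check. The following is a minimal sketch under stated assumptions: it compares weight size only, the `headroom_gb` parameter is a hypothetical allowance for KV cache and runtime overhead, and the site's own methodology (https://localmodel.run/methodology) may calculate this differently.

```python
from typing import List

# Usable-for-weights figures (GB), copied from a few device entries in this file.
USABLE_GB = {
    "Apple M1 (8GB)": 5.5,
    "Nvidia GeForce RTX 4090 (24GB)": 23.0,
    "iPhone 16 Pro": 4.5,
}

# Q4_K_M sizes (GB), copied from a few text LLM entries in this file.
MODEL_GB = {
    "Llama 3.2 3B": 2.02,
    "Llama 3.1 8B": 4.92,
    "Qwen2.5 32B": 19.85,
    "Llama 3.3 70B": 42.52,
}

def models_that_fit(device: str, headroom_gb: float = 0.0) -> List[str]:
    """List models whose weights fit in the device's usable memory.

    headroom_gb is an assumed reserve (not a figure from the site) for
    context/KV cache; it is subtracted from the usable budget.
    """
    budget = USABLE_GB[device] - headroom_gb
    return [name for name, size in MODEL_GB.items() if size <= budget]

print(models_that_fit("Apple M1 (8GB)"))                   # ['Llama 3.2 3B', 'Llama 3.1 8B']
print(models_that_fit("Apple M1 (8GB)", headroom_gb=1.0))  # ['Llama 3.2 3B']
print(models_that_fit("Nvidia GeForce RTX 4090 (24GB)"))   # adds 'Qwen2.5 32B'
```

Because the memory figures are estimates, a model that only just fits (such as Llama 3.1 8B on an 8GB M1) is better treated as borderline than as a guaranteed match.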

## Tools by platform
- macOS: beginner LM Studio, power mlx-lm. vLLM is NOT a Mac tool; it is a CUDA/Linux serving engine. Unified memory is not a fixed VRAM slice; roughly 65-75% of it is usable for weights, depending on total memory (see Devices).
- Windows: beginner LM Studio, power Ollama (CUDA). AMD GPUs run via Vulkan/ROCm at roughly half CUDA throughput. NVIDIA is the smooth path on Windows.
- Linux: beginner Ollama, power vLLM. vLLM shines for multi-user serving/throughput. For a single local chat, Ollama or llama.cpp is simpler and lighter.
- iOS: beginner Apple Foundation Models, power PocketPal AI. Phones realistically run 1B-4B class models. Anything larger thermally throttles or OOMs.
- Android: beginner PocketPal AI, power MLC LLM / LiteRT-LM. NPU acceleration is limited and chip-specific; most apps run on CPU. Expect 1B-4B class.
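
The per-platform recommendations above amount to a small lookup table. A minimal sketch in Python follows; the `TOOLS` dictionary and `recommend` function are illustrative names, not something the site provides, and the values are copied from the list as written.

```python
# Beginner and power-user runtime per platform, copied from the list above.
TOOLS = {
    "macos": {"beginner": "LM Studio", "power": "mlx-lm"},
    "windows": {"beginner": "LM Studio", "power": "Ollama (CUDA)"},
    "linux": {"beginner": "Ollama", "power": "vLLM"},
    "ios": {"beginner": "Apple Foundation Models", "power": "PocketPal AI"},
    "android": {"beginner": "PocketPal AI", "power": "MLC LLM / LiteRT-LM"},
}

def recommend(platform: str, level: str = "beginner") -> str:
    """Return the suggested runtime; raises KeyError for an unknown platform or level."""
    return TOOLS[platform.lower()][level]

print(recommend("macOS"))           # LM Studio
print(recommend("Linux", "power"))  # vLLM
```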

## Key pages
- [How we calculate](https://localmodel.run/methodology)
- [All models](https://localmodel.run/can-i-run)
- [All devices](https://localmodel.run/best-llm-for)
- [Tools guide](https://localmodel.run/tools): Compare local LLM runtimes (Ollama, LM Studio, llama.cpp, etc.) by platform and use-case.
- [Leaderboard](https://localmodel.run/leaderboard): Open text LLMs ranked by LMArena (Chatbot Arena) Elo, each linked to its hardware requirements.
- [Developers](https://localmodel.run/developers): API reference and JSON endpoints for programmatic access to model and device data.