Real memory math

Can I run this AI model locally?

Pick your device. Pick a model. Get a yes, a tight fit, or a no, plus the exact command to run it. No guessing. No signup.

Models: 95
Devices: 39
Modalities: 4
Free: Always


Your device

Model to run

Yes

Needs: 6.4 GB

Usable: 10.5 GB

Runs at Q4_K_M using ~6.4 GB of ~10.5 GB usable. There is also room to step up to Q8_0 for higher quality.

$ ollama run llama3.1:8b

Best on macOS: LM Studio · Q4_K_M recommended
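The verdict above comes down to picking the highest-quality quantization whose total memory need still fits in usable memory. Below is a minimal Python sketch. It assumes the per-quant need figures are already computed. Only the 6.4 GB Q4_K_M need and the 10.5 GB usable figure come from this page. The Q5_K_M and Q8_0 needs, and the `pick_quant` helper itself, are hypothetical.

```python
from __future__ import annotations

# Quantizations ordered from lowest to highest quality (assumed ordering).
QUANT_ORDER = ["Q4_K_M", "Q5_K_M", "Q8_0"]


def pick_quant(needs_gb: dict[str, float], usable_gb: float) -> str | None:
    """Return the highest-quality quant whose estimated need fits in usable memory.

    needs_gb maps quant name -> estimated total need in GB
    (weights + KV cache + runtime overhead). Returns None if nothing fits.
    """
    best = None
    for quant in QUANT_ORDER:
        need = needs_gb.get(quant)
        if need is not None and need <= usable_gb:
            best = quant  # later entries are higher quality, so keep overwriting
    return best


# 6.4 GB and 10.5 GB come from the example above; the other needs are hypothetical.
needs = {"Q4_K_M": 6.4, "Q5_K_M": 7.2, "Q8_0": 10.0}
print(pick_quant(needs, usable_gb=10.5))  # Q8_0
print(pick_quant(needs, usable_gb=7.0))   # Q4_K_M
```

Recommending Q4_K_M by default while still flagging the largest quant that fits keeps the suggested command conservative. It also tells users with spare memory what they could trade up to.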


Covers: text generation · image models · video models · audio models · Apple Silicon · NVIDIA / AMD

Popular models

All 95 models →

Llama 3.1 8B

4.92 GB at Q4_K_M · 128k context

111M Ollama pulls

DeepSeek-R1-Distill-Qwen 7B

4.68 GB at Q4_K_M · 128k context

79.3M Ollama pulls

Gemma 3 4B

2.49 GB at Q4_K_M · 128k context

32.8M Ollama pulls

Mistral 7B

4.37 GB at Q4_K_M · 32k context

26.1M Ollama pulls

Qwen2.5 7B

4.68 GB at Q4_K_M · 128k context

23.2M Ollama pulls

Qwen3 8B

5.03 GB at Q4_K_M · 32k context

23M Ollama pulls

Popular hardware

All 39 devices →

How it works

Real sizes

Quant sizes come from Hugging Face GGUF repos, Ollama (for text models), and vendor specs, not guesses. Every number is sourced.
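As a rough cross-check on those sourced sizes (not how the numbers on this site are produced), a GGUF file's size is close to parameter count × average bits per weight ÷ 8. Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, which works out to 8.5 bits per weight. Q4_K_M mixes quantization types across tensors, so its average here is an assumption: about 4.9 bits per weight, and the real figure varies by model. The ~8.03B parameter count for Llama 3.1 8B is also an outside, approximate figure.

```python
def gguf_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in GB (10**9 bytes) from a parameter count."""
    return params * bits_per_weight / 8 / 1e9


# Q8_0: each block of 32 weights = 32 int8 values (8 bits each) + one 16-bit fp16 scale.
Q8_0_BPW = (32 * 8 + 16) / 32  # 8.5 bits per weight

# Assumption: Q4_K_M averages roughly 4.9 bits per weight; the actual tensor mix varies.
Q4_K_M_BPW = 4.9

LLAMA_31_8B_PARAMS = 8.03e9  # approximate, outside figure

print(round(gguf_size_gb(LLAMA_31_8B_PARAMS, Q4_K_M_BPW), 2))  # 4.92
print(round(gguf_size_gb(LLAMA_31_8B_PARAMS, Q8_0_BPW), 2))    # 8.53
```

The 4.92 GB result lines up with the Llama 3.1 8B card above, but the 4.9 bits-per-weight figure was chosen to sit in that neighborhood. Treat the formula as a sanity check for other models, not as a replacement for the sourced size.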

Honest memory math

We add KV cache and runtime overhead to the model weights, then compare the total against realistic usable memory: about 66-75% of unified memory on Apple Silicon (depending on the chip), or GPU VRAM minus what the driver reserves.
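Here is a minimal Python sketch of that arithmetic. The KV cache term uses the standard per-token size for grouped-query attention: 2 tensors (K and V) × layers × KV heads × head dimension × bytes per element. Several inputs are assumptions for illustration and are not the site's exact values: the Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128), the 8k context, the fp16 cache, the flat 0.4 GB runtime overhead, the 0.66 usable fraction, and the 1 GB driver reserve.

```python
GB = 1e9  # decimal gigabytes, as file sizes are usually listed


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context / GB


def total_need_gb(weights_gb: float, kv_gb: float, overhead_gb: float) -> float:
    """Memory needed = model weights + KV cache + runtime overhead."""
    return weights_gb + kv_gb + overhead_gb


def usable_apple_gb(unified_gb: float, fraction: float) -> float:
    """Apple Silicon: only a fraction of unified memory is usable (~0.66-0.75)."""
    return unified_gb * fraction


def usable_gpu_gb(vram_gb: float, driver_reserve_gb: float) -> float:
    """Discrete GPU: VRAM minus what the driver keeps for itself."""
    return vram_gb - driver_reserve_gb


# Assumed Llama 3.1 8B shape and settings (illustrative only).
kv = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context=8192)
need = total_need_gb(weights_gb=4.92, kv_gb=kv, overhead_gb=0.4)
print(f"KV cache: {kv:.2f} GB, total need: {need:.1f} GB")
print(f"16 GB Mac usable: {usable_apple_gb(16, 0.66):.1f} GB")
print(f"12 GB GPU usable: {usable_gpu_gb(12, 1.0):.1f} GB")
```

With these assumed inputs the total comes to about 6.4 GB. Context length matters a lot here: at the full 128k context, the same fp16 KV cache alone grows to roughly 17 GB.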

The right tool

Every platform has a different winner: MLX on Mac, a CUDA-backed runtime on Windows, vLLM for serving on Linux, PocketPal on phones. Pick the wrong one and you can lose up to half your tokens per second.

Best LLM runtime per platform

iOS: Apple Foundation Models · power users: PocketPal AI

Android: PocketPal AI · power users: MLC LLM / LiteRT-LM

Frequently asked

How do I know if my computer can run a local AI model?

Compare the model's memory needs to your usable memory. A 7-8B text model at Q4_K_M needs about 6-7 GB, so it runs on a 16 GB Mac or a 12 GB GPU. Image and video diffusion models typically need 4-12 GB of GPU VRAM or Apple Silicon unified memory. Audio models (Whisper, Kokoro) can run on CPU and need 1-4 GB. localmodel.run does this math for 95 models across 39 devices.
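That comparison can be sketched as a three-way verdict that matches the Yes / Tight / No labels at the top of the page. The 10% headroom threshold separating Yes from Tight is a hypothetical choice for illustration; this page does not state the site's actual cutoff.

```python
def verdict(need_gb: float, usable_gb: float, headroom: float = 0.10) -> str:
    """Classify fit: 'No' if it does not fit, 'Tight' if within `headroom` of the limit, else 'Yes'.

    headroom is a hypothetical safety margin, as a fraction of usable memory.
    """
    if need_gb > usable_gb:
        return "No"
    if need_gb > usable_gb * (1 - headroom):
        return "Tight"
    return "Yes"


# 6.4 GB need vs 10.5 GB usable, as in the example result at the top of the page.
print(verdict(6.4, 10.5))   # Yes
print(verdict(10.0, 10.5))  # Tight
print(verdict(11.0, 10.5))  # No
```

A headroom band is useful because the need figure is itself an estimate. A model that fits with only a few hundred MB to spare may still fail once the context grows or other apps claim memory.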

Can I run AI models locally on a Mac?

Yes. Apple Silicon shares unified memory between the CPU and GPU, so a 16 GB Mac runs 7-8B models and a 64 GB+ Mac runs 70B models. Use LM Studio (which ships an MLX engine) for a GUI, or mlx-lm for maximum speed. vLLM is not a Mac tool; it is a Linux/CUDA serving engine.

Can I run AI models on my phone?

Yes, within limits. iPhones and Android flagships realistically run 1B-4B text models. For text: PocketPal AI works on both iOS and Android; Apple Foundation Models is built into iOS 26. For images on iPhone: Draw Things supports diffusion models locally. For audio: Whisper (speech-to-text) runs on both iOS and Android.

Which is the best tool to run models locally?

On Mac, start with LM Studio: it ships MLX and has a GUI. On Linux, use Ollama for quick chat and vLLM if you are serving traffic. On phones: for text, PocketPal AI (iOS and Android) or Apple Foundation Models (iOS 26); for images on iPhone, Draw Things; for audio, whisper.cpp. Each device page links the right tool so you do not have to guess.

Estimates, not guarantees. See how we calculate and our sources.