# Can I run TinyLlama 1.1B on an NVIDIA GeForce RTX 4070 (12 GB)?

Updated: 2026-06-15

**Yes, it runs.** At Q4_K_M it needs ~1.8 GB of the ~11 GB usable. That leaves enough headroom to run FP16 instead if you want higher quality.

- Model: 1.1B parameters; Q4_K_M weights are 0.669 GB
- Device: 12 GB VRAM, ~11 GB usable for weights
- Needs ~1.8 GB at Q4_K_M; recommended quant: Q4_K_M
- Best tool on Windows: LM Studio
- Command (Ollama CLI): `ollama run tinyllama:1.1b`

These figures are estimates. Method: weights + KV cache + ~0.8 GB overhead. Sources: https://ollama.com/library/tinyllama:1.1b, https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF, https://github.com/jzhang38/TinyLlama.
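
The estimation method can be sketched as simple arithmetic. This is a minimal sketch, not the calculator behind this page: the function names are hypothetical, the KV cache figure is back-solved from the rounded ~1.8 GB total rather than stated by any source, and the FP16 weight size assumes 2 bytes per parameter for 1.1B parameters.

```python
# Sketch of the "weights + KV cache + overhead" estimate.
# All names here are illustrative, not part of any tool's API.

OVERHEAD_GB = 0.8          # fixed runtime overhead, from the method above
USABLE_VRAM_GB = 11.0      # usable VRAM on the 12 GB card, from the page
Q4_K_M_WEIGHTS_GB = 0.669  # Q4_K_M weight size, from the page
Q4_K_M_TOTAL_GB = 1.8      # rounded total requirement, from the page


def estimate_vram_gb(weights_gb: float, kv_cache_gb: float,
                     overhead_gb: float = OVERHEAD_GB) -> float:
    """Return the estimated VRAM requirement in GB."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(required_gb: float, usable_gb: float = USABLE_VRAM_GB) -> bool:
    """Return True if the estimate fits in usable VRAM."""
    return required_gb <= usable_gb


# Assumption: the KV cache budget is whatever remains of the rounded
# total once weights and overhead are subtracted.
IMPLIED_KV_CACHE_GB = Q4_K_M_TOTAL_GB - Q4_K_M_WEIGHTS_GB - OVERHEAD_GB

# Assumption: FP16 stores 2 bytes per parameter, 1.1e9 parameters.
FP16_WEIGHTS_GB = 1.1e9 * 2 / 1e9

q4_total = estimate_vram_gb(Q4_K_M_WEIGHTS_GB, IMPLIED_KV_CACHE_GB)
# Assumption: the KV cache size is the same at FP16, since it depends
# on context length and cache precision, not on weight quantization.
fp16_total = estimate_vram_gb(FP16_WEIGHTS_GB, IMPLIED_KV_CACHE_GB)

if __name__ == "__main__":
    print(f"Q4_K_M: {q4_total:.2f} GB, fits: {fits(q4_total)}")
    print(f"FP16:   {fp16_total:.2f} GB, fits: {fits(fp16_total)}")
```

Under these assumptions FP16 comes to roughly 3.3 GB, well inside the ~11 GB usable. A longer context window would grow the KV cache term, so treat both totals as rough lower-end figures.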

More: https://localmodel.run/can-i-run/tinyllama-1.1b/nvidia-rtx-4070-12gb