audio model · whisper · Windows
Can I run Whisper small on Nvidia GeForce RTX 3060 (12GB)?
Yes. Whisper small runs on the Nvidia GeForce RTX 3060 (12GB) at fp16 in whisper.cpp, peaking at ~0.85 GB of the ~11 GB usable.
- Peak memory: ~0.85 GB
- Usable on device: ~11 GB
- Device memory: 12 GB
- Quant: fp16 (whisper.cpp)
How to run it
Use whisper.cpp or faster-whisper at fp16 (the memory figures on this page are for whisper.cpp). The model is light enough to run on CPU; a GPU just makes it faster.
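As a rough illustration of the faster-whisper route, here is a minimal Python sketch. It assumes the `faster-whisper` package is installed and that GPU execution has the NVIDIA CUDA libraries listed in that project's documentation. The helper names `format_segment` and `transcribe` and the file name `audio.wav` are made up for this example, and the ~0.85 GB figure above was measured with whisper.cpp, so memory use under faster-whisper may differ.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript segment as '[start -> end] text' (times in seconds)."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(audio_path: str, device: str = "cuda",
               compute_type: str = "float16") -> list[str]:
    """Transcribe an audio file with Whisper small (hypothetical helper)."""
    # Imported lazily so format_segment works without the package installed.
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # "small" selects the 244M-parameter model; float16 matches the fp16 setting above.
    model = WhisperModel("small", device=device, compute_type=compute_type)
    segments, info = model.transcribe(audio_path)

    # `segments` is a generator, so decoding happens while it is iterated.
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `transcribe("audio.wav")` would use the GPU at fp16. For the CPU-only path, `transcribe("audio.wav", device="cpu", compute_type="int8")` is one commonly used setting.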
- Type: Speech to text
- Parameters: 244M
- Peak memory: ~0.85 GB at fp16 (whisper.cpp)
- License: MIT
- Memory: 12 GB VRAM
- Usable for weights: ~11 GB
- Best runtime: for Whisper small, whisper.cpp (CUDA build) or faster-whisper; the Ollama (CUDA) / llama.cpp CUDA runtimes listed for this GPU target LLMs
FAQ
Can Nvidia GeForce RTX 3060 (12GB) run Whisper small?
Yes. Whisper small runs on the Nvidia GeForce RTX 3060 (12GB) at fp16 in whisper.cpp, peaking at ~0.85 GB of the ~11 GB usable.
How much memory does Whisper small need?
At fp16 in whisper.cpp, the realistic peak is ~0.85 GB of memory. The Nvidia GeForce RTX 3060 (12GB), with ~11 GB usable, has room to spare.
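The figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted on this page (244M parameters, ~0.85 GB peak, ~11 GB usable) plus the fact that fp16 stores each weight in 2 bytes. Treating 1 GB as 10^9 bytes, and attributing everything above the weights to runtime overhead, are assumptions made for illustration.

```python
PARAMS = 244_000_000       # Whisper small parameter count (from this page)
BYTES_PER_PARAM_FP16 = 2   # fp16 stores each weight in 2 bytes
PEAK_GB = 0.85             # peak usage quoted on this page
USABLE_GB = 11.0           # usable VRAM quoted on this page


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Memory taken by the weights alone, assuming 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


weights = weights_gb(PARAMS, BYTES_PER_PARAM_FP16)
overhead = PEAK_GB - weights     # assumed: activations, buffers, runtime state
headroom = USABLE_GB - PEAK_GB   # VRAM left over at peak
utilization = PEAK_GB / USABLE_GB

print(f"weights     ~{weights:.2f} GB")
print(f"overhead    ~{overhead:.2f} GB")
print(f"headroom    ~{headroom:.2f} GB")
print(f"utilization ~{utilization:.1%}")
```

Under these assumptions the weights account for a little under 0.5 GB of the peak, and the model occupies well under a tenth of the usable VRAM.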
What do I use to run Whisper small locally?
Whisper small runs in whisper.cpp or faster-whisper (among others). It is light enough to run on CPU, so a GPU is optional; it just makes transcription faster.
Sources
VRAM figures are sourced peak-usage reference values at the noted quant, validated 2026-06-15. See methodology.