Whisper large-v3-turbo requirements

Speech to text · 809M params · int8 (faster-whisper) / fp16 (whisper.cpp) · released Oct 2024. Light enough to run on CPU, no GPU required.

License: MIT · Commercial use OK

Peak memory (int8): ~1.5 GB

Runs on CPU: Yes

Parameters: 809M

Type: Speech to text

Run it

Runtime tools (int8)

Whisper large-v3-turbo runs in whisper.cpp, faster-whisper, MacWhisper, or WhisperKit; the int8 figures on this page come from faster-whisper. It runs CPU-only, so no GPU is needed. The smaller Whisper tiers are fast enough for real-time use on a laptop or phone.

whisper.cpp · faster-whisper · MacWhisper · WhisperKit
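As a rough illustration of the CPU-only int8 setup described above, here is a minimal sketch using faster-whisper's `WhisperModel`. It assumes a recent faster-whisper release that recognizes the `"large-v3-turbo"` model name. The `format_timestamp` helper and the `meeting.wav` path are hypothetical, added only for the example.

```python
from typing import Iterator


def format_timestamp(seconds: float) -> str:
    """Render a second offset as MM:SS.mmm for log-style output."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe(audio_path: str, model_name: str = "large-v3-turbo") -> Iterator[str]:
    """Yield one formatted line per segment, running CPU-only at int8."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    # device="cpu" and compute_type="int8" mirror the configuration on this page.
    # The model name is an assumption: older faster-whisper versions may not know it.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    # `segments` is a generator, so decoding happens as it is consumed.
    for seg in segments:
        start = format_timestamp(seg.start)
        end = format_timestamp(seg.end)
        yield f"[{start} -> {end}] {seg.text.strip()}"


# Usage (placeholder file name, not from the source):
#     for line in transcribe("meeting.wav"):
#         print(line)
```

The first call downloads the converted model weights, so actual memory use and speed should be checked on the target machine rather than taken from this sketch.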

Which devices can run Whisper large-v3-turbo?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

Android

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24 GB): Yes

FAQ

How much memory does Whisper large-v3-turbo need?

At int8 it peaks at ~1.5 GB. It runs on CPU, so a GPU is optional.

Can Whisper large-v3-turbo run on a phone or CPU?

Yes to both. It runs CPU-only, and on-device phone runtimes are available. The smaller Whisper tiers are light enough for real-time use.

Can I use Whisper large-v3-turbo commercially?

Yes. Whisper large-v3-turbo is released under the MIT license, which permits commercial use.

Notes

A pruned version of large-v3 (decoder layers cut from 32 to 4, ~809M params) that runs roughly 5x faster at near-large-v3 accuracy. With faster-whisper at int8 it peaks at ~1.5 GB of VRAM (1,545 MB measured). It is light enough for phones (CoreML) and CPU. License: MIT. Sources: HF model card, faster-whisper benchmark, whisper.cpp README.
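To put the memory figures in context, the following back-of-the-envelope sketch compares raw weight storage with the measured peak. It assumes 1 byte per parameter at int8, 2 bytes at fp16, and decimal megabytes. The source does not say whether its "MB" is decimal or binary, and a real int8 runtime may keep some tensors at higher precision, so treat the numbers as an estimate rather than a breakdown of the benchmark.

```python
PARAMS = 809_000_000       # parameter count stated on this page
MEASURED_PEAK_MB = 1_545   # faster-whisper int8 peak quoted in the notes

# Assumed storage cost per parameter for each precision.
BYTES_PER_PARAM = {"int8": 1, "fp16": 2}


def weight_mb(params: int, precision: str) -> float:
    """Raw weight storage in decimal megabytes, ignoring runtime overhead."""
    return params * BYTES_PER_PARAM[precision] / 1_000_000


def fits(device_mb: float) -> bool:
    """Simplified check (an assumption): the measured peak must not exceed device memory."""
    return MEASURED_PEAK_MB <= device_mb


int8_weights = weight_mb(PARAMS, "int8")     # 809.0
fp16_weights = weight_mb(PARAMS, "fp16")     # 1618.0
overhead = MEASURED_PEAK_MB - int8_weights   # 736.0

print(f"int8 weights: {int8_weights:.0f} MB")
print(f"fp16 weights: {fp16_weights:.0f} MB")
print(f"peak minus int8 weights: {overhead:.0f} MB")
print(f"fits in 24 GB: {fits(24_000)}")
```

Under these assumptions the weights alone account for roughly half of the measured peak. The remainder is plausibly activations, audio buffers, and runtime allocations, though the source does not itemize it.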

Sources

The memory figure is a sourced peak-usage measurement at int8, validated 2026-06-15. See methodology.