Orpheus 3B requirements
Text to speech · 3B params · Q4_K_M GGUF (llama.cpp) / fp16 (vLLM) · released Mar 2025. Light enough to run on CPU, no GPU required.
Run it
Orpheus 3B runs in llama.cpp or LM Studio as a Q4_K_M GGUF, or in vLLM at fp16. The GGUF build can run CPU-only, and the smaller tiers are fast enough for real-time use on a laptop.
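As a rough illustration of the llama.cpp path, the sketch below assembles a CPU-only `llama-server` command in Python without executing it. The model filename, context size, and port are placeholders (assumptions, not values from the model card), and `build_llama_server_cmd` is a hypothetical helper. Because the model emits speech tokens, turning its output into audio is a separate step not covered here.

```python
import shlex

def build_llama_server_cmd(model_path, ctx_size=4096, gpu_layers=0, port=8080):
    """Assemble a llama-server command line (nothing is executed here)."""
    return [
        "llama-server",
        "-m", model_path,          # path to the GGUF file
        "-c", str(ctx_size),       # context window in tokens (assumed value)
        "-ngl", str(gpu_layers),   # 0 = keep all layers on the CPU
        "--port", str(port),       # assumed port
    ]

# Hypothetical filename; use whatever the downloaded Q4_K_M file is called.
cmd = build_llama_server_cmd("orpheus-3b-q4_k_m.gguf")
print(shlex.join(cmd))
```

On Apple Silicon, raising `gpu_layers` above zero would offload layers to Metal instead of keeping everything on the CPU.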
Which devices can run Orpheus 3B?
Apple Silicon Macs
- Apple M1 (8GB) Yes
- Apple M2 (16GB) Yes
- Apple M4 (16GB) Yes
- Apple M5 (16GB) Yes
- Apple M3 Pro (18GB) Yes
- Apple M4 (24GB) Yes
- Apple M4 Pro (24GB) Yes
- Apple M5 (32GB) Yes
- Apple M4 Pro (48GB) Yes
- Apple M5 Pro (48GB) Yes
- Apple M4 Max (64GB) Yes
- Apple M4 Max (128GB) Yes
- Apple M5 Max (128GB) Yes
- Apple M3 Ultra (256GB) Yes
RAM-only laptops
iPhone & iPad
No mainstream local runtime for Orpheus 3B on iPhone & iPad yet.
Android
No mainstream local runtime for Orpheus 3B on Android yet.
NVIDIA GPUs
AMD GPUs
FAQ
How much memory does Orpheus 3B need?
At Q4_K_M GGUF it consumes ~4 GB. It runs on CPU, so a GPU is optional.
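To make the ~4 GB figure concrete, here is a minimal sketch of a fit check in Python. The 2 GB set aside for the operating system and other apps is an illustrative assumption, not a number from this page, and `fits_in_memory` is a hypothetical helper.

```python
MODEL_PEAK_GB = 4.0  # ~4 GB anchor at Q4_K_M GGUF, from the answer above

def fits_in_memory(device_ram_gb, model_peak_gb=MODEL_PEAK_GB, reserved_gb=2.0):
    """Return True if the model fits after setting aside RAM for the OS.

    reserved_gb is an assumed allowance, not a measured value.
    """
    return device_ram_gb - reserved_gb >= model_peak_gb

for ram in (4, 8, 16):
    verdict = "fits" if fits_in_memory(ram) else "too tight"
    print(f"{ram} GB RAM: {verdict}")
```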
Can Orpheus 3B run on a phone or CPU?
On a CPU, yes: it runs on a Mac or laptop CPU. On a phone, not yet: no phone runtime is confirmed.
Can I use Orpheus 3B commercially?
Yes. Orpheus 3B is licensed Apache-2.0, which permits commercial use.
Orpheus 3B is a Llama-3.2-3B fine-tune that emits speech tokens, so it runs through the usual LLM stacks. The llama.cpp GGUF path (Q4_K_M, ~2.5 GB on disk) runs on CPU or Apple Silicon Metal at roughly 4 GB including overhead; the vLLM/CUDA path needs more VRAM headroom. The license is Apache-2.0, so commercial use is permitted. The memory anchor is the Q4_K_M file size plus typical llama.cpp overhead (a synthesis, not a single measurement). Sources: Canopy model card and repo, Mungert GGUF repo.
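The composition of the anchor can be written out as simple arithmetic. In this sketch the 2.5 GB file size and the roughly 4 GB total come from the note above; the overhead is only the difference between them, not an independently measured value.

```python
Q4_K_M_FILE_GB = 2.5   # on-disk size of the Q4_K_M GGUF (from the note above)
PEAK_ANCHOR_GB = 4.0   # approximate peak memory (from the note above)

# Implied runtime overhead: whatever the anchor adds on top of the weights
# (KV cache, compute buffers, and so on). Derived, not measured.
overhead_gb = PEAK_ANCHOR_GB - Q4_K_M_FILE_GB

print(f"weights:  {Q4_K_M_FILE_GB:.1f} GB")
print(f"overhead: {overhead_gb:.1f} GB")
print(f"peak:     {Q4_K_M_FILE_GB + overhead_gb:.1f} GB")
```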
Sources
Memory is a sourced peak-usage anchor at Q4_K_M GGUF (composed from reported sizes, not a single measurement), validated 2026-06-15. See methodology.