localmodel.run

Audio model · orpheus

Orpheus 3B requirements

Text to speech · 3B params · Q4_K_M GGUF (llama.cpp) / fp16 (vLLM) · released Mar 2025. Light enough to run on a CPU; no GPU required.

License: Apache-2.0 (commercial use OK)
Peak memory (Q4_K_M GGUF): ~4 GB
Runs on CPU: Yes
Parameters: 3B
Type: Text to speech

Run it

Runtime tools (Q4_K_M GGUF)

Orpheus 3B runs in llama.cpp and LM Studio as a Q4_K_M GGUF, or in vLLM at fp16. The GGUF build can run CPU-only, and the smaller quantizations can be fast enough for real-time speech on a laptop.
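Whether a given machine keeps up in real time comes down to token throughput. The sketch below converts a measured generation speed into a real-time factor. It assumes each group of 7 speech tokens decodes to one SNAC frame of 2048 samples at 24 kHz; those figures come from the public Orpheus decoder, not from this page, and the throughput values are hypothetical placeholders for your own measurement.

```python
# Rough real-time check for Orpheus-style TTS.
# Assumptions (not from this page): 7 speech tokens per SNAC frame,
# 2048 audio samples per frame, 24 kHz output audio.
TOKENS_PER_FRAME = 7
SAMPLES_PER_FRAME = 2048
SAMPLE_RATE_HZ = 24_000


def required_tokens_per_second() -> float:
    """Tokens/s the LLM must sustain to generate audio as fast as it plays."""
    frames_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME
    return TOKENS_PER_FRAME * frames_per_second


def real_time_factor(measured_tokens_per_second: float) -> float:
    """Above 1.0 is faster than real time; below 1.0 means playback would stall."""
    return measured_tokens_per_second / required_tokens_per_second()


if __name__ == "__main__":
    print(f"needed: {required_tokens_per_second():.2f} tok/s")
    for tps in (40.0, 82.0, 120.0):  # hypothetical measured throughputs
        print(f"{tps:>6.1f} tok/s -> RTF {real_time_factor(tps):.2f}")
```

Under these assumptions the model has to sustain roughly 82 tokens/s; the generation speed llama.cpp reports on your machine can be plugged in directly.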



FAQ

How much memory does Orpheus 3B need?

At Q4_K_M GGUF, peak memory is about 4 GB: the ~2.5 GB weights file plus runtime overhead. Because it runs on CPU, a GPU is optional.
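One plausible way the ~4 GB figure breaks down is sketched below as a back-of-envelope estimate, not a measurement. Only the ~2.5 GB file size comes from this page. The attention shape (28 layers, 8 KV heads, head size 128, taken from the Llama-3.2-3B base), the fp16 KV cache at an 8192-token context, and the 0.5 GB compute buffer are all assumptions.

```python
# Back-of-envelope peak-memory estimate for the llama.cpp Q4_K_M path.
# Only the ~2.5 GB weights file comes from the page; everything else is assumed.
GB = 1e9


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Keys plus values (the factor 2) for every layer and context slot."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def peak_estimate_gb(weights_file_gb: float, kv_bytes: int,
                     compute_buffer_gb: float) -> float:
    """Weights + KV cache + scratch buffers, in decimal GB."""
    return weights_file_gb + kv_bytes / GB + compute_buffer_gb


if __name__ == "__main__":
    # Assumption: Llama-3.2-3B attention shape, fp16 cache, 8192-token context.
    kv = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, n_ctx=8192)
    # 2.5 GB file size is from the page; the 0.5 GB buffer is hypothetical.
    peak = peak_estimate_gb(2.5, kv, compute_buffer_gb=0.5)
    print(f"KV cache: {kv / GB:.2f} GB, estimated peak: {peak:.2f} GB")

    # For contrast: unquantized fp16 weights alone at a nominal 3B params,
    # one reason the vLLM path needs more VRAM headroom.
    fp16_weights_gb = 3e9 * 2 / GB
    print(f"fp16 weights alone: {fp16_weights_gb:.1f} GB")
```

The KV-cache term scales linearly with context length, so lowering llama.cpp's `-c`/`--ctx-size` setting shrinks that part of the estimate.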

Can Orpheus 3B run on a phone or CPU?

Yes on CPU: it runs on a Mac or laptop CPU. No phone runtime has been confirmed.

Can I use Orpheus 3B commercially?

Yes. Orpheus 3B is licensed Apache-2.0, which permits commercial use.

Notes

A Llama-3.2-3B fine-tune that emits speech tokens, so it runs through the usual LLM stacks. The llama.cpp GGUF path (Q4_K_M, ~2.5 GB on disk) runs on CPU or on Apple Silicon via Metal at roughly 4 GB including overhead; the vLLM/CUDA path needs more VRAM headroom. Apache-2.0, commercial use OK. The memory anchor is the Q4_K_M file size plus typical llama.cpp overhead (a synthesis, not a single measurement). Sources: Canopy model card and repo, Mungert GGUF repo.
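Because the model outputs speech tokens rather than audio, a separate neural codec turns them into a waveform. A minimal sketch of the regrouping step is shown below. It assumes the layout used by the public Orpheus decoder: SNAC codes arrive in frames of 7, split 1/2/4 across three codebook levels, with per-position offsets already subtracted. `split_frames` is a hypothetical helper, and the SNAC decode call itself is out of scope.

```python
from typing import List, Tuple

FRAME = 7  # assumed: 7 codes per frame (1 + 2 + 4 across three SNAC levels)


def split_frames(codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Regroup a flat code stream into three codebook levels.

    Hypothetical helper: assumes offsets are already removed and drops
    any trailing partial frame.
    """
    level0: List[int] = []  # coarsest level: 1 code per frame
    level1: List[int] = []  # middle level: 2 codes per frame
    level2: List[int] = []  # finest level: 4 codes per frame
    usable = len(codes) - len(codes) % FRAME
    for i in range(0, usable, FRAME):
        f = codes[i:i + FRAME]
        level0.append(f[0])
        level1.extend([f[1], f[4]])
        level2.extend([f[2], f[3], f[5], f[6]])
    return level0, level1, level2


if __name__ == "__main__":
    # Two full frames plus one leftover code, which is discarded.
    l0, l1, l2 = split_frames(list(range(15)))
    print(l0, l1, l2)
```

The 1:2:4 split mirrors the codec's three temporal resolutions. This is also why streaming front-ends typically buffer whole 7-token frames before decoding.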

Sources

The memory figure is a sourced peak-usage anchor at Q4_K_M GGUF (composed from reported sizes, not a single measurement), validated 2026-06-15. See methodology.