Audio model · orpheus

Orpheus 3B requirements

Text to speech · 3B params · Q4_K_M GGUF (llama.cpp) / fp16 (vLLM) · released Mar 2025. Light enough to run on CPU, no GPU required.

Apache-2.0 · Commercial use OK

Peak memory (Q4_K_M GGUF)

~4 GB

Runs on CPU

Yes

Parameters

3B

Type

Text to speech

Run it

Runtime tools Q4_K_M GGUF

Orpheus 3B runs in llama.cpp or LM Studio as a Q4_K_M GGUF, and in vLLM at fp16. The GGUF build runs CPU-only, so no GPU is required.

llama.cpp · LM Studio · vLLM
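As a rough illustration of the CPU-only llama.cpp path, the sketch below assembles a `llama-server` command line in Python. The model file name, context size, and port are hypothetical placeholders, not values from the model card. The server returns the model's speech tokens; turning those into audio is a separate decoding step that is not shown here.

```python
import shlex

def llama_server_command(model_path, cpu_only=True, ctx_size=4096, port=8080):
    """Build an argv list for llama.cpp's llama-server binary."""
    args = ["llama-server", "-m", model_path, "-c", str(ctx_size), "--port", str(port)]
    if cpu_only:
        # -ngl 0 offloads zero layers to a GPU, so inference stays on the CPU.
        args += ["-ngl", "0"]
    return args

# Hypothetical file name; substitute the path of the GGUF you downloaded.
cmd = llama_server_command("orpheus-3b-q4_k_m.gguf")
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd)` to start the server. Setting `cpu_only=False` simply omits `-ngl`, which leaves the offload choice to llama.cpp's defaults.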

Which devices can run Orpheus 3B?

Apple Silicon Macs

RAM-only laptops

iPhone & iPad

No mainstream local runtime for Orpheus 3B on iPhone & iPad yet.

Android

No mainstream local runtime for Orpheus 3B on Android yet.

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24 GB): Yes

FAQ

How much memory does Orpheus 3B need?

At Q4_K_M GGUF it peaks at ~4 GB of memory. It runs on CPU, so a GPU is optional.

Can Orpheus 3B run on a phone or CPU?

Yes for CPU: it runs on a Mac or laptop CPU. There is no mainstream local runtime for it on iPhone, iPad, or Android yet.

Can I use Orpheus 3B commercially?

Yes. Orpheus 3B is licensed Apache-2.0, which permits commercial use.

Notes

Orpheus 3B is a Llama-3.2-3B fine-tune that emits speech tokens, so it runs through the usual LLM stacks. The llama.cpp GGUF path (Q4_K_M, ~2.5 GB on disk) runs on CPU or Apple Silicon Metal at ~4 GB including overhead; the vLLM/CUDA path needs more VRAM headroom. Apache-2.0, commercial use OK. The memory anchor is the Q4_K_M file size plus typical llama.cpp overhead (a synthesis, not a single measurement). Sources: Canopy model card and repo, Mungert GGUF repo.
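The arithmetic behind the ~4 GB anchor can be sketched as file size plus runtime overhead. The 1.5 GB overhead below is an assumption, taken as the gap between the ~2.5 GB file and the ~4 GB anchor; it is not a measured value. The `reserve_gb` parameter is a hypothetical knob for memory you want to leave to the OS and other apps.

```python
Q4_K_M_FILE_GB = 2.5  # on-disk size quoted in the notes
OVERHEAD_GB = 1.5     # assumption: implied by the ~4 GB anchor, not measured

def peak_memory_gb(file_gb, overhead_gb):
    """Estimate peak memory as model file size plus runtime overhead."""
    return file_gb + overhead_gb

def fits(device_gb, peak_gb, reserve_gb=0.0):
    """True if the device has room for the peak after setting aside reserve_gb."""
    return device_gb - reserve_gb >= peak_gb

estimate = peak_memory_gb(Q4_K_M_FILE_GB, OVERHEAD_GB)
print(f"~{estimate:.0f} GB")          # ~4 GB
print(fits(24, estimate))             # 24 GB card such as the RX 7900 XTX: True
print(fits(8, estimate, reserve_gb=5))  # 8 GB machine with 5 GB reserved: False
```

This is only a planning estimate; real usage varies with context length and runtime settings.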

Sources

Memory is a sourced peak-usage anchor at Q4_K_M GGUF (composed from reported sizes, not a single measurement), validated 2026-06-15. See methodology.