audio model · orpheus · macOS
Can I run Orpheus 3B on Apple M4 Max (128GB)?
Yes. Orpheus 3B runs on Apple M4 Max (128GB) at Q4_K_M GGUF (~4 GB of ~96 GB usable).
- Peak memory: ~4 GB
- Usable on device: ~96 GB
- Device memory: 128 GB
- Quant: Q4_K_M GGUF
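The fit verdict follows from simple arithmetic on the figures above. The sketch below is illustrative only: the `MemoryBudget` class and its field names are hypothetical, and the inputs are the page's rounded figures (128 GB device, ~96 GB usable, ~4 GB peak), not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBudget:
    """Hypothetical helper holding the page's rounded memory figures, in GB."""
    device_gb: float   # total unified memory on the device
    usable_gb: float   # portion the page treats as usable for the model
    peak_gb: float     # peak usage of the model at the noted quant

    @property
    def fits(self) -> bool:
        # The model fits when its peak usage is within the usable budget.
        return self.peak_gb <= self.usable_gb

    @property
    def headroom_gb(self) -> float:
        # Usable memory left over once the model is loaded.
        return self.usable_gb - self.peak_gb

    @property
    def usable_share(self) -> float:
        # Fraction of device memory counted as usable.
        return self.usable_gb / self.device_gb

    @property
    def peak_share_of_usable(self) -> float:
        # Fraction of the usable budget consumed at peak.
        return self.peak_gb / self.usable_gb


budget = MemoryBudget(device_gb=128, usable_gb=96, peak_gb=4)
print(f"fits: {budget.fits}")
print(f"headroom: {budget.headroom_gb:.0f} GB")
print(f"usable share of device: {budget.usable_share:.0%}")
print(f"peak share of usable: {budget.peak_share_of_usable:.1%}")
```

With these inputs, the usable budget is three quarters of device memory and the model takes only a few percent of it, which is why the page describes the device as having room to spare.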
How to run it
Use llama.cpp or LM Studio at Q4_K_M GGUF. It is light enough to run on CPU; a GPU just makes it faster.
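As a rough sketch of the llama.cpp route, the snippet below assembles a `llama-server` launch command. The model filename is a placeholder, and the port, context size, and layer count are assumed defaults rather than values from this page. `-ngl` sets how many layers are offloaded to the GPU, so `0` keeps inference on the CPU.

```python
import shlex


def llama_server_command(model_path, port=8080, gpu_layers=99, ctx_size=4096):
    """Build an argument list for llama.cpp's llama-server binary.

    gpu_layers=99 is a common way to request offloading every layer;
    gpu_layers=0 keeps the model on the CPU.
    """
    return [
        "llama-server",
        "-m", model_path,          # path to the GGUF file
        "--port", str(port),       # HTTP port to listen on
        "-ngl", str(gpu_layers),   # number of layers to offload to the GPU
        "-c", str(ctx_size),       # context window in tokens
    ]


# Placeholder filename: substitute the Q4_K_M GGUF file you actually downloaded.
MODEL = "orpheus-3b-Q4_K_M.gguf"

gpu_cmd = llama_server_command(MODEL)
cpu_cmd = llama_server_command(MODEL, gpu_layers=0)

# Print shell-ready commands; pass the lists to subprocess.run to launch instead.
print(shlex.join(gpu_cmd))
print(shlex.join(cpu_cmd))
```

This only starts a server for the model. Turning a text-to-speech model's output into playable audio generally needs a further decoding step, which is outside this sketch.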
- Type: Text to speech
- Parameters: 3B
- Peak memory: ~4 GB at Q4_K_M GGUF
- License: Apache-2.0
- Memory: 128 GB unified
- Usable for weights: ~96 GB
- Best runtime: MLX direct / Ollama (MLX backend)
FAQ
Can Apple M4 Max (128GB) run Orpheus 3B?
Yes. Orpheus 3B runs on Apple M4 Max (128GB) at Q4_K_M GGUF (~4 GB of ~96 GB usable).
How much memory does Orpheus 3B need?
Apple M4 Max (128GB) has room to spare. At Q4_K_M GGUF the realistic peak is ~4 GB of memory.
What do I use to run Orpheus 3B locally?
Orpheus 3B runs in llama.cpp or LM Studio (among others). It runs on CPU, so no GPU is required.
Sources
Memory figures are sourced peak-usage anchors at the noted quantization, validated 2026-06-15. See methodology.