localmodel.run

Can I run Orpheus 3B on Nvidia GeForce RTX 4070 (12GB)?

Compatibility verdict (memory check)
Yes, it runs fast on this GPU.

Yes. Orpheus 3B runs on Nvidia GeForce RTX 4070 (12GB) at Q4_K_M GGUF (~4 GB of ~11 GB usable).

Peak memory: ~4 GB
Usable on device: ~11 GB
Device memory: 12 GB
Quant: Q4_K_M GGUF
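
The verdict comes down to comparing peak memory against usable device memory. The following is a minimal sketch of that comparison, using the approximate figures above. The `FitCheck` class and its field names are hypothetical and only illustrate the arithmetic; they are not the site's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitCheck:
    """Hypothetical helper: does a model's peak memory fit on a device?"""

    peak_gb: float    # peak memory the model needs at the chosen quant
    usable_gb: float  # device memory actually available to the model

    @property
    def fits(self) -> bool:
        # The model fits when its peak usage does not exceed usable memory.
        return self.peak_gb <= self.usable_gb

    @property
    def headroom_gb(self) -> float:
        # Memory left over after the model's peak usage.
        return self.usable_gb - self.peak_gb


# Approximate figures quoted above: Orpheus 3B at Q4_K_M GGUF on an RTX 4070 (12GB).
check = FitCheck(peak_gb=4.0, usable_gb=11.0)
print(check.fits, check.headroom_gb)  # prints: True 7.0
```

With roughly 7 GB of headroom, the model fits with a wide margin, which is why the verdict is an unqualified yes.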

How to run it

Use llama.cpp or LM Studio at Q4_K_M GGUF. It is light enough to run on CPU; a GPU just makes it faster.
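
As a rough illustration of the llama.cpp route, the sketch below assembles a `llama-server` command line. The model file name is a placeholder, not a real download name, and the helper function is hypothetical. The `-m`, `-ngl`, and `--port` options are llama.cpp options as best understood here; check them against your installed version. The sketch only covers loading the model; turning the model's output into audio is outside its scope.

```python
def llama_server_command(model_path: str, use_gpu: bool = True, port: int = 8080) -> list[str]:
    """Build an argument list for llama.cpp's llama-server (hypothetical helper)."""
    # -ngl sets how many layers to offload to the GPU; a large value such as 99
    # offloads everything that fits, while 0 keeps the model on the CPU.
    gpu_layers = 99 if use_gpu else 0
    return [
        "llama-server",
        "-m", model_path,         # path to the GGUF file
        "-ngl", str(gpu_layers),  # GPU offload layer count
        "--port", str(port),      # local HTTP port to listen on
    ]


# Placeholder file name: substitute the Q4_K_M GGUF file you actually downloaded.
cmd = llama_server_command("orpheus-3b-q4_k_m.gguf")
print(" ".join(cmd))  # display only; pass the list itself to subprocess.run
```

Calling `llama_server_command(..., use_gpu=False)` gives the CPU-only variant, which matches the note that a GPU is optional for this model.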

Model: orpheus
Type: Text to speech
Parameters: 3B
Peak memory: ~4 GB at Q4_K_M GGUF
License: Apache-2.0

Device: Nvidia GeForce RTX 4070 (12GB), Windows
Memory: 12 GB VRAM
Usable for weights: ~11 GB
Best runtime: Ollama (CUDA) / vLLM (Linux)

FAQ

Can Nvidia GeForce RTX 4070 (12GB) run Orpheus 3B?

Yes. Orpheus 3B runs on Nvidia GeForce RTX 4070 (12GB) at Q4_K_M GGUF (~4 GB of ~11 GB usable).

How much memory does Orpheus 3B need?

At Q4_K_M GGUF the realistic peak is ~4 GB of memory. Nvidia GeForce RTX 4070 (12GB) has ~11 GB usable, so there is room to spare.

What do I use to run Orpheus 3B locally?

Orpheus 3B runs in llama.cpp or LM Studio (among others). It is light enough to run on CPU, so no GPU is required; a GPU just makes it faster.

Sources

VRAM figures are sourced peak-usage anchors at the noted quant, validated 2026-06-15. See methodology.