Mochi 1 requirements
DiT video model · 10B params · 480×848, 85 frames (~3s) · released Oct 2024. Realistic minimum to run: NVIDIA GeForce RTX 4090 (24 GB) at fp8 + offload.
Backbone size by precision
| Precision | Size |
|---|---|
| fp16 / bf16 | 20.1 GB |
| fp8 (recommended) | 10 GB |
Backbone weights only. Peak VRAM is dominated by the activation memory for 85 frames at 480×848, not the file size.
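The sizes in the table follow roughly from parameter count times bytes per parameter. A minimal sketch of that arithmetic, assuming a nominal 10 billion parameters and decimal gigabytes (both are assumptions; the listed 20.1 GB suggests the actual count is slightly above 10B):

```python
# Rough weight-size estimate: parameters x bytes per parameter.
# Assumption: a nominal 10e9 parameters and decimal GB (1e9 bytes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weights_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("bf16", "fp8"):
    print(f"{precision}: {weights_gb(10e9, precision):.1f} GB")
```

This covers the weights only; it says nothing about activation memory, which is what drives peak VRAM.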
Pipeline components
| Component | Size |
|---|---|
| T5-XXL text encoder (offloaded) | 9.5 GB |
| VAE (3D) | 1.84 GB |
Video VAEs are larger than image VAEs because they decode a temporal stack of frames.
Run it
Mochi 1 runs in ComfyUI or Diffusers. Generating more frames or higher resolution raises peak VRAM sharply; the fp8 + offload figure is for the default 85-frame clip.
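For the Diffusers route, a minimal sketch of the default 85-frame clip might look like the following. It uses the bf16 weights with model CPU offload and VAE tiling, not the fp8 path, so expect the higher bf16 + offload footprint. The `generate_clip` helper, the `CLIP_SETTINGS` name, the output path, and the 30 fps export rate are illustrative assumptions; the repository id assumed here is `genmo/mochi-1-preview`.

```python
# Sketch only: bf16 weights + CPU offload via Diffusers (not the fp8 path).
# generate_clip and CLIP_SETTINGS are hypothetical names for illustration.
CLIP_SETTINGS = {"height": 480, "width": 848, "num_frames": 85}


def generate_clip(prompt: str, out_path: str = "mochi.mp4") -> str:
    """Generate one default-length clip and write it to out_path."""
    # Heavy imports stay inside the function so this module can be
    # imported on a machine without the GPU stack installed.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
    )
    # Keep only the active component on the GPU; decode the video in tiles.
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    frames = pipe(prompt, **CLIP_SETTINGS).frames[0]
    export_to_video(frames, out_path, fps=30)  # fps is an assumed value
    return out_path
```

Calling `generate_clip("a close-up of a hummingbird")` downloads the weights on first use, so the first run takes considerably longer than later ones.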
Which devices can run Mochi 1?
Apple Silicon Macs
No mainstream local runtime for a 10B video model on Apple Silicon Macs yet.
RAM-only laptops
No mainstream local runtime for a 10B video model on RAM-only laptops yet.
iPhone & iPad
No mainstream local runtime for a 10B video model on iPhone & iPad yet.
Android
No mainstream local runtime for a 10B video model on Android yet.
NVIDIA GPUs
Realistic minimum: NVIDIA GeForce RTX 4090 (24 GB) at fp8 + offload.
AMD GPUs
No mainstream local runtime for a 10B video model on AMD GPUs yet.
FAQ
How much VRAM does Mochi 1 need?
At fp8 + offload the realistic peak is ~20 GB, versus ~60 GB with every component resident. Aggressive CPU offload lowers the peak to ~18 GB, at the cost of much slower generation.
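As a rough illustration, the three figures in this answer can be turned into a lookup that picks the least-offloaded mode a given card can hold. The mode labels and the `least_offloaded_mode` helper are hypothetical, and treating the quoted peaks as hard thresholds is a simplification: real runs need some headroom on top.

```python
from typing import Optional

# Approximate peak VRAM (GB) as quoted in the answer above,
# ordered from least to most offloading.
PEAK_VRAM_GB = [
    ("all components resident", 60),
    ("fp8 + offload", 20),
    ("aggressive CPU offload (much slower)", 18),
]


def least_offloaded_mode(vram_gb: float) -> Optional[str]:
    """Return the first mode whose quoted peak fits, or None if none do."""
    for mode, peak_gb in PEAK_VRAM_GB:
        if vram_gb >= peak_gb:
            return mode
    return None


print(least_offloaded_mode(24))  # a 24 GB card lands on "fp8 + offload"
```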
Why is peak VRAM lower than the sum of the files?
The text encoder is run once to encode your prompt, then offloaded to CPU before the frames are generated, so it is not resident at the memory peak.
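The effect can be sketched with the component sizes listed on this page. The snippet compares the sum of all files with the weights still resident once the text encoder has been offloaded. It assumes the VAE stays on the GPU during generation, which depends on the offload strategy, and it does not model activation memory, which accounts for the rest of the peak.

```python
# Component sizes from this page (GB), using the fp8 backbone.
COMPONENTS_GB = {
    "backbone_fp8": 10.0,
    "t5_xxl_text_encoder": 9.5,
    "vae_3d": 1.84,
}

# Everything loaded at once.
sum_of_files = sum(COMPONENTS_GB.values())

# Weights left on the GPU after the prompt is encoded and the text
# encoder is moved to CPU (assumption: the VAE stays resident).
resident_weights = sum(
    size for name, size in COMPONENTS_GB.items() if name != "t5_xxl_text_encoder"
)

print(f"sum of files:     {sum_of_files:.2f} GB")
print(f"resident weights: {resident_weights:.2f} GB")
```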
Can I use Mochi 1 commercially?
Yes. Mochi 1 is licensed Apache-2.0, which permits commercial use.
Genmo's 10B AsymmDiT video model (480×848). Genmo cites ~60 GB for full fp32 on a single GPU; Diffusers needs 42 GB (fp32 + offload) or 22 GB (bf16 + offload), and ComfyUI's fp8_scaled path runs under 20 GB. The T5 encoder is offloaded. In practice it needs a 24 GB GPU. Licensed Apache-2.0, so commercial use is permitted. Sources: Genmo model card, Diffusers Mochi docs, Comfy-Org repack.
Sources
VRAM is a sourced peak-usage anchor at fp8 + offload for the default clip length, validated 2026-06-15. See methodology.