Mochi 1 requirements

DiT video model · 10B params · 480×848, 85 frames (~3 s) · released Oct 2024. Realistic minimum to run: NVIDIA GeForce RTX 4090 (24 GB) at fp8 + offload.

License: Apache-2.0 · commercial use OK

Metric	Value
Peak VRAM (fp8 + offload)	~20 GB
All components resident	~60 GB
Offload floor	~18 GB
Clip length	85 frames (~3 s)

Backbone size by precision

Precision	Size on disk
fp16 / bf16	20.1 GB
fp8 (recommended)	10 GB

Backbone weights only. Peak VRAM is dominated by the activation memory for 85 frames at 480×848, not the file size.
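The table follows from simple arithmetic: weight size is parameter count times bytes per parameter. A minimal Python sketch, assuming a round 10e9 parameters and decimal gigabytes (1 GB = 10^9 bytes); the dictionary and function names are illustrative only:

```python
# Approximate weight size: number of parameters x bytes per parameter.
# Assumptions: exactly 10e9 parameters (the page rounds to "10B") and
# decimal gigabytes (1 GB = 1e9 bytes).

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "bf16": 2,
    "fp8": 1,
}


def backbone_size_gb(num_params: float, precision: str) -> float:
    """Return the weight size in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("fp32", "bf16", "fp8"):
        print(f"{precision}: {backbone_size_gb(10e9, precision):.1f} GB")
```

This gives 20.0 GB at bf16 and 10.0 GB at fp8; the table's 20.1 GB implies a parameter count slightly above the rounded 10B. The same rule puts fp32 weights at about 40 GB.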

Pipeline components

Component	Size
T5-XXL text encoder (offloaded)	9.5 GB
VAE (3D)	1.84 GB

Video VAEs are larger than image VAEs because they decode a temporal stack of frames.
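To see what the DiT actually denoises, the sketch below computes the latent shape for the default clip. It rests on assumptions not stated on this page: Genmo describes Mochi's VAE as compressing 8× spatially and 6× temporally into 12 latent channels, and the causal temporal mapping `(T - 1) // 6 + 1` is assumed here. Check both against the runtime you use; the constants and function name are illustrative.

```python
# Sketch: latent tensor shape for one clip.
# Assumptions (not from this page): 8x spatial and 6x causal temporal
# compression into 12 latent channels, as Genmo describes Mochi's VAE.

SPATIAL = 8
TEMPORAL = 6
LATENT_CHANNELS = 12


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width)."""
    # Assumed causal mapping: T frames -> (T - 1) // TEMPORAL + 1 latent frames.
    latent_frames = (frames - 1) // TEMPORAL + 1
    return (LATENT_CHANNELS, latent_frames, height // SPATIAL, width // SPATIAL)


if __name__ == "__main__":
    c, t, h, w = latent_shape(85, 480, 848)
    print((c, t, h, w), "values:", c * t * h * w)
```

For 85 frames at 480×848 this gives (12, 15, 60, 106), about 1.1 million values. The latent itself is small; the memory pressure comes from the network activations computed over it and from decoding it back to 85 full-resolution frames.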

Run it

Mochi 1 runs in ComfyUI or Diffusers. Generating more frames or higher resolution raises peak VRAM sharply; the fp8 + offload figure is for the default 85-frame clip.
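For the Diffusers route, a minimal sketch of the bf16 + CPU-offload path follows. It assumes a recent diffusers release that ships `MochiPipeline`, the `accelerate` package (needed for offloading), a CUDA GPU, and the `genmo/mochi-1-preview` checkpoint with its `bf16` variant. The prompt, output path, and 30 fps export rate are illustrative choices. The sources cite about 22 GB for this path, slightly above the ~20 GB fp8 figure, which comes from ComfyUI's fp8_scaled checkpoint.

```python
# Minimal sketch of the diffusers bf16 + CPU-offload path for Mochi 1.
# Assumes: recent diffusers with MochiPipeline, accelerate installed,
# a CUDA GPU, and access to "genmo/mochi-1-preview" on the Hugging Face Hub.


def generate(prompt: str, out_path: str = "mochi.mp4", num_frames: int = 85) -> str:
    # Heavy imports live inside the function so this file imports cheaply.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
    )
    # Move each component (text encoder, transformer, VAE) to the GPU only
    # while it runs, so they are not all resident at once.
    pipe.enable_model_cpu_offload()
    # Decode the latent video in tiles to reduce the VAE's peak memory.
    pipe.vae.enable_tiling()

    frames = pipe(prompt, num_frames=num_frames).frames[0]
    export_to_video(frames, out_path, fps=30)  # 30 fps is an illustrative choice
    return out_path


if __name__ == "__main__":
    generate("A close-up of ocean waves at sunset, cinematic lighting.")
```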


Which devices can run Mochi 1?

Apple Silicon Macs

No mainstream local runtime for a 10B video model on Apple Silicon Macs yet.

RAM-only laptops

No mainstream local runtime for a 10B video model on RAM-only laptops yet.

iPhone & iPad

No mainstream local runtime for a 10B video model on iPhone & iPad yet.

Android

No mainstream local runtime for a 10B video model on Android yet.

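NVIDIA GPUs

Runs in ComfyUI or Diffusers; a 24 GB card such as the GeForce RTX 4090 is the realistic minimum at fp8 + offload.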

AMD GPUs

No mainstream local runtime for a 10B video model on AMD GPUs yet.

FAQ

How much VRAM does Mochi 1 need?

At fp8 + offload the realistic peak is ~20 GB, versus ~60 GB with every component resident. Aggressive CPU offload brings it down to ~18 GB, at the cost of much slower generation.
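These three figures form a simple decision rule. A sketch with the thresholds copied from this page; the function name and tier labels are hypothetical, and real peaks shift with runtime, resolution, and frame count:

```python
from typing import Optional

# (approximate peak VRAM in GB, configuration), ordered from most to least VRAM.
# Figures are this page's peaks for the default 85-frame 480x848 clip.
TIERS = [
    (60.0, "all components resident"),
    (20.0, "fp8 + offload"),
    (18.0, "aggressive CPU offload (much slower)"),
]


def pick_config(vram_gb: float) -> Optional[str]:
    """Return the first configuration whose approximate peak fits, or None."""
    for need_gb, name in TIERS:
        if vram_gb >= need_gb:
            return name
    return None


if __name__ == "__main__":
    for card_gb in (80, 24, 18, 16):
        print(card_gb, "GB ->", pick_config(card_gb))
```

A 24 GB card such as the RTX 4090 lands in the fp8 + offload tier, with a few GB of headroom above the ~20 GB peak.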

Why is peak VRAM lower than the sum of the files?

The text encoder is run once to encode your prompt, then offloaded to CPU before the frames are generated, so it is not resident at the memory peak.
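The arithmetic behind this answer, using the file sizes listed on this page: summing every file overstates what the GPU must hold, because the text encoder is off the GPU by the time frames are generated. A sketch, assuming the fp8 backbone and counting both the backbone and the VAE as resident (an upper bound, since offloading can keep fewer weights on the GPU at once); the names are illustrative:

```python
# Compare summed file sizes with the weights left on the GPU after the
# text encoder is offloaded. Sizes (GB) are this page's figures.
# Assumption: backbone and VAE both count as resident (an upper bound).

FILES_GB = {
    "backbone_fp8": 10.0,
    "t5_xxl_encoder": 9.5,
    "vae": 1.84,
}


def resident_weights_gb(files: dict[str, float], offloaded: set[str]) -> float:
    """Sum the sizes of all components that are not offloaded to the CPU."""
    return sum(size for name, size in files.items() if name not in offloaded)


if __name__ == "__main__":
    total = resident_weights_gb(FILES_GB, set())
    at_peak = resident_weights_gb(FILES_GB, {"t5_xxl_encoder"})
    print(f"all files: {total:.2f} GB, without T5: {at_peak:.2f} GB")
```

All three files add up to 21.34 GB, but with the T5 encoder offloaded at most 11.84 GB of weights remain, leaving the rest of the ~20 GB peak for activations and runtime overhead.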

Can I use Mochi 1 commercially?

Yes. Mochi 1 is licensed Apache-2.0, which permits commercial use.

Genmo's 10B AsymmDiT video model (480×848). Genmo cites ~60 GB for single-GPU inference at full fp32; diffusers needs 42 GB (fp32 + offload) or 22 GB (bf16 + offload), and ComfyUI's fp8_scaled path runs under 20 GB. The T5 encoder is offloaded. In practice it needs a 24 GB GPU. Apache-2.0, commercial use OK. Sources: Genmo model card, diffusers Mochi docs, Comfy-Org repack.

Sources

The VRAM figure is a sourced peak-usage anchor at fp8 + offload for the default clip length, validated 2026-06-15. See methodology.