HunyuanVideo requirements
DiT video model · 13B params · 544×960, 129 frames (~5 s) · released Dec 2024. Realistic minimum to run: NVIDIA GeForce RTX 4090 (24 GB) at Q4 GGUF.
Commercial use is permitted under 100M monthly active users; the model is not licensed in the EU, UK, or South Korea.
Backbone size by precision
| Precision | Size |
|---|---|
| fp16 / bf16 | 25.6 GB |
| fp8 | 13.2 GB |
| Q8 GGUF | 14 GB |
| Q4 GGUF (recommended) | 7.88 GB |
| Q2 GGUF | 6.09 GB |
Backbone weights only. Peak VRAM is dominated by the activation memory for 129 frames at 544×960, not the file size.
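As a rough sanity check on the table, the file sizes can be converted to effective bits per weight. This is a minimal sketch that assumes "13B" means exactly 13 billion parameters and that GB means 10^9 bytes; both are approximations, so the results are indicative only.

```python
# Backbone file sizes in GB, copied from the table above.
BACKBONE_GB = {
    "fp16 / bf16": 25.6,
    "fp8": 13.2,
    "Q8 GGUF": 14.0,
    "Q4 GGUF": 7.88,
    "Q2 GGUF": 6.09,
}

PARAMS = 13e9  # assumption: "13B" taken as exactly 13 billion parameters


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Effective bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


for name, size in BACKBONE_GB.items():
    print(f"{name:12s} {size:6.2f} GB  ~{bits_per_weight(size):.1f} bits/weight")
```

Under these assumptions the quantized files land above their nominal bit width (Q4 GGUF works out to roughly 4.8 bits per weight). That is plausible for GGUF files, which store per-block scales and may keep some tensors at higher precision.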
Pipeline components
| Component | Size |
|---|---|
| LLaVA-LLaMA text encoder (offloaded to CPU) | 16.1 GB |
| CLIP | 0.5 GB |
| VAE (3D) | 0.49 GB |
Video VAEs are larger than image VAEs because they decode a temporal stack of frames.
Run it
HunyuanVideo runs in ComfyUI. Generating more frames or higher resolution raises peak VRAM sharply; the Q4 GGUF figure is for the default 129-frame clip.
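To get a feel for why frame count and resolution matter, the sketch below counts the transformer tokens for a clip. It assumes a 3D VAE with 4x temporal and 8x spatial compression and a 2×2 spatial patch size. These factors match common descriptions of HunyuanVideo but are not stated on this page, so treat the numbers as illustrative rather than measured. Token count is only a proxy for activation memory, not a VRAM estimate.

```python
def dit_tokens(width: int, height: int, frames: int,
               t_comp: int = 4, s_comp: int = 8, patch: int = 2) -> int:
    """Approximate number of tokens the DiT attends over for one clip.

    t_comp, s_comp and patch are assumed compression factors (see lead-in).
    """
    # Causal 3D VAE: the first frame is kept, the rest are compressed t_comp-fold.
    latent_frames = (frames - 1) // t_comp + 1
    # Each token covers an (s_comp * patch)-pixel square of the original frame.
    cell = s_comp * patch
    tokens_per_frame = (height // cell) * (width // cell)
    return latent_frames * tokens_per_frame


default_clip = dit_tokens(960, 544, 129)
hd_clip = dit_tokens(1280, 720, 129)
print(default_clip, hd_clip, round(hd_clip / default_clip, 2))
```

Under these assumptions the default 544×960, 129-frame clip is 67,320 tokens and a 720×1280 clip of the same length is 118,800, about 1.76x more, before counting any extra cost from attention over the longer sequence.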
Which devices can run HunyuanVideo?
Apple Silicon Macs
No mainstream local runtime for a 13B video model on Apple Silicon Macs yet.
RAM-only laptops
No mainstream local runtime for a 13B video model on RAM-only laptops yet.
iPhone & iPad
No mainstream local runtime for a 13B video model on iPhone & iPad yet.
Android
No mainstream local runtime for a 13B video model on Android yet.
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM does HunyuanVideo need?
At Q4 GGUF the realistic peak is ~16 GB, versus ~60 GB with every component resident. With aggressive CPU offload it drops to ~8 GB, but generation is much slower.
Why is peak VRAM lower than the sum of the files?
The text encoder is run once to encode your prompt, then offloaded to CPU before the frames are generated, so it is not resident at the memory peak.
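The arithmetic behind this answer can be sketched from the component sizes listed on this page. The helper name `resident_weights_gb` is hypothetical, and the sketch only sums weight files; it does not model activation memory, which is what actually sets the peak.

```python
# Component sizes in GB, taken from the tables on this page (Q4 GGUF backbone).
COMPONENTS_GB = {
    "backbone (Q4 GGUF)": 7.88,
    "LLaVA-LLaMA text encoder": 16.1,
    "CLIP": 0.5,
    "VAE (3D)": 0.49,
}


def resident_weights_gb(components: dict, offloaded=()) -> float:
    """Sum the sizes of all components that stay on the GPU."""
    return sum(size for name, size in components.items() if name not in offloaded)


all_resident = resident_weights_gb(COMPONENTS_GB)
encoder_offloaded = resident_weights_gb(
    COMPONENTS_GB, offloaded={"LLaVA-LLaMA text encoder"}
)
print(f"all weights resident:   {all_resident:.2f} GB")
print(f"text encoder offloaded: {encoder_offloaded:.2f} GB")
```

With the text encoder on the CPU, the resident weights come to about 8.9 GB instead of about 25 GB. Measured against the ~16 GB peak quoted above, that leaves roughly 7 GB for activations and runtime overhead.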
Can I use HunyuanVideo commercially?
Conditionally. Commercial use is permitted under 100M monthly active users; the model is not licensed in the EU, UK, or South Korea.
HunyuanVideo is Tencent's 13B video DiT. Tencent's own table lists a 60 GB peak at 720p and 45 GB at 544p with no offload; the fp8 backbone saves ~10 GB. With GGUF Q4_K_M (7.88 GB) and the LLaVA encoder on CPU, ComfyUI runs 544p at ~14-16 GB (community-measured at reduced resolution). The Tencent Community License permits commercial use under 100M MAU, and the model is not licensed in the EU, UK, or South Korea. The VRAM anchor is the GGUF Q4 path (synthesis). Sources: Tencent card, city96 GGUF, Hunyuan repo.
Sources
VRAM is a sourced peak-usage anchor at Q4 GGUF (composed from component sizes, not a single measurement) for the default clip length, validated 2026-06-15. See methodology.