Video model · hunyuanvideo

HunyuanVideo requirements

DiT video model · 13B params · 544×960, 129f (~5s) · released Dec 2024. Realistic minimum to run: NVIDIA GeForce RTX 4090 (24GB) at Q4 GGUF.

Tencent Hunyuan Community License · Commercial use: conditional

Commercial use is permitted under 100M monthly active users; the model is not licensed in the EU, UK, or South Korea.

Peak VRAM (Q4 GGUF)	~16 GB
All resident	~60 GB
Offload floor	~8 GB
Clip	129f / ~5s

Backbone size by precision

Precision	Size on disk
fp16 / bf16	25.6 GB
fp8	13.2 GB
Q8 GGUF	14 GB
Q4 GGUF (recommended)	7.88 GB
Q2 GGUF	6.09 GB

Backbone weights only. Peak VRAM is dominated by the activation memory for 129 frames at 544×960, not the file size.
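
As a rough cross-check on the table, the file sizes can be converted to an average number of bits stored per weight. This is a minimal sketch: it assumes the nominal 13B parameter count is exact and that 1 GB means 10^9 bytes, neither of which this page states, so the results are approximate.

```python
# Sizes copied from the table above.
# Assumption: 1 GB = 10**9 bytes.
BACKBONE_GB = {
    "fp16 / bf16": 25.6,
    "fp8": 13.2,
    "Q8 GGUF": 14.0,
    "Q4 GGUF": 7.88,
    "Q2 GGUF": 6.09,
}

# Assumption: the nominal "13B" figure; the exact parameter count is not given here.
PARAMS = 13e9


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a file of size_gb."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for name, size in BACKBONE_GB.items():
        print(f"{name:<12} {size:>6.2f} GB  ~{bits_per_weight(size):.1f} bits/weight")
```

Under these assumptions the Q4 file works out to a little under 5 bits per weight rather than exactly 4, which is consistent with quantized files carrying scale data and some higher-precision tensors alongside the packed weights.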

Pipeline components

Component	Size
LLaVA-LLaMA text encoder (offloaded)	16.1 GB
CLIP	0.5 GB
VAE (3D)	0.49 GB

Video VAEs are larger than image VAEs because they decode a temporal stack of frames.

Run it

HunyuanVideo runs in ComfyUI. Generating more frames or higher resolution raises peak VRAM sharply; the Q4 GGUF figure is for the default 129-frame clip.
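
To get a feel for why frame count and resolution matter so much, the sketch below counts the latent tokens the DiT attends over. The compression factors are assumptions not stated on this page: a video VAE that compresses 8× spatially and 4× temporally, followed by 2×2 spatial patching. The token count is only a rough proxy for activation memory, not a VRAM prediction.

```python
def latent_tokens(width: int, height: int, frames: int,
                  spatial: int = 8, temporal: int = 4, patch: int = 2) -> int:
    """Approximate number of tokens the DiT processes for one clip.

    Assumptions (not from this page): the VAE downsamples by `spatial` in
    each image dimension and by `temporal` in time (keeping the first frame),
    and the transformer groups latents into patch x patch spatial tokens.
    """
    latent_frames = (frames - 1) // temporal + 1
    tokens_per_frame = (height // spatial // patch) * (width // spatial // patch)
    return latent_frames * tokens_per_frame


if __name__ == "__main__":
    default_clip = latent_tokens(544, 960, 129)
    larger_clip = latent_tokens(720, 1280, 129)
    print(default_clip, larger_clip, round(larger_clip / default_clip, 2))
```

With these assumed factors the default 544×960 clip comes to 67,320 tokens and a 720×1280 clip of the same length to 118,800, about 1.76 times as many.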

Which devices can run HunyuanVideo?

Apple Silicon Macs

No mainstream local runtime for a 13B video model on Apple Silicon Macs yet.

RAM-only laptops

No mainstream local runtime for a 13B video model on RAM-only laptops yet.

iPhone & iPad

No mainstream local runtime for a 13B video model on iPhone & iPad yet.

Android

No mainstream local runtime for a 13B video model on Android yet.

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB) Yes

FAQ

How much VRAM does HunyuanVideo need?

At Q4 GGUF the realistic peak is ~16 GB, versus ~60 GB with every component resident. With aggressive CPU offload the peak drops to ~8 GB, but generation is much slower.
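
The three figures in this answer can be read as rough thresholds. The helper below is a hypothetical sketch (the function name and the wording of the labels are illustrative, not part of any tool) that maps a GPU's VRAM to the mode this page suggests; the cut-offs are the approximate values quoted above, so treat results near a boundary with caution.

```python
# Approximate thresholds from this page, in GB of VRAM, highest first.
MODES = [
    (60.0, "all components resident"),
    (16.0, "Q4 GGUF with the text encoder offloaded"),
    (8.0, "Q4 GGUF with aggressive CPU offload (much slower)"),
]


def pick_mode(vram_gb: float) -> str:
    """Return the least constrained mode whose approximate VRAM need fits."""
    for needed_gb, label in MODES:
        if vram_gb >= needed_gb:
            return label
    return "below the offload floor"


if __name__ == "__main__":
    for vram in (6, 12, 24, 80):
        print(f"{vram} GB: {pick_mode(vram)}")
```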

Why is peak VRAM lower than the sum of the files?

The text encoder is run once to encode your prompt, then offloaded to CPU before the frames are generated, so it is not resident at the memory peak.
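
The arithmetic behind this answer can be sketched from the component sizes listed on this page. One assumption is labeled in the code: CLIP and the VAE are treated as staying resident, which the page does not state explicitly. The remainder is an estimate of what is left for activations, not a measured value.

```python
# File sizes in GB, copied from the tables on this page (Q4 GGUF backbone).
FILES_GB = {
    "backbone_q4": 7.88,
    "text_encoder": 16.1,
    "clip": 0.5,
    "vae": 0.49,
}
PEAK_GB = 16.0  # the page's approximate Q4 GGUF peak


def memory_breakdown(files=FILES_GB, peak=PEAK_GB):
    """Return (sum of all files, weights resident at peak, remainder for activations)."""
    total = sum(files.values())
    # Assumption: only the text encoder is offloaded; CLIP and the VAE stay on the GPU.
    resident = total - files["text_encoder"]
    return total, resident, peak - resident


if __name__ == "__main__":
    total, resident, remainder = memory_breakdown()
    print(f"all files: {total:.2f} GB")
    print(f"resident weights at peak: {resident:.2f} GB")
    print(f"left for activations: {remainder:.2f} GB")
```

Under that assumption the files sum to about 25 GB, roughly 8.9 GB of weights are resident at the peak, and around 7 GB of the ~16 GB figure is left for activations.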

Can I use HunyuanVideo commercially?

Conditionally. Commercial use is permitted under 100M monthly active users; the model is not licensed in the EU, UK, or South Korea.

Tencent's 13B video DiT. Tencent's own table lists a 60 GB peak at 720p and 45 GB at 544p with no offload; the fp8 backbone saves ~10 GB. With the Q4_K_M GGUF backbone (7.88 GB) and the LLaVA encoder on CPU, ComfyUI runs 544p at ~14-16 GB (community-measured at reduced resolution). Tencent Community License: commercial use is permitted under 100M MAU, and the model is not licensed in the EU, UK, or South Korea. The headline VRAM figure follows the Q4 GGUF path and is a synthesis of these sources. Sources: Tencent card, city96 GGUF, Hunyuan repo.

Sources

The VRAM figure is a peak-usage estimate for the Q4 GGUF path at the default clip length. It is composed from sourced component sizes rather than taken from a single measurement, and was validated 2026-06-15. See methodology.