LTX-Video 2B requirements

DiT video model · 2B params · 1216×704, 121 frames (~4s) · released Nov 2024. Realistic minimum to run: NVIDIA GeForce RTX 3060 (12GB) at fp8 + offload.

License: LTX-Video Open Weights (OpenRAIL-M) · Commercial use OK

OpenRAIL-M open-weights license; commercial use permitted subject to use-based restrictions.

Metric	Value
Peak VRAM (fp8 + offload)	~10 GB
All resident	~12 GB
Offload floor	~6 GB
Clip	121f / ~4s

Backbone size by precision

Precision	Size on disk
fp16 / bf16	6.34 GB
fp8 (recommended)	4.46 GB
Q8 GGUF	2.17 GB
Q4 GGUF	1.42 GB

Backbone weights only. Peak VRAM is dominated by the activation memory for 121 frames at 1216×704, not the file size.
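To see why clip length and resolution matter more than file size, it helps to count the latent tokens the transformer attends over. The sketch below assumes the compression factors described for LTX-Video by Lightricks (32×32 pixels spatially and 8 frames temporally per latent position, with frame counts of the form 8n + 1); those factors are not stated on this page, and the helper name `latent_tokens` is made up for illustration.

```python
def latent_tokens(width, height, num_frames, spatial=32, temporal=8):
    """Count latent positions for one clip.

    Assumes (not stated on this page) a video VAE that compresses
    32x32 pixels spatially and 8 frames temporally, with the first
    frame encoded on its own, so num_frames must be 8n + 1.
    """
    if width % spatial or height % spatial:
        raise ValueError("width and height must be multiples of %d" % spatial)
    if (num_frames - 1) % temporal:
        raise ValueError("num_frames must be of the form %d*n + 1" % temporal)
    latent_frames = (num_frames - 1) // temporal + 1
    return (width // spatial) * (height // spatial) * latent_frames


default_clip = latent_tokens(1216, 704, 121)   # 38 * 22 * 16 latent positions
double_length = latent_tokens(1216, 704, 241)  # 38 * 22 * 31 latent positions

print(default_clip, double_length, double_length / default_clip)
```

Under these assumptions the token count grows linearly with frames and pixel area, while full self-attention compute grows with its square, which is consistent with the observation that longer or larger clips push peak memory up quickly. Treat the numbers as a rough scaling guide, not a VRAM prediction.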

Pipeline components

Component	Size
T5-XXL text encoder (offloaded)	2.9 GB

Video VAEs are larger than image VAEs because they decode a temporal stack of frames.

Run it

LTX-Video 2B runs in ComfyUI or Diffusers. Generating more frames or higher resolution raises peak VRAM sharply; the fp8 + offload figure is for the default 121-frame clip.
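For the Diffusers route, a minimal sketch of the fp8 + offload setup might look like the following. It follows the pattern of fp8 layerwise casting plus group offloading as I understand it from the Diffusers documentation, and assumes a recent `diffusers` release that ships those hooks, a CUDA GPU, and the `Lightricks/LTX-Video` repository id. The function name `generate_clip`, the step count, and the output path are illustrative choices, not values from this page.

```python
def generate_clip(prompt, width=1216, height=704, num_frames=121, fps=30,
                  out_path="ltx_clip.mp4"):
    """Generate one clip with fp8 weight storage and group offloading (sketch)."""
    # Heavy imports stay inside the function so defining it is cheap.
    import torch
    from diffusers import AutoModel, LTXPipeline
    from diffusers.hooks import apply_group_offloading
    from diffusers.utils import export_to_video

    repo = "Lightricks/LTX-Video"  # assumed Hugging Face repo id

    # Store transformer weights in fp8 and upcast per layer to bf16 for compute.
    transformer = AutoModel.from_pretrained(
        repo, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    transformer.enable_layerwise_casting(
        storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
    )

    pipe = LTXPipeline.from_pretrained(
        repo, transformer=transformer, torch_dtype=torch.bfloat16
    )

    # Keep weights on the CPU and move small groups to the GPU only when needed.
    onload, offload = torch.device("cuda"), torch.device("cpu")
    pipe.transformer.enable_group_offload(
        onload_device=onload, offload_device=offload,
        offload_type="leaf_level", use_stream=True,
    )
    apply_group_offloading(
        pipe.text_encoder, onload_device=onload,
        offload_type="block_level", num_blocks_per_group=2,
    )
    apply_group_offloading(pipe.vae, onload_device=onload, offload_type="leaf_level")

    frames = pipe(
        prompt=prompt,
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,  # assumed step count, tune for speed vs. quality
    ).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate_clip("a short prompt")` would download the weights on first use and write `ltx_clip.mp4`. Exact method names and defaults can shift between `diffusers` releases, so check the version you have installed before relying on this.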

Which devices can run LTX-Video 2B?

Apple Silicon Macs

RAM-only laptops

No mainstream local runtime for a 2B video model on RAM-only laptops yet.

iPhone & iPad

No mainstream local runtime for a 2B video model on iPhone & iPad yet.

Android

No mainstream local runtime for a 2B video model on Android yet.

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB): Yes

FAQ

How much VRAM does LTX-Video 2B need?

At fp8 + offload the realistic peak is ~10 GB, versus ~12 GB with every component resident. Aggressive CPU offload lowers it to ~6 GB, at the cost of much slower generation.
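As a rough illustration, the three figures above can be turned into a small lookup that suggests a memory mode for a given card. The function name `pick_memory_mode` and the mode labels are hypothetical, and the thresholds are the approximate peaks quoted on this page with no extra headroom added.

```python
# Approximate peak VRAM per mode in GB, taken from the figures quoted above.
MODES = [
    (12.0, "all resident"),
    (10.0, "fp8 + offload"),
    (6.0, "aggressive CPU offload"),
]


def pick_memory_mode(vram_gb):
    """Return the least restrictive mode whose quoted peak fits in vram_gb.

    Returns None when even the ~6 GB offload floor does not fit.
    No safety margin is applied; real usage varies with drivers,
    other processes, and clip settings.
    """
    for peak_gb, mode in MODES:
        if vram_gb >= peak_gb:
            return mode
    return None


for card_gb in (24, 12, 10, 8, 4):
    print(card_gb, "GB ->", pick_memory_mode(card_gb))
```

On a 12 GB card the all-resident peak fits only on paper, so in practice the fp8 + offload mode is the safer choice there, which matches the RTX 3060 recommendation at the top of the page.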

Why is peak VRAM lower than the sum of the files?

The text encoder is run once to encode your prompt, then offloaded to CPU before the frames are generated, so it is not resident at the memory peak.

Can I use LTX-Video 2B commercially?

Yes. LTX-Video 2B is licensed under the LTX-Video Open Weights (OpenRAIL-M) license, which permits commercial use subject to use-based restrictions.

Lightricks' fast DiT video model (2B), notable for near-real-time generation. It generates 1216×704 video at 30 fps. The Diffusers docs cite ~10 GB VRAM with fp8 layerwise casting plus group offloading (the T5 encoder is offloaded). A GGUF Q4_K_M backbone (1.42 GB) plus a quantized T5 brings it to ~6 GB. License: OpenRAIL-M, commercial use OK. Sources: Lightricks model card, Diffusers LTX docs, city96 GGUF.

Sources

VRAM is a sourced peak-usage anchor at fp8 + offload for the default clip length, validated 2026-06-15. See methodology.