localmodel.run

Can I run Stable Video Diffusion (img2vid-XT)? Pick your device.

Stable Video Diffusion (img2vid-XT) is a 1.5B-parameter UNet video model. Its realistic peak VRAM usage is about 8 GB at fp16 with offload enabled. It needs a GPU (Apple Silicon, NVIDIA, or AMD); pick your hardware below.
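For a rough sense of where that number comes from, the sketch below computes the memory taken by the raw weights alone from the parameter count quoted above. The helper name `weight_footprint_gb` and the use of decimal gigabytes are assumptions made for illustration; the calculation deliberately ignores activations, decoding, and framework overhead, so it is a lower bound on weights, not an estimate of the ~8 GB peak.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for raw weights only, in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper for illustration: it does not account for
    activations, decoding buffers, or framework overhead.
    """
    return num_params * bytes_per_param / 1e9


PARAMS = 1.5e9  # parameter count quoted on this page

fp16_gb = weight_footprint_gb(PARAMS, 2)  # fp16 stores 2 bytes per parameter
fp32_gb = weight_footprint_gb(PARAMS, 4)  # fp32 stores 4 bytes per parameter

print(f"fp16 weights: {fp16_gb:.1f} GB")
print(f"fp32 weights: {fp32_gb:.1f} GB")
```

At fp16 the weights come to about 3 GB, half of what fp32 would need. The gap between that and the quoted peak is memory the sketch does not model.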

The VRAM figure is a sourced peak-usage reference point at fp16 with offload, validated on 2026-06-15. See methodology.