Can I run Stable Video Diffusion (img2vid-XT)? Pick your device.
Stable Video Diffusion (img2vid-XT) is a 1.5B-parameter UNet video model. Its realistic peak VRAM usage is about 8 GB when run at fp16 with offloading enabled. It needs a GPU (Apple Silicon, NVIDIA, or AMD); pick your hardware below.
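To show why precision matters for that figure, here is a rough sketch of the raw weight footprint at two precisions. Only the 1.5B parameter count comes from this page; the helper name `weights_gb` is hypothetical, and the byte widths are the standard sizes of fp32 and fp16 values. It covers weights only and says nothing about how the rest of the ~8 GB peak is spent.

```python
# Back-of-the-envelope weight footprint.
# Only the 1.5B parameter count is taken from this page; the rest is illustration.
PARAMS = 1.5e9

# Standard storage width of one value in each precision, in bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_gb(params: float, dtype: str) -> float:
    """Return the size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weights_gb(PARAMS, dtype):.1f} GB of weights")
```

Halving the precision halves the weight footprint, which is one reason the peak is quoted at fp16.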
Apple Silicon Macs
- Apple M1 (8GB) No
- Apple M2 (16GB) Yes
- Apple M4 (16GB) Yes
- Apple M5 (16GB) Yes
- Apple M3 Pro (18GB) Yes
- Apple M4 (24GB) Yes
- Apple M4 Pro (24GB) Yes
- Apple M5 (32GB) Yes
- Apple M4 Pro (48GB) Yes
- Apple M5 Pro (48GB) Yes
- Apple M4 Max (64GB) Yes
- Apple M4 Max (128GB) Yes
- Apple M5 Max (128GB) Yes
- Apple M3 Ultra (256GB) Yes
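One plausible reading of the Yes/No verdicts above, sketched as code: on Apple Silicon the GPU shares unified memory with the operating system and other apps, so only part of the total is available to the model. The page does not state its actual rule; the `USABLE_FRACTION` value and the `can_run` helper are assumptions made purely for illustration.

```python
PEAK_GB = 8.0  # peak VRAM figure quoted on this page (fp16 + offload)

# ASSUMPTION: hypothetical share of unified memory the GPU can actually use.
# The page does not publish this number; 0.75 is an illustrative placeholder.
USABLE_FRACTION = 0.75


def can_run(unified_memory_gb: float,
            peak_gb: float = PEAK_GB,
            usable_fraction: float = USABLE_FRACTION) -> bool:
    """Return True if the assumed usable share of memory covers the peak."""
    return unified_memory_gb * usable_fraction >= peak_gb


# A few devices copied from the list above.
for name, memory_gb in [("Apple M1", 8), ("Apple M2", 16), ("Apple M3 Pro", 18)]:
    verdict = "Yes" if can_run(memory_gb) else "No"
    print(f"{name} ({memory_gb}GB): {verdict}")
```

Under this assumption an 8GB machine falls short even though its total memory equals the quoted peak, which matches the single "No" in the list.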
NVIDIA GPUs
AMD GPUs
The VRAM figure is a sourced peak-usage anchor (reference value) at fp16 + offload, validated 2026-06-15. See methodology.