Can I run Stable Video Diffusion (img2vid-XT) on Nvidia GeForce RTX 4080 (16GB)?
Yes. Stable Video Diffusion (img2vid-XT) runs on Nvidia GeForce RTX 4080 (16GB) at fp16 + offload (~8 GB of ~15 GB usable).
- Peak VRAM: ~8 GB
- Usable on device: ~15 GB
- Device memory: 16 GB
- Quant: fp16 + offload
How to run it
Use ComfyUI or Diffusers at fp16 + offload. The model conditions on an input image, not a text prompt. With offload, the pipeline moves each stage off the GPU once its pass finishes, so peak VRAM stays close to the footprint of the stage that is currently running.
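The Diffusers route can be sketched as follows. This is a minimal example rather than a tuned setup: it assumes the `diffusers`, `torch`, and `accelerate` packages are installed, that the checkpoint is published under the Hugging Face id shown, and the function name `generate_clip`, the seed, the output path, the frame rate, and the `decode_chunk_size` value are all illustrative choices, not settings taken from this page.

```python
# Sketch only: MODEL_ID, seed, fps, and decode_chunk_size are assumptions.
MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"
WIDTH, HEIGHT = 1024, 576  # resolution listed on this page


def generate_clip(image_path, out_path="clip.mp4", seed=42):
    """Generate a short clip from one conditioning image at fp16 with CPU offload."""
    # Imported lazily so this module loads even where the GPU stack is missing.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    # Keep only the active component on the GPU; the others wait in CPU RAM.
    pipe.enable_model_cpu_offload()

    # The model is conditioned on an image, so there is no text prompt here.
    image = load_image(image_path).resize((WIDTH, HEIGHT))
    generator = torch.manual_seed(seed)

    # A smaller decode_chunk_size decodes fewer frames per VAE pass,
    # which lowers peak VRAM at some cost in speed.
    frames = pipe(image, decode_chunk_size=2, generator=generator).frames[0]
    export_to_video(frames, out_path, fps=7)
    return out_path
```

Calling `generate_clip("input.png")` would download the checkpoint on first use and write `clip.mp4`. Actual peak VRAM depends on the frame count and `decode_chunk_size`, so treat the ~8 GB figure as a reference point and not a guarantee for this exact script.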
- Type: video (UNet)
- Parameters: 1.5B
- Peak VRAM: ~8 GB at fp16 + offload
- Resolution: 1024×576
- License: Stable Video Diffusion Community License
- Memory: 16 GB VRAM
- Usable for weights: ~15 GB
- Best runtime: ComfyUI or Diffusers
FAQ
Can Nvidia GeForce RTX 4080 (16GB) run Stable Video Diffusion (img2vid-XT)?
Yes. Stable Video Diffusion (img2vid-XT) runs on Nvidia GeForce RTX 4080 (16GB) at fp16 + offload (~8 GB of ~15 GB usable).
How much VRAM does Stable Video Diffusion (img2vid-XT) need?
On Nvidia GeForce RTX 4080 (16GB), it depends on offload. At fp16 + offload the realistic peak is ~8 GB of VRAM, which leaves room to spare within ~15 GB usable. With every component kept resident (no offload) the peak is ~22 GB, more than this card has. Aggressive CPU offload is also listed at ~8 GB and is much slower.
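The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted on this page (1.5B parameters, ~8 GB with offload, ~22 GB fully resident, ~15 GB usable). The helper names are hypothetical, and treating the 1.5B count as 2 bytes per parameter at fp16 is an assumption about what that count covers.

```python
GB = 1e9  # decimal gigabytes, matching the rounded figures on this page

USABLE_GB = 15         # usable VRAM on the 16 GB card
PEAK_OFFLOAD_GB = 8    # quoted peak at fp16 + offload
PEAK_RESIDENT_GB = 22  # quoted peak with every component resident


def weight_footprint_gb(params, bytes_per_param=2):
    """Raw weight size in GB; fp16 stores each parameter in 2 bytes."""
    return params * bytes_per_param / GB


def fits(peak_gb, usable_gb=USABLE_GB):
    """True when the quoted peak is within the usable VRAM budget."""
    return peak_gb <= usable_gb


weights_gb = weight_footprint_gb(1.5e9)   # assumed: 1.5B parameters at fp16
headroom_gb = USABLE_GB - PEAK_OFFLOAD_GB

print(f"fp16 weights: ~{weights_gb:.1f} GB")
print(f"offload fits: {fits(PEAK_OFFLOAD_GB)}, headroom ~{headroom_gb} GB")
print(f"fully resident fits: {fits(PEAK_RESIDENT_GB)}")
```

Under these assumptions the weights alone come to about 3 GB, so most of the quoted peak is activations and frame decoding, not parameters. That is consistent with offload and smaller decode chunks having a large effect on peak VRAM.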
What do I use to run Stable Video Diffusion (img2vid-XT) locally?
Stable Video Diffusion (img2vid-XT) runs in ComfyUI or Diffusers. It loads as a video diffusion checkpoint plus its image encoder and VAE, not a single chat command.
Sources
VRAM figures are sourced peak-usage anchors at the noted quant, validated 2026-06-15. See methodology.