localmodel.run


Can I run Stable Video Diffusion (img2vid-XT) on Nvidia GeForce RTX 3060 (12GB)?

Compatibility verdict (VRAM check)
Yes, it runs fast on this GPU

Yes. Stable Video Diffusion (img2vid-XT) runs on Nvidia GeForce RTX 3060 (12GB) at fp16 + offload (~8 GB of ~11 GB usable).

Peak VRAM: ~8 GB
Usable on device: ~11 GB
Device memory: 12 GB
Quant: fp16 + offload

How to run it

Use ComfyUI or Diffusers at fp16 + offload. The model conditions on an input image, not a text prompt. The pipeline moves each stage off the GPU once its pass finishes, so peak VRAM stays close to the footprint of whichever stage is active.
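The Diffusers route can be sketched as below. This is a minimal sketch, not a tuned setup: it assumes the `torch`, `diffusers`, and `accelerate` packages are installed, and that the checkpoint is the Hugging Face repo `stabilityai/stable-video-diffusion-img2vid-xt`. The function name `generate_video`, the seed, `decode_chunk_size=8`, and `fps=7` are illustrative choices rather than values from this page.

```python
# Assumed Hugging Face repo id for the img2vid-XT checkpoint (not stated on this page).
MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"


def generate_video(image_path, out_path="generated.mp4", seed=42):
    """Hypothetical helper: image in, short video out, at fp16 with model CPU offload."""
    # Heavy imports are kept inside the function so the module loads without a GPU stack.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    # Load the fp16 weights to halve the memory footprint versus fp32.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    # Keep only the active component on the GPU; the others wait in system RAM.
    pipe.enable_model_cpu_offload()

    # The model conditions on an image, resized to its 1024x576 working resolution.
    image = load_image(image_path).resize((1024, 576))

    generator = torch.manual_seed(seed)
    # A smaller decode_chunk_size lowers VAE decode memory at some cost in speed.
    frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

    export_to_video(frames, out_path, fps=7)
    return out_path
```

Calling `generate_video("input.jpg")` would download the weights on first use and write `generated.mp4`. If VRAM is still tight, lowering `decode_chunk_size` is the first knob to try.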

Model: stable-video-diffusion
Type: video (UNET)
Parameters: 1.5B
Peak VRAM: ~8 GB at fp16 + offload
Resolution: 1024×576
License: Stable Video Diffusion Community License
Full Stable Video Diffusion (img2vid-XT) requirements →
Device: Windows
Memory: 12 GB VRAM
Usable for weights: ~11 GB
Best runtime: Ollama (CUDA) / llama.cpp CUDA for language models on this device; this video model runs in ComfyUI or Diffusers instead
Best models for Nvidia GeForce RTX 3060 (12GB) →

FAQ

Can Nvidia GeForce RTX 3060 (12GB) run Stable Video Diffusion (img2vid-XT)?

Yes. Stable Video Diffusion (img2vid-XT) runs on Nvidia GeForce RTX 3060 (12GB) at fp16 + offload (~8 GB of ~11 GB usable).

How much VRAM does Stable Video Diffusion (img2vid-XT) need?

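Nvidia GeForce RTX 3060 (12GB) has room to spare. At fp16 + offload the realistic peak is ~8 GB of VRAM, against ~11 GB usable. Keeping every component resident (no offload) takes ~22 GB, which does not fit on this card. The trade-off is speed: CPU offload is what brings the peak down to ~8 GB, and it runs much slower than a fully resident pipeline.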
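The fit check behind this answer is simple arithmetic, sketched here with the figures quoted on this page. The helper names `fits` and `headroom_gb` are hypothetical, and the numbers are the page's approximate anchors, not measurements.

```python
def fits(peak_gb: float, usable_gb: float) -> bool:
    """True when the estimated peak VRAM fits inside the usable budget."""
    return peak_gb <= usable_gb


def headroom_gb(peak_gb: float, usable_gb: float) -> float:
    """Spare VRAM in GB; a negative value means the configuration does not fit."""
    return usable_gb - peak_gb


USABLE_GB = 11.0          # ~11 GB usable on the 12 GB RTX 3060
PEAK_OFFLOAD_GB = 8.0     # ~8 GB peak at fp16 + offload
PEAK_RESIDENT_GB = 22.0   # ~22 GB with every component kept resident

for label, peak in (("fp16 + offload", PEAK_OFFLOAD_GB),
                    ("fully resident", PEAK_RESIDENT_GB)):
    verdict = "fits" if fits(peak, USABLE_GB) else "does not fit"
    print(f"{label}: {verdict}, headroom {headroom_gb(peak, USABLE_GB):+.0f} GB")
```

With these inputs the offloaded configuration leaves about 3 GB spare, while the fully resident one overshoots the usable budget by about 11 GB.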

What do I use to run Stable Video Diffusion (img2vid-XT) locally?

Stable Video Diffusion (img2vid-XT) runs in ComfyUI or Diffusers. It loads as a video diffusion checkpoint plus its image encoder and VAE, not a single chat command.

Sources

VRAM figures are sourced peak-usage anchors at the noted quant, validated 2026-06-15. See methodology.