Image model · UNET
Stable Diffusion XL 1.0 requirements
UNet image model · 2.6B params · 1024×1024 · 25-40 steps · released Jul 2023. Realistic minimum to run: NVIDIA GeForce RTX 3060 (12GB) at fp16.
License: use-based restrictions apply; there is no revenue cap.
Backbone size by precision
| Precision | Size |
|---|---|
| fp16 / bf16 (recommended) | 5.1 GB |
Backbone weights only. The verdict uses peak VRAM consumed at fp16, not the file size.
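The backbone size follows from the parameter count and the bytes stored per parameter. The sketch below is a rough estimate under that assumption, using decimal gigabytes; the function name and the precision table are illustrative, not part of any tool. With the rounded 2.6B figure it gives about 5.2 GB, slightly above the 5.1 GB in the table, presumably because the exact parameter count is a little under 2.6B.

```python
# Rough weight-size estimate: parameter count times bytes per parameter.
# Assumption: sizes are decimal gigabytes (1 GB = 1e9 bytes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_size_gb(params: float, precision: str) -> float:
    """Return the storage needed for the weights alone, in GB."""
    return params * BYTES_PER_PARAM[precision] / 1e9


unet_params = 2.6e9  # rounded parameter count quoted on this page

for precision in ("fp16", "bf16", "fp32"):
    print(f"{precision}: {weight_size_gb(unet_params, precision):.1f} GB")
```

This covers weights only; activations and the other pipeline components come on top, which is why the verdict relies on measured peak VRAM instead.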
Pipeline components
| Component | Size |
|---|---|
| CLIP-L text encoder | 0.25 GB |
| OpenCLIP-G text encoder | 1.39 GB |
| VAE | 0.34 GB |
Encoders marked “offloaded” move to CPU before denoising, so they do not count toward peak VRAM.
Run it
Stable Diffusion XL 1.0 runs in ComfyUI, AUTOMATIC1111 / Forge, Draw Things or diffusers. Load the fp16 backbone together with its two text encoders and the VAE; unlike a text LLM, there is no single chat-style command.
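For the diffusers route, a minimal sketch might look like the following. It assumes `torch`, `diffusers` and `accelerate` are installed and that the weights come from the `stabilityai/stable-diffusion-xl-base-1.0` repository on Hugging Face; that model ID, the helper names, the prompt and the output file name are assumptions for illustration and do not come from this page.

```python
def load_sdxl(model_id: str = "stabilityai/stable-diffusion-xl-base-1.0"):
    """Load the SDXL base pipeline in fp16 (downloads weights on first use)."""
    # Imported lazily so the helper can be defined without the GPU stack present.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # fp16 weights, as recommended above
        variant="fp16",
        use_safetensors=True,
    )
    # Keeps each sub-model on the GPU only while it is in use (needs accelerate).
    pipe.enable_model_cpu_offload()
    return pipe


def generate(pipe, prompt: str, out_path: str, steps: int = 30):
    """Render one 1024x1024 image and save it to out_path."""
    image = pipe(
        prompt,
        num_inference_steps=steps,  # the page quotes 25-40 steps
        height=1024,
        width=1024,
    ).images[0]
    image.save(out_path)
    return image
```

A typical call would be `generate(load_sdxl(), "a lighthouse at dusk, oil painting", "sdxl_out.png")`. On a card with ample VRAM, replacing `enable_model_cpu_offload()` with `pipe.to("cuda")` keeps everything resident and is usually faster.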
Which devices can run Stable Diffusion XL 1.0?
Apple Silicon Macs
- Apple M1 (8GB) No
- Apple M2 (16GB) Yes
- Apple M4 (16GB) Yes
- Apple M5 (16GB) Yes
- Apple M3 Pro (18GB) Yes
- Apple M4 (24GB) Yes
- Apple M4 Pro (24GB) Yes
- Apple M5 (32GB) Yes
- Apple M4 Pro (48GB) Yes
- Apple M5 Pro (48GB) Yes
- Apple M4 Max (64GB) Yes
- Apple M4 Max (128GB) Yes
- Apple M5 Max (128GB) Yes
- Apple M3 Ultra (256GB) Yes
RAM-only laptops
No mainstream local runtime for a 2.6B image model on RAM-only laptops yet.
iPhone & iPad
Android
No mainstream local runtime for a 2.6B image model on Android yet.
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM does Stable Diffusion XL 1.0 need?
At fp16 the realistic peak is ~7.5 GB, versus ~8.5 GB with every component resident. With aggressive CPU offload it drops to ~4 GB, but generation becomes much slower.
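The three figures in this answer can be read as thresholds. The sketch below turns them into a simple lookup; the function name and the wording of the verdicts are hypothetical, and the cut-offs are the approximate values quoted above, so treat borderline results with caution. It is meant for dedicated GPU VRAM only: unified-memory devices share memory with the operating system, so their verdicts in the device lists do not follow from these numbers alone.

```python
# Approximate fp16 thresholds taken from the FAQ answer above (in GB).
FULL_RESIDENT_GB = 8.5   # every component kept on the GPU
REALISTIC_PEAK_GB = 7.5  # realistic peak at fp16
OFFLOAD_PEAK_GB = 4.0    # aggressive CPU offload, much slower


def vram_verdict(vram_gb: float) -> str:
    """Classify a dedicated-VRAM budget against the quoted thresholds."""
    if vram_gb >= FULL_RESIDENT_GB:
        return "fits with every component resident"
    if vram_gb >= REALISTIC_PEAK_GB:
        return "fits at the realistic fp16 peak"
    if vram_gb >= OFFLOAD_PEAK_GB:
        return "needs aggressive CPU offload (much slower)"
    return "below the ~4 GB offload floor"


for card_gb in (4, 8, 12):
    print(f"{card_gb} GB: {vram_verdict(card_gb)}")
```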
Why is peak VRAM lower than the sum of the files?
The pipeline moves each stage off the GPU between passes (sequential CPU offload), so peak VRAM stays near the active stage rather than the sum of every file.
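The difference between "sum of the files" and "active stage" can be shown with the weight sizes from the tables on this page. The sketch below is a weights-only illustration: it ignores activations and framework overhead, so neither number will match a measured peak, and the stage grouping is an assumption made for the example.

```python
# Weight sizes in GB, taken from the tables above.
STAGES_GB = {
    "text encoders (CLIP-L + OpenCLIP-G)": 0.25 + 1.39,
    "UNet backbone": 5.1,
    "VAE": 0.34,
}

# Everything resident: the GPU holds all weights at once.
all_resident_gb = sum(STAGES_GB.values())

# Stage-by-stage offload: only the largest single stage is ever on the GPU.
largest_stage = max(STAGES_GB, key=STAGES_GB.get)
offloaded_peak_gb = STAGES_GB[largest_stage]

print(f"all resident:   {all_resident_gb:.2f} GB of weights")
print(f"with offload:   {offloaded_peak_gb:.2f} GB of weights ({largest_stage})")
```

With offloading, the weight footprint is bounded by the UNet alone; the cost is the time spent copying each stage between CPU and GPU.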
Can I use Stable Diffusion XL 1.0 commercially?
Yes. Stable Diffusion XL 1.0 is released under the CreativeML Open RAIL++-M license, which permits commercial use subject to its use-based restrictions.
The UNet has 2.6B parameters (3.5B including the CLIP-L and OpenCLIP-G text encoders; there is no T5 encoder). The two CLIP encoders total ~1.6 GB and stay resident, so peak VRAM is the sum: ~7.5 GB measured at 1024×1024 on an 8 GB card, consistent with Stability's stated 8 GB minimum. It runs on 4 GB with AUTOMATIC1111 --lowvram, slowly. An optional refiner adds a second ~6 GB UNet. Sources: Stability SDXL announcement, ComfyUI 8GB measurement, AUTOMATIC1111 Optimum SDXL wiki.
Sources
VRAM is a sourced peak-usage anchor at fp16, validated 2026-06-15. See methodology.