Stable Diffusion 1.5 requirements
UNet image model · 0.86B params · 512×512 · 20-50 steps · released Oct 2022. Realistic minimum to run: Apple M1 (8GB) at fp16.
License: CreativeML OpenRAIL-M, with use-based restrictions (no illegal or harmful content) and no revenue cap.
Backbone size by precision
| Precision | Size |
|---|---|
| fp16 / bf16 (recommended) | 1.7 GB |
Backbone weights only. The verdict uses peak VRAM consumed at fp16, not the file size.
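The 1.7 GB figure follows from the parameter count: 0.86B parameters at 2 bytes each. A minimal sketch of that arithmetic, assuming decimal gigabytes; the fp32 row is an illustrative assumption for comparison and is not part of the table above.

```python
# Bytes per parameter for common precisions. Only fp16 / bf16 appears in the
# table above; fp32 is included purely for comparison (assumption).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

UNET_PARAMS = 0.86e9  # UNet parameter count quoted on this page


def backbone_size_gb(params: float, precision: str) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def size_table(params: float = UNET_PARAMS) -> dict:
    """Return {precision: size in GB rounded to two decimals}."""
    return {p: round(backbone_size_gb(params, p), 2) for p in BYTES_PER_PARAM}
```

Calling `size_table()` gives roughly 1.72 GB for fp16 and bf16, which the table rounds to 1.7 GB. This covers weights only, so it says nothing about peak VRAM.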
Pipeline components
| Component | Size |
|---|---|
| CLIP-L text encoder | 0.25 GB |
| VAE | 0.17 GB |
Encoders marked “offloaded” move to CPU before denoising, so they do not count toward peak VRAM. Neither component above is marked that way, so both count toward the peak.
Run it
Stable Diffusion 1.5 runs in AUTOMATIC1111, ComfyUI, Draw Things or diffusers. Load the fp16 backbone together with its text encoder and VAE; unlike a text LLM, there is no single chat command.
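For the diffusers route, loading the backbone, text encoder and VAE might look like the sketch below. The repository id, the prompt and the `pick_device` helper are assumptions made for illustration; substitute whichever copy of the checkpoint you actually use. It needs `torch` and `diffusers` installed, and it downloads the weights on first use.

```python
# Assumption: a Hugging Face repo id hosting SD 1.5 weights in diffusers
# format. Replace it with the checkpoint location you actually use.
MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-v1-5"


def pick_device(has_cuda: bool, has_mps: bool) -> str:
    """Hypothetical helper: prefer an NVIDIA GPU, then Apple Silicon, then CPU."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


def generate(prompt: str, steps: int = 30, out_path: str = "out.png") -> str:
    """Load the pipeline (UNet + CLIP-L + VAE) and render one 512x512 image."""
    # Imported lazily so the helper above stays usable without these packages.
    import torch
    from diffusers import StableDiffusionPipeline

    device = pick_device(
        torch.cuda.is_available(), torch.backends.mps.is_available()
    )
    # fp16 on an accelerator; fall back to fp32 on CPU.
    dtype = torch.float16 if device != "cpu" else torch.float32

    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    result = pipe(prompt, num_inference_steps=steps, height=512, width=512)
    result.images[0].save(out_path)
    return out_path
```

A call such as `generate("a lighthouse at dusk, oil painting")` would write `out.png` to the working directory. The default of 30 steps sits inside the 20-50 range quoted above.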
Which devices can run Stable Diffusion 1.5?
Apple Silicon Macs
- Apple M1 (8GB) Yes
- Apple M2 (16GB) Yes
- Apple M4 (16GB) Yes
- Apple M5 (16GB) Yes
- Apple M3 Pro (18GB) Yes
- Apple M4 (24GB) Yes
- Apple M4 Pro (24GB) Yes
- Apple M5 (32GB) Yes
- Apple M4 Pro (48GB) Yes
- Apple M5 Pro (48GB) Yes
- Apple M4 Max (64GB) Yes
- Apple M4 Max (128GB) Yes
- Apple M5 Max (128GB) Yes
- Apple M3 Ultra (256GB) Yes
RAM-only laptops
No mainstream local runtime for a 0.86B image model on RAM-only laptops yet.
iPhone & iPad
Android
No mainstream local runtime for a 0.86B image model on Android yet.
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM does Stable Diffusion 1.5 need?
At fp16 the realistic peak is ~3.7 GB at 512×512 with every component resident, in line with the widely cited 4 GB minimum. With aggressive CPU offload it drops to ~2 GB, but generation is much slower.
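Why is peak VRAM higher than the sum of the files?
The files total about 2.1 GB (UNet ~1.7 GB + CLIP-L ~0.25 GB + VAE ~0.17 GB), but activations add to that during generation, which brings the peak to ~3.7 GB at 512×512 with all components resident. With sequential CPU offload the pipeline moves each stage off the GPU between passes, so peak VRAM stays near the active stage (~2 GB) at the cost of speed.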
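In diffusers, sequential CPU offload is switched on through a method on the pipeline object. The sketch below wraps that choice in a hypothetical `apply_memory_mode` helper; the mode names are assumptions made for this example, while `enable_sequential_cpu_offload`, `enable_model_cpu_offload` and `to` are existing diffusers pipeline methods (the offload methods also require the `accelerate` package).

```python
def apply_memory_mode(pipe, mode: str = "resident", device: str = "cuda"):
    """Hypothetical helper that picks a VRAM strategy for a diffusers pipeline.

    resident   - keep every component on the GPU (fastest, highest peak)
    model      - move whole components to the GPU only while they run
    sequential - stream submodules on and off the GPU (lowest peak, slowest)
    """
    if mode == "resident":
        pipe.to(device)
    elif mode == "model":
        # Do not also call pipe.to(device); the offload hooks manage placement.
        pipe.enable_model_cpu_offload()
    elif mode == "sequential":
        pipe.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"unknown memory mode: {mode!r}")
    return pipe
```

On a card near the 2 GB end, `apply_memory_mode(pipe, "sequential")` is the setting to try first; with 4 GB or more, `"resident"` avoids the transfer overhead.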
Can I use Stable Diffusion 1.5 commercially?
Yes. Stable Diffusion 1.5 is licensed CreativeML OpenRAIL-M, which permits commercial use.
The UNet has 0.86B parameters (~0.98B including the CLIP-L text encoder; there is no T5 encoder). Because the single CLIP-L encoder is only ~0.25 GB, all components stay resident and peak VRAM is their sum plus activations: UNet ~1.7 GB + CLIP-L ~0.25 GB + VAE ~0.17 GB + activations ≈ 3.7 GB at 512×512, consistent with the widely cited 4 GB minimum. It runs on 2 GB with AUTOMATIC1111's --lowvram option, slowly. Sources: Hugging Face model card, AUTOMATIC1111 Optimizations wiki, fp16 checkpoint size.
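The composition of the ~3.7 GB peak can be checked with a few lines of arithmetic. In this sketch the activation share is an assumption derived by subtracting the component sizes from the quoted peak, not a measured value, and `fits` is a hypothetical helper that compares available memory against that peak.

```python
# Component sizes at fp16, in GB, as quoted on this page.
COMPONENTS_GB = {"UNet": 1.7, "CLIP-L text encoder": 0.25, "VAE": 0.17}

PEAK_GB = 3.7      # quoted peak at 512x512 with all components resident
OFFLOAD_GB = 2.0   # quoted peak with aggressive CPU offload


def weights_gb() -> float:
    """Total size of all resident weights."""
    return sum(COMPONENTS_GB.values())


def activation_share_gb() -> float:
    """Derived by subtraction (assumption): quoted peak minus weights."""
    return PEAK_GB - weights_gb()


def fits(vram_gb: float, offload: bool = False) -> bool:
    """Hypothetical check: is the available memory at least the quoted peak?"""
    return vram_gb >= (OFFLOAD_GB if offload else PEAK_GB)
```

The weights add up to about 2.12 GB, which leaves roughly 1.58 GB attributed to activations and working buffers. By this check a 4 GB card passes, and a 2 GB card passes only with `offload=True`.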
Sources
VRAM is a sourced peak-usage anchor at fp16 (composed from component sizes, not a single measurement), validated 2026-06-15. See methodology.