Stable Diffusion 3.5 Large requirements
MMDiT image model · 8.1B params · 1024×1024 · 28-40 steps · released Oct 2024. Realistic minimum to run: NVIDIA GeForce RTX 3060 (12GB) at Q4 GGUF.
Free for commercial use under $1M annual revenue; an enterprise license is required above that.
Backbone size by precision
| Precision | Size |
|---|---|
| fp16 / bf16 | 16.5 GB |
| fp8 | 14.9 GB |
| Q8 GGUF | 8.78 GB |
| Q4 GGUF (recommended) | 4.77 GB |
These sizes are backbone weights only. The per-device verdicts below are based on peak VRAM consumed at Q4 GGUF, not on file size.
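As a sanity check on the table, the file sizes can be converted to effective bits per weight. This is a minimal sketch, assuming the 8.1B parameter count quoted on this page and treating 1 GB as 1e9 bytes; the function name `bits_per_weight` is illustrative only.

```python
PARAMS = 8.1e9  # parameter count quoted on this page

# Backbone file sizes in GB, copied from the table above.
# Assumption: 1 GB is treated as 1e9 bytes.
SIZES_GB = {
    "fp16 / bf16": 16.5,
    "fp8": 14.9,
    "Q8 GGUF": 8.78,
    "Q4 GGUF": 4.77,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Effective storage cost per parameter, in bits."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in SIZES_GB.items():
    print(f"{name:12s} {bits_per_weight(size_gb):5.2f} bits/weight")
```

The Q4 and Q8 rows land slightly above 4 and 8 bits, which is expected for block-quantized formats that store per-block scales alongside the weights. The fp8 row comes out well above 8 bits per weight; the page does not explain why, so treat that row with caution.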
Pipeline components
| Component | Size |
|---|---|
| CLIP-L text encoder | 0.25 GB |
| OpenCLIP-G text encoder | 1.39 GB |
| T5-XXL text encoder (offloaded) | 2.9 GB |
| VAE | 0.17 GB |
Encoders marked “offloaded” move to CPU before denoising, so they do not count toward peak VRAM.
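The offload rule can be illustrated by summing only the components that stay on the GPU during denoising. This is a rough sketch using the sizes from the two tables on this page; it counts weights only and does not model activations or runtime overhead, so it is a lower bound rather than a measurement. The `Component` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    size_gb: float
    offloaded: bool  # True if moved to CPU before denoising


# Sizes copied from the tables on this page (Q4 GGUF backbone).
COMPONENTS = [
    Component("Q4 GGUF backbone", 4.77, False),
    Component("CLIP-L text encoder", 0.25, False),
    Component("OpenCLIP-G text encoder", 1.39, False),
    Component("T5-XXL text encoder", 2.9, True),
    Component("VAE", 0.17, False),
]


def resident_weights_gb(components) -> float:
    """Weights still on the GPU at the denoising peak."""
    return sum(c.size_gb for c in components if not c.offloaded)


def all_weights_gb(components) -> float:
    """Weights if nothing were offloaded."""
    return sum(c.size_gb for c in components)


print(f"resident at peak: {resident_weights_gb(COMPONENTS):.2f} GB")
print(f"nothing offloaded: {all_weights_gb(COMPONENTS):.2f} GB")
```

The resident total sits a little below the ~7 GB peak quoted in the FAQ; the gap is presumably working memory for activations, which this sketch leaves out.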
Run it
Stable Diffusion 3.5 Large runs in ComfyUI, Draw Things, or diffusers. Load the Q4 GGUF backbone together with its text encoders and VAE; unlike a text LLM, there is no single chat-style command.
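For the diffusers route, the loading steps might look like the sketch below. It assumes `torch`, `diffusers`, `accelerate`, and `gguf` are installed and that you have accepted the license on the gated base repository. The GGUF file name and the base repository id are assumptions, not taken from this page, and the helper names `load_pipeline` and `generate` are hypothetical.

```python
# Assumed locations; neither the file name nor the repo id appears on this page.
GGUF_URL = (
    "https://huggingface.co/city96/stable-diffusion-3.5-large-gguf"
    "/blob/main/sd3.5_large-Q4_0.gguf"
)
BASE_REPO = "stabilityai/stable-diffusion-3.5-large"


def load_pipeline(gguf_path: str = GGUF_URL, base_repo: str = BASE_REPO):
    """Build an SD3.5 Large pipeline around a GGUF-quantized backbone."""
    # Imported lazily so this module can be imported without the heavy deps.
    import torch
    from diffusers import (
        GGUFQuantizationConfig,
        SD3Transformer2DModel,
        StableDiffusion3Pipeline,
    )

    # Load only the quantized backbone from the single GGUF file.
    transformer = SD3Transformer2DModel.from_single_file(
        gguf_path,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    )
    # Text encoders and VAE come from the base repo; the backbone is swapped in.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        base_repo, transformer=transformer, torch_dtype=torch.bfloat16
    )
    # Keep each sub-model on the GPU only while it is in use.
    pipe.enable_model_cpu_offload()
    return pipe


def generate(pipe, prompt: str, steps: int = 28):
    """Render one 1024x1024 image; 28 is the low end of the quoted step range."""
    result = pipe(prompt, num_inference_steps=steps, height=1024, width=1024)
    return result.images[0]


# Usage (downloads several GB on first run):
#   pipe = load_pipeline()
#   generate(pipe, "a lighthouse at dusk").save("out.png")
```

ComfyUI and Draw Things wrap the same steps in a graph or GUI, so no script is needed there.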
Which devices can run Stable Diffusion 3.5 Large?
Apple Silicon Macs
- Apple M1 (8GB): No
- Apple M2 (16GB): Yes
- Apple M4 (16GB): Yes
- Apple M5 (16GB): Yes
- Apple M3 Pro (18GB): Yes
- Apple M4 (24GB): Yes
- Apple M4 Pro (24GB): Yes
- Apple M5 (32GB): Yes
- Apple M4 Pro (48GB): Yes
- Apple M5 Pro (48GB): Yes
- Apple M4 Max (64GB): Yes
- Apple M4 Max (128GB): Yes
- Apple M5 Max (128GB): Yes
- Apple M3 Ultra (256GB): Yes
RAM-only laptops
There is no mainstream local runtime for an 8.1B image model on RAM-only laptops yet.
iPhone & iPad
Android
There is no mainstream local runtime for an 8.1B image model on Android yet.
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM does Stable Diffusion 3.5 Large need?
At Q4 GGUF with T5-XXL offloaded, the realistic peak is ~7 GB, versus ~19 GB for the full-fp16 baseline with every component resident. With aggressive CPU offload the peak drops to ~5 GB, at the cost of much slower generation.
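The three figures in this answer can be read as rough tiers. The sketch below maps a discrete GPU's VRAM to the least restrictive tier it clears; the thresholds are the approximate values quoted on this page, and `pick_mode` is a hypothetical helper, not part of any runtime. It does not apply to unified-memory devices, where the OS shares the same pool.

```python
# Approximate peak-VRAM thresholds in GB, taken from this page.
FP16_ALL_RESIDENT_GB = 19
Q4_T5_OFFLOADED_GB = 7
Q4_AGGRESSIVE_OFFLOAD_GB = 5


def pick_mode(vram_gb: float) -> str:
    """Return the least restrictive run mode that fits in vram_gb."""
    if vram_gb >= FP16_ALL_RESIDENT_GB:
        return "fp16, every component resident"
    if vram_gb >= Q4_T5_OFFLOADED_GB:
        return "Q4 GGUF, T5-XXL offloaded"
    if vram_gb >= Q4_AGGRESSIVE_OFFLOAD_GB:
        return "Q4 GGUF, aggressive CPU offload (much slower)"
    return "not enough VRAM"


for vram in (24, 12, 6, 4):
    print(f"{vram:>3} GB -> {pick_mode(vram)}")
```

A 12 GB card such as the RTX 3060 named at the top of the page falls in the middle tier.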
Why is peak VRAM lower than the sum of the files?
The T5-XXL text encoder runs once to encode your prompt and is then offloaded to CPU before the denoising steps, so it is not resident at the memory peak.
Can I use Stable Diffusion 3.5 Large commercially?
Conditionally. Free for commercial use under $1M annual revenue; an enterprise license is required above that.
Stable Diffusion 3.5 Large is an 8.1B MMDiT with three text encoders (CLIP-L, OpenCLIP-G, T5-XXL). The 9.8GB T5-XXL is offloaded to CPU after prompt encoding, so peak VRAM tracks the backbone rather than the sum of all files. At Q4 GGUF (4.77GB backbone) with T5 offloaded, peak is ~7GB; this figure is synthesized from the city96 component sizes and the diffusers offload behavior, not taken from a single measurement. Stability's full-fp16 baseline is 19GB (11GB with TensorRT fp8). Sources: Stability SD3.5 announcement, city96 SD3.5 GGUF repo, Stability TensorRT note, diffusers SD3 docs.
Sources
The VRAM figure is a sourced peak-usage estimate at Q4 GGUF (composed from component sizes, not a single measurement), validated 2026-06-15. See methodology.