Image model · DiT
FLUX.1 schnell requirements
DiT image model · 12B params · 1024×1024 · 1-4 steps · released Aug 2024. Realistic minimum to run: NVIDIA GeForce RTX 3060 (12GB) at Q4 GGUF.
Apache-2.0; the only FLUX.1 variant licensed for commercial products.
Backbone size by precision
| Precision | Size |
|---|---|
| fp16 / bf16 | 23.8 GB |
| fp8 | 11.9 GB |
| Q8 GGUF | 12.7 GB |
| Q4 GGUF (recommended) | 6.78 GB |
| Q2 GGUF | 4.01 GB |
Sizes are for the backbone weights only. The device verdicts use peak VRAM consumed at Q4 GGUF, not the file size.
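As a rough sanity check on the table above, each file size can be converted into effective bits per weight. This is a minimal sketch that assumes the nominal 12B parameter count from the header and decimal gigabytes (10^9 bytes); the names `PARAMS`, `BACKBONE_GB` and `bits_per_weight` are illustrative only.

```python
# Effective bits per weight implied by each file size in the table above.
# Assumptions: nominal 12e9 parameters, and GB meaning 1e9 bytes.
PARAMS = 12e9

BACKBONE_GB = {
    "fp16 / bf16": 23.8,
    "fp8": 11.9,
    "Q8 GGUF": 12.7,
    "Q4 GGUF": 6.78,
    "Q2 GGUF": 4.01,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB into average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in BACKBONE_GB.items():
    print(f"{name:12s} {bits_per_weight(size_gb):5.2f} bits/weight")
```

The GGUF rows come out somewhat above their nominal bit width. That is expected for block-quantized formats, which store scale factors alongside the quantized weights.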
Pipeline components
| Component | Size |
|---|---|
| CLIP-L text encoder | 0.25 GB |
| T5-XXL text encoder (offloaded) | 2.9 GB |
| VAE | 0.34 GB |
Encoders marked “offloaded” move to CPU before denoising, so they do not count toward peak VRAM.
Run it
FLUX.1 schnell runs in ComfyUI, Draw Things or diffusers. Load the Q4 GGUF backbone together with the text encoders and VAE; unlike a text LLM, there is no single chat command to run.
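For the diffusers route, loading can be sketched roughly as follows. This assumes a CUDA GPU, a recent diffusers release with GGUF support and the `gguf` package installed. The exact GGUF file name in the city96 repo is an assumption, and the constants and the `generate` helper are hypothetical names.

```python
# Sketch: run FLUX.1 schnell in diffusers with a Q4 GGUF backbone.
# Assumptions: CUDA GPU, recent `diffusers` with GGUF support plus the
# `gguf` package, and that the file name below exists in the city96 repo.
GGUF_URL = (
    "https://huggingface.co/city96/FLUX.1-schnell-gguf/"
    "blob/main/flux1-schnell-Q4_K_S.gguf"
)
BASE_REPO = "black-forest-labs/FLUX.1-schnell"  # supplies text encoders and VAE
STEPS = 4  # schnell is distilled for 1-4 step generation


def generate(prompt: str, out_path: str = "flux-schnell.png") -> str:
    # Imported lazily so the heavy dependencies load only when generating.
    import torch
    from diffusers import (
        FluxPipeline,
        FluxTransformer2DModel,
        GGUFQuantizationConfig,
    )

    # Load only the quantized backbone from the single GGUF file.
    transformer = FluxTransformer2DModel.from_single_file(
        GGUF_URL,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    )
    # Remaining components (CLIP-L, T5-XXL, VAE) come from the base repo.
    pipe = FluxPipeline.from_pretrained(
        BASE_REPO, transformer=transformer, torch_dtype=torch.bfloat16
    )
    # Keeps each component on the CPU until it is needed on the GPU.
    pipe.enable_model_cpu_offload()

    image = pipe(
        prompt,
        height=1024,
        width=1024,
        num_inference_steps=STEPS,
        guidance_scale=0.0,  # value used in the schnell model card example
        max_sequence_length=256,
    ).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a lighthouse at dusk")` would download the weights on first use and write a PNG. This path pulls the text encoders from the base repo in bf16, so its memory use may differ from the component sizes listed on this page.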
Which devices can run FLUX.1 schnell?
Apple Silicon Macs
- Apple M1 (8GB) No
- Apple M2 (16GB) Yes
- Apple M4 (16GB) Yes
- Apple M5 (16GB) Yes
- Apple M3 Pro (18GB) Yes
- Apple M4 (24GB) Yes
- Apple M4 Pro (24GB) Yes
- Apple M5 (32GB) Yes
- Apple M4 Pro (48GB) Yes
- Apple M5 Pro (48GB) Yes
- Apple M4 Max (64GB) Yes
- Apple M4 Max (128GB) Yes
- Apple M5 Max (128GB) Yes
- Apple M3 Ultra (256GB) Yes
RAM-only laptops
No mainstream local runtime for a 12B image model on RAM-only laptops yet.
iPhone & iPad
Android
No mainstream local runtime for a 12B image model on Android yet.
NVIDIA GPUs
AMD GPUs
FAQ
How much VRAM does FLUX.1 schnell need?
At Q4 GGUF the realistic peak is ~6.5 GB, versus ~33 GB with every component resident in bf16. With aggressive CPU offload the peak drops to ~3 GB, but generation is much slower.
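The three figures in this answer can be read as a simple decision ladder. The sketch below uses them as bare thresholds with no safety headroom; `MODES` and `pick_mode` are hypothetical names, not part of any tool.

```python
# Pick the least constrained configuration that fits a VRAM budget.
# Thresholds are the approximate peak-VRAM figures quoted in the answer above;
# no extra headroom is added, which is an assumption of this sketch.
from typing import Optional

MODES = [
    (33.0, "bf16, every component resident"),
    (6.5, "Q4 GGUF, text encoder offloaded"),
    (3.0, "Q4 GGUF, aggressive CPU offload (much slower)"),
]


def pick_mode(free_vram_gb: float) -> Optional[str]:
    """Return the first mode whose peak fits, or None if nothing fits."""
    for peak_gb, label in MODES:
        if free_vram_gb >= peak_gb:
            return label
    return None


print(pick_mode(12.0))  # e.g. a 12 GB card
```

The input is free memory rather than nominal capacity, which matters most on devices where memory is shared with the operating system.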
Why is peak VRAM lower than the sum of the files?
The text encoder is run once to encode your prompt, then offloaded to CPU before the denoising steps, so it is not resident at the memory peak.
Can I use FLUX.1 schnell commercially?
Yes. FLUX.1 schnell is licensed Apache-2.0, which permits commercial use.
The same 12B DiT as FLUX.1 dev, distilled to 1-4 step generation and licensed Apache-2.0 (the only commercially usable FLUX.1 variant). T5-XXL is offloaded after prompt encoding, so the Q4 GGUF backbone peaks at ~6.5 GB during denoising. All-resident bf16 is ~33 GB. Sources: BFL schnell model card, city96 FLUX.1-schnell GGUF repo, diffusers memory docs.
Sources
VRAM is a sourced peak-usage anchor at Q4 GGUF (composed from component sizes, not a single measurement), validated 2026-06-15. See methodology.