Image model · DiT

FLUX.1 dev requirements

DiT image model · 12B params · 1024×1024 · 20-50 steps · released Aug 2024. Realistic minimum to run: NVIDIA GeForce RTX 3060 (12GB) at Q4 GGUF.

License: FLUX.1-dev Non-Commercial License

Weights are non-commercial. Generated images may be used commercially, but using the model inside a product needs a separate Black Forest Labs license.

Peak VRAM (Q4 GGUF): ~6.5 GB
All resident: ~33 GB
Offload floor: ~3 GB
Resolution: 1024×1024

Backbone size by precision

Precision	Size on disk
fp16 / bf16	23.8 GB
fp8	11.9 GB
Q8 GGUF	12.7 GB
Q4 GGUF (recommended)	6.81 GB
Q2 GGUF	4.03 GB

Sizes cover the backbone weights only. The device verdicts use the peak VRAM consumed at Q4 GGUF, not the file size.
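As a rough cross-check of the sizes in the table, the sketch below derives the average bits stored per weight for each precision. It assumes decimal gigabytes (1 GB = 1e9 bytes) and that the fp16 / bf16 file holds exactly 2 bytes per weight with no other overhead; both are assumptions for illustration, not statements from the model card.

```python
# Sizes in GB copied from the precision table (decimal GB assumed).
SIZES_GB = {
    "fp16 / bf16": 23.8,
    "fp8": 11.9,
    "Q8 GGUF": 12.7,
    "Q4 GGUF": 6.81,
    "Q2 GGUF": 4.03,
}


def implied_params(fp16_size_gb: float) -> float:
    """Parameter count implied by a 16-bit file (assumes 2 bytes per weight)."""
    return fp16_size_gb * 1e9 / 2


def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Average bits stored per weight, including any per-block metadata."""
    return size_gb * 1e9 * 8 / n_params


n_params = implied_params(SIZES_GB["fp16 / bf16"])
print(f"implied parameters: {n_params / 1e9:.1f}B")
for name, size in SIZES_GB.items():
    print(f"{name:12s} {bits_per_weight(size, n_params):5.2f} bits/weight")
```

Under these assumptions the GGUF files come out slightly above their nominal bit widths, which is what one would expect if scales and other metadata are stored alongside the quantized weights.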

Pipeline components

Component	Size
CLIP-L text encoder	0.25 GB
T5-XXL text encoder (offloaded)	2.9 GB
VAE	0.34 GB

Encoders marked “offloaded” move to CPU before denoising, so they do not count toward peak VRAM.
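The accounting rule above can be sketched as a small calculation: only components that stay on the GPU during denoising count toward the peak. The figures are weight file sizes taken from the tables on this page (backbone at Q4 GGUF), used here as a rough proxy for resident memory. The sketch ignores activations and runtime overhead, so it illustrates the rule and is not expected to reproduce the measured ~6.5 GB figure.

```python
# (component, size in GB, offloaded before denoising?)
# Sizes are file sizes from this page; the "offloaded" flag mirrors the table.
COMPONENTS = [
    ("backbone (Q4 GGUF)", 6.81, False),
    ("CLIP-L text encoder", 0.25, False),
    ("T5-XXL text encoder", 2.9, True),
    ("VAE", 0.34, False),
]


def sum_of_files(components) -> float:
    """Total size if every component stayed resident at once."""
    return sum(size for _, size, _ in components)


def resident_at_denoise(components) -> float:
    """Weights still on the GPU during denoising (offloaded ones excluded)."""
    return sum(size for _, size, offloaded in components if not offloaded)


total = sum_of_files(COMPONENTS)
resident = resident_at_denoise(COMPONENTS)
print(f"sum of files:        {total:.2f} GB")
print(f"resident at denoise: {resident:.2f} GB")
print(f"saved by offloading: {total - resident:.2f} GB")
```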

Run it

FLUX.1 dev runs in ComfyUI, Draw Things or diffusers. Load the Q4 GGUF backbone with its text encoder and VAE; there is no single chat command like a text LLM.
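For the diffusers route, a minimal sketch might look like the following. It follows the GGUF loading pattern from the diffusers quantization docs and needs the `torch`, `diffusers` and `gguf` packages. The GGUF file name is an assumption about the city96 repo layout and should be checked there; `build_pipeline` and `generate` are hypothetical helper names, and the step count of 28 is just one value inside the 20-50 range quoted above. Access to the base `black-forest-labs/FLUX.1-dev` repo may require accepting the license on Hugging Face first.

```python
# Assumed file name; verify it against the city96/FLUX.1-dev-gguf repo.
GGUF_URL = (
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/"
    "flux1-dev-Q4_K_S.gguf"
)
BASE_REPO = "black-forest-labs/FLUX.1-dev"
STEPS = 28  # illustrative choice within the 20-50 range


def build_pipeline(gguf_url: str = GGUF_URL):
    """Load the Q4 GGUF backbone plus the text encoders and VAE."""
    # Heavy imports are deferred so importing this module stays cheap.
    import torch
    from diffusers import (
        FluxPipeline,
        FluxTransformer2DModel,
        GGUFQuantizationConfig,
    )

    transformer = FluxTransformer2DModel.from_single_file(
        gguf_url,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    )
    pipe = FluxPipeline.from_pretrained(
        BASE_REPO, transformer=transformer, torch_dtype=torch.bfloat16
    )
    # Moves each component to the GPU only while it is in use.
    pipe.enable_model_cpu_offload()
    return pipe


def generate(prompt: str, out_path: str = "flux.png") -> str:
    """Render one 1024x1024 image and save it; downloads several GB on first run."""
    pipe = build_pipeline()
    image = pipe(
        prompt, height=1024, width=1024, num_inference_steps=STEPS
    ).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a lighthouse at dusk")` would then write `flux.png` to the working directory, given a supported GPU and enough disk space for the downloads.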


Which devices can run FLUX.1 dev?

Apple Silicon Macs

RAM-only laptops

No mainstream local runtime for a 12B image model on RAM-only laptops yet.

iPhone & iPad

Android

No mainstream local runtime for a 12B image model on Android yet.

NVIDIA GPUs

AMD GPUs

AMD Radeon RX 7900 XTX (24GB) Yes

FAQ

How much VRAM does FLUX.1 dev need?

At Q4 GGUF the realistic peak is ~6.5 GB, versus ~33 GB with every component resident. Aggressive CPU offload lowers it to ~3 GB, at the cost of much slower generation.
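The three figures in this answer can be read as rough thresholds. The sketch below maps a card's VRAM to the lightest setup that fits, treating each figure as a hard cutoff with no headroom; that is a simplifying assumption, and `pick_mode` is a hypothetical helper rather than part of any runtime. The page's own realistic minimum is a 12GB card, so practical headroom matters.

```python
# Thresholds in GB taken from the answer above.
ALL_RESIDENT_GB = 33.0
Q4_PEAK_GB = 6.5
OFFLOAD_FLOOR_GB = 3.0


def pick_mode(vram_gb: float) -> str:
    """Return the least restrictive setup whose threshold fits in vram_gb."""
    if vram_gb >= ALL_RESIDENT_GB:
        return "all components resident"
    if vram_gb >= Q4_PEAK_GB:
        return "Q4 GGUF with text encoder offloaded"
    if vram_gb >= OFFLOAD_FLOOR_GB:
        return "aggressive CPU offload (much slower)"
    return "below the offload floor"


for vram in (48, 12, 8, 4, 2):
    print(f"{vram:>2} GB -> {pick_mode(vram)}")
```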

Why is peak VRAM lower than the sum of the files?

The text encoder is run once to encode your prompt, then offloaded to CPU before the denoising steps, so it is not resident at the memory peak.

Can I use FLUX.1 dev commercially?

Not the model itself. The weights are non-commercial. Generated images may be used commercially, but using the model inside a product needs a separate Black Forest Labs license.

FLUX.1 dev is a 12B DiT with a CLIP-L and a 9.8 GB T5-XXL text encoder. T5 is offloaded to CPU after prompt encoding, so the Q4 GGUF backbone (6.8 GB) drives a denoising peak of ~6.5 GB, measured at 6.4 GB on an RTX 2080 8GB in Forge. With all components resident at bf16, usage is ~33 GB. Sequential CPU offload runs in ~3 GB, very slowly. The license is non-commercial. Sources: BFL model card, city96 FLUX.1-dev GGUF repo and 8GB discussion, diffusers memory docs.

Sources

VRAM is a sourced peak-usage anchor at Q4 GGUF, validated 2026-06-15. See methodology.