I Want to Generate AI Images: Where’s the Sweet Spot for Value?
I’ve installed ComfyUI — an app that makes AI image generation easy to run on a PC — and I’ve been having fun making AI images. Blog thumbnails, social-media assets, visualizing an idea — cloud services are fine too, but the appeal of local AI is that a quick, casual generation is always within reach.
In this article I’ve broken down, by budget, “what kind of images you can make, and how fast.”
* The specs and prices in this article are as of May 2026.
- 1. The upside of generating images at home
- 2. The tool we’ll use: ComfyUI
- 3. By budget: what your GPU can make
- 4. Value graph by GPU
- 5. How to choose
- 5.1. Case 1: For a hobby (tens to hundreds of images a month)
- 5.2. Case 2: Practical use for a blog or social media (tens of images a week)
- 5.3. Case 3: Commercial use, high-volume generation (hundreds a day and up)
- 5.4. Case 4: Wanting to dabble in AI video generation too
- 5.5. Case 5: Wanting to do a local LLM (Ollama) too
- 6. Choose by “what you want to make”
- 7. Summary: image generation has a low barrier to “just trying it”
Measured on the author’s machine (RTX 3090 24GB)
| Model | Resolution | Steps | Generation time |
|---|---|---|---|
| SD 1.5 | 512×512 | 20 | 8.0s |
| SDXL (Animagine) | 1024×1024 | 20 | 26.0s |
Test setup: RTX 3090 (24GB) / ComfyUI / Linux / measured May 2026
The upside of generating images at home
| Cloud (Midjourney, etc.) | Local (ComfyUI, etc.) |
|---|---|
| Around ¥1,500–6,000/month (depending on plan) | Upfront cost only |
| Limits on how many images you can make | Unlimited |
| The service decides which models you get | Use any model or LoRA you like |
| Your prompts are sent to a server | Fully local |
| Commercial use may be restricted | Commercial use allowed, subject to the model’s license |
The tool we’ll use: ComfyUI
As of 2026, the most widely used tool for local image generation is ComfyUI.
- A node-based workflow, so you can see the flow of processing visually
- Supports the major models — Stable Diffusion, FLUX, SDXL, and more
- A rich ecosystem of extensions: ControlNet, LoRA, upscalers, and so on
- An NVIDIA GPU + CUDA is the most stable (AMD ROCm is partially supported)
To install, just download and unpack it from the official ComfyUI site. No Python knowledge required.
Processing a 512×512 image pixel by pixel would mean computing over roughly 260,000 pixels. A latent diffusion model (LDM), the approach Stable Diffusion is built on, instead first compresses the image into a 64×64 latent space with a VAE (Variational Autoencoder) and does the denoising there. Each side shrinks 8x, so there are about 1/64 as many positions to compute over as in pixel space. That’s why it runs at practical speeds even on a local GPU.
The flow looks like this.
1. Encode the prompt into vectors with a text encoder (CLIP; FLUX also uses T5)
2. Repeatedly denoise in latent space (U-Net / DiT)
3. Restore a pixel image from the latent space with the VAE decoder
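For readers who want to script it, here is how those three steps map onto ComfyUI’s stock text-to-image graph, written in its API (JSON) format and queued over HTTP. This is a minimal sketch assuming a ComfyUI server running locally on its default port 8188; `build_workflow` and `queue_prompt` are hypothetical helper names, and the checkpoint file name and prompts are placeholders you would swap for your own.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # assumption: local ComfyUI on its default port


def build_workflow(prompt: str, negative: str = "", ckpt: str = "model.safetensors",
                   width: int = 512, height: int = 512, steps: int = 20, seed: int = 0) -> dict:
    """Text-to-image graph in ComfyUI's API format. Links are [node_id, output_index]."""
    return {
        # Loads the checkpoint (placeholder name); outputs 0=MODEL, 1=CLIP, 2=VAE
        "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt}},
        # Step 1: encode the positive and negative prompts with the checkpoint's CLIP
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["4", 1]}},
        # Empty latent; ComfyUI allocates it at 1/8 of the pixel width and height
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        # Step 2: denoise in latent space
        "3": {"class_type": "KSampler", "inputs": {
            "seed": seed, "steps": steps, "cfg": 7.0, "sampler_name": "euler",
            "scheduler": "normal", "denoise": 1.0, "model": ["4", 0],
            "positive": ["6", 0], "negative": ["7", 0], "latent_image": ["5", 0]}},
        # Step 3: decode the latent back to pixels with the VAE, then save the image
        "8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}},
    }


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST the graph to ComfyUI's /prompt endpoint and return its JSON reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server running, `queue_prompt(build_workflow("a lighthouse at dusk, watercolor"))` would queue one image; the graph is the same one you see as nodes in the GUI, just serialized.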
The step that uses the most VRAM is the denoising in step 2. The latent grows in proportion to the pixel count, so at 1024×1024 (SDXL’s standard) it has 4x as many positions as at 512×512, and the VRAM used around the latent space grows by roughly 4x as well.
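The arithmetic behind the 1/64 and 4x figures is short enough to check directly. A sketch assuming a VAE that shrinks each side by 8, as the SD-family VAEs do; the channel counts (4 for SD 1.5/SDXL, 16 for SD 3.5/FLUX) are the usual values for those latents.

```python
def latent_positions(width: int, height: int, downsample: int = 8) -> int:
    """Spatial positions in the latent, assuming the VAE shrinks each side by `downsample`."""
    return (width // downsample) * (height // downsample)


def latent_bytes(width: int, height: int, channels: int = 4, bytes_per_value: int = 2) -> int:
    """Size of one FP16 latent tensor (4 channels for SD 1.5/SDXL, 16 for SD 3.5/FLUX)."""
    return channels * latent_positions(width, height) * bytes_per_value


pixels = 512 * 512                      # 262,144 pixel positions
positions = latent_positions(512, 512)  # 64 x 64 = 4,096 latent positions
print(pixels // positions)              # 64: the denoiser works on 1/64 as many positions

# Going from 512x512 to 1024x1024 quadruples the number of latent positions
print(latent_positions(1024, 1024) / latent_positions(512, 512))  # 4.0

# The latent itself is tiny (128 KiB at 1024x1024); the VRAM goes to the
# denoiser's intermediate activations, which grow with the number of positions
print(latent_bytes(1024, 1024))         # 131072
```

The last line is the reason the VRAM cost is described as being "around" the latent space: the tensor is small, but every layer of the U-Net or DiT keeps activations sized to it.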
By budget: what your GPU can make
¥60–70k range (RTX 5060 8GB / RTX 5060 Ti 8GB)
What 8GB can do:
| Model | Can it run? | Rough time per image | Quality |
|---|---|---|---|
| FLUX.1 Schnell (FP8) | ◎ | 10–20s | High. Good at rendering text, too |
| SD 1.5 | ◎ | 3–8s | The staple. Tons of LoRAs |
| SDXL | △ | 30–60s | Runs, but slow. Combining with LoRA is rough |
| FLUX.1 Dev | × | Not enough VRAM | — |
What you can do:
- Easily generate high-quality images with FLUX Schnell
- Freely tweak styles — anime, photorealistic, and more — with SD 1.5 + LoRA
- Create blog thumbnail images
- Mass-produce images for social posts
What you can’t do:
- Complex SDXL workflows (ControlNet + LoRA at the same time)
- High-quality FLUX Dev generation
- Direct high-resolution (2K+) generation
Model-size example for FLUX.1 (12 billion parameters):
BF16 (standard distribution): ~24GB → FP8: ~12GB → GGUF 4-bit: ~6–7GB
FP8 loses some precision, but in image generation the difference is usually hard to spot by eye. FLUX Schnell runs on just 8GB of VRAM thanks to this quantization plus ComfyUI’s automatic offloading.
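The sizes above come from parameters × bits per weight. A minimal sketch: the 4.5 bits for GGUF’s Q4_0 type (32 four-bit weights plus one 16-bit scale per block) is an assumption picked as a representative 4-bit GGUF format, which is why "4-bit" lands at roughly 6–7GB rather than exactly 6GB. The count covers only FLUX.1’s 12B-parameter transformer, not its text encoders or VAE.

```python
BITS_PER_WEIGHT = {
    "BF16": 16,
    "FP8": 8,
    # GGUF Q4_0: 32 four-bit weights + one FP16 scale per block = 18 bytes / 32 weights
    "GGUF Q4_0": 4.5,
}


def weights_gb(n_params: float, fmt: str) -> float:
    """Weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


FLUX1_PARAMS = 12e9  # the transformer only; text encoders, VAE and activations come on top
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:>10}: {weights_gb(FLUX1_PARAMS, fmt):5.2f} GB")  # 24.00, 12.00, 6.75
```

Even at FP8 the weights alone exceed 8GB, which is where the offloading mentioned above comes in.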
FLUX Schnell running on 8GB is revolutionary. If you just want to try AI image generation, it’s plenty. But it’s not enough to unlock SDXL’s full potential.
Images per ¥10k (based on FLUX Schnell): Unlimited (upfront cost only, so cost-efficiency improves the more you use it)
Value: ★★★☆☆ (fine as a taster)
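To make "cost-efficiency improves the more you use it" concrete, here is a small sketch of the upfront cost spread per image and how long a cloud subscription takes to cost as much as the card. The ¥65,000 card price and ¥3,000/month plan are hypothetical values picked from the ranges in this article, and electricity is ignored.

```python
import math


def cost_per_image(upfront_yen: float, images: int) -> float:
    """Upfront hardware cost spread over every image generated (electricity ignored)."""
    return upfront_yen / images


def break_even_months(gpu_price_yen: float, monthly_fee_yen: float) -> int:
    """Whole months of a cloud subscription needed to spend as much as the GPU cost."""
    return math.ceil(gpu_price_yen / monthly_fee_yen)


GPU_PRICE = 65_000  # hypothetical: an 8GB card in the ¥60–70k range
CLOUD_FEE = 3_000   # hypothetical: a plan within the ¥1,500–6,000/month range

for n in (100, 1_000, 10_000):
    print(f"{n:>6} images: ¥{cost_per_image(GPU_PRICE, n):,.1f} per image")
print("break-even vs. cloud:", break_even_months(GPU_PRICE, CLOUD_FEE), "months")  # 22
```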
¥100k range (RTX 5070 12GB)
What 12GB can do:
| Model | Can it run? | Rough time per image | Quality |
|---|---|---|---|
| FLUX.1 Schnell (FP8) | ◎ | 5–10s | Fast |
| SD 1.5 | ◎ | 2–5s | Comfortable |
| SDXL | ◎ | 10–20s | Comfortable. Can combine LoRA too |
| SDXL + ControlNet | ○ | 15–30s | Lets you specify composition |
| FLUX.1 Dev | △ | Runs, but barely | FP8 required |
What you can do:
- SDXL runs comfortably → you can reliably produce high-quality images
- Generate images with a specified composition or pose using ControlNet
- Finely control style with LoRA
- Batch processing (continuous generation) is practical too
LoRA (Low-Rank Adaptation) is a technique that adds a low-rank “difference” (the product of two small matrices) to the model’s weight matrices. Without directly touching the huge original matrix, you can change the style with a small adapter of a few million to a few tens of millions of parameters (about 1% of the original or less).
A LoRA file is usually around 10–200MB. On VRAM it only adds a few hundred MB on top of the base model, so with 12GB you can use an SDXL base model + multiple LoRAs at once.
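The low-rank trick can be shown in a few lines. A sketch in plain Python: the alpha/rank scaling follows the convention from the LoRA paper that common trainers use, the 1280 × 1280 layer is an illustrative width, and `merge_lora` is a hypothetical helper name.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA pair: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def matmul(x, y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, b, a, alpha: float, rank: int):
    """Effective weight W' = W + (alpha / rank) * (B @ A); the original W is not modified."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Illustrative 1280 x 1280 layer: the full matrix vs. a rank-8 LoRA
full = 1280 * 1280
lora = lora_param_count(1280, 1280, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 1638400 20480 1.25%

# Tiny worked example: 2x2 weight with a rank-1 update
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # 2 x 1
A = [[3.0, 4.0]]    # 1 x 2
print(merge_lora(W, B, A, alpha=1.0, rank=1))  # [[4.0, 4.0], [6.0, 9.0]]
```

Storing only B and A is what keeps the file small; at FP16, 50 million LoRA parameters come to about 100MB, inside the 10–200MB range above.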
The sweet spot for image generation. 12GB is the minimum for running SDXL comfortably. From here you start to feel “I can make what I want to make.”
Value: ★★★★☆ (the best balance if image generation is your main use)
¥90–100k range, 16GB (RTX 5060 Ti 16GB / RX 9070)
What 16GB can do:
| Model | Can it run? | Rough time per image | Quality |
|---|---|---|---|
| SDXL + ControlNet + LoRA | ◎ | 15–25s | Complex workflows OK |
| FLUX.1 Dev | ○ | 30–60s | Runs. Top-class quality |
| SD 3.5 | ◎ | 15–25s | A new-generation model |
| High-res upscaling | ◎ | +10–30s | Up to 2K–4K |
VRAM needed ≈ the model itself (for FP16: number of parameters × 2 bytes)
+ latent-space buffers (proportional to resolution)
+ additional modules like LoRA / ControlNet
+ the peak during VAE decode
Worked example — generating 1024×1024 with SDXL:
U-Net itself: ~5.1GB (FP16) + CLIP: ~1.3GB + VAE: ~0.3GB + latent-space buffer: ~2GB
= about 8.7GB total (minimal configuration, no LoRA or ControlNet)
Adding ControlNet adds +1.5–2.5GB, and one LoRA adds +0.1–0.3GB. At 12GB, one ControlNet is the limit, but at 16GB you get room to use ControlNet + multiple LoRAs at the same time.
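The formula and worked example can be turned into a quick budget check. A rough sketch using only the figures quoted in this article; it leaves out the VAE-decode peak from the formula’s last line, and in practice the OS, driver, and ComfyUI itself need some headroom on top.

```python
SDXL_BASE_GB = {"unet_fp16": 5.1, "clip": 1.3, "vae": 0.3, "latent_buffers_1024px": 2.0}
CONTROLNET_GB = (1.5, 2.5)  # (low, high) per ControlNet
LORA_GB = (0.1, 0.3)        # (low, high) per LoRA


def estimate_vram_gb(controlnets: int = 0, loras: int = 0, worst_case: bool = True) -> float:
    """Rough SDXL 1024x1024 VRAM estimate; ignores the VAE-decode peak and driver overhead."""
    i = 1 if worst_case else 0
    total = sum(SDXL_BASE_GB.values()) + controlnets * CONTROLNET_GB[i] + loras * LORA_GB[i]
    return round(total, 2)


def fits(budget_gb: float, **setup) -> bool:
    """True if the worst-case estimate for `setup` stays within `budget_gb`."""
    return estimate_vram_gb(**setup) <= budget_gb


for setup in ({}, {"controlnets": 1}, {"controlnets": 1, "loras": 2},
              {"controlnets": 2, "loras": 3}):
    print(setup, estimate_vram_gb(**setup), "GB",
          "| 12GB:", fits(12, **setup), "| 16GB:", fits(16, **setup))
```

With worst-case add-ons, one ControlNet plus a couple of LoRAs (11.8GB) just squeezes into 12GB, while a second ControlNet pushes the estimate past 12GB and needs a 16GB card.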
RTX 5060 Ti 16GB vs RTX 5070 12GB:
| Comparison | RTX 5060 Ti 16GB (¥90k) | RTX 5070 12GB (¥100k) |
|---|---|---|
| VRAM | 16GB | 12GB |
| SDXL speed | A bit slow (128-bit bus) | Fast |
| FLUX Dev | Runs | Barely |
| Complex workflows | Comfortable | Barely |
More VRAM headroom, or more speed? If you want to try lots of models or build complex workflows, go 16GB of VRAM; if you want to simply generate fast and in bulk, 12GB suits you better.
The RX 9070 has the cheapest price per GB of VRAM, but its ComfyUI compatibility falls well short of NVIDIA’s. It can be unstable on Windows in places, and some custom nodes won’t work. For image generation, NVIDIA is recommended.
Value: ★★★★★ (the cheapest tier per GB of VRAM)
¥160k range (RTX 5070 Ti 16GB)
Same VRAM as the 5060 Ti 16GB, but its more powerful GPU generates images 1.5–2x as fast.
| Comparison | RTX 5060 Ti 16GB | RTX 5070 Ti 16GB |
|---|---|---|
| SDXL, one image | 15–25s | 8–15s |
| FLUX Dev, one image | 30–60s | 20–35s |
For people who generate in bulk, or who iterate on workflows a lot, the speed difference starts to matter. It also holds up well if you double it up for VR or a local LLM.
Value: ★★★★☆ (ideal if you’re planning to double up on uses)
¥180k and up (RX 7900 XTX 24GB / RTX 5090 32GB)
What 24GB and above can do:
| Model | 24GB | 32GB |
|---|---|---|
| FLUX.1 Dev | ◎ Comfortable | ◎ With room to spare |
| SDXL complex workflows | ◎ | ◎ |
| Video generation (Wan 2.1, etc.) | △ Offloading needed | ○ |
| Ultra-high resolution (4K+) | ◎ | ◎ |
Video generation is still tough on a consumer GPU, but with 24GB there’s almost nothing you can’t do.
Value graph by GPU
AI image generation performance vs price
How to read this graph: The horizontal axis is price (in ¥10k), the vertical axis is an overall AI-image-generation performance score. The closer to the top-left, the better the value. Point size represents VRAM capacity.
| GPU | Price (¥10k) | Image-gen score | VRAM | Notes |
|---|---|---|---|---|
| RTX 5060 Ti 8GB | 7 | 35 | 8GB | |
| RTX 5060 | 6 | 30 | 8GB | |
| RX 9070 | 8 | 40 | 16GB | * AMD = iffy ComfyUI compatibility |
| RTX 5060 Ti 16GB | 9 | 55 | 16GB | |
| RTX 5070 | 10 | 65 | 12GB | ★ The image-generation sweet spot |
| RTX 5070 Ti | 16 | 80 | 16GB | |
| RX 9070 XT | 9 | 45 | 16GB | * AMD |
| RX 7900 XTX | 18 | 85 | 24GB | * Linux recommended |
| RTX 5080 | 20 | 90 | 16GB | |
| RTX 5090 | 40 | 98 | 32GB | |
* How the image-generation score is calculated:
- SDXL generation speed: 40%
- Range of supported models (VRAM-dependent): 35%
- Capacity for complex workflows: 25%
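The weighting above amounts to a weighted sum. The article doesn’t say how each criterion is scored, so this sketch assumes hypothetical 0–100 sub-scores; only the 40/35/25 weights come from the text.

```python
WEIGHTS = {"sdxl_speed": 0.40, "model_range": 0.35, "workflow_capacity": 0.25}


def image_gen_score(subscores: dict[str, float]) -> float:
    """Weighted overall score from per-criterion sub-scores on a 0-100 scale."""
    if set(subscores) != set(WEIGHTS):
        raise ValueError(f"expected sub-scores for {sorted(WEIGHTS)}")
    return round(sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS), 1)


# Hypothetical sub-scores for an illustrative card (not the author's data)
print(image_gen_score({"sdxl_speed": 70, "model_range": 60, "workflow_capacity": 60}))  # 64.0
```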
What the graph tells us
- The RTX 5070 (¥100k) is the value king for image generation. SDXL runs comfortably at 12GB, and the speed is plenty
- The RTX 5060 Ti 16GB (¥90k) is for the VRAM-first crowd. It reaches FLUX Dev, but it’s slower than the RTX 5070
- AMD (the RX 9070 line) looks like good value for its score, but the score is discounted by ComfyUI compatibility issues. On Linux it’s effectively a bit higher
- The RTX 5080 and up are for “mass production.” There’s no difference in the quality of a single image, but the speed gap kicks in when generating in bulk
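The "closer to the top-left" reading can be checked against the table by dividing each card’s score by its price. A sketch using the table’s numbers as-is (any AMD discount is whatever the scores already include); it also computes yen per GB of VRAM, the metric behind the 16GB tier’s "cheapest per GB" rating.

```python
# (name, price in ¥10k, image-gen score, VRAM in GB), copied from the table above
GPUS = [
    ("RTX 5060 Ti 8GB", 7, 35, 8),
    ("RTX 5060", 6, 30, 8),
    ("RX 9070", 8, 40, 16),
    ("RTX 5060 Ti 16GB", 9, 55, 16),
    ("RTX 5070", 10, 65, 12),
    ("RTX 5070 Ti", 16, 80, 16),
    ("RX 9070 XT", 9, 45, 16),
    ("RX 7900 XTX", 18, 85, 24),
    ("RTX 5080", 20, 90, 16),
    ("RTX 5090", 40, 98, 32),
]


def by_score_per_price(gpus):
    """Score per ¥10k, best first (ties keep table order)."""
    return sorted(((name, score / price) for name, price, score, _ in gpus),
                  key=lambda item: item[1], reverse=True)


def by_yen_per_gb(gpus):
    """Yen per GB of VRAM, cheapest first."""
    return sorted(((name, price * 10_000 / vram) for name, price, _, vram in gpus),
                  key=lambda item: item[1])


for name, value in by_score_per_price(GPUS)[:3]:
    print(f"{name:<18} {value:.2f} points per ¥10k")
print("Cheapest VRAM:", by_yen_per_gb(GPUS)[0])  # ('RX 9070', 5000.0)
```

The RTX 5070 comes out on top at 6.5 points per ¥10k, with the RTX 5060 Ti 16GB second, matching the first two bullets.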
How to choose
Case 1: For a hobby (tens to hundreds of images a month)
→ RTX 5070 (12GB / ¥100k)
SDXL is comfortable and FLUX Schnell is fast too. You can use LoRA and ControlNet. You’ll fully enjoy the “image generation is fun” side of it. At a few hundred images a month, generation speed won’t be a bottleneck.
Case 2: Practical use for a blog or social media (tens of images a week)
→ RTX 5060 Ti 16GB (¥90k) or RTX 5070 (¥100k)
It comes down to how you view the ¥10k difference. If you want to try lots of models and play with FLUX Dev, go 5060 Ti 16GB. If speed matters and SDXL is your main, go 5070. Either is a good call.
Case 3: Commercial use, high-volume generation (hundreds a day and up)
→ RTX 5070 Ti (16GB / ¥160k)
16GB of VRAM + a fast GPU. Even building complex workflows and running batch jobs, you have headroom. Generation speed is 1.5–2x the 5060 Ti, so at high volume you recoup the price difference.
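A rough way to see how the speed gap adds up at volume is to convert per-image times into hours per day. A sketch using the midpoints of the SDXL single-image times from the 5060 Ti / 5070 Ti comparison table; the 500 images/day workload is a hypothetical stand-in for "hundreds a day".

```python
def hours_per_day(images_per_day: int, seconds_per_image: float) -> float:
    """Pure generation time; ignores model loading and queueing."""
    return images_per_day * seconds_per_image / 3600


# Midpoints of the SDXL single-image times in the comparison table
SDXL_SECONDS = {"RTX 5060 Ti 16GB": (15 + 25) / 2, "RTX 5070 Ti": (8 + 15) / 2}
DAILY_IMAGES = 500  # hypothetical "hundreds a day" workload

for gpu, secs in SDXL_SECONDS.items():
    print(f"{gpu:<17} {hours_per_day(DAILY_IMAGES, secs):.2f} h/day")

saved = (hours_per_day(DAILY_IMAGES, SDXL_SECONDS["RTX 5060 Ti 16GB"])
         - hours_per_day(DAILY_IMAGES, SDXL_SECONDS["RTX 5070 Ti"]))
print(f"Saved: {saved:.2f} h/day, about {saved * 30:.0f} h per 30 days")
```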
Case 4: Wanting to dabble in AI video generation too
→ RX 7900 XTX (24GB / ¥180k) * Linux recommended
→ If you can wait, hold out for the rumored RTX 5080 Ti (24GB?)
Video generation lives and dies by VRAM. At 16GB, offloading is mandatory and it’s not practical. 24GB is the practical minimum.
Case 5: Wanting to do a local LLM (Ollama) too
→ RTX 5060 Ti 16GB (¥90k)
16GB pays off in both image generation and LLMs. “Covering two uses for ¥90k” is unbeatable value.
Choose by “what you want to make”
| What you want to do | VRAM needed | Recommended GPU | Budget |
|---|---|---|---|
| Blog thumbnails | 8GB | RTX 5060 Ti 8GB | ¥70k |
| Images for social posts | 8–12GB | RTX 5070 | ¥100k |
| Style control with LoRA | 12GB+ | RTX 5070 | ¥100k |
| Composition control with ControlNet | 12–16GB | RTX 5070 / 5060 Ti 16GB | ¥90–100k |
| FLUX Dev’s top quality | 16GB+ | RTX 5070 Ti | ¥160k |
| Commercial illustration work | 16GB+ | RTX 5070 Ti | ¥160k |
| AI video generation | 24GB+ | RX 7900 XTX | ¥180k |
If you run short of VRAM in ComfyUI, these help:
1. Tiled VAE Decode: split the image into 512×512 tiles for decoding (drastically cuts the VRAM peak)
2. Use an FP8-quantized model: ControlNet itself is also available in FP8
3. The --lowvram launch option: process in stages, trading speed for lower VRAM use
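The idea behind tiled decoding comes down to tile arithmetic: decode one tile at a time so only a tile-sized chunk of decoder activations is live at once. This is an illustrative sketch of the tiling only, not ComfyUI’s implementation; the 64-pixel overlap (used to blend away seams) is an assumed value and must be smaller than the tile size.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Offsets so `tile`-sized tiles cover [0, length), neighbors sharing >= `overlap` px."""
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64):
    """(x0, y0, x1, y1) boxes, in output pixels, for decoding one tile at a time."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]


boxes = tile_boxes(2048, 2048)
largest = max((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)
print(len(boxes), "tiles; largest tile is", largest / (2048 * 2048), "of the full image")
```

For a 2048×2048 image this gives 25 overlapping tiles, each 1/16 of the full area, which is why the decode peak drops so sharply; the trade-off is some extra compute in the overlaps.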
AI image generation support by GPU
| VRAM | FLUX Schnell | SDXL | FLUX Dev | Video |
|---|---|---|---|---|
| 8GB | ◎ FP8 | △ Slow | × | × |
| 12GB | ◎ | ◎ | △ FP8 required | × |
| 16GB | ◎ | ◎ | ○ | △ |
| 24GB | ◎ | ◎ | ◎ | ○ |
As of May 2026. ◎=Comfortable ○=Works △=Limited ×=Not enough VRAM
Summary: image generation has a low barrier to “just trying it”
Of all the corners of local AI, AI image generation is the “most visually fun” genre. Type in some text and a picture appears within a few seconds to a few tens of seconds; once you experience it, you’re hooked.
In 2026, with FLUX Schnell running even on an 8GB GPU, the barrier to entry is lower than ever.
And an image you generate can be turned into a 3D model to view in VR, or physically printed on a 3D printer — as a first step connecting the virtual and the real, AI image generation is just the right entry point.
- Running a local AI chatbot at home: a budget-by-budget guide — how to get started with local LLMs
- The full GPU spec list, 2026 edition — compare price, VRAM, and bandwidth across every GPU
- Starting local AI with a used GPU — a value check on the RTX 30/40 generations
- Recommended GPUs by use-case mix — which one if you’re doubling up
The specs and prices in this article are as of May 2026. Generation times vary by model, settings, and resolution.