I Want to Generate AI Images: Where’s the Sweet Spot for Value?

I’ve installed ComfyUI — an app that makes AI image generation easy to run on a PC — and I’ve been having fun making AI images. Blog thumbnails, social-media assets, visualizing an idea — cloud services are fine too, but the appeal of local AI is that a quick, casual generation is always within reach.

In this article I’ve broken down, by budget, “what kind of images you can make, and how fast.”

* The specs and prices in this article are as of May 2026.

Measured on the author’s machine (RTX 3090 24GB)

Model | Resolution | Steps | Generation time
SD 1.5 | 512×512 | 20 | 8.0s
SDXL (Animagine) | 1024×1024 | 20 | 26.0s

Test setup: RTX 3090 (24GB) / ComfyUI / Linux / measured May 2026

The upside of generating images at home

Cloud (Midjourney, etc.) | Local (ComfyUI, etc.)
Around ¥1,500–6,000/month (depending on plan) | Upfront cost only
Limits on how many images you can make | Unlimited
The service decides which models you get | Use any model or LoRA you like
Your prompts are sent to a server | Fully local
Commercial use may be restricted | Free to use, depending on the model’s license
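
To put the monthly fee and the upfront cost in the table above on the same footing, the sketch below computes how many months of subscription fees add up to a GPU's price. The ¥100,000 GPU price is only an illustrative input, and electricity and the rest of the PC are ignored, so treat the result as a ballpark.

```python
import math


def breakeven_months(gpu_price_yen: int, monthly_fee_yen: int) -> int:
    """Months of subscription fees needed to reach the GPU's upfront price."""
    return math.ceil(gpu_price_yen / monthly_fee_yen)


# Fee range taken from the table; the GPU price is an assumed example value.
GPU_PRICE_YEN = 100_000
for fee in (1_500, 6_000):
    months = breakeven_months(GPU_PRICE_YEN, fee)
    print(f"¥{fee}/month: break-even after about {months} months")
```

With these inputs the break-even point lands between roughly a year and a half and five and a half years, depending on the plan.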

The tool we’ll use: ComfyUI

As of 2026, the most widely used tool for local image generation is ComfyUI.

  • A node-based workflow, so you can see the flow of processing visually
  • Supports the major models — Stable Diffusion, FLUX, SDXL, and more
  • A rich ecosystem of extensions: ControlNet, LoRA, upscalers, and so on
  • An NVIDIA GPU + CUDA is the most stable (AMD ROCm is partially supported)

To install, just download and unpack it from the official ComfyUI site. No Python knowledge required.

Deep dive: why VRAM is the key to image generation (how Latent Diffusion works)

The Stable Diffusion and FLUX models ComfyUI uses are based on a technique called the Latent Diffusion Model.

Processing a 512×512 image pixel by pixel would mean computing over roughly 260,000 pixels (512 × 512 = 262,144), but an LDM first compresses the image into a 64×64 latent space with a VAE (Variational Autoencoder) before denoising. That leaves 4,096 spatial positions to process, about 1/64 as many as in pixel space, which is why it runs at practical speeds even on a local GPU.

The flow looks like this.
1. Vectorize the text with a CLIP model
2. Repeatedly denoise in latent space (U-Net / DiT)
3. Restore a pixel image from the latent space with the VAE decoder

The step that loads VRAM the most is the denoising in step 2. As you raise the resolution, the latent-space size grows in proportion, so at 1024×1024 (SDXL’s standard) the VRAM used around the latent space is about 4x that of 512×512.
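
The figures in this deep dive can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the VAE shrinks each side by a factor of 8 (which is what 512 to 64 implies) and counts spatial positions only, ignoring channel counts.

```python
def latent_side(pixels: int, vae_downscale: int = 8) -> int:
    """Side length of the latent grid for a given image side length."""
    return pixels // vae_downscale


def positions(side: int) -> int:
    """Number of spatial positions in a square grid."""
    return side * side


pixel_512 = positions(512)                   # positions in pixel space
latent_512 = positions(latent_side(512))     # 64 x 64 latent grid
latent_1024 = positions(latent_side(1024))   # 128 x 128 latent grid

print(pixel_512 // latent_512)    # 64: latent space has 1/64 the positions
print(latent_1024 // latent_512)  # 4: 1024x1024 needs 4x the latent area
```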

By budget: what your GPU can make

¥60–70k range (RTX 5060 8GB / RTX 5060 Ti 8GB)

[kimono_product id="15770"]

What 8GB can do:

Model | Can it run? | Rough time per image | Quality
FLUX.1 Schnell (FP8) |  | 10–20s | High. Good at rendering text, too
SD 1.5 |  | 3–8s | The staple. Tons of LoRAs
SDXL |  | 30–60s | Runs, but slow. Combining with LoRA is rough
FLUX.1 Dev | × |  | Not enough VRAM

What you can do:

  • Easily generate high-quality images with FLUX Schnell
  • Freely tweak styles — anime, photorealistic, and more — with SD 1.5 + LoRA
  • Create blog thumbnail images
  • Mass-produce images for social posts

What you can’t do:

  • Complex SDXL workflows (ControlNet + LoRA at the same time)
  • High-quality FLUX Dev generation
  • Direct high-resolution (2K+) generation

Deep dive: what is FP8 quantization?

An AI model’s “weights” are traditionally stored in FP32 (32-bit floating point). Converting them to FP16 (16-bit) halves VRAM use, and FP8 (8-bit) brings it down to a quarter of the FP32 size.

Model-size example for FLUX.1 (12 billion parameters):
BF16 (standard distribution): ~24GB → FP8: ~12GB → GGUF 4-bit: ~6–7GB

FP8 loses some precision, but in image generation the difference usually falls within a range the human eye can’t distinguish. FLUX Schnell running on just 8GB of VRAM is thanks to this quantization plus ComfyUI’s automatic offloading.
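
The size figures above follow from multiplying the parameter count by the bytes each weight takes. A minimal sketch, assuming 1GB = 10^9 bytes and counting the weights only:

```python
# Bytes used per weight in each storage format.
BYTES_PER_PARAM = {"FP32": 4.0, "BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weights_gb(num_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


FLUX_PARAMS = 12e9  # 12 billion parameters, as quoted above
for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weights_gb(FLUX_PARAMS, fmt):.0f}GB")
```

The 4-bit line comes out at about 6GB here. Real 4-bit files tend to be somewhat larger, which fits the ~6–7GB figure quoted above, most likely because quantized formats also store scaling data alongside the weights.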

FLUX Schnell running on 8GB is revolutionary. If you just want to try AI image generation, it’s plenty. But it’s not enough to unlock SDXL’s full potential.

Images per ¥10k (based on FLUX Schnell): Unlimited (upfront cost only, so cost-efficiency improves the more you use it)

Value: ★★★☆☆ (fine as a taster)

¥100k range (RTX 5070 12GB)

What 12GB can do:

Model | Can it run? | Rough time per image | Quality
FLUX.1 Schnell (FP8) |  | 5–10s | Fast
SD 1.5 |  | 2–5s | Comfortable
SDXL |  | 10–20s | Comfortable. Can combine LoRA too
SDXL + ControlNet |  | 15–30s | Lets you specify composition
FLUX.1 Dev |  | Runs, but barely | FP8 required

What you can do:

  • SDXL runs comfortably → you can reliably produce high-quality images
  • Generate images with a specified composition or pose using ControlNet
  • Finely control style with LoRA
  • Batch processing (continuous generation) is practical too

Deep dive: how LoRA works (why a tiny file can change the style)

The SDXL base model has about 3.5 billion parameters (roughly a 7GB file). When teaching it a new art style or character, retraining every parameter is impractical.

LoRA (Low-Rank Adaptation) is a technique that adds only a “low-rank difference matrix” to the model’s weight matrices. Without directly touching the huge original matrix, you can change the style with a small adapter of a few million to a few tens of millions of parameters (about 1% of the original or less).

A LoRA file is usually around 10–200MB. On VRAM it only adds a few hundred MB on top of the base model, so with 12GB you can use an SDXL base model + multiple LoRAs at once.
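
To see why the adapter is so small, here is a minimal sketch of the parameter count for a single layer. It follows the standard LoRA formulation, where a weight matrix of shape d_out × d_in gets an update B·A with B of shape d_out × r and A of shape r × d_in. The layer width of 1280 and the rank of 8 are assumed values for illustration, not measurements from any particular LoRA file.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the original weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d, rank = 1280, 8
full = full_params(d, d)
added = lora_params(d, d, rank)
print(f"original: {full:,}  LoRA: {added:,}  ratio: {added / full:.2%}")
```

The ratio for one layer depends on its width and the chosen rank. Because a LoRA typically adapts only some of the model's layers, the share of the whole model can end up lower than the single-layer figure.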

The sweet spot for image generation. 12GB is the minimum line where SDXL runs comfortably. From here you start to get the feeling of “I can make what I want to make.”

Value: ★★★★☆ (the best balance if image generation is your main use)

¥90–100k range, 16GB (RTX 5060 Ti 16GB / RX 9070)

[kimono_product id="15760"]

What 16GB can do:

Model | Can it run? | Rough time per image | Quality
SDXL + ControlNet + LoRA |  | 15–25s | Complex workflows OK
FLUX.1 Dev |  | 30–60s | Runs. Top-class quality
SD 3.5 |  | 15–25s | A new-generation model
High-res upscaling |  | +10–30s | Up to 2K–4K

Deep dive: how to estimate VRAM use

VRAM use during image generation can be roughly estimated as follows.

The model itself (for FP16: number of parameters x 2 bytes)
+ latent-space buffers (proportional to resolution)
+ additional modules like LoRA / ControlNet
+ the peak during VAE decode

Worked example — generating 1024×1024 with SDXL:
U-Net itself: ~5.1GB (FP16) + CLIP: ~1.3GB + VAE: ~0.3GB + latent-space buffer: ~2GB
= about 8.7GB total (minimal configuration, no LoRA or ControlNet)

Adding ControlNet adds +1.5–2.5GB, and one LoRA adds +0.1–0.3GB. At 12GB, one ControlNet is the limit, but at 16GB you get room to use ControlNet + multiple LoRAs at the same time.
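
The estimate above is easy to turn into a small calculator. This is a rough sketch: the component sizes come from the worked example, while the per-ControlNet (2.0GB) and per-LoRA (0.2GB) costs are assumed midpoints of the quoted ranges. The VAE decode peak is left at zero, as in the worked example, and memory used by the OS or desktop is not counted.

```python
def estimate_vram_gb(components_gb, controlnets=0, loras=0,
                     controlnet_gb=2.0, lora_gb=0.2, vae_decode_peak_gb=0.0):
    """Sum the model components plus optional add-on modules, in GB."""
    base = sum(components_gb.values())
    extras = controlnets * controlnet_gb + loras * lora_gb
    return round(base + extras + vae_decode_peak_gb, 1)


# SDXL at 1024x1024, figures from the worked example above.
SDXL_1024 = {"unet_fp16": 5.1, "clip": 1.3, "vae": 0.3, "latent_buffer": 2.0}

for controlnets, loras in [(0, 0), (1, 3), (2, 3)]:
    need = estimate_vram_gb(SDXL_1024, controlnets, loras)
    fits = [f"{card}GB" for card in (12, 16) if need <= card]
    print(f"{controlnets} ControlNet, {loras} LoRA: ~{need}GB, fits on {fits}")
```

Under these assumptions, one ControlNet with a few LoRAs squeezes into 12GB, while a second ControlNet pushes the total past 12GB and needs a 16GB card.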

RTX 5060 Ti 16GB vs RTX 5070 12GB:

Comparison | RTX 5060 Ti 16GB (¥90k) | RTX 5070 12GB (¥100k)
VRAM | 16GB | 12GB
SDXL speed | A bit slow (128-bit bus) | Fast
FLUX Dev | Runs | Barely
Complex workflows | Comfortable | Barely

More VRAM headroom, or more speed? If you want to try lots of models or build complex workflows, go 16GB of VRAM; if you want to simply generate fast and in bulk, 12GB suits you better.

A caution about the AMD RX 9070 (16GB / ~¥100k)
Its price per GB of VRAM is the cheapest, but its compatibility with ComfyUI falls well short of NVIDIA. Operation on Windows can be unstable in places, and some custom nodes won’t work. For image generation, NVIDIA is recommended.

Value: ★★★★★ (the cheapest tier per GB of VRAM)

¥160k range (RTX 5070 Ti 16GB)

[kimono_product id="15762"]

It has the same VRAM as the 5060 Ti 16GB, but its higher GPU performance makes generation about 1.5–2x faster.

Comparison | RTX 5060 Ti 16GB | RTX 5070 Ti 16GB
SDXL, one image | 15–25s | 8–15s
FLUX Dev, one image | 30–60s | 20–35s
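
For bulk generation it can help to convert these per-image times into images per hour. A minimal sketch using the SDXL ranges from the table above, assuming back-to-back generation with no loading or idle time:

```python
def images_per_hour(seconds_per_image: float) -> float:
    """Images generated in one hour of continuous generation."""
    return 3600 / seconds_per_image


# SDXL seconds-per-image ranges (fast end, slow end) from the table.
SDXL_SECONDS = {"RTX 5060 Ti 16GB": (15, 25), "RTX 5070 Ti 16GB": (8, 15)}

for gpu, (fast, slow) in SDXL_SECONDS.items():
    low, high = images_per_hour(slow), images_per_hour(fast)
    print(f"{gpu}: about {low:.0f} to {high:.0f} SDXL images per hour")
```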

For people who generate in bulk, or who iterate on workflows a lot, the speed difference starts to matter. It’s also strong for doubling up with VR or a local LLM.

Value: ★★★★☆ (ideal if you’re planning to double up on uses)

¥180k and up (RX 7900 XTX 24GB / RTX 5090 32GB)

What 24GB and above can do:

Model | 24GB | 32GB
FLUX.1 Dev | ◎ Comfortable | ◎ With room to spare
SDXL complex workflows |  | 
Video generation (Wan 2.1, etc.) | △ Offloading needed | 
Ultra-high resolution (4K+) |  | 

Video generation is still tough on a consumer GPU, but with 24GB you reach a state where there’s almost nothing you can’t do.

Value graph by GPU

AI image generation performance vs price

How to read this graph: The horizontal axis is price (in ¥10k), the vertical axis is an overall AI-image-generation performance score. The closer to the top-left, the better the value. Point size represents VRAM capacity.

GPU | Price (¥10k) | Image-gen score | VRAM | Notes
RTX 5060 Ti 8GB | 7 | 35 | 8GB | 
RTX 5060 | 6 | 30 | 8GB | 
RX 9070 | 8 | 40 | 16GB | * AMD = iffy ComfyUI compatibility
RTX 5060 Ti 16GB | 9 | 55 | 16GB | 
RTX 5070 | 10 | 65 | 12GB | ★ The image-generation sweet spot
RTX 5070 Ti | 16 | 80 | 16GB | 
RX 9070 XT | 9 | 45 | 16GB | * AMD
RX 7900 XTX | 18 | 85 | 24GB | * Linux recommended
RTX 5080 | 20 | 90 | 16GB | 
RTX 5090 | 40 | 98 | 32GB | 

* How the image-generation score is calculated:

  • SDXL generation speed: 40%
  • Range of supported models (VRAM-dependent): 35%
  • Capacity for complex workflows: 25%
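
The weighting above amounts to a weighted sum of three sub-scores. The sketch below shows the arithmetic only. It assumes each sub-score is on a 0–100 scale, and the example values are hypothetical, not the figures behind the table.

```python
# Weights from the list above; they sum to 1.0.
WEIGHTS = {"sdxl_speed": 0.40, "model_range": 0.35, "workflow_capacity": 0.25}


def image_gen_score(sub_scores: dict) -> float:
    """Weighted sum of the three sub-scores (each assumed to be 0-100)."""
    if set(sub_scores) != set(WEIGHTS):
        raise ValueError("sub_scores must contain exactly the weighted keys")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores, for illustration only.
example = {"sdxl_speed": 70, "model_range": 60, "workflow_capacity": 65}
print(round(image_gen_score(example), 2))
```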

What the graph tells us

  1. The RTX 5070 (¥100k) is the value king for image generation. SDXL runs comfortably at 12GB, and the speed is plenty
  2. The RTX 5060 Ti 16GB (¥90k) is for the VRAM-first crowd. It reaches FLUX Dev, but it’s slower than the RTX 5070
  3. AMD (the RX 9070 line) looks like good value for its score, but the score is discounted by ComfyUI compatibility issues. On Linux it’s effectively a bit higher
  4. The RTX 5080 and up are for “mass production.” There’s no difference in the quality of a single image, but the speed gap kicks in when generating in bulk

How to choose

Case 1: For a hobby (tens to hundreds of images a month)

→ RTX 5070 (12GB / ¥100k)

SDXL is comfortable and FLUX Schnell is fast too. You can use LoRA and ControlNet. You’ll fully enjoy the “image generation is fun” side of it. At a few hundred images a month, generation speed won’t be a bottleneck.

Case 2: Practical use for a blog or social media (tens of images a week)

→ RTX 5060 Ti 16GB (¥90k) or RTX 5070 (¥100k)

It comes down to how you view the ¥10k difference. If you want to try lots of models and play with FLUX Dev, go 5060 Ti 16GB. If speed matters and SDXL is your main, go 5070. Either is a good call.

Case 3: Commercial use, high-volume generation (hundreds a day and up)

→ RTX 5070 Ti (16GB / ¥160k)

16GB of VRAM + a fast GPU. Even building complex workflows and running batch jobs, you have headroom. Generation speed is 1.5–2x the 5060 Ti, so at high volume you recoup the price difference.

Case 4: Wanting to dabble in AI video generation too

→ RX 7900 XTX (24GB / ¥180k) * Linux recommended
→ If you can wait, hold out for the rumored RTX 5080 Ti (24GB?)

Video generation lives and dies by VRAM. At 16GB, offloading is mandatory and it’s not practical. 24GB is the minimum line.

Case 5: Wanting to do a local LLM (Ollama) too

Hands-on: I run image generation and a local LLM at the same time on a dual-GPU setup — an RTX 3090 (24GB) and an RTX 3060 (12GB).

→ RTX 5060 Ti 16GB (¥90k)

16GB pays off in both image generation and LLMs. “Covering two uses for ¥90k” is unbeatable value.

Choose by “what you want to make”

What you want to do | VRAM needed | Recommended GPU | Budget
Blog thumbnails | 8GB | RTX 5060 Ti 8GB | ¥70k
Images for social posts | 8–12GB | RTX 5070 | ¥100k
Style control with LoRA | 12GB+ | RTX 5070 | ¥100k
Composition control with ControlNet | 12–16GB | RTX 5070 / 5060 Ti 16GB | ¥90–100k
FLUX Dev’s top quality | 16GB+ | RTX 5070 Ti | ¥160k
Commercial illustration work | 16GB+ | RTX 5070 Ti | ¥160k
AI video generation | 24GB+ | RX 7900 XTX | ¥180k

Deep dive: why ControlNet is heavy (the cost of “conditional generation”)

ControlNet extracts a “feature map” from a pose image or depth map and injects it into the U-Net denoising process. Because an additional network that duplicates the encoder part of the base model’s U-Net runs alongside it, VRAM use goes up substantially (about +1.5–2.5GB with SDXL).

If memory is tight in ComfyUI, the following help.
1. Tiled VAE Decode: split the image into 512×512 tiles for decoding (drastically cuts the VRAM peak)
2. Use an FP8-quantized model: ControlNet itself is also available in FP8
3. The --lowvram option: process in stages, trading speed for lower VRAM use
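
To make the tiling idea in item 1 concrete, the sketch below splits an image into 512×512 regions that could be decoded one at a time. It is a simplified illustration, not ComfyUI's implementation: `tile_boxes` is a hypothetical helper, and the tiles here do not overlap, whereas practical tiled decoders usually overlap and blend tiles to hide seams.

```python
def tile_boxes(width: int, height: int, tile: int = 512):
    """Return (left, top, right, bottom) boxes that cover the image."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)     # clip the last column
            bottom = min(top + tile, height)    # clip the last row
            boxes.append((left, top, right, bottom))
    return boxes


for size in (1024, 2048):
    print(f"{size}x{size}: {len(tile_boxes(size, size))} tiles")
```

Because only one tile's worth of pixels is decoded at a time, the peak memory for this step depends on the tile size rather than on the full output resolution.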

[kimono_heatmap title="AI image generation support by GPU" note="As of May 2026. ◎=Comfortable ○=Works △=Limited ×=Not enough VRAM"]
VRAM|FLUX Schnell|SDXL|FLUX Dev|Video
8GB|◎ FP8|△ Slow|×|×
12GB|◎|◎|△ FP8 required|×
16GB|◎|◎|○|△
24GB|◎|◎|◎|○
[/kimono_heatmap]

Summary: image generation has a low barrier to “just trying it”

Of all the corners of local AI, AI image generation is the “most visually fun” genre. Type in some text and a picture appears in a few seconds to a few tens of seconds. Once you experience it, you’re hooked.

In 2026, FLUX Schnell runs even on an 8GB GPU, and that has lowered the barrier to entry like never before.

And an image you generate can be turned into a 3D model to view in VR, or physically printed on a 3D printer — as a first step connecting the virtual and the real, AI image generation is just the right entry point.

The specs and prices in this article are as of May 2026. Generation times vary by model, settings, and resolution.

GPUs mentioned in this article

[kimono_product id="15760"]

[kimono_product id="15762"]

[kimono_product id="15761"]