2026 GPU Spec List for Local AI, VR and Image Generation
How to choose a GPU and why VRAM matters so much are covered in separate articles.
For the budget-by-budget buying guide, see → Running a local AI chatbot at home: a budget-by-budget guide
This article lists the full specs and prices of every GPU you can buy as of April 2026. If you’re weighing your options, bookmark it and use it for comparison.
- 1. Full GPU spec list (as of April 2026)
- 1.1. NVIDIA GeForce RTX 50 series (Blackwell, released 2025–2026)
- 1.2. NVIDIA GeForce RTX 40 series (Ada Lovelace, released 2022–2024)
- 1.3. NVIDIA GeForce RTX 30 series (Ampere, released 2020–2022)
- 1.4. AMD Radeon RX 9000 series (RDNA 4, released 2025)
- 1.5. AMD Radeon RX 7000 series (RDNA 3, released 2022–2024)
- 2. Quick recommendation table by VRAM
- 3. Notes on the spec tables
- 4. How generation speed is estimated
- 5. Full GPU value ranking
- 6. Conclusion: if you’re torn, these three
- 7. Summary
- 8. Related
- 9. 2026 GPU prices — by VRAM
Full GPU spec list (as of April 2026)
VRAM = the amount of memory on the GPU. An AI model has to fit in VRAM in its entirety to run at full speed, so this is the single most important spec.
Price = street price as of April 2026 (tax included). It changes daily.
Comment = the practical points for AI and VR use.
Rows with a yellow background are the good-value picks.
NVIDIA GeForce RTX 50 series (Blackwell, released 2025–2026)
| GPU | VRAM | New price | Used range | Comment |
|---|---|---|---|---|
| RTX 5090 | 32GB | ¥400k–700k (list ~¥400k; inflated on shortages) | – | Top of the range in every respect, including power draw and price |
| RTX 5080 | 16GB | ¥210k–250k | – | Great for max-setting VR. For LLMs, 16GB is the ceiling |
| RTX 5070 Ti | 16GB | ~¥160k | – | A balanced VR + AI pick. Modest power draw too |
| RTX 5070 | 12GB | ~¥100k | – | Good for VR, but 12GB is limiting for AI |
| RTX 5060 Ti 16GB | 16GB | ~¥90k–100k | – | Cheapest-class 16GB CUDA. Great for AI image generation |
| RTX 5060 Ti 8GB | 8GB | ~¥70k | – | Entry VR. LLMs up to 8B models |
| RTX 5060 | 8GB | ~¥60k | – | Low power. For VR + light AI use |
See detailed RTX 50-series specs (bandwidth, TDP, LLM score)
| GPU | Memory bandwidth | TDP | LLM score |
|---|---|---|---|
| RTX 5090 | 1792 GB/s | 575W | 120 |
| RTX 5080 | 960 GB/s | 360W | 80 |
| RTX 5070 Ti | 896 GB/s | 300W | 72 |
| RTX 5070 | 672 GB/s | 250W | 60 |
| RTX 5060 Ti 16GB | 448 GB/s | 180W | 45 |
| RTX 5060 Ti 8GB | 448 GB/s | 180W | 40 |
| RTX 5060 | 448 GB/s | 145W | 35 |
NVIDIA GeForce RTX 40 series (Ada Lovelace, released 2022–2024)
| GPU | VRAM | New price | Used range | Comment |
|---|---|---|---|---|
| RTX 4090 | 24GB | ¥460k–550k | ¥300k–380k | The reference. Prices spiking as production ends |
| RTX 4080 SUPER | 16GB | Low stock | ¥150k–180k | Max-setting VR. A bargain used |
| RTX 4080 | 16GB | Low stock | ¥130k–160k | Nearly identical to the 4080 SUPER. Used is a good buy |
| RTX 4070 Ti SUPER | 16GB | Low stock | ¥110k–140k | Tends to be the cheapest 16GB used |
| RTX 4070 Ti | 12GB | Low stock | ¥90k–110k | Good for VR. 12GB is limiting for AI |
| RTX 4070 SUPER | 12GB | Low stock | ¥80k–100k | Low power, comfy VR. Good used value |
| RTX 4070 | 12GB | Low stock | ¥70k–90k | Entry low-power VR. 12GB works for image generation too |
| RTX 4060 Ti 16GB | 16GB | Low stock | ¥70k–100k | Narrow bandwidth, so LLMs are slow. The 16GB version is scarce even used |
| RTX 4060 Ti 8GB | 8GB | Low stock | ¥40k–50k | For lightweight VR titles |
| RTX 4060 | 8GB | Low stock | ¥30k–40k | Lowest power. LLMs only up to 8B models |
See detailed RTX 40-series specs (bandwidth, TDP, LLM score)
| GPU | Memory bandwidth | TDP | LLM score |
|---|---|---|---|
| RTX 4090 | 1008 GB/s | 450W | 100 |
| RTX 4080 SUPER | 736 GB/s | 320W | 70 |
| RTX 4080 | 716 GB/s | 320W | 68 |
| RTX 4070 Ti SUPER | 672 GB/s | 285W | 62 |
| RTX 4070 Ti | 504 GB/s | 285W | 55 |
| RTX 4070 SUPER | 504 GB/s | 220W | 50 |
| RTX 4070 | 504 GB/s | 200W | 45 |
| RTX 4060 Ti 16GB | 288 GB/s | 165W | 32 |
| RTX 4060 Ti 8GB | 288 GB/s | 160W | 28 |
| RTX 4060 | 272 GB/s | 115W | 22 |
NVIDIA GeForce RTX 30 series (Ampere, released 2020–2022)
| GPU | VRAM | New price | Used range | Comment |
|---|---|---|---|---|
| RTX 3090 Ti | 24GB | Discontinued | ¥140k–180k | About the same performance as the 3090. High 450W draw. The 3090 is often cheaper |
| RTX 3090 | 24GB | Discontinued | ¥100k–140k | 24GB in the ¥100k range. The used LLM standard |
| RTX 3080 Ti | 12GB | Discontinued | ¥60k–80k | Wide bandwidth but 12GB VRAM. Gaming + light AI |
| RTX 3080 12GB | 12GB | Discontinued | ¥50k–70k | Nearly identical to the 3080 Ti |
| RTX 3080 10GB | 10GB | Discontinued | ¥40k–60k | 10GB is awkward. 8B models will run |
| RTX 3070 Ti | 8GB | Discontinued | ¥30k–50k | Cheap as a used VR card |
| RTX 3070 | 8GB | Discontinued | ¥30k–40k | The used entry-VR standard |
| RTX 3060 Ti | 8GB | Discontinued | ¥25k–40k | Good-value used VR card |
| RTX 3060 | 12GB | Discontinued | ¥20k–30k | 12GB in the low ¥20,000s. Narrow bandwidth, but good for entry-level LLMs |
See detailed RTX 30-series specs (bandwidth, TDP, LLM score)
| GPU | Memory bandwidth | TDP | LLM score |
|---|---|---|---|
| RTX 3090 Ti | 1008 GB/s | 450W | 75 |
| RTX 3090 | 936 GB/s | 350W | 70 |
| RTX 3080 Ti | 912 GB/s | 350W | 48 |
| RTX 3080 12GB | 912 GB/s | 350W | 45 |
| RTX 3080 10GB | 760 GB/s | 320W | 42 |
| RTX 3070 Ti | 608 GB/s | 290W | 32 |
| RTX 3070 | 448 GB/s | 220W | 30 |
| RTX 3060 Ti | 448 GB/s | 200W | 25 |
| RTX 3060 | 360 GB/s | 170W | 20 |
AMD Radeon RX 9000 series (RDNA 4, released 2025)
| GPU | VRAM | New price | Used range | Comment |
|---|---|---|---|---|
| RX 9070 XT | 16GB | ~¥90k–100k | – | Gaming performance around the 5070 Ti. Linux recommended for AI |
| RX 9070 | 16GB | ~¥80k | – | 16GB for ¥80k. AI-tool support on Windows is unstable |
AMD Radeon RX 7000 series (RDNA 3, released 2022–2024)
| GPU | VRAM | New price | Used range | Comment |
|---|---|---|---|---|
| RX 7900 XTX | 24GB | ~¥180k | ¥120k–150k | 24GB for ¥180k. A strong pick with ROCm + Linux |
| RX 7900 XT | 20GB | ~¥130k | ¥90k–120k | 20GB is an unusual size. Worth considering at the right price |
| RX 7900 GRE | 16GB | ~¥80k | ¥60k–80k | 16GB in the ¥80k range. For AMD fans |
| RX 7800 XT | 16GB | ~¥70k | ¥50k–70k | Good for gaming. AI requires a ROCm environment |
| RX 7700 XT | 12GB | ~¥50k | ¥40k–50k | 12GB for ¥50k. Gaming-oriented |
See detailed AMD Radeon specs (bandwidth, TDP, LLM score)
| GPU | Memory bandwidth | TDP | LLM score |
|---|---|---|---|
| RX 9070 XT | 640 GB/s | 304W | 45 (ROCm) |
| RX 9070 | 640 GB/s | 220W | 40 (ROCm) |
| RX 7900 XTX | 960 GB/s | 355W | 60 (ROCm) |
| RX 7900 XT | 800 GB/s | 315W | 50 (ROCm) |
| RX 7900 GRE | 576 GB/s | 260W | 38 (ROCm) |
| RX 7800 XT | 624 GB/s | 263W | 35 (ROCm) |
| RX 7700 XT | 432 GB/s | 245W | 25 (ROCm) |
Quick recommendation table by VRAM
This table lists one recommended new GPU and one recommended used GPU for each VRAM capacity.
AI use guide = the AI models you can practically run with that VRAM capacity.
| VRAM | New pick | Used pick | AI use guide |
|---|---|---|---|
| 8GB | RTX 5060 (~¥60k) | RTX 3060 Ti (¥25k–40k) | 8B LLM, SD 1.5, medium VR settings |
| 12GB | RTX 5070 (~¥100k) | RTX 3060 (¥20k–30k) | 14B LLM, SDXL, high VR settings |
| 16GB | RTX 5060 Ti 16GB (~¥90k–100k) | RTX 4070 Ti SUPER (¥110k–140k) | 14B + long context, complex SDXL workflows |
| 20–24GB | RX 7900 XTX (~¥180k) | RTX 3090 (¥100k–140k) | 32B LLM, FLUX Dev, nearly all image generation |
| 32GB | RTX 5090 (¥400k–700k; list ~¥400k, inflated on shortages) | – | 32B + very long context (32K+) |
Notes on the spec tables
Why memory bandwidth matters
LLM generation speed is roughly proportional to memory bandwidth. Even at the same VRAM capacity, a GPU with wider bandwidth generates text faster. For example, the RTX 3080 Ti (12GB / 912 GB/s) has about 2.5 times the bandwidth of the RTX 3060 (12GB / 360 GB/s), and the same model generates more than twice as fast on it.
About the LLM score
The scores in this article are relative values with the RTX 4090 set to 100, based on text-generation speed with Q4_K_M quantized models. Actual speed varies with the model and context length. AMD GPU scores are reference values in a ROCm (Linux) environment.
Cautions when buying a used GPU
Used prices are based on shop stock (with a warranty). Flea-market apps can be cheaper, but cards sold there may have been used for mining or have worn fans, so I recommend buying from a shop.
How generation speed is estimated
The “estimated tok/s” figures in this and related articles are calculated with the following formula.
Estimated tok/s = memory bandwidth (GB/s) ÷ model size (GB) × 0.75
What this formula means
Every time a GPU generates one token, it reads all of the model’s parameters from memory. Dividing “the minimum time to read all of them” by “the time it actually takes to generate one token” gives about 0.75. The remaining 0.25 goes to reading and writing the past conversation history (the KV cache) and to decompressing the compressed data (Q4 quantization).
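As a rough illustration, the estimate (bandwidth ÷ model size × 0.75) can be written as a small Python function. This is only a sketch: the function and parameter names are made up for this example, and the 0.75 default is the coefficient this article uses.

```python
def estimate_tok_per_s(bandwidth_gb_s: float, model_size_gb: float,
                       efficiency: float = 0.75) -> float:
    """Estimated tokens/s = bandwidth / model size * efficiency.

    Only meaningful when the whole model fits in VRAM.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb * efficiency


# RTX 5060 Ti 16GB (448 GB/s) with an 8B model of about 5.2GB
print(round(estimate_tok_per_s(448, 5.2)))  # prints 65
```

Passing `efficiency=1.0` gives the theoretical ceiling instead of the estimate.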
Basis for 0.75: measured data
Below are results measured on an RTX 3090 and an RTX 3060 under conditions where the model fully fits in VRAM (April 15, 2026, Ollama 0.20.2, 256-token generation, Linux).
| GPU | Bandwidth | Model | Size | Theoretical tok/s | Measured tok/s | Measured ÷ theoretical |
|---|---|---|---|---|---|---|
| RTX 3090 | 936 GB/s | qwen3:8b | 5.2GB | 180 | 126.4 | 0.70 |
| RTX 3090 | 936 GB/s | qwen3.5:9b | 6.6GB | 142 | 98.0 | 0.69 |
| RTX 3090 | 936 GB/s | qwen3:14b | 9.3GB | 101 | 76.6 | 0.76 |
| RTX 3060 | 360 GB/s | qwen3:8b | 5.2GB | 69 | 60.1 | 0.87 |
| RTX 3060 | 360 GB/s | qwen3.5:9b | 6.6GB | 55 | 46.6 | 0.85 |
| RTX 3060 | 360 GB/s | qwen3:14b | 9.3GB | 39 | 34.5 | 0.89 |
* Theoretical tok/s = memory bandwidth ÷ model size. The theoretical ceiling, assuming “one token can be generated in the time it takes to read all parameters once.”
“Measured ÷ theoretical” falls in the 0.69–0.89 range. The median of the six results is about 0.81, so the 0.75 used in the formula sits slightly on the conservative side of the middle. The narrower a GPU’s bandwidth (RTX 3060), the more memory-read waiting dominates, so results land close to the theoretical value. On a wide-bandwidth GPU (RTX 3090), the non-read work (reading/writing conversation history, decompression) stands out relatively more, so the ratio comes out a bit lower.
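For readers who want to recompute the ratios, here is a minimal Python sketch using the six measurements from the table above. The variable and function names are illustrative. It divides by the unrounded theoretical value, so the last digit can differ slightly from the table.

```python
from statistics import median

# (GPU, bandwidth GB/s, model, size GB, measured tok/s), copied from the table
MEASUREMENTS = [
    ("RTX 3090", 936, "qwen3:8b", 5.2, 126.4),
    ("RTX 3090", 936, "qwen3.5:9b", 6.6, 98.0),
    ("RTX 3090", 936, "qwen3:14b", 9.3, 76.6),
    ("RTX 3060", 360, "qwen3:8b", 5.2, 60.1),
    ("RTX 3060", 360, "qwen3.5:9b", 6.6, 46.6),
    ("RTX 3060", 360, "qwen3:14b", 9.3, 34.5),
]


def efficiency(bandwidth_gb_s: float, model_size_gb: float,
               measured_tok_s: float) -> float:
    """Measured speed divided by the theoretical ceiling (bandwidth / size)."""
    theoretical_tok_s = bandwidth_gb_s / model_size_gb
    return measured_tok_s / theoretical_tok_s


ratios = [efficiency(bw, size, tok) for _, bw, _, size, tok in MEASUREMENTS]

for (gpu, _, model, _, _), ratio in zip(MEASUREMENTS, ratios):
    print(f"{gpu:9s} {model:11s} {ratio:.2f}")
print(f"min {min(ratios):.2f}, median {median(ratios):.2f}, max {max(ratios):.2f}")
```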
Limits of the estimate
This formula is only valid when the model fully fits in VRAM. If VRAM runs short and the model spills into CPU memory, speed drops dramatically (measured, it can fall to a tenth or less).
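Also, a model can be split across two GPUs. Each card then reads only its own share of the weights, but with a layer-wise split the cards typically take turns rather than work in parallel, so their bandwidths do not simply add up. In my setup, qwen3.5:27b (17GB) split across an RTX 3090 + RTX 3060 ran at 25.5 tok/s. That is below the roughly 41 tok/s the formula gives for the 3090 alone (936 ÷ 17 × 0.75), but far faster than spilling into CPU memory.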
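The fits-in-VRAM condition can be built into the estimate, as in the Python sketch below. The 1.5GB headroom for the KV cache and runtime is a hypothetical allowance chosen for illustration, not a measured value, and the function names are invented for this example.

```python
from typing import Optional

# Hypothetical allowance for KV cache and runtime overhead (an assumption)
HEADROOM_GB = 1.5


def fits_in_vram(model_size_gb: float, vram_gb: float,
                 headroom_gb: float = HEADROOM_GB) -> bool:
    """True if the model plus the assumed headroom fits in VRAM."""
    return model_size_gb + headroom_gb <= vram_gb


def estimate_if_fits(bandwidth_gb_s: float, vram_gb: float,
                     model_size_gb: float,
                     efficiency: float = 0.75) -> Optional[float]:
    """Return estimated tok/s, or None when the formula does not apply."""
    if not fits_in_vram(model_size_gb, vram_gb):
        return None
    return bandwidth_gb_s / model_size_gb * efficiency


# A 14B model of about 9.3GB on two cards from the tables
for name, bandwidth, vram in [("RTX 5060 Ti 16GB", 448, 16),
                              ("RTX 3080 10GB", 760, 10)]:
    est = estimate_if_fits(bandwidth, vram, 9.3)
    if est is None:
        print(f"{name}: no estimate (model does not fit)")
    else:
        print(f"{name}: about {est:.0f} tok/s")
```

Returning `None` rather than a number keeps a spilled-to-CPU case from being mistaken for a valid estimate.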
Quick estimated-speed table by GPU (8B and 14B models)
| GPU | Bandwidth | 8B est. | 14B est. | Notes |
|---|---|---|---|---|
| RTX 4060 Ti 8GB | 288 GB/s | 42 | — | 14B won’t fit in VRAM |
| RTX 3060 12GB | 360 GB/s | ★ 60 | ★ 35 | Author-measured |
| RTX 5060 Ti 16GB | 448 GB/s | 65 | 36 | |
| RTX 4070 Ti S 16GB | 672 GB/s | 97 | 54 | |
| RTX 5070 12GB | 672 GB/s | 97 | 54 | |
| RTX 3080 10GB | 760 GB/s | 110 | — | 14B won’t fit in VRAM |
| RTX 5070 Ti 16GB | 896 GB/s | 129 | 72 | |
| RTX 3090 24GB | 936 GB/s | ★ 126 | ★ 77 | Author-measured |
| RX 7900 XTX 24GB | 960 GB/s | 138 | 77 | Assumes ROCm (Linux) |
| RTX 4090 24GB | 1008 GB/s | 145 | 81 | |
| RTX 5090 32GB | 1792 GB/s | 258 | 144 | |
★ = author-measured (April 2026, Ollama 0.20.2, Linux). The rest are estimates from the formula (bandwidth ÷ model size × 0.75). Valid only when the model fully fits in VRAM. AMD GPUs are estimated in a ROCm environment.
Full GPU value ranking
How to read this chart: the longer the bar, the higher the “LLM performance per unit price,” i.e., the better the value. The value metric is “LLM score ÷ GPU price (¥10k units).” Each entry is marked [new] or [used].
* The LLM score is a relative value of text-generation speed with Ollama (Q4_K_M quantized), based on RTX 4090 = 100. AMD is a reference value under ROCm (Linux). Prices are as of April 2026. Used prices are the midpoint of the shop range.
The used RTX 3060 12GB and RTX 3080 12GB stand out on value. Among new cards, the RTX 5070 12GB and RTX 5060 Ti 16GB rank high. The pricey RTX 5090 and RTX 4090 are high-performance but rank low on the value metric. That said, choosing on value alone biases you toward “cheap GPUs with little VRAM,” so the key is to secure the VRAM your use case needs first, then look at value.
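The value metric (LLM score ÷ price in ¥10k units) is easy to recompute for any card. The Python sketch below uses scores and prices from the tables in this article. Taking the midpoint for new cards listed with a price range is an assumption made here; the article only states the midpoint rule for used prices.

```python
def value_score(llm_score: float, price_low_k: float,
                price_high_k: float) -> float:
    """LLM score per ¥10k, using the midpoint of a price range in ¥1k units."""
    midpoint_k = (price_low_k + price_high_k) / 2
    return llm_score / (midpoint_k / 10)


# (label, LLM score, price low, price high), prices in thousands of yen
CARDS = [
    ("RTX 5090 [new]", 120, 400, 700),
    ("RTX 5070 [new]", 60, 100, 100),
    ("RTX 3080 12GB [used]", 45, 50, 70),
    ("RTX 3060 12GB [used]", 20, 20, 30),
]

ranking = sorted(
    ((name, value_score(score, low, high)) for name, score, low, high in CARDS),
    key=lambda item: item[1],
    reverse=True,
)

for name, value in ranking:
    print(f"{name:22s} {value:.2f}")
```

Filtering `CARDS` by the VRAM you need before sorting reflects the advice above: secure capacity first, then compare value.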
Conclusion: if you’re torn, these three
| How to choose | Recommended GPU | Price guide | Reason |
|---|---|---|---|
| One new card | RTX 5060 Ti 16GB | ~¥90k | Cheapest class for 16GB VRAM in a CUDA environment. Runs AI image generation and LLMs practically |
| One used card | RTX 3060 12GB | ¥20k–30k | 12GB in the low ¥20,000s. LLM generation is slow, but it is the cheapest option if simply being able to load the model is the priority |
| Serious AI | RTX 3090 used / RX 7900 XTX new | ¥130k–180k | 24GB VRAM runs 32B models. The 3090 is a used CUDA setup; the 7900 XTX assumes Linux + ROCm |
Summary
This article is a reference of the full GPU spec list and prices as of April 2026. Prices move, so check the latest street prices before buying.
For how to choose a GPU (recommendations by use case, VRAM guidelines, a budget-by-budget guide), see the related articles below.
– I want to run an AI chatbot at home → budget-by-budget guide
– How to start local AI with a used GPU → used-GPU buying guide
– Getting started with AI image generation → ComfyUI intro
The prices and specs in this article are as of April 2026. Prices change daily, so check the latest before buying.
Related
Running a Local AI Chatbot at Home: A Budget-by-Budget Guide
Starting Local AI with a Used GPU: The Value of the RTX 30/40 Series
2026 GPU prices — by VRAM
What matters for local AI is the amount of VRAM, the graphics card’s working space: you can run as large a model as fits in it. I looked up the main cards on sale as of June 2026 on price-comparison sites (kakaku.com, etc.) and grouped them by VRAM size. Prices vary by shop and timing, so treat them as ballpark figures (across 2026, prices have trended high overall due to a memory shortage).
| VRAM | Main GPUs | Street price (new, guide) | Where it sits for local AI |
|---|---|---|---|
| 16GB | RX 9060 XT 16GB (AMD) | ~¥58k–64k | Cheapest-class 16GB. Affordable entry. Check software support |
| 16GB | RTX 5060 Ti 16GB | ~¥80k–95k | The new-NVIDIA entry point. Room up to 14B class. CUDA support makes it safe |
| 16GB | RX 9070 XT 16GB (AMD) | ~¥92k–115k | A top seller in 2026. Well-priced and high-performance |
| 16GB | RTX 5070 Ti / 5080 | ~¥150k–210k | Fast, but capacity tops out at 16GB |
| 20GB | RX 7900 XT (previous gen) | Mostly stock/used | A rare 20GB. Previous gen, so new stock is dwindling |
| 24GB | Used RTX 3090 | ~¥150k–200k (used) | The shortest path to 24GB and the practical pick for running 30B-class models. Prices are spiking on AI demand and the memory shortage; used only, so check condition |
| 24GB | RX 7900 XTX / used RTX 4090 | Mostly previous gen/used | There’s almost no current-gen 24GB new; previous gen or used is realistic |
| 32GB | RTX 5090 | ~¥690k–720k | 70B class in reach. Scarce, spiking, high power |
For local AI, there are two realistic choices in 2026. To start affordably with a new card, go for 16GB (the RTX 5060 Ti or the RX 9060/9070 line). To aim for 24GB, buy a used RTX 3090 (prices have recently spiked). New 24GB cards have all but vanished from the current generation (even the RTX 5080 is 16GB), and the next step up is the 32GB RTX 5090 (~¥700k). Decide first how large a model you want to run, then choose from the VRAM class that fits it, and you won’t go wrong. Note that AMD (RX line) support varies by software, so confirm that the tools you want to use actually run before buying.




