Running a Local AI Chatbot at Home: A Budget-by-Budget Guide

I run two GPUs in my main PC and use generative AI locally as well. I use ChatGPT and Claude too, but when I have them summarize work documents, I’ve increasingly caught myself wondering, “Is it really OK to send this outside?” And the monthly fees slowly add up.

So I set out to map, budget by budget, just how far you can take an AI chatbot on nothing but a home GPU.

* This article focuses on consumer GPUs that fit in an ordinary desktop PC (NVIDIA GeForce / AMD Radeon series). It doesn’t cover server/data-center GPUs like the NVIDIA A100 or H100 (40–80GB VRAM, ¥1M+). That’s why the VRAM ceiling here stops at 32GB.

Cloud AI vs. local AI

Item | Cloud AI | Local AI
Privacy | Your conversations are sent to a server | Everything stays on your PC. Nothing leaves
Monthly cost | ChatGPT Plus ¥3,000/mo / Claude Pro ¥3,000/mo | ¥0 (electricity only; ~50–150W while the GPU runs)
Upfront cost | ¥0 | GPU: ¥60k–400k
Total cost over one year | about ¥36,000 | GPU + electricity ~¥3,000–6,000/yr
Internet | Required | Not needed (works offline)
Model smarts | The latest models like GPT-4o / Claude 3.5 | 8B–32B models (depending on your GPU’s VRAM)
Response speed | 40–80 tok/s | 20–130 tok/s (depending on GPU)

* If it replaces both subscriptions (¥6,000/mo in total), even a 16GB GPU (about ¥90k) pays for itself in a little over a year. Against a single ¥3,000/mo subscription it takes about two and a half years.
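The payback arithmetic behind this note can be sketched in a few lines. This is a minimal sketch using the figures from the table above (¥3,000/mo per subscription, about ¥90k for a 16GB GPU, ¥3,000–6,000/yr for electricity); the function name and its parameters are made up for illustration.

```python
def breakeven_months(gpu_price_yen: float,
                     monthly_fee_yen: float,
                     electricity_yen_per_year: float = 0.0) -> float:
    """Months until the GPU price is recovered by the subscription fees saved."""
    # Net saving per month: the cloud fee you stop paying, minus electricity.
    monthly_saving = monthly_fee_yen - electricity_yen_per_year / 12
    if monthly_saving <= 0:
        raise ValueError("electricity costs more than the subscription saves")
    return gpu_price_yen / monthly_saving


# One ¥3,000/mo subscription, ignoring electricity
print(breakeven_months(90_000, 3_000))          # 30.0
# Same, with ¥6,000/yr of electricity
print(breakeven_months(90_000, 3_000, 6_000))   # 36.0
# Replacing both subscriptions (¥6,000/mo)
print(breakeven_months(90_000, 6_000))          # 15.0
```

So the payback period depends heavily on how many subscriptions the GPU actually replaces.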

For me the biggest thing is that conversations never leave the machine. Summarizing meeting notes, personal questions — being able to use it without a second thought is local AI’s real strength.

Local AI comes down to VRAM

What really struck me after trying local LLMs is that what you can do is decided almost entirely by how much VRAM (GPU memory) you have.

What determines a local LLM’s performance, in order of impact:

  1. VRAM capacity: sets how large a model you can run (most important)
  2. Memory bandwidth: directly drives how fast text comes out
  3. GPU compute: surprisingly little difference
  4. CPU / RAM: secondary

If you don’t have enough VRAM, you simply can’t run a smart model. Conversely, as long as you have the VRAM, even middling GPU compute runs at practical speed.

What do “27B” and “8B” mean?

In local-AI articles you often see labels like “8B model” or “27B model.” The B (billion) is the model’s parameter count, the “size of its brain,” so to speak. Bigger numbers mean a smarter model, but they also eat more GPU memory (VRAM).

Comparing to AIs you already know makes it easier to picture.

Model size | Parameters | VRAM needed | A familiar comparison
2–4B | 2–4 billion | ~2–4GB | About the level of on-phone AI (Apple Intelligence, Gemini Nano). Can summarize text and handle simple exchanges, but weak on anything intricate
8B | 8 billion | ~5–6GB | On par with the free ChatGPT’s lightweight model (GPT-4o mini). Practical for everyday chat and simple questions
14B | 14 billion | ~10–11GB | The line where it starts to surpass the free ChatGPT (GPT-4o mini). Its language gets noticeably more natural. Personally, this is where it becomes genuinely usable
27–32B | 27–32 billion | ~17–22GB | Quality approaching ChatGPT Plus (GPT-4o class). The “wait, this runs locally?” level
70B+ | 70 billion+ | 45GB+ | On par with ChatGPT Plus or better. But it won’t run on a single ordinary GPU

* OpenAI doesn’t publish exact parameter counts for ChatGPT’s models (GPT-4o, etc.), so this is a rough, subjective comparison informed by benchmarks. Even at the same parameter count, quality varies a lot with the quality and volume of training data and with tuning.

The relationship between VRAM and model size is simple. A model’s parameters have to sit in the GPU’s VRAM, and if there isn’t enough, that model won’t run. For example, 8GB of VRAM handles up to an 8B model, 16GB up to 14B, and 24GB up to 32B. In other words, the amount of VRAM = the ceiling on model size = the ceiling on how smart your AI can be.
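As a rough illustration of that ceiling, here is a small lookup built from the “VRAM needed” column of the table above (upper end of each range). The 1.5GB of headroom reserved for context and runtime overhead is an assumption of this sketch, not a measured figure, and the function name is hypothetical.

```python
from typing import Optional

# (model class, approximate VRAM needed in GB), upper ends from the table above
MODEL_VRAM_GB = [
    ("2-4B", 4.0),
    ("8B", 6.0),
    ("14B", 11.0),
    ("27-32B", 22.0),
    ("70B+", 45.0),
]


def largest_model_class(vram_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the largest model class that fits, or None if nothing fits.

    headroom_gb is an assumed reserve for context and overhead.
    """
    usable = vram_gb - headroom_gb
    best = None
    for name, needed in MODEL_VRAM_GB:
        if needed <= usable:
            best = name  # the list is sorted, so the last fit is the largest
    return best


for vram in (8, 12, 16, 24, 32):
    print(vram, "GB ->", largest_model_class(vram))
```

With these assumptions, 8GB and 12GB both land on the 8B class, 16GB on 14B, and 24GB and 32GB on the 27–32B class.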

Here are numbers I measured myself.

GPU | Model | Generation speed | VRAM used
RTX 3090 24GB + RTX 3060 12GB | qwen3.5:27b | ★ 25.5 tok/s | 18.2GB (split across 2 cards)
RTX 3090 24GB | qwen3:8b | ★ 126.4 tok/s | 10.3GB
RTX 3060 12GB | qwen3:8b | ★ 60.1 tok/s | 5.5GB

★ = author-measured values (RTX 3090 / RTX 3060, April 2026). Others are estimates from the estimation formula.

My PC has an RTX 3090 and an RTX 3060 in it. On the RTX 3090 (24GB), an 8B model screams along at 126 tok/s. Even the RTX 3060 (12GB) runs an 8B comfortably at 60 tok/s. A 27B model slows down on the 3090 alone for lack of VRAM, but split across the two cards it runs at a practical 25.5 tok/s. The VRAM gap maps directly onto how smart a model you can use.

When choosing a GPU, put “how much VRAM does it have?” first.

Here’s a table of what you can run and how it performs, by VRAM.

VRAM | Runnable models | Typical models | Speed (approx.) | GPU price range
8GB | 8B | Qwen 3 8B, Llama 3.1 8B, Gemma 3 4B | 60–130 tok/s | ¥60k–70k
12GB | 8B–12B | Gemma 3 12B, Qwen 3 8B (with room) | 35–130 tok/s | ¥50k–80k
16GB | 14B | Qwen 3 14B, DeepSeek-R1 14B, Gemma 3 12B | 23–72 tok/s | ¥80k–160k
24GB | 32B | Qwen 3 32B, Gemma 3 27B, DeepSeek-R1 32B | 20–35 tok/s | ¥180k–250k
32GB | 32B + long context | Qwen 3 32B (32K context) | 50–60 tok/s | ¥400k+

★ = author-measured values (RTX 3090 / RTX 3060, April 2026). Others are estimates from the estimation formula.

How to read this table: as VRAM climbs 8GB → 16GB → 24GB, the size (= smarts) of the models you can run steps up. If you want practical everyday quality, 16GB (a 14B model) is the minimum line.

Measured: generation speed by model

How to read this chart: a longer bar means faster generation (= more comfortable). gemma4 is the fastest, but for output quality qwen3.5:27b is the best. Speed and smarts are a trade-off.

[kimono_bar title="" unit="tok/s" color="#1e90ff"]
qwen3.5:27b (3090+3060)|26
qwen3.5:9b (3060)|98.8
qwen3:8b (3090)|127
gemma4:9b (3090)|133
[/kimono_bar]

* Test setup: RTX 3090 (24GB) + RTX 3060 12GB / Linux / Ollama / measured April 2026. The 27b model used a 2-GPU split load.

How much VRAM do you need?

What you want to do | VRAM needed | Model | Speed
Just try out AI | 8GB | 8B (uses 5–6GB) | 60–130 tok/s
Use it for practical everyday work | 16GB | 14B (uses 10–11GB) | 23–72 tok/s
Rely on it seriously for work | 24GB | 32B (uses 22GB) | 20–35 tok/s
The works (AI + VR + image gen) | 32GB | 32B + long context | 50–60 tok/s

* tok/s = tokens generated per second. At 20 tok/s it’s “a slight wait, but readable”; at 40+ tok/s it “comes back instantly.”

By GPU brand: which is easiest to get running?

After VRAM, the next thing that matters is “will it actually run on that GPU?” The amount of setup effort varies quite a bit by GPU brand.

GPU brand Setup Windows Mac Linux
NVIDIA (CUDA) Just install the driver
AMD Good on Linux. On Windows, AMD’s AI compute stack (ROCm) is still incomplete, so setup takes effort
Apple Silicon Just install Ollama. Shared memory lets you run large models too
Intel (iGPU) Limited support, and on the slow side

The easiest are NVIDIA (Windows/Linux) and Apple Silicon (Mac).

If you’re on Windows or Linux like me, you can’t go wrong choosing an NVIDIA GPU. Just install the driver and Ollama auto-detects it.

AMD’s appeal is that you can buy the same VRAM cheaper than NVIDIA, but on Windows the software stack for AI (ROCm) is still incomplete and takes fiddling to set up. It’s not yet “install the driver and it works” the way NVIDIA’s CUDA is. If you’re prepared to run Linux, the value for money is unbeatable.

* ROCm = AMD’s software stack for running AI on its GPUs, equivalent to CUDA on NVIDIA. NVIDIA’s CUDA has years of proven stability, while AMD’s ROCm is still maturing and support is limited, especially on Windows.

For Mac users, Apple Silicon’s unified memory is a surprising strength. With 24GB or more, you can run 32B-class models. Speed lags a dedicated NVIDIA GPU, but “a 32B running on a laptop” is a pretty interesting experience.

Getting started: pick from four apps

There are several apps for running local LLMs. I use Ollama, but the best choice is whatever suits you.

Local LLM apps compared

App | What it’s like | Best for | OS
LM Studio | Everything from model search to chat in a GUI. The most approachable | First-timers | Win/Mac/Linux
Ollama + Open WebUI | Set up from the command line; add a browser UI with Open WebUI | People who want to build their own setup | Win/Mac/Linux
Jan | Privacy-focused. A self-contained desktop app | People who want it simple | Win/Mac/Linux
GPT4All | Lightweight. Few settings, so nothing to get lost in | People who just want a quick try | Win/Mac/Linux

My personal take: LM Studio to start, Ollama + Open WebUI once you’re in deep.

With LM Studio, you can search, download, and chat with a model right after installing, so if you’re not used to the terminal it’s the easier way in.

I chose Ollama for the nimbleness of switching between models from the command line and for its extensibility, which suit my taste. Day to day, I chat with it from a terminal app.

How to get started with Ollama (for reference)

  1. Download the installer from ollama.com
  2. Install it (Windows / Mac / Linux)
  3. Type ollama run qwen3:8b in the terminal
  4. Chat begins

On my setup, it auto-detected the GPU right after install and just worked. I never had to fuss with detailed settings.

On my machine (RTX 3090), qwen3:8b generates at about 126 tok/s. It feels like “the reply starts the instant I hit enter." On the RTX 3060 it’s 60 tok/s — the bandwidth gap shows up directly as speed, but it still feels plenty comfortable.
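If you want to check your own tok/s, one option is to ask the local Ollama server directly. The sketch below assumes Ollama is running on its default port (11434) and uses its `/api/generate` endpoint, whose non-streaming response reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). It is not necessarily how the figures in this article were measured, and the function names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's response fields (duration is in nanoseconds)."""
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one non-streaming request and return the generation speed in tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return tokens_per_second(result["eval_count"], result["eval_duration"])


# Example (needs a running Ollama server with the model already pulled):
# print(f"{measure('qwen3:8b', 'Explain VRAM in two sentences.'):.1f} tok/s")
```

A single short prompt gives a noisy number, so averaging a few runs is a reasonable precaution.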

What changes across Windows, Mac, and Linux?

The experience differs quite a bit by OS, so let me lay it out.

OS | Pros | Cons | Best for
Windows | With NVIDIA, setup is the easiest. Plenty of GUI apps like LM Studio too | Slightly more VRAM overhead than Linux. AMD GPUs take effort to set up | NVIDIA GPU owners who want an easy start
Mac | Apple Silicon’s unified memory runs large models. Power-efficient | Slower generation than a dedicated GPU. Pricey hardware | People whose main machine is a Mac; people who want portability
Linux | The most memory-efficient. AMD’s AI stack (ROCm) runs stably on Linux too. Easy to run with Docker | Requires technical know-how to set up | AMD GPU owners; people who want to run it server-style

For beginners or first-timers, my suggestions are:

Windows users → NVIDIA GPU

Mac users → lean on Apple Silicon

Linux users → AMD GPUs come into play too

That’s roughly how it shakes out.

I run mine on Linux with an RTX 3090 + RTX 3060 in tandem. I can run Ollama (chat AI) on one and ComfyUI (image generation) on the other at the same time, and I’m quite fond of this setup.

The used-GPU option

New isn’t the only option. My RTX 3090 was bought at launch for about ¥300k at list price; my secondary RTX 3060 12GB was about ¥40k used.

Three cards stand out on the used market:

GPU | VRAM | Used price (shops) | Runs | Notes
RTX 3060 12GB | 12GB | ¥20k–35k | 8B models | Cheapest entry point. 12GB for around ¥20k
RTX 4060 Ti 16GB | 16GB | ¥70k–100k | 14B models | A hidden gem. 16GB at half the new price
RTX 3090 24GB | 24GB | ¥130k–200k | 32B models | Staying high on AI demand

Note: the RTX 30 series is a generation where many cards were run hard during the mining boom. That said, the RTX 3060 12GB shipped with a mining limiter (LHR) from the start, and its 12GB of VRAM wasn’t needed for mining, so heavily-abused units are relatively rare. The RTX 3080/3090, by contrast, were popular for mining, so take more care. I’d recommend buying from a used shop with a warranty.

By budget: what your GPU can do

From here I’ll break down, by concrete budget tier, which GPU runs what. As noted above, the top criterion is “how many GB of VRAM,” and the next is “is it NVIDIA?” I’ve organized this around new-card prices; if you’re also considering used, see the comparison table above.

¥60k–70k tier (RTX 5060 / RTX 5060 Ti 8GB)

[kimono_product id="15770"]

What you can do with 8GB of VRAM:

Task Doable? How it feels
Everyday Q&A (weather, cooking, small talk) Plenty practical
Simple coding help OK for short snippets
Proofreading text Decent even on an 8B
Summarizing long text (papers, minutes) Short context (2K–4K tokens)
Complex reasoning / analysis The limit of an 8B model
Translation OK for simple sentences

Runnable models:

Model | VRAM used | Speed (approx.) | Quality
Qwen 3 8B | ~5.2GB | 65 tok/s | Decent
Llama 3.1 8B | ~6.2GB | 56 tok/s | Better in English
Gemma 3 4B | ~3.6GB | 112 tok/s | Basic

★ = author-measured (RTX 3090 / RTX 3060, April 2026). Others are estimates from the estimation formula, using the RTX 5060 Ti 8GB (448 GB/s) as the representative GPU.

Enough to experience “so this is what AI is like.” But quality is “so-so,” and it tends to lose the thread in long conversations. Ideal for trying it out, but too shaky to rely on for work.

Value: ★★★☆☆ (fine for trying it out)

[kimono_product id="15770"]

¥90k–110k tier (RTX 5060 Ti 16GB / RX 9070)

[kimono_product id="15760"]

What you can do with 16GB of VRAM:

Task Doable? How it feels
Everyday Q&A Comfortable
Coding help (moderate) Practical at the function level
Proofreading and rewriting A 14B’s language is quite good
Summarizing long text Up to 8K–16K tokens
Drafting emails Practical
Technical Q&A As deep as a 14B gets
Drafting fiction or blog posts Usable as a first draft

Runnable models:

Model | VRAM used | Speed (approx.) | Quality | Notes
Qwen 3 14B | ~10.7GB | 36–72 tok/s | Good | A notch better in language. Personally, “usable” starts here
Gemma 3 12B | ~12.4GB | 27–54 tok/s | Good | Google’s 12B. A balanced pick
DeepSeek-R1-Distill 14B | ~11GB | 31–61 tok/s | Fairly good | Strong at reasoning (thinks before answering)

★ = author-measured (RTX 3090 / RTX 3060, April 2026). Others are estimates from the estimation formula, spanning the bandwidth range from the RTX 5060 Ti 16GB (448 GB/s) to the RTX 5070 Ti (896 GB/s).

This is the “entrance to practical use.” A 14B is clearly smarter than an 8B by feel: naturalness of language, grasp of the question, and accuracy of summaries are on another level. This is the line where you start thinking, “maybe I can drop the paid ChatGPT subscription and get by with this.”

That said, the RTX 5060 Ti 16GB has a 128-bit memory bus (448 GB/s), so token generation is slower than on higher-end GPUs. Think of it as “a smart friend who talks a little slowly.”
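Why bandwidth matters so much can be shown with a common rule of thumb: to generate each token, the GPU has to read roughly the whole model from VRAM once, so bandwidth divided by model size gives a ceiling on tok/s. The sketch below is that rule of thumb only. It is not necessarily the estimation formula behind the tables in this article, and the `efficiency` parameter is a hypothetical knob for scaling the ceiling down toward real-world speeds.

```python
def estimate_tok_per_s(bandwidth_gb_s: float,
                       model_size_gb: float,
                       efficiency: float = 1.0) -> float:
    """Rough decode-speed ceiling: one full read of the model per generated token.

    efficiency is an assumed fudge factor (1.0 = theoretical ceiling).
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb * efficiency


# Qwen 3 14B at ~10.7GB, using the two bandwidth figures quoted above
for name, bandwidth in (("RTX 5060 Ti 16GB", 448), ("RTX 5070 Ti", 896)):
    print(f"{name}: ceiling of about {estimate_tok_per_s(bandwidth, 10.7):.1f} tok/s")
```

Real speeds come in below this ceiling, but the ratio holds: doubling the bandwidth doubles the ceiling for the same model.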

The AMD RX 9070 (16GB / about ¥80k) is the cheapest per gigabyte of VRAM, but AMD’s AI stack isn’t as mature as NVIDIA’s. On Windows, setup can take an extra step.

Value: ★★★★☆ (the best-balanced entry to practical use)

[kimono_product id="15760"]

¥160k tier (RTX 5070 Ti 16GB)

[kimono_product id="15762"]

What you can do with 16GB of VRAM (fast):

You can do the same things as in the ¥90k–110k tier above, but faster.

Comparison | RTX 5060 Ti 16GB | RTX 5070 Ti 16GB
Qwen 3 14B speed | ~23 tok/s | ~72 tok/s
How it feels | “A slight wait” | “Comes back instantly”
Doubling as AI image gen | A bit slow | Comfortable
Doubling as VR | Entry level | Comfortable

The most comfortable of the 16GB options. If you also want VR or AI image generation, the ¥60k premium over the 5060 Ti is well worth it. It’s overkill for local AI alone, but ideal if you’re doubling up with other uses.

Value: ★★★★☆ (best if you’re doubling up)

[kimono_product id="15762"]

¥120k–300k tier (RX 7900 XTX 24GB / RTX 5080 16GB)

[kimono_product id="15771"]
[kimono_product id="15763"]

This is where “serious local AI” begins.

What you can do with 24GB of VRAM (RX 7900 XTX):

Task Doable? How it feels
Everything above Comfortable
32B models (Qwen 3 32B, etc.) Surprisingly “smarter than expected”
Analyzing / summarizing long text 16K–32K tokens is practical
Cross-document analysis Doable, but slower
Coding help (whole files) A 32B’s code comprehension is high
Specialized Q&A Solid accuracy on medicine, law, tech, and more

Runnable models:

Model | VRAM used | Speed (approx.) | Quality | Notes
Qwen 3 32B | ~22.2GB | 32 tok/s | Very good | The “this runs locally?” level
Gemma 3 27B | ~22.5GB | 41 tok/s | Very good | Google’s large model
DeepSeek-R1-Distill 32B | ~22GB | 32 tok/s | Good | Deep reasoning chains

★ = author-measured (RTX 3090 / RTX 3060, April 2026). Others are estimates from the estimation formula, using the RTX 3090’s bandwidth (936 GB/s).

A 32B model changes everything. Up to 14B it was “AI-ish, but, well, about what you’d expect”; a 32B brings the “wait, this is running locally?” surprise. Language quality, reasoning depth, and context retention are on another level.

The RX 7900 XTX (24GB / about ¥120k–150k) blows past NVIDIA on price per gigabyte of VRAM, but running AI stably calls for a Linux environment. On Windows, be ready for some configuration.

The RTX 5080 (16GB / about ¥190k–300k) is top-class in speed, but with only 16GB of VRAM it can’t run 32B models. “A fast 14B” or “a VRAM-rich 32B”: this is the biggest fork in the road.

Value: ★★★★★ (the best value tier if you’re serious about local AI)

[kimono_product id="15763"]

[kimono_product id="15771"]

¥400k–610k tier (RTX 5090 32GB)

[kimono_product id="15772"]

What you can do with 32GB of VRAM:

Run 32B models comfortably at very long context (32K+ tokens). Even 32GB isn’t enough for 70B models (which need 45GB+).

Overkill to buy purely for local AI. It’s for “the works” crowd who want to do VR (120Hz max settings) + AI image generation (FLUX Dev) + local LLM (32B) all on one card.

Value: ★★☆☆☆ (makes sense for the works, too expensive for AI alone)

[kimono_product id="15772"]

Value-for-money charts by GPU

Local LLM value ranking

How to read this chart: a longer bar means higher performance for the price, that is, better value. The value metric is “practical performance score ÷ price (in ¥10k units).”

[kimono_bar title="" color="#1e90ff"]
RTX 5090 32GB [New]|2.4
RTX 4090 24GB [Used]|2.9
RX 7900XTX 24GB [New]|3.3
RTX 5080 16GB [New]|3.5
RTX 4080S 16GB [Used]|4.2
RTX 5070Ti 16GB [New]|4.5
RTX 5060Ti 16GB [New]|4.7
RTX 4070TiS 16GB [Used]|5
RX 9070 16GB [New]|5
RTX 4060Ti 16GB [Used]|3.8
RTX 4070S 12GB [Used]|5.6
RTX 5060Ti 8GB [New]|5.7
RTX 5060 8GB [New]|5.8
RTX 3090 24GB [Used]|5.8
RTX 5070 12GB [New]|6
RTX 3080 12GB [Used]|7.5
RTX 3060 12GB [Used]|8
[/kimono_bar]

* How the practical performance score (out of 100) is computed: 50% for the ceiling model size you can run (VRAM-dependent), 30% for generation speed (bandwidth-dependent), and 20% for context-length headroom (VRAM-headroom-dependent), weighted and summed. Dividing that score by GPU price (in ¥10k units) gives the value metric. The bigger the number, the more performance per ¥10k.
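The weighting described in this note can be written out as below. This is a minimal sketch of the stated 50/30/20 weighting and the division by price. The function names are made up, and the three sub-scores in the example (60, 40, 25) are hypothetical inputs chosen only to show the arithmetic, since the article does not list the sub-scores for each GPU.

```python
def practical_score(model_size_score: float,
                    speed_score: float,
                    context_score: float) -> float:
    """Weighted sum of three 0-100 sub-scores: 50% model size, 30% speed, 20% context."""
    return 0.5 * model_size_score + 0.3 * speed_score + 0.2 * context_score


def value_metric(score: float, price_yen: float) -> float:
    """Practical performance score per ¥10k of GPU price."""
    if price_yen <= 0:
        raise ValueError("price_yen must be positive")
    return score / (price_yen / 10_000)


# Hypothetical sub-scores for a card priced at ¥100k
score = practical_score(60, 40, 25)
print(score, value_metric(score, 100_000))
```

A card with the same score at half the price would get twice the value metric, which is why inexpensive used cards rank so high on the chart.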

What the chart tells us

  1. Among new 16GB cards, the RX 9070 (16GB / ¥80k) is the best value. But AMD’s AI stack (ROCm) is Linux-recommended, and on Windows setup takes effort
  2. Among new 16GB NVIDIA cards, the RTX 5060 Ti 16GB (¥90k–110k) is the value champion. It’s the cheapest way to get 16GB of VRAM and run a 14B model practically
  3. The RTX 5080 (¥190k–300k) and RTX 5090 (¥400k–610k) are poor value. Performance is high but so is the price, so the metric comes out low. They’re for people with budget to spare, or who double up with non-AI uses (VR, gaming)
  4. The RTX 5090 (¥400k–610k) is for “the works.” Too expensive to buy for LLMs alone, but it makes sense if you’re combining VR + image gen + LLM

Model value ranking (quality per VRAM)

How to read this chart: a longer bar means “higher quality for less VRAM,” that is, better value. The metric is “quality (5-point scale) ÷ required VRAM (GB) × 10.” The ★ 14B models (Qwen 3 14B, DeepSeek-R1 14B) are the practical line. Models above them have higher quality but need more VRAM, so their value metric drops.

[kimono_bar title="" color="#1e90ff"]
Gemma 3 27B (22.5GB)|2
Qwen 3 32B (22.2GB)|2
Gemma 3 12B (12.4GB)|2.8
★ DeepSeek-R1 14B (11.0GB)|3.2
★ Qwen 3 14B (10.7GB)|3.7
Llama 3.1 8B (6.2GB)|4
Gemma 3 4B (3.6GB)|5.6
Qwen 3 8B (5.2GB)|5.8
[/kimono_bar]

* The 14B models need about 10–11GB of VRAM and score 4.0/5.0 on quality. Models of 8B and under need less VRAM, so their value metric looks high, but their quality is “so-so.” Consider the absolute quality level, not just the value metric. Personally, I feel 14B and up is the minimum line for practical quality.
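The model value metric is simple enough to check by hand. A minimal sketch, using the quality score (4.0) and VRAM figure (10.7GB) given for Qwen 3 14B in this article; the function name is made up for illustration.

```python
def model_value(quality: float, vram_gb: float) -> float:
    """Quality (5-point scale) per GB of required VRAM, scaled by 10."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    return quality / vram_gb * 10


# Qwen 3 14B: quality 4.0, about 10.7GB of VRAM
print(round(model_value(4.0, 10.7), 1))  # 3.7, the value shown on the chart
```

Because VRAM sits in the denominator, small models score well on this metric almost regardless of quality, which is the caveat the note above spells out.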

Tokens and text length, roughly

Throughout the article I use the unit “tok/s” (tokens per second). A token is a chunk of text: very roughly, on the order of a word or a few characters. At 126 tok/s the text comes out far faster than anyone can read.

Value quick-reference

Budget | GPU | Runs | Quality | Recommendation
¥60k–70k | RTX 5060 Ti 8GB | 8B | So-so | Try it out
¥90k | RTX 5060 Ti 16GB | 14B | Good | Best entry
¥80k | RX 9070 16GB | 14B | Good | For Linux users
¥160k | RTX 5070 Ti 16GB | 14B (fast) | Good | Best for doubling up
¥120k–150k | RX 7900 XTX 24GB | 32B | Very good | For serious AI
¥200k | RTX 5080 16GB | 14B (fastest) | Good | Speed-focused
¥400k+ | RTX 5090 32GB | 32B (with room) | Very good | The works

So which should you actually buy?

If you just want to try it: install LM Studio or Ollama on the PC you already have. It runs on CPU and main memory alone, even without a GPU. Speed drops to roughly a tenth to a twentieth of a GPU, but it’s fast enough to read along as the text appears. That’s plenty to experience “so this is local AI.” If it makes you want “faster, smarter models,” then look at a GPU. There’s no rush; that order works fine.

* For reference: running an 8B model CPU-only (no GPU) on my PC gave about 8 tok/s (AMD Ryzen 9 3950X / 64GB DDR4 / a CPU released in 2019). That’s about 1/16 of the 126 tok/s with a GPU. In CPU mode the model loads into main memory, so it won’t run if you don’t have enough. An 8B model uses about 5–6GB, so with OS overhead you want at least 16GB of RAM, ideally 32GB+. My PC has a generous 64GB, so there was room to spare, but 8GB PCs may struggle. CPU inference speed also depends on memory bandwidth, so a newer PC with DDR5 should be a bit faster.

Runs even when VRAM is short: partial offload

LLMs have a trait that image-generation AI doesn’t. Image generation (Stable Diffusion, etc.) needs the whole model in VRAM to run, but an LLM can put just part of the model on the GPU and keep the rest in main memory. This is called “partial offload.”

For example, you can run a 27B model (which normally needs 18GB+ of VRAM) on a single 12GB GPU. It’s slower, but “better than not running at all” is an option you have.

Load method | On GPU | VRAM used | Speed
All layers on GPU (2-card split) | All 64 layers | 26.2GB | 25.5 tok/s
30 GPU layers + main memory | 30 of 64 layers | 11.8GB | 2.9 tok/s
15 GPU layers + main memory | 15 of 64 layers | 7.0GB | 2.1 tok/s
CPU only | 0 layers | 0GB | 1.7 tok/s

* Measured on qwen3.5:27b. GPU: RTX 3090 + RTX 3060 / CPU: Ryzen 9 3950X / 64GB DDR4. Measured April 2026.

Getting even half onto the GPU makes it faster than CPU-only (1.7 tok/s) — 2–3 tok/s. But compared with the whole thing on the GPU (25.5 tok/s), it drops to about a tenth, so it’s hard to call comfortable.

Because of this, even when your VRAM is “just barely short,” you don’t have to give up on a model over its size. If you can tolerate the speed, you can attempt models beyond your VRAM ceiling. Ollama automatically pushes whatever doesn’t fit in VRAM to main memory, so no special configuration is needed.
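If you want to pin the number of GPU layers yourself instead of relying on the automatic split, Ollama accepts a `num_gpu` option (the number of layers to place on the GPU) in the `options` field of an API request. The sketch below only builds the request body; the helper name is hypothetical, and this is an assumption about how such a comparison could be set up, not a description of how the table above was produced.

```python
import json
from typing import Optional


def build_generate_request(model: str, prompt: str,
                           gpu_layers: Optional[int] = None) -> dict:
    """Build a body for Ollama's /api/generate, optionally fixing the GPU layer count."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if gpu_layers is not None:
        # num_gpu = number of model layers to offload to the GPU; 0 means CPU only
        body["options"] = {"num_gpu": gpu_layers}
    return body


# 30 of the model's layers on the GPU, the rest in main memory
print(json.dumps(build_generate_request("qwen3.5:27b", "Hello", gpu_layers=30), indent=2))
```

Leaving `gpu_layers` unset keeps the automatic behavior described above.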

To start with a 14B model: the RTX 5060 Ti 16GB (about ¥90k–110k). It’s the most affordable 16GB NVIDIA in this range. But it’s in short supply and prices are climbing, so grab one while stock lasts.

[kimono_product id="15760"]

To run a 32B model: the RX 7900 XTX (24GB / about ¥120k+) is the most realistic on price. Getting 24GB of VRAM for ¥120k is a steal as of April 2026. Note that AMD’s AI stack (ROCm) is Linux-recommended. If you want Windows, look at NVIDIA’s RTX 5080 (16GB / about ¥190k+) or a used RTX 3090 (24GB / ¥130k–200k).

Next steps

Once you’re running local AI, here’s what else you can do:

  • AI image generation: run ComfyUI on the same GPU to make images from text
  • Voice AI: transcribe with Whisper, read aloud with TTS
  • Coding help: use a local LLM as a Copilot replacement via the Continue extension in VS Code
  • Combine with VR: talk to AI avatars, put AI to work inside VR spaces

As a foundation for “bridging the virtual world and reality,” a home GPU is the most versatile investment there is.

The specs and prices in this article are as of April 2026.