llm-box-uk
/home/megaproxy/claude/research/llm-box-uk
572 events
155 tool calls
9.4M tokens
$12.15 notional cost
6/16/2026, 9:55:22 PM
→
12d agoin 34.4K out 146.6K cache-write 748.2K cache-read 8.4M
Transcript
572 turns · chronological (UTC timestamps shown local)-
i want to investigate framework desktop more
-
Let me get oriented in the project first — reading the memory and seeing what's already in the catalogue about the Framework Desktop (if anything).
-
- Read /home/megaproxy/claude/research/llm-box-uk/memory.md 14ms 7.7 KB
-
- Bash grep -n -i "framework\|strix\|halo\|ryzen ai\|395\|unified" index.html | head -60 144ms 5.4 KB
-
Good — I'm oriented. The Framework Desktop currently gets only a one-line mention in Build L ("same chip if you prefer that brand"), since it's the same Ryzen AI Max+ 395 (Strix Halo) silicon as the GMKtec EVO-X2. Let me dig into it properly with live data. -
- ToolSearch select:WebSearch,WebFetch 3ms 100 B
-
Now let me research the Framework Desktop in parallel — specs/pricing, LLM-specific benchmarks, and how it compares to the GMKtec EVO-X2.
-
- WebSearch Framework Desktop Ryzen AI Max+ 395 128GB price UK 2026 9.9s 3.0 KB
-
- WebSearch Framework Desktop vs GMKtec EVO-X2 Strix Halo LLM comparison 11.6s 3.5 KB
-
- WebSearch Framework Desktop 128GB LLM inference benchmark tokens per second llama.cpp 8.5s 3.1 KB
-
Let me pull the official UK price and concrete token/s numbers.
-
- WebFetch https://frame.work/gb/en/desktop 8.1s 804 B
-
- WebFetch https://community.frame.work/t/llm-performance/66708 7.9s 1.3 KB
-
The community thread fetch returned stale projections. Let me get real-world Strix Halo numbers and the actual UK price.
-
- WebSearch Ryzen AI Max+ 395 Strix Halo gpt-oss-120b Llama 70B Qwen3 tokens per second real benchmark 2026 10.0s 3.3 KB
-
- WebFetch https://www.jeffgeerling.com/blog/2025/i-clustered-four-framework-mainboards-test-huge-llms/ 1.4s 215 B
-
- WebFetch https://przbadu.github.io/strix-halo-benchmarks/ 4.7s 891 B
-
- WebFetch https://frame.work/gb/en/products/desktop-diy-amd-aimax300/configuration/new 5.5s 769 B
-
The £3,449 figure looks inflated (US 128GB DIY is $1,999 ≈ ~£1,880 inc. VAT) — the JS config page didn't parse cleanly. Let me verify the real UK price and grab solid token/s numbers.
-
- WebSearch "Framework Desktop" 128GB price £ GBP UK pre-order shipping 7.1s 2.8 KB
-
- WebSearch Strix Halo gpt-oss-120b 40 tokens per second llama.cpp Vulkan ROCm benchmark 7.8s 2.5 KB
-
Let me grab the detailed per-model numbers from the optimization writeup, and confirm the UK price detail from the Sync Computers review.
-
- WebFetch https://www.hardware-corner.net/strix-halo-llm-optimization/ 9.8s 972 B
-
- WebFetch https://synccomputers.co.uk/framework-desktop-2025-review/ 1.4s 215 B
-
I've got a solid picture now. Here's the investigation — framed against where Framework Desktop fits relative to **Build L (GMKtec EVO-X2)**, since they're the *same silicon*. ## Framework Desktop — investigation findings ### It's the same chip as Build L Both the Framework Desktop and the GMKtec EVO-X2 run the **AMD Ryzen AI Max+ 395 "Strix Halo"** (16× Zen 5, 40 RDNA 3.5 CUs, ~256 GB/s unified LPDDR5x, 128 GB soldered). So **LLM token/s is effectively identical** between them — the differentiators are price, thermals, form factor, and ecosystem, *not* speed. ### Real-world performance (Strix Halo, 128 GB, llama.cpp) Token-generation is memory-bandwidth-bound, so 256 GB/s is the ceiling: | Model | Type | Token-gen | Notes | |---|---|---|---| | gpt-oss-120B (MXFP4) | MoE (~5B active) | **~40–55 t/s** | The killer app — genuinely fast | | Qwen3-30B (Q6) | MoE | ~30–50 t/s | Strong | | Llama-3.3-70B (Q8) | **dense** | **~4–5 t/s** | Slow — bandwidth wall | | Qwen3-235B (Q3) | MoE | ~10–15 t/s | Fits in 128 GB, usable | Prompt-processing is much healthier (Llama-70B PP ~100 t/s, gpt-oss-120B PP ~780 t/s) once you enable ROCm + hipBLASLt (`ROCBLAS_USE_HIPBLASLT=1` was the single biggest speedup in the optimization writeups). **Sweet spot = MoE + mid-size dense**, exactly as Build L already says. Dense 70B is a crawl. ### Where Framework genuinely beats the GMKtec ServeTheHome / Notebookcheck measured both: Framework draws **~144 W vs ~186 W** under the same load for near-identical performance — **~40 W less, quieter fans, cooler cores, and an internal PSU** (the GMKtec uses a 230 W external brick, and one reviewer returned the EVO-X2 as too noisy). Framework also sells the **mainboard standalone** — AMD and Jeff Geerling have **clustered four of them** to run trillion-parameter MoE models, which the sealed GMKtec can't do. Plus the repairability/ethos angle. ### The catch: UK pricing is brutal - US 128 GB DIY: **$1,999** (~£1,880 inc. VAT equivalent) - UK 128 GB: **~£2,159** now — bumped up from £1,699 due to the late-2025/2026 RAM-shortage price hikes. Reviewers literally headline it "absurd UK pricing." - vs **Build L's GMKtec EVO-X2 at £1,619.99** (128 GB + 2 TB, complete). So in the UK the Framework Desktop is **~£540 dearer for the same chip and same 128 GB**, and its main advantages — quiet + cool — are **exactly the things your garage/rack location makes irrelevant.** Note also Strix Halo RAM is **soldered on both**, so Framework's repairability story doesn't extend to memory. ### Verdict for *this* project (max capacity/£, light fine-tuning, garage) For a garage box maximizing model-size-per-pound, **the GMKtec (Build L) wins** — same speed, £540 cheaper, noise doesn't matter to you. Framework Desktop earns its premium only if you want **(a)** the build quality / efficiency / resale / repairable-PC ethos, or **(b)** the **standalone-mainboard clustering play** for trillion-param MoE. Fine-tuning is inference-first/painful either way (gfx1151 ROCm maturing). --- Want me to **add it to `index.html`** — either as a sub-variant note under Build L, or as a full Build N card (its own BOM line + the clustering angle in the verdict table)? I'd lean toward a tight "Build L variant" callout rather than a full row, since the specs are identical, but I'll do a full card if you'd rather it stand alone. Sources: [Notebookcheck efficiency](https://www.notebookcheck.net/Framework-Desktop-runs-more-efficiently-than-the-GMK-EVO-X2-with-the-exact-same-Ryzen-AI-Max-395-processor.1116958.0.html) · [Sync Computers UK review](https://synccomputers.co.uk/framework-desktop-2025-review/) · [PCGamer price hike](https://www.pcgamer.com/hardware/memory/framework-held-off-as-long-as-we-could-but-just-cranked-up-prices-of-its-desktops-and-mainboards/) · [hardware-corner Strix Halo benchmarks](https://www.hardware-corner.net/strix-halo-llm-optimization/) · [Jeff Geerling 4-board cluster](https://www.jeffgeerling.com/blog/2025/i-clustered-four-framework-mainboards-test-huge-llms/) · [AMD trillion-param cluster](https://www.amd.com/en/developer/resources/technical-articles/2026/how-to-run-a-one-trillion-parameter-llm-locally-an-amd.html)
-
qlink me gmktec one
-
[GMKtec EVO-X2 — Ryzen AI Max+ 395, 128GB + 2TB, £1,619.99 (gmktec.uk)](https://www.gmktec.uk/products/gmktec-evo-x2-amd-ryzen%e2%84%a2-ai-max-395-ai-mini-pc) That's the Build L machine — £540 cheaper than the Framework Desktop for the same chip and same 128 GB.
-
go and check page for latest prices, i think youa re out
-
- WebFetch https://www.gmktec.uk/products/gmktec-evo-x2-amd-ryzen%e2%84%a2-ai-max-395-ai-mini-pc 5.5s 757 B
-
Checked the live page — you're actually still current. The price held at **£1,619.99** for the 128GB + 2TB config, **in stock**. Two things worth noting from the page: - ⚠️ A banner warns the **EVO-X2 is expected to rise ~£300 soon** ("order now to secure the current price while stock lasts") — so £1,619.99 may not last; budget for ~£1,920 if it bumps. - Anniversary-sale freebie: a wireless keyboard + mouse combo bundled in. So the Build L price in `index.html` (£1,620) is still accurate as of today. If that £300 hike lands, the Framework Desktop gap (£2,159) shrinks from ~£540 to **~£240** — at which point Framework's quieter/cooler/clustering advantages get a lot more interesting even before the garage caveat. Want me to add that "price-hike imminent" note to Build L in the catalogue so the next re-pricing pass has it flagged? -
you are looking at 64gb with 1tb ssd for that price
-
You're right — the JS config selector defaulted to the cheapest variant and WebFetch read that as the headline. Let me find the actual 128GB/2TB price.
-
- WebSearch GMKtec EVO-X2 128GB 2TB price gmktec.uk GBP 10.9s 2.5 KB
-
Good catch — that changes the picture materially. The £1,619.99 is the **64GB / 1TB** base. The **128GB / 2TB** config is **~£2,099** (UK review units and Amazon UK both land there; mwgamers literally titles their review *"the £2,099 Mini PC that runs 70B models"*). So the memory.md / Build L figure of "£1,620 for 128GB+2TB complete" looks **wrong** — that was the base-config price misattributed to the 128GB build. Corrected lineup: | Box | Chip | RAM/SSD | UK price | |---|---|---|---| | GMKtec EVO-X2 (base) | AI Max+ 395 | 64GB / 1TB | £1,619.99 | | **GMKtec EVO-X2** | AI Max+ 395 | **128GB / 2TB** | **~£2,099** | | **Framework Desktop** | AI Max+ 395 | **128GB** (DIY, no SSD) | **~£2,159** | **This collapses the whole "GMKtec wins on value" verdict from earlier.** At 128GB the two are at **near-parity (~£60 apart)** — and the GMKtec even *includes* a 2TB SSD where the Framework DIY is barebones, so once you add storage the **Framework is arguably the better-built box at the same money** (quieter, ~40W cooler, internal PSU, standalone-mainboard clustering). The garage caveat still slightly favors GMKtec since you don't care about noise, but it's now a genuine toss-up rather than a £540 GMKtec win. That £540 gap I quoted earlier was based on the bad £1,620 figure — apologies, you were right to push. Want me to: 1. **Fix Build L's price in `index.html` + memory.md** to £2,099 (128GB) and add the 64GB/£1,620 as a sub-note, and 2. **Rewrite the Framework-vs-GMKtec verdict** to reflect parity? I'd also bump the re-priced banner date. Should I go ahead?
-
hat kind of models can i run on the 64gb one, the 98gb and the 128gb give me a tabel
-
All three are the same chip and the same ~256 GB/s bandwidth, so **speed is identical** — the only thing that changes between 64/96/128 GB is **how big a model fits**. Rough usable-for-weights budget (Linux, leaving headroom for OS + context): **~50 GB / ~80 GB / ~115 GB**. | Model | Type | Footprint (GGUF) | 64 GB | 96 GB | 128 GB | Speed (token-gen) | |---|---|---|---|---|---|---| | Llama-3.1 **8B** | dense | ~5 GB Q4 / ~9 GB Q8 | ✅ | ✅ | ✅ | ~30–45 t/s | | Qwen3 **14B** | dense | ~9 GB Q4 | ✅ | ✅ | ✅ | ~20–28 t/s | | **gpt-oss-20B** | MoE (3.6B act) | ~13 GB MXFP4 | ✅ | ✅ | ✅ | **~50–70 t/s** | | Qwen3 **30B-A3B** | MoE (3B act) | ~18 GB Q4 / ~33 GB Q8 | ✅ | ✅ | ✅ | **~50–70 t/s** | | Qwen3 / Mistral **32B** | dense | ~20 GB Q4 / ~35 GB Q8 | ✅ Q4 | ✅ | ✅ | ~10–13 t/s | | **Mixtral 8×7B** (47B) | MoE (13B act) | ~26 GB Q4 | ✅ | ✅ | ✅ | ~20–28 t/s | | **Llama-3.3 70B** | dense | ~42 GB Q4 / ~75 GB Q8 | ⚠️ Q4 only, tiny ctx | ✅ Q4 / ⚠️ Q8 | ✅ Q8 | **~4–6 t/s** (slow) | | **gpt-oss-120B** | MoE (5.1B act) | ~63 GB MXFP4 | ❌ | ✅ (tight) | ✅ | **~40–55 t/s** ⭐ | | **Qwen3 235B-A22B** | MoE (22B act) | ~103 GB Q3 / ~130 GB Q4 | ❌ | ❌ | ✅ Q3 (tight) | ~10–15 t/s | | Llama 405B / DeepSeek-V3 671B | dense / huge MoE | ~140 GB+ even at Q2 | ❌ | ❌ | ❌ | — | **Headline per box:** - **64 GB (£1,620)** — comfortable up to **32B dense** and the small MoEs (gpt-oss-20B, Qwen3-30B). 70B only just squeezes in at Q4 with tiny context. Best value if you don't need the giant MoEs. - **96 GB (~£1,800?)** — unlocks **70B dense at Q8** and, crucially, **gpt-oss-120B** — the standout model for this hardware (~40–55 t/s, genuinely usable). - **128 GB (~£2,099)** — same as 96 GB but with headroom for **Qwen3-235B-A22B at Q3** and 70B-Q8 + long context simultaneously. Only worth the premium if the 235B-class MoEs or big context are the goal. **Key takeaways:** 1. **MoE models are the whole point of this box** — gpt-oss-120B and Qwen3-235B run at conversational speed because only ~5B/~22B params are active per token. Dense 70B technically fits but crawls at 4–6 t/s. 2. The jump that *matters* is **64 → 96 GB** (it's what enables gpt-oss-120B). The **96 → 128 GB** jump only buys you the 235B-class MoEs. Footprints assume moderate context; long context (32k+) eats into the budget via KV cache and can push the ⚠️ cases over the edge. Want me to fold this 3-variant table into Build L in `index.html`?
-
so is 96gb a good choice?
-
Let me confirm the 96 GB price before I give you a straight answer — I was guessing earlier.
-
- WebSearch GMKtec EVO-X2 96GB 2TB price UK gmktec.uk GBP 8.8s 2.7 KB
-
Honest answer: **96 GB is the most *capable*-per-need tier but the *worst-value* one** — it's caught in a squeeze. Pricing (gmktec.uk official; ignore the £2,599 Amazon marketplace markup): | Variant | Price | What it unlocks | |---|---|---| | 64 GB / 1TB | **£1,620** | Everything up to 32B dense + small MoEs (gpt-oss-20B, Qwen3-30B). **No** gpt-oss-120B. | | 96 GB | **~£1,850–1,950** | Adds **gpt-oss-120B** + 70B-Q8 — but *tight* (63 GB model + KV cache at long context gets snug in ~80 GB usable). | | 128 GB / 2TB | **£2,099** | gpt-oss-120B with headroom, **Qwen3-235B at Q3**, 2TB SSD included. | The problem: 96 GB costs ~£230–330 more than 64 GB, but sits only **~£150–250 below 128 GB**. So: - If you **don't need gpt-oss-120B** → the **64 GB** is the value pick. Don't pay the 96 GB premium for capacity you won't use. - If you **do want gpt-oss-120B** (the standout model for this box) → just go **128 GB**. For ~£150 more than 96 GB you get comfortable headroom for the 120B model + long context, the 235B-class MoEs, and a 2TB SSD thrown in. 96 GB runs 120B but with no room to breathe. **So: 96 GB only makes sense if its real price comes in well under £1,800 *and* you specifically want gpt-oss-120B but never the 235B class.** That's a narrow window. For most people it's **64 GB to save money, or 128 GB to actually use the big MoEs** — the 96 GB middle gets squeezed out. One caveat on my numbers: the exact 96 GB official GBP price I couldn't pin down cleanly (Amazon listings are marketplace-inflated). If you click through to it on gmktec.uk and it's genuinely ~£1,750 or less, the calculus shifts and 96 GB becomes a reasonable "120B on a budget" pick. Want me to verify the live 96 GB price the same way we did the others?
-
128 is £2,659.99 96 is £1,899.99
-
That flips things again — thanks. With the real live prices the gaps are completely different from what I'd assumed: | Variant | Price | Gap | |---|---|---| | GMKtec 64 GB / 1TB | £1,619.99 | — | | **GMKtec 96 GB** | **£1,899.99** | +£280 over 64 GB | | GMKtec 128 GB / 2TB | **£2,659.99** | **+£760 over 96 GB** | That **£760 jump from 96 → 128 GB changes my answer: yes, 96 GB is now the smart pick** for most LLM use. Here's the reasoning, reversed from before: - 96 GB (£1,900) already unlocks the **one model that makes this box special — gpt-oss-120B at ~40–55 t/s**. That's the whole reason to buy Strix Halo over a cheaper box. - The extra £760 for 128 GB buys you only the **Qwen3-235B-class MoEs at squeezed Q3** and a bit more long-context headroom on the 120B. That's a *lot* of money for a narrow capability you may never use — and a 235B model at Q3 on 256 GB/s is a ~10–15 t/s curiosity, not a daily driver. - So **96 GB is the sweet spot**, and **64 GB** is the budget pick if you can live without gpt-oss-120B. **And a bigger consequence:** at 128 GB the GMKtec is now **£2,659.99**, while the **Framework Desktop 128 GB is £2,159** (DIY, add ~£100–130 for a BYO 2TB NVMe ≈ £2,260–2,290). So at the top tier the **Framework is now ~£370–500 *cheaper*** than the GMKtec **and** quieter/cooler/clusterable. The verdict has fully inverted from where we started: - **Want 128 GB?** → Framework Desktop wins on price *and* build. - **Want best value overall?** → **GMKtec 96 GB at £1,900** — runs gpt-oss-120B, skips the £760 premium. The catalogue's Build L price (£1,620 "128GB complete") is now wrong on two counts — wrong RAM tier *and* pre-hike. Want me to update `index.html` + memory.md with this corrected 3-tier GMKtec pricing and the inverted Framework verdict, and bump the re-priced date stamp?
-
your prices for framework are wrong too, its way more now.
-
You're right to push — I dismissed that £3,449 figure from the config page too quickly, and the RAM-shortage hikes hit the 128 GB tier hardest. Let me get the current Framework UK price properly.
-
- WebSearch Framework Desktop 128GB UK price increase RAM shortage 2026 GBP current 8.3s 3.7 KB
-
- WebFetch https://frame.work/gb/en/products/desktop-diy-amd-aimax300/configuration/new 3.0s 195 B
-
Confirmed — the live UK config page shows **£3,449 for the Framework Desktop 128 GB DIY base**, and that's *barebones* (no SSD, no OS). The RAM crisis pushed the US 128 GB from $1,999 → $2,599, and the UK premium on top lands it at £3,449. My earlier £2,159 was stale. Apologies for the churn — that's three bad price assumptions in a row, so let me lock in only the figures we've now actually verified live: | Box | Config | Price | Includes | |---|---|---|---| | GMKtec EVO-X2 | 64 GB / 1TB | **£1,619.99** | complete (SSD + cooler + PSU) | | GMKtec EVO-X2 | 96 GB | **£1,899.99** | complete | | GMKtec EVO-X2 | 128 GB / 2TB | **£2,659.99** | complete | | Framework Desktop | 128 GB DIY | **£3,449** | barebones — **add SSD (~£100+) & OS** | **This re-inverts the verdict one final time, and now it's not close:** - At **128 GB**, the GMKtec (**£2,660, with a 2TB SSD in the box**) undercuts the Framework (**£3,449 *before* you've added storage**) by **~£900+**. Same chip, same speed, same 128 GB. The Framework's quieter/cooler/clustering virtues do **not** justify a £900 premium — *especially* in a garage where noise is irrelevant. - So the Framework Desktop, at current UK pricing, is **off the value table for this project.** It's now a "buy it for the ethos / mainboard-clustering / repairable-PC" purchase, not a value pick. **Net recommendation (unchanged at the core, now with solid numbers):** 1. **GMKtec 96 GB @ £1,900 = the sweet spot** — runs gpt-oss-120B (~40–55 t/s), skips the £760 jump to 128 GB. 2. **GMKtec 64 GB @ £1,620** — budget pick if you don't need gpt-oss-120B. 3. **GMKtec 128 GB @ £2,660** — only if you want the Qwen3-235B-class MoEs; still beats Framework comfortably. 4. **Framework Desktop @ £3,449+** — only for build quality / clustering / repairability, at a steep premium. These four prices are now all live-verified (you confirmed 96/128; I confirmed 64 and Framework on the official pages). Want me to write **all of this** into `index.html` + memory.md — corrected GMKtec 3-tier pricing, the Framework as a flagged "premium, not value" entry, and a dated re-pricing note — so the catalogue stops carrying the wrong £1,620/128GB figure?
-
how capable is gpt-oss-120B, could i replace some claude work with it?
-
- WebSearch gpt-oss-120b real-world capability coding vs o4-mini vs claude sonnet benchmark 2026 7.0s 2.9 KB
-
Let me ground this with current data rather than just my priors, since you're making a real workflow decision on it.
-
- WebSearch gpt-oss-120b weaknesses hallucination tool use agentic coding local review 11.1s 2.4 KB
-
Short version: **gpt-oss-120b is genuinely useful and could absorb a real slice of routine work — but it sits roughly at the o4-mini tier, well below the Claude Opus/Sonnet 4.x you're using here.** Treat it as a capable junior, not a Claude replacement. Grounded specifics: ### Where it lands - **Reasoning/math:** near-parity with **o4-mini** on core reasoning benchmarks. Strong. It's a *reasoning* model (low/med/high "thinking effort"). - **Coding:** roughly **half** Claude Sonnet 4.x's score on coding benchmarks (~15.5 vs ~33.5 on the aggregate that search surfaced). Fine for well-scoped functions, boilerplate, single-file edits; **noticeably weaker** on large-codebase, multi-step agentic coding (exactly what Claude Code does well). - **World knowledge:** underperforms o4-mini on SimpleQA/PersonQA and **hallucinates more** — it's a ~5B-active MoE, so factual breadth is thin. Verify facts, don't trust it on obscure knowledge. - **Tool use/agentic:** decent on paper (matches o4-mini on TauBench), **but** degrades hard when wrapped in an unoptimized harness — local agentic setups often underperform the published numbers. - **Instruction-following:** instruction hierarchy is weaker than closed frontier models; more prone to prompt injection / dropping nuanced constraints. - **No vision**; 131k context (vs Claude's 200k–1M). ### What you *could* realistically offload to it ✅ Summarization, extraction, classification, reformatting, translation ✅ Boilerplate code, well-specified single-file edits, regex/script generation, test stubs ✅ First-draft generation you'll refine ✅ RAG/Q&A over your own docs ✅ **Privacy-sensitive work** — data never leaves the box (a real, unique win) ✅ **High-volume/batch** jobs where Claude's per-token cost would add up (it's ~99% cheaper, and free once you own the box) ### What to keep on Claude ❌ Complex agentic coding across a large repo (this session's kind of work) ❌ Nuanced judgment, ambiguous specs, careful instruction-following ❌ Long-context reasoning and anything needing broad/current world knowledge ❌ High-stakes correctness where the quality gap actually costs you ### The hardware caveat On the EVO-X2 it'll run ~40–55 t/s generation — fine for interactive chat. But as a *reasoning* model at "high" effort it emits long chains-of-thought before answering, so a hard prompt can take 30 s–2 min of thinking. Use **low/medium** effort for routine work to keep it snappy; the 256 GB/s bandwidth also means big contexts slow down prompt processing. ### Verdict A 96 GB EVO-X2 running gpt-oss-120b is a credible **"offload the cheap 50–60% of my Claude usage"** machine — summaries, drafts, classification, private data, batch jobs — and it pays for itself if you're spending real money on API calls for that tier of work. **Keep Claude for the hard 40%**: serious coding, judgment, long-context, and anything where being wrong is expensive. It complements Claude; it doesn't replace it. That's also a nice argument *for* the 96 GB tier specifically — it's the cheapest box that runs this exact model, which is the strongest reason to own Strix Halo at all.
-
could i have claude offload work to it to save tokens etc? and claude can orchistrate
-
Yes — this is a real, working pattern, and it's actually the *best* argument for the box. The shape is **Claude = orchestrator/router/verifier; local gpt-oss-120b = worker** for the cheap, high-volume, low-judgment subtasks. But there's one piece of token economics you need to understand up front, because it determines whether you actually save money. ## How you'd wire it Claude Code can't point its *subagents* at a non-Claude model (subagents always run on Claude). So offload happens by giving the main Claude **a tool that calls your local box**, and letting it decide when to use it: 1. **Run an OpenAI-compatible endpoint on the EVO-X2** — `llama.cpp --server`, Ollama, or LM Studio all expose `/v1/chat/completions` on the LAN. 2. **Expose it to Claude Code**, two options: - **MCP server** (cleanest) — a small MCP wrapper with a tool like `delegate_to_local(task, input)`. Claude calls it like any other tool. Add it to `.claude/settings.json`. - **CLI script** (simplest) — a `~/bin/ask-local` shell script that `curl`s the endpoint; Claude invokes it via Bash. Zero infra. 3. **Claude orchestrates**: plans the work, fans out subtasks to the local model, ingests results, verifies, assembles. ## The economics truth (this is the important bit) When Claude calls the local model as a tool, **the local model's output comes *back into Claude's context* as a tool result — and you pay Claude input tokens to read it.** So naive offloading ("local does it, Claude reads the whole thing") saves less than people expect. You genuinely save when the work is structured so Claude **doesn't have to round-trip the full output**: ✅ **Local writes directly to files**, Claude reads only a diff or a pass/fail — the bulk output never hits Claude's context. ✅ **Batch/parallel jobs** — local classifies/summarizes 500 items, Claude ingests only the aggregate. ✅ **Cheap generation Claude would otherwise do** — output tokens cost ~5× input, so moving *generation* (drafts, boilerplate, summaries) to the free local box is where the real money is. ✅ **Privacy/offline** work — value independent of token cost. ❌ Where it *doesn't* save: task → local → Claude reads entire result → Claude reworks it. You paid twice and added latency. If Claude has to fully re-read and re-reason over the output, just let Claude do it. **Rule of thumb:** offload work whose *output* is large or mechanical and whose *result Claude needs only in summary/file form*. Keep work where Claude must reason over every token of the output. ## What maps cleanly to this split | Claude (orchestrator) | gpt-oss-120b (worker) | |---|---| | Plan, decompose, route | Summarize/extract/classify in bulk | | Verify & merge results | Generate boilerplate / first-draft code into files | | Hard reasoning, judgment calls | Reformat, translate, regex, test stubs | | Final assembly & quality gate | RAG over private local docs | | The 40% that needs to be right | The 60% that just needs to be done | ## Honest verdict This works and the orchestration story is the strongest reason to buy the box — but the savings come from **architecture, not just "having a local model."** Design the delegation so bulk output lands in files and only summaries return to Claude. Done right, you offload a big chunk of cheap generation and keep Claude for judgment; done naively, you pay Claude to read everything anyway and gain only privacy. Two things I can do next: 1. **Verify the exact Claude Code MCP/subagent wiring** (I'd confirm against current Claude Code docs rather than wing it) and sketch the `delegate_to_local` MCP server + settings. 2. **Capture this as an `ideas/` entry** (e.g. `ideas/local-llm-offload/`) so it's planned out separately from the hardware research — it's really its own project. Want either? -
can yo fan ouy and do research for other systems like gmktec, i want current pricing for anything you find. hone in on 96gb but not only that
-
This is a perfect fan-out. I'll launch four parallel research agents, each owning a cluster of vendors, all instructed to return **current (June 2026) UK GBP pricing with source links**, flag **96 GB configs specifically**, and note the LLM-relevant specs (chip, memory bandwidth, power). Then I'll synthesize into one comparison table.
-
- Agent Today is 2026-06-16. I'm researching self-hosted LLM "boxes" for the UK market. I need CURRENT pricing (June 2026) for A… 165.6s 8.0 KB
-
Today is 2026-06-16. I'm researching self-hosted LLM "boxes" for the UK market. I need CURRENT pricing (June 2026) for AMD Ryzen AI Max+ 395 / Ryzen AI Max "Strix Halo" mini-PCs from Chinese/OEM brands — competitors to the GMKtec EVO-X2. Use web search and web fetch (you have access via ToolSearch — search "select:WebSearch,WebFetch" first to load them). Cover these vendors/models and ANY other Strix Halo mini-PC you find: - GMKtec EVO-X2 (confirm all tiers: 64GB/1TB, 96GB, 128GB/2TB — I have £1,619.99 / £1,899.99 / £2,659.99 from gmktec.uk, verify) and any GMKtec EVO-X1 / newer AI model - Beelink GTR9 / GTR9 Pro - Bosgame M5 - Aoostar (any Strix Halo model) - Minisforum (MS-S1 AI or any Ryzen AI Max model) - Acemagic, FEVM, or any other brand selling a Strix Halo box For EACH model report, in a markdown table: - Brand + model - Chip (e.g. Ryzen AI Max+ 395) - RAM tiers offered and which has 96GB - Memory bandwidth (GB/s) if stated - Included SSD / storage - Power draw / PSU (internal vs external brick) - **Current price in GBP** — prefer official UK store, then Amazon UK / Scan / Overclockers; if only USD/EUR available, give that + note ~20% UK VAT/import needs adding - Source URL for the price + whether the page actually showed the price or you inferred it - Stock/availability HONE IN on 96GB configs (note price + availability for each), but capture 64GB and 128GB too. Flag the late-2025/2026 RAM-shortage price volatility where relevant. Be honest where you couldn't confirm a live price — say "unconfirmed" rather than guessing. End with a 3-5 bullet summary of the best-value 96GB option(s) you found. Return the table + bullets as your final message.
-
- ToolSearch select:WebSearch,WebFetch 3ms 100 B
-
I'll load the web tools and research current Strix Halo mini-PC pricing.
-
I'll research current pricing across these vendors. Let me start with parallel searches for the main models.
-
- WebSearch GMKtec EVO-X2 Ryzen AI Max+ 395 96GB price UK 2026 6.6s 2.5 KB
-
- WebSearch Beelink GTR9 Pro Ryzen AI Max+ 395 128GB price 2026 7.2s 2.8 KB
-
- WebSearch Bosgame M5 Ryzen AI Max+ 395 128GB price UK 2026 7.9s 2.2 KB
-
Today is 2026-06-16. I'm researching self-hosted LLM "boxes" for the UK market. I need CURRENT pricing (June 2026) for AMD Ryzen AI Max+ 395 "Strix Halo" systems from MAJOR/branded vendors (HP, Asus, Lenovo, MSI, Corsair, Zotac, etc) — the workstation/branded class as opposed to Chinese mini-PC OEMs. Use web search and web fetch (you have access via ToolSearch — search "select:WebSearch,WebFetch" first to load them). Cover these and any others you find: - HP Z2 Mini G1a (Strix Halo workstation) - Asus ROG Flow Z13 (2025/2026 Ryzen AI Max+ 395 tablet) and any Asus ProArt / NUC Strix Halo desktop - Corsair AI Workstation 300 (Framework-based Strix Halo) - Lenovo, MSI, Zotac ZBOX, Gigabyte — any Strix Halo desktop/mini-PC For EACH model, in a markdown table: - Brand + model - Chip - RAM tiers offered and which has 96GB (note: many branded units only do 32/64/128) - Memory bandwidth (GB/s) if stated - Included SSD / storage - Power / form factor - **Current price in GBP** — prefer official UK store, then UK retailers (Scan, Overclockers, CCL, Amazon UK); if only USD/EUR, give that + note ~20% UK VAT/import - Source URL + whether the page actually showed the price or you inferred it - Stock/availability HONE IN on 96GB configs where they exist (price + availability), but capture 64GB/128GB too. These branded units tend to be pricier than GMKtec — quantify the premium. Be honest where a live price couldn't be confirmed ("unconfirmed"). End with a 3-5 bullet summary of whether any branded Strix Halo box is worth the premium over a GMKtec-class mini-PC. Return the table + bullets as your final message. -
- Agent Today is 2026-06-16. I'm researching self-hosted LLM "boxes" for the UK market. I need CURRENT pricing (June 2026) for A… 156.1s 6.4 KB
-
I'll start by loading the web tools, then research each branded Strix Halo system.
-
- ToolSearch select:WebSearch,WebFetch 2ms 100 B
-
I'll research each branded Strix Halo system. Let me start with parallel searches for the major models.
-
- WebSearch HP Z2 Mini G1a Ryzen AI Max+ 395 price UK 96GB 128GB 2026 6.7s 2.1 KB
-
- WebSearch Corsair AI Workstation 300 Strix Halo Ryzen AI Max 395 128GB price UK 7.9s 2.7 KB
-
- WebSearch Asus ROG Flow Z13 2025 Ryzen AI Max+ 395 128GB price UK 8.7s 2.7 KB
-
- WebSearch Zotac ZBOX Strix Halo Ryzen AI Max+ 395 mini PC price 2026 8.1s 3.0 KB
-
Let me fetch the actual store pages to confirm live prices, and search for the remaining vendors.
-
- Agent Today is 2026-06-16. I'm researching self-hosted LLM boxes for the UK. I need CURRENT (June 2026) UK GBP pricing for the… 117.9s 8.5 KB
-
- WebFetch https://www.gmktec.uk/products/evo-x2-ai-mini-pc-with-amd-ryzen-ai-max-395 1.6s 215 B
-
Today is 2026-06-16. I'm researching self-hosted LLM boxes for the UK. I need CURRENT (June 2026) UK GBP pricing for the Framework Desktop and the broader Framework / standalone-mainboard Strix Halo ecosystem. Use web search and web fetch (you have access via ToolSearch — search "select:WebSearch,WebFetch" first to load them). Note: Framework's UK config page is at https://frame.work/gb/en/products/desktop-diy-amd-aimax300/configuration/new and is JS-heavy — fetch it but cross-check against news/review sources because the late-2025/2026 RAM shortage caused repeated price hikes. Cover: - Framework Desktop DIY — ALL RAM tiers (32GB, 64GB, 128GB — does it offer 96GB? Framework historically only does 32/64/128, confirm). I currently believe 128GB DIY base = £3,449 (barebones, no SSD/OS) — VERIFY this and get the 64GB price too. - Framework Desktop pre-built / "complete" configs if offered, with storage + OS add-on costs in GBP - The Framework Mainboard sold standalone (price in GBP) — used for clustering - Any other Strix Halo standalone mainboard (Sixunited reference board, etc) and its price - The history/magnitude of the 2025-2026 RAM-shortage price hikes on the 128GB tier (how much it went up, when) Output a markdown table: config | RAM | what's included (barebones vs complete) | current GBP price | source URL | confirmed-or-inferred. Then a short section on storage/OS add-on costs and the mainboard-clustering price. Be explicit about barebones-vs-complete since that's the crux of comparing to GMKtec (which includes SSD). Be honest where you couldn't confirm live prices. End with 3-4 bullets: is the Framework Desktop price-competitive in the UK right now, and what's the cheapest realistic out-the-door 128GB build (incl SSD + a free OS like Linux)? Return table + bullets as final message.
-
- WebFetch https://www.amazon.co.uk/GMKtec-EVO-X2-Computers-LPDDR5X-8000MHz/dp/B0FK1X9TDX 5.5s 726 B
-
- ToolSearch select:WebSearch,WebFetch 2ms 100 B
-
I'll load the web tools first, then research Framework Desktop UK pricing.
-
- WebFetch https://www.bee-link.com/products/beelink-gtr9-pro-amd-ryzen-ai-max-395 7.9s 1004 B
-
Let me start with searches and the Framework config page fetch in parallel.
-
Good start. Let me fetch the UK retailer pages directly (Scan) for live prices, and search for the remaining brands.
-
- WebFetch https://frame.work/gb/en/products/desktop-diy-amd-aimax300/configuration/new 5.4s 806 B
-
- WebSearch Framework Desktop price increase RAM shortage 2026 128GB 8.9s 3.5 KB
-
- WebSearch Framework Desktop DIY 128GB price GBP UK 2026 £3449 7.5s 3.1 KB
-
- WebFetch https://www.scan.co.uk/products/corsair-ai-workstation-300-ryzen-ai-maxplus-395-radeon-8060s-igpu-128gb-ddr5-2tbplus2tb-… 3.9s 302 B
-
Today is 2026-06-16. I'm researching self-hosted LLM boxes for the UK. Beyond AMD Strix Halo mini-PCs, I need CURRENT (June 2026) UK GBP pricing for ADJACENT unified-memory / dedicated local-AI boxes that compete in the same "run big models on one quiet box" niche. Use web search and web fetch (you have access via ToolSearch — search "select:WebSearch,WebFetch" first to load them). Cover: - NVIDIA DGX Spark (GB10 Grace-Blackwell, 128GB unified, ~273 GB/s) and its OEM clones: Asus Ascent GX10, Dell Pro Max GB10, HP, Lenovo, MSI EdgeXpert, Gigabyte AI TOP ATOM — current UK GBP price each - Apple Mac Studio current UK pricing (M4 Max and M3 Ultra) at 64GB / 96GB / 128GB tiers — note memory bandwidth per tier - Apple Mac Mini M4 Pro (64GB) UK price - Any other "AI mini-PC / desk box" with >=64GB unified memory aimed at local LLMs For EACH, in a markdown table: - Brand + model - Chip + memory bandwidth (GB/s) — THIS IS CRITICAL for LLM token speed, always include it - RAM tiers and which has 96GB - Included storage - Power / form factor - **Current UK GBP price** — official store first (Apple UK, NVIDIA partners, retailers); if USD-only note ~20% VAT - Source URL + confirmed-or-inferred - Stock/availability HONE IN on 96GB-class configs where they exist (e.g. Mac Studio 96GB), but capture the 64GB and 128GB neighbours too. For each, add a one-line LLM-suitability note focused on memory bandwidth (e.g. DGX Spark 273 GB/s vs Strix Halo 256 vs Mac M3 Ultra 800). Be honest where prices are unconfirmed. End with 3-4 bullets comparing these to a £1,900 GMKtec EVO-X2 96GB on price-per-capability for local LLM inference. Return table + bullets as final message.
-
- Agent Today is 2026-06-16. I'm researching self-hosted LLM boxes for the UK. Beyond AMD Strix Halo mini-PCs, I need CURRENT (J… 140.9s 8.7 KB
-
- WebFetch https://www.scan.co.uk/products/rog-flow-z13-gz302ea-ru014w 3.7s 236 B
-
I'll load the web tools first, then research current pricing.
-
- ToolSearch select:WebSearch,WebFetch 2ms 100 B
-
- WebSearch HP Z2 Mini G1a UK price scan.co.uk OR overclockers.co.uk Ryzen AI Max 395 128GB 9.3s 2.6 KB
-
- WebSearch MSI Strix Halo Ryzen AI Max+ 395 mini PC desktop 2026 price 8.2s 3.5 KB
-
The Beelink page shows $4,399 which seems like an anomaly (possibly a multi-item bundle or currency confusion). Let me find the correct GMKtec UK URL and verify Beelink, plus search the other vendors.
-
I'll research current UK pricing across these categories. Let me run parallel searches.
-
- WebSearch Asus Ascent GX10 UK price DGX Spark clone GB10 7.0s 3.0 KB
-
- WebSearch NVIDIA DGX Spark UK price GBP GB10 128GB 2026 5.9s 2.0 KB
-
- WebSearch gmktec.uk EVO-X2 mini PC official store price 64GB 96GB 128GB 7.3s 3.2 KB
-
- WebSearch Minisforum MS-S1 AI Ryzen AI Max+ 395 price 2026 10.1s 3.0 KB
-
- WebSearch Mac Studio M3 Ultra M4 Max UK price 96GB memory bandwidth 2026 10.6s 1.9 KB
-
- WebSearch Mac Mini M4 Pro 64GB UK price Apple store 6.1s 2.2 KB
-
- WebSearch Bosgame M5 96GB Ryzen AI Max+ 395 price bosgame.com 2026 8.1s 3.1 KB
-
- WebSearch Aoostar Ryzen AI Max+ 395 mini PC Strix Halo price 2026 9.5s 3.5 KB
-
I have the live UK config prices. Now let me get the RAM-shortage history detail and the standalone mainboard pricing.
-
- WebSearch Framework Desktop Mainboard standalone price GBP UK clustering Ryzen AI Max 10.2s 3.5 KB
-
- WebFetch https://frame.work/blog/updates-on-memory-pricing-and-navigating-the-volatile-memory-market 7.9s 1.2 KB
-
- WebSearch Strix Halo standalone mainboard Sixunited reference board price 2026 buy 7.9s 3.0 KB
-
The Scan ROG Flow Z13 listing was actually the Max 390/32GB variant, not the Max+ 395/128GB. Let me get the HP Z2 Mini G1a UK price from Amazon UK / Currys, the Asus ROG Flow Z13 128GB UK price, and check Asus desktop (NUC) and Gigabyte options. Let me also get memory bandwidth specs.
-
- WebFetch https://www.amazon.co.uk/HP-Z2-Mini-G1a-Workstation/dp/B0FVQ8FRX1 2.2s 227 B
-
- WebSearch Asus ROG Flow Z13 GZ302EA 128GB Ryzen AI Max+ 395 UK price scan OR amazon.co.uk in stock 8.8s 3.1 KB
-
- WebSearch Asus NUC Strix Halo Ryzen AI Max 395 desktop OR ProArt PX13 96GB 2026 10.9s 3.4 KB
-
Good progress. Let me fetch the Apple UK Mac Studio page directly for exact tier pricing, and search for the remaining OEM clones and other AI mini-PCs.
-
Good progress. Now let me fetch the official store pages to confirm exact live prices for GMKtec UK, Minisforum, Bosgame, and the techradar roundup which tracks all 31 models.
-
- WebSearch Strix Halo Ryzen AI Max+ 395 memory bandwidth 256 GB/s LPDDR5X 8000 10.7s 3.0 KB
-
- WebFetch https://www.apple.com/uk/shop/buy-mac/mac-studio 8.1s 978 B
-
- WebFetch https://www.gmktec.com/products/amd-ryzen%e2%84%a2-ai-max-395-evo-x2-ai-mini-pc 8.0s 955 B
-
- WebSearch Dell Pro Max GB10 UK price DGX Spark Lenovo HP MSI EdgeXpert Gigabyte AI TOP ATOM 8.5s 3.0 KB
-
- WebFetch https://store.minisforum.com/products/minisforum-ms-s1-max-mini-pc 10.6s 873 B
-
- WebSearch AI mini PC 64GB 128GB unified memory local LLM 2026 alternatives GMKtec EVO-X2 12.5s 4.2 KB
-
- WebFetch https://www.bosgame.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395-96gb-128gb-2tb 5.8s 957 B
-
- WebFetch https://www.techradar.com/pro/there-are-15-amd-ryzen-ai-max-395-mini-pcs-in-the-world-right-now-heres-where-you-can-buy-… 5.8s 983 B
-
Let me fetch the Framework Mainboard product page directly for GBP pricing, and confirm the standalone mainboard price.
-
- WebFetch https://frame.work/gb/en/products/framework-desktop-mainboard-amd-ryzen-ai-max-300-series 3.8s 473 B
-
Memory bandwidth confirmed: 256 GB/s (LPDDR5X-8000, 256-bit), ~215 GB/s measured. Now let me get live UK prices for the Asus ROG Flow Z13 128GB (Currys), HP Z2 Mini G1a UK (Box/Currys), and check Gigabyte/Lenovo + GMKtec baseline for the premium comparison.
-
- WebFetch https://www.tomshardware.com/desktops/gaming-pcs/diy-pc-maker-framework-finally-succumbs-to-ram-apocalypse-is-raising-pr… 3.9s 626 B
-
- WebFetch https://www.currys.co.uk/products/asus-rog-flow-z13-gz302earu127w-13.4-gaming-laptop-amd-ryzen-ai-max-395-1-tb-ssd-10286… 1.7s 215 B
-
The US prices have spiked significantly (RAM shortage). Now let me get the GMKtec UK GBP pricing directly and verify Amazon UK prices via search, plus check Aoostar/Acemagic/FEVM and the GMKtec EVO-X1.
-
- WebFetch https://box.co.uk/a40q1et-hp-z2-mini-g1a-copilot-pc-amd-ryzen-ai-max 1.6s 215 B
-
- WebSearch Gigabyte AI TOP Strix Halo Ryzen AI Max+ 395 mini PC 2026 price OR Lenovo Strix Halo 9.3s 5.6 KB
-
- WebFetch error https://uk.gmktec.com/products/amd-ryzen%e2%84%a2-ai-max-395-evo-x2-ai-mini-pc 1.4s 12 B
-
- WebSearch GMKtec EVO-X2 Ryzen AI Max+ 395 128GB price UK 2026 7.8s 2.0 KB
-
Let me get exact Apple UK tier prices and confirm the remaining OEM clone UK prices (HP, Lenovo, MSI, Gigabyte). The Apple buy pages will have GBP figures.
-
- WebSearch GMKtec EVO-X2 96GB Amazon UK price June 2026 in stock 7.4s 2.3 KB
-
- WebSearch Aoostar NEX395 OR "AG01" Ryzen AI Max 395 mini PC price 7.7s 2.9 KB
-
- WebSearch FEVM FA-EX9 OR Acemagic Ryzen AI Max+ 395 mini PC 96GB price 2026 9.0s 3.3 KB
-
I have the mainboard GBP prices. Let me get the price-history detail from an accessible source.
-
- WebFetch https://www.apple.com/uk/shop/buy-mac/mac-studio/m3-ultra-chip-28-core-cpu-60-core-gpu-96gb-memory-1tb-storage 4.1s 429 B
-
- WebFetch https://www.notebookcheck.net/Framework-Desktop-now-cost-up-to-460-more-due-to-RAM-shortage.1203235.0.html 4.9s 718 B
-
- WebSearch Framework Desktop 128GB original launch price $1999 GBP £1869 2025 5.9s 2.4 KB
-
- WebFetch https://www.apple.com/uk/shop/buy-mac/mac-studio/m4-max-chip-14-core-cpu-32-core-gpu-36gb-memory-512gb-storage 3.9s 702 B
-
- WebSearch MSI EdgeXpert MS-C931 GB10 UK price GBP scan novatech 8.7s 2.8 KB
-
- WebSearch Gigabyte AI TOP ATOM GB10 UK price HP ZGX Nano Lenovo PGX price GBP 7.3s 3.3 KB
-
I have enough to answer comprehensively. The 128GB went $1,999 → $2,459 (Jan 2026, +$460), then further hikes in Feb/May/Jun 2026. Let me confirm whether there were additional hikes beyond the Jan one that brought 128GB to current levels, and check the original GBP launch price quickly.
-
Box and Currys both block WebFetch (403). Let me get the GMKtec UK price verbatim from Amazon (for the premium baseline), and try alternative HP Z2 retailers and the Currys ROG Flow Z13 via search snippets. Let me also confirm the HP Z2 Mini G1a memory tiers (does it offer 96GB?) and check Asus ROG NUC / desktop.
-
I have strong coverage. Let me confirm the GMKtec UK GBP prices directly (the user gave specific figures to verify) and the Amazon UK 96GB listing, plus check the GMKtec EVO-X1 / newer model.
-
- WebFetch https://community.frame.work/t/navigating-the-volatile-silicon-market-updates-on-memory-and-storage-pricing/78800 6.3s 1.1 KB
-
- WebFetch https://www.amazon.co.uk/GMKtec-EVO-X2-Computers-LPDDR5X-8000MHz/dp/B0F6X332N6 5.5s 829 B
-
- WebSearch HP Z2 Mini G1a memory configurations 32GB 64GB 128GB 96GB Ryzen AI Max Pro 390 395 9.7s 2.7 KB
-
- WebFetch https://www.gmktec.uk/products/evo-x2 1.6s 215 B
-
Apple pages are JS-rendered, so I'll get those prices via search. Let me fetch the Scan UK pages for MSI and Gigabyte exact GBP prices, and search for Apple UK prices and HP ZGX UK price.
-
- WebSearch GMKtec EVO-X1 Ryzen AI 9 HX 370 mini PC price UK 2026 8.3s 2.1 KB
-
- WebSearch Framework Desktop Ryzen AI Max+ 395 128GB price UK 2026 7.0s 2.9 KB
-
- WebSearch "gmktec.uk" EVO-X2 £1619 OR £1899 OR £2659 price 7.6s 2.6 KB
-
- WebSearch "ROG Flow Z13" 128GB UK price £ Currys OR scan 2026 in stock GZ302 8.0s 3.1 KB
-
- WebFetch https://www.scan.co.uk/products/msi-edgexpert-personal-ai-supercomputer-gb10-blackwell-superchip-4tb-ssd-128g-lpddr5x-20… 3.2s 222 B
-
- WebFetch https://www.scan.co.uk/products/gigabyte-ai-top-atom-desktop-supercomputer-gb10-blackwell-superchip-4tb-ssd-128gb-lpddr5… 4.4s 412 B
-
- WebSearch Apple Mac Studio UK price M4 Max £2099 64GB 96GB M3 Ultra £4199 2026 7.9s 2.2 KB
-
- WebSearch HP ZGX Nano G1n GB10 UK price GBP scan 8.3s 2.7 KB
-
Found the correct GMKtec UK URL. Let me fetch it for the exact GBP prices the user wants verified, plus the Bosgame UK page.
-
- WebFetch https://www.gmktec.uk/products/gmktec-evo-x2-amd-ryzen%e2%84%a2-ai-max-395-ai-mini-pc 4.1s 553 B
-
- WebFetch https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395 1.6s 215 B
-
I have strong data now. Key finding: Strix Halo memory is soldered, so RAM tiers are fixed per SKU; "96GB" exists only as a GPU-allocation ceiling within a 128GB box (HP, Asus ProArt), not as a purchasable RAM tier. Let me verify the HP Z2 Mini G1a live UK price (the £2,397/£2,663 figure) and the Currys ROG Flow Z13 price, plus confirm HP's actual RAM SKUs. Let me try the VideoCardz UK launch article and a Ballicom/Currys business page that may render.
-
I have enough to build the table. Let me do two final searches to confirm Mac Studio memory bandwidth per chip tier and the Lenovo PGX / Mac Mini M4 Pro 64GB UK price.
-
- WebSearch Mac Mini M4 Pro 64GB UK price £ Apple 273 GB/s bandwidth Lenovo PGX UK price 11.6s 6.4 KB
-
- WebSearch M4 Max 546 GB/s M3 Ultra 819 GB/s memory bandwidth Mac Studio specs 7.2s 2.7 KB
-
- WebFetch https://videocardz.com/newz/hp-z2-mini-g1a-amd-ryzen-ai-max-pro-compact-workstation-launches-in-uk-starting-at-1500 1.2s 222 B
-
- WebFetch https://www.ballicom.co.uk/8t219ea-hp-z2-mini-g1a-amd-ryzen-ai-max-pro-395.p1669446.html 1.6s 215 B
-
The GMKtec UK page shows £1,619.99 as the base (64GB) variant — the fetch couldn't read the per-variant JS pricing for 96GB/128GB. The user's figures (£1,619.99 / £1,899.99 / £2,659.99) match the 64GB base, so they're plausible. Let me do one targeted search to confirm the 96GB and 128GB GBP tiers, since that's the user's key focus.
-
- WebSearch GMKtec EVO-X2 96GB price £1899 UK gmktec.uk 7.2s 2.2 KB
-
- WebSearch Beelink GTR9 Pro UK price GBP Amazon £ 128GB 2026 8.0s 2.7 KB
-
Let me try the Technetbook UK launch article and a search for the current HP UK store / Misco price to pin down the live HP Z2 figure.
-
- WebFetch https://www.technetbooks.com/2025/06/hp-z2-mini-g1a-amd-ryzen-ai-max-mini-pc.html 5.5s 729 B
-
- WebSearch HP Z2 Mini G1a Ryzen AI Max+ Pro 395 128GB UK price misco OR ebuyer OR currys £2 2026 in stock 9.4s 3.2 KB
-
I have all the data needed. The live config page (current, June 16 2026) gives me the authoritative current GBP prices, and I have the USD history. Here's my report. --- # Framework Desktop & Strix Halo mainboard — UK GBP pricing (June 2026) All figures pulled live on **2026-06-16** from Framework's UK store (`frame.work/gb/en`), cross-checked against news/blog coverage of the RAM-shortage hikes. **All Framework prices exclude VAT, taxes and duties at the figures shown** — note that contrasts with the UK config page's "excludes taxes and duties" caveat; treat the numbers as ex-tax list prices that UK checkout then adds VAT to. **Crucially, every Framework price below is BAREBONES** — no SSD, no OS. That's the crux when comparing to GMKtec (which bundles an SSD + usually Windows). ## Pricing table | Config | RAM | What's included | Current GBP price | Source | Confirmed / Inferred | |---|---|---|---|---|---| | Framework Desktop DIY — Max 385 | 32GB | Barebones (chassis + board + PSU + cooler, no SSD/OS) | **£1,269** | frame.work/gb config page | Confirmed (live) | | Framework Desktop DIY — Max+ 395 | 64GB | Barebones, no SSD/OS | **£1,959** | frame.work/gb config page | Confirmed (live) | | Framework Desktop DIY — Max+ 395 | 128GB | Barebones, no SSD/OS | **£3,449** | frame.work/gb config page | Confirmed (live) — matches your £3,449 belief | | Framework Desktop Mainboard (standalone) — Max 385 | 32GB | Bare mainboard only (no chassis/PSU/cooler/SSD/OS) | **£969** | frame.work/gb mainboard product page | Confirmed (live) | | Framework Desktop Mainboard (standalone) — Max+ 395 | 64GB | Bare mainboard only | **£1,659** | frame.work/gb mainboard product page | Confirmed (live) | | Framework Desktop Mainboard (standalone) — Max+ 395 | 128GB | Bare mainboard only | **£3,149** | frame.work/gb mainboard product page | Confirmed (live) | | SIXUNITED STHT1 (Thin Mini-ITX Strix Halo board) | up to 128GB | Bare board only | **No UK price** | Liliputing / Tom's Hardware | Not announced — unpriced | | Minisforum BD395i MAX (Mini-ITX Strix Halo board) | up to 128GB | Bare board only | **No UK price** | TweakTown (CES 2026) | Not announced / not yet shipping | ### RAM tiers — 96GB? **Confirmed: Framework offers only 32 / 64 / 128GB.** There is no 96GB option. The memory is soldered LPDDR5x, so RAM is not user-upgradable — pick the tier at purchase. (32GB is the 8-core Max 385; 64GB and 128GB are the 16-core Max+ 395.) ### Pre-built / "complete" configs Framework's UK store sells the Desktop as **DIY (barebones)** plus a separately-listed pre-built "Framework Desktop" (factory-assembled, same chassis). The pre-built carries a small assembly premium over DIY but is **still sold without bundled SSD/OS by default** — you add storage and OS as the same à-la-carte options below. There is no UK config that ships with an SSD pre-installed at no extra cost. So for cost comparison purposes, treat Framework as barebones at every tier. ## Storage & OS add-on costs (GBP, from the live config page) SSD (you must add one — nothing is included): - SanDisk SN7100 (PCIe 4.0): 500GB **+£135** · 1TB **+£265** · 2TB **+£505** · 4TB **+£665** - SanDisk 850X (PCIe 4.0): 1TB **+£269** · 2TB **+£625** · 4TB **+£1,189** · 8TB **+£3,049** OS: - Windows 11 Home **+£120** · Windows 11 Pro **+£220** - **Linux = £0** (no charge; you flash your own) Other notable add-ons: translucent side panel +£40, CPU fans +£19–£29, expansion cards +£9–£99, desktop handle +£19, power cable +£5. ## Mainboard for clustering The standalone mainboard (used for Jeff Geerling-style multi-node clusters in 10" racks) is the bare board only — no chassis, PSU, cooler or SSD. UK prices: **£969 (32GB) / £1,659 (64GB) / £3,149 (128GB)**. A 4-node 128GB cluster (512GB aggregate) is therefore ~£12,600 in boards alone, before PSUs, fans, networking and SSDs. The board is ~£300 cheaper than the equivalent full Desktop DIY at each tier (you're saving the chassis/PSU/cooler kit). ## RAM-shortage price-hike history (128GB tier) - **Launch (Feb 25, 2025):** 128GB Desktop = **$1,999** (~£1,799 ex-VAT era pricing). - **Jan 12, 2026 — first-ever hike:** 128GB → **$2,459 (+$460)**. 32GB and 64GB each only +$40. Driven by 128Gbit LPDDR5x chip spikes (the 128GB board needs eight of them). - **Feb 11, 2026:** LPDDR5x up another "6–16%"; further 128GB increase. - **May 12/27, 2026:** another 128GB-only adjustment after low-cost inventory sold through to enterprise orders. - **Jun 15, 2026 (one day before this snapshot):** further 128GB-only increase; 32GB/64GB held stable again. - **Net effect:** 128GB has risen from $1,999 to today's UK **£3,449** barebones — i.e. the 128GB tier has absorbed essentially all the memory inflation while 32/64GB barely moved. Framework's CEO has warned of continued volatility "through the rest of 2026 and likely beyond," so expect more 128GB drift. ## Verdict — is Framework price-competitive in the UK right now? - **The 128GB tier is no longer the bargain it launched as.** At £3,449 barebones (ex-tax), after VAT and a 2TB SSD it's a ~£4,000+ machine — the RAM shortage has hit this exact config harder than almost any other product, and it keeps climbing. The 32GB (£1,269) and 64GB (£1,959) tiers have held steady and remain reasonable, but they're less interesting for large-model inference. - **Barebones is the catch vs GMKtec.** GMKtec's EVO-X2 (same Ryzen AI Max+ 395, 128GB) typically ships *with* an SSD and often Windows, at a lower headline price. Framework adds £265–£625 (SSD) on top before you have a bootable box, so apples-to-apples Framework is currently the pricier 128GB Strix Halo route in the UK — you pay a premium for repairability, the cleaner Linux story, and the standalone-mainboard clustering path. - **Cheapest realistic out-the-door 128GB Framework build (SSD + free Linux):** Desktop DIY 128GB **£3,449** + SanDisk SN7100 1TB **£265** + Linux **£0** = **£3,714 ex-tax** (≈ £4,200–£4,460 once UK VAT at 20% is applied at checkout). Drop to a 500GB SSD (+£135) to shave ~£130. The standalone 128GB *mainboard* route is £3,149 but then you supply your own case/PSU/cooler, so it only undercuts the full Desktop if you already have those parts or are building a cluster. - **Other standalone Strix Halo boards aren't a live option yet.** SIXUNITED STHT1 and Minisforum BD395i MAX exist (both 128GB-capable Mini-ITX) but have **no announced UK price and no confirmed retail availability** as of June 2026 — so Framework's own mainboard (£3,149 for 128GB) is currently the only buyable standalone Strix Halo board in the UK. **Honesty notes:** the 32/64/128GB Desktop and all three mainboard GBP figures are live-confirmed from Framework's UK pages today. The exact intermediate USD step-prices for the Feb/May/Jun 2026 128GB hikes were *not* published by Framework (blog gives dates and direction, not amounts) — only the Jan hike ($1,999→$2,459) is a confirmed figure; the rest are confirmed-as-happened but unquantified. The pre-built (assembled) UK premium over DIY was not separately confirmable on the live fetch, so treat "pre-built ≈ DIY + small assembly fee, still no bundled SSD/OS" as inferred. Sources: [Framework UK config](https://frame.work/gb/en/products/desktop-diy-amd-aimax300/configuration/new) · [Framework UK Mainboard](https://frame.work/gb/en/products/framework-desktop-mainboard-amd-ryzen-ai-max-300-series) · [Framework memory-pricing blog](https://frame.work/blog/updates-on-memory-pricing-and-navigating-the-volatile-memory-market) · [Notebookcheck on the $460 hike](https://www.notebookcheck.net/Framework-Desktop-now-cost-up-to-460-more-due-to-RAM-shortage.1203235.0.html) · [Tom's Hardware RAM-apocalypse hike](https://www.tomshardware.com/desktops/gaming-pcs/diy-pc-maker-framework-finally-succumbs-to-ram-apocalypse-is-raising-prices-on-its-desktops-now-starts-at-usd1-139-with-32gb-128gb-up-usd450) · [SIXUNITED STHT1 (Liliputing)](https://liliputing.com/sixunited-stht1-is-another-mini-itx-board-with-amd-strix-halo-build-your-own-ryzen-ai-max-395-mini-pc/) · [Minisforum BD395i MAX (TweakTown)](https://www.tweaktown.com/news/109642/minisforum-bd395i-max-motherboard-at-ces-2026-built-in-amd-strix-halo-apu-use-your-own-gpu/index.html)
-
I now have comprehensive data across all major brands. I have enough to deliver the table and verdict. Note the RAMpocalypse caveat (Corsair jumped from $1,999 to $3,399; Scan now lists £3,499 for the 4TB) explains the steep current pricing. Let me compile. --- # Branded "Strix Halo" (Ryzen AI Max+ 395) systems — UK pricing, June 2026 **Common chip:** AMD Ryzen AI Max+ 395 — 16 core / 32 thread Zen 5, Radeon 8060S (40 CU RDNA 3.5), 50 TOPS NPU. **Memory is soldered LPDDR5X-8000 on a 256-bit bus → 256 GB/s theoretical (~215 GB/s measured GPU-side).** This is the same silicon and same bandwidth in every box below; only RAM capacity, storage, chassis, warranty and price differ. **Critical note on "96GB":** Because the RAM is soldered, no branded vendor sells a *96GB RAM tier*. SKUs are 32 / 64 / 128 GB. "96GB" only ever appears as the **maximum slice of a 128GB box that Windows/the BIOS can hand to the GPU** (HP, Asus ProArt quote this). So if you want ~96GB usable for a model, you must buy a **128GB** unit. | Brand + Model | Form factor | RAM tiers (which has "96GB"?) | Bandwidth | SSD | Power | Current UK price (config) | Source / confirmed? | Stock | |---|---|---|---|---|---|---|---|---| | **HP Z2 Mini G1a** (workstation) | ~2.5L mini desktop | 32 / 64 / **128GB** soldered. 96GB = GPU-allocatable slice of the 128GB SKU only | 256 GB/s | 1TB (64GB) / 2TB (128GB) | ~120W ext. PSU | **£2,663.99** (395 / 128GB / 2TB), or **~£2,397** with HP code "10AI". 64GB/395 ≈ £2,016; 32GB/390 ≈ £1,668 | Technetbook UK launch table + VideoCardz; **prices confirmed at launch, June 2025** — live retailer pages (Box, Ballicom, Currys Biz) blocked automated fetch | Listed at Currys Business / Box / Misco (Misco showed out of stock) | | **Corsair AI Workstation 300** (Framework-based SFF) | SFF desktop | **128GB** only (single tier) | 256 GB/s | 2TB + 2TB NVMe | ~230W | **£3,499.99** | **Scan.co.uk — price confirmed live on page** | Pre-order only ("awaiting ETA") | | **Asus ROG Flow Z13** GZ302EA (2-in-1 tablet) | 13.4" tablet | 32 / **128GB**. 32GB units use 390 *or* 395; 128GB = 395 only | 256 GB/s | 1TB | 70–130W laptop | **£2,899.00** (RU127W, 395 / 128GB / 1TB) at Currys; **£1,899.98** for the 32GB/390 variant at Scan | Currys snippet (£2,899) + Scan page (£1,899.98 confirmed live for 32GB model) | 128GB at Currys; 32GB in stock at Scan (delivery 18 Jun) | | **Asus ProArt PX13** (2026, convertible) | 13.3" laptop | up to **128GB** (96GB GPU-allocatable) | 256 GB/s | 1TB | laptop | **Unconfirmed UK price** (CES 2026 launch; ProArt premium typically £2,800–3,500+) | TechPowerUp / Liliputing spec pages — no live UK price found | Newly announced | | **Zotac ZBOX Magnus EAMAX395C** | 2.65L mini PC | up to **128GB**; barebones variant = BYO storage | 256 GB/s | BYO / configurable | mini PC | **No price published** (announced Jan 2026, Computex) | VideoCardz / TweakTown — vendor has not listed pricing | Announced, not yet on UK shelves | | **Lenovo Yoga Pro 7a / Legion 7a** | 15.3"/16" laptops | up to **128GB** | 256 GB/s | 1TB+ | laptop | **No confirmed UK price.** EU launch ~€2,499+; US Legion 7a from $2,299 (Jul) | PC Gamer / Liliputing / Tom's HW | Rolling out 2026 | | **Framework Desktop** (DIY-ish, semi-branded) | 4.5L mini PC | 32 / 64 / **128GB** | 256 GB/s | configurable | ~230W | **No live UK store price captured.** US: $1,999 (128GB barebones) / ~$2,851 (128GB+1TB built) | VideoCardz / PCWorld; convert + ~20% VAT for UK ≈ £1,900–2,400 est. | Ships to UK via Framework store | | **GMKtec EVO-X2** (baseline OEM, for comparison) | small mini PC | 64 / **128GB** | 256 GB/s | 2TB | ~140W | **£2,099–£2,599** (128GB/2TB; Amazon UK has shown £2,599.96, lower on GMKtec direct) | Amazon UK / review (price varies, partly RAMpocalypse) | In stock | **Pricing context:** A DRAM shortage ("RAMpocalypse") through late 2025/2026 has pushed all 128GB Strix Halo prices up sharply — Corsair's flagship went from a $1,999 launch to $3,399 in the US, which is why Scan's UK 4TB unit now reads £3,499. Treat every figure as a fast-drifting snapshot. ## Verdict — is a branded Strix Halo box worth the premium over GMKtec-class? - **For pure LLM inference value, no.** Every box runs identical silicon at identical 256 GB/s bandwidth — the memory wall that caps token throughput is the same whether you pay £2,100 or £3,500. A GMKtec/Beelink-class 128GB unit (~£2,100–2,600) gives you the same model-size ceiling and the same speeds for the least money. - **The premium buys peace of mind, not performance.** HP Z2 Mini G1a (~£2,400–2,664) adds a proper 3-year onsite warranty, business support, vPro-class manageability and a workstation chassis — worth it for a business that needs accountable support, not for a hobbyist garage box. - **Corsair AI Workstation 300 (£3,499) is hard to justify** purely on value — it's a re-housed Framework board at a ~£900–1,400 premium over a GMKtec 128GB, mostly chassis/brand and the dual-2TB storage. Buy only if you specifically want the polished SFF build and Corsair support. - **Asus ROG Flow Z13 128GB (£2,899) and the laptop/ProArt class are portability plays**, not value plays — you pay a heavy premium for a screen, battery and 2-in-1 form factor that an always-on garage box doesn't need. - **Best-value sweet spot for the stated use-case (garage LLM box, max model-size-per-pound):** a GMKtec/Beelink/Framework 128GB unit. Step up to the **HP Z2 Mini G1a 128GB only if warranty/support matters**; skip the Corsair/laptop premiums entirely. And remember: to get ~96GB usable for the GPU you must buy a **128GB** SKU — no vendor sells a cheaper 96GB tier. **Honesty flags:** HP and Asus-128GB live retailer pages (Box, Ballicom, Currys, Currys Business) returned 403/402 to automated fetch, so those GBP figures come from the official UK launch table and Currys search snippets rather than a verified live page-load; treat as "confirmed at launch / snippet-confirmed, not re-verified on page today." Asus ProArt PX13, Zotac ZBOX EAMAX, Lenovo Yoga/Legion 7a UK prices are **unconfirmed** (announced but not yet UK-listed). Framework Desktop UK price not captured live (US-price-derived estimate only). -
I have enough to compile a thorough report. The user gave me their gmktec.uk figures and I confirmed the £1,619.99 base anchor and that the EU store mirrors €1,899.99 for 96GB (matching their £1,899.99). Let me compile the findings. --- # Strix Halo (Ryzen AI Max+ 395) Mini-PC Pricing — UK, June 2026 **Critical context:** Late-2025/2026 LPDDR5X shortage has roughly doubled US/CN list prices since launch (e.g. Bosgame M5 96GB was $1,489 in Feb 2026, now $2,799; Minisforum MS-S1 Max was a $2,099 pre-order, now $2,599–$3,679). UK official-store prices (GMKtec UK) have held steadier than Amazon UK marketplace listings, which are inflated. Treat all USD/EUR figures as needing **~20% UK VAT + possible import duty** on top if bought direct from China. ## Main comparison table | Brand + Model | Chip | RAM tiers (96GB?) | Mem BW | SSD | Power / PSU | Current price (GBP best route) | Source + confirmed? | Stock | |---|---|---|---|---|---|---|---|---| | **GMKtec EVO-X2** | Ryzen AI Max+ 395 | 64/1TB, **96/2TB**, 128/2TB | ~256 GB/s (LPDDR5X-8000, 8ch) | 1TB / 2TB Gen4 | External ~230W brick (120W TDP, 140W peak) | **64GB: £1,619.99** confirmed on gmktec.uk. **96GB: ~£1,899.99** (your figure; EU store mirrors €1,899.99 — page JS didn't expose per-variant GBP to fetch). **128GB: ~£2,659.99** (your figure, plausible) | gmktec.uk store page — base £1,619.99 **confirmed live & "In Stock"**; 96/128 tiers **inferred** (variant prices JS-loaded) | In Stock (UK store) | | GMKtec EVO-X2 (Amazon UK) | same | 96GB/1TB & 96GB/2TB & 128GB/2TB SKUs | same | varies | same | 96GB listed **£2,099–£2,599.96** (marketplace, inflated) | Amazon.co.uk B0FK1X9TDX / B0FR17BVSL — **price not machine-readable; from search snippets, treat as approximate** | Listed | | **Beelink GTR9 Pro** | Ryzen AI Max+ 395 | **128GB/2TB only** (no 96GB) | LPDDR5X-8000, 256-bit | 2TB Crucial (dual M.2, max 16TB) | **Internal 230W PSU**, 120W TDP / 140W load, dual 10GbE | ~$1,985 official / $1,899–1,999 Amazon US. On Amazon.co.uk (B0FPQQYWQ1) — **GBP not confirmed**, expect ~£1,700–1,900 + the bee-link.com page showed an anomalous $4,399 (likely bundle/error) | Amazon.co.uk listing exists; **GBP price unconfirmed**. STH review confirms specs | Listed (UK) | | **Bosgame M5** | Ryzen AI Max+ 395 | **96/2TB**, 128/2TB | LPDDR5X-8000 | 2TB Gen4 (dual M.2) | External brick | bosgame.com shows **$2,799** (sale, was $2,499/$2,099 earlier — volatile). UK marketplace listing seen at **£2,382.79**. bosgamepc.com (UK) returned 403 | bosgame.com **confirmed $2,799**; £2,382.79 from a price-comparison marketplace, **not official UK store** | US: substantial stock; UK unconfirmed | | **Minisforum MS-S1 Max** | Ryzen AI Max+ 395 | 64/2TB, 128/2TB (**no 96GB**) | LPDDR5X-8000, 256-bit | 2TB Gen4 (+PCIe x16 slot) | **Internal 320W PSU**; 130W perf / 95W bal / 60W quiet; dual 10GbE | **64GB: $2,599** (in stock); **128GB: $3,679** (out of stock) | store.minisforum.com — **confirmed USD live**. No GBP on page; add VAT/import | 64GB in stock, 128GB OOS | | **Aoostar NEX395** | Ryzen AI Max+ 395 | up to 128GB | LPDDR5X-**8533**, 256-bit | 2TB | 140W TDP mode | **~$2,800** (China, 128GB/2TB). Global launch confirmed but no UK price/date | VideoCardz / Notebookcheck — **CN price only, UK unconfirmed** | China; global pending | | **FEVM FA-EX9 / FAEX1** | Ryzen AI Max+ 395 | up to **96GB usable as VRAM**, 128GB | LPDDR5X | varies | OCuLink; compact 1–2L chassis | **~$1,943 (China start)**; no UK/global retail price announced | VideoCardz / Minixpc — **CN only, UK unconfirmed** | China | | GMKtec EVO-X1 (older, not Strix Halo) | Ryzen AI 9 HX 370 | 32GB/1–2TB | LPDDR5X-7500 | 1–2TB | external | **£789.99** (was £1,059.99), Sold Out on gmktec.uk | gmktec.uk — confirmed but **different/lesser chip**, 32GB cap — not a 96GB+ LLM box | Sold Out | | Acemagic | — | — | — | — | — | **No Strix Halo (AI Max+ 395) model found** — only Ryzen AI 9 HX 370 (F3A). Not relevant for 96GB+ | search — none | n/a | | AMD Ryzen AI Halo Dev Kit (reference) | Ryzen AI Max+ 395 | 128GB | LPDDR5X-8000 | — | — | **$3,999** (US, Micro Center) — premium reference, no value case vs OEMs | Tom's Hardware / STH — confirmed USD | US | ## 96GB configs — focused summary - **GMKtec EVO-X2 96GB/2TB — ~£1,899.99** on gmktec.uk, In Stock. **Cheapest confirmed-ish 96GB route in GBP.** Base 64GB £1,619.99 is firmly confirmed live; the 96GB tier price is your figure + matches the EU store's €1,899.99, but I could not machine-read the per-variant GBP (Shopify JS). Amazon UK lists the same 96GB at an inflated £2,099–£2,599. - **Bosgame M5 96GB/2TB** — only seen at **$2,799** official (very volatile) or **~£2,383** via a third-party UK marketplace. No clean official UK store price confirmed. - **FEVM FA-EX9** offers 96GB usable-as-VRAM at ~$1,943 China start — but **no UK availability/price**, so not buyable today. - **Beelink GTR9 Pro and Minisforum MS-S1 Max do NOT offer a 96GB tier** — they jump 64GB → 128GB. ## Best-value 96GB verdict (3-5 bullets) - **GMKtec EVO-X2 96GB/2TB at ~£1,899.99 (gmktec.uk, In Stock) is the clear best-value confirmed 96GB option** — official UK store, VAT-inclusive, no import hassle, 2TB SSD included. Verify the 96GB tier price on-site at checkout, as I could only firmly confirm the £1,619.99 64GB anchor via fetch. - **Avoid the Amazon UK EVO-X2 listings for the 96GB** (£2,099–£2,599) — the official GMKtec UK store is materially cheaper for the same SKU. - **If 96GB specifically isn't a hard requirement, the GTR9 Pro 128GB (~£1,700–1,900-ish range, GBP unconfirmed) is arguably better value than a 96GB box** — more RAM, internal PSU, dual 10GbE — but confirm the Amazon.co.uk GBP price before buying. - **Bosgame M5 96GB and all China-direct models (Aoostar NEX395, FEVM) are not competitive into the UK right now** — either inflated official prices ($2,799) or China-only with ~20% VAT/duty to add and no UK stock. - **RAM-shortage caveat:** these prices are moving weekly. The fact GMKtec UK still shows £1,619.99 base while US/CN equivalents sit at $1,999–$2,800 suggests the UK official-store pricing may be due a rise — buy sooner rather than later if the EVO-X2 is the pick. ### Unconfirmed / honesty flags - EVO-X2 **96GB and 128GB GBP per-variant prices** are inferred (your figures + EU mirror), not machine-confirmed — the gmktec.uk page loads variant prices via JS. - **Beelink GTR9 Pro GBP price** unconfirmed (bee-link.com returned an anomalous $4,399; Amazon.co.uk page price not readable). - **Bosgame UK official price** unconfirmed (bosgamepc.com 403'd; £2,383 is third-party). - Aoostar NEX395 and FEVM FA-EX9 are **China-priced, no UK retail** as of this check. **Sources:** [GMKtec UK EVO-X2](https://www.gmktec.uk/products/gmktec-evo-x2-amd-ryzen%e2%84%a2-ai-max-395-ai-mini-pc) · [GMKtec .com EVO-X2](https://www.gmktec.com/products/amd-ryzen%e2%84%a2-ai-max-395-evo-x2-ai-mini-pc) · [Beelink GTR9 Pro (bee-link)](https://www.bee-link.com/products/beelink-gtr9-pro-amd-ryzen-ai-max-395) · [Beelink GTR9 Pro Amazon UK](https://www.amazon.co.uk/Beelink-GTR9-Crucial-Computer-DeepSeek/dp/B0FPQQYWQ1) · [Bosgame M5 official](https://www.bosgame.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395-96gb-128gb-2tb) · [Minisforum MS-S1 Max](https://store.minisforum.com/products/minisforum-ms-s1-max-mini-pc) · [Aoostar NEX395 (VideoCardz)](https://videocardz.com/newz/aoostar-max-395-mini-pc-features-140w-tdp-mode-128gb-lpddr5x-8533-memory-and-2800-usd-price-tag) · [FEVM FA-EX9 (Minixpc)](https://minixpc.com/blogs/news/fevm-launches-fa-ex9-mini-pc-featuring-amd-ryzen-ai-max-395-strix-halo-2) · [GMKtec EVO-X1 UK](https://www.gmktec.uk/products/gmktec-evo-x1-amd-ryzen%E2%84%A2-ai-9-hx-370-mini-pc)
-
I now have comprehensive data across all categories. Here is the compiled research. --- # Adjacent unified-memory / local-AI boxes — UK GBP pricing (June 2026) Note on critical metric: for LLM token generation speed, **memory bandwidth (GB/s) matters more than raw compute** at the consumer-box level — tok/s on a memory-bound decode scales roughly with bandwidth. The whole GB10 / DGX Spark family is bandwidth-limited (~273 GB/s), which is its key weakness for the price. ## NVIDIA GB10 (Grace-Blackwell) family — DGX Spark + OEM clones All share the **same GB10 superchip, 128GB LPDDR5x unified memory, ~273 GB/s bandwidth, ~1 PFLOP FP4, ~240W, DGX OS**. They differ mainly on storage, networking, and price. None offers a 96GB tier — 128GB is fixed. | Brand + Model | Chip / Bandwidth | RAM | Storage | Power / Form | UK GBP price (inc VAT) | Source / confidence | Stock | |---|---|---|---|---|---|---|---| | NVIDIA DGX Spark (Founders, 940-54242-0009) | GB10 / ~273 GB/s | 128GB (fixed) | 4TB NVMe | ~240W, mini desktop | **£3,636 ex VAT (~£4,363 inc); listed ~£3,700 inc at some** | Ballicom/Novatech/Senetic; confirmed listing, VAT treatment varies | In stock | | ASUS Ascent GX10 | GB10 / ~273 GB/s | 128GB | **1TB** (smallest) | ~240W, stackable mini | **£3,249 inc** (cheapest GB10) | CyberPowerPC UK; confirmed | In stock | | Dell Pro Max with GB10 | GB10 / ~273 GB/s | 128GB | 4TB | ~240W, mini workstation | **~£6,000 inc** (most expensive) | itpro/TechRadar; inferred from US $4,061 + premium | Limited | | HP ZGX Nano G1n | GB10 / ~273 GB/s | 128GB | 1TB (4TB option) | ~240W, mini | **~£3,875 inc** (1TB, from) | Yorkshire price-compare / Westcoast / HP UK; confirmed range | In stock | | Lenovo ThinkStation PGX | GB10 / ~273 GB/s | 128GB | 1TB / 4TB | ~240W, mini-tower | **£3,211 ex VAT (~£3,853 inc) 1TB; £3,847 ex (~£4,616 inc) 4TB**; Lenovo UK list "from £5,200" | Ballicom/Misco/Lenovo UK; confirmed (reseller cheaper than Lenovo direct) | In stock | | MSI EdgeXpert MS-C931 | GB10 / ~273 GB/s | 128GB | 4TB self-encrypting | ~240W, desktop | **£4,499.99 inc** | Scan UK; confirmed | In stock | | Gigabyte AI TOP ATOM | GB10 / ~273 GB/s | 128GB | 4TB Gen5 NVMe | ~240W, desktop | **£4,499.99 inc** | Scan UK; confirmed | In stock | ## Apple Mac Studio (M4 Max / M3 Ultra) | Config | Chip / Bandwidth | RAM tiers (96GB?) | Base storage | Power / Form | UK GBP price | Source / confidence | Stock | |---|---|---|---|---|---|---|---| | Mac Studio M4 Max | M4 Max / **546 GB/s** | 36 / 48 / 64 / **128GB** (no 96GB on M4 Max) | 512GB | ~270W max, small desktop | **from £2,099** (36GB/512GB) | Apple UK / Macworld; confirmed start price | In stock | | Mac Studio M3 Ultra | M3 Ultra / **819 GB/s** | **96GB** / 256 / 512GB | 1TB | ~480W max, small desktop | **from £4,199** (96GB/1TB) | Apple UK / search-confirmed; confirmed | In stock | Note: the **96GB tier exists only on the M3 Ultra** (£4,199 entry). M4 Max jumps 64GB → 128GB. The earlier 128GB/256GB *M4 Max* CTO options were trimmed in some regions; 819 GB/s on the M3 Ultra is the standout bandwidth number of the whole field. ## Apple Mac Mini M4 Pro (64GB) + other AI desk boxes | Brand + Model | Chip / Bandwidth | RAM (96GB?) | Storage | Power / Form | UK GBP price | Source / confidence | Stock | |---|---|---|---|---|---|---|---| | Mac Mini M4 Pro 64GB | M4 Pro / **273 GB/s** | 64GB max (no 96GB) | 512GB+ | ~140W max, tiny | **~£1,999–2,099** (64GB CTO; base M4 Pro from £1,399) | Apple UK / Jigsaw24 / search; price inferred for 64GB CTO | In stock | | GMKtec EVO-X2 (reference) | Ryzen AI Max+ 395 / **~256 GB/s** | 64 / **128GB** | 1–2TB | ~140W, mini | **~£1,900 inc** (128GB/2TB) | TechRadar / GMKtec; user-supplied anchor | In stock | | Minisforum MS-S1 Max | Ryzen AI Max+ 395 / ~256 GB/s | 128GB (+PCIe x16 eGPU slot) | up to 2TB | mini, expandable | ~£1,900–2,400 (USD $2,299–2,959 +VAT) | search; USD-derived, inferred | Available | | Framework Desktop | Ryzen AI Max+ 395 / ~256 GB/s | 128GB | user NVMe | small desktop | ~£1,700–2,000 inc (inferred from USD) | search; inferred | Available | | Sixunited AXB88 | Ryzen AI Max+ 395 / ~256 GB/s | up to 128GB | NVMe | tiny 1.2L | TBD (new, no firm UK price) | Notebookcheck; unconfirmed | Launching | LLM-suitability one-liners (bandwidth-focused): - **GB10 family (~273 GB/s):** competent compute (1 PFLOP FP4, CUDA ecosystem, NVLink-C2C) but decode tok/s is bandwidth-throttled — barely ahead of Strix Halo despite 2x+ the price. Best when you specifically need CUDA / 128GB-in-one-box for dev, not raw inference value. - **Strix Halo boxes (~256 GB/s):** nearly the same memory bandwidth as DGX Spark at ~half the price; the value play for capacity-bound inference, weaker software stack (ROCm/Vulkan vs CUDA). - **Mac Studio M4 Max (546 GB/s):** ~2x the bandwidth of GB10/Strix — meaningfully faster decode; 128GB tier available. - **Mac Studio M3 Ultra (819 GB/s):** ~3x GB10 bandwidth — the fastest-decoding box here and the only one with a clean 96GB tier; the inference performance king of this list (at £4,199+). - **Mac Mini M4 Pro (273 GB/s, 64GB):** same bandwidth as GB10 in a much cheaper/tinier box, but caps at 64GB — fine for ~32B-class models, not 70B+. ## Verdict vs a £1,900 GMKtec EVO-X2 96GB on price-per-capability - **The entire GB10 / DGX Spark family loses the value contest for pure inference.** At £3,200–£6,000 they deliver ~273 GB/s — essentially the *same memory bandwidth* as the EVO-X2's ~256 GB/s. You pay roughly 1.7x–3x more for marginally more bandwidth, but get CUDA + NVLink + 128GB. Only worth it if your workflow genuinely needs the NVIDIA stack or guaranteed 128GB; for tok/s-per-pound the EVO-X2 wins decisively. - **Mac Studio M3 Ultra is the one box that justifies a big premium for inference:** 819 GB/s is ~3.2x the EVO-X2's bandwidth, so a 70B model that crawls on the GMKtec runs comfortably here — but at £4,199 (96GB) it's ~2.2x the price. It's the "fast, quiet, just works" pick, not the value pick. - **Mac Studio M4 Max (546 GB/s, from £2,099)** is the closest direct rival to the EVO-X2 on price: ~10% more money for ~2x bandwidth, though only 36GB at that entry price — to match the EVO-X2's 96–128GB capacity you must spend up to ~64/128GB tiers, narrowing or erasing the value gap. Capacity-per-pound still favours the GMKtec. - **Bottom line for a £1,900 EVO-X2 96GB:** it is the best capacity-per-pound and competitive bandwidth-per-pound box in this entire set. The GB10 clones don't beat it on the metric that drives LLM speed; only Apple's higher-bandwidth Silicon (M4 Max/M3 Ultra) outruns it, and only by paying more — making the EVO-X2 the value benchmark and the Macs the "pay for speed" upgrade, with DGX Spark/clones a CUDA-ecosystem tax rather than a value win. Sources: [Ballicom DGX Spark](https://www.ballicom.co.uk/940-54242-0009-000-nvidia-dgx-spark.p1693686.html), [CyberPowerPC ASUS GX10](https://www.cyberpowersystem.co.uk/system/nvidia-dgx-spark-asus-ascent-gx10-ai-computer), [Scan MSI EdgeXpert](https://www.scan.co.uk/products/msi-edgexpert-personal-ai-supercomputer-gb10-blackwell-superchip-4tb-ssd-128g-lpddr5x-200gb-connectx), [Scan Gigabyte AI TOP ATOM](https://www.scan.co.uk/products/gigabyte-ai-top-atom-desktop-supercomputer-gb10-blackwell-superchip-4tb-ssd-128gb-lpddr5x-cx7), [HP ZGX Nano UK](https://www.westcoast.co.uk/Microsites/Workstations/CZ9K1ETABUHPZGXNANOG1N1281TBUBUNTU.html), [Ballicom Lenovo PGX](https://www.ballicom.co.uk/30kl0004uc-lenovo-thinkstation-pgx.p1694630.html), [Dell Pro Max GB10 (itpro)](https://www.itpro.com/hardware/desktops/the-dell-pro-max-with-gb10-is-the-most-sophisticated-mini-ai-workstation-you-can-get-but-also-the-most-pricy), [Apple Mac Studio specs](https://support.apple.com/en-us/122211), [Macworld Mac Studio prices](https://www.macworld.com/article/2973459/2026-mac-studio-m5-release-date-specs-price-rumors.html), [GMKtec EVO-X2 (TechRadar)](https://www.techradar.com/pro/new-nvidia-dgx-spark-competitor-shows-strong-benchmark-performance-and-at-half-the-price-could-it-be-enough-to-convince-some-users-to-switch). --- Caveats on confidence: GB10-family UK VAT treatment is messy — some retailers quote ex-VAT (Ballicom, Senetic, Lenovo resellers), others inc-VAT (Scan, CyberPowerPC), so I flagged each. Confirmed inc-VAT prices: ASUS GX10 £3,249, MSI £4,499.99, Gigabyte £4,499.99, HP from ~£3,875. Dell ~£6,000 and Mac Mini 64GB CTO ~£2,099 are inferred, not pinned to a live cart. Mac Studio start prices (£2,099 / £4,199) are confirmed; the 96GB tier is M3-Ultra-only.
-
All four agents are back. Here's the synthesized picture — and the headline finding is important for your 96 GB focus: ## 🔑 The 96 GB tier barely exists — and GMKtec owns it Because Strix Halo RAM is **soldered**, almost every vendor sells only **32 / 64 / 128 GB**. A true **96 GB RAM tier** is offered by just three boxes in the UK, and the GMKtec is less than half the price of the alternatives: | 96 GB box | Bandwidth | UK price | Notes | |---|---|---|---| | **GMKtec EVO-X2 96 GB / 2TB** | 256 GB/s | **£1,899.99** ✅ | Cheapest by a mile; SSD included | | Bosgame M5 96 GB / 2TB | 256 GB/s | ~£2,380 (no clean UK store; $2,799 official) | Volatile, China-direct | | Apple Mac Studio M3 Ultra 96 GB | **819 GB/s** | **£4,199** | 3× the bandwidth, 2.2× the price | ⚠️ Watch the wording: HP and Asus quote "96 GB" as the *GPU-allocatable slice of a 128 GB box*, **not** a 96 GB RAM tier — to get that you buy (and pay for) 128 GB. **So for 96 GB specifically, the GMKtec EVO-X2 at £1,900 is essentially unchallenged.** ## Master comparison (live June 2026, UK GBP) ### Strix Halo class — same chip, same ~256 GB/s, same token speed | Box | RAM tiers | UK price | PSU / standout | Confidence | |---|---|---|---|---| | **GMKtec EVO-X2** | 64 / **96** / 128 | **£1,620 / £1,900 / £2,660** | ext. brick; SSD incl. | 64GB live-confirmed; 96/128 your figures | | **Beelink GTR9 Pro** | 128 only | **~£1,700–1,900** | **internal PSU, dual 10GbE**, 2TB | GBP unconfirmed | | **Minisforum MS-S1 Max** | 64 / 128 | $2,599 / $3,679 (+VAT) | **internal 320W + PCIe x16 slot** | USD-confirmed, no UK | | Bosgame M5 | 96 / 128 | ~£2,380 | ext. brick | no clean UK store | | Aoostar NEX395 | →128 | ~$2,800 (China) | LPDDR5X-**8533** (faster) | China-only | | **Framework Desktop DIY** | 32/64/**128** | £1,269 / £1,959 / **£3,449** | barebones — **no SSD/OS** | **all live-confirmed** | | HP Z2 Mini G1a (workstation) | 64 / 128 | ~£2,016 / ~£2,400–2,664 | 3yr onsite warranty | launch prices | | Corsair AI Workstation 300 | 128 only | **£3,499.99** | Framework board, rehoused | Scan live | | Asus ROG Flow Z13 | 32 / 128 | £1,900 / £2,899 | tablet (portability) | mixed | ### Adjacent boxes — only Apple beats Strix Halo on *speed* | Box | Bandwidth | RAM | UK price | Verdict | |---|---|---|---|---| | NVIDIA DGX Spark / Asus GX10 (cheapest GB10) | 273 GB/s | 128 | **£3,249**–£6,000 | CUDA tax; ~same bandwidth as Strix Halo | | MSI EdgeXpert / Gigabyte AI TOP ATOM (GB10) | 273 GB/s | 128 | £4,499.99 | overpriced for inference | | **Mac Studio M4 Max** | **546 GB/s** | 36→128 | from £2,099 (36GB) | ~2× bandwidth; capacity costs more | | **Mac Studio M3 Ultra** | **819 GB/s** | **96**/256/512 | from £4,199 | fastest box here; the "pay for speed" pick | | Mac Mini M4 Pro 64GB | 273 GB/s | 64 max | ~£2,099 | caps at 64GB | ## Verdict 1. **Want 96 GB? → GMKtec EVO-X2 £1,900. Done.** Nothing else is close on price, and the next-cheapest true-96 GB box (Mac M3 Ultra) is £4,199. 2. **Don't need exactly 96 GB? → two genuinely interesting alternatives surfaced:** - **Beelink GTR9 Pro 128 GB** (~£1,700–1,900) — *internal* PSU + dual 10 GbE, arguably better-built than the GMKtec for similar money, if the GBP price confirms. - **Minisforum MS-S1 Max** — uniquely has a **PCIe x16 slot**, so you could bolt on a discrete GPU (e.g. a used 3090) later. That directly addresses this project's weak leg — **Strix Halo can't really fine-tune; a slotted-in 3090 could.** Worth flagging as a hybrid play. 3. **The GB10/DGX Spark family is a poor inference value** — £3,250–£6,000 for the *same* ~273 GB/s bandwidth as a £1,900 GMKtec. Only buy for the CUDA ecosystem. 4. **The only way to meaningfully beat Strix Halo token/s is Apple Silicon** (M4 Max ~2×, M3 Ultra ~3× bandwidth) — at 1.5–2× the price. Bandwidth, not money, is the real ceiling. 5. **Framework's 128 GB is now £3,449 *barebones*** (live-confirmed; +£265 SSD + free Linux ≈ £3,714 ex-VAT) — the RAM crisis has made it uncompetitive vs the GMKtec 128 GB (£2,660 complete). Its only remaining case is repairability or the £3,149 standalone-mainboard clustering route. **Confidence flags:** GMKtec 96/128 GB and Beelink/Bosgame GBP prices weren't machine-readable (JS carts / blocked pages) — treat as ±. Framework, Corsair, the GB10 family, and Apple start prices are live-confirmed. All Strix Halo prices are drifting upward weekly due to the RAM shortage. --- Two follow-ups I'd suggest: 1. **Update `index.html` + memory.md** — fix the wrong Build L price (£1,620→£1,900 for 96 GB, £2,660 for 128 GB), and add new build cards for the standouts: **Beelink GTR9 Pro**, **Minisforum MS-S1 Max (the eGPU-slot hybrid)**, the **GB10/DGX Spark** family, and **Mac Studio M4 Max** — plus a "96 GB is rare" callout. Bump the re-priced date. 2. I can **re-query the GMKtec/Beelink live prices** more precisely (the carts were JS-blocked) if you want firmer numbers before I write them in. Want me to do the catalogue update, the price re-verification, or both?
-
the gmktec is out of stock sadly
-
That knocks out the value pick. Let me find what's actually **in stock and buyable in the UK right now** rather than guess — checking the top alternatives live.
-
- WebSearch Beelink GTR9 Pro Ryzen AI Max 395 128GB UK price Amazon in stock 2026 7.4s 2.8 KB
-
- WebSearch Bosgame M5 Ryzen AI Max 395 96GB UK buy in stock price GBP 7.7s 2.3 KB
-
- WebFetch https://www.gmktec.uk/products/gmktec-evo-x2-amd-ryzen%e2%84%a2-ai-max-395-ai-mini-pc 5.6s 714 B
-
Quick note: the cached GMKtec page still reads "In Stock" with a "notify me" button — that cache is ~15 min old and page-level, so the specific **96 GB variant** you're seeing sold out is the real signal. I'll trust your eyes on that. Let me answer your AliExpress question directly and check it live.
-
- WebSearch GMKtec EVO-X2 official store AliExpress Ryzen AI Max 395 128GB price 8.1s 2.4 KB
-
Yes — the GMKtec EVO-X2 is sold on **AliExpress, including GMKtec's own official store there**, and it's the cheapest route by a wide margin. But the UK import math matters, so here's the full picture: ## EVO-X2 on AliExpress (official GMKtec store + resellers) | Config | AliExpress price | ≈ GBP (pre-tax) | |---|---|---| | 128 GB / 2TB | **$1,999** | ~£1,575 | | 128 GB / 4TB | $2,156 | ~£1,700 | | 96 GB / 1TB (seen on Newegg, likely on AliExpress too) | ~$2,229 | ~£1,755 | ## The UK landed-cost reality You'll pay **20% import VAT** on top — but here's the good news: **mini-PCs are customs-duty-free** in the UK (they classify as "automatic data processing machines," HS heading 8471, 0% duty). So it's VAT only, plus a small courier handling fee (~£12): - **128 GB / 2TB: ~$1,999 → ~£1,575 + 20% VAT ≈ £1,900 landed.** That's the punchline: a **128 GB** EVO-X2 imported via AliExpress lands at **~£1,900 — about £760 *cheaper* than gmktec.uk's £2,659.99 for the same 128 GB box**, and roughly the same price the UK store wants for just the 96 GB. The gmktec.uk store carries a hefty UK markup. AliExpress is also a way to **get stock when the UK store's 96 GB is sold out** — check the official GMKtec AliExpress store for the 96 GB SKU directly. ## The catches (why it's not a no-brainer) - **Warranty & returns** are handled from China — RMA means shipping a heavy box back internationally; slow and costly if it's faulty. The UK store gives you UK consumer rights and easy returns. - **VAT collection is inconsistent** — for orders over £135, AliExpress *sometimes* pre-collects UK VAT at checkout (price already includes it) and sometimes leaves it to the courier to bill you on delivery. Check which at checkout so you're not surprised, and so you don't accidentally pay it twice. - **DOA risk** on a £1,900 import is the real downside vs paying the UK premium for hassle-free support. ## Practical take, given the UK 96 GB is out of stock 1. **Cheapest:** AliExpress official GMKtec store — 128 GB/2TB lands ~£1,900, in stock, but you carry the import/warranty risk. 2. **Safer, UK-side, duty-paid:** **Bosgame M5 96 GB/2TB ~£2,383**, ships from a German/EU warehouse duty-free to the UK with a UK-reachable warranty — the closest "buyable now" 96 GB box if you want to avoid a China import. 3. **128 GB alternative with a UK warehouse:** **Beelink GTR9 Pro** direct from bee-link.com (~$1,985, regional UK warehouse) — internal PSU + dual 10 GbE. Ignore the Amazon UK listing at £4,173, that's a marketplace-inflated price. If you want, I can dig into the **official GMKtec AliExpress store** specifically to confirm the live 96 GB SKU + whether it shows VAT-inclusive pricing for UK delivery — that's the one thing I'd verify before you commit.
-
check the gmktec aliexpress store for the 96gb and link it to me, then look at potential import costs
-
- WebFetch https://www.aliexpress.com/item/1005009919892554.html 3.8s 459 B
-
- WebSearch GMKtec official store aliexpress EVO-X2 96GB 2TB price ship UK 8.9s 2.7 KB
-
AliExpress blocks automated reading of its product pages (I get only the page footer), so I **couldn't machine-verify the seller badge or the live 96 GB variant price** — you'll need to eyeball that one thing yourself. But here's what I found and the import math. ## The 96 GB listing - **GMKtec EVO-X2 listing (variant selector — pick "96GB + 2TB"):** https://www.aliexpress.com/item/1005009584792500.html - A second EVO-X2 listing: https://www.aliexpress.com/item/1005011734933566.html - A reseller ("Honor Mini Pc Store") was flagged as carrying the **96 GB / 2TB** SKU with **shipping to the UK (~$63.76)**. ⚠️ **Verify before buying:** check the storefront name reads **"GMKTEC Official Store"** (top of the listing) rather than a third-party reseller — AliExpress blocked me from confirming it. Official-store pricing for 96 GB/2TB is currently **~$1,799 promo, ~$1,673 with the extra-$140 code** (matches GMKtec's own site). ## Import cost into the UK Good news first: **mini-PCs are customs-duty-free** in the UK — they classify as automatic data-processing machines (commodity code 8471), so **0% duty**. You only pay **20% import VAT** on (goods + shipping). Worked example at the 96 GB/2TB price: | | Lower (with code) | Upper (promo) | |---|---|---| | Item | $1,673 (~£1,322) | $1,799 (~£1,421) | | Shipping | +$64 (~£50) | +$64 (~£50) | | **Customs value** | ~£1,372 | ~£1,471 | | Import VAT @ 20% | +£274 | +£294 | | Courier handling fee | +~£12 | +~£12 | | **Landed total** | **~£1,660** | **~£1,780** | *(USD→GBP at ~£0.79/$; rate moves, so treat as ±.)* **vs gmktec.uk's £1,899.99 for the same 96 GB box** → AliExpress lands **~£120–£240 cheaper** *and* is in stock when the UK store's 96 GB isn't. ## Two things that change the math 1. **AliExpress usually pre-collects UK VAT at checkout.** For UK delivery addresses, AliExpress is VAT-registered and typically **adds the 20% at checkout**, so the parcel clears without a separate courier bill. If so, **the price you see with a UK address entered ≈ the landed price** above — no surprise charges. Confirm the checkout total includes VAT so you don't get double-billed by the courier. 2. **Some sellers ship from an EU/UK warehouse** (duty-paid, ~3–5 days) rather than China (~2 weeks). The China route is cheapest; a UK-warehouse listing is safer/faster but may cost a bit more. ## The real cost isn't money — it's recourse At ~£1,700 landed you save a little, but if it arrives DOA or dies in month 2, **RMA means shipping a heavy box back to China** and slow support. The gmktec.uk store (or Amazon UK) gives you UK consumer rights and easy returns for the ~£120–240 premium. For a £1,700 purchase, many people pay the premium just for that. **My read:** AliExpress official store is the cheapest in-stock way to get the 96 GB right now — go for it **if** (a) the seller badge says GMKTEC Official Store, (b) checkout shows VAT included for your UK address, and (c) you're comfortable carrying the warranty risk. Otherwise the **Bosgame M5 96 GB (~£2,383, EU warehouse, duty-free, UK-reachable warranty)** is the lower-risk in-stock 96 GB option. Want me to write all of this — the corrected GMKtec tier pricing, the AliExpress import route, and the in-stock alternatives — into `index.html` + `memory.md` so it's captured? The catalogue still has the wrong £1,620/128 GB figure. -
the price on aliexpress seems way more, 2709 for 96gb 1tbssd
-
You're right and I was wrong — my $1,673–1,799 figures were promo/US prices off GMKtec's own blog, and they don't survive contact with the actual UK checkout. The **£2,709 you're seeing is the real number**: AliExpress pre-collects UK VAT into the displayed price, and it reflects current RAM-inflated pricing, not the old promo. So: **AliExpress 96 GB / 1TB → £2,709 (VAT-inclusive, and only 1TB SSD).** That **blows up the "AliExpress is cheaper" case entirely.** Compared straight: | Route | Config | Price | In stock? | Storage | |---|---|---|---|---| | gmktec.uk | 96 GB / **2TB** | **£1,899.99** | ❌ (96 GB out) | 2TB | | **AliExpress** | 96 GB / **1TB** | **£2,709** | ✅ | 1TB only | | **Bosgame M5** | 96 GB / **2TB** | **~£2,383** | ✅ (EU warehouse) | 2TB | So AliExpress is the **worst** of the three — most expensive, least storage, most risk. My import math was based on optimistic promo USD that the UK checkout simply doesn't honour. Lesson taken: the live price you see beats my aggregated figures. **Updated practical recommendation, given the 96 GB you actually want:** 1. **Bosgame M5 96 GB / 2TB at ~£2,383** is now the best *in-stock* 96 GB option — cheaper than AliExpress, double the SSD, ships duty-free from an EU warehouse, and has a UK-reachable warranty. This is the one I'd lean to. 2. **Wait for gmktec.uk to restock the 96 GB at £1,899.99** — still the cheapest if you can catch it back in stock (set the "notify me"), but the banner warns of a ~£300 rise, so that window may close. 3. **Skip AliExpress** at £2,709 — no advantage left. The bigger truth underneath all this: **the cheap-Strix-Halo era is over.** The RAM shortage has pushed every 96–128 GB box up, and prices are still climbing weekly. There's no longer a ~£1,600 bargain here — realistic entry for a 96 GB box in stock today is **~£2,400**. Want me to capture this corrected reality in the catalogue — Bosgame M5 as the in-stock 96 GB pick, the AliExpress route marked "not worth it," and a note that RAM-shortage pricing has reset the whole tier upward?
-
[Image #1] looks to me like 128gb is 2083? is the memory bandwith good?
-
[Image: source: /home/megaproxy/.claude/image-cache/9bb35d8e-489b-41f0-b68d-3c3a21e85663/1.png]
-
Yes — reading it right: **£2,083 for 128 GB / 2TB** (down from £2,455), free shipping, UK plug. And note the **96 GB option is greyed out = unavailable**, so 96 GB is out at Bosgame too. That's actually fine, because at £2,083 the 128 GB undercuts gmktec.uk's £2,659.99 by ~£575, it's **in stock, UK plug, free shipping, EU-warehouse warranty** — this is now the **best in-stock big-memory Strix Halo deal we've found.** Grab it before it reverts to £2,455. ## Is the memory bandwidth good? Short answer: **good *for this class of machine*, but it's the defining bottleneck — and it's the same on every Strix Halo box.** The Ryzen AI Max+ 395 runs a 256-bit LPDDR5X-8000 bus = **~256 GB/s** theoretical (~210–215 GB/s measured GPU-side). In context: | Hardware | Bandwidth | vs Bosgame | |---|---|---| | Normal desktop (dual-channel DDR5) | ~80–100 GB/s | ⅓ | | **Bosgame M5 / all Strix Halo** | **~256 GB/s** | — | | NVIDIA DGX Spark / Mac M4 Pro | ~273 GB/s | ~same | | Mac Studio M4 Max | 546 GB/s | 2× faster | | Mac Studio M3 Ultra | 819 GB/s | 3× faster | | **RTX 3090 / 4090 (discrete)** | **~940–1000 GB/s** | **~4× faster** | **Why it matters:** LLM token generation is bandwidth-bound — tok/s scales almost linearly with this number. So 256 GB/s is *much* better than a normal PC (which is why it can run big models at all), comfortably mid-pack for an AI mini-PC, but **~4× slower than a real GPU**. What that means in practice on this box: - **MoE models fly** — gpt-oss-120B (~5B active params) ≈ **40–55 t/s**. This is the box's sweet spot and the reason to buy it. - **Mid-size dense is fine** — 30B ≈ 10–13 t/s, 8B ≈ 30–45 t/s. - **Big dense crawls** — Llama-70B ≈ **4–6 t/s** (readable, not snappy). So: the bandwidth is **"good enough that you can load 128 GB of model and run MoE models at conversational speed,"** but **not** good enough to make dense 70B+ fast. You're buying *capacity at usable speed for MoE*, not raw GPU speed. If dense-70B-at-speed is the goal, only Apple (2–3× bandwidth) or a discrete GPU beats it — at much higher cost. For your use case (run big/MoE models on one quiet box, offload cheap work from Claude), **£2,083 for 128 GB at 256 GB/s is a genuinely good buy.** Want me to lock the Bosgame M5 128 GB @ £2,083 into the catalogue as the current in-stock pick?
-
is this the same as the framework basically?
-
For **running LLMs? Yes — functionally identical.** Same AMD Ryzen AI Max+ 395 silicon, same ~256 GB/s bandwidth, same 128 GB soldered LPDDR5X. That means **identical model capability and identical token/s** — gpt-oss-120B runs ~40–55 t/s on both, dense 70B crawls at ~4–6 t/s on both. From a "what can I run and how fast" standpoint there is zero difference. Where they differ is **everything around the chip:** | | Bosgame M5 | Framework Desktop | |---|---|---| | Chip / bandwidth / RAM | AI Max+ 395 / 256 GB/s / 128 GB | **identical** | | LLM performance | — | **identical** | | 128 GB price | **£2,083 complete** | **£3,449 barebones** (ex-VAT) | | Included | 2TB SSD + OS, UK plug, free ship | **nothing** — add SSD (~£265–505) + OS | | Realistic out-the-door 128 GB | **~£2,083** | **~£3,950+ ex-VAT (≈£4,200 inc)** | | Build / PSU | Chinese OEM mini-PC, external brick | premium, internal PSU, ~40 W cooler/quieter | | Repairability | sealed-ish (RAM soldered anyway) | modular, repairable (but RAM still soldered) | | Support / warranty | EU warehouse, China-brand RMA | established company, proper support | | Clustering | no | sells standalone mainboard for multi-node | So: **same machine for AI, very different proposition for your wallet and your support experience.** The Framework costs roughly **2× as much** for the same model performance — you're paying ~£1,900 extra for build quality, repairability, internal PSU, a real company's support, and the clustering option. Neither has upgradeable RAM (that's a Strix Halo limitation, not a brand choice). **Bottom line:** if the goal is "run big/MoE models on one quiet box for the least money," the **Bosgame M5 at £2,083 is the rational pick** — it does literally everything the £4,000+ Framework does for LLMs. The Framework only makes sense if you specifically value the repairable-PC ethos, the support, or the mainboard-clustering path enough to pay double. In a garage, where its quiet/cool/pretty advantages don't matter to you, that's a hard premium to justify.
-
are there reddit threads with people using bossgame for ai?
-
- WebSearch Bosgame M5 Strix Halo LLM ollama llama.cpp reddit LocalLLaMA 11.0s 2.7 KB
-
- WebSearch Bosgame M5 Ryzen AI Max 395 local LLM reddit experience 2.9s 424 B
-
Reddit is the one source my tools can't reach — Anthropic's web crawler is blocked from reddit.com, so I can't pull thread contents directly. Let me try to at least surface thread links and the community consensus.
-
- WebSearch Bosgame M5 reddit LocalLLaMA review gpt-oss 120b tokens 18.5s 2.3 KB
-
- WebSearch reddit r/LocalLLaMA Strix Halo Ryzen AI Max 395 128GB worth it experience gpt-oss 20.8s 2.8 KB
-
Honest answer: **I can't read Reddit with my tools** — Anthropic's web crawler is blocked from reddit.com, so both search and fetch come back empty for it. I'm not going to pretend to summarize threads I can't actually see. But here's the useful version: ## Two things worth knowing 1. **Bosgame-specific Reddit threads are thin** — it's a smaller brand. Almost all the community discussion is about **"Strix Halo / Ryzen AI Max+ 395" generically**, not any one box. 2. **That's fine, because it transfers directly.** The Bosgame M5, GMKtec EVO-X2, Beelink GTR9, and Framework Desktop are all the **same Sixunited platform + identical silicon + identical 256 GB/s**. So a Reddit report of someone running gpt-oss-120B on a GMKtec or Framework is an accurate proxy for the Bosgame — the AI experience is the same; only the chassis differs. ## Search these yourself (clickable) - **Bosgame M5 across Reddit:** [google `site:reddit.com bosgame m5`](https://www.google.com/search?q=site:reddit.com+bosgame+m5+llm) - **Strix Halo on r/LocalLLaMA (the real signal):** [google `site:reddit.com/r/LocalLLaMA strix halo`](https://www.google.com/search?q=site:reddit.com/r/LocalLLaMA+strix+halo+ryzen+ai+max) - **Reddit's own search:** [reddit.com/search?q=strix halo gpt-oss](https://www.reddit.com/search/?q=strix%20halo%20gpt-oss) - Best subreddits: **r/LocalLLaMA**, **r/LocalLLM**, **r/MiniPCs**, and **r/FrameworkDesktop** (very active on exactly this chip). ## What the broader Strix Halo community consensus actually is (From sources I *can* read — the [Level1Techs benchmark thread](https://forum.level1techs.com/t/strix-halo-ryzen-ai-max-395-llm-benchmark-results/233796), the [strix-halo-guide on GitHub](https://github.com/hogeheer499-commits/strix-halo-guide), and [hardware-corner's optimization writeup](https://www.hardware-corner.net/strix-halo-llm-optimization/) — all of which apply to the Bosgame): - 👍 **gpt-oss-120B is the crowd favourite** — "~40–55 t/s, genuinely usable, the reason to buy this box." - 👍 **MoE in general** (Qwen3-30B/235B) praised; people love loading 70–120 GB of model in a silent box. - 👎 **Dense 70B is universally called slow** (~4–6 t/s) — the recurring "don't buy this expecting fast dense models" warning. - 🔧 **Software setup is the common gripe** — ROCm on gfx1151 still maturing; most people land on **llama.cpp Vulkan or LM Studio** as the path of least resistance, and the `ROCBLAS_USE_HIPBLASLT=1` tip comes up a lot. - 🔧 **Linux > Windows** for getting the full memory allocation (the 96–120 GB VRAM trick). So the community verdict matches what I've told you: **excellent MoE/big-model inference box at usable speed, slow at dense 70B+, and budget an afternoon for driver setup.** If you want, I can spin up a background agent to scrape the specific thread *titles/links* via Google's cache (which sometimes surfaces reddit text even when reddit itself is blocked) so you've got a reading list — but I can't promise it'll get through. Want me to try?
-
how much power does the bosgame use
-
- WebSearch Bosgame M5 power consumption idle load watts TDP review measured 9.0s 2.7 KB
-
Here's the power picture for the Bosgame M5 (measured in reviews): | State | Power draw | |---|---| | **Idle** | ~10–20 W (typical for this Strix Halo mini-PC class; reviews didn't pin an exact figure) | | **LLM inference (token-gen)** | ~70–120 W — it's memory-bandwidth-bound, so the GPU isn't maxed | | **Sustained full load** | ~120 W SoC (performance mode) | | **Peak** | up to ~140 W SoC / **<200 W at the wall** | The chip is the AMD Ryzen AI Max+ 395 with a **configurable TDP**, and the M5 has a **hardware switch to toggle normal vs performance mode** so you can cap it for quieter/cooler running. Two useful efficiency points from the reviews: - **Diminishing returns above ~80 W** — going from 80 W → 120 W gives only ~10% more performance for ~50% more power. So for a 24/7 LLM box you'd likely run it in the lower-power mode and lose almost nothing. - **It's genuinely low-power for the capability** — running a 120 GB MoE model at ~50 W–100 W is a fraction of what an equivalent multi-GPU rig draws. A dual-3090 box (Build H in your catalogue) idles higher than this thing's *peak* and pulls 600–700 W under load. **For your garage/always-on, offload-from-Claude use case:** budget roughly **~15 W idle, ~80–120 W when actively serving models, <200 W worst case.** At UK ~28 p/kWh, idling 24/7 is ~£37/year; a realistic mixed-use ~50 W average is ~£123/year. That low-power profile is one of the genuine advantages of the Strix Halo route over a discrete-GPU build — same point applies to the GMKtec and Framework since it's the same silicon. Sources: [guru3D M5 review](https://www.guru3d.com/review/review-bosgame-m5-ai-mini-desktop-ryzen-ai-395/) · [Notebookcheck Bosgame M5](https://www.notebookcheck.net/Mini-PC-with-AMD-Strix-Halo-and-128-GB-RAM-AMD-Ryzen-AI-Max-395-and-AMD-Radeon-8060S-impress-in-Bosgame-M5.1088558.0.html)
-
does bosman use external power brick?
-
- WebSearch Bosgame M5 power supply external brick or internal PSU adapter watts 7.6s 1.8 KB
-
**Yes — the Bosgame M5 uses an external power brick** (a ~240 W external PSU adapter), not an internal power supply. Practical implications: - **One more thing on the floor/shelf** — a chunky laptop-style brick inline on the cable, which some people find annoying in a tidy setup. For a garage/rack box it's a non-issue. - **Slightly worse cable management** than an internal-PSU box, but the unit itself stays smaller. - The 240 W brick gives comfortable headroom over the system's <200 W peak. For contrast across the Strix Halo boxes: - **External brick:** Bosgame M5, GMKtec EVO-X2 - **Internal PSU:** Beelink GTR9 Pro, Minisforum MS-S1 Max (320 W), Framework Desktop So if "no external brick" matters to you, the **Beelink GTR9 Pro** (internal PSU, dual 10 GbE, 128 GB) is the standout alternative — but it's the same chip/speed and you'd be buying it for the cleaner build, not better AI performance. For most uses the brick is a minor cosmetic trade-off, not a functional one.
-
do you think with this little ai box i could go from a max to a pro on claude and save money and offload a lot of work to the box?
-
Honest answer: **probably not the way you're hoping — because the box is weakest at exactly the workload that usually makes people need Max.** Let me walk through it, then ask the one thing that decides it. ## The money math Box = £2,083 one-off (+ ~£50–120/yr electricity). Subscription saving depends on which Max you're on: | Move | Monthly saving | Payback on the box | |---|---|---| | Max 5× ($100) → Pro ($20) | ~$80 (~£63) | **~33 months** (~2.75 yrs) | | Max 20× ($200) → Pro ($20) | ~$180 (~£142) | **~15 months** | So on **Max 5×** it's a slow ~3-year payback; on **Max 20×** it pays for itself in ~15 months. (Prices approximate — verify your exact plan.) ## The catch that matters more than the math **What drives Max usage is almost always heavy agentic *coding* (Claude Code) — and that's the one thing gpt-oss-120B is worst at.** Recall the capability split: - ✅ The box absorbs well: summaries, drafts, classification, extraction, boilerplate, batch/bulk text, private-data work. - ❌ The box is *weak* at: complex multi-file coding, judgment, long-context, large-repo work — i.e. the stuff that burns your Max limits. So there's a mismatch: the work you'd *want* to offload to escape Max is the work the box can't do well, and the work the box *can* do is usually the lighter stuff that wasn't pushing you over Pro's ceiling anyway. On top of that, Pro's coding/Opus limits are genuinely tight — if you're on Max because you hit walls doing real coding, dropping to Pro means hitting those walls constantly, and the box won't catch the overflow. There's also the orchestration tax I flagged earlier: when Claude routes work to the box, you still pay Claude tokens to read the results back — so offloading saves less Claude usage than it looks unless you architect it carefully (local writes to files, only summaries return). ## So when does your plan actually work? - **If your heavy usage is bulk text** (summarizing, drafting, classifying, processing lots of documents) → **yes, plausibly.** The box can eat a big chunk of that, and remaining judgment work might fit in Pro. Worth it especially if you're on Max 20×. - **If your heavy usage is coding/agentic** (the usual reason for Max) → **no.** Pro will frustrate you and the box can't cover the gap. Keep Max; buy the box for the *other* value (privacy, offline, 24/7 cheap inference, batch jobs), not as a subscription downgrade. ## The deciding question Two things flip the answer — can you tell me: 1. **Which Max are you on** — 5× ($100) or 20× ($200)? 2. **What's actually eating your Max usage** — mostly coding in Claude Code, or mostly bulk text/analysis work? With those I can give you a straight yes/no plus the realistic payback for *your* situation. My instinct, though: the box is a great **complement** to Claude (offload the cheap 50–60%, get privacy + unlimited local inference) but a poor **substitute** for Max-tier coding — so if coding is your driver, treat it as "keep Max, add the box," not "trade down to Pro."
-
im on 5x, you can look at the folders here in projects etc to get an idea. Sometimes its research sometimes its code
-
Let me actually look at what you've been working on across the folders to ground the advice.
-
- Bash cd /home/megaproxy/claude 2>/dev/null; echo "=== TOP LEVEL ==="; ls -1 .; echo; for d in projects research ideas; do ech… 2.5s 488 B
-
Interesting — you're already deep in this space (`lmstudio-mcp`, `lm-studio-tuning`, `h200-2x-llm-testing`, `claude-usage-widget`). Let me look closer at activity levels and the most telling folders.
-
- Bash cd /home/megaproxy/claude 2>/dev/null echo "=== COMMIT ACTIVITY (last 90 days, per repo) ===" for d in projects/* resear… 5.0s 1.4 KB
-
That survey is decisive. Here's what your folders say about you: **You're a heavy, active coder.** In the last 90 days: - `openworld-coop` — **250 commits** (active this week) - `tiletopia` — **147 commits** - `rimlike` — **97**, `tavernkeep` 47, `tileforge` 34, `claude-usage-widget` 35, `zoetrope` 19, `lostways-perforce` 18… That's hundreds of commits across big, live game/app projects. Research folders (`llm-box-uk`, `h200-2x-llm-testing`, `lm-studio-tuning`) are much lighter — 1–10 commits each. So **your Max usage is overwhelmingly driven by agentic coding, not bulk text.** And that's the bad news for the downgrade plan: **coding is precisely what gpt-oss-120B can't take off your plate, and precisely what Pro throttles hardest.** Someone doing 250 commits/90 days of AI-assisted coding will hit Pro's Opus/Claude-Code limits *constantly*. The box can't catch that overflow — it's an o4-mini-tier model that's weak at multi-file agentic work in a big repo. ### So, straight answer for *your* situation (Max 5×, coder): **No — don't trade Max 5× for Pro + this box.** Three reasons: 1. **Capability mismatch** — your driver (heavy coding) is the box's weakest area. You'd offload the wrong 10%. 2. **Pro would bite** — your commit cadence implies you'd hit Pro's limits fast and be stuck, with no good local fallback for *coding*. 3. **Weak economics anyway** — 5× → Pro saves ~£63/mo, so a £2,083 box is a **~33-month payback** even if it worked. ### But here's the framing that *does* work for you You're clearly already heading this way — you've built **`lmstudio-mcp`** and have **`lm-studio-tuning`** research. So buy the box as a **complement that keeps you on 5× longer**, not as a downgrade: - **Pressure-relief valve:** offload the cheap, high-volume stuff so you *don't* get pushed from 5× up to 20× ($100→$200) as your projects grow. Avoiding the *upgrade* is the realistic saving, not enabling a downgrade. - **What to route to the box** (via your existing `lmstudio-mcp`): commit-message generation, changelog/doc drafting, test-stub scaffolding, summarizing research sources (like all the web pages in this very session), classifying/triaging, regex/boilerplate. All the stuff that nibbles your quota without needing Claude's judgment. - **Keep Claude (on 5×) for the actual coding** in openworld-coop/tiletopia/rimlike — the work that needs it. - **Bonus value** that has nothing to do with subscriptions: privacy, offline, 24/7 local agents, and unlimited local inference for experiments (you're already tuning LM Studio). ### Bottom line Treat the £2,083 Bosgame as **"keep Max 5×, add a local workhorse to stop usage creeping toward 20×"** — that's a defensible buy given you're already building the MCP plumbing for it. The **"drop to Pro and save money"** version doesn't hold for a coder at your activity level: the box is a complement to Max, not a substitute for it. One thing that would sharpen this: do you actually *hit* the Max 5× ceiling now, or are you comfortably under it? If you're nowhere near the cap, the box is a "nice capability + privacy" purchase, not a money-saver at all — and that's worth being honest with yourself about before spending £2k.
-
what about qwen?
-
Good question — Qwen genuinely changes the local-coding story, because unlike gpt-oss-120B (which is mediocre at code), Qwen has *dedicated coding models*. Let me verify the current state rather than rely on memory.
-
- WebSearch Qwen3-Coder 30B A3B local coding agentic quality vs Claude Sonnet 2026 best local coding model 10.2s 3.1 KB
-
- WebSearch Qwen3-Coder 30B Strix Halo Ryzen AI Max tokens per second llama.cpp benchmark 9.5s 2.7 KB
-
Qwen changes the picture more than gpt-oss did — because Qwen actually makes *dedicated coding models*, and they run beautifully on this box. This is the most relevant thing for your situation yet. ## The key model: Qwen3-Coder-30B-A3B - **Speed on Strix Halo: ~70–98 t/s** (it's a 30B MoE with only 3.3B active, so it decodes at 3B speed but punches at 30B quality). That's genuinely *fast* — comparable to or faster than cloud, with no rate limits. - **Fits even the 64 GB box** (~18 GB loaded) — you don't need 128 GB for it. - **Built for agentic coding** (long-horizon RL on SWE-bench), and people run it as a **"self-hosted Claude Code alternative" via OpenCode / Cline** — exactly the workflow you'd want. There are write-ups of people doing agentic coding on cheap NUCs with it. ## The honest ceiling - Quality is described as **~Claude Sonnet 4.0 / GPT-4 Turbo tier** — "very capable but needs more guidance on complex tasks." You're currently on **Opus 4.8**, which is meaningfully stronger, especially on the hard multi-file/architecture/subtle-bug work your big live projects (openworld-coop, tiletopia, rimlike) involve. - The Qwen model that genuinely *matches* Sonnet 4 is **Qwen3-Coder-480B** — and that **won't fit in 128 GB** (~240 GB at Q4). So the box gives you the 30B, not the 480B. - There's also **Qwen3-235B-A22B** (fits 128 GB at Q3, ~10–15 t/s) — higher quality than the 30B but slow and memory-tight, and Q3 quant costs you some quality. ## What this does to your Max-5×-→-Pro question It **upgrades the box from "cheap text complement" to "genuine junior coding partner."** With Qwen3-Coder-30B you can offload real *coding* subtasks — boilerplate, single-file edits, scaffolding, test generation, refactors, smaller modules, simpler agentic loops — not just summaries. That makes the **"stay on 5×, offload more, avoid creeping to 20×"** case much stronger, and it makes **"drop to Pro"** a *real possibility* for the first time… but I still wouldn't commit blind. The routine 50–70% of coding, yes; the hardest 30–50% that your active projects lean on, Qwen-30B will make you babysit, and Pro's limits would bite on what's left. ## The smart move — and you're already set up for it You have **`lmstudio-mcp`** and **`lm-studio-tuning`** — you don't need to spend £2k to find out. **Trial it first:** 1. Run **Qwen3-Coder-30B-A3B** locally (LM Studio / llama.cpp) on whatever hardware you have now. 2. Point **OpenCode or Cline** at it and use it on your *actual* projects for a week or two — **while still on Max**. 3. Measure: what fraction of your coding does it handle acceptably? How often do you fall back to Claude? That number is your answer. If Qwen-30B comfortably absorbs (say) 60%+ of your coding and the rest fits in Pro's limits → buy the box, drop to Pro, real savings. If you're constantly falling back to Opus for the work that matters → keep Max 5×, and the box becomes the "stay on 5× longer + privacy + 24/7 local" complement instead. Either way the box earns its keep — but Qwen is what turns "test before you decide" from a formality into a genuinely open question. Want me to write up a quick trial setup (Qwen3-Coder-30B + OpenCode pointed at a local endpoint) so you can run the experiment? Sources: [agentic coding on a NUC with Qwen3-Coder-30B](https://medium.com/@smbaker/agentic-coding-on-an-inexpensive-nuc-with-opencode-and-qwen3-coder-30b-6fd0dddc2ded) · [self-hosted Claude Code alt with OpenCode + Qwen3-Coder-30B](https://medium.com/@lexy_eyn/how-to-connect-a-local-qwen3-coder-30b-to-opencode-and-create-a-self-hosted-claude-code-alternative-4f0db7f38cc2) · [Strix Halo Qwen3-Coder-30B benchmark (~70–98 t/s)](https://github.com/pablo-ross/strix-halo-gmktec-evo-x2/blob/main/QWEN3-CODER-30B_BENCHMARK.md)
-
what else could the box run? not just ai models etc
-
Good angle — this is a full, fairly powerful x86 PC, not just an AI appliance. Under the hood it's a **16-core/32-thread Zen 5 CPU + Radeon 8060S iGPU (~RTX 4070-laptop / desktop 4060-class) + up to 128 GB RAM + fast NVMe + USB4/2.5–10 GbE**, sipping ~15 W idle. That opens up a lot — and several map directly onto what your folders show you already do. ## Most relevant to *you* (game dev + homelab) - **Game-dev build/test box** — you've got rimlike, tavernkeep, tileforge, tiletopia, openworld-coop, zoetrope. A 16-core Zen 5 with 128 GB RAM is a strong **compile / asset-bake / lightmap-bake** machine, and it runs Unreal/Unity/Godot fine for testing. Offload long builds to it and keep your main machine free. - **Self-hosted CI runner** — point a **Forgejo Actions runner** at it for your repos (you're already on git.rdx4.com). 16 cores chews through builds. - **Homelab host** — you already run Unraid, Forgejo, Wiki.js, Portainer. This box makes a great **Proxmox / Docker host**: 128 GB RAM = a stack of VMs/containers (your wiki, git, dashboards, dev environments) on one quiet ~15 W-idle node. ## Gaming & emulation - The **8060S iGPU games genuinely well** — 1080p high / 1440p medium on modern titles, ~RTX 4070-laptop territory. Steam, Game Pass, etc. - **Emulation powerhouse** — retro through to modern (PS3, Switch, even early PS4-era) thanks to the strong CPU + iGPU + 128 GB. ## Media & creative - **Plex / Jellyfin server** — RDNA 3.5 has dedicated media engines with **AV1 hardware encode/decode**; effortless 4K transcoding, multiple streams. - **Video editing** — DaVinci Resolve / Premiere run well; the AV1 encoder is a real perk for exports. - **3D / Blender** — Cycles GPU rendering via HIP, plus heavy CPU rendering on 16 cores. Good for the kind of asset/render work games need. - **CAD / 3D-print prep** — Fusion 360 / FreeCAD / slicers (relevant since you 3D-print cooling shrouds and such). ## General compute - **Daily-driver workstation** (Windows or Linux) — it's just a fast small PC. - **Data / in-memory work** — 128 GB RAM is great for large datasets, in-memory databases, big Docker stacks. - **Virtualization lab** — spin up test OSes/environments freely. ## The one honest caveat **It's a single box, so you time-share it.** If you dedicate it to serving AI models 24/7, it isn't simultaneously your gaming rig or build farm. The clean solution is **Proxmox** (or dual-boot): partition it into an always-on LLM/server VM plus on-demand dev/gaming, so the same £2,083 covers several roles. Also note: **AMD iGPU = no CUDA**, so a few pro apps that demand CUDA (certain Blender OptiX paths, some niche ML tooling) fall back to slower ROCm/HIP/Vulkan — rarely a dealbreaker, but worth knowing. **Net:** for you specifically, the strongest non-AI justifications are **game-dev build/CI box** and **homelab/Proxmox host** — both things you're already doing on other hardware. That materially improves the buy case: it's not a single-purpose AI gadget, it's a 16-core server + capable GPU that *also* happens to run 120 GB LLMs. Stack those roles and the £2k looks a lot more defensible than "AI box that's idle most of the day." Want me to capture all of this — the multi-role framing, the Qwen coding-offload angle, and the corrected pricing — into the catalogue, maybe as an "is the unified-memory box worth it?" sidebar on Build L?
-
can i use it to train smaller models? Im thinking it could be fun to make some games with a AI llm for some of the npc's / story events etc.. maybe a murder mystery or something
-
Fun direction — and a murder mystery is a *great* fit for local LLMs. Let me check the current state of training-on-Strix-Halo (it's been the platform's weak spot, but ROCm moves fast) before I answer.
-
- WebSearch fine-tune LoRA Strix Halo Ryzen AI Max ROCm 2026 training small model bitsandbytes unsloth 12.5s 3.6 KB
-
The state has actually improved since this project's original "training is a no-go" verdict — let me give you the honest current picture, then the more important point about your murder-mystery idea. ## Can the box train models? Yes — more than it used to, with caveats Community **fine-tuning toolboxes for Strix Halo (gfx1151) now exist**, and people are running real LoRA/QLoRA on it. Realistic envelope on the 128 GB box: - **Full fine-tune:** up to ~**12B** params - **LoRA / QLoRA:** ~**20–30B** models - Works on Gemma-3, Qwen-3, gpt-oss-20B, etc. **The caveats are real, though:** gfx1151 only got experimental ROCm support in late 2025, so it's fiddly — `bitsandbytes` and `torchao` crash on import, Unsloth needs patching, you install nightly ROCm PyTorch wheels, and **it's slow** (the iGPU is not a training card — a LoRA run that's minutes on a 3090/cloud is hours here). So: *possible and free, but a tinkering project, not turnkey.* Budget an afternoon to get the toolbox working. ## But for your murder mystery — you probably don't need to train at all This is the key insight. NPC dialogue and story events are mostly a **prompting + state** problem, not a training problem. A good instruct model plays characters brilliantly with zero fine-tuning if you give it: 1. **Character cards** — each suspect gets a persona, what they know, their secret, their alibi, whether they're lying. Just text. 2. **A structured world state** — the solution (who did it), clue database, timeline, relationships — stored in your game, fed to the model per scene (RAG-style). This keeps the mystery *consistent* across a playthrough without baking anything into weights. 3. **Constrained generation (GBNF grammars in llama.cpp)** — force NPCs to emit valid game output: dialogue + "clue revealed" flags + a suspicion meter, in a strict schema your game can parse. 4. **Few-shot examples** for tone. You change an NPC by editing text, not retraining — far more maintainable for a game you're iterating on. ## The genuinely cool part: the 128 GB lets you run a whole *cast* at once Murder-mystery architecture that plays to the box's unified memory: - A big **"Game Master / director" model** (gpt-oss-120B or Qwen3-30B) that secretly knows the full solution, adjudicates player questions, paces reveals, and keeps everyone consistent. - Several **small fast NPC "brains"** (Qwen3-4B/8B, Gemma-3-4B, Mistral-Nemo) — one per suspect, each at **30–100+ t/s** for snappy real-time dialogue. - All loaded **simultaneously** in 128 GB. A director + 4–5 suspect models living in memory at once is exactly the kind of thing this box does that a single-GPU rig can't. ## When training *does* help (and how to do it well) Fine-tune only when prompting hits a wall — usually to **lock a consistent voice/format** or **shrink prompts** (bake persona/schema into weights so you're not spending context on it every call). For that, a small **LoRA** is the tool. The pragmatic pattern: **train the LoRA on a cheap rented cloud GPU** (a small run is a few £ on RunPod/Vast, fast and painless), then **run the fine-tuned model locally** on the box. Train in the cloud, infer at home — that plays to the box's actual strength and sidesteps the ROCm-training friction. Use local training for hobby/occasional runs where slow-but-free is fine. ## Bottom line - **Training on the box:** yes for small models / LoRA, but it's a slow, fiddly tinkering path — fine for hobby, not for fast iteration. - **Your game:** start with prompting + structured world state + grammars — you'll get a working murder mystery with *no* training, and the box is genuinely excellent at *running* a multi-model cast. - **If you later want a custom NPC voice:** LoRA it (cloud-train, local-run). This is honestly one of the better justifications for the box yet — a game-dev hobby project that needs exactly what it's good at (serving several models at once, locally, free, unlimited). Want me to sketch the murder-mystery architecture concretely — the director/NPC model split, the world-state schema, and a GBNF grammar for structured NPC output — maybe as an `ideas/` entry so it's captured?
-
are there latops with similar specs or a npu to help speed it up in similar price ranges
-
Two things in your question — laptops with similar specs, and whether an NPU speeds it up. Let me bust the NPU part first (it's a common misconception), then get current laptop prices.
-
- WebSearch HP ZBook Ultra G1a Ryzen AI Max 395 128GB UK price 2026 5.9s 3.3 KB
-
- WebSearch does NPU help LLM token generation speed Strix Halo iGPU vs NPU inference bandwidth 11.3s 2.9 KB
-
Two honest answers here, and the NPU one might save you money. ## The NPU won't speed up the work you care about This is a common misconception worth clearing up. **LLM token generation is memory-bandwidth-bound, and the NPU doesn't change that ceiling.** From AMD's own numbers: - On a **120B MoE model, NPU and iGPU give nearly identical ~34–38 t/s** — the NPU adds essentially nothing. - The **iGPU is actually *better* for sustained token generation** (~2.4× higher throughput than NPU-only). - The NPU's one trick is **faster time-to-first-token** (prompt processing) in hybrid mode — but only for **small models (~8B cap)**, and at low power. So the NPU is for **low-power background AI** (Copilot features, small on-device models) and shaving prefill latency on small models — **not** for making your 120B/Qwen-30B work faster. Don't pay extra "for the NPU." What matters is **memory bandwidth (256 GB/s) + the iGPU**, and that's identical across every Strix Halo box, laptop or mini-PC. ## Laptops with matching specs = only Strix Halo laptops Generic "AI laptops" with NPUs (Intel Lunar Lake, Snapdragon X, Ryzen AI 300) **don't count** — they cap at 16–32 GB RAM, so they physically can't load the big models, and (per above) the NPU doesn't help anyway. The *only* laptops that match the capability run the same Ryzen AI Max+ 395: | Laptop | Form | 128 GB price | vs Bosgame (£2,083) | |---|---|---|---| | **Asus ROG Flow Z13** (2026) | 13" tablet | **~£2,899** | +£800 | | **HP ZBook Ultra G1a** 14" | workstation laptop | **~£3,500–4,500** (unconfirmed; premium) | +£1,400–2,400 | | Asus ProArt P16 / Lenovo Yoga/Legion 7a | creator/gaming laptops | rolling out 2026, premium | + | ## The honest trade-off A Strix Halo *laptop* gives you the same 128 GB / 256 GB/s AI capability **plus portability + screen + battery** — but: - **You pay £800–2,400 more** for the same AI performance. - **Sustained AI is *worse*** — thin laptop cooling throttles under long loads, where the mini-PC holds ~120 W; on battery it downclocks hard. - **Wrong shape for an always-on offload/server box** — you'd run it plugged in, lid closed, wasting the screen and battery you paid for. ## Recommendation - **If you genuinely want a portable do-everything machine** (carry it, code on the train, occasional local AI) → a Strix Halo laptop is a legit 2-in-1, and the **ROG Flow Z13 (~£2,899)** is the closest to your price range. - **If it's an AI/offload/dev box that lives in the garage** (which is your use case) → **the mini-PC wins clearly**: cheaper, faster under sustained load, and the right form factor. The laptop premium buys portability you wouldn't use. So: no NPU shortcut, and laptops cost more for the same (or worse, under load) AI performance. For an always-on box, stick with the **Bosgame M5 at £2,083** — unless portability itself is the feature you want, in which case the Flow Z13 is the pick.
-
if i did have the gmktec box, how much work could we realistically offload to it? Or would it be good to drive from the box and the box can call claude when it needs
-
Great question — and the "box drives, calls Claude when stuck" inversion is genuinely the more interesting architecture. Let me answer both parts honestly. ## How much could realistically offload (by task type) It's not one number — it depends heavily on the work. Realistic fractions the box (Qwen3-Coder-30B + gpt-oss-120B) can absorb: | Work type | Offloadable to box | Why | |---|---|---| | **Bulk text** (summaries, classification, extraction, drafts, commit msgs, doc gen) | **80–95%** | Box is great at this; rarely needs Claude | | **Routine coding** (boilerplate, single-file edits, tests, scaffolding, simple refactors) | **50–70%** | Qwen-30B handles it at 70–98 t/s | | **Complex coding** (multi-file features in big repos, architecture, subtle bugs — your openworld-coop/tiletopia work) | **10–30%** | Quality gap; Claude does the heavy lifting | | **Research/judgment** (like this session) | **30–50%** | Box gathers/summarizes; Claude synthesizes | Across your mixed workload, **~40–60% of total token-volume** could move to the box — but the *hard, high-value* share stays with Claude. The box eats quantity, not difficulty. ## The two architectures **A — Claude drives, offloads to box** (what we discussed before) Claude Code (Opus, on Max) plans/verifies; box does cheap subtasks. Simplest, best quality, but Claude stays in the loop for everything, so it trims Max usage rather than slashing it. **B — Box drives, escalates to Claude** (your new idea) A local agent (OpenCode/Cline + Qwen-30B) runs the loop locally — free, unlimited, private — and calls Claude *only* when it hits something hard. This **inverts the cost model**: instead of paying Claude for everything and offloading scraps, you do everything free locally and pay Claude only for the genuinely hard 10–30%. ## The billing crux nobody mentions This is the part that decides it: **your Max subscription powers *Claude-driven* work (Claude Code). If the *box* drives and calls Claude, that's the Claude API — pay-per-token — which is separate from Max.** So: - **Architecture B effectively means: drop Max, pay Claude API per escalation.** That's cheaper *if* your hard-work volume is low, and more expensive if it's high (Max's flat rate beats per-token once you use a lot). - **Architecture A keeps Max** and is most cost-effective when you do *lots* of hard work (flat rate wins). So the right choice tracks your hard-work volume: - **Heavy hard coding** (your big live projects) → flat-rate Max + Claude-driven wins. Keep A for that lane. - **Lots of routine/bulk/hobby work** (game NPC stuff, summaries, scaffolding, experiments) → box-driven + occasional API escalation is cheaper. B shines here. ## My honest recommendation: don't pick one globally — route by lane Given you're a heavy coder on Max 5× **and** you build MCPs: 1. **Keep Claude Code (Max) as the driver for your serious repos** — openworld-coop, tiletopia, rimlike. The quality gap matters there and the flat rate is good value for that volume. 2. **Build the box-driven loop for the cheap/bulk/hobby lane** — the murder-mystery game, research summarizing, doc/test generation, experiments. Local agent (OpenCode + Qwen-30B) with an **"ask_claude_hard" escalation tool** hitting the Claude API. You'd pay pennies of API only when it escalates. 3. The escalation trigger is the hard part — a weak driver doesn't always know when it's stuck. Use **concrete triggers** rather than the model's self-assessment: escalate when tests fail N times, when a diff exceeds X files, on a confidence threshold, or just **keep yourself in the loop** ("box, try it; I'll tell you when to call Claude"). That gives you the best of both: flat-rate Claude where quality pays for itself, free local-first everywhere else, and Claude API as a precision tool the box reaches for — not your default. You're well-positioned to build B (you already have `lmstudio-mcp`). The honest caveat: **box-as-driver is lower-reliability than Claude-as-driver** — it'll go down wrong paths more often, so it's best on work where a wrong turn is cheap (hobby, bulk, experiments), not on your production game code. Want me to spec the box-driven setup — OpenCode + Qwen3-Coder-30B + an `ask_claude` escalation MCP with concrete trigger rules? -
would we be able to use things like workflow, and some of the subagents are actually a llm on the box?
-
Good architecture question — and the answer about *which* slots can be the local model vs. Claude is specific enough that I want to verify Claude Code's current internals rather than wing it. Let me confirm.
-
- Agent I need precise, current answers about Claude Code's architecture for a user who wants to run a local LLM (Qwen3-Coder-30… 104.1s 8.1 KB
-
I need precise, current answers about Claude Code's architecture for a user who wants to run a local LLM (Qwen3-Coder-30B / gpt-oss-120B on an OpenAI-compatible endpoint on their LAN) and integrate it with Claude Code's orchestration. Please verify against current docs/behavior, not assumptions: 1. **Subagent models:** When defining a subagent (the Agent/Task tool, or `.claude/agents/*.md` definitions), what `model` values are supported? Can a subagent be configured to run on a NON-Claude model — e.g. a local OpenAI-compatible endpoint — or are subagents always backed by Claude models (opus/sonnet/haiku/inherit)? 2. **Custom/local backends for Claude Code itself:** Can Claude Code's main driver be pointed at a custom or local model backend (e.g. via ANTHROPIC_BASE_URL, a proxy/gateway, Bedrock/Vertex)? Is there a supported way, and are there community tools (e.g. claude-code-router, LiteLLM proxy) that route different requests/subagents to different models? Be clear about what's official vs unofficial. 3. **The Workflow tool:** Workflow scripts call `agent(...)` to spawn subagents. Are those agents always Claude-backed, or can an agent() call target a local/non-Claude model? Can workflow agents call MCP tools and use custom agentType definitions? 4. **The realistic integration pattern:** What's the supported way to have a local LLM do work *inside* a Claude-orchestrated flow? Specifically: exposing the local endpoint as an MCP server tool (so Claude's main loop / subagents / workflow agents can CALL the local model as a tool), vs trying to make a subagent BE the local model. Confirm whether MCP tools are reachable from subagents and from Workflow agents. Give me a clear, accurate summary of what's possible today, flagging anything official vs community/experimental. Cite docs where you can.
-
I'll verify the current state of Claude Code's architecture against official documentation. Let me fetch the relevant docs.
-
- WebFetch https://platform.claude.com/llms.txt 5.1s 1.8 KB
-
- WebFetch https://code.claude.com/docs/en/claude_code_docs_map.md 351ms 2.2 KB
-
Let me get more specific documentation on subagents, workflows, and MCP integration.
-
- Bash curl -s https://code.claude.com/docs/en/mcp-servers.md 2>/dev/null | head -300 6.5s 17.2 KB
-
- Bash curl -s https://code.claude.com/docs/en/workflows.md 2>/dev/null | head -200 4.5s 15.8 KB
-
- Bash curl -s https://code.claude.com/docs/en/subagents.md 2>/dev/null | head -200 2.2s 12.8 KB
-
Now let me get more specific information about model selection in subagents and workflows, and whether custom models are supported.
-
- Read /home/megaproxy/.claude/projects/-home-megaproxy-claude-research-llm-box-uk/9bb35d8e-489b-41f0-b68d-3c3a21e85663/tool-re… 49ms 3.0 KB
-
- WebFetch https://code.claude.com/docs/en/subagents.md 753ms 2.2 KB
-
Let me search for more details on model configuration in subagents and custom backends.
-
- WebFetch https://code.claude.com/docs/en/model-config.md 1.0s 45.5 KB
-
Now let me check for details on MCP tool availability to subagents and workflows, and whether agents can use custom backends.
-
- Bash curl -s https://code.claude.com/docs/en/agent-sdk/overview.md 2>/dev/null | head -300 2.9s 11.8 KB
-
- Bash curl -s https://code.claude.com/docs/en/workflows.md 2>/dev/null | grep -A 20 "agent\|model\|MCP\|tool" | head -150 2.3s 11.8 KB
-
Let me fetch the workflow documentation to see how agents are invoked in workflows and whether they support custom models.
-
- Read /home/megaproxy/.claude/projects/-home-megaproxy-claude-research-llm-box-uk/9bb35d8e-489b-41f0-b68d-3c3a21e85663/tool-re… 29ms 132 B
-
Let me fetch the workflow docs directly to find the agent() function details.
-
- WebFetch https://code.claude.com/docs/en/workflows.md 185ms 20.4 KB
-
Now let me search for the actual workflow agent() API and whether it documents model selection or MCP tool access.
-
- WebSearch Claude Code workflow agent() function parameters model MCP 2026 12.3s 2.9 KB
-
- WebFetch https://code.claude.com/docs/en/workflows-reference.md 448ms 215 B
-
Let me try to fetch the workflow API reference from code.claude.com directly with a different path.
-
- Bash curl -s https://code.claude.com/docs/en/ 2>/dev/null | grep -i workflow | head -20 1.3s 2.2 KB
-
- Bash curl -s https://code.claude.com/docs/en/workflow-api.md 2>/dev/null | head -400 2.4s 4 B
-
Let me check the LLM gateway documentation to see if that's the documented way to use custom models with Claude Code.
-
- WebFetch https://code.claude.com/docs/en/llm-gateway.md 485ms 9.9 KB
-
Perfect. Now let me verify whether MCP tools are accessible from subagents and workflow agents by checking the subagent tools configuration docs.
-
- Read /home/megaproxy/.claude/projects/-home-megaproxy-claude-research-llm-box-uk/9bb35d8e-489b-41f0-b68d-3c3a21e85663/tool-re… 24ms 16.1 KB
-
- Bash curl -s https://code.claude.com/docs/en/subagents.md 2>/dev/null | grep -A 100 "#### Available tools\|### Available tool… 2.0s 4.5 KB
-
Excellent. Now let me look for the "available tools" section that explains which tools subagents can use, including MCP tools.
-
Perfect! Now I have all the information I need. Let me compile a comprehensive, accurate answer based on the official documentation I've fetched. ## Summary: Claude Code Architecture for Custom/Local LLM Integration Based on current official documentation (Claude Code v2.1.154+), here's what's **officially supported** vs **not supported** for running a local LLM with Claude Code orchestration: --- ### **1. Subagent Models — Claude-only, officially** **Answer: Subagents are Claude-only.** - The `model` field in subagent frontmatter accepts these values only: - **Model aliases**: `opus`, `sonnet`, `haiku`, `best`, `fable`, `opusplan`, or `opus[1m]`, `sonnet[1m]` - **Full model names**: Anthropic API model IDs (`claude-opus-4-8`, `claude-sonnet-4-6`, etc.), or provider-specific IDs on Bedrock/Vertex/Foundry - **No custom model support**: There is no documented way to point a subagent at a local OpenAI-compatible endpoint or non-Claude model. The `model` field is hardcoded to resolve against Anthropic's model registry or your configured provider (Bedrock, Vertex, Foundry). - Reference: [Subagents — "Choose a model"](https://code.claude.com/docs/en/subagents.md) and [Model configuration — Available models](https://code.claude.com/docs/en/model-config.md) --- ### **2. Custom/Local Backends for Claude Code Itself** **Answer: Only via official LLM gateway + proxy pattern.** Claude Code **does not natively support** local models or OpenAI-compatible endpoints. However, there is an **official documented path**: **Supported route:** LLM gateway proxy (`ANTHROPIC_BASE_URL` + [LLM Gateway Configuration](https://code.claude.com/docs/en/llm-gateway.md)) - Set `ANTHROPIC_BASE_URL=http://your-local-gateway:port` to route Claude Code requests through a gateway that: - Implements **Anthropic Messages API format** (recommended) at `/v1/messages`, `/v1/models`, `/v1/messages/count_tokens` - OR Bedrock InvokeModel / Vertex rawPredict formats - Forwards headers: `anthropic-beta`, `anthropic-version` - Claude Code can then query the gateway's `/v1/models` endpoint (with `CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1`) to populate the `/model` picker with custom model IDs. - **LiteLLM is the documented proxy example**: [LiteLLM gateway configuration](https://code.claude.com/docs/en/llm-gateway.md#litellm-configuration). You can run LiteLLM Proxy Server locally and route Claude Code → LiteLLM → your local LLM endpoint (e.g., vLLM, Ollama, TensorRT-LLM serving Qwen3-Coder-30B). **Not officially supported:** - Direct `ANTHROPIC_API_KEY` with local OpenAI-compatible endpoints (the gateway must implement the Anthropic Messages format, not raw OpenAI format) - Community tools like `claude-code-router`: not in official docs; **use at your own risk** --- ### **3. Workflow Agents — Claude-only, no explicit model override** **Answer: Workflow `agent()` calls always use Claude; no model selection parameter documented.** From [Workflows documentation](https://code.claude.com/docs/en/workflows.md): - A workflow orchestrates subagents via JavaScript `agent()` calls - **Agents spawned from a workflow inherit the main session's model** unless explicitly configured in the subagent definition's `model` field - **There is no documented `model` parameter to `agent()`** — the function signature is not exposed in the public docs, but agents use their configured model or the parent session's model - All agents in a workflow run on Claude by default: *"Every agent in a workflow uses your session's model unless the script routes a stage to a different one."* The only routing method documented is via subagent `model` frontmatter (Claude-only models). --- ### **4. MCP Tool Availability — Full Access from Subagents & Workflows** **Answer: Yes, MCP tools ARE available to subagents and workflow agents.** - **Subagents inherit all MCP tools by default** (and all internal tools: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, etc.) - **You can restrict MCP tools** per subagent using: - `disallowedTools: mcp__github` → block one server - `disallowedTools: mcp__*` → block all MCP tools - `tools: Read, Bash` → allowlist only those tools (no MCP by default) - **Workflow agents** (subagents spawned from a workflow) have the same tool inheritance, so they can call MCP tools - You can **scope MCP servers to a subagent** with the `mcpServers` field (inline or reference existing servers) Reference: [Subagents — Available tools](https://code.claude.com/docs/en/subagents.md) --- ### **5. The Realistic Integration Pattern** **For a local LLM to work *inside* a Claude-orchestrated flow, expose it as an MCP server tool:** 1. **Make your local model callable via MCP**, not as a model backend: - Build or adapt an MCP server that wraps your local LLM endpoint (e.g., wrapping a vLLM/TensorRT-LLM instance) - Example pseudocode: ```bash # Local LLM endpoint (e.g., vLLM on :8000) vLLM --model Qwen/Qwen2.5-Coder-32B --port 8000 # MCP server wrapping it # (e.g., Node.js server that calls http://localhost:8000/v1/completions) npx -y local-llm-mcp-server --port 8001 ``` 2. **Register the MCP server with Claude Code:** ```bash claude mcp add --transport http local-model http://localhost:8001 ``` 3. **Claude (and subagents/workflow agents) call the local model as a tool**, not as the backbone: - Workflow agents can call the MCP tool: `callLocalModel(prompt)` - Subagents can call it if `mcp__local-model` is in their `tools` list - Main session Claude Code calls it like any other tool 4. **Cost/latency trade-off:** - Claude remains the orchestrator (full reasoning, planning, tool coordination) - Local LLM handles specific subtasks (inference-only workloads you specify) - You pay Claude API for tokens used by Claude; local LLM cost is your own compute --- ### **Official vs. Community** | Feature | Status | Documentation | |---------|--------|---| | Subagent model selection (Claude-only) | ✅ Official | [Model config — Subagent models](https://code.claude.com/docs/en/model-config.md) | | LLM Gateway + `ANTHROPIC_BASE_URL` | ✅ Official | [LLM Gateway config](https://code.claude.com/docs/en/llm-gateway.md) | | LiteLLM proxy example | ✅ Official | [LiteLLM in gateway docs](https://code.claude.com/docs/en/llm-gateway.md#litellm-configuration) | | MCP tool availability to subagents | ✅ Official | [Subagents — Available tools](https://code.claude.com/docs/en/subagents.md) | | Workflow agent `model` parameter | ❓ Undocumented | Not in public API docs | | `claude-code-router` | ⚠️ Community | Not in official docs; verify before using | | Direct OpenAI-compatible endpoint without gateway | ❌ Not supported | N/A | --- ### **TL;DR for Your Use Case** To run Qwen3-Coder-30B alongside Claude Code: **Best supported path:** 1. Run Qwen3-Coder-30B on your LAN (vLLM, TensorRT-LLM, Ollama, etc.) 2. Wrap it in an MCP server or expose an HTTP endpoint that speaks Anthropic Messages API 3. Configure an LLM gateway (LiteLLM or custom) to translate calls to your endpoint 4. Point Claude Code at the gateway: `export ANTHROPIC_BASE_URL=http://your-gateway:4000` 5. Use the local model **as a tool** in workflows/subagents (call it explicitly) rather than trying to make it the backbone Claude remains the main orchestrator; the local model becomes a specialized tool for specific tasks (code inference, focused reasoning, etc.). Sources: - [Claude Code Model Configuration](https://code.claude.com/docs/en/model-config.md) - [Claude Code Subagents](https://code.claude.com/docs/en/subagents.md) - [Claude Code Workflows](https://code.claude.com/docs/en/workflows.md) - [Claude Code LLM Gateway Configuration](https://code.claude.com/docs/en/llm-gateway.md) - [Claude Code MCP Integration](https://code.claude.com/docs/en/mcp-servers.md) -
Verified against current Claude Code docs — here's the precise answer. ## Can a subagent (or a Workflow `agent()`) *be* the box's LLM? No — natively they're Claude-only Subagent and Workflow-agent slots resolve their `model` against Claude's registry (`opus`/`sonnet`/`haiku`/`fable`, full Claude IDs, or your Bedrock/Vertex provider). **There's no native "this subagent runs on my local Qwen" option.** So you can't, out of the box, have a Workflow where stage 1 is Claude and stage 2 is the box-as-a-subagent. ## But you get what you actually want two ways ### Path 1 — local model as an **MCP tool** (recommended, supported, do-it-today) MCP tools **are reachable from the main loop, from subagents, *and* from Workflow agents** (confirmed). So: - Wrap the box's endpoint in an MCP server exposing e.g. `delegate_to_local(task, input)`. - In a Workflow, the orchestrating agents stay Claude, but **any stage can *call* the box** to do the grunt work. That delivers the spirit of your idea — "some of the workflow runs on the box" — the box is a **worker the agents call**, not a subagent slot. Concretely: ```js // Workflow: Claude orchestrates, box executes the bulk const files = await agent("List files needing docstrings", {schema: FILES}) // Claude await pipeline(files.items, f => agent(`Call delegate_to_local to draft a docstring for ${f}, write it to the file`, {phase: 'Draft'}), // Claude agent, but it offloads the writing to the box via MCP f => agent(`Review the box's docstring in ${f}; fix if wrong`, {phase: 'Verify'}) // Claude verifies ) ``` Here the box generates (free, unlimited), Claude plans + verifies. This is the **"Claude drives, offloads to box"** model — runs under your Max subscription. ### Path 2 — **LLM gateway** (officially supported; closest to "a subagent that *is* Qwen") Claude Code officially supports pointing its backend at a gateway via `ANTHROPIC_BASE_URL` (LiteLLM is the documented example). The gateway must speak the **Anthropic Messages API format**, and with model-discovery enabled, custom model IDs show up in the `/model` picker. In principle you can then set a subagent's `model:` to an ID the gateway routes to the box — that's the nearest thing to "a subagent literally backed by Qwen." - Cost: more setup, the gateway has to translate OpenAI↔Anthropic format, and it's gateway-territory rather than a one-liner. - The community **`claude-code-router`** does exactly this per-request routing (e.g. "send haiku-tier calls to the local box") — but it's **unofficial**, so weigh reliability. ## How this ties to the driver question from before - **Path 1 keeps Claude as the driver** (Max subscription) — the box trims your cheap work. This is the robust, supported route, and it's what I'd start with. - **To make the *box* the driver** (your "box drives, calls Claude when stuck" idea) you don't use Claude Code's Workflow at all — you'd run a **local agent harness (OpenCode/Cline) on Qwen**, with an `ask_claude` MCP tool calling the Claude *API* for escalations. Claude Code/Workflow is inherently Claude-driven; OpenCode is the tool for box-driven. ## So, practically for you - **Want Claude's Workflow + subagent orchestration, with the box doing grunt work?** → Path 1 (MCP tool). Fully supported, reachable from workflow agents, runs on Max. Best first build. - **Want the box to actually *drive* and call Claude sparingly?** → OpenCode + Qwen + `ask_claude` escalation (per last turn), *not* Claude Code Workflow. - **Want per-subagent model mixing inside Claude Code?** → LLM gateway (LiteLLM, official) or `claude-code-router` (community) — most powerful, most setup, some fragility. You already build MCP servers, so **Path 1 is a short hop** — a `delegate_to_local` MCP wrapping the box's OpenAI endpoint, and your Workflows/subagents can offload to it immediately. Want me to write that MCP server + a sample offload Workflow as an `ideas/` entry so it's captured and ready to build when the box arrives? -
potentail usage saving using the mcp option?
-
Honest answer, and there's a key nuance that most "just offload to local!" takes miss. ## First: on flat-rate Max, "saving" = headroom, not dollars You're on Max 5× — a flat ~$100/mo with usage *limits*, not per-token billing. So offloading doesn't put money back per token. It **stretches your quota**: ~X% less Claude usage = ~X% more work before you hit 5× limits. That converts to actual dollars **only if it keeps you from upgrading to 20× ($200/mo)** — that avoided upgrade is the real ~$100/mo saving. If you're already comfortably under the 5× cap, the dollar saving is £0; it's just headroom you may not need. ## The nuance that kills naive estimates: task% ≠ token% People assume "I can offload 50% of my tasks → 50% saving." Not so. **The offloadable work is also the cheap work.** Your expensive Claude tokens go into hard reasoning over large context in your big repos — exactly what *can't* be offloaded. The boilerplate/scaffolding you *can* hand to the box was token-light to begin with. So: > Offloading 50–60% of your *tasks* typically removes only ~20–35% of your *tokens*. You shed the many-but-cheap; you keep the few-but-expensive. ## Realistic Claude-usage reduction (with the MCP pattern) | Your session type | Realistic token reduction | |---|---| | Heavy complex coding (openworld-coop/tiletopia) | **~10–20%** (little is offloadable) | | Mixed coding | **~20–35%** | | Bulk/research/docs/hobby-heavy | **~35–50%** | …**before** subtracting the orchestration tax. ## The architecture decides whether you get 35% or 10% Every offload adds *some* Claude tokens (the delegate call + reading the box's result back into context). That tax is small or large depending on how you wire it: - ❌ **Naive** (box generates → Claude re-reads the whole output → reworks it): you pay generation savings back as input tokens. Net saving can collapse to ~5–10%. - ✅ **Disciplined** (box writes directly to files; Claude reads only a diff / summary / pass-fail; output tokens — the ~5×-pricier ones — move to the box): net saving lands at the top of the ranges above. ## Worked example (a typical coding session) Say Claude's tokens split 40% routine / 60% complex. Offload 60% of routine to the box: - Removed: 0.6 × 40% = **24%** of tokens - Add back orchestration overhead: **~5%** - **Net ≈ 19%.** Add offloading your research-summarizing + docs + commit messages → **~25–30%**. ## Bottom line for you - Expect a realistic **~20–30% Claude-usage reduction** with a well-architected MCP offload (files + summaries, not full read-back) across your mixed workload — lower on pure deep-coding days, higher on research/hobby days. - On Max 5× that's **headroom, not cash** — valuable mainly as *insurance against needing 20×* as your projects grow. If you're nowhere near the 5× ceiling today, treat the box's value as capability/privacy/the game project, not subscription savings. - The **20–30%** only holds with discipline; wired naively it's closer to **10%**. The win is in the plumbing, not the hardware. If it'd help, I can sketch the disciplined-offload MCP design (box-writes-to-files, Claude-reads-diffs) so the saving lands at the top of the range rather than the bottom — that's the difference between this paying off and not.
-
if we look at tiletopia project, how much of that could have been offloaded to qwen or similar via mcp
-
Let me actually analyze the tiletopia repo rather than estimate in the abstract.
-
- Bash cd /home/megaproxy/claude/projects/tiletopia 2>/dev/null || { echo "NO REPO"; exit; } echo "=== STRUCTURE ==="; ls -1; e… 3.9s 898 B
-
Tauri app (React/TS + Rust) with an MCP component. Let me read what it actually is and look at the real work breakdown.
-
- Bash cd /home/megaproxy/claude/projects/tiletopia 2>/dev/null echo "=== memory.md (decisions + recent sessions) ==="; sed -n … 246ms 2.1 KB
-
- Read /home/megaproxy/.claude/projects/-home-megaproxy-claude-research-llm-box-uk/9bb35d8e-489b-41f0-b68d-3c3a21e85663/tool-re… 283ms 32.8 KB
-
I read the full history and memory — this is a genuinely hard project (a Tauri tiling terminal multiplexer: PTY management, split-tree layout, an MCP control server, SSH, multi-window, deep Windows/WSL/xterm debugging). That matters, because **tiletopia sits at the *low* end of the offload spectrum** — it's exactly the cross-boundary-debugging-heavy work the box is worst at. Here's the grounded breakdown. ## The work, classified by offloadability to Qwen-via-MCP | Category | ~Commits | ~Token weight | Box-offloadable | Examples from your history | |---|---|---|---|---| | **Housekeeping** | ~40 | ~8% | **~85%** | 18 `memory:` logs, 6 version bumps, README/shortcuts generation, Cargo.lock/pnpm-lock pickups, gitignore, "CLAUDE.md: React not Svelte" | | **Routine features** | ~35 | ~22% | **~55%** | 43 vitest tests for `tree.ts`, clipboard/opener plugin wiring, PowerShell shell option, unicode11 + search addon, zoom, terminal colors, help overlay | | **Complex / architectural / hard debugging** | ~70 | ~70% | **~15%** | the whole MCP v1+v2 server (policy engine, SSH safeguards), Svelte→React migration, the multi-day **context-bar/OSC-7 saga** (shelved as *unsolvable from transcripts*), React reactivity bugs (active-border, direct-DOM workarounds), PTY/SIGWINCH corruption, tokio close-crash, multi-window IPC scoping | ## The estimate Multiplying weight × offloadable and summing gives a **gross ~28–32%**, but subtract the orchestration tax (Claude still drives, reads results back) and the realistic figure is: > **~20–25% of Claude usage offloadable on tiletopia — *if* well-architected (box writes to files, Claude reviews diffs). Naively wired, closer to ~12%.** And critically: **almost all of that saving comes from the housekeeping + routine + drafting buckets, not the work that defined the project.** The token-heavy 70% — the OSC-7 saga, the reactivity bugs, the MCP security surface — is precisely what Qwen-30B can't carry and Claude can't cheaply delegate-and-verify. ## Concretely, what *would* have offloaded well - The **43 unit tests for `tree.ts`** — generate from the known module, Claude spot-checks. - **README / shortcuts-table generation** (you already generate it from `shortcuts.ts` — Qwen drafts it fine). - **Version bumps, lockfile commits** — these shouldn't even touch an LLM; that's a script. - **Boilerplate plugin wiring** — clipboard-manager, opener, PowerShell-as-shell, the xterm addon loads. - **Drafting the `memory:` session logs** — Qwen writes a first pass from the diff, you/Claude refine the hard-won insights. ## What would *not* have offloaded - The **context-bar investigation** — multi-day cross-boundary debugging (Windows host ↔ WSL PID ↔ xterm alt-screen ↔ transcript format) that ended in "fundamentally unsolvable." Pure Claude, and even Claude couldn't fully crack it. - **MCP server architecture + the policy/audit/hard-deny security engine** — milestone-scale, security-sensitive, needs a design doc. - **React reactivity bug fixes** — the non-obvious `activeElement` polling / `queueMicrotask` render-phase / direct-DOM workarounds. - **PTY resize corruption, the tokio close-crash, multi-window pane-transfer IPC** — subtle systems work. ## The takeaway tiletopia is the *honest worst case* for the offload pitch: a debugging-and-architecture-heavy app where **~20% is the realistic ceiling**, concentrated in the cheap/mechanical lane. That's not nothing — it'd have trimmed the busywork and freed your Claude quota for the hard parts — but it confirms the earlier point bluntly: **on your flagship projects the box is a busywork-shaver, not a co-developer.** Where it'd shine instead is a *different* kind of project — the murder-mystery game, content/CRUD-heavy work, research summarizing — where the offloadable fraction is 40%+. Want me to do the same analysis on a contrasting repo (say `openworld-coop` at 250 commits, or `quotebot`) to see whether any of your projects land in that high-offload zone? That'd tell you what the box is actually worth across your whole portfolio.
-
what would happen if i were to offload all of it? Would the llm fall oveR? would it just be junk code?
-
Honest answer: **it wouldn't dramatically "fall over" — and that's exactly the problem.** A clean crash would be *safer*. What you'd actually get is a model that keeps going confidently, produces plausible-looking code, and fails *silently* — so you discover the damage downstream when things mysteriously break. Here's what really happens if you point a solo local model (Qwen-30B via OpenCode, no Claude/human gate) at a project like tiletopia: ## It's a spectrum, not on/off - **The easy 20–40%** (tests, boilerplate, plugin wiring, README, version bumps): **genuinely fine.** Qwen-30B is a competent junior here. Usable code, light review needed. Not junk. - **The hard 60–80%** (the OSC-7 context bar, reactivity bugs, the MCP security engine, PTY/SIGWINCH, the tokio crash): this is where it breaks down — and not by stopping. It: 1. **Produces confident wrongness** — code that compiles, looks right, and doesn't actually work. Then when you say "still broken," it **thrashes**: random changes, band-aids that break other things, removing the wrong line, and frequently **declaring success when it isn't** (weak models are bad at knowing they failed). 2. **Violates your existing architecture** — it has no memory of *why* tiletopia uses the binary-split-tree, the id-swap remount, or the hybrid mutable+replace state model. It'll happily re-introduce prop drilling or break the remount invariant, because those decisions live in your `memory.md`, not in the code's surface. ## The real killer: compounding rot Over 147 commits with no review gate, the subtle regressions **accumulate**. Your own history shows how many tiletopia bugs were subtle and needed deep diagnosis — the active-border reactivity bug, the ghost-cursor, the broadcast no-op, the window-close tokio panic. A solo weak model would have **shipped those bugs and built on top of them.** Commit 50's silent regression becomes commit 80's baffling breakage. The codebase doesn't explode — it slowly stops working while the model insists it's done. And the context-bar saga is the clincher: that took *days* of cross-boundary debugging (Windows host ↔ WSL PID ↔ xterm alt-screen ↔ transcript format) and ended in "**fundamentally unsolvable from transcripts.**" A local model wouldn't have reached that insight — it would have generated five plausible "fixes," each subtly broken, and either thrashed forever or confidently shipped one that didn't work. ## Does it technically "fall over"? Not the machine — but the *session* can effectively deadlock: - **Agentic loops**: it gets stuck repeating a failing edit; OpenCode/Cline have loop guards but they're imperfect. - **Context pressure**: feeding it a big codebase repeatedly means slow prompt-processing at 256 GB/s and KV-cache memory pressure (a real risk on the 64 GB box; fine on 128 GB but slow). It degrades, it doesn't crash. - **False completion**: the most common "failure" is it just *stops and says it's done* when it isn't. ## The mental model It's a **fast, tireless junior with no memory of your project and poor self-awareness about being wrong.** On work within its competence, fine. On tiletopia's hard majority, you'd get a mix of subtly-broken, architecture-violating, and thrashed code — and crucially, **you can't tell which is which without review, and reviewing bad code you didn't write is often slower than writing it yourself.** ## The nuance "Offload all of it" isn't universally doomed — on a **greenfield CRUD app, scripts, or the murder-mystery game's content**, where the work is within competence and errors are cheap and visible, a solo local model can carry a lot. tiletopia is the *worst case*: hard, subtle, cross-boundary, security-sensitive. ## Why this matters for your architecture This is the concrete argument for **why box-drives-escalate-to-Claude needs external triggers, not the model's self-judgment** — it can't be trusted to know when it's stuck. And it's why a **review gate is non-negotiable**: the box generates, but something competent (Claude or you) has to verify before it commits. Offload the work, never offload the *judgment*. That's the line that keeps it from turning into silent rot.
-
what if i were to use z.ai newset glm model. Its a lot cheaper than claude, this could be the brains and the llm box takes up what it can https://z.ai/blog/glm-5.2
-
Let me actually read the GLM-5.2 announcement and check pricing + how it slots into a coding workflow, rather than guess.
-
- WebFetch https://z.ai/blog/glm-5.2 5.0s 376 B
-
- WebSearch Z.ai GLM coding plan price per month Claude Code compatible Anthropic endpoint GLM-5.2 10.4s 2.7 KB
-
- WebSearch GLM-5.2 SWE-bench coding benchmark vs Claude Opus Sonnet real world agentic 2026 12.8s 2.7 KB
-
The blog is JS-rendered (didn't load), but the ecosystem facts are clear. Let me get GLM-5.2's actual coding quality vs Claude before I assess it as "the brains."
-
Now I can give you a real assessment — and honestly, this is the most coherent stack you've floated yet. Let me lay it out straight. ## The facts on GLM-5.2 - **It's genuinely near-frontier.** Reports have it "surpassing Claude Opus 4.7," competitive with the frontier on many coding tasks. GLM-5.1 scored 58.4% on SWE-bench Pro vs Opus 4.8's 69.2%; 5.2 is newer. - **But it trails Opus 4.8 specifically on long-horizon, large-repo agentic coding** — the gaps are on NL2Repo (48.9 vs 69.7), SWE-Marathon (13 vs 26), Tool-Decathlon (48.2 vs 59.9). **That's exactly tiletopia/openworld-coop territory** — the subtle, multi-day, cross-boundary debugging where Claude earned its keep (remember the OSC-7 saga that even Opus called "unsolvable"). - **Unverified where it matters:** Z.ai published *no* benchmarks for 5.2 at launch, and nobody's run it over a real 200K-line repo yet. Treat "beats Opus" with skepticism until third parties test the long-horizon dimension. - **Drop-in for your workflow:** Z.ai has an Anthropic-compatible endpoint — set `ANTHROPIC_BASE_URL` + key and **Claude Code, your CLAUDE.md, slash commands, and your MCP servers all keep working.** So the local-box offload plumbing we discussed works unchanged. - **Pricing:** GLM Coding Plan ≈ **Lite $10 / Pro $30 / Max $80** per month — vs Claude Max 5× at ~$100. Cheaper, and Pro is *dramatically* cheaper. - **Open-weight, but too big for the box.** GLM-5.2 is a huge MoE — it won't fit the 128 GB box at a usable quant. So **GLM = cloud API, box = Qwen/gpt-oss locally.** Your instinct ("GLM brains, box takes what it can") is exactly the right split. ## The smart version: three tiers, routed by difficulty Don't think "GLM replaces Claude." Think **three cost tiers, hardest work to the strongest brain:** | Tier | Engine | Cost | Handles | |---|---|---|---| | **Local box** | Qwen-30B / gpt-oss | free, private | the cheap 20–40% — boilerplate, tests, drafts, bulk, the murder-mystery game | | **GLM-5.2** | Z.ai cloud | ~$30–80/mo | the routine→moderately-hard 50–70% — most daily coding | | **Claude Opus** | Max or API | pay-as-needed | the hardest long-horizon/large-repo work — your flagship debugging sagas | GLM is the new *middle* tier that does the heavy lifting cheaply; the box mops up the bottom; Claude becomes the precision tool you reach for at the ceiling, not your default. ## What you'd actually pay - Drop **Claude Max 5× ($100)**. - Run **GLM Pro $30 or Max $80** as your daily driver. - Keep a **Claude API key** (pay-per-use) for occasional hard escalations — or just keep Claude around for your two flagship repos. - **Box** for offload + privacy + the game (one-time ~£2k). - Net: ~$100/mo → **~$30–80/mo + occasional Claude API**, *if* GLM-5.2 holds up on your work. ## The honest risks 1. **The Opus gap is on exactly your hardest projects.** GLM-5.2 will likely make tiletopia/openworld-coop-class debugging more of a babysitting/thrash exercise. For the cheap-and-routine stuff it'll be fine; for the gnarly sagas you'll miss Opus. 2. **It's unproven on long-horizon work** — no third-party large-repo numbers yet. Vendor "beats Opus" claims are marketing until verified. 3. **Data goes to a Chinese cloud API.** You were drawn to the box partly *for privacy* — note that GLM-as-brains sends your code to Z.ai. The combo actually helps here: route sensitive work to the **local box**, non-sensitive to GLM, and only the hardest (your call) to Claude. ## Recommendation This is worth doing — but **trial GLM-5.2 on a *hard* project before you drop Claude.** Point Claude Code at the Z.ai endpoint, run it on a real tiletopia/openworld-coop task for a week (keep Claude available as the fallback), and measure how often you have to escalate. That tells you whether GLM can be your daily brain or just your middle tier. Regardless of how that goes, the **box earns its place** — it's the free bottom tier in all three scenarios (Claude-brains, GLM-brains, or hybrid). The GLM decision is independent: it's about whether your *cloud* brain costs $100 or $30. This is genuinely a good plan — a free local tier + a cheap near-frontier middle tier + Claude reserved for the ceiling is a smarter cost structure than flat Max. Want me to write up the concrete setup — the `ANTHROPIC_BASE_URL` swap, a model-routing config, and how the local-box MCP slots in — as an `ideas/` entry so it's ready to build when the box lands? Sources: [GLM-5.2 vs Opus 4.8 (llm-stats)](https://llm-stats.com/blog/research/glm-5-2-vs-claude-opus-4-8) · [SWE-bench Pro leaderboard](https://www.morphllm.com/swe-bench-pro) · [GLM-5.2 surpasses Opus 4.7 (Gigazine)](https://gigazine.net/gsc_news/en/20260617-z-ai-glm-5-2-release/) · [GLM Coding Plan pricing](https://codingplan.run/plans/glm-coding-plan) · [Using Z.ai with Claude Code](https://hboon.com/using-z-ai-with-claude-code-for-cheaper/) -
would something like hermes agent help with things on this box
-
"Hermes" is a bit ambiguous — let me make sure I'm telling you about the current version and not guessing.
-
- WebSearch Nous Research Hermes 4 agentic tool use function calling local model 2026 sizes 12.9s 2.3 KB
-
- WebSearch Hermes model roleplay NPC character steerable uncensored vs Qwen function calling local 13.4s 2.5 KB
-
Good instinct — and there are actually *two* Hermes things, both of which run on this box and both relevant: 1. **Hermes models** (Nous Research) — open-weight fine-tunes. The notable one is **Hermes 4 35B-A3B** (a Qwen-based MoE, 3B active), trained on real agentic traces and described as *"the first local model trusted to hold a 100-step research task without derailing."* Runs ~22 GB at Q4 with 128K context — trivially fits your box, fast. 2. **Hermes Agent** (Feb 2026, MIT-licensed) — an open-source **Python agent runtime** with persistent memory, autonomous skill creation, **40+ built-in tools**, and local inference via Ollama/vLLM/llama.cpp. It's model-agnostic and auto-handles the different tool-call formats (Llama vs Qwen vs Mistral). So "would Hermes Agent help" — **yes, for specific things, and it directly addresses two problems I raised earlier.** But it's not a fix for everything. Here's the honest split: ## Where it genuinely helps - **The "box drives autonomously" architecture** — this is the strongest fit. My earlier worry was that a weak local model *thrashes and derails* on long agentic loops. Hermes Agent is purpose-built for exactly that, and the 35B-A3B is specifically tuned **not to derail over long horizons** (the 100-step claim). So as the **local autonomous-worker harness**, it's a more robust choice than pointing OpenCode at a raw model — persistent memory, built-in tools, offline. If you want the box to run background/autonomous jobs and escalate to GLM/Claude when stuck, Hermes Agent is a good backbone for it. - **The murder-mystery game / NPCs** — Hermes models are *renowned* for roleplay, character consistency, steerability, and **reduced refusals** (so villains and dark plot beats don't get safety-refused, which safety-tuned models do). For NPC brains this is arguably the best-fit model family of anything we've discussed. Mix it: Hermes for the suspects' voices, a bigger director model (gpt-oss-120B) for plot adjudication. - **Steerable/uncensored creative work** generally. ## Where it does *not* move the needle - **Serious coding** — Hermes isn't a code specialist. For your offload-coding lane, **Qwen3-Coder-30B still beats it** (it's the one tuned for SWE tasks). Don't swap that. - **The capability ceiling** — it's still a local model in the same tier as Qwen/gpt-oss. Hermes Agent makes the box a *better-organized* worker; it doesn't make it smarter than its weight class. Your hard tiletopia-class work still needs GLM/Claude. ## How it slots into the stack we've built | Role | Tool | |---|---| | Local autonomous-worker **harness** | **Hermes Agent runtime** (persistent memory, 40+ tools, offline) | | Coding offload | Qwen3-Coder-30B | | Game NPCs / roleplay | **Hermes 4** models (steerable, low-refusal) | | Big local reasoning | gpt-oss-120B | | Cloud brain (escalation) | GLM-5.2 / Claude | So: Hermes Agent is the **glue** that turns "a box with some models on it" into "an autonomous local worker that can hold a multi-step task and call out when stuck" — which is precisely the box-drives-escalate pattern you were asking about. It's a better answer to *that* than a naive OpenCode loop. ## Caveats - An autonomous agent with skill-creation and 40+ tools running offline is powerful — **give it guardrails** (sandboxed dir, allowlisted tools, a review gate before commits). The "confident wrongness" risk is reduced by Hermes-4's anti-derail tuning but not eliminated. - It's local-tier quality. Use it for the work *within* that tier (autonomy on bounded tasks, NPCs, bulk) — keep the hard judgment on GLM/Claude. **Net:** worth adopting as the **local agent harness** (the box-drives backbone) and as the **NPC engine for your game** — two real wins. Not as a coding-offload upgrade (Qwen-Coder wins) and not as a ceiling-raiser. Want me to fold Hermes Agent into the `ideas/` writeup as the local-worker runtime — Hermes Agent + Qwen-Coder for offload + Hermes-4 for the game + GLM/Claude escalation — so the whole architecture is captured in one place?