Infrastructure June 2, 2026 bullish ⇧ 507 pts across 3 threads

Wasted GPU capacity is a real business now

Expanse, a YC P26 company, launched on HN today with a pitch built around one specific fact: data centers run at 30-40% effective utilization. Their product tries to unlock that idle capacity. The thread dug into why data centers do not simply pass the savings on to customers. The short answer: tiered pricing is hard to build and burst demand is unpredictable.
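A rough back-of-envelope sketch shows why that idle capacity is worth chasing. Only the 30-40% utilization range comes from the pitch. The $2.00 hourly cost and the function name are made-up placeholders for illustration, not figures from the thread.

```python
def cost_per_utilized_hour(cost_per_provisioned_hour: float, utilization: float) -> float:
    """Cost of one hour of useful work when only `utilization` of capacity is busy."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return cost_per_provisioned_hour / utilization


# Hypothetical figure: the operator pays $2.00 per provisioned GPU-hour.
HOURLY_COST = 2.00

for u in (0.30, 0.40):
    effective = cost_per_utilized_hour(HOURLY_COST, u)
    print(f"utilization {u:.0%}: ${effective:.2f} per utilized hour, idle share {1 - u:.0%}")
```

Under these assumptions, each useful hour carries roughly 3.3 times its provisioned cost at 30% utilization and 2.5 times at 40%. That gap is the margin a capacity reseller is trying to capture.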

This sits alongside the Groq thread, where fast inference is being positioned as a premium worth paying for. The through-line is that the GPU market is bifurcating. On one end, you have companies trying to arbitrage underutilized capacity for cheaper compute. On the other, you have companies charging a premium for speed and reliability. Both are real businesses, but they serve different customers. The cheap-compute play is for batch workloads and training runs. The premium-inference play is for latency-sensitive production applications.

Also relevant: OpenAI's Codex is now on AWS, and the HN thread noted it is more expensive than going direct to OpenAI. The reason people will pay: enterprise procurement. Getting a new vendor approved at a big company can take months. If AWS is already approved, the markup is worth it. This is the infrastructure layer of AI becoming boring in the best way.
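The procurement trade-off can be framed as a crude break-even calculation. This is a sketch under stated assumptions, not a pricing model: every number below is hypothetical, the thread did not quote a markup percentage, and the function name is invented for illustration.

```python
def months_until_markup_exceeds_delay(
    monthly_spend: float,
    markup: float,
    approval_months: float,
    monthly_value_of_shipping: float,
) -> float:
    """Months of marked-up usage after which the extra spend exceeds
    the value lost while waiting for a new vendor to be approved."""
    if monthly_spend <= 0 or markup <= 0:
        raise ValueError("monthly_spend and markup must be positive")
    extra_per_month = monthly_spend * markup                  # premium paid via the approved vendor
    delay_cost = approval_months * monthly_value_of_shipping  # value forgone during approval
    return delay_cost / extra_per_month


# All inputs are assumed values chosen only to illustrate the shape of the trade-off.
break_even = months_until_markup_exceeds_delay(
    monthly_spend=50_000,              # hypothetical monthly inference bill
    markup=0.10,                       # hypothetical 10% premium over going direct
    approval_months=4,                 # hypothetical vendor approval time
    monthly_value_of_shipping=20_000,  # hypothetical value of being live one month sooner
)
print(f"markup costs more than the delay after about {break_even:.1f} months")
```

The longer the approval process and the smaller the markup, the longer the already-approved vendor stays the cheaper choice.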


So what?

Founders building AI products need to think about compute costs as a first-class business variable, not an afterthought. The price gap between cheap burst capacity and premium inference is widening, and your product architecture should reflect which bucket you are in. If you are using AWS to access OpenAI because procurement is easier, you are paying a tax that a smaller competitor may not be paying.
