Infrastructure June 29, 2026 neutral ⇧ 395 pts across 2 threads

Infrastructure performance gaps hiding in plain sight

The NUMA architecture thread today is a reminder that most production systems have performance problems their operators don't know about. The specific example: a Go-based LLM gateway deployed in Kubernetes on a many-core server with GOMAXPROCS left unset, so the Go runtime scheduled goroutines across NUMA nodes and wrecked latency. A goroutine that lands on a core far from its data pays for remote memory access on every miss. The thread notes that NUMA is 'invisible': unless you already know to look for it, the slowness just looks like normal slowness.
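To make the GOMAXPROCS part concrete, here is a minimal sketch of a startup check that compares what the Go runtime chose against the container's CPU quota. It assumes Linux with cgroup v2 mounted at /sys/fs/cgroup, and parseCPUMax is a hypothetical helper written for this example, not something from the thread. Recent Go releases adjust the default for cgroup limits on Linux, so this is mainly useful on older toolchains or to confirm what the runtime actually picked.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseCPUMax parses the contents of a cgroup v2 cpu.max file, which holds
// "<quota> <period>" in microseconds, or "max <period>" when unlimited.
// It returns the limit in CPUs, or false if there is no limit or the
// contents cannot be parsed. (Hypothetical helper for this sketch.)
func parseCPUMax(contents string) (float64, bool) {
	fields := strings.Fields(contents)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil || quota <= 0 {
		return 0, false
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil || period <= 0 {
		return 0, false
	}
	return quota / period, true
}

func main() {
	// Passing 0 queries the current value without changing it.
	procs := runtime.GOMAXPROCS(0)
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", procs, runtime.NumCPU())

	// Assumed path: cgroup v2 as seen from inside the container.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("could not read cgroup v2 cpu.max:", err)
		return
	}
	limit, ok := parseCPUMax(string(data))
	if !ok {
		fmt.Println("no CPU quota set for this cgroup")
		return
	}
	fmt.Printf("cgroup CPU limit: %.2f CPUs\n", limit)
	if float64(procs) > limit {
		fmt.Println("warning: GOMAXPROCS exceeds the cgroup CPU limit")
	}
}
```

If the warning fires, setting the GOMAXPROCS environment variable in the pod spec to match the CPU limit is the simplest fix to try first.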

The historical memory pricing thread adds context: DDR5 pricing has rolled back to roughly 2010 levels in real terms, but nearly every app now runs in its own browser instance, so the gains get eaten by abstraction layers. And more memory doesn't fix a scheduling problem.

Together, these threads point at a class of performance issues that are invisible at small scale, surface only under production load, and take expertise most teams don't have on staff.


So what?

If you're running LLM inference or any high-throughput workload in Kubernetes on modern many-core hardware, check your NUMA configuration and GOMAXPROCS settings before you assume you need more compute. The gains from fixing a NUMA mismatch can be larger than the gains from scaling horizontally.
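As a starting point for the NUMA check, here is a rough sketch that lists the NUMA nodes a Linux host exposes and how many CPUs each one has, by reading the kernel's sysfs files under /sys/devices/system/node. countCPUs is a hypothetical helper written for this example. Run it on the node itself or in a pod that can see the host's sysfs. More than one node in the output means cross-node scheduling is possible.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// countCPUs counts the CPUs in a Linux cpulist string such as "0-15,32-47".
// An empty list (a memory-only node) counts as zero CPUs.
// (Hypothetical helper for this sketch.)
func countCPUs(list string) (int, error) {
	list = strings.TrimSpace(list)
	if list == "" {
		return 0, nil
	}
	total := 0
	for _, part := range strings.Split(list, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return 0, fmt.Errorf("bad cpulist entry %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return 0, fmt.Errorf("bad cpulist entry %q: %w", part, err)
			}
		}
		if end < start {
			return 0, fmt.Errorf("bad cpulist range %q", part)
		}
		total += end - start + 1
	}
	return total, nil
}

func main() {
	// Each NUMA node appears as a nodeN directory with a cpulist file.
	paths, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*/cpulist")
	if len(paths) == 0 {
		fmt.Println("no NUMA topology found in sysfs")
		return
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Println("skipping", p, "-", err)
			continue
		}
		n, err := countCPUs(string(data))
		if err != nil {
			fmt.Println("skipping", p, "-", err)
			continue
		}
		node := filepath.Base(filepath.Dir(p))
		fmt.Printf("%s: %d CPUs (%s)\n", node, n, strings.TrimSpace(string(data)))
	}
}
```

This only reports topology. Whether a given pod is actually pinned to one node depends on how the kubelet and container runtime are configured, which this sketch does not inspect.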

Read these