Local AI models: real progress, real limits
A busy Ask HN thread on replacing Claude and GPT with local models for daily coding drew dozens of responses. The honest summary: local models are genuinely viable for some workflows but not yet a clean replacement for cloud models. Qwen 3.6 27B dense gets compared favorably to Claude Haiku 4.5, and people running RTX Pro 6000 Blackwell GPUs report 160 tokens per second on DeepSeek V4 Flash. On Apple M4, however, token speeds lag cloud offerings noticeably, and the hardware investment for serious local inference is non-trivial.
The pattern across the thread is that local AI has crossed from 'toy experiment' to 'real option,' but mostly for people with specific hardware setups or a tolerance for slower responses. Antirez's ds4 project for Strix Halo is being watched closely. The homelab AI dev platform post reinforces this: someone built a full local inference stack at home, and the post reached the HN front page.
The remaining gaps are throughput and enterprise tooling. Multiple commenters note that there is no good tooling for model selection and orchestration outside the cloud providers. OpenRouter gets mentioned as a middle ground, but pure local remains a hobbyist or power-user story for now.
So what?
If you are building dev tools or AI-assisted products, local model support is becoming a feature request, not a fringe ask. Founders in the inference infrastructure space have a real opening on the enterprise tooling layer: nobody has solved model selection, routing, and management for local deployments at a professional level yet.