OpenRouter's Fusion API brings ensemble LLMs to production
OpenRouter launched a Fusion API that combines outputs from multiple LLMs at inference time, and the Hacker News (HN) thread about it shows genuine interest. The use cases people propose are narrow but compelling: writing markdown specs before coding starts, analyzing those specs for gaps, and other tasks where the token count is low enough that running several models on the same input is affordable. One commenter also raised the idea of running Fusion across local machines, combining token streams to get better local AI.
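To make the affordability point concrete, here is a rough back-of-the-envelope sketch of what sending one prompt to several models costs. The model names and per-token prices are placeholders, not real OpenRouter pricing, and the sketch ignores any aggregation step or provider markup, since the thread does not say how Fusion is billed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price card; the numbers used below are placeholders."""
    name: str
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


def call_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call to one model."""
    return (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000


def ensemble_cost(prices: list[ModelPrice], input_tokens: int, output_tokens: int) -> float:
    """Cost of sending the same prompt to every model in the ensemble.

    Assumes each model reads the same input and writes a similar amount of
    output, so the total is simply the sum of the individual calls.
    """
    return sum(call_cost(p, input_tokens, output_tokens) for p in prices)


if __name__ == "__main__":
    # Placeholder models and prices, for illustration only.
    ensemble = [
        ModelPrice("model-a", input_per_mtok=1.0, output_per_mtok=2.0),
        ModelPrice("model-b", input_per_mtok=3.0, output_per_mtok=4.0),
    ]
    # A short spec review: a few thousand tokens in, about a thousand out.
    print(f"${ensemble_cost(ensemble, 4_000, 1_000):.4f} per review")
```

Under these assumptions the cost scales linearly with the number of models, which is why low-token tasks are the natural fit.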
The pattern: developers are no longer treating LLM selection as picking a single model. The question is shifting from 'which model do I use' to 'which combination of models, weighted how, for which task'. A second HN thread shows the same shift from the merging side. Rio de Janeiro's municipal government released a 397B parameter model that turned out to be a merge of existing open-weight models, and the HN comment asking 'can we just merge all open-weight models and get something better' got real engagement, not dismissal.
Both threads point at the same emerging practice: model merging and ensemble inference are moving from research curiosity to production technique. The tooling is catching up with the idea.
So what?
If you are building on top of LLMs, the cost curve for ensemble approaches is about to get more favorable as OpenRouter and similar providers commoditize the routing layer. For narrow, high-stakes tasks like spec review, contract analysis, or code verification, running two or three models and taking the consensus output is now a legitimate production pattern, not a research experiment. Factor this into your model cost assumptions: for those tasks, budget for several model calls per request rather than one.
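As an illustration of "taking the consensus output", here is a minimal majority-vote sketch. It is not how Fusion works internally, which the thread does not describe. The `models` argument is a hypothetical list of callables that each wrap one model call, and exact-match voting only makes sense for short, constrained answers such as a PASS/FAIL verdict.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def consensus(
    prompt: str,
    models: Sequence[Callable[[str], str]],
    min_agree: int = 2,
) -> Optional[str]:
    """Ask every model and return the answer at least `min_agree` of them share.

    Returns None when no answer reaches the threshold, so the caller can
    escalate to a human or a stronger model instead of guessing.
    """
    if not models:
        return None

    answers = [model(prompt) for model in models]
    counts = Counter(normalize(a) for a in answers)
    top_answer, votes = counts.most_common(1)[0]
    if votes < min_agree:
        return None

    # Return the first original (un-normalized) answer matching the winner.
    return next(a for a in answers if normalize(a) == top_answer)


if __name__ == "__main__":
    # Stub models standing in for real API calls (assumption for illustration).
    stubs = [lambda p: "PASS", lambda p: " pass ", lambda p: "FAIL"]
    print(consensus("Does the spec cover error handling?", stubs))
```

Returning `None` on disagreement is a deliberate choice here: for high-stakes tasks, a split vote is itself useful information. Free-form outputs would need a different aggregation step, such as a judge model or a similarity measure, rather than exact matching.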