Model Routing Will Control the Future of AI Economics

_Guest contributor: Tomás Hernando Kofman is the CEO and co-founder of Not Diamond, the world’s largest vendor of intelligent model routing, powering auto-routing in OpenRouter and working with global Fortune 500 enterprises like SAP._

TLDR: Intelligent model routing means choosing the most cost-efficient model for a given prompt (regardless of your AI gateway). Routing is a challenging problem because you need to get it right over and over as models ship weekly, pricing changes, and harnesses evolve. Getting this right is critical as companies look to tame their AI budgets.

I’ve been speaking with a lot of Fortune 500 executives, and they are nervous.

Annual budgets and multi-year contract commits are getting consumed in months, individual developers are spending thousands of dollars per day on coding agents, and public markets are beginning to question whether it’s acceptable to pay $100 for an internal PowerPoint presentation.

Marc Benioff, Aaron Levie, Uber, ServiceNow, and whoever spent half a billion dollars last month—they’re all on the rocket ship.

People are swinging between “what the fuck is happening” and “don’t make it stop”. Because if they stop, every competitor is going to lap them.

The good news, and the bad news, is that it is not going to stop. We have not even scratched the surface on how much money is going to get sucked into AI, nor how much is going to come back out.

Because the crazy thing is that most of the anxious F500 leaders I’m talking to are still only spending less than $1,000 a month per individual engineer on coding agents. Meanwhile, people like Pete Steinberger are personally spending $1M per month. Long-horizon tasks, sub-agents, parallelization, and expanding skill and adoption curves among developers mean the ceiling is so much higher than most leaders even realize. And the better and more autonomous the models get, the more we will spend.

Costs are not only going to grow in coding agents, which today represent more inference spend than every other AI category combined. The next two years will see non-coding use cases blow up as well.

A global cloud security company I’m working with is spending tens of millions of dollars per year on coding agents, but they’re spending even more on the AI-powered incident investigation product they make available to their customers. Other companies I’m working with are similarly seeing product spend eclipse coding agent spend.

Then think about what will happen when consumer agent usage fully unlocks. You won’t need to be a power user to spend a shit ton of money on AI. You will just need your AI agent to be one.

In any domain, solving an overspending problem requires us to understand where the cost is coming from and what value we are actually receiving relative to the market. We overspend only when the value we receive is less than what we could have gotten from an equivalent supplier.

In AI, the source of spend is inference across the various AI models available at any given time, and the value we receive is intelligence. To systematically solve the AI cost problem, then, we need to optimize the relationship between the models we’re invoking and the intelligence of their outputs. We need to intelligently select the right model at the right time. This requires us to understand not only how much a token costs, but also how much intelligence itself costs.
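To make the gap between token cost and intelligence cost concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the model names, prices, token counts, and success rates are made up, and "cost per acceptable result" is just one possible proxy for the cost of intelligence, assuming failed attempts are retried independently until one succeeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical per-model numbers; none of these come from a real price list."""
    name: str
    usd_per_million_tokens: float  # assumed blended input/output price
    avg_tokens_per_task: int       # assumed average tokens consumed per attempt
    success_rate: float            # assumed fraction of attempts that are acceptable


def cost_per_attempt(m: ModelStats) -> float:
    """What token metering measures: dollars spent on one attempt."""
    return m.usd_per_million_tokens * m.avg_tokens_per_task / 1_000_000


def cost_per_success(m: ModelStats) -> float:
    """Dollars per acceptable result, assuming independent retries until success."""
    if m.success_rate <= 0:
        return float("inf")  # a model that never succeeds has no finite price
    return cost_per_attempt(m) / m.success_rate


# Two made-up models for one made-up class of task.
large = ModelStats("large-model", 30.0, 20_000, 0.95)
small = ModelStats("small-model", 2.0, 25_000, 0.60)

for m in (large, small):
    print(f"{m.name}: ${cost_per_attempt(m):.3f} per attempt, "
          f"${cost_per_success(m):.3f} per acceptable result")
```

With these invented numbers the small model is cheaper per acceptable result even though it fails far more often. The same arithmetic flips for harder tasks where its success rate collapses, which is why the answer has to be computed per input rather than per model.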

A token is not a unit of value. It is no better a measure of value than word count is of the value of a book. A token is a container for value, in which data is acted on through computational resources to produce intelligence. Intelligence is the unit of value; computation is the price. For the first time in history, we have a direct relationship between capital and intelligence. But what is intelligence, and how do you meter it? As the economic inputs and outputs of AI begin to explode beyond all human proportion, it is no longer enough to simply meter tokens.

Intelligent model routing predicts when to use which model on each input to optimally produce intelligence. It is not deterministic model routing, which sends tokens to one model or another based on hand-written rules. Intelligent model routing allows us to optimize the relationship between computation and intelligence.
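The difference between the two approaches can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Not Diamond or any production router works: the model names and prices are hypothetical, and `toy_predictor` is a hard-coded stand-in for what would in practice be a learned quality predictor.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical model name
    usd_per_call: float  # hypothetical average cost of one call


def rule_based_route(prompt: str) -> str:
    """Deterministic routing: a hand-written rule that never looks at outcomes."""
    if len(prompt) > 200 or "refactor" in prompt.lower():
        return "large-model"
    return "small-model"


def predictive_route(
    prompt: str,
    candidates: Sequence[Candidate],
    predict_quality: Callable[[str, str], float],
    min_quality: float = 0.9,
) -> str:
    """Pick the cheapest model whose predicted quality clears the bar.

    If no model clears it, fall back to the highest predicted quality.
    """
    scored = [(c, predict_quality(prompt, c.name)) for c in candidates]
    good_enough = [c for c, q in scored if q >= min_quality]
    if good_enough:
        return min(good_enough, key=lambda c: c.usd_per_call).name
    return max(scored, key=lambda pair: pair[1])[0].name


def toy_predictor(prompt: str, model_name: str) -> float:
    """Stand-in for a learned predictor; the scores here are invented."""
    if model_name == "large-model":
        return 0.97
    return 0.95 if "commit message" in prompt.lower() else 0.60


CANDIDATES = [Candidate("small-model", 0.002), Candidate("large-model", 0.060)]

print(predictive_route("Write a commit message for this diff", CANDIDATES, toy_predictor))
print(predictive_route("Design a distributed lock service", CANDIDATES, toy_predictor))
```

The rule-based function encodes someone's guess about which prompts are hard. The predictive function makes the quality estimate an explicit, swappable input, which is where all of the difficulty lives.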

Every engineer knows that we don’t need to use Opus 4.8 max for writing commit messages; it is one of the simplest tasks in any development workflow. Yet we personally default to doing so anyway because the marginal downside of the token cost is so low while the convenience and assurance of quality is so high. This continuous marginal waste, aggregated across the entire economy, amounts to billions of dollars poured every day into nothing more than the warming of circuits in data centers, and by extension, the planet.

The problem is that solving intelligent model routing is enormously difficult. How do you predict the intelligence of an output with less intelligence than it costs to produce it?

It’s not enough to just predict the right model at each turn, which is itself extremely challenging. Choosing the wrong weak model at the wrong time can lead to even more spending down the line as more powerful models have to go back and fix the mistakes. You also need to select the right reasoning effort, monitor the KV cache to avoid costs from switching models at the wrong time, and link these decisions together over long horizons with sparse rewards.
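Two of these effects can be put into rough arithmetic. The sketch below is illustrative only: all dollar figures and per-token prices are hypothetical, the escalation model assumes a single weak attempt followed by one strong-model fix, and the cache comparison assumes a provider that bills cached context tokens at a discount and that switching models forfeits that cache.

```python
def expected_cost_weak_first(
    weak_cost: float,
    strong_cost: float,
    weak_success: float,
    repair_overhead: float = 0.0,
) -> float:
    """Expected dollars when the weak model goes first and the strong model fixes failures.

    repair_overhead is the assumed extra cost of undoing the weak model's mistakes.
    """
    return weak_cost + (1.0 - weak_success) * (strong_cost + repair_overhead)


def break_even_success_rate(
    weak_cost: float, strong_cost: float, repair_overhead: float = 0.0
) -> float:
    """Weak-model success rate above which trying it first beats going straight to strong."""
    return 1.0 - (strong_cost - weak_cost) / (strong_cost + repair_overhead)


def context_cost(context_tokens: int, usd_per_million: float) -> float:
    """Dollars to process the existing conversation context on the next turn."""
    return context_tokens * usd_per_million / 1_000_000


# Hypothetical per-task costs in dollars.
WEAK, STRONG, REPAIR = 0.05, 0.60, 0.40

print(expected_cost_weak_first(WEAK, STRONG, weak_success=0.9, repair_overhead=REPAIR))
print(expected_cost_weak_first(WEAK, STRONG, weak_success=0.3, repair_overhead=REPAIR))
print(break_even_success_rate(WEAK, STRONG, REPAIR))

# Hypothetical prices: the strong model reads cached context at $1.50 per million
# tokens, while the nominally cheaper model must reprocess it uncached at $2.00.
stay = context_cost(150_000, 1.50)
switch = context_cost(150_000, 2.00)
print(f"stay: ${stay:.3f}, switch: ${switch:.3f}")
```

Under these made-up numbers, a weak model that succeeds 90% of the time is a bargain, one that succeeds 30% of the time costs more than never using it, and a mid-conversation switch to the "cheaper" model raises the cost of the next turn's context. A real router would have to estimate all of these quantities per turn rather than take them as constants.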

So while the premise of intelligent model routing is simple, I have never spoken with a team that has tried to build this internally and succeeded.

What’s worse is that it is not enough to solve this problem once: you have to re-solve it every week as new models are released, harnesses evolve, and pricing updates. And all the while, you need to maintain the quality of the frontier models without any degradation in user experience. Nobody likes buying the budget option. We want the best.

A few years ago, I made a bet that the future of AI would not be dominated by a single model. Since then, my company Not Diamond has been laser-focused on solving the problem of intelligent model routing, growing from our release of the first open-source model router to becoming the world’s largest provider of intelligent model routing, powering auto-routing in OpenRouter and working with global F500s like SAP. But we are still in our earliest days.

As AI eventually grows to become fully synonymous with the economy, I believe that intelligent model routing will become the control theory for the mediation of all economic value. Put simply, control theory is the discipline of algorithmically governing dynamical systems to achieve desired outputs. Similarly, I believe algorithmic model routing will give our society the ability to govern when, how, and why we use AI, promote a diverse market of providers, shift the locus of power from labs to consumers, and significantly reduce the ecological impact of AI while maximizing its effectiveness.

We are working with many of our customers on a soon-to-be-released version of our product purpose-built for coding agent workloads, achieving significant cost reductions while maintaining Opus 4.8 max quality, as measured on benchmarks and on real usage and developer productivity metrics. We have an early access program for select developers and F500 enterprises. If you want to work with me, please email [email protected].

Tomás Hernando Kofman is the CEO and co-founder of Not Diamond, an AI infrastructure company pioneering intelligent model routing. A mathematician and entrepreneur, he focuses on helping enterprises optimize AI performance, cost, and reliability in the rapidly evolving multi-model ecosystem.

