There is a number that is not making headlines the way it should.
Five hundred million dollars. Spent on AI by a single company. In a single month. Not because they planned to. Because nobody set a limit.
And here is the part that makes it genuinely strange: this happened while the price of AI tokens was falling. While the industry was celebrating cheaper, more accessible AI. While every vendor was telling enterprises that AI was now affordable for everyone.
The cheaper AI got, the more money companies spent. And the word that describes what happened, the cultural dynamic, the broken incentive structure, the systematic destruction of budget without corresponding value, is one you need to understand before it happens in your organisation.
It's called tokenmaxxing. And the definition you've probably heard is wrong.
What Tokenmaxxing Actually Is
Most people who have encountered the term think tokenmaxxing means using AI as effectively as possible, maximising the value you extract from every prompt. That's the positive framing. It's not what the term means in the context where it actually matters.
Tokenmaxxing is the deliberate or inadvertent maximisation of AI token consumption without corresponding value creation.
It is what happens when a company declares that AI usage is a KPI. When a manager announces that teams should be "using AI for everything." When a leaderboard ranks employees by how many tokens they consume per week. When developers start feeding the AI pointless tasks (renaming variables, generating boilerplate that takes five seconds to type, asking questions they could answer faster from memory) just to hit a metric.
Tokenmaxxing is not a power-user skill. It is a corporate failure mode. And it is costing the industry billions.
The paradox of the Tokenpocalypse: token prices have dropped approximately 90% since 2023. By some metrics, the cost per token has fallen by a factor of 280. And yet total enterprise AI spending has exploded by an estimated 320%. This is the Jevons Paradox made operational: when a resource gets cheaper, consumption can rise so fast that total cost goes up, not down. The cheaper the tokens, the more tokens get wasted.
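The arithmetic behind the paradox is worth a quick check. The sketch below assumes that total spend is simply price per token times token volume, and it reads "exploded by 320%" as a 320% increase (4.2 times the original spend). Both are simplifying assumptions, and the function name is illustrative.

```python
def implied_volume_multiplier(price_drop: float, spend_increase: float) -> float:
    """Return how much token volume must have grown if spend = price * volume.

    price_drop: fractional fall in price per token (0.90 means a 90% drop).
    spend_increase: fractional rise in total spend (3.20 means +320%).
    """
    price_multiplier = 1.0 - price_drop        # price is now 10% of what it was
    spend_multiplier = 1.0 + spend_increase    # spend is now 4.2x what it was
    return spend_multiplier / price_multiplier


volume = implied_volume_multiplier(price_drop=0.90, spend_increase=3.20)
print(f"Implied token volume growth: {volume:.0f}x")
```

Under those assumptions, a 90% price drop combined with a 320% spend increase implies roughly 42 times as many tokens consumed.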
The Leaderboard That Broke Everything
Amazon built exactly this system. They called it KiroRank.
KiroRank tracked token consumption by employee and published an internal leaderboard. The target: 80%+ weekly AI usage. The assumption behind it was that more AI use equals more productivity. That if you quantify AI engagement, you incentivise AI adoption, and that accelerates transformation.
What actually happened is a textbook case of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.
Developers figured out very quickly that the leaderboard rewarded AI interaction, not useful AI interaction. So they fed the AI trivial, pointless requests, thousands of them, purely to inflate their score. No value was created. Enormous resources were consumed. Amazon had to take KiroRank offline.
This is not an isolated story. It is the pattern.
Uber: Burned its entire annual AI token budget in four months. 84% of developers were classified as "agentic users," generating costs that dwarfed projections.
Anonymous enterprise client: $500 million in AI costs in a single month, because no licence limits had been set for employee usage.
Meta: Employees consumed an estimated 60 trillion Claude tokens in 30 days.
NVIDIA: Compute costs for AI inference now exceed the team's personnel costs.
Disney: Warned its engineers explicitly not to maximise AI prompt usage, calling it "expensive procrastination."
Andrew Macdonald, COO at Uber, put it plainly: "If you can't draw a direct line to how many useful features and functionalities you're delivering to users, that trade becomes harder to justify. That connection isn't there yet."
The Hidden Losses Nobody Put in the Pitch Deck
The AI productivity promise was always about what gets created faster. Nobody put the downstream costs in the slide.
A study of 2,444 companies reveals what actually happens behind every dollar of AI token spend:
| Cost Driver | Per $1 of AI Spend | What Causes It |
|---|---|---|
| Bug fixing | $0.44 | Correcting errors introduced by AI-generated code |
| Code rewriting | $0.27 | Manually rewriting inefficient or unstable AI output |
| Review delays | $0.11 | AI code floods pipelines, slowing review and merge cycles |
| Total hidden losses | $0.82 | For every $1 spent on tokens, $0.82 follows in hidden costs |
| Durable value created | $0.18 | Only 18 cents of every AI dollar produces lasting output |
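To see what the table means for a concrete budget, the per-dollar figures can be applied to a monthly token bill. This is a minimal sketch: the rates are copied from the table, the $100,000 bill is a hypothetical example, and the code only multiplies the figures through without taking a position on how the underlying study defined each category.

```python
# Per-dollar figures copied from the table above.
HIDDEN_COST_RATES = {
    "bug_fixing": 0.44,
    "code_rewriting": 0.27,
    "review_delays": 0.11,
}
DURABLE_VALUE_RATE = 0.18


def breakdown(token_spend: float) -> dict[str, float]:
    """Apply the per-dollar rates to a token budget, rounded to cents."""
    result = {
        name: round(token_spend * rate, 2)
        for name, rate in HIDDEN_COST_RATES.items()
    }
    # Sum the three hidden-cost rows before adding any other keys.
    result["total_hidden"] = round(sum(result.values()), 2)
    result["durable_value"] = round(token_spend * DURABLE_VALUE_RATE, 2)
    return result


# Hypothetical monthly token bill of $100,000.
for item, dollars in breakdown(100_000).items():
    print(f"{item:>15}: ${dollars:,.2f}")
```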
Faros AI found that Code Churn, the ratio of deleted to added code, increased by 861% with AI tool adoption. GitClear found that the revision effort is 2.2× greater than the productivity gain. The real acceptance rate of AI code, once you count revisions in the weeks following initial acceptance, drops to 10–30%. Not the 80–90% figure that shows up in management dashboards.
The AI writes more, faster. Then the engineers rewrite most of it.
Rocket Fuel in a Lawnmower
There are two distinct classes of AI cost problem. One is tokenmaxxing: using AI for everything, regardless of necessity. The other is model misallocation: using the wrong type of model for the task.
Reasoning models like OpenAI's o3 and Claude Opus are built for deliberative, multi-step logic. They generate internal "reasoning tokens" as they think, spending seconds or minutes working through a problem before producing an answer. For genuinely complex work, this is invaluable. For renaming a variable, it is, as Firat Elbey (Principal Product Manager) put it: "rocket fuel in a lawnmower."
Unnecessary reasoning cycles generate latency, drive up infrastructure costs, and consume energy with zero marginal benefit. Analysts estimate that prompt verbosity alone, using reasoning models where lightweight models would suffice, costs enterprises tens of billions in excess compute annually.
The Jellyfish study makes the non-linearity vivid: a 10× token budget produces only a 2× output improvement. Tokens behave like rocket fuel: to increase velocity modestly, you must increase resource consumption exponentially.
And then there is the Agentic Loop Multiplier. Autonomous AI agents work in loops: Plan → Execute → Reflect → repeat. At every loop iteration, the agent must re-read the entire accumulated context from all previous steps. Token consumption therefore grows faster than linearly with each cycle: if every step adds a similar amount of context, the cumulative total rises roughly with the square of the number of steps. Goldman Sachs projects this will produce a 24× increase in global token consumption by 2030, reaching 120 quadrillion tokens per month.
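A toy model makes the loop cost concrete. The sketch below assumes an agent that starts with a fixed context and appends a constant number of tokens at every step, re-reading everything each time. The function name and the token counts are hypothetical, and real agents often trim or cache context, so treat this as an upper-bound illustration.

```python
def cumulative_input_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total tokens read across all steps when each step re-reads the full context."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # the agent re-reads everything accumulated so far
        context += added_per_step  # this step's plan, output and reflection are appended
    return total


# Hypothetical agent: 2,000-token starting context, 1,000 tokens added per step.
for steps in (1, 10, 20, 40):
    tokens = cumulative_input_tokens(steps, base_context=2_000, added_per_step=1_000)
    print(f"{steps:>2} steps -> {tokens:,} input tokens")
```

In this simplified model, doubling the number of steps from 20 to 40 raises the tokens read from 230,000 to 860,000, close to a fourfold increase. Under these assumptions the growth is quadratic in the number of steps, which is steep even though it is not strictly exponential.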
The New Economics — and the Only Way Out
The market is already restructuring around the Tokenpocalypse. Subscription billing is dying.
Cursor, Vercel, Replit, and Lovable all switched to token-based usage billing in spring 2025, passing consumption overruns directly to enterprises. SAP CEO Christian Klein called it "foolish" to continue seat-based subscriptions when AI automation devalues the per-user model. Microsoft pushed its own employees back toward GitHub Copilot; industry analysts read this as cost control, not capability consolidation.
Meanwhile, the price war is brutal. DeepSeek charges $3.48 per million output tokens. OpenAI and Anthropic charge $25–30 per million output tokens for comparable tasks. That roughly 7–9× gap is forcing a multi-model strategy on every serious enterprise: cheap models for standard work, premium models only where complexity demands it.
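A multi-model strategy can be sketched as a simple routing rule. The example below is illustrative only: the two prices come from the figures above (using the low end of the $25–30 range), while the tier names, the `needs_deep_reasoning` flag, and the task list are hypothetical. In practice the routing decision would come from a classifier or a human policy, not a hard-coded flag.

```python
from dataclasses import dataclass

# Output-token prices in USD per million tokens, taken from the figures above.
PRICE_PER_MILLION = {"budget": 3.48, "premium": 25.00}


@dataclass
class Task:
    name: str
    output_tokens: int
    needs_deep_reasoning: bool  # hypothetical flag set by a reviewer or classifier


def pick_tier(task: Task) -> str:
    """Send only genuinely complex work to the premium tier."""
    return "premium" if task.needs_deep_reasoning else "budget"


def cost_usd(tier: str, output_tokens: int) -> float:
    return PRICE_PER_MILLION[tier] * output_tokens / 1_000_000


tasks = [
    Task("rename variable", 2_000, False),
    Task("generate boilerplate", 50_000, False),
    Task("design migration plan", 200_000, True),
]

routed = sum(cost_usd(pick_tier(t), t.output_tokens) for t in tasks)
all_premium = sum(cost_usd("premium", t.output_tokens) for t in tasks)
print(f"routed: ${routed:.2f}  all premium: ${all_premium:.2f}")
```

The savings depend entirely on how much of the workload is routine. In this toy mix most tokens belong to the complex task, so the difference is modest; a workload dominated by boilerplate would show a much larger gap.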
Tokenmaxxing: FAQ
Tokenmaxxing is the deliberate or inadvertent maximisation of AI token consumption without corresponding value creation. It occurs when employees or organisations use AI for every trivial task (renaming variables, writing one-line functions, generating answers to questions a person could answer faster from memory) simply because the AI is available and cheap per token. When companies mistakenly equate token consumption with productivity, or track AI usage as a KPI, they create incentive structures that reward volume of AI use over quality of output. The result: budgets burned, code rewritten, and no measurable improvement in what actually gets delivered to users.
Tokenpocalypse is the industry term for the current crisis in which companies are burning through AI budgets at catastrophic speed despite, or because of, falling token prices. Token prices dropped approximately 90% since 2023 (by some metrics up to 280×), yet total enterprise AI spending has exploded by an estimated 320%. This follows the Jevons Paradox: when a resource becomes cheaper, consumption increases disproportionately, often to the point where total spend is far higher than before the price drop. Uber burned its entire annual AI budget in four months. An anonymous enterprise client spent $500 million in a single month. Meta employees consumed an estimated 60 trillion Claude tokens in 30 days.
The Jevons Paradox, first described by economist William Stanley Jevons in 1865, states that when the efficiency of using a resource increases (making it cheaper per unit), total consumption of that resource rises rather than falls, because the lower price triggers far greater demand. In AI: tokens cost 90% less per unit than they did in 2023. But because they are so cheap, organisations now use AI for tasks they would never have automated before. The barrier to use vanishes. The individual cost per interaction is trivial. The aggregate cost is catastrophic.
Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure." Amazon implemented an internal leaderboard called KiroRank that tracked and ranked employees by AI token consumption, setting targets of over 80% weekly usage. Developers began feeding AI trivial, pointless requests purely to inflate their score. The leaderboard rewarded volume of AI interaction, not value of output. Amazon was forced to take KiroRank offline. The episode is now a canonical case study in how measuring AI activity as a proxy for AI productivity produces the opposite of the intended result.
A study of 2,444 companies reveals that for every dollar spent on AI tokens, $0.82 in hidden losses follows. $0.44 goes to fixing bugs introduced by AI-generated code. $0.27 goes to rewriting AI-produced code that was inefficient, unstable, or wrong. $0.11 is lost to review and merge delays as AI-generated code floods engineering pipelines. Only $0.18 of every dollar invested in AI tokens generates durable, lasting value. The nominal productivity gains from AI code generation are more than consumed by the downstream maintenance and correction costs, a phenomenon researchers call "technical debt acceleration."
Code Churn is the ratio of deleted or modified lines of code compared to newly added lines, a measure of how much code needs to be revised after it is written. Research from Faros AI shows that Code Churn has increased by 861% with the adoption of AI coding tools. GitClear found that the revision effort is 2.2 times greater than the productivity gain from AI code generation. While management sees AI acceptance rates of 80–90%, the real acceptance rate, accounting for the code that gets rewritten in the weeks after initial acceptance, is only 10–30%. AI writes more code faster. But much of it needs to be corrected, rewritten, or discarded almost immediately.
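The churn definition above translates directly into a ratio. The sketch below is a minimal illustration with a hypothetical repository and hypothetical function names; it is not the method Faros AI or GitClear use. It also shows that an 861% increase multiplies the baseline by 9.61, not 8.61.

```python
def code_churn(lines_deleted_or_modified: int, lines_added: int) -> float:
    """Churn as defined above: revised lines relative to newly added lines."""
    if lines_added <= 0:
        raise ValueError("lines_added must be positive")
    return lines_deleted_or_modified / lines_added


def after_percent_increase(baseline: float, percent_increase: float) -> float:
    """An increase of N% multiplies the baseline by (1 + N/100)."""
    return baseline * (1 + percent_increase / 100)


# Hypothetical repository: 50 lines revised for every 1,000 lines added.
baseline = code_churn(lines_deleted_or_modified=50, lines_added=1_000)
with_ai = after_percent_increase(baseline, 861)
print(f"baseline churn: {baseline:.2f}, after an 861% increase: {with_ai:.2f}")
```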
The Agentic Loop Multiplier is the compounding cost amplifier that occurs when autonomous AI agents work in iterative loops. Unlike a simple chat interaction, an agentic AI works in cycles: Plan → Execute → Reflect → repeat. At every loop step, the agent must re-read the entire accumulated context from previous steps before deciding what to do next. Token consumption therefore grows faster than linearly with the number of steps a task requires, roughly quadratically when each step adds a similar amount of context. Goldman Sachs projects that the rise of agentic AI will produce a 24× increase in global token consumption by 2030, reaching 120 quadrillion tokens per month.
Reasoning models (OpenAI's o3, Claude Opus) generate internal "reasoning tokens" as they think through problems step by step, spending seconds or minutes before producing an answer. This is enormously valuable for genuinely complex, multi-step problems. But these models are routinely deployed for trivial tasks that require no deliberation. Using a reasoning model for "1+1" is, as Firat Elbey (Principal Product Manager) put it, "rocket fuel in a lawnmower." The unnecessary reasoning cycles generate latency, drive up infrastructure costs, and consume energy with zero marginal benefit. Analysts estimate unnecessary prompt verbosity costs enterprises tens of billions of dollars annually in excess compute.
The solution is adaptive resource allocation, matching model capability to task complexity. Use lightweight, fast, inexpensive models for routine tasks: boilerplate code, documentation, standard refactoring, simple queries. Reserve reasoning-capable, expensive models for genuinely complex problems requiring multi-step deliberation. Implement token consumption governance: set budgets, monitor usage, require direct accountability between AI spend and user-facing feature delivery. Andrew Macdonald (COO, Uber): "If you can't draw a direct line to how many useful features and functionalities you're delivering to users, that trade becomes harder to justify." Knowing when to engage deep processing is the new competitive alpha.
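Token consumption governance can start with something as simple as a per-team budget that warns before it blocks. The sketch below is a hypothetical illustration: the class name, the 80% warning threshold, and the team name are assumptions, and a real deployment would hook into the provider's usage reporting instead of an in-memory counter.

```python
from collections import defaultdict


class TokenBudget:
    """Tracks token usage per team against a fixed monthly limit."""

    def __init__(self, monthly_limit: int, warn_fraction: float = 0.8) -> None:
        self.monthly_limit = monthly_limit
        self.warn_fraction = warn_fraction
        self.used = defaultdict(int)

    def request(self, team: str, tokens: int) -> str:
        """Return 'ok', 'warn', or 'blocked'; usage is recorded unless blocked."""
        projected = self.used[team] + tokens
        if projected > self.monthly_limit:
            return "blocked"  # over the limit: nothing is recorded
        self.used[team] = projected
        if projected >= self.warn_fraction * self.monthly_limit:
            return "warn"     # still allowed, but close to the limit
        return "ok"


budget = TokenBudget(monthly_limit=1_000_000)
print(budget.request("payments", 700_000))  # 700k used, below the 800k warning line
print(budget.request("payments", 150_000))  # 850k used, past the warning line
print(budget.request("payments", 200_000))  # would reach 1.05M, so it is refused
```

A limit like this does not measure value on its own. It only guarantees that someone has to make a decision before spend runs away, which is the accountability the quote above asks for.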
Chinese AI providers, led by DeepSeek, have introduced extreme price competition that is reshaping global AI economics. DeepSeek charges approximately $3.48 per million output tokens. OpenAI and Anthropic charge $25–$30 per million output tokens for comparable tasks. This roughly 7–9× price gap is forcing enterprise buyers to adopt multi-model strategies: use cheap models for standard, low-risk tasks; reserve premium US models for business-critical, compliance-sensitive, or security-constrained work. The price arbitrage is real, but so is the strategic risk: Chinese models are subject to different data governance, export control implications, and ideological constraints that must be assessed per use case.
Jans Bock-Schroeder
Publisher & Founder of AI Angst
Coming from the world of art, photography, and the luxury market, Jans launched AI Angst in 2025 to explore the cultural, ethical, and psychological impacts of artificial intelligence. His work bridges creative vision with critical technology analysis, offering clarity in an era of rapid technological change.
Sources and Citations
This article is based on the following primary sources, research studies, and industry reports:
- Faros AI: "Engineering Efficiency Report: Code Churn and AI Tools" (2024–2025)
  Primary source for the 861% Code Churn increase figure, cited alongside Waydev and GitClear research on AI-generated code revision rates.
  https://www.faros.ai/
- GitClear: "Coding on Copilot" Research Report (2024)
  Source for the finding that AI code revision effort is 2.2× greater than the productivity gain, and for real acceptance rate data of 10–30%.
  https://www.gitclear.com/
- Goldman Sachs: "Generative AI: Too Much Spend, Too Little Benefit?" and AI token consumption projections (2025)
  Source for the 24× token consumption increase projection by 2030 and the 120 quadrillion tokens/month agentic AI forecast.
  https://www.goldmansachs.com/insights/
- Jellyfish: "Engineering Benchmarks: AI Token Budget and Output Scaling" (2025)
  Source for the non-linear scaling finding: a 10× token budget produces only a 2× output improvement.
  https://jellyfish.co/
- Andrew Macdonald (President/COO, Uber): Public statements on AI ROI and token governance (2025)
  Source for the direct quote on connecting AI spend to user-facing feature delivery.
  https://www.uber.com/newsroom/
- AI Angst: "Die Tokenmaxxing-Krise" Research Briefing (2025/2026)
  Internal editorial research briefing (German language): "Die Tokenmaxxing-Krise: Kosten, Ineffizienz und der Wandel der KI-Wirtschaft." Source for KiroRank details, Disney warning, Meta/Uber/NVIDIA budget figures, and Jevons Paradox framing.
  Internal research briefing: AI Angst editorial archives.
Last verified: July 5, 2026.
