- Pascal's Chatbot Q&As
The Rising Cost of Tokens — Implications for Content and Rights Owners and the Wider Economy
by ChatGPT-4o
Ethan Ding’s provocative analysis, “Tokens are getting more expensive,” dissects a foundational problem in the current AI economy: the unsustainable economics of subscription-based language model use amid exploding token consumption. While it’s primarily aimed at startup founders and venture capitalists, the essay contains powerful insights with wide-reaching consequences for content and rights owners — and for any industry facing disruption by generative AI.
Below, I will extrapolate the implications for scholarly publishers, media companies, software vendors, educational platforms, and regulators, while evaluating the sustainability of AI’s current business models and proposing concrete recommendations.
I. Summary: What’s Happening?
At the heart of Ding’s thesis is a brutal reality: while the cost per token of large language models (LLMs) like GPT-3.5 has indeed dropped, the number of tokens consumed per task has grown exponentially.
This results in a paradox:
Token unit costs are down, but
Total compute bills are up, because users demand newer, more powerful models, and these consume far more tokens to complete increasingly complex tasks (e.g., autonomous multi-agent workflows).
Moreover, flat-rate pricing strategies, designed to fuel growth (e.g., $20/month subscriptions), are now economically suicidal for AI startups. Power users orchestrate continuous, compute-heavy operations that no pricing tier can sustainably cover.
The so-called "token short squeeze" emerges when compute costs vastly outpace revenues — leading to a collapse of the business model. Ding shows that even sophisticated attempts like Claude Code’s $200/month “unlimited” plan failed. Startups are caught in a Prisoner’s Dilemma: usage-based pricing would ensure sustainability, but consumers hate it — and competitors who offer flat rates win market share temporarily until they burn out.
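The squeeze described above is simple arithmetic. The following sketch uses invented numbers (the prices, token counts, and the `monthly_cost` helper are illustrative assumptions, not Ding's actual figures) to show how a 10x drop in unit price can still produce a 10x rise in the total bill once agentic workflows multiply tokens per task:

```python
# Hypothetical numbers to illustrate the "token short squeeze":
# unit price falls, but tokens-per-task grows faster, so the
# compute bill for one flat-rate subscriber rises anyway.

def monthly_cost(price_per_m_tokens, tokens_per_task, tasks_per_month):
    """Compute bill in dollars for one user, one month."""
    return price_per_m_tokens * tokens_per_task / 1e6 * tasks_per_month

FLAT_FEE = 20.0  # $/month subscription

# Year 1: cheap model, short completions -- the plan is profitable.
y1 = monthly_cost(price_per_m_tokens=10.0, tokens_per_task=50_000, tasks_per_month=30)
# Year 2: unit price down 10x, but multi-agent workflows burn 100x the tokens.
y2 = monthly_cost(price_per_m_tokens=1.0, tokens_per_task=5_000_000, tasks_per_month=30)

print(f"year 1 cost: ${y1:.2f}, margin on ${FLAT_FEE:.0f} plan: ${FLAT_FEE - y1:.2f}")
print(f"year 2 cost: ${y2:.2f}, margin on ${FLAT_FEE:.0f} plan: ${FLAT_FEE - y2:.2f}")
```

With these assumed figures, the same subscriber flips from a $5 margin to a $130 loss per month, which is the dynamic no flat pricing tier can absorb.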
II. What It Means for Content and Rights Owners
1. A Shift from Licensing Access to Licensing Consumption
Content owners have long focused on licensing “access” — e.g., access to a journal, an image archive, a music catalog. But AI changes the economics entirely. Models ingest and utilize content through compute-heavy operations (inference), and monetization needs to reflect how intensively content is used — not just whether it’s accessed.
Just as tokens are metered and charged in usage-based pricing, so too must content consumption become metered and monetized. This points to a future where:
Licensing agreements are per-token or per-inference-based,
Heavier model use (e.g., deep summarization, translation, or knowledge extraction) commands higher royalties,
Rights owners demand visibility into how much of their content is burned during AI inference.
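A minimal sketch of what such metered, per-inference licensing could look like. All rates, task categories, and multipliers here are hypothetical placeholders; real agreements would negotiate these terms, but the structure (meter the content tokens consumed, weight by intensity of use) follows the points above:

```python
# Sketch of usage-based content licensing: royalties scale with how many
# licensed-content tokens an inference call consumes, weighted by how
# intensively the content is used. All figures are invented for illustration.

ROYALTY_PER_M_TOKENS = 0.50  # $ per million licensed-content tokens ingested
TASK_MULTIPLIER = {          # heavier model use commands higher royalties
    "retrieval": 1.0,
    "summarization": 2.0,
    "translation": 2.5,
    "knowledge_extraction": 3.0,
}

def inference_royalty(content_tokens, task):
    """Royalty owed for one inference call that consumed licensed content."""
    return content_tokens / 1e6 * ROYALTY_PER_M_TOKENS * TASK_MULTIPLIER[task]

# A month of metered usage reported back to the rights owner:
usage_log = [
    (2_000_000, "retrieval"),
    (800_000, "summarization"),
    (500_000, "knowledge_extraction"),
]
total = sum(inference_royalty(tokens, task) for tokens, task in usage_log)
print(f"monthly royalty owed: ${total:.2f}")
```

The usage log is also the transparency mechanism: the same telemetry that computes the royalty gives the rights owner visibility into how their content is being consumed.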
2. Unviability of Flat-Rate Models for Licensed Content
Just as AI startups can’t sustainably offer unlimited compute, content aggregators, publishers, and digital platforms will also struggle to offer flat-rate access to AI tools trained on their content. Licensing models must evolve away from “all-you-can-eat” subscriptions — particularly when AI applications are integrated into enterprise workflows.
If publishers continue to license large training corpora for a flat fee without accounting for real-time inferencing usage, they will eventually be underpaid relative to the true economic value their content enables.
3. The Risk of Model Cannibalization
Newer models become the de facto standard overnight, rendering older ones obsolete — much like yesterday’s newspaper. This dynamic creates a hyper-accelerated obsolescence cycle where training models on yesterday’s content has diminishing returns, while rights owners continue to bear the long-term cost of widespread availability of their older works.
This suggests rights owners may want to:
Place stricter temporal limits on AI training licenses (e.g., expire after 12 months),
Require renewals or re-licensing for newer model iterations,
Distinguish between training for R&D purposes vs. commercial deployment in frontier models.
III. Wider Consequences Across Industries
A. Enterprise Software & SaaS
SaaS providers bundling AI services (e.g., GitHub Copilot, Notion AI) face a reckoning. The core challenge: every improvement in AI quality demands more compute, which directly inflates cost. They will either:
Move to usage-based pricing (which enterprise buyers may tolerate), or
Try to own the stack end-to-end (vertical integration, à la Replit).
Expectations that AI will "automate everything" at low marginal cost are colliding with a far costlier reality: better AI output scales with compute consumed, not with efficiency gains.
B. Education & Research Platforms
Higher education institutions that deploy AI for tutoring, writing assistance, or research automation must prepare for escalating costs. The “per student license” model may not be viable for AI agents running for hours per query.
Publishers and universities will need shared frameworks for:
Cost attribution per user/per token,
AI agent usage limits,
Cost-sharing for heavy use scenarios (e.g., research simulations).
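The three elements above can be combined into one attribution rule. This is a sketch under assumed numbers (the blended price, token cap, and 50/50 overage split are invented), showing per-student cost attribution with a usage cap and cost-sharing for heavy research workloads:

```python
# Sketch of per-student cost attribution with an agent-usage cap.
# Rates and caps are illustrative; an institution would calibrate
# them against its actual vendor contract.

PRICE_PER_M_TOKENS = 2.0        # $ blended inference price
MONTHLY_TOKEN_CAP = 10_000_000  # per-student cap before cost-sharing kicks in
OVERAGE_SHARE = 0.5             # fraction of over-cap cost billed to the student's unit

def attribute_cost(tokens_used):
    """Split one student's monthly compute cost into (platform bill, unit share)."""
    base_tokens = min(tokens_used, MONTHLY_TOKEN_CAP)
    overage_tokens = max(tokens_used - MONTHLY_TOKEN_CAP, 0)
    base = base_tokens / 1e6 * PRICE_PER_M_TOKENS
    overage = overage_tokens / 1e6 * PRICE_PER_M_TOKENS
    return base + overage * OVERAGE_SHARE, overage * (1 - OVERAGE_SHARE)

light = attribute_cost(3_000_000)    # typical tutoring/writing use, under cap
heavy = attribute_cost(40_000_000)   # research-simulation agent running for hours
print(f"light user: platform absorbs ${light[0]:.2f}")
print(f"heavy user: platform absorbs ${heavy[0]:.2f}, department shares ${heavy[1]:.2f}")
```

The design point is that a flat per-student license corresponds to the light-user case; the heavy-user case shows why agents running for hours per query need an explicit cap and sharing rule rather than an unlimited seat.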
C. Media & Entertainment
If token consumption is tied to creative outputs (e.g., screenwriting, dubbing, animation prompts), then rights owners will want tiered pricing schemes:
Low-intensity AI use (e.g., story ideation),
High-intensity AI use (e.g., full scene generation).
Media firms may demand real-time telemetry into how their IP is used in generative pipelines — not just at the training stage.
D. Market Concentration & Antitrust
AI economics are rapidly concentrating power. Ding’s piece implies that only vertically integrated or hyperscale companies can survive the token short squeeze. This should raise antitrust red flags, as:
Small AI startups die out due to unsustainable economics,
Only players with massive infrastructure (e.g., Amazon, Microsoft, Google) can bundle loss-making inference with profit-making infrastructure or apps.
Regulators must scrutinize AI pricing practices and cross-subsidization strategies to ensure competitive fairness.
IV. Do I Agree With Ding?
Yes — emphatically. Ding’s economic framing is brutally honest and well-supported. His critique lands because it exposes a fundamental tension:
Consumers want state-of-the-art AI at low predictable prices,
Companies cannot offer both without going bankrupt.
This is not just an AI startup problem. It’s a macroeconomic challenge that will ripple across all AI-consuming industries. Unless pricing, rights licensing, and user expectations evolve in sync, an economic crisis in AI service delivery is imminent.
V. Recommendations for Stakeholders
📌 Content and Rights Owners
Shift toward per-token or usage-based licensing models.
Demand transparency into AI agent usage and content consumption.
Set expiration dates for training rights; require relicensing for model upgrades.
Avoid one-off licensing deals that enable unlimited downstream exploitation.
📌 AI Startups
Abandon flat-rate pricing as a core strategy.
Focus on vertical integration or enterprise stickiness to maintain margins.
Embrace usage-based pricing where tolerable (e.g., in B2B).
📌 Regulators
Monitor hyperscaler behavior for anti-competitive bundling.
Require cost disclosure standards in AI pricing.
Incentivize infrastructure-level competition and AI middleware neutrality.
📌 Investors
Prioritize capital-efficient AI startups with defensible infrastructure or differentiated models.
Question flat-rate ARR metrics when usage costs are opaque or trending upward.
Understand the cost-to-serve ratio at high usage levels — especially in agent-based startups.
VI. Conclusion
As Ding concludes with a note of techno-irony: yes, the models will be 10x cheaper next year — but users will expect 20x more. The AI economy is caught in an arms race of capability and consumption, where unit cost improvements are outpaced by complexity and demand.
Unless content owners, startups, and regulators rethink how value is created, priced, and protected, AI’s golden age risks becoming an economic implosion. The answer lies not in cheaper tokens alone, but in smarter, more equitable models of value capture — for every player in the AI ecosystem.
