Safety and integrity work becomes structurally subordinate when its success metrics (reduced harm, reduced exposure) collide with the company’s growth metrics (time spent, shares, comments, ad inventory).
In that environment, “harm” isn’t a bug; it becomes a predictable by-product of the optimization target.
The Rage Dividend: How the Algorithm Arms Race Turned Outrage into a Business Model—and Why AI Will Inherit the Same Failure Mode
by ChatGPT-5.2
The BBC’s reporting—based on testimony and documents from more than a dozen whistleblowers and insiders—describes an “algorithm arms race” triggered by TikTok’s explosive growth and the scramble by rivals (notably Meta) to match its engagement performance. The core allegation is not simply that harmful content exists online, but that after internal research confirmed outrage reliably drives engagement, product and leadership choices effectively tolerated, enabled, or failed to prevent the rise of “borderline” harmful material in feeds—because it helped retention, watch time, and revenue. The article also surfaces a second, less discussed dynamic: safety and integrity work becomes structurally subordinate when its success metrics (reduced harm, reduced exposure) collide with the company’s growth metrics (time spent, shares, comments, ad inventory). In that environment, “harm” isn’t a bug; it becomes a predictable by-product of the optimization target.
What emerges is a picture of modern platforms as incentive machines: recommendation systems trained to maximize engagement signals; organizations arranged so the growth engine has veto power over brakes; and public messaging shaped to preserve legitimacy while internal teams live with the moral residue of decisions made upstream.
What the whistleblowers are really saying (beyond the headlines)
The story’s most consequential claim is that these companies weren’t merely surprised by harmful outcomes—they had internal evidence of the mechanism (“outrage drives engagement”) and still made choices that increased exposure to borderline content. That matters because it reframes the debate from “content moderation is hard” to “optimization goals are misaligned with user wellbeing, and the misalignment is known.”
Most surprising, controversial, and valuable statements and findings
Surprising
“Borderline” harmful content was reportedly allowed to rise to compete with TikTok—explicitly linked to business pressure. A Meta engineer describes being told to allow more borderline content (including misogyny and conspiracy theories) in feeds because the stock price was down and competition was fierce.
Reels scaled fast while safety capacity lagged. One former senior Meta employee claims Reels growth got major resourcing (hundreds of staff) while safety teams were denied relatively small headcount requests for child protection and election integrity—suggesting organizational priorities were set by competition, not risk.
Internal Meta research allegedly framed the platform as “fast-food” for users. One internal study (as described in the BBC report) warns that Facebook’s incentives and algorithms offer creators a “path that maximizes profits at the expense of their audience’s wellbeing,” and that the financial incentives created by the algorithms did not appear aligned with Meta’s mission.
A TikTok trust-and-safety employee showed internal dashboards implying political cases could outrank child-safety complaints. The reporting describes internal case prioritization in which a case involving a politician being mocked as a chicken was treated as higher priority than reports involving teenagers (including sexualized images and impersonation), raising questions about what the system is optimized to protect first.
“We have no control of the deep-learning algorithm in itself.” A former TikTok ML engineer is quoted describing the recommendation engine as a difficult-to-scrutinize “black box,” and says engineers treat content as IDs while relying on safety teams to remove harmful posts—an “engine vs brakes” separation that can become a governance loophole.
Controversial
Large-scale experiments on users “who often had no idea.” A senior Meta researcher describes running massive experiments (up to hundreds of millions of people) testing feed ranking—fuel for the argument that platforms treat societies as live A/B testing environments with limited informed consent norms.
The implicit trade-off: “protecting people” vs “engagement.” The reporting describes a “common trade-off” logic—suggesting safety is not a hard constraint but a variable negotiated against growth targets.
Leadership posture allegedly shifted from introspection to defensiveness. A former insider describes a period of genuine reflection that “calcified into… defensiveness,” with a view that the company is not responsible for polarization—an organizational narrative that can justify inertia even when incremental changes could reduce harm.
Valuable
The key insight isn’t content; it’s incentives. The article repeatedly points to the “outrage → engagement” loop and how algorithmic ranking interprets disproportionate engagement as preference (“users like it”), producing a feedback system that can normalize extremity and desensitize users (a minimal sketch of this loop follows this list).
The “separation of duties” fallacy in safety engineering. “We’re the engine; someone else is the brakes” works only if the brakes are adequately staffed, empowered, and integrated—and if the engine is designed to respect braking constraints. The account suggests the opposite: brakes can be understaffed, overridden, or measured by the wrong success criteria.
Political risk management can distort safety priorities. If a company believes it must “maintain a strong relationship” with politicians to avoid bans or regulation, safety triage can become reputational/regulatory triage—protect the business first, users second.
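To make the “outrage → engagement” loop above concrete, here is a minimal, purely illustrative sketch (in Python, with invented item attributes and weights, not any platform’s actual ranker) of a recommender that reads engagement back as preference. Items that provoke stronger reactions accumulate higher learned scores and come to dominate the top of the feed, even though nothing in the loop asks whether users endorse what they reacted to.

```python
import random

# Toy model of the "outrage -> engagement -> ranking" loop. The attributes
# ("outrage", "quality") and the weights below are invented for illustration.
random.seed(0)

items = [{"id": i, "outrage": random.random(), "quality": random.random()}
         for i in range(200)]

def engagement(item):
    # Assumption for this sketch: provocation drives reactions more than quality.
    return 0.8 * item["outrage"] + 0.2 * item["quality"] + random.gauss(0, 0.05)

est = {it["id"]: 0.0 for it in items}    # ranker's learned "users like this" score
shown = {it["id"]: 0 for it in items}

for impression in range(20_000):
    if random.random() < 0.1:                          # small exploration budget
        item = random.choice(items)
    else:                                              # exploit: serve the top item
        item = max(items, key=lambda it: est[it["id"]])
    shown[item["id"]] += 1
    # Engagement is read back as revealed preference (incremental average).
    est[item["id"]] += (engagement(item) - est[item["id"]]) / shown[item["id"]]

top = sorted(items, key=lambda it: est[it["id"]], reverse=True)[:20]
print("mean outrage of top-20 by learned score:",
      round(sum(it["outrage"] for it in top) / 20, 2))
print("mean outrage across all items:          ",
      round(sum(it["outrage"] for it in items) / len(items), 2))
```

The point of the toy is structural: the only “preference” signal the loop ever sees is reaction volume, so drift toward the most provocative material is the expected outcome, not an accident.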
Why would tech companies behave this way? Motivations and organizational logic
This isn’t best explained by “evil executives,” even if some decisions are ethically indefensible. It’s better explained as a stable equilibrium produced by four interacting forces:
Revenue mechanics reward attention, not wellbeing. Ad-based platforms convert minutes and interactions into money. Outrage is cheap fuel: it spikes comments, duets, shares, and repeat checking. When internal research confirms that, the system’s “rational” response—under shareholder pressure—is to lean into whatever keeps attention high.
Competitive panic collapses safety margins. The reporting depicts TikTok’s weekly iteration cadence and Meta’s urgency to “catch up,” hunting for marginal gains (“2%, 3% revenue next quarter”). In that environment, safety becomes “drag” unless it is enforced as a non-negotiable constraint.
Internal power asymmetry: growth teams hold the steering wheel. Safety teams often need approvals from product orgs whose bonuses and prestige depend on engagement growth. If the growth org can veto safety features (or slow-walk them), safety loses structurally—regardless of individual intent.
Legitimacy management: public trust vs internal reality. The article contrasts insiders’ accounts with public statements. That gap is not accidental; it is part of risk containment. Admitting the system is optimized toward outrage invites regulation, lawsuits, reputational damage, and advertiser exits. So the default posture becomes: “we invest heavily in safety,” “claims are fabricated,” “we’re a mirror,” “we have policies.”
In short: once a platform’s core metric is engagement, and competition is brutal, the system evolves toward whatever reliably increases engagement—even if it corrodes the social substrate.
Spillover: how this approach will infect AI development and deployment
The article is about social feeds, but the pattern is bigger: optimization targets + black-box models + institutional incentives + weak external accountability. That is exactly the shape of many AI deployments now forming.
1) “Engagement” becomes “usage,” and harm becomes “externality”
In AI products, the analogous metric is not likes—it’s daily active users, tokens consumed, task completion, subscription retention, or enterprise seat expansion. If the business rewards growth, then safety becomes negotiable unless it is enforced as a hard constraint. Expect pressure to:
relax guardrails that reduce “helpfulness,”
permit more “borderline” outputs that drive virality or utility,
shift liability downstream (users, customers, integrators).
2) The “engine vs brakes” organizational split repeats—at higher speed
Just as TikTok’s ML engineer described treating content as IDs and relying on safety teams, AI teams can treat outputs as statistical artifacts while “policy” teams handle harm. But in AI, braking is harder:
outputs are generated in real time,
misuse is adaptive,
model behavior is emergent and can change with updates, prompts, and integrations.
If safety is bolted on, you get the same failure mode: the engine is optimized to accelerate; brakes are under-resourced, under-powered, and blamed after the crash.
3) The black-box excuse becomes institutionalized
“We have no control of the deep-learning algorithm” is a warning flare. In AI deployment, organizations may increasingly normalize:
“We can’t fully explain why the model did that,”
“The model is probabilistic,”
“It’s an edge case,”
“We’ll patch it in the next release.”
But as AI systems become embedded in hiring, health, education, policing, and financial services, this posture becomes socially and legally intolerable. Still, the incentive to ship first and apologize later is strong.
4) Political relationship management will shape AI safety priorities
The TikTok allegations about prioritizing political cases over child harm point to a broader risk: AI providers will prioritize what threatens market access and regulation over what harms users. For AI this could manifest as:
prioritizing government and enterprise demands (surveillance, content influence, defense, censorship compliance) over end-user protections,
“safety” tuned to reputational risk rather than real-world harm,
uneven protections by geography (stronger where regulators bite, weaker elsewhere).
5) Experimentation on societies scales from feeds to institutions
A/B testing ranking is one thing; A/B testing AI decision-support inside public services is another. But the logic is similar: rapid iteration, opaque trade-offs, limited consent. The spillover risk is the normalization of continuous experimentation on the public—in welfare systems, classrooms, immigration workflows, news distribution, and clinical triage—without robust democratic oversight.
How to prevent the negative consequences—and what each stakeholder must do
If the diagnosis is “misaligned incentives + weak constraints,” then prevention requires structural interventions, not just better PR or another trust-and-safety reorg.
1) Redesign incentives: make safety a hard constraint, not a soft goal
Companies must treat safety thresholds like aviation tolerances: if you can’t meet them, you don’t ship—or you degrade functionality until you can (a sketch of such a gate follows the list below). That requires:
executive compensation tied to measurable harm reduction, not just growth,
internal “stop-ship” authority for safety teams,
budgets that scale with exposure and risk, not with media pressure.
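A minimal sketch of the difference between the two regimes, with invented metric names, weights, and thresholds (nothing here reflects any company’s real launch criteria): in the “soft goal” version, safety is one term in a blended launch score, so enough projected growth can buy back harm; in the “hard constraint” version, a breached threshold blocks the launch no matter what growth it promises.

```python
from dataclasses import dataclass

# Illustrative only: metric names, weights, and thresholds are invented.

@dataclass
class LaunchMetrics:
    engagement_lift: float        # projected growth, e.g. +0.03 = +3% watch time
    harmful_prevalence: float     # share of impressions that are harmful content
    teen_repeat_exposure: float   # repeat harmful exposures per teen account / week

THRESHOLDS = {"harmful_prevalence": 0.002, "teen_repeat_exposure": 0.10}

def soft_goal_ships(m: LaunchMetrics) -> bool:
    # Safety folded into one blended score; the weights themselves are negotiable,
    # so enough projected growth can "buy back" harm.
    score = m.engagement_lift - 1.0 * m.harmful_prevalence - 0.1 * m.teen_repeat_exposure
    return score > 0

def hard_constraint_ships(m: LaunchMetrics) -> bool:
    # Stop-ship gate: any breached safety threshold blocks launch outright.
    if m.harmful_prevalence > THRESHOLDS["harmful_prevalence"]:
        return False
    if m.teen_repeat_exposure > THRESHOLDS["teen_repeat_exposure"]:
        return False
    return m.engagement_lift > 0   # growth only counts once the gates pass

candidate = LaunchMetrics(engagement_lift=0.03,
                          harmful_prevalence=0.004,
                          teen_repeat_exposure=0.15)
print("soft goal ships it:      ", soft_goal_ships(candidate))        # True
print("hard constraint ships it:", hard_constraint_ships(candidate))  # False
```

The organizational analogue of the hard-constraint gate is the stop-ship authority listed above: the gate only works if the safety team is actually empowered to enforce the refusal.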
2) Force transparency where it matters (without leaking trade secrets)
Regulators should require:
meaningful algorithmic impact assessments (how ranking/recommendation affects harms),
independent audits of prioritization systems (what gets escalated, what gets deprioritized),
disclosure of key safety-relevant metrics (prevalence of harmful content, response times, repeat exposure rates; a toy computation follows this list),
clear documentation of major ranking or model changes and expected risk deltas.
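As a rough illustration of what disclosing those metrics could look like in practice, here is a toy computation over an invented log format (field names and data are made up for this sketch); real pipelines would run over billions of events, but the metric definitions are the part regulators would need to standardize.

```python
from datetime import datetime
from statistics import median

# Toy illustration of the disclosure metrics above; the log format, field
# names, and numbers are invented for this sketch.

impressions = [        # (user_id, item_id, item_was_harmful)
    ("u1", "a", False), ("u1", "b", True), ("u1", "b", True),
    ("u2", "a", False), ("u2", "c", True), ("u3", "a", False),
]
reports = [            # (report_opened_at, action_taken_at)
    (datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 13, 0)),
    (datetime(2025, 1, 2, 9, 0), datetime(2025, 1, 4, 9, 0)),
]

# Prevalence: share of impressions that were of harmful content.
prevalence = sum(harmful for _, _, harmful in impressions) / len(impressions)

# Response time: how long user reports sit before action is taken.
median_response = median(done - opened for opened, done in reports)

# Repeat exposure: of users exposed to harmful content, how many saw it again.
exposures = {}
for user, _, harmful in impressions:
    if harmful:
        exposures[user] = exposures.get(user, 0) + 1
repeat_rate = sum(1 for n in exposures.values() if n > 1) / max(len(exposures), 1)

print(f"harmful-content prevalence:               {prevalence:.1%}")
print(f"median report response time:              {median_response}")
print(f"repeat-exposure rate among exposed users: {repeat_rate:.1%}")
```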
3) Establish liability regimes that price the externality
If harm is cheaper than safety, harm will persist. Lawmakers can change the economics by:
imposing penalties for systematic failure to address foreseeable harms (especially involving minors),
requiring duty-of-care standards for recommender systems and AI decision tools,
enabling private rights of action in narrowly defined high-harm categories (e.g., sexual exploitation, blackmail facilitation, terrorism recruitment, medically dangerous advice).
4) Build real governance inside companies: separation, power, and evidence
“Trust and safety” cannot be a service desk. Platforms and AI firms need:
governance that prevents growth teams from vetoing safety changes unilaterally,
cross-functional risk councils with independent reporting lines,
robust incident response playbooks (including rollback capability),
documented, reviewable rationales when harm trade-offs are accepted.
5) Give users and civil society leverage—not just settings menus
The BBC report suggests user controls don’t reliably prevent repeat exposure. Platforms should provide (the first and third of these are sketched after the list below):
stronger “do not recommend” enforcement,
explainable reasons for why content is shown (and how to change it),
friction for virality in high-risk categories (forwarding limits, repost throttles),
data access for qualified researchers with privacy-preserving safeguards.
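A minimal sketch of how the first and third of these could be enforced mechanically (topic and category names, limits, and storage are invented for illustration): “do not recommend” is applied as a hard filter before ranking, so engagement signals cannot resurface suppressed topics, and forwarding in high-risk categories hits a daily cap.

```python
# Illustrative only: topic/category names, limits, and storage are invented.

DO_NOT_RECOMMEND = {"u1": {"self_harm", "extreme_dieting"}}      # user -> suppressed topics
FORWARD_LIMITS = {"health_misinfo": 5, "graphic_violence": 0}    # max forwards per day
forwards_today = {}                                              # (user, category) -> count

def filter_candidates(user, candidates):
    """Drop suppressed topics *before* ranking, so engagement cannot resurface them."""
    blocked = DO_NOT_RECOMMEND.get(user, set())
    return [c for c in candidates if c["topic"] not in blocked]

def allow_forward(user, category):
    """Friction for virality: hard daily cap on forwarding in high-risk categories."""
    limit = FORWARD_LIMITS.get(category)
    if limit is None:                      # not a high-risk category, no cap
        return True
    key = (user, category)
    if forwards_today.get(key, 0) >= limit:
        return False
    forwards_today[key] = forwards_today.get(key, 0) + 1
    return True

candidates = [{"id": 1, "topic": "cooking"}, {"id": 2, "topic": "self_harm"}]
print([c["id"] for c in filter_candidates("u1", candidates)])    # [1]
print(allow_forward("u2", "graphic_violence"))                   # False (cap is 0)
```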
6) Treat children as a special class of protection, not a PR risk
Everyone—companies, regulators, schools, parents—must converge on a principle: systems that profit from attention must not be permitted to run “growth experiments” on minors. That means:
default high-protection modes,
stricter recommender constraints for teen accounts,
enforceable minimum standards on moderation capacity and response times,
meaningful age assurance regimes that do not simply shift risk onto children.
7) For AI specifically: mandate “control surfaces” and post-deployment accountability
To avoid importing the social-media failure mode into AI (a sketch of such control surfaces follows this list):
require monitoring, logging, and incident reporting for high-impact AI systems,
require the ability to disable or degrade risky features quickly,
require provenance and evaluation standards (what data, what tests, what failure rates),
require human accountability (named owners) for deployment decisions.
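One way to read the “control surface” requirement in engineering terms, as a hedged sketch (feature names, owners, and modes are invented): every high-impact feature is registered with a named accountable owner, can be switched to a degraded mode or off at runtime, and every mode change leaves an auditable log entry that incident reviews can reconstruct.

```python
import logging
from dataclasses import dataclass

# Illustrative sketch only; feature names, owners, and modes are invented.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-control-surface")

@dataclass
class Feature:
    name: str
    owner: str            # named human accountable for the deployment decision
    mode: str = "full"    # "full" | "degraded" | "off"

class ControlSurface:
    def __init__(self):
        self.features = {}

    def register(self, feature: Feature):
        self.features[feature.name] = feature
        log.info("registered %s (owner=%s)", feature.name, feature.owner)

    def set_mode(self, name: str, mode: str, reason: str):
        f = self.features[name]
        # Auditable trail: who owned it, what changed, and why.
        log.warning("%s: %s -> %s (owner=%s) reason: %s",
                    name, f.mode, mode, f.owner, reason)
        f.mode = mode

    def is_enabled(self, name: str) -> bool:
        return self.features[name].mode != "off"

surface = ControlSurface()
surface.register(Feature(name="autonomous_replies", owner="jane.doe"))
surface.set_mode("autonomous_replies", "off",
                 reason="harmful output pattern under investigation")
print(surface.is_enabled("autonomous_replies"))   # False
```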
Closing: the uncomfortable truth
The article’s most disturbing implication is that outrage is not merely tolerated; it is instrumentally useful in an attention economy—and companies learned this from their own research. Once that feedback loop exists, the system will keep drifting toward more extreme stimulation unless constrained by governance, law, and economics. The stakes rise as the same optimization logic migrates into AI systems that mediate not just what we watch, but how decisions are made about jobs, health, education, and rights.
Preventing the next decade from becoming “the rage machine, but everywhere” requires a shift from voluntary ethics to enforceable constraints—paired with organizational redesign so that safety is not a cost center begging for headcount, but a binding condition of operation.
