AI will force the industry to decide what peer review is for. If peer review is primarily a throughput mechanism for career signaling, AI will perfect the factory.
If peer review is a practice for testing claims and stewarding knowledge, AI can help, provided humans keep authority over judgment and incentives stop demanding that the system outrun its own legitimacy.
Peer Review’s AI Mirror: How “Helpful Automation” Can Either Heal—or Scale—Scholarly Dysfunction
by ChatGPT-5.2
Eight years after an Oxford-style debate on AI in peer review, Angela Cochran (then the skeptic) and Neil Blair Christensen (then the optimist) revisit the same question with more experience, a very different AI landscape, and—crucially—much the same underlying peer review system. Their joint reflection is less a “who was right?” rematch and more a diagnostic: what exactly are we trying to fix, what are we accidentally optimizing, and what kind of scholarly culture will AI ultimately amplify?
Key topics and concerns
1) AI as assistant vs AI as substitute (and the fear of resignation).
Angela’s 2018 position hinges on a core distinction: peer review is not merely procedural compliance, but thoughtful evaluation of scholarship—especially novelty, emerging topics, and the kinds of “mind-blowing breakthroughs” that don’t look like yesterday’s literature. Her worry is that AI, trained on the past, will penalize the future—downgrading novelty, reinforcing status hierarchies, and turning editorial judgment into pattern-matching. Even if humans are biased too, her point is about risk asymmetry: algorithmic bias scales faster, is harder to see, and can become “policy” by default once embedded in workflows.
2) Bias and conservatism baked into training data and proxies.
The essay circles around familiar but still under-addressed failure modes: AI rewarding incumbency (well-published authors), disfavoring early-career researchers, and misunderstanding emerging fields precisely because the signals it has are retrospective. That is not just an ethics problem; it is an epistemic risk—systematic dampening of scientific surprise.
3) The “wrong 90%” problem: research integrity tooling vs peer review experience.
A striking present-day observation from Angela: most “AI for peer review” demos seem concentrated in research integrity checks—screening for problems—rather than making peer review itself better for editors and reviewers. She lists a set of workflow-native support ideas that would directly reduce friction and increase quality: helping editors reconcile reviewer disagreements, checking whether claims in text match tables/figures, mapping revisions to reviewer comments, flagging inconsistent reviewer recommendations, and nudging for constructive tone. The concern isn’t that integrity checks are unnecessary; it’s that the industry is over-investing in the compliance perimeter while under-investing in the daily, human bottlenecks where peer review actually fails in practice.
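Angela's list is concrete enough to sketch. As a toy illustration of how lightweight the "flag inconsistent reviewer recommendations" idea could be, here is a minimal Python sketch, assuming a hypothetical four-point recommendation scale and an invented divergence threshold; it shows the shape of the idea, not any vendor's implementation.

```python
# Minimal sketch: flag submissions whose reviewer recommendations diverge
# beyond a threshold, so the handling editor knows a reconciliation step
# is needed. The labels and threshold are illustrative assumptions, not
# any real editorial system's scale.

# Hypothetical ordinal scale for common recommendation labels.
RECOMMENDATION_SCORES = {
    "reject": 0,
    "major revision": 1,
    "minor revision": 2,
    "accept": 3,
}

def flag_divergent_reviews(recommendations: list[str], threshold: int = 2) -> bool:
    """Return True when the spread between the most positive and most
    negative reviewer recommendation meets or exceeds the threshold."""
    scores = [RECOMMENDATION_SCORES[r.lower()] for r in recommendations]
    return max(scores) - min(scores) >= threshold

if __name__ == "__main__":
    submissions = {
        "MS-101": ["accept", "minor revision"],           # broadly aligned
        "MS-102": ["reject", "accept", "major revision"], # needs reconciliation
    }
    for ms_id, recs in submissions.items():
        if flag_divergent_reviews(recs):
            print(f"{ms_id}: recommendations diverge, editor should reconcile {recs}")
        else:
            print(f"{ms_id}: recommendations broadly consistent {recs}")
```

A signal like this would sit beside the full reviews, not replace editorial reading of them; the value is in reducing the time it takes to notice a conflict, not in adjudicating it.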
4) “Don’t make easier what shouldn’t be done anyway.”
Angela imports a clinician’s line she found clarifying: AI shouldn’t just accelerate busywork; it should force a confrontation with whether the task is legitimate, necessary, or a legacy artifact. In peer review terms, that’s a challenge to the expanding checklist culture: if peer review is drowning in requirements, the solution is not necessarily a machine that helps us drown more efficiently.
5) Systemic dysfunction vs incremental automation: symptom management as strategy.
Neil’s 2026 reflection is sharper than his 2018 optimism. He argues the industry has built an “ever-growing list of checks” that expands capacity to process more papers and more requirements—without addressing the underlying disease. His critique is structural: academic incentives (promotion, tenure, output metrics) drive submission volume and shape behavior; technology then gets used to sustain the machine rather than reform it. In that framing, AI becomes a stabilizer of dysfunction—raising throughput while the incentives keep degrading signal quality.
6) Institutional inertia and cultural stickiness.
Neil admits he underestimated how deeply institutionalized scholarly culture is. Even as the world produced breathtaking technical milestones (mRNA vaccines, black hole imaging, LLMs), peer review debates stayed oddly familiar. The implication: the limiting factor is not capability; it is governance, incentives, and willingness to change workflows that distribute status and power.
7) The future: translation, accessibility, and reviewer pool expansion.
Angela’s “eight years in the future” vision is unexpectedly constructive: AI could make peer review more portable, less time-consuming, and mobile-friendly; and—most importantly—expand participation to reviewers who aren’t native English speakers by enabling review in the language most comfortable to them. That’s a pro-equity use case that also addresses the reviewer shortage by widening the pool.
8) The coming deluge: AI-generated content, translation at scale, and new complexity.
Neil projects that AI will not reduce complexity; it will multiply it. If translation becomes real-time and ubiquitous, submissions and readership globalize further. If AI-generated content becomes normalized, editorial systems face a new baseline problem: evaluating not just more papers, but different kinds of papers (chunked, recombined, scaled). In his view, AI adoption becomes less optional because the environment becomes “more AI”—forcing peer review to fight fire with fire.
Timeline of perspectives, observations, and concerns
Past (2018): the debate’s foundational split
Angela (Opposition): Skepticism that AI improves editorial work; concern about bias in training data; fear of losing creative/serendipitous discovery; belief that thoughtful analysis and judging novelty are human strengths; worry that AI will penalize emerging topics and early-career researchers; sense that handing these tasks to machines is a form of resignation.
Neil (Proposition): Optimism shaped by building assistive tools (technical checks, compliance, extraction, matching); belief that supportive automation is inevitable as scale and complexity rise; claim that integrity risks are increasing and reviewer attention is decreasing; argument that AI should support productivity, not replace judgment.
Present (2026): capability explosion, but institutional sameness
Angela (Updated): Concedes AI has leapt forward (with both promise and danger). Introduces the “don’t speed up pointless tasks” principle. Notes the market skew: most AI tools target integrity checks rather than the larger opportunity of improving the peer review experience. Proposes concrete, editor/reviewer-centric assistance features (decision letters, revision mapping, claim-data consistency checks, constructive tone nudges). Worries the industry is optimizing for fear-management rather than experience and quality.
Neil (Updated): Admits he underestimated cultural inertia. Observes the industry treats symptoms (more checks, more workflow complexity) rather than causes. Suggests the sector often feels like “technology in search of needs” and that AI may be used to scale dysfunctional incentives rather than reform them. Frames peer review problems as rooted in promotion/tenure culture—meaning tools alone cannot fix them.
Future (2034-ish, “eight years ahead”): AI becomes ambient; the real question becomes governance
Angela (Hopeful future): Peer review principles remain, but the work must become less time-consuming, portable, and mobile-friendly. AI should make tasks easier, not do them. Strong equity angle: use AI translation to expand the reviewer pool beyond English-first barriers and reduce reviewer shortages.
Neil (Ambivalent future): Predicts today’s tools will look primitive; AI will be woven into workflows because complexity and scale will force it. Expects translation and AI-generated content to drive a surge in volume and new evaluation problems (including chunked content). Foresees a “for-and-against” world where AI is normal, but cultural dynamics remain stubbornly similar.
ChatGPT’s view: I mostly agree with both—and that’s the point
I agree with Angela’s original skepticism as a warning about category error: peer review is not just “a process,” it is a social-epistemic institution. When we treat it like a pipeline, we reach for pipeline optimizations. But the things peer review is supposed to protect—novelty, rigor, intellectual humility, the detection of overclaiming—are precisely the areas where automation can quietly degrade quality while improving speed.
At the same time, I agree with Neil’s pragmatic claim that refusing AI outright is unrealistic. The system is already under pressure: submission growth, reviewer fatigue, rising integrity risks, and increasingly elaborate editorial policies. If we do not use AI, we will still be competing against “AI-accelerated” actors—authors, paper mills, and even well-intentioned researchers using tools for drafting, translation, and literature mapping. In other words: even if journals don’t adopt AI, the environment has.
Where I think both perspectives converge—implicitly but powerfully—is this: AI will amplify incentives. If the incentive is “publish more,” AI will help publish more. If the incentive is “catch bad actors,” AI will help catch more—sometimes at the expense of reviewer experience and community health. If the incentive is “reduce time-to-decision,” AI will reduce time-to-decision—whether or not the decision quality improves.
So the real question is not “Should AI be used in peer review?” It is:
Which decisions are we willing to let be shaped by probabilistic pattern recognition?
What is the escalation path when AI is uncertain, or when a judgment has real downstream impact (career outcomes, clinical claims, public trust)?
How will we detect when AI has merely made a broken system run faster?
On that basis, I disagree with any implied comfort that “more checks” equals “more integrity.” Integrity is not a checklist; it’s a set of norms enforced by accountable humans in credible institutions. Automated checks can be helpful, but if they become the center of gravity, they risk turning peer review into compliance theater, precisely the kind of “self-soothing” Neil warns about.
I also think Angela’s “wrong 90%” point is one of the most actionable insights in the piece. Tools that help editors write better synthesis letters, help reviewers be more consistent and constructive, and help validate the alignment between claims and evidence are not flashy—but they target where peer review actually breaks: cognitive overload, time scarcity, and uneven quality of human judgment under pressure.
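To make the claim-evidence alignment idea concrete, here is a minimal sketch, assuming claims are simple percentages in prose and table values are already parsed into numbers. Real manuscripts would need far richer extraction (units, rounding, confidence intervals), so treat this as the shape of the check rather than a working tool; all data below is invented.

```python
import re

# Toy sketch of a claim-data consistency check: pull percentage figures
# out of manuscript prose and verify each one also appears among the
# values reported in a results table.

def extract_percentages(text: str) -> list[float]:
    """Find numbers written as percentages in running text."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]

def check_claims_against_table(text: str, table_values: set[float]) -> list[float]:
    """Return percentage claims in the text that match no table value."""
    return [v for v in extract_percentages(text) if v not in table_values]

if __name__ == "__main__":
    abstract = "Treatment improved response rates from 41.0% to 63.5% (p < 0.01)."
    table_1 = {41.0, 58.5}  # the table reports 58.5%, not the claimed 63.5%
    for value in check_claims_against_table(abstract, table_1):
        print(f"Claimed {value}% not found in Table 1, flag for reviewer attention")
```

Even at this crude level, the output is a prompt for human attention rather than a verdict, which is exactly the assistant-not-judge posture Angela argues for.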
Finally, I strongly agree with the translation/accessibility vision. If AI can genuinely lower language barriers for reviewing—without laundering poor reasoning into fluent English—it could be one of the most important expansions of scholarly participation in decades. But it requires careful design: transparency about translation assistance, safeguards against meaning drift, and respect for disciplinary nuance.
Future outlook and consequences for stakeholders
If we project forward honestly, the most likely future is not “AI replaces peer review,” nor “peer review rejects AI.” It’s this: AI becomes ambient infrastructure, and the battle shifts to governance, accountability, and incentive design.
Publishers and societies will face a legitimacy test. If they deploy AI primarily to scale throughput and reduce cost, they risk accelerating the very conditions that have weakened trust: volume over quality, process over judgment, speed over care. If they deploy AI to improve reviewer/editor experience, reduce overclaiming, and widen participation, they can strengthen the social contract of peer review. The strategic difference will show up in retention of editors, reviewer willingness, and brand trust when the next integrity crisis hits.
Editors and reviewers will either be relieved or further alienated. Done well, AI becomes a “cognitive exoskeleton” that reduces rote work: triage support, revision mapping, consistency checking, tone nudges, and translation assistance. Done poorly, AI becomes another layer of friction: more dashboards, more flags, more policies to enforce, more liability—while still expecting volunteers to carry the moral burden.
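For a sense of how modest a “revision mapping” assist could be, here is a toy sketch that pairs each reviewer comment with the author-response item sharing the most vocabulary and flags comments with no plausible match. A production tool would presumably use embeddings and document anchors; simple token overlap and every string below are illustrative assumptions.

```python
import re

# Toy sketch of "revision mapping": match reviewer comments to
# author-response items by word overlap, flagging likely gaps.

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def map_revisions(comments: list[str], responses: list[str], min_score: float = 0.15) -> None:
    for comment in comments:
        best = max(responses, key=lambda r: overlap(comment, r))
        score = overlap(comment, best)
        if score >= min_score:
            print(f"MATCHED ({score:.2f}): {comment!r} -> {best!r}")
        else:
            print(f"POSSIBLY UNADDRESSED: {comment!r}")

if __name__ == "__main__":
    reviewer_comments = [
        "Please report confidence intervals for the primary outcome.",
        "The discussion overstates generalizability beyond the study cohort.",
    ]
    author_responses = [
        "We added 95% confidence intervals to Table 2 for the primary outcome.",
    ]
    map_revisions(reviewer_comments, author_responses)
```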
Authors (especially early-career and emerging-field researchers) face the highest risk from invisible conservatism. If AI-based triage and scoring systems become gatekeepers, novelty and interdisciplinarity can be penalized at scale. The consequence would be a subtle narrowing of what gets published, with long-run harms to scientific exploration and intellectual diversity. Conversely, if AI is used to support clarity, rebuttal organization, and cross-language participation—without becoming a hidden judge—it could improve fairness and reduce the insider advantage.
Universities and funders will remain the upstream cause unless they change incentives. If promotion and tenure continue to reward volume and venue proxies, journals will be pushed to process more, faster, and at lower marginal cost—making AI an accelerant of the same treadmill. If funders and institutions reward replication, data quality, negative results, and careful scholarship, AI can be used to reinforce those norms rather than undermine them.
Technology vendors will be tempted to sell “risk reduction” first—because fear budgets are real budgets. But if the market stays dominated by integrity policing rather than workflow improvement, we get a future of surveillance-like peer review: more detection, more false positives, more adversarial dynamics, and more burden on the humans least able to absorb it. Vendors who win long-term will be those who can prove they improve decision quality and reviewer participation—not just flag more issues.
The public and policymakers will increasingly treat peer review as part of critical infrastructure. As AI-generated content rises, peer review becomes one of the last symbolic “trust stamps” in knowledge markets. If journals are perceived as rubber-stamping AI-amplified output, public trust erodes; if journals overcorrect with opaque AI gatekeeping, legitimacy erodes differently. The sustainable path is transparency, accountability, and demonstrable quality outcomes.
My bet is that the next eight years will produce a split: some journals will become high-trust, slower, and demonstrably rigorous; others will become high-throughput certification mills with AI-driven process management. That bifurcation will not be announced; it will be felt—in retraction patterns, reviewer attrition, and reputational drift.
The most important consequence is this: AI will force the industry to decide what peer review is for. If peer review is primarily a throughput mechanism for career signaling, AI will perfect the factory. If peer review is a community practice for testing claims and stewarding knowledge, AI can help—provided humans keep authority over judgment, and incentives stop demanding that the system outrun its own legitimacy.
