- Pascal's Chatbot Q&As
GPT-4o: The Encyclopædia Britannica v. Perplexity AI case is one of the most detailed, strategic, and potentially precedent-setting lawsuits in the generative AI copyright arena to date.
Unlike speculative claims around training data, this case zeroes in on live crawling, RAG inputs, and output substitution—issues that will shape the future of AI-powered search and content generation.
Encyclopædia Britannica and Merriam-Webster v. Perplexity AI—A Landmark Copyright and Trademark Lawsuit in the Age of Generative AI
by ChatGPT-4o
a) Grievances and Quality of the Evidence
At the core of this case is a multi-pronged legal and ethical grievance: Encyclopædia Britannica and Merriam-Webster (the plaintiffs) allege that Perplexity AI systematically infringed their copyrights and trademarks in order to power and market its “answer engine.” The grievances fall under three main legal claims:
Copyright Infringement
Perplexity allegedly:
Crawled and scraped Plaintiffs’ websites (in violation of their terms of use).
Copied and stored original articles to use as input for its Retrieval-Augmented Generation (RAG) system.
Generated outputs that were verbatim or near-verbatim reproductions of those inputs without authorization or compensation.
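The complaint’s description of RAG can be illustrated with a minimal Python sketch of the retrieval and prompt-assembly half of such a pipeline (the call to a language model is omitted). It is not Perplexity’s implementation, which is not public. The two-document corpus, the `example.org` URLs, the word-overlap scoring (a crude stand-in for the embedding search real systems typically use), and the function names `retrieve` and `build_prompt` are all hypothetical. The sketch shows why a RAG system of the kind alleged has to hold copies of source text and pass passages verbatim into the model’s input.

```python
import re

# Hypothetical stored corpus. In a real RAG system these would be full pages
# fetched by a crawler and kept in a search index.
CORPUS = {
    "example.org/photosynthesis": "Photosynthesis is the process by which green plants "
                                  "convert light energy into chemical energy.",
    "example.org/volcano": "A volcano is a vent in the crust of a planet "
                           "through which molten rock erupts.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the k stored documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(CORPUS.items(),
                    key=lambda item: len(q & tokenize(item[1])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Paste the retrieved passages, word for word, into the model input."""
    context = "\n".join(f"[{url}] {text}" for url, text in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is photosynthesis?"))
```

Because the retrieved passage arrives in the prompt word for word, nothing in this design by itself stops the model from echoing it back, which is the kind of near-verbatim output the complaint documents.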
Trademark Infringement and False Designation of Origin
Perplexity is accused of:
Displaying hallucinated or incomplete AI outputs alongside Britannica’s or Merriam-Webster’s marks.
Creating the false impression that Plaintiffs approved or authored the outputs.
Unfair Competition and Deceptive Practices
Plaintiffs argue that Perplexity competes unfairly by substituting its own answers for links to their sites, depriving them of traffic, advertising revenue, and licensing fees.
The quality of the evidence is strong and unusually specific for a tech copyright case:
The complaint includes screenshots of Perplexity outputs reproducing or paraphrasing copyrighted articles.
Britannica provides examples of precise matches and partial omissions (used to reinforce the trademark claim).
The lawsuit references public statements by Perplexity’s CEO and product literature that admit to practices underpinning the alleged infringement (e.g., RAG architecture, use of “trusted sources,” the “no hallucination” principle).
Technical allegations, such as crawling that bypasses robots.txt and uses undeclared user agents, are supported by third-party reports (e.g., Cloudflare, Wired, Stackdiary).
Unlike many abstract LLM lawsuits, this case relies heavily on concrete examples, direct quotes, documented crawler behavior, and economic harm, which makes it legally potent.
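Python’s standard-library `urllib.robotparser` shows why undeclared user agents matter. robots.txt rules are grouped by user-agent name and are honored voluntarily by the crawler itself, so a crawler that identifies itself as a generic browser only matches the wildcard group. The robots.txt content below is hypothetical and is not Britannica’s actual file. `PerplexityBot` is the crawler name Perplexity publicly documents, and the browser string is an arbitrary example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a declared AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Arbitrary example of an ordinary browser user-agent string.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/124.0 Safari/537.36")

def allowed(user_agent: str, url: str) -> bool:
    """Ask the stdlib parser whether robots.txt permits this agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

url = "https://www.example.com/topic/some-article"
print("Declared crawler:", allowed("PerplexityBot", url))  # False: matches its own group
print("Browser UA:      ", allowed(BROWSER_UA, url))       # True: only the * group applies
```

The stdlib parser matches on the product token before the first `/`, so it is given `PerplexityBot` rather than a full user-agent string. Its matching logic may also differ in edge cases from RFC 9309. The core point holds regardless: the same site gives opposite answers depending solely on what name the crawler announces.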
b) Surprising, Controversial, and Valuable Statements
Surprising:
Perplexity’s use of stealth crawlers to evade robots.txt protocols, even while claiming otherwise in public documentation, suggests a knowing violation of digital norms.
Verbatim reproduction of Merriam-Webster’s dictionary entries and Britannica articles is not hypothetical; it is demonstrated in screenshots.
Perplexity has licensing deals with several publishers (e.g., Time, LA Times, Wiley), implying selective compliance rather than ignorance of copyright obligations.
Controversial:
Framing Perplexity as a plagiarism engine and a “rent-seeking middleman” that “repackages others’ work for free” is a moral critique, not just a legal one.
Britannica asserts that even Perplexity’s “source citations” are not a valid defense—challenging a common AI industry claim that linking suffices.
Valuable:
The complaint argues that RAG is not fair use, distinguishing it from traditional LLM training. This distinction is crucial for future legal doctrine.
The filing highlights economic harms: lost ad revenue, blocked licensing opportunities, and market substitution effects—key factors courts weigh in fair use cases.
c) How Other Rights Owners Could or Should Use This Information
Strengthen Terms of Use
Publishers should explicitly ban the use of their content for AI training or retrieval in both their Terms of Use and their robots.txt files, mirroring Britannica’s approach.
Document Infringement
Rights holders should run tests similar to those Britannica conducted (e.g., entering known prompts and comparing the outputs with their own content). Screenshots documenting the overlap are likely to be persuasive in court.
Audit Crawler Activity
Collaborate with vendors (such as Cloudflare or Netacea) to detect and log undeclared crawler traffic; such logs can serve as direct evidence of unauthorized access.
Consider Filing Suits or Sending Cease & Desist Letters
For organizations that have not yet licensed their content, this case provides a template and legal roadmap for asserting copyright and trademark claims.
Leverage Public Pressure
Britannica’s complaint amplifies media coverage accusing Perplexity of plagiarism. Rights holders can do the same to raise the reputational stakes.
Negotiate Licenses
The complaint’s reference to Perplexity’s paid licensing deals with others (including Wiley) confirms there is a market for AI licenses. Use that as leverage in negotiations.
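For the “Document Infringement” step, overlap between a test output and a rights holder’s own text can be recorded as numbers alongside screenshots. The sketch below uses Python’s `difflib.SequenceMatcher` to find the longest verbatim word run shared by two texts. It also computes the share of the output’s five-word sequences that appear in the source. The function names, the five-word window, the punctuation-stripping normalization, and the sample sentences are all hypothetical choices. Neither metric is a legal test for infringement.

```python
import re
from difflib import SequenceMatcher

def words(text: str) -> list[str]:
    """Lowercase words with punctuation stripped, so 'dog,' matches 'dog'."""
    return re.findall(r"[a-z0-9']+", text.lower())

def longest_shared_run(source: str, output: str) -> str:
    """Longest contiguous word sequence that appears in both texts."""
    a, b = words(source), words(output)
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])

def ngram_overlap(source: str, output: str, n: int = 5) -> float:
    """Fraction of the output's word n-grams that also occur in the source."""
    def grams(ws: list[str]) -> set[tuple[str, ...]]:
        return {tuple(ws[i:i + n]) for i in range(len(ws) - n + 1)}
    out = grams(words(output))
    return len(out & grams(words(source))) / len(out) if out else 0.0

# Hypothetical sample: a source sentence and an AI answer that reuses most of it.
source = "The quick brown fox jumps over the lazy dog near the river bank."
output = "Answer: the quick brown fox jumps over the lazy dog, according to sources."
print(longest_shared_run(source, output))      # the quick brown fox jumps over the lazy dog
print(round(ngram_overlap(source, output), 2))  # 0.56
```

Logging both figures for each test prompt, together with the date and the exact prompt, makes a series of screenshots easier to compare and summarize.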
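For the “Audit Crawler Activity” step, commercial bot-management services draw on far richer signals (IP reputation, TLS and behavioral fingerprints) than a site’s own access logs provide. The sketch below is only a toy illustration. It flags IP addresses that present an ordinary browser user agent, request many distinct pages, and never load the stylesheets, scripts, or images a real browser typically fetches. The asset suffixes, the bot-name tokens, the 50-page threshold, and the sample log entries (which use reserved documentation IP ranges) are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical heuristic settings; real bot-management tools use far richer signals.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")
DECLARED_BOT_TOKENS = ("bot", "crawler", "spider")

def suspicious_ips(entries, min_pages=50):
    """Flag IPs that present a non-bot user agent, request at least
    `min_pages` distinct pages, and never fetch any page assets.

    `entries` is an iterable of (ip, user_agent, path) tuples already
    parsed from an access log (log parsing is out of scope here).
    """
    pages = defaultdict(set)  # ip -> distinct non-asset paths requested
    loaded_assets = set()     # ips that fetched at least one asset
    for ip, user_agent, path in entries:
        if any(token in user_agent.lower() for token in DECLARED_BOT_TOKENS):
            continue  # self-declared crawlers are governed by robots.txt rules instead
        if path.lower().endswith(ASSET_SUFFIXES):
            loaded_assets.add(ip)
        else:
            pages[ip].add(path)
    return sorted(
        ip for ip, paths in pages.items()
        if len(paths) >= min_pages and ip not in loaded_assets
    )

# Hypothetical log: one "browser" reads 60 articles without loading any assets,
# another reads 60 articles and also loads the site's stylesheet.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0"
log = [("203.0.113.7", BROWSER_UA, f"/article/{i}") for i in range(60)]
log += [("198.51.100.2", BROWSER_UA, f"/article/{i}") for i in range(60)]
log += [("198.51.100.2", BROWSER_UA, "/static/site.css")]
print(suspicious_ips(log))  # ['203.0.113.7']
```

A flag from a heuristic like this is a lead for further investigation, not proof: caching, ad blockers, and shared IP addresses can all produce false positives.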
d) Predicted Outcomes of the Case
Given the strength and specificity of the evidence, the case has several possible outcomes—depending on Perplexity’s defense strategy, court rulings on fair use, and potential settlement negotiations.
Most Likely Short-Term Outcomes:
Denial of Motion to Dismiss
As in the Dow Jones case, the court is unlikely to dismiss Britannica’s claims early: jurisdiction appears solid, and the evidence is detailed.
Preliminary Injunction (Possibly Limited)
Britannica may win a partial injunction barring further scraping or requiring Perplexity to remove Plaintiffs’ content from its sources, especially if the court finds a willful violation of robots.txt.
Settlement Negotiation
To avoid discovery and reputational damage, Perplexity may opt for a confidential licensing settlement, in line with the licensing deals it has already struck with other publishers.
Long-Term Possibilities:
Judicial Definition of RAG vs. LLM Training: If this case is litigated fully, it could set a precedent distinguishing retrieval-based outputs from the fair use defenses raised for LLM training alone. Courts might find RAG-based outputs less likely to qualify as fair use than traditional model training.
Trademark Precedent for “AI Hallucinations”: Courts might rule that falsely attributing hallucinated content to a real brand constitutes false designation of origin. This could reshape how AI products display citations and brand logos.
License Market Acceleration: A win (or even a strong settlement) for Britannica will signal that licensing is not optional—potentially triggering a broader shift toward compliance.
Conclusion
The Encyclopædia Britannica v. Perplexity AI case is one of the most detailed, strategic, and potentially precedent-setting lawsuits in the generative AI copyright arena to date. Unlike speculative claims around training data, this case zeroes in on live crawling, RAG inputs, and output substitution—issues that will shape the future of AI-powered search and content generation.
Rights holders across publishing, education, media, and entertainment should closely monitor the outcome. If successful, Britannica’s strategy provides a playbook for asserting and monetizing rights in the AI era. Conversely, if Perplexity prevails, it could undermine decades of copyright jurisprudence and further destabilize the economics of high-quality factual content online.
