- Pascal's Chatbot Q&As
GPT-4o: Cohere is in serious legal jeopardy, and its best move is to negotiate licensing agreements immediately rather than risk a costly court loss.
This lawsuit could set a major precedent for AI and copyright law, particularly in defining how AI companies can use news content legally.
Question 1 of 2 for ChatGPT-4o: Please read the article "Major publishers sue Cohere for copyright infringement" and the associated complaint and tell me what the main grievances are. Also, list any surprising, controversial and valuable statements made in the complaint.
ChatGPT-4o’s answer:
Main Grievances in the Lawsuit Against Cohere
The lawsuit, filed by 14 major publishers including Condé Nast, The Atlantic, Forbes, Guardian News & Media, Los Angeles Times, Toronto Star, Politico, and Vox Media, alleges massive and systematic copyright and trademark infringement by Cohere Inc., an AI company valued at over $5 billion.
The primary grievances include:
Unauthorized Use of Copyrighted Content for AI Training
Cohere allegedly scraped and copied the publishers' articles without permission to train its AI models, including the “Command Family” of large language models (LLMs).
This includes articles from paywalled and subscription-based services, depriving publishers of fair compensation.
Regurgitation and Distribution of Copyrighted Content
Cohere’s AI allegedly outputs verbatim or near-verbatim excerpts of copyrighted articles, essentially competing with the original sources.
The AI model allegedly retrieves and presents the latest news articles, bypassing publishers' monetization strategies.
Fake and Misattributed Content
Cohere allegedly generates fabricated news articles and falsely attributes them to real publishers, damaging the credibility of established news organizations.
This includes “hallucinated” content that appears to come from The Guardian, Politico, The Atlantic, and others, but is not actually authored by them.
Trademark Infringement
The lawsuit claims Cohere has misused trademarks of major publishers by attributing AI-generated content to their brands, creating confusion among readers.
Harm to Emerging AI Licensing Market
Many of the plaintiffs, such as Condé Nast, The Atlantic, and Vox Media, along with Axel Springer (owner of plaintiffs Politico and Business Insider), have signed content licensing deals with OpenAI, Perplexity, and others.
Cohere’s failure to seek similar agreements allegedly undercuts this emerging market, giving it an unfair advantage over competitors who pay for content.
Competing with the Publishers' Own Businesses
The lawsuit alleges that Cohere’s AI directly competes with publishers by offering AI-generated content to businesses, reducing traffic to publishers' websites and diminishing subscription revenues.
Encouraging Copyright Violations via API
Cohere allegedly gives customers API access to build applications on its models, extending the alleged infringement into those applications.
Surprising, Controversial, and Valuable Statements in the Complaint
Surprising Statements
Cohere CEO Aidan Gomez Previously Criticized AI Models for "Stealing IP"
The lawsuit highlights a statement from Gomez where he distinguishes Cohere from “ordinary AI” that is good for “stealing intellectual property”—yet the complaint alleges Cohere is doing exactly that.
Cohere Knew the Importance of Licensing But Ignored It
The lawsuit references Cohere’s own statement:
“AI is only as useful as the data it can access” (Aidan Gomez, CEO of Cohere)
Plaintiffs argue that this confirms Cohere’s reliance on news publishers' content while deliberately avoiding licensing fees.
Cohere Markets Itself as a News Aggregator
Cohere promoted its AI model as a way to "keep users up to date with the latest news", suggesting intentional reliance on real-time news content without compensation.
Controversial Statements
Cohere Claimed to Respect Copyright, Yet Scraped Content Anyway
Publicly, Cohere stated that it “cares a lot about the provenance of our data” and “doesn’t scrape stuff it shouldn’t scrape,” yet the lawsuit claims Cohere violated these principles by copying paywalled and subscription-based content.
Cohere's Alleged Attempt to Evade Copyright Law via "RAG" (Retrieval-Augmented Generation)
The lawsuit highlights Cohere’s Retrieval-Augmented Generation (RAG), which queries external sources and incorporates real-time content into AI responses.
Plaintiffs argue that this isn’t fair use but a real-time infringement mechanism that mimics publishers' business models.
Valuable Statements
Cohere’s AI Allegedly Copied a Toronto Star Article on Taylor Swift
One cited example involves Cohere allegedly copying an October 2024 Toronto Star article about Ticketmaster halting Taylor Swift ticket transfers, with sections appearing verbatim in AI responses.
Exhibits A & B Provide Over 4,000 Examples of Alleged Infringements
The complaint includes Exhibit A, listing over 4,000 copyrighted articles allegedly copied by Cohere, and Exhibit B, showing specific AI-generated outputs that infringe on these articles.
Fake News at Scale – Cohere’s AI Attributed False Articles to Publishers
Exhibit D of the complaint presents examples of AI-generated fake news articles attributed to reputable publishers, potentially damaging their credibility.
Overall Takeaway
This lawsuit is one of the most comprehensive legal challenges against AI companies in the news publishing sector. The key allegations—unauthorized use, real-time copying, misattribution, and competing against publishers without paying—strike at the core of AI copyright disputes.
This case could set a major precedent for AI's use of copyrighted news content, especially because it highlights the growing market for AI licensing deals, which Cohere allegedly ignored while still benefiting from the publishers' content.
Question 2 of 2 for ChatGPT-4o: Do you feel that the evidence is strong and what should Cohere do now (or what should Cohere have done in the past) to either prevent or remedy this?
ChatGPT-4o’s answer:
Strength of the Evidence Against Cohere
Based on the complaint's allegations, the evidence appears strong for several reasons:
Concrete Examples of Copyright Infringement
The complaint includes over 4,000 allegedly copied articles (Exhibit A) and specific AI-generated outputs (Exhibit B) that demonstrate near-verbatim replication of copyrighted content.
The Toronto Star Taylor Swift example provides a clear, real-world case of alleged infringement.
Evidence of Real-Time Infringement (Retrieval-Augmented Generation - RAG)
The lawsuit claims Cohere continuously accesses and copies the latest articles, including paywalled content, reinforcing the notion that this isn't just historical scraping but ongoing infringement.
Plaintiffs argue that RAG is being used to replicate and serve publishers' content without permission, rather than acting as a general knowledge-assist tool.
Trademark Misuse and Fake Attribution
Cohere allegedly fabricated articles and attributed them to real publishers, which could be particularly damaging in court.
If true, this isn't just copyright infringement but also trademark dilution, brand misrepresentation, and consumer deception.
Cohere’s Own Statements Contradict Its Actions
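Public comments from Cohere's leadership could be used to show hypocrisy or bad faith. These include CEO Aidan Gomez's criticism of AI that is good for “stealing intellectual property” and the company's assurance that it does not scrape content it shouldn't.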
This weakens Cohere's credibility and could make it harder to argue fair use or accidental infringement.
Established AI Licensing Market
Many of the plaintiffs (Condé Nast, The Atlantic, Vox Media, and Axel Springer-owned Politico and Business Insider) are covered by licensing deals with OpenAI, Perplexity, and others.
The existence of a paid licensing market weakens a fair use defense, particularly on the fourth fair use factor (the effect on the potential market for the work), because it shows that AI firms can and do obtain content legally.
Cohere’s Lack of a Licensing Strategy
Unlike OpenAI, Perplexity, or Google, Cohere has not, according to the complaint, pursued licensing deals with these publishers.
This could be perceived as deliberate non-compliance, making it harder for Cohere to argue that it was unaware of the legal risks.
Potential Defenses for Cohere
Despite the strong evidence, Cohere might attempt the following defenses:
Fair Use Argument:
Cohere might argue that its use is transformative, not competitive, and only involves limited excerpts of articles.
However, real-time copying and verbatim reproduction weaken this defense.
RAG as a Neutral Tool:
Cohere could argue that RAG merely retrieves existing content like a search engine.
However, if the output closely resembles the original articles without significant modification, this defense may fail.
No Direct Liability for Third-Party Use:
Cohere may argue that its API customers are responsible for any improper use.
However, this argument is weak: Cohere itself trains its models on news content and serves the outputs, so much of the alleged copying is Cohere's own conduct rather than its customers'.
Lack of Harm to Publishers:
Cohere might claim that its tool drives traffic to publishers rather than replacing them.
The lawsuit’s claim that Cohere competes directly with publishers undermines this argument.
What Should Cohere Do Now?
Cohere has a few options:
Seek an Immediate Settlement or Licensing Agreements
The smartest move would be to negotiate licensing deals with the plaintiffs to avoid a protracted lawsuit.
Settling now could prevent further discovery that might expose additional legal risks.
Pause or Restrict Access to News-Related AI Outputs
Cohere could temporarily suspend its news-related AI features while it reviews compliance.
It might also consider implementing filters to block direct reproduction of publisher content.
Implement Source Attribution and Paywall Compliance
If Cohere intends to use publishers’ content, it should at minimum provide attribution and links, as OpenAI does with licensed content.
Avoiding paywalled content would also remove some legal risks.
Push for a Legal Carve-Out or AI Copyright Reform
Cohere could lobby for Canadian or U.S. laws similar to Japan’s text and data mining exemption (Article 30-4 of Japan's Copyright Act).
However, this is a long-term strategy and wouldn't help with the immediate lawsuit.
Prepare for a Fair Use Battle
If Cohere refuses to settle, it will need to mount a strong fair use defense, which means proving:
Transformation (that its AI outputs are sufficiently different from original articles)
Minimal copying (that only small excerpts are used)
No market harm (that it doesn’t replace original publishers)
What Should Cohere Have Done to Prevent This?
Cohere might have avoided this lawsuit if it had taken the following steps earlier:
Negotiated Licensing Deals Before Training
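OpenAI, Perplexity, and Google have secured content licensing deals with publishers, even though in OpenAI's case many of those deals came after its early models had already been trained.
Cohere’s failure to pursue similar deals made it a legal target.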
Limited Training on News Content
Cohere could have excluded copyrighted news articles or used only public-domain or licensed datasets.
Implemented Stronger Attribution and Paywall Compliance
If Cohere had clearly cited sources, linked to full articles, and respected paywalls, it might have avoided allegations of direct competition.
Worked with Publishers, Not Against Them
Cohere’s dismissive stance toward licensing (as indicated by co-founder Nick Frosst) likely fueled publisher frustration.
Engaging in open discussions with publishers earlier could have prevented litigation.
Final Takeaway
Cohere is in serious legal jeopardy, and its best move is to negotiate licensing agreements immediately rather than risk a costly court loss. This lawsuit could set a major precedent for AI and copyright law, particularly in defining how AI companies can use news content legally.
