
Chicken Soup for the Soul & BMG v. Anthropic: a litigation strategy that treats generative AI as a copyright supply chain.

The core injury is not merely downstream outputs, but upstream mass copying—especially copying allegedly sourced from shadow libraries—followed by years of compounding commercial benefit.

The Shadow-Library Theory Goes Mainstream Again: What the New Anthropic Suits Are Really Trying to Prove

by ChatGPT-5.2

Two new lawsuits filed on the same day against Anthropic—one from a major book brand (Chicken Soup for the Soul) and one from a major music-rights holder (BMG)—are less about “a chatbot quoted my work” and more about something bigger: a litigation strategy that treats generative AI as a copyright supply chain. In that framing, the core injury is not merely downstream outputs, but upstream mass copying—especially copying allegedly sourced from shadow libraries—followed by years of compounding commercial benefit.

What makes these filings notable is how they try to tighten the loop between (1) alleged piracy-origin training pipelines, (2) alleged “cleaning” steps that strip identifying rights information, (3) product deployment, and (4) the emergence of a market that “can’t function” (in plaintiffs’ telling) without embedded, uncompensated creative labor.

What the grievances are

Chicken Soup for the Soul: a books case that tries to be an “industry case”

Chicken Soup’s complaint is built around an expansive theory: that multiple leading AI companies (not just Anthropic) allegedly copied and used pirated versions of Chicken Soup books at multiple steps in the AI development lifecycle—acquisition, ingestion, storage, deduplication, tokenization, training, fine-tuning, and sometimes retrieval-augmented generation.
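
To see why the complaint can frame each of those steps as a separate act of reproduction, consider a minimal sketch of a text-ingestion pipeline. Everything here is illustrative; the function and file names are hypothetical, not drawn from any defendant's systems. The structural point is that each stage persists a new artifact derived from the source work.

```python
import hashlib
import json
from pathlib import Path

def ingest(raw_path: Path, work_dir: Path) -> Path:
    """Hypothetical pipeline: every stage below materializes a new copy."""
    text = raw_path.read_text(encoding="utf-8")         # copy 1: the acquired file, already on disk

    cleaned = " ".join(text.split())                    # normalization pass over the full text
    (work_dir / "cleaned.txt").write_text(cleaned)      # copy 2: cleaned corpus, persisted

    digest = hashlib.sha256(cleaned.encode()).hexdigest()
    (work_dir / "dedup_index.json").write_text(
        json.dumps({digest: raw_path.name}))            # copy 3: dedup store keyed by content hash

    tokens = cleaned.split()                            # stand-in for a real tokenizer
    shard = work_dir / "shard0.json"
    shard.write_text(json.dumps(tokens))                # copy 4: tokenized training shard
    return shard
```

Each persisted artifact is, on the complaint's theory, a distinct reproduction of the underlying work.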

The grievance is therefore framed as repeated infringement “throughout the lifecycle,” not as a single discrete act. The complaint’s rhetorical centerpiece is a causal origin story: it alleges that the modern commercial LLM era began with an OpenAI employee downloading Library Genesis books in 2018, building internal datasets (later “Books1/Books2”), and then training GPT-3, and it describes that act as the template others followed. From there, it points to EleutherAI’s release of The Pile and the construction of Books3 (allegedly built from a piracy tracker) and argues that the book-piracy substrate became a reusable industrial input for the entire sector.

Two additional grievances are worth noticing because they reveal what plaintiffs are optimizing for:

  1. They do not want a class action. Chicken Soup explicitly says it is choosing individualized statutory damages and a jury determination rather than a class structure that could produce what it portrays as “pennies on the dollar” settlements. That is a strategic shot across the bow: it signals an intent to maximize statutory-damages leverage rather than accept an industry-wide compromise.

  2. They want to treat “pirated-source provenance” as willfulness. The complaint doesn’t merely claim use of unauthorized copies; it repeatedly frames the alleged reliance on notorious shadow libraries as proof that defendants “knew” the sources were illegal, ignored warnings, and concealed dataset composition.

BMG v. Anthropic: a music-lyrics case that targets pipeline steps (and CMI)

BMG’s complaint is more surgical and process-driven. It alleges that Anthropic used copyrighted lyrics in model development and that Claude can output substantial lyric text in response to prompts. But the claim structure pushes harder than prior lyrics cases on how the sausage is made:

  • Direct infringement in training and output: using lyrics to train and generating infringing outputs.

  • Direct infringement by torrenting: a separate claim alleging infringement through acquisition/distribution mechanics associated with torrenting.

  • Contributory and vicarious infringement: theories aimed at Anthropic’s role in enabling user or customer infringement and benefiting from it.

  • Removal or alteration of Copyright Management Information (CMI): a DMCA §1202-style claim arguing that “cleaning” processes and extractor tools removed copyright-identifying information embedded in text.
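
To make the cleaning mechanism concrete, here is a deliberately naive sketch of the kind of bulk “cleaning” filter common in corpus preprocessing. The patterns and sample text are hypothetical, and this is not a description of any tool named in BMG’s complaint; the point is that the “boilerplate” such filters strip (copyright lines, credits, rights notices) is precisely what §1202 treats as copyright management information.

```python
import re

# Hypothetical "boilerplate" patterns of the kind a bulk-cleaning pass might target.
CMI_PATTERNS = [
    re.compile(r"^\s*(copyright|\(c\)|©).*$", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^\s*(written|lyrics|music)\s+by\b.*$", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^\s*all rights reserved\.?\s*$", re.IGNORECASE | re.MULTILINE),
]

def clean(text: str) -> str:
    """Strip 'noise' lines before training -- and, with them, the embedded CMI."""
    for pattern in CMI_PATTERNS:
        text = pattern.sub("", text)
    return text

sample = ("Example Song\n"
          "Written by A. Songwriter\n"
          "© 2024 Example Music Publishing. All rights reserved.\n"
          "First line of the lyric...")
print(clean(sample))  # the credit and copyright lines are gone; the lyric text remains
```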

BMG also foregrounds a pre-suit conduct grievance: the complaint alleges that BMG sent a cease-and-desist letter in December 2025 and that Anthropic did not respond. In litigation terms, that allegation is not about etiquette; it goes to willfulness, irreparable harm, and injunctive posture.

Assessing the quality of the evidence: what’s demonstrable vs what’s inferred

These complaints do not “prove” their claims at filing; they are designed to survive early motions and force discovery. The key question is whether the allegations are anchored in evidence that a court will treat as more than speculation.

Where the cases look strongest (because the facts are testable)

1) Output demonstrations (BMG). If Claude outputs substantial portions of specific song lyrics under reproducible prompting, that is the cleanest kind of early evidence: it can be tested by both sides, and the dispute shifts to interpretation (guardrails, memorization, substantial similarity, fair use, etc.), not mere plausibility. (A minimal sketch of such an overlap test follows these four points.)

2) Pipeline + intent hooks (BMG’s CMI theory). The CMI allegation is potentially powerful because it doesn’t require proving that every lyric was copied—rather, it tries to show a systematic process that predictably strips the very metadata that helps rightsholders identify and manage works. Whether that becomes a winning claim depends heavily on discovery: what exactly the extractor tools did, whether CMI was in fact removed, and whether the court credits the inference that the removals were knowing and designed to facilitate infringement.

3) “Notorious source” willfulness posture (Chicken Soup). Chicken Soup tries to turn the public notoriety of shadow libraries into a willfulness presumption: if you used these corpora, you knew (or should have known) you were dealing with piracy. Courts may not accept that leap without specific internal documents, but the theory is discovery-friendly: it sets up subpoenas for dataset manifests, hashes, procurement tickets, and internal communications.

4) The NVIDIA “green light” allegation (Chicken Soup). This is one of the most striking factual allegations in the filing: it asserts NVIDIA engaged directly with a major piracy repository and proceeded after being warned of illegality. If plaintiffs can substantiate this with communications or records, it could become a case-defining willfulness fact pattern (or, conversely, if it’s not substantiated, it becomes a vulnerability).
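
Returning to the first of these points: the sketch below shows the kind of verbatim-overlap measurement both sides could run on a reproducible output demonstration. `model_output` is simply a string standing in for whatever a model returns; no specific API, threshold, or test protocol is assumed.

```python
def ngrams(text: str, n: int = 5) -> set:
    """All contiguous word n-grams in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(reference: str, model_output: str, n: int = 5) -> float:
    """Fraction of the reference's n-grams reproduced verbatim in the output."""
    ref = ngrams(reference, n)
    return len(ref & ngrams(model_output, n)) / len(ref) if ref else 0.0
```

A score near 1.0 on a known lyric would support the “substantial portions” showing; a score near 0.0 would undercut it. Where the line sits between those poles is a legal question (substantial similarity), not a technical one.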

Where the cases look weaker (because they rely on inference that must be proven later)

1) “Your book existed in a shadow library, therefore defendants copied it.” This inference is common in AI copyright litigation. It can be plausible, but it’s not the same as proving ingestion of specific titles/editions into a given training run. Plaintiffs will need discovery—dataset inventories, training snapshots, dedup logs—to bridge the gap.

2) The “industry genealogy” story (OpenAI → EleutherAI → everyone). It’s rhetorically effective, and it may even be directionally accurate as history, but liability is individualized. Each defendant’s exposure turns on what it did, when, and with what corpora. Courts tend to treat broad “everyone followed the template” narratives as context, not proof.

3) Scale rhetoric as a proxy for wrongdoing. Both suits lean into “arms race” framing and massive economic stakes. That matters for motive and remedies, but it does not substitute for the factual trace of copying.

Net-net: these complaints look engineered to force disclosure. The evidentiary battle will be won or lost on what the defendants’ records show about dataset provenance, acquisition mechanisms, cleaning tools, and guardrail behavior—not on the drama of the narrative alone.

The most surprising, controversial, and valuable statements and findings

Surprising

  • The “omnibus” posture in Chicken Soup: naming a broad swath of the industry (including Apple and NVIDIA) is an escalation beyond the “one lab” model of many earlier suits. It’s a deliberate attempt to make the court confront generative AI as an ecosystem, not a single actor.

  • The NVIDIA piracy-interaction allegation: the “warned, then green-lit” story is unusually direct and—if proven—would be unusually damaging as evidence of willfulness.

  • BMG’s separation of torrenting as its own infringement claim: that is an effort to treat acquisition/distribution mechanics as independently actionable, not merely a step toward training.

Controversial

  • “Every training pass makes a new copy.” Chicken Soup pushes a very expansive view of “copying” across epochs and gradient steps. That’s legally provocative: it invites courts to decide whether routine computational steps constitute new actionable reproductions, which could massively amplify damages theories. (A minimal training-loop sketch follows this list.)

  • Turning “dataset secrecy” into evidence of concealment. Both the public debate and these pleadings increasingly treat nondisclosure of training data as a sign of guilt. Defendants will argue it’s trade secret protection and safety; plaintiffs will frame it as hiding unlawful sources.

  • The implied claim that generative AI cannot exist lawfully at scale without licensing. Neither complaint quite says this explicitly, but both are built to support it: if the models are trained on pirated corpora and outputs can substitute for the originals, the market becomes structurally dependent on uncompensated copying.
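
On the first of these points, a minimal sketch shows why the “new copy per pass” framing is even arguable: a standard training loop re-reads the stored corpus every epoch and materializes each batch in memory before the gradient step. The names are hypothetical and the loop is schematic, not any defendant’s code.

```python
import json
from pathlib import Path

def gradient_step(batch: list) -> None:
    pass  # stand-in for the actual parameter update

def train(shard: Path, epochs: int = 3, batch_size: int = 512) -> None:
    for epoch in range(epochs):
        tokens = json.loads(shard.read_text())           # whole corpus read into memory, once per epoch
        for start in range(0, len(tokens), batch_size):
            batch = tokens[start:start + batch_size]     # a further in-memory copy per gradient step
            gradient_step(batch)                         # in practice, batches are also copied to accelerator memory
```

Plaintiffs count those repeated reads and transient copies toward liability and damages; defendants will characterize them as routine computation inseparable from any use of a digital work.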

Valuable (because they sharpen what courts will actually have to decide)

  • The “copyright supply chain” framing. These cases move beyond the simplistic “output quotes my work” debate and force a structured inquiry: where did the data come from, how was it processed, what metadata was stripped, what was stored long-term, what is reproduced in outputs, and what controls exist?

  • The CMI/cleaning angle. Regardless of outcome, it is a strategically important move: it targets the operational reality that training pipelines often normalize or erase attribution-like signals in bulk processing.

  • The refusal to proceed as a class action (Chicken Soup). That is a signal that at least some plaintiffs intend to make litigation expensive, individualized, and high-variance—harder to settle as an industry bundle.

How these cases compare to the ones that came before

Earlier waves of AI copyright litigation often fell into two categories:

  1. Output-centric suits (show me the infringing output; argue substantial similarity and market harm).

  2. Training-centric suits (the act of copying for training is infringement; argue fair use vs non-transformative substitution).

These new filings are best read as third-wave pleadings: they attempt to fuse training, acquisition provenance, data cleaning, metadata/CMI stripping, and downstream outputs into a single continuous theory of infringement and willfulness.

Chicken Soup’s complaint also intensifies a trend that has been building: the shadow-library provenance claim as the centerpiece, not a footnote. The complaint’s narrative is effectively: “This industry was bootstrapped on pirate corpora, and the piracy became industrial infrastructure.” That’s a direct escalation from earlier cases that sometimes treated data provenance as difficult-to-prove background.

BMG, meanwhile, resembles the earlier Concord Music lyrics litigation against Anthropic but pushes further on process claims (especially CMI removal) and on secondary liability. It’s not only “Claude outputs lyrics,” but “Anthropic’s pipeline and product design make infringement foreseeable and profitable.”

Consequences for other AI companies and future litigation

If these complaints survive early motions and discovery opens up meaningfully, the consequences could be structural:

1) Discovery pressure will become governance pressure.
AI companies will be forced to show what they used, when they used it, how they cleaned it, and what they stored. That incentivizes better provenance systems, retention policies, and internal audit trails, because the absence of records can itself look like culpability. (A minimal provenance-manifest sketch follows these five points.)

2) Shadow-library exposure becomes a board-level risk category.
If “notorious source = willfulness” gains traction, the risk calculus shifts from “fair use might save us” to “our provenance choices could multiply statutory damages and drive injunction risk.”

3) Plaintiffs will diversify claim types to avoid fair-use bottlenecks.
Fair use will remain central, but plaintiffs are clearly trying to multiply paths to liability: CMI claims, torrenting/distribution mechanics, contributory/vicarious theories, unfair competition-like narratives (even if not always pled), and product design arguments (RAG, indexing, “central libraries,” etc.).

4) Settlements, if they come, may look less like “pay a license fee” and more like “rebuild your pipeline.”
The relief sought in these suits isn’t just money; it points toward injunctions that could require deletion, retraining, filters, provenance attestations, and operational constraints.

5) Litigation risk will spread to the stack.
Chicken Soup’s inclusion of NVIDIA and Apple reflects an emerging idea: not only “model developers” are exposed, but also those supplying foundational tooling, datasets, or distribution surfaces—depending on alleged involvement.
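
As a sketch of what the record-keeping pressure in point 1 points toward, consider an append-only provenance manifest: every acquired file gets a content hash plus source, license, and acquisition metadata. The field names are illustrative assumptions, not any company’s actual schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_provenance(file_path: Path, source_url: str, license_terms: str,
                      manifest: Path = Path("provenance.jsonl")) -> dict:
    """Append a hash-keyed acquisition record to an audit-trail manifest."""
    entry = {
        "sha256": hashlib.sha256(file_path.read_bytes()).hexdigest(),
        "file": file_path.name,
        "source_url": source_url,          # where the file was acquired
        "license": license_terms,          # the claimed legal basis for ingestion
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }
    with manifest.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")  # append-only: gaps are as telling as entries
    return entry
```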

Predictions and future outlook

  1. Courts will be pushed to decide “what counts as copying” in machine learning pipelines.
    Chicken Soup’s “lifecycle copying” theory (acquisition → ingestion → training passes) is a direct invitation for courts to define the unit of infringement in modern ML systems. That decision, more than any single verdict, could reshape damages and compliance architecture.

  2. The provenance question will become the litigation fulcrum.
    Output demos are persuasive, but provenance is existential. If discovery reveals sloppy or knowingly risky sourcing (especially from notorious pirate libraries), willfulness becomes a live possibility and settlement pressure spikes.

  3. CMI claims may become the sleeper battleground.
    If courts accept that bulk cleaning that strips metadata can violate §1202 when done knowingly and at scale, that becomes a powerful tool for rightsholders—because it targets the normalization of “attribution erasure” as a technical default.

  4. Expect more “omnibus” suits—and counter-strategies to sever them.
    If plaintiffs can keep multi-defendant, ecosystem-wide cases together, defendants face coordinated risk. Defendants will likely fight hard to sever claims, move venues, and individualize issues to prevent an industry-wide liability narrative from crystallizing in one courtroom.

  5. The industry will converge on a two-track reality: licensed corpora for defensible products; gray-market corpora for competitive frontier pushes.
    The more litigation punishes shadow-library sourcing, the more companies will segment: “clean-room” datasets and licensed pools for products sold to risk-sensitive customers; and separate, more insulated experimentation pipelines that are harder to defend publicly. Regulators and enterprise buyers will increasingly demand attestations that force those worlds to reconcile.

In short: these filings are not merely “two more copyright suits.” They are attempts to force the legal system to treat generative AI as a reproducible, auditable production process, one that either has a lawful supply chain or does not. If plaintiffs get discovery that meaningfully validates the shadow-library story (and the alleged cleaning/CMI stripping), the next phase of AI litigation won’t be about whether AI is “transformative” in the abstract. It will be about whether the industry can keep its most valuable inputs legally invisible.