Concord Music v. Anthropic (II): Anthropic and leadership allegedly chose BitTorrent because it’s fast and free, and used it to build a “vast central library” they intended to keep “forever”.
A company branding itself as “AI safety and research” allegedly relied on pirate libraries and a protocol “synonymous with copyright infringement.”
Concord Music v. Anthropic (II) — a piracy case wearing an “AI training” suit
by ChatGPT-5.2
This January 28, 2026 complaint is framed as a second action by major music publishers against Anthropic and—unusually—two named individuals (Dario Amodei and Benjamin Mann). It is explicitly positioned as “distinct and separate” from the earlier Concord case (“Concord I”) and centers on two buckets of alleged wrongdoing:
Old-school, unambiguous piracy: downloading (and necessarily uploading) millions of books via BitTorrent from LibGen and PiLiMi—including songbooks and sheet-music/lyrics books containing at least 714 publisher-controlled musical works (Exhibit A).
Continuing AI-era infringement: post–Concord I scraping/copying/using physical books and other datasets to train and operate newer Claude models, allegedly involving 20,517 publisher works (Exhibit B), plus output-side infringement and DMCA §1202 “CMI removal” theories.
What makes the pleading strategically sharp is that it tries to anchor the whole narrative in the torrenting allegations, which—if proven—look far closer to “routine infringement” than the legally contested “training is fair use” debates.
A. What the plaintiffs are actually mad about (the grievances)
1) “You built your library on piracy”
The headline grievance is that Anthropic and leadership allegedly chose BitTorrent because it’s fast and free, and used it to build a “vast central library” they intended to keep “forever,” without paying.
They plead this not as a technical footnote but as a values-and-intent story: a company branding itself as “AI safety and research” allegedly relied on pirate libraries and a protocol “synonymous with copyright infringement.”
2) “And you didn’t just download—you also distributed”
They emphasize BitTorrent’s “swarm” mechanics to argue that downloading necessarily caused uploading/distribution to the public, amplifying harm and encouraging further infringement.
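The swarm point is mechanical rather than rhetorical, and a toy model makes it concrete (a drastically simplified, hypothetical sketch — real BitTorrent clients negotiate pieces via trackers/DHT and tit-for-tat choking): any peer holding even one piece becomes a source that other peers pull from, so downloading and distributing are the same act.

```python
# Toy swarm (hypothetical, simplified): each peer is a set of piece numbers.
# One exchange round shows that partial downloaders also serve as uploaders.

def exchange(peers):
    """One round of piece exchange among swarm peers.

    Each peer requests one missing piece from the first peer that has it,
    so every downloader with at least one piece is also a potential source.
    """
    uploads = {name: 0 for name in peers}
    for name, have in peers.items():
        for other, other_have in peers.items():
            if other == name:
                continue
            missing = other_have - have
            if missing:
                have.add(min(missing))  # fetch one missing piece
                uploads[other] += 1     # the source peer distributed it
                break
    return uploads

swarm = {
    "leecher1": {0},            # partial download
    "leecher2": {1},            # partial download
    "seeder":   {0, 1, 2, 3},   # has the full file
}
counts = exchange(swarm)
# Both leechers served a piece this round: downloaders are distributors.
```

The design point the complaint leans on is exactly this reciprocity: the protocol has no "download-only" mode in a live swarm.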
3) “You concealed this during discovery”
A pointed grievance is procedural: plaintiffs say they only learned of torrenting in July 2025 after rulings in Bartz v. Anthropic, and that Anthropic “concealed” this during discovery in Concord I.
4) “After being sued, you doubled down”
On the training/output side, the complaint says: even after Concord I, Anthropic released new Claude versions and continued copying and outputting infringing lyrics, with guardrails treated as a “band-aid—not a cure.”
5) “You strip copyright management information because you think it’s junk”
They add a DMCA §1202 claim: Anthropic allegedly removes/strips CMI (title, authorship, copyright owner info) during training and outputs, and they put a rhetorical dagger in the text by attributing to Anthropic the view that CMI is “useless junk” / “boilerplate.”
6) “You’re enabling user infringement and monetizing it”
The secondary-liability theory is unusually commercialized: the complaint argues Anthropic profits per interaction—pay-as-you-go API revenues each time someone prompts for lyrics and each time the model outputs them—and therefore has a direct financial benefit and the ability to supervise/control outputs and access.
B. Is the evidence any good?
1) Torrenting: stronger than most “AI training” complaints
On the torrenting claims, the complaint is not merely “information and belief.” It leans on public court opinions and filings from Bartz and even cites deposition transcript references. It alleges:
scale: ~5 million books from LibGen and 2 million from PiLiMi
leadership awareness: internal descriptions like LibGen being a “blatant violation of copyright,” “sketchy,” and celebratory messages about Z-Library mirroring
intent/choice: Amodei allegedly acknowledged legal purchase options but chose torrenting to avoid a “legal/practice/business slog.”
linkage to plaintiffs’ repertoire: they claim bibliographic metadata catalogs (title/author/ISBN fields) show many books containing publisher-controlled sheet music/lyrics, and they give examples of specific songbook titles and famous songs.
Assessment: If those Bartz materials hold up and if Exhibit A is defensible, this is high-quality liability evidence (reproduction + distribution), and it is psychologically powerful because it’s easy for a judge/jury to understand. The main evidentiary vulnerability is not “did torrenting happen?” but attribution and scope: connecting specific publisher works to the torrented files at scale, and proving distribution beyond mere technical possibility (though BitTorrent mechanics make the “distribution occurred” argument intuitively strong).
2) Training/output: plausible but more inference-heavy
On the AI training and output allegations, they assert:
continued copying via scraping, physical books, and third-party datasets
ongoing output of verbatim lyrics and ease of jailbreak, citing “scientific literature” and promising discovery will show more
a very large asserted worklist: 20,517 works (Exhibit B), called “non-exhaustive, exemplary”
Assessment: This part reads more like many other GenAI suits: a combination of (a) prior history (Concord I), (b) general model behavior (memorization/regurgitation), (c) public research, and (d) “discovery will reveal.”
It’s not weak—especially because lyrics are short, distinctive, and often regurgitated—but it is less “self-proving” than the torrenting narrative.
3) DMCA §1202 (CMI removal): strategically potent, factually fragile unless they can show process
The DMCA theory is a clever multiplier: if you can prove intentional CMI removal/alteration tied to infringement, it can be expensive and unpleasant for defendants. The complaint alleges CMI stripping both in training pipelines and outputs, plus that Anthropic anticipated users would be sued and indemnified them.
Assessment: Courts are split on §1202, and the doctrine is technical. The plaintiffs will likely need process evidence (pipeline steps, dataset cleaning rules, prompt/output logs, training corpora handling) to avoid this being dismissed as “CMI just isn’t present in scraped text; omission ≠ intentional removal.” The complaint gestures toward that by alleging “algorithms known to remove” CMI and “by design,” but the proof burden is real.
C. The most surprising, controversial, and valuable statements/findings (from the complaint)
Surprising
The suit is blunt that torrenting created a permanent internal library intended to be kept “forever”—and treats that mere possession/storage as standalone infringement, regardless of later training use.
It explicitly ropes in named founders as primary actors and “moving forces” behind the torrenting decisions, not just corporate conduct.
Controversial
The complaint attributes to Anthropic the view that copyright metadata is “useless junk” / “boilerplate,” and uses that to frame §1202 intent.
It asserts that guardrails are essentially PR: a “band-aid—not a cure,” because they do nothing about the underlying copying in training.
Valuable (strategically / evidentiary)
The plaintiffs are explicit that this lawsuit exists because the prior case could not be amended without “fundamentally transform[ing]” it—so they carved out the torrenting as a clean, high-clarity claim set.
The monetization theory is unusually concrete: API pay-per-word revenues are tied to lyric prompts and lyric outputs, bolstering vicarious-liability “direct financial benefit” allegations.
The requested relief includes an accounting of training data and methods—a discovery and transparency lever that, if granted, can reshape negotiating power far beyond this one case.
D. How this litigation differs from other GenAI cases you and I have discussed
1) It’s not primarily a “fair use of training” test case
Many headline AI suits (NYT/OpenAI-style) become trench warfare over fair use, transformative purpose, market substitution, and whether outputs are “derivative.” Here, the plaintiffs lead with torrenting—a fact pattern with far fewer doctrinal escape hatches.
2) It targets individual executives alongside the company
That changes settlement pressure, insurance dynamics, discovery posture, and reputational risk.
3) It uses BitTorrent distribution mechanics as a harm multiplier
A lot of AI cases emphasize copying; this one emphasizes copying + distribution to the public as structurally baked into the protocol.
4) The §1202 angle is foregrounded as “design intent,” not an afterthought
They’re trying to make “provenance stripping” the moral equivalent of removing serial numbers, not just missing metadata.
5) The scale of works is framed as massive and continuing post-suit
They explicitly cite newer Claude releases (by date) and claim ongoing conduct after being sued.
E. Recommendations at this point
For AI developers
Treat piracy-sourced corpora as existential-risk material. If any internal team is using BitTorrent/LibGen/PiLiMi-like sources, stop and document a cleanup plan; “we didn’t use it for commercial models” is not a safe harbor if the suit is about copying/distribution and storage.
Build provable provenance, not vibes. Keep chain-of-custody logs for training data sources, licenses, transformations, and retention/deletion. Expect courts to increasingly demand “accounting”-style disclosures.
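A minimal sketch of what “provable provenance” can mean in practice — one chain-of-custody record per ingested artifact, pinned to the exact bytes by a hash. The field names here are illustrative assumptions, not a standard schema:

```python
import hashlib
import datetime

def provenance_record(data: bytes, source_url: str, license_id: str, transforms):
    """One chain-of-custody entry for a training-data artifact.

    Hypothetical schema: the hash ties the record to the exact bytes,
    the license field to the authority to use them.
    """
    return {
        "sha256": hashlib.sha256(data).hexdigest(),  # pins record to exact content
        "acquired_from": source_url,                 # where/how it was obtained
        "license": license_id,                       # reference to the license grant
        "transformations": list(transforms),         # cleaning/dedup steps applied
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

entry = provenance_record(
    b"licensed text ...",
    "https://example.com/corpus",        # hypothetical source
    "LIC-2026-001",                      # hypothetical license reference
    ["normalize-unicode", "dedup"],
)
```

Append-only storage of such records (or anchoring the hashes externally) is what turns a data-sourcing story into something auditable rather than asserted.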
Engineer CMI preservation as a product requirement. If you ingest licensed text that includes metadata, preserve it; if you output protected text, output the attribution/CMI or block the output. If you can’t do this, don’t pretend §1202 risk doesn’t exist.
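Treating CMI as a hard requirement can be as simple as a gate in the ingestion step that fails loudly instead of silently dropping metadata. A hedged sketch (field names are assumptions, not a statutory checklist):

```python
# Illustrative CMI gate: refuse to ingest licensed text that arrives
# without the copyright-management fields it was delivered with.
CMI_FIELDS = ("title", "author", "copyright_owner")

def ingest(doc: dict) -> dict:
    """Fail loudly if CMI is absent, rather than stripping it silently."""
    missing = [f for f in CMI_FIELDS if not doc.get(f)]
    if missing:
        raise ValueError(f"CMI missing, refusing ingest: {missing}")
    # Downstream cleaning must carry these fields forward unchanged.
    return {**doc, "cmi_preserved": True}
```

The same check, mirrored on the output side, is the engineering counterpart of “output the attribution or block the output.”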
Guardrails aren’t a defense if the pipeline is dirty. Use guardrails as harm reduction, but assume plaintiffs (and judges) will view them as “band-aids” unless the underlying training and dataset compliance story is credible.
Align incentives: If you indemnify users, you’re effectively admitting predictable infringement scenarios—so invest in demonstrable prevention and auditability, not just legal language.
For content and rights owners
Prioritize cases with “bright-line” conduct. This complaint shows the playbook: lead with conduct like torrenting/distribution/storage where defenses are narrower, and use that leverage to force transparency and settlement terms that matter.
Ask for accountings that change the industry, not just money. The requested relief here (training data/methods disclosure) is the kind of lever that can produce systemic impact if you can win it.
Invest in “litigation-grade” evidence pipelines. Build repeatable tests for verbatim regurgitation, style-to-verbatim drift, jailbreak susceptibility, and metadata stripping; preserve logs and screenshots; use controlled prompts and reproducibility protocols.
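What a repeatable regurgitation probe can look like, in a hedged sketch: fixed prompts, a verbatim-overlap measurement, and hash-stamped logs that can be re-verified later. `stub_model` stands in for a real API call pinned to deterministic settings (e.g. temperature 0); everything here is an illustrative assumption:

```python
import hashlib
import datetime

def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest verbatim run shared by output and reference."""
    best, prev = 0, [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def run_probe(prompt: str, reference: str, query_model) -> dict:
    """One reproducible probe: log what was asked, what came back, and
    how much of the reference text appeared verbatim."""
    output = query_model(prompt)
    return {
        "prompt": prompt,
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),  # tamper-evident
        "verbatim_overlap_chars": longest_common_substring_len(output, reference),
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

# Stand-in model that regurgitates the reference wrapped in filler text.
def stub_model(prompt: str) -> str:
    return "sure, here it is: la la la sample line (end)"

record = run_probe("quote the line", "la la la sample line", stub_model)
```

Run the same probe set on each model release, archive the records, and the “ongoing conduct after being sued” question becomes a diff over logs rather than a credibility contest.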
Treat CMI as both shield and sword. Make sure your authorized distributions consistently include strong CMI; document how it appears in licensed contexts; it strengthens §1202 narratives when you can show what was present before ingestion.
Negotiate licensing with enforcement teeth. If you license, require: provenance reporting, audit rights, deletion/retention limits, output/attribution rules, and penalties for unlicensed sources. The market is moving toward “trust as infrastructure,” and enforceable transparency is the scarce asset.
