- Pascal's Chatbot Q&As
GPT-4o: Meta's lawyers were directly involved in discussions on: Stopping licensing efforts in favor of pirated sources, Concealing evidence of copyright violations.
Meta’s legal team seemingly advised avoiding any licensed content to maintain a Fair Use defense. Employees discussed the risks of being caught and proactively suggested using VPNs and alternative means to avoid detection.
Asking ChatGPT-4o: List every surprising, controversial and valuable detail (for copyright owners) you can find in the newly unsealed emails from META as published by Ars Technica
ChatGPT-4o’s answer:
Key Revelations from the Unsealed Meta Emails (as Published by Ars Technica)
The newly unsealed emails from Meta reveal surprising, controversial, and highly valuable details for copyright owners. Here are the most critical takeaways:
1. Meta Knew It Was Using Pirated Content
Several internal discussions acknowledge that datasets from LibGen, Sci-Hub, ResearchGate, and similar sources are explicitly illegal.
Employees flagged ethical concerns, but leadership and the legal team seemingly overruled them.
Quote: “I don’t think we should use pirated material. I really need to draw a line there.” – Melanie Kambadur (Oct 19, 2022).
Significance: Despite these concerns, the redacted messages that follow indicate that Meta decided to proceed.
2. Meta’s In-House Legal Team Facilitated the Decision
Multiple redactions indicate that Meta's lawyers were directly involved in discussions on:
Whether using pirated datasets could be justified as “Fair Use.”
Stopping licensing efforts in favor of pirated sources.
Concealing evidence of copyright violations.
Quote: “If we license one single book, we won’t be able to lean into fair use strategy. So we will have to drop all of the LibGen and Books datasets.” – Sergey Edunov (Aug 23, 2023).
Significance: Meta’s legal team seemingly advised avoiding any licensed content to maintain a Fair Use defense.
3. Meta Considered Concealing Its Piracy
Employees discussed the risks of being caught and proactively suggested using VPNs and alternative means to avoid detection.
Quote: “Can we load LibGen data using Meta IP ranges? Or should we use some VPN?” – Nikolay Bashlylov (April 17, 2023).
Significance: Shows a clear intent to hide data acquisition activities.
4. Meta Killed Licensing Efforts to Use Pirated Data Instead
At multiple points, internal discussions show that Meta halted negotiations with rightsholders in favor of free access to pirated databases.
Quote: “Licensing efforts have stopped.” (April 28, 2023).
Significance: Meta chose piracy over legal licensing deliberately.
5. Meta Torrented and Stored Massive Amounts of Pirated Content
At least 81.7 TB of pirated content was acquired by Meta via torrents.
This includes:
35.7 TB from Z-Library and LibGen
Shadow libraries from Anna’s Archive
Quote: “Meta torrented at least 81.7 terabytes of data” (April 11, 2024).
Significance: This isn’t just incidental use; it’s a systematic and large-scale data acquisition effort.
6. Meta’s Legal Team Redacted Internal Emails to Hide Copyright Infringement
Many critical legal discussions were fully redacted, indicating efforts to conceal decision-making processes.
Quote: “The next step is actually us starting to process LibGen more thoroughly… the lawyers are [redacted].” – Melanie Kambadur (April 19, 2023).
Significance: Indicates lawyers were involved in operationalizing the use of pirated datasets.
7. Meta Employees Expressed Fear of Regulatory Exposure
Internally, employees worried about what would happen if regulators discovered that Meta trained its AI on pirated datasets.
Quote: “Expresses concern over what will happen if regulators learn that Meta is training Llama on pirated copyrighted data.” (Dec 13, 2023).
Significance: Acknowledges that this practice could lead to legal and regulatory action.
Why This Matters for Copyright Owners
✅ Clear Evidence of Willful Infringement:
Meta’s leadership knew that its data sources were pirated and chose to proceed anyway. This eliminates any plausible deniability.
✅ Legal Risk for Meta:
These emails could strengthen copyright infringement lawsuits and potentially trigger regulatory scrutiny, particularly in the U.S. and EU.
✅ Possible Basis for an Injunction or Legal Sanctions:
Rightsholders can use this evidence to argue for immediate restrictions on Meta's AI models, similar to how OpenAI has faced legal action.
✅ Potential Criminal Liability:
If proven, corporate-level torrenting of massive datasets could expose Meta to criminal copyright infringement charges, not just civil liability.
✅ Challenging Meta’s Fair Use Defense:
Meta strategically avoided licensing to maintain a Fair Use argument—this could be used to discredit their legal position.
Next Steps for Copyright Owners
📌 Legal Action:
Copyright holders can cite this as direct evidence of willful infringement.
Potential basis for seeking statutory damages and injunctions.
📌 Regulatory Pressure:
Present evidence to lawmakers and enforcement bodies to demand AI transparency.
Push for new AI training regulations.
📌 Public Exposure:
Pressure Meta through public awareness and media campaigns.
Conclusion
These revelations confirm and expand upon long-held suspicions about Meta’s AI training practices. The scope, scale, and intent behind its copyright infringement make this one of the most damning disclosures in the AI copyright debate.
This could significantly change the legal landscape for AI training on copyrighted materials.

