- Pascal's Chatbot Q&As
How the generative AI tool Perplexity could face legal trouble for copyright infringement due to its use of "RAG" (Retrieval-Augmented Generation).
Perplexity's use of RAG exposes it to significant legal risks, and recent court rulings suggest that defenses like "fair use" may not hold up in these cases.
Question 1 of 2 for ChatGPT-4o: Please read the Substack post "Perplexity's Use of 'RAG' Opens Up 3 Paths to Copyright Infringement" and tell me what it says in easy-to-understand language.
ChatGPT-4o’s answer:
The Substack post titled "Perplexity's Use of 'RAG' Opens Up 3 Paths to Copyright Infringement" outlines how the generative AI tool Perplexity could face legal trouble for copyright infringement due to its use of "RAG" (Retrieval-Augmented Generation). Here’s a simplified summary of the key points:
The Problem with RAG: RAG is a method that combines AI-generated responses with real-time retrieval of information from external sources. This means Perplexity may be copying content (like articles from The Wall Street Journal or The New York Times) without permission to provide up-to-date responses, which can lead to copyright infringement.
Three Paths to Infringement:
Path #1: Perplexity uses unlicensed content from sources like news outlets to train its AI. The post argues that this "scraping" of content for training purposes without consent is infringement.
Path #2: In real time, Perplexity copies content to provide immediate answers to users. Even if the copied content isn't shown to the user, this copying could still be seen as infringement.
Path #3: Perplexity's AI-generated outputs could be considered infringing, even if they aren’t exact replicas of the original content.
Fair Use Defense Challenges: Perplexity and other AI companies often argue that their use of copyrighted content is "fair use" (a legal defense that allows some use of copyrighted material without permission). However, recent court decisions (like the Supreme Court’s Warhol case and the Hachette case) have made it harder for these companies to rely on fair use, especially when their use competes with the original creators' commercial interests.
The Impact on the Market: The courts have increasingly focused on whether AI tools like Perplexity harm the market for original content by offering free alternatives. This is a key factor in determining if the use is "fair." For example, if Perplexity provides news summaries based on unlicensed sources, it could take away potential revenue from news companies.
The Burden of Proof: Perplexity would need to prove that its use of the content doesn’t harm the original market or compete directly with the creators’ works, which is a tough legal challenge.
In short, the post argues that Perplexity's use of RAG exposes it to significant legal risks, and recent court rulings suggest that defenses like "fair use" may not hold up in these cases. This could lead to tech companies like Perplexity needing to pay for content licenses.
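To make the RAG mechanism described above more concrete, here is a minimal sketch in Python of a retrieve-then-generate loop. It is a toy under stated assumptions, not a description of how Perplexity actually works: the corpus, the source names, the keyword-overlap ranking, and the `generate` placeholder are all hypothetical. What it illustrates is the step the post calls Path #2: retrieved source text is copied verbatim into the prompt, whether or not the user ever sees it.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str
    text: str


# Hypothetical stand-in for an index of external articles (invented for illustration).
CORPUS = [
    Document("example-news-a", "The central bank raised interest rates by a quarter point on Tuesday."),
    Document("example-news-b", "A new stadium opened downtown after three years of construction."),
]


def tokenize(text):
    """Lowercase the text and split it into a set of words, dropping basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query, corpus, k=1):
    """Rank documents by shared words with the query (a toy substitute for real search)."""
    query_words = tokenize(query)
    ranked = sorted(corpus, key=lambda d: len(query_words & tokenize(d.text)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Copy the retrieved passages verbatim into the prompt sent to the model."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Placeholder for a language model call; a real system would call a model here."""
    return f"(model answer conditioned on {len(prompt)} characters of prompt)"


def answer(query):
    """Run the full retrieve-then-generate loop and return both the prompt and the output."""
    passages = retrieve(query, CORPUS)
    prompt = build_prompt(query, passages)
    return prompt, generate(prompt)


if __name__ == "__main__":
    prompt, output = answer("Did the central bank raise interest rates?")
    print(prompt)
    print(output)
```

In this sketch, the three paths map onto three different places: the data a model was trained on (not modeled here), the intermediate copy made inside `build_prompt`, and the text returned by `generate`. Whether any of those copies is infringing is the legal question the post addresses; the code only shows where they occur.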
Question 2 of 2 for ChatGPT-4o: Do you agree with these views? Could anything else be added to make the arguments stronger (if required at all)? Is any topic missing that the author should have addressed as well?
ChatGPT-4o’s answer:
The author’s argument in "Perplexity's Use of 'RAG' Opens Up 3 Paths to Copyright Infringement" is strong, particularly given the evolving legal landscape around generative AI (GenAI) and copyright law. The three "paths" to infringement, alongside the challenges AI companies face with their "fair use" defenses, are well articulated. However, to critically assess the completeness of the argument and potential gaps, let’s break it down further:
Points of Agreement:
Legal Precedents and Fair Use: The author is right to highlight how recent court decisions, such as the Warhol and Hachette rulings, limit the scope of "fair use" in ways that impact GenAI companies. The focus on market harm as a key factor in fair use cases aligns with current legal thinking.
Market Substitution: The concern that tools like Perplexity are effectively replacing or substituting the original content creators' market offerings (e.g., The Wall Street Journal) is well-founded. If an AI output competes with the original work in the marketplace, it weakens the fair use defense, as illustrated in Hachette.
The Burden of Proof: Emphasizing that AI companies must prove their use does not harm the market or infringe upon copyright is a solid point. Courts are increasingly leaning towards protecting the commercial interests of content creators.
Potential Areas for Strengthening:
Technological Complexity and Intent: The post doesn't delve deeply into how complex AI processes (like RAG) differ from traditional forms of content copying. Highlighting the technical nuance of AI’s operation—such as the distinction between training data, real-time retrieval, and the algorithms that process the outputs—could strengthen the argument. It’s not just about copying content, but about how AI interacts with the data, which could be a point of legal ambiguity or defense.
Transformative Use Arguments: The article briefly dismisses transformative use, but some might argue that RAG-generated content is “transformative” in that it serves a different purpose than the original news articles. Strengthening the counterargument to this claim (e.g., by arguing that Perplexity’s outputs still serve the same purpose—informing or summarizing news—despite the different method) would make the case more robust.
Industry Norms and Practices: It could be useful to discuss how other industries are handling AI licensing and content usage to give broader context. Are there emerging industry standards or best practices that Perplexity could adopt to avoid legal issues? This would add a practical dimension to the theoretical legal debate.
Ethics of Content Scraping: While the post focuses heavily on legal arguments, the ethics of using scraped, unlicensed content without compensating the creators could have been explored further. Many GenAI companies argue that democratizing access to information is a public good, but this needs to be balanced against the rights of original creators and their need for compensation.
Potential Missing Topics:
Data Privacy: The focus on copyright is clear, but another issue worth addressing is whether Perplexity’s RAG system complies with data privacy regulations. While this doesn’t directly relate to copyright, the intersection of AI, copyright, and privacy is becoming an important topic in content retrieval, especially in the EU with laws like GDPR.
Impact on Smaller Content Creators: The article mentions large players like The New York Times, but it would be valuable to address the potential impact on smaller content creators. Perplexity might scrape content from blogs, independent news sites, and niche sources that don’t have the legal or financial power to push back. The broader implications for the ecosystem of digital content creation are worth noting.
Potential Solutions: The article critiques Perplexity’s current model, but doesn’t offer much in the way of solutions. Suggesting ways that GenAI companies could work with content creators (e.g., through licensing agreements, subscription models, or revenue sharing) would make the argument more constructive and solution-oriented.
Final Assessment:
The article is strong in its focus on legal precedent and the challenges of fair use for GenAI, but it could be enhanced by:
Diving deeper into technological nuances and the intent behind AI operations.
Expanding on ethical concerns and practical solutions.
Considering the impact on smaller players and broader ecosystem effects.
It covers the legal aspects thoroughly, but adding these angles would give a more holistic view of the risks and responsibilities in the evolving AI landscape.