
GPT-4o: Suno's approach of using vast amounts of data available on the internet, including copyrighted material, to train their models is indicative of a common practice in the AI industry.

GPT-4o: AI developers often incorporate copyrighted material in their datasets without explicit permission from rights holders.

ChatGPT-4o’s answer:
In the articles "AI Music Generator Suno Admits It Was Trained on ‘Essentially All Music Files on the Internet’" and "A.I. Music Startup Suno Fires Back at Record Labels, Admits Training on Copyrighted Music," the following key admissions are made by Suno:

  1. Scope of Training Data:

    • Suno admits that its AI music generator was trained on "essentially all music files of reasonable quality that are accessible on the open internet". This includes tens of millions of recordings.

  2. Inclusion of Copyrighted Material:

    • Suno acknowledges that its training data "presumably included recordings whose rights are owned by the Plaintiffs," the Plaintiffs being the major record labels involved in the lawsuit. This admission directly addresses the lawsuit's claim that Suno used copyrighted material without permission.

  3. Legal Defense - Fair Use:

    • Suno is not contesting the use of copyrighted songs in its training data. Instead, the company is making a legal argument that such use falls under the fair use doctrine. Suno asserts that training its AI model on copyrighted music is legal under fair use because it involves creating new, non-infringing products from the training data.

  4. Position Against Record Labels:

    • Suno argues that the lawsuit from the major record labels is an attempt to suppress competition and maintain their market share. It claims that its AI tool is meant to empower users, including musicians and everyday people, to create original music.

These admissions and arguments were made in response to a lawsuit filed by major record labels accusing Suno of copyright infringement by using their music to train its AI model. Suno’s defense centers on the transformative nature of their AI-generated music and the assertion of fair use.


Question 2 of 2 for ChatGPT-4o: Can we conclude that this has been the way many AI makers have been going about training of AI models?

ChatGPT-4o’s answer:
Yes, the admissions from Suno provide a clear example of a broader trend among many AI developers regarding the training of AI models. The details from the articles suggest that Suno's approach of using vast amounts of data available on the internet, including copyrighted material, to train their models is indicative of a common practice in the AI industry. Here are the key points that support this conclusion:

  1. Widespread Data Scraping:

    • Suno admitted to training its model on "essentially all music files of reasonable quality that are accessible on the open internet," which implies a large-scale data scraping practice. This aligns with reports of other AI companies using similar methods to gather extensive datasets from publicly available sources.

  2. Inclusion of Copyrighted Material:

    • The admission that the training data included copyrighted recordings suggests that AI developers often incorporate copyrighted material in their datasets without explicit permission from rights holders. This has been a contentious issue across various AI applications, including image, text, and video generation.

  3. Legal Justifications - Fair Use:

    • Suno's defense based on the fair use doctrine reflects a broader legal strategy
