OpenAI, Google, and Meta Alleged to Have Overstepped Legal Boundaries for Training AI

Apr 6, 2024 | 1 report | Severity: Severe | Tool | High confidence

OpenAI, Google, and Meta used copyrighted content including YouTube videos, books, and other materials without permission to train their AI models, potentially violating intellectual property rights and platform terms of service.

In late 2021, OpenAI faced a shortage of data for training its next AI system and created Whisper, a speech recognition tool, which it used to transcribe more than one million hours of YouTube videos, even though YouTube prohibits using its videos for independent applications. The transcribed text was fed into GPT-4, which became the basis for ChatGPT. Google similarly transcribed YouTube videos for its AI models, potentially violating creators' copyrights, and in July 2023 broadened its terms of service to allow the use of publicly available Google Docs and other content in AI products. Meta executives discussed buying the publishing house Simon & Schuster and using copyrighted material without permission; one lawyer warned of ethical concerns.

The companies justified these practices as fair use under copyright law, citing the precedent of Authors Guild v. Google. The incident reflects an industry-wide data shortage: researchers predict that high-quality internet data could be exhausted by 2026. Content creators and publishers have filed lawsuits; The New York Times, for example, sued OpenAI and Microsoft for using copyrighted articles without permission.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

2 Privacy & Security

2.1 Compromise of privacy by leaking or correctly inferring sensitive information

AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise user expectation of privacy, assist identity theft, or cause loss of confidential intellectual property.

Causal Classification

Entity

Human

Due to a decision or action made by humans

Intent

Intentional

Due to an expected outcome from pursuing a goal

Timing

Pre-deployment

Occurring before the AI is deployed

Harm Severity Assessment

Highest Score: 4: Severe (Loss of Privacy, inferred)

National Security Assessment

Overall Score

Stakeholders

: OpenAI, Meta, Google
: OpenAI, Meta, Google
: YouTube Creators, General Public, Content Creators

AI System Classification

: Audio Localization
: Content Generation
: Tool
: 4 Minimal or No Risk
: 1

Population Impact

: 10,000
: 1,000,000
