GitHub Copilot, an AI code generation tool trained on public repositories, reproduced copyrighted code verbatim, including comments and license information, raising concerns about copyright infringement and license non-compliance.
In June 2021, GitHub launched Copilot, an AI pair programming tool built on OpenAI Codex and trained on billions of lines of code from public repositories. The system generates code suggestions, ranging from single lines to entire functions, based on user input and context. Several incidents emerged in which Copilot reproduced substantial blocks of code verbatim from open-source projects, including original comments and copyright notices, without proper attribution or license compliance. In one notable example, Copilot suggested code that retained the original license header but applied an incorrect license. Legal experts debated whether this constituted copyright infringement; GitHub's CEO argued that training on public data constitutes fair use and that the output belongs to the operator. Concerns also arose about potential violations of copyleft licenses such as the GPL, which require derivative works to carry the same license. A class-action lawsuit was filed in November 2022 against GitHub, Microsoft, and OpenAI, alleging violations of the attribution requirements in 11 popular open-source licenses, including the MIT, GPL, and Apache licenses, as well as violations of GitHub's own terms of service, DMCA provisions, and privacy laws. The suit was brought on behalf of millions of potentially affected GitHub users whose code was used to train the system without proper attribution.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, assist identity theft, or cause loss of confidential intellectual property.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed