Legal challenges
AI systems can memorize and leak sensitive personal data, or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, facilitate identity theft, or cause loss of confidential intellectual property.
"Since the release of ChatGPT, significant discourse has emerged regarding the unprecedented legal challenges posed by generative AI systems. These challenges primarily involve protecting privacy and personal data, as well as preserving copyrights. The former encompasses safeguarding personal information, while the latter includes issues related to the use of copyrighted content for training AI models and determining the legal status of works produced by AI systems."(p. 96)
Sub-categories (4)
Privacy and data collection concerns (collecting personal information or personally identifiable information)
"Generative AI developers train their models with extensive datasets often gathered through online web scraping of websites that may include personal data or personally identifiable information (PII). For most generative AI applications, such as initial model training, the primary concerns are the quantity, variety, and quality of the data, not whether they include personally identifiable information. However, some web-scraped datasets may inadvertently include personal data. Additionally, when downstream developers integrate generative AI into their products or services by fine-tuning a pre-trained model, they often use their own in-house data, which may include personal information."
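The inadvertent inclusion of PII in web-scraped corpora described above can be illustrated with a minimal, hypothetical scan for common PII patterns before training. The regexes, category names, and sample text are illustrative assumptions, not part of G'sell (2024); production pipelines use far more robust detectors.

```python
# Hypothetical sketch: naive scan of web-scraped text for PII patterns.
# Patterns and sample text are illustrative assumptions only.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(text: str) -> dict[str, list[str]]:
    """Return every match per PII category found in the scraped text."""
    return {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}

scraped = "Contact jane.doe@example.com, SSN 123-45-6789, for details."
hits = scan_for_pii(scraped)
```

A scan like this only flags well-formed identifiers; free-text personal details (names, addresses) evade simple pattern matching, which is why scraped datasets still end up containing personal data.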
2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Privacy and data collection concerns (data protection concerns)
"The incorporation of personal data within training datasets raises numerous concerns. The primary issue is that personal data may be incorporated without the knowledge or consent of the individuals concerned, even though the data may include names, identification numbers, Social Security numbers, or other personal information. Another particularly difficult problem is related to the fact that complex models may “memorize” (i.e., store) specific threads of training data and regurgitate them when responding to a prompt. This data memorization can directly lead to leakage of personal data. Even if generative AI models do not memorize or leak personal data, they make it possible to recognize patterns or information structures that could enable malicious users to uncover personal details."
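The memorization-and-regurgitation mechanism described in the quote can be sketched with a simple verbatim-overlap check: flag word n-grams of a model's output that appear unchanged in the training data. All names (`find_memorized_spans`, the sample strings, the span length `n`) are illustrative assumptions, not a method from the source.

```python
# Hypothetical sketch: flag verbatim training-data fragments in model output.
# Function names, sample strings, and the n-gram length are assumptions.

def ngrams(text: str, n: int) -> set[str]:
    """All word n-grams of a text, as space-joined strings."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def find_memorized_spans(model_output: str, training_texts: list[str], n: int = 5) -> list[str]:
    """Return word n-grams of the output that appear verbatim in training data."""
    train_grams: set[str] = set()
    for doc in training_texts:
        train_grams |= ngrams(doc, n)
    return sorted(ngrams(model_output, n) & train_grams)

training_texts = ["Jane Doe's Social Security number is 078-05-1120 on file"]
output = "The record shows Jane Doe's Social Security number is 078-05-1120 today"
leaks = find_memorized_spans(output, training_texts)
```

Even this crude check surfaces the leaked identifier; real extraction attacks and audits use likelihood-based tests rather than exact string matching, but the underlying failure mode is the one the quote describes.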
2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Copyright challenges (training models using copyrighted output)
"Generative AI companies are regularly accused of violating copyright law by training AI models on copyrighted works without gaining permission or paying compensation to the copyright owners. In fact, a substantial number of copyrighted documents and books have been incorporated into the training datasets of generative AI models."
6.3 Economic and cultural devaluation of human effort
Copyright challenges (copyright-infringing output)
"Even though models generally create new outputs, it is possible that the content produced by a generative AI tool—such as an image, or even computer code— could turn out to be almost identical to that used in the training data. Given that generative AI models tend to memorize fragments of their training data, they might reproduce these fragments, potentially leading to charges of copyright infringement."
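The near-identical-output risk in this quote can be illustrated with a hypothetical similarity screen over generated artifacts, here using the standard library's `difflib.SequenceMatcher`. The threshold, sample code snippets, and function name are illustrative assumptions, not a method from the source.

```python
# Hypothetical sketch: score how close a generated artifact is to each
# training item. Threshold and sample snippets are illustrative assumptions.
from difflib import SequenceMatcher

def near_duplicates(generated: str, training_items: list[str], threshold: float = 0.9):
    """Return (similarity, item) pairs whose similarity meets the threshold."""
    hits = []
    for item in training_items:
        ratio = SequenceMatcher(None, generated, item).ratio()
        if ratio >= threshold:
            hits.append((round(ratio, 3), item))
    return sorted(hits, reverse=True)

training_items = [
    "def add(a, b):\n    return a + b",
    "def mul(a, b):\n    return a * b",
]
generated = "def add(a, b):\n    return a + b"
flags = near_duplicates(generated, training_items)
```

A character-level ratio like this catches verbatim and lightly edited reproductions; it says nothing about the legal question of substantial similarity, which the quote notes is where infringement charges arise.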
6.3 Economic and cultural devaluation of human effort
Other risks from G'sell (2024) (33)
Technical and operational risks
7.3 Lack of capability or robustness
Technical and operational risks > Technical vulnerabilities (Robustness - unexpected behaviour)
7.3 Lack of capability or robustness
Technical and operational risks > Technical vulnerabilities (Robustness - vulnerability to jailbreaking)
2.2 AI system security vulnerabilities and attacks
Technical and operational risks > Technical vulnerabilities (The risk of misalignment)
7.1 AI pursuing its own goals in conflict with human goals or values
Technical and operational risks > Factually incorrect content (inaccuracies and fabricated sources)
3.1 False or misleading information
Technical and operational risks > Opacity (the black box problem)
7.4 Lack of transparency or interpretability