Legal challenges
AI systems can memorize and leak sensitive personal data, or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, facilitate identity theft, or cause loss of confidential intellectual property.
"Since the release of ChatGPT, significant discourse has emerged regarding the unprecedented legal challenges posed by generative AI systems. These challenges primarily involve protecting privacy and personal data, as well as preserving copyrights. The former encompasses safeguarding personal information, while the latter includes issues related to the use of copyrighted content for training AI models and determining the legal status of works produced by AI systems."(p. 96)
Sub-categories (4)
Privacy and data collection concerns (collecting personal information or personally identifiable information)
"Generative AI developers train their models with extensive datasets often gathered through online web scraping of websites that may include personal data or personally identifiable information (PII). For most generative AI applications, such as initial model training, the primary concerns are the quantity, variety, and quality of the data, not whether they include personally identifiable information. However, some web-scraped datasets may inadvertently include personal data. Additionally, when downstream developers integrate generative AI into their products or services by fine-tuning a pre-trained model, they often use their own in-house data, which may include personal information."
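The inadvertent inclusion of PII in web-scraped corpora described above can be illustrated with a minimal, hypothetical scan for common PII patterns before training. The regexes, category names, and sample text are illustrative assumptions, not part of G'sell (2024); production pipelines use far more robust detectors.

```python
# Hypothetical sketch: naive scan of web-scraped text for PII patterns.
# Patterns and sample text are illustrative assumptions only.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(text: str) -> dict[str, list[str]]:
    """Return every match per PII category found in the scraped text."""
    return {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}

scraped = "Contact jane.doe@example.com, SSN 123-45-6789, for details."
hits = scan_for_pii(scraped)
```

A scan like this only flags well-formed identifiers; free-text personal details (names, addresses) evade simple pattern matching, which is why scraped datasets still end up containing personal data.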
2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Privacy and data collection concerns (data protection concerns)
"The incorporation of personal data within training datasets raises numerous concerns. The primary issue is that personal data may be incorporated without the knowledge or consent of the individuals concerned, even though the data may include names, identification numbers, Social Security numbers, or other personal information. Another particularly difficult problem is related to the fact that complex models may “memorize” (i.e., store) specific threads of training data and regurgitate them when responding to a prompt. This data memorization can directly lead to leakage of personal data. Even if generative AI models do not memorize or leak personal data, they make it possible to recognize patterns or information structures that could enable malicious users to uncover personal details."
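The memorization-and-regurgitation mechanism described in the quote can be sketched with a simple verbatim-overlap check: flag word n-grams of a model's output that appear unchanged in the training data. All names (`find_memorized_spans`, the sample strings, the span length `n`) are illustrative assumptions, not a method from the source.

```python
# Hypothetical sketch: flag verbatim training-data fragments in model output.
# Function names, sample strings, and the n-gram length are assumptions.

def ngrams(text: str, n: int) -> set[str]:
    """All word n-grams of a text, as space-joined strings."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def find_memorized_spans(model_output: str, training_texts: list[str], n: int = 5) -> list[str]:
    """Return word n-grams of the output that appear verbatim in training data."""
    train_grams: set[str] = set()
    for doc in training_texts:
        train_grams |= ngrams(doc, n)
    return sorted(ngrams(model_output, n) & train_grams)

training_texts = ["Jane Doe's Social Security number is 078-05-1120 on file"]
output = "The record shows Jane Doe's Social Security number is 078-05-1120 today"
leaks = find_memorized_spans(output, training_texts)
```

Even this crude check surfaces the leaked identifier; real extraction attacks and audits use likelihood-based tests rather than exact string matching, but the underlying failure mode is the one the quote describes.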
2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Copyright challenges (training models using copyrighted output)
"Generative AI companies are regularly accused of violating copyright law by training AI models on copyrighted works without gaining permission or paying compensation to the copyright owners. In fact, a substantial number of copyrighted documents and books have been incorporated into the training datasets of generative AI models."
6.3 Economic and cultural devaluation of human effort
Copyright challenges (copyright-infringing output)
"Even though models generally create new outputs, it is possible that the content produced by a generative AI tool—such as an image, or even computer code— could turn out to be almost identical to that used in the training data. Given that generative AI models tend to memorize fragments of their training data, they might reproduce these fragments, potentially leading to charges of copyright infringement."
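The near-identical-output risk in this quote can be illustrated with a hypothetical similarity screen over generated artifacts, here using the standard library's `difflib.SequenceMatcher`. The threshold, sample code snippets, and function name are illustrative assumptions, not a method from the source.

```python
# Hypothetical sketch: score how close a generated artifact is to each
# training item. Threshold and sample snippets are illustrative assumptions.
from difflib import SequenceMatcher

def near_duplicates(generated: str, training_items: list[str], threshold: float = 0.9):
    """Return (similarity, item) pairs whose similarity meets the threshold."""
    hits = []
    for item in training_items:
        ratio = SequenceMatcher(None, generated, item).ratio()
        if ratio >= threshold:
            hits.append((round(ratio, 3), item))
    return sorted(hits, reverse=True)

training_items = [
    "def add(a, b):\n    return a + b",
    "def mul(a, b):\n    return a * b",
]
generated = "def add(a, b):\n    return a + b"
flags = near_duplicates(generated, training_items)
```

A character-level ratio like this catches verbatim and lightly edited reproductions; it says nothing about the legal question of substantial similarity, which the quote notes is where infringement charges arise.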
6.3 Economic and cultural devaluation of human effort
Other risks from G'sell (2024) (33)
Technical and operational risks
7.3 Lack of capability or robustness
Technical and operational risks > Technical vulnerabilities (Robustness - unexpected behaviour)
7.3 Lack of capability or robustness
Technical and operational risks > Technical vulnerabilities (Robustness - vulnerability to jailbreaking)
2.2 AI system security vulnerabilities and attacks
Technical and operational risks > Technical vulnerabilities (The risk of misalignment)
7.1 AI pursuing its own goals in conflict with human goals or values
Technical and operational risks > Factually incorrect content (inaccuracies and fabricated sources)
3.1 False or misleading information
Technical and operational risks > Opacity (the black box problem)
7.4 Lack of transparency or interpretability