Opaque Data Collection
Generating Harms: Generative AI's Impact and Paths Forward
Electronic Privacy Information Center (2023)
AI systems that memorize and leak sensitive personal data, or that infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, facilitate identity theft, or cause the loss of confidential intellectual property.
"When companies scrape personal information and use it to create generative AI tools, they undermine consumers' control of their personal information by using the information for a purpose for which the consumer did not consent." (p. 24)
Sub-categories (3)
Scraping to train data
"When companies scrape personal information and use it to create generative AI tools, they undermine consumers’ control of their personal information by using the information for a purpose for which the consumer did not consent. The individual may not have even imagined their data could be used in the way the company intends when the person posted it online. Individual storing or hosting of scraped personal data may not always be harmful in a vacuum, but there are many risks. Multiple data sets can be combined in ways that cause harm: information that is not sensitive when spread across different databases can be extremely revealing when collected in a single place, and it can be used to make inferences about a person or population. And because scraping makes a copy of someone’s data as it existed at a specific time, the company also takes away the individual’s ability to alter or remove the information from the public sphere."
2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Generative AI User Data
Many generative AI tools require users to log in for access, and many retain user information, including contact information, IP address, and all of the inputs and outputs, or “conversations,” that users have within the app. These practices raise a consent issue because generative AI tools use this data to further train their models, so the “free” product comes at the cost of users' data. This dovetails with security, as discussed in the next section, but best practices would include not requiring users to sign in to use the tool and not retaining or using user-generated content for any period after the user's active session ends.
2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Generative AI Outputs
Generative AI tools may inadvertently share personal information about a person or a person's business, or may reproduce an element of a person from a photo. Notably, some companies, concerned that their trade secrets could be integrated into the model through employees' inputs, have explicitly banned their employees from using these tools.
2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Other risks from Electronic Privacy Information Center (2023) (21)
Information Manipulation: 4.1 Disinformation, surveillance, and influence at scale
Information Manipulation > Scams: 4.3 Fraud, scams, and targeted manipulation
Information Manipulation > Disinformation: 4.1 Disinformation, surveillance, and influence at scale
Information Manipulation > Misinformation: 3.1 False or misleading information
Information Manipulation > Security: 4.2 Cyberattacks, weapon development or use, and mass harm
Information Manipulation > Clickbait and feeding the surveillance advertising ecosystem: 3.2 Pollution of information ecosystem and loss of consensus reality