Opaque Data Collection

Generating Harms - Generative AI's impact and paths forwards

Electronic Privacy Information Center (2023)

Category
Risk Domain

AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise user expectation of privacy, assist identity theft, or cause loss of confidential intellectual property.

"When companies scrape personal information and use it to create generative AI tools, they undermine consumers' control of their personal information by using the information for a purpose for which the consumer did not consent." (p. 24)

Sub-categories (3)

Scraping data for training

"When companies scrape personal information and use it to create generative AI tools, they undermine consumers’ control of their personal information by using the information for a purpose for which the consumer did not consent. The individual may not have even imagined their data could be used in the way the company intends when the person posted it online. Individual storing or hosting of scraped personal data may not always be harmful in a vacuum, but there are many risks. Multiple data sets can be combined in ways that cause harm: information that is not sensitive when spread across different databases can be extremely revealing when collected in a single place, and it can be used to make inferences about a person or population. And because scraping makes a copy of someone’s data as it existed at a specific time, the company also takes away the individual’s ability to alter or remove the information from the public sphere."

2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Human · Intentional · Pre-deployment

Generative AI User Data

Many generative AI tools require users to log in for access, and many retain user information, including contact details, IP addresses, and all of the inputs and outputs, or “conversations,” users have within the app. These practices raise a consent issue because generative AI companies use this data to further train their models, so the “free” product comes at the cost of the user data that trains the tools. This dovetails with security, as discussed in the next section, but best practices would include not requiring users to sign in to use the tool and not retaining or using user-generated content after the user’s active session ends.

2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Human · Unintentional · Post-deployment

Generative AI Outputs

Generative AI tools may inadvertently share personal information about a person or a person’s business, or may reproduce an element of a person from a photo. Notably, some companies, concerned that their trade secrets could be absorbed into the models through employee prompts, have explicitly banned their employees from using these tools.

2.1 Compromise of privacy by leaking or correctly inferring sensitive information
AI system · Unintentional · Post-deployment

Other risks from Electronic Privacy Information Center (2023) (21)