Scraping to train data
Generating Harms: Generative AI's Impact & Paths Forward
Electronic Privacy Information Center (2023)
AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, facilitate identity theft, or cause the loss of confidential intellectual property.
"When companies scrape personal information and use it to create generative AI tools, they undermine consumers’ control of their personal information by using the information for a purpose for which the consumer did not consent. The individual may not have even imagined their data could be used in the way the company intends when the person posted it online. Individual storing or hosting of scraped personal data may not always be harmful in a vacuum, but there are many risks. Multiple data sets can be combined in ways that cause harm: information that is not sensitive when spread across different databases can be extremely revealing when collected in a single place, and it can be used to make inferences about a person or population. And because scraping makes a copy of someone’s data as it existed at a specific time, the company also takes away the individual’s ability to alter or remove the information from the public sphere." (p. 25)
Part of Opaque Data Collection
Other risks from Electronic Privacy Information Center (2023) (21)
Information Manipulation
4.1 Disinformation, surveillance, and influence at scale
Information Manipulation > Scams
4.3 Fraud, scams, and targeted manipulation
Information Manipulation > Disinformation
4.1 Disinformation, surveillance, and influence at scale
Information Manipulation > Misinformation
3.1 False or misleading information
Information Manipulation > Security
4.2 Cyberattacks, weapon development or use, and mass harm
Information Manipulation > Clickbait and feeding the surveillance advertising ecosystem
3.2 Pollution of information ecosystem and loss of consensus reality