Skip to main content
Home/Risks/Gabriel et al. (2024)/Adversarial AI: Data and Model Exfiltration Attacks

Adversarial AI: Data and Model Exfiltration Attacks

The Ethics of Advanced AI Assistants

Gabriel et al. (2024)

Sub-category
Risk Domain

AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise user expectation of privacy, assist identity theft, or cause loss of confidential intellectual property.

"Other forms of abuse can include privacy attacks that allow adversaries to exfiltrate or gain knowledge of the private training data set or other valuable assets. For example, privacy attacks such as membership inference can allow an attacker to infer the specific private medical records that were used to train a medical AI diagnosis assistant. Another risk of abuse centers around attacks that target the intellectual property of the AI assistant through model extraction and distillation attacks that exploit the tension between API access and confidentiality in ML models. Without the proper mitigations, these vulnerabilities could allow attackers to abuse access to a public-facing model API to exfiltrate sensitive intellectual property such as sensitive training data and a model’s architecture and learned parameters."(p. 74)

Part of Malicious Uses

Other risks from Gabriel et al. (2024) (69)