Adversarial AI: Data and Model Exfiltration Attacks

The Ethics of Advanced AI Assistants

Gabriel et al. (2024)

Source DOI

Sub-category

Risk Domain

2Privacy & Security

2.1Compromise of privacy by leaking or correctly inferring sensitive information

AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise user expectation of privacy, assist identity theft, or cause loss of confidential intellectual property.

"Other forms of abuse can include privacy attacks that allow adversaries to exfiltrate or gain knowledge of the private training data set or other valuable assets. For example, privacy attacks such as membership inference can allow an attacker to infer the specific private medical records that were used to train a medical AI diagnosis assistant. Another risk of abuse centers around attacks that target the intellectual property of the AI assistant through model extraction and distillation attacks that exploit the tension between API access and confidentiality in ML models. Without the proper mitigations, these vulnerabilities could allow attackers to abuse access to a public-facing model API to exfiltrate sensitive intellectual property such as sensitive training data and a model’s architecture and learned parameters."(p. 74)

Entity— Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent— Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal