Home/Risks/Gipiškis2024/Transferable adversarial attacks from open to closed-source models

Transferable adversarial attacks from open to closed-source models

Risk Domain

2 Privacy & Security

Sub-category

2.2 AI system security vulnerabilities and attacks

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"In some cases, an adversarial attack developed for an open-weights and open-source model (where the weights and architecture are known - a “white box” attack) can be transferable to closed-source models, despite the defenses put in place by the closed-source model provider (such as structured access). These adversarial attacks can be generated automatically [238]." (p. 27)
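
The transfer effect described in the quote can be illustrated with a deliberately small toy. The sketch below is not the method of the cited work: it assumes two logistic-regression classifiers trained on separate samples of the same synthetic 2-D data, with one playing the open-weights "surrogate" and the other standing in for the closed model. A single fast gradient sign step (FGSM, swapped in here as a simple, well-known attack) is computed from the surrogate's weights only, and the perturbed inputs are then scored by the target, which is accessed solely through its predictions. All names (make_data, train, fgsm, run_demo) and the step size eps=1.5 are hypothetical choices made for illustration.

```python
import math
import random


def make_data(n, seed):
    """Two Gaussian blobs in 2-D: class 0 around (-1, -1), class 1 around (+1, +1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)], y))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, epochs=50, lr=0.1):
    """Fit logistic regression with plain SGD; returns (weights, bias)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y  # dLoss/dLogit
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b


def predict(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


def fgsm(surrogate, x, y, eps):
    """One FGSM step that reads only the surrogate's weights (white-box access).

    For logistic regression the log-loss gradient w.r.t. x is (p - y) * w.
    Because p lies in (0, 1), its sign is +sign(w) for y = 0 and -sign(w) for y = 1.
    """
    w, _ = surrogate
    direction = 1.0 if y == 0 else -1.0
    return [xi + eps * direction * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w)]


def run_demo(eps=1.5):
    surrogate = train(make_data(200, seed=0))  # attacker's open-weights model
    target = train(make_data(200, seed=1))     # stand-in for the closed model
    test = make_data(500, seed=2)
    # The attack never reads the target's weights; it only reuses surrogate gradients.
    adversarial = [(fgsm(surrogate, x, y, eps), y) for x, y in test]
    return accuracy(target, test), accuracy(target, adversarial)


clean_acc, adv_acc = run_demo()
print(f"target accuracy on clean inputs:       {clean_acc:.2f}")
print(f"target accuracy on transferred inputs: {adv_acc:.2f}")
```

The two models end up with similar decision boundaries because they learn the same task, which is the intuition usually given for why perturbations transfer. The step size is deliberately large for this toy, and real closed-source systems differ in scale, input modality, and defenses, so the sketch shows only the mechanism, not its practical success rate.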

Entity: Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent: Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal