
Transferable adversarial attacks from open to closed-source models

Sub-category
Risk Domain

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"In some cases, an adversarial attack developed for an open-weights and open-source model (where the weights and architecture are known - a “white box” attack) can be transferable to closed-source models, despite the defenses put in place by the closed-source model provider (such as structured access). These adversarial attacks can be generated automatically [238]." (p. 27)
