Cybersecurity
"This section catalogs the risk sources and mitigation measures related to cyber- security. These items may be related to security in terms of AI models being accessible only to the intended users, as well as AI models having appropriate access to the external world during both model development and deployment stages."(p. 42)
Sub-categories (4)
Interconnectivity with malicious external tools
"The growing integration and interconnectivity with external tools and plugins increase the risk of exposure to malicious external inputs. This interconnectivity makes it easier for external tools to introduce harmful content [220]."
2.2 AI system security vulnerabilities and attacks
Unintended outbound communication by AI systems
"AI systems that have the broad ability to connect to a network to obtain infor- mation could also end up sending data outbound in ways that neither providers, deployers, or end users intended [138]. This can happen if there is no whitelisting of communication channels (such as network connections or allowed protocols). In general, this can occur if the deployment of the AI system violates the prin- ciple of least privilege. Such outbound communication may lead to leakage of confidential data, or the AI system performing unwanted actions like sending emails or ordering goods on the internet."
7.2 AI possessing dangerous capabilities
AI system bypassing a sandbox environment
"An AI system may have the ability to bypass a sandboxed environment in which it is trained or evaluated."
7.2 AI possessing dangerous capabilities
Model weight leak
"Model weights or access to them can be leaked when initial access is granted only to a select group of individuals, such as institutional researchers [209]. This risk can increase as more people gain access, and identifying the source of the leak becomes more difficult. The availability of leaked model weights makes various attacks on systems that use the leaked AI model easier to implement, such as finding adversarial examples, elicitation of dangerous capabilities, and extraction of confidential information present in the training data. The avail- ability of model weights might also enable the misuse of the AI system using the leaked model to produce harmful or illegal content [67]."
2.2 AI system security vulnerabilities and attacks
Other risks from Gipiškis 2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Hate and toxicity
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Sexual content
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Child harm
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Self-harm
1.2 Exposure to toxic content