Cybersecurity

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Sub-category
Risk Domain

Using AI systems to gain a personal advantage over others, such as through cheating, fraud, scams, blackmail, or targeted manipulation of beliefs or behavior. Examples include AI-facilitated plagiarism in research or education, impersonating a trusted or fake individual for illegitimate financial benefit, or creating humiliating or sexual imagery.

"LLMs may exacerbate cybersecurity risks in various ways (Newman, 2024). Firstly, LLMs may significantly amplify the effectiveness of deceptive operations aimed at tricking people into disclosing sensitive information or granting adversary access to critical resources. For example, LLMs might prove highly effective at crafting personalized phishing emails or messages at scale that may be harder for an average user to recognize as phishing attempts (Karanjai, 2022; Hazell, 2023). In addition to being directly harmful to the targeted individual, such ‘social engineering’ attacks are often the base of larger hacking operations (Plachkinova and Maurer, 2018; Salahdine and Kaabouch, 2019)."(p. 85)

Supporting Evidence (3)

1. "Coding capabilities of LLMs could be used for malicious purposes (Checkpoint Research, 2022). This may either be done through using off-the-shelf LLMs or through training or fine-tuning LLMs specifically for this purpose (Checkpoint Research, 2023; Erzberger, 2023). This may include using code-inspection capabilities of LLMs to find software vulnerabilities, and code-writing capabilities of LLMs to create novel malware and exploits."(p. 85)

2. "Cybersecurity risks can also increase due to the collective resources available to multi-agent systems powered by LLMs. Such systems could represent a risk similar in scale to botnets, with a large number of coordinated agents working together (Sun et al., 2023). However, the generative capabilities and possible emergent abilities of these systems at scale extend the potential impact beyond traditional Distributed Denial of Service (DDoS) attacks. For instance, multi-agent systems could be used for targeted vulnerability analysis and exploitation over a range of systems in a coordinated, fault tolerant manner (Hendrycks et al., 2021a). This could facilitate vulnerability chaining across systems and networks, enabling multi-stage attacks that are inherently more difficult to mitigate (Roytman and Bellis, 2023)."(p. 86)

3. "Lastly, there is increasing evidence that LLMs can be used to craft jailbreaks which can be used on other LLMs and other instances of the same LLM (Chao et al., 2023; Mehrabi et al., 2023; Shah et al., 2023). This poses a risk to the security of LLMs and may result in a dangerous dynamic where improvement in a (closed-source) LLM’s capabilities means it can generate more sophisticated jailbreaks (which could be used to jailbreak another instance of the same LLM)."(p. 86)

Part of: Vulnerability to Poisoning and Backdoors
