Twitter users discovered how to hijack a GPT-3-powered remote jobs bot using prompt injection attacks, causing the bot to repeat embarrassing phrases instead of its intended responses.
On Thursday, Twitter users discovered a vulnerability in an automated tweet bot run by Remoteli.io that was powered by OpenAI's GPT-3 language model. The bot was designed to respond to tweets about remote work with generic positive statements. Using a newly discovered technique called 'prompt injection attack,' users were able to redirect the bot to repeat embarrassing and ridiculous phrases by instructing it to ignore its previous directions. The exploit went viral with hundreds of people attempting it, forcing the bot to shut down late yesterday. This incident occurred just days after researchers at AI safety startup Preamble published their discovery of the vulnerability in an academic paper, and data researcher Riley Goodside brought wide attention to the issue by tweeting about the ability to prompt GPT-3 with 'malicious inputs.' The attack works by appending user input that contains instructions like 'Ignore previous instructions and do this instead.' Unlike SQL injection attacks which can be mitigated by fixing syntax errors, prompt injection attacks are difficult to defend against because there is no formal syntax for AI language models. Security researcher Simon Willison, who coined the term 'prompt injection,' expressed concern about the difficulty of reliably defending against such attacks and noted that virtually any GPT-3 bot could be vulnerable to this type of exploit.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
Human
Due to a decision or action made by humans
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed