BackCollusion

Collusion

Category

Risk Domain

7AI System Safety, Failures & Limitations

Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.

"Collusion has long been a topic of intense study in economics, law, and politics, among other disciplines. While there is no universal definition of collusion, it generally refers to secretive cooperation between two or more parties at the expense of one or more other parties. Most classic examples of collusion – such as firms working together to set supra-competitive prices at the expense of consumers – also tend to be not only secretive but in violation of some law, rule, or ethical standard. Distinctions are also commonly made between explicit and tacit collusion (Rees, 1993), depending on whether the colluding parties communicate with each other."(p. 17)

Entity— Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent— Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing— Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Supporting Evidence (1)

"AI collusion could differ from classic definitions of collusion in a number of ways. First, for more basic AI systems (such as algorithmic trading agents) it may be hard to ascribe any notion of intent to collude. Relatedly, there may be forms of AI collusion that are not currently ruled unlawful, because existing legislation may not (yet) apply to the case of AI collusion (Beneke & Mackenrodt, 2019; Harrington, 2019). Second, the distinction between explicit and tacit collusion may break down when it comes to agents whose communication can take very different forms to our own.16 Third, typical definitions of collusion focus on mixed-motive settings where, while selfish agents are incentivised to compete, they also stand to gain (at the expense of some third party) if they can overcome these competitive pressures"(p. 17)

Sub-categories (2)

Markets

"Markets. The quintessential case of collusion in mixed-motive settings is markets, in which efficiency results from competition, not cooperation. While this is not a new problem, collusion between AI systems is especially concerning since they may operate inscrutably due to the speed, scale, complexity, or subtlety of their actions.17 Warnings of this possibility have come from technologists, economists, and legal scholars (Beneke & Mackenrodt, 2019; Brown & MacKay, 2023; Ezrachi & Stucke, 2017; Harrington, 2019; Mehra, 2016). Importantly, AI systems can collude even when collusion is not intended by their developers, since they might learn that colluding is a profitable strategy."

7.6 Multi-agent risks

AI systemIntentionalPost-deployment

Steganography

"Steganography. In the near future we will likely see LLMs communicating with each other to jointly accomplish tasks. To try to prevent collusion, we could monitor and constrain their communication (e.g., to be in natural language). However, models might secretly learn to communicate by concealing messages within other, non-secret text. Recent work on steganography using ML has demonstrated that this concern is well-founded (Hu et al., 2018; Mathew et al., 2024; Roger & Greenblatt, 2023; Schroeder de Witt et al., 2023b; Yang et al., 2019, see also Case Study 5). Secret communication could also occur via text compression (OpenAI, 2023c), or via the emergence of communication between agents where the symbols used by agents lack any predefined meanings or usage guidelines or are otherwise uninterpretable to humans (Foerster et al., 2016; Lazaridou & Baroni, 2020; Sukhbaatar et al., 2016)."

7.6 Multi-agent risks

AI systemIntentionalPost-deployment