
Commitment and Trust

Category: Risk Domain

Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.

"Commitment and trust (Section 3.5): difficulties in forming credible commitments, trust, or reputation can prevent mutual gains in AI-AI and human-AI interactions;"(p. 7)

Supporting Evidence (2)

1. "In settings that require joint action in order to obtain a better outcome, inefficiencies can result whenever one or more actors cannot be trusted (perhaps due to strategic incentives, or due to their incompetence) to carry out their part of the plan. These inefficiencies can be reduced via credible commitments made by the untrusted parties. Unfortunately, the ability to make credible commitments is ‘dual-use’ and can therefore lead to new risks."

2. "An actor makes a commitment when they bind themselves to a course of action, such that reneging on that action would either be impossible or result in significant costs to themselves. A commitment is credible when other actors believe that the actor making the commitment will follow through with the actions they claim to have committed to. Credible commitments are useful in scenarios where trust is essential but hard to establish, such as in international treaties, economic policies, and contractual agreements."

Sub-categories (3)

Inefficient Outcomes

"Inefficient Outcomes. Without careful planning and the appropriate safeguards, we may soon be entering a world overrun by increasingly competent and autonomous software agents, able to act with little restriction. The abilities of these agents to persuade, deceive, and obfuscate their activities, as well as the fact they can be deployed remotely and easily created or destroyed by their deployer, means that by default they may garner little trust (from humans or from other agents). Such a world may end up being rife with economic inefficiencies (Krier, 2023; Schmitz, 2001), political problems (Csernatoni, 2024; Kreps & Kriner, 2023), and other damaging social effects (Gabriel et al., 2024). Even if it is possible to provide assurances around the day-to-day performance of most AI agents, in high-stakes situations there may be extreme pressures for agents to defect against others, making trust harder to establish, and potentially leading to conflict (Fearon, 1995; Powell, 2006, see also Section 2.2).42"

7.6 Multi-agent risks
AI system · Unintentional · Post-deployment

Threats and Extortion

"Threats and Extortion. A natural solution to problems of trust is to provide some kind of com- mitment ability to AI agents, which can be used to bind them to more cooperative courses of action. Unfortunately, the ability to make credible commitments may come with the ability to make credible threats, which facilitate extortion and could incentivize brinkmanship (see Section 2.2)."

7.6 Multi-agent risks
AI system · Intentional · Post-deployment
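
The dual-use point can be seen in the same toy setting (again an illustrative sketch with assumed payoffs, not an example from the paper): a threat that a rational agent would never carry out, because execution is costly to itself, becomes profitable extortion once the agent can bind itself to follow through.

# Illustrative sketch (assumed payoffs): the same commitment machinery
# that enables cooperation also makes a costly threat credible.

def solve_extortion_game(threat_committed: bool,
                         demand: float = 5.0,
                         harm: float = 10.0,
                         execution_cost: float = 1.0):
    """Victim decides whether to pay `demand` under a threat of `harm`.

    Executing the threat costs the threatener `execution_cost`, so absent
    commitment a rational threatener never follows through and the victim
    can safely refuse. A binding commitment removes that escape hatch.
    """
    if threat_committed:
        # Refusal now triggers the threat automatically.
        refuse_payoff = -harm
    else:
        # Threatener won't pay execution_cost out of spite; threat is empty.
        refuse_payoff = 0.0

    return "pay" if -demand > refuse_payoff else "refuse"

print(solve_extortion_game(threat_committed=False))  # 'refuse' -- empty threat
print(solve_extortion_game(threat_committed=True))   # 'pay' -- extortion succeeds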

Rigidity and Mistaken Commitments

"Rigidity and Mistaken Commitments. Even when it is desirable to be able to make threats in order to deter socially harmful behaviour, doing so using AI agents effectively removes the human from the loop, which could prove disastrous in high-stakes contexts (e.g., a false positive in a nuclear sub- marine’s warning system; see also Case Study 11), or when irresponsible actors are enabled in making disproportionate or mistaken commitments."

7.6 Multi-agent risks
Human · Unintentional · Post-deployment
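
A back-of-the-envelope calculation (all numbers assumed, for illustration only) shows why wiring a commitment directly to a noisy warning system is dangerous: every false alarm triggers the committed action, whereas a human in the loop can screen out most false positives before execution.

# Back-of-the-envelope sketch (all numbers assumed): the cost of removing
# the human from the loop when a commitment is bound to a noisy sensor.

def expected_mistaken_executions(alarms_per_year: float,
                                 false_alarm_fraction: float,
                                 human_catch_rate: float) -> float:
    """Expected number of commitments executed on false alarms per year."""
    false_alarms = alarms_per_year * false_alarm_fraction
    return false_alarms * (1.0 - human_catch_rate)

# Fully automated: every false alarm triggers the committed action.
print(expected_mistaken_executions(100, 0.05, human_catch_rate=0.0))   # 5.0
# Human review intercepting 99% of false alarms before execution.
print(expected_mistaken_executions(100, 0.05, human_catch_rate=0.99))  # 0.05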
