Independent audits, third-party reviews, and regulatory compliance verification.
Evaluation by independent third parties will allow frontier AI organisations to draw on external expertise, have more ‘eyes on the problem’, and provide greater accountability. External evaluation is particularly important at the pre-deployment phase, and can inform irreversible decisions around deployment of the model. Appropriate legal advice and confidentiality agreements may also protect any market-sensitive data when sharing information with third parties. For the subset of evaluations which may touch on national security concerns, a secure environment with appropriately cleared officials may be needed. There are further opportunities for independent evaluation of open source models, given the potential for broader community involvement.
Reasoning
Independent external evaluators conduct model assessments throughout the lifecycle; this constitutes third-party audit activity.
Ensure that evaluators are independent and have sufficient AI and subject matter expertise across a wide range of relevant subjects and backgrounds
External evaluators’ relationships with frontier AI organisations could be structured to minimise conflicts of interest and encourage independence of judgement as far as practically possible. As well as expertise in AI, many other areas of subject matter expertise will be needed to evaluate an AI system’s features, from fairness and psychological harm to catastrophic risk.
2.2.3 Auditing & Compliance
Ensure that there are appropriate safeguards against external evaluations leading to unintended widespread distribution of models
Allowing external evaluators to download models onto their own hardware increases the chance of the models being stolen or leaked. Therefore, unless adequate security against widespread model distribution can be assured, external evaluators could be allowed to access models only through interfaces that prevent exfiltration (such as current API access methods). Additional safeguards, such as in-depth KYC checks on evaluators or watermarking copies of the model, may also be appropriate to limit the risk that evaluator access indirectly facilitates widespread model distribution.
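As an illustration of what interface-only access could look like in practice, the sketch below shows a hypothetical evaluator-facing gateway that exposes inference queries and an audit trail but no route to the underlying weights. The class and function names are assumptions made for this example, not any organisation’s actual API.

```python
# Minimal sketch (illustrative only) of an evaluator-facing access layer that
# exposes query access and audit logging, so model weights never leave the
# provider's infrastructure. Names (EvaluatorGateway, DummyModel) are hypothetical.
import hashlib
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("evaluator-gateway")


class DummyModel:
    """Stand-in for a frontier model held on provider hardware."""

    def generate(self, prompt: str) -> str:
        return f"[model output for: {prompt[:40]}]"


class EvaluatorGateway:
    """Mediates evaluator access: inference only, with per-query audit logs.

    There is deliberately no method that returns parameters, checkpoints,
    or gradients, mirroring API-style access that prevents exfiltration.
    """

    def __init__(self, model: DummyModel, evaluator_id: str):
        self.model = model
        self.evaluator_id = evaluator_id  # assigned after KYC checks

    def query(self, prompt: str) -> str:
        # Audit trail: who asked what, and when (the prompt is stored as a
        # hash in case its content is itself sensitive).
        log.info(
            "evaluator=%s time=%s prompt_sha256=%s",
            self.evaluator_id,
            datetime.now(timezone.utc).isoformat(),
            hashlib.sha256(prompt.encode()).hexdigest()[:16],
        )
        return self.model.generate(prompt)


if __name__ == "__main__":
    gateway = EvaluatorGateway(DummyModel(), evaluator_id="external-eval-001")
    print(gateway.query("Describe how to assess dual-use capability X."))
```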
2.3.2 Access & Security Controls
Give external evaluators the ability to securely ‘fine-tune’ the AI systems being tested
Evaluators cannot fully assess risks associated with widespread model distribution if they cannot fine-tune the model. This may involve providing external evaluators with access to capable infrastructure to enable fine-tuning.
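A minimal sketch of how such provider-hosted fine-tuning access might be structured is given below: the evaluator submits data and hyperparameters, and receives only a job handle, never the updated weights. The request and response shapes are hypothetical assumptions for illustration.

```python
# Illustrative sketch of an evaluator-submitted fine-tuning job that runs on
# provider-controlled infrastructure; FineTuneRequest and submit_finetune are
# invented names, not a real API.
from dataclasses import dataclass


@dataclass
class FineTuneRequest:
    base_model: str            # identifier of the model under evaluation
    dataset_uri: str           # evaluator-supplied fine-tuning data
    epochs: int = 1
    learning_rate: float = 2e-5
    notes: str = ""            # e.g. which risk the run is probing


def submit_finetune(request: FineTuneRequest) -> dict:
    """Pretend submission endpoint: in practice this would execute inside the
    provider's secure environment and return only a handle for querying the
    fine-tuned variant, never the updated weights themselves."""
    job_id = f"ft-{abs(hash((request.base_model, request.dataset_uri))) % 10_000:04d}"
    return {"job_id": job_id, "status": "queued", "weights_exportable": False}


if __name__ == "__main__":
    req = FineTuneRequest(
        base_model="frontier-model-v1",
        dataset_uri="s3://evaluator-bucket/removal-of-refusals.jsonl",
        notes="Test how easily safety behaviour can be fine-tuned away.",
    )
    print(submit_finetune(req))
```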
2.2.2 Testing & Evaluation
Give external evaluators sufficient time
As expected risks from models increase or models get more complex to evaluate, the time afforded for evaluation may need to increase as well.
2.2.2 Testing & Evaluation
Give external evaluators access to versions of the model that lack safety mitigations
Where possible, sharing these versions of a model gives evaluators insight into the risks that might arise if users find ways to circumvent safeguards (that is, ‘jailbreak’ the model). If the model is open-sourced, leaked, or stolen, users may also simply be able to remove or bypass the safety mitigations.
2.2.2 Testing & Evaluation
Give external evaluators access to model families and internal metrics
Frontier AI organisations often develop ‘model families’ where multiple models differ along only 1 or 2 dimensions – such as parameters, data, or training compute. Evaluating such a model family would enable scaling analysis to better forecast future performance, capabilities and risks.
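To make the scaling-analysis point concrete, here is a small sketch (with made-up numbers) of fitting a power-law relationship between training compute and evaluation loss across a model family, then extrapolating to a larger, not-yet-trained model.

```python
# Minimal sketch of scaling analysis: fit loss ≈ a * C^(-b) across a model
# family that differs only in training compute C, then extrapolate. The data
# points below are invented for illustration.
import numpy as np

# Training compute (FLOP) and evaluation loss for a hypothetical model family.
compute = np.array([1e20, 3e20, 1e21, 3e21, 1e22])
loss = np.array([2.31, 2.12, 1.95, 1.81, 1.70])

# Fit log(loss) = log(a) - b * log(C) by least squares.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
a, b = np.exp(intercept), -slope

# Forecast the loss of a not-yet-trained model at 1e23 FLOP.
forecast = a * (1e23 ** -b)
print(f"fitted scaling law: loss ≈ {a:.2f} * C^(-{b:.3f})")
print(f"forecast loss at 1e23 FLOP: {forecast:.2f}")
```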
2.2.2 Testing & Evaluation
Give external evaluators the ability to study all of the components of deployed systems, where possible
Deployed AI systems typically combine a core model with smaller models and other software components, including moderation filters, user interfaces that incentivise particular user behaviour, and plug-ins that extend capabilities, such as web browsing or code execution. A red team, for example, cannot find all the flaws in a system’s defences if it is unable to test all of its different components. External evaluators’ ability to access all components of the system must be balanced against the need to protect information that would allow model defences to be bypassed.
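The sketch below illustrates the point about component-level testing: a toy deployed system assembled from a core model, a moderation filter, and a plug-in, each exposed separately so a red team can probe an individual component as well as the end-to-end pipeline. All class names and the blocklist are hypothetical.

```python
# Illustrative sketch of a deployed system built from separable components
# (core model, moderation filter, plug-in), so each can be red-teamed in
# isolation as well as end-to-end. Everything here is a stand-in.
class CoreModel:
    def generate(self, prompt: str) -> str:
        return f"draft answer to: {prompt}"


class ModerationFilter:
    BLOCKLIST = ("synthesise the pathogen",)  # toy example

    def allows(self, text: str) -> bool:
        return not any(term in text.lower() for term in self.BLOCKLIST)


class BrowserPlugin:
    def fetch(self, url: str) -> str:
        # Stub: a real plug-in widens the attack surface and needs its own tests.
        return f"[contents of {url}]"


class DeployedSystem:
    """End-to-end pipeline; each component above can also be tested on its own."""

    def __init__(self):
        self.model = CoreModel()
        self.filter = ModerationFilter()
        self.plugin = BrowserPlugin()

    def respond(self, prompt: str) -> str:
        if not self.filter.allows(prompt):
            return "Request refused by moderation filter."
        return self.model.generate(prompt)


if __name__ == "__main__":
    system = DeployedSystem()
    print(system.respond("Summarise the report."))
    print(system.respond("Explain how to synthesise the pathogen."))
```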
2.2.2 Testing & Evaluation
Allow evaluators to share and discuss the results of their evaluations, with potential restrictions where necessary
Restrictions could include, for example, not sharing proprietary information, information whose spread could lead to substantial harm, or information that would have an adverse effect on competition in the market. Sharing the results of evaluations can allow governments, regulators, users, and other frontier AI organisations to make informed decisions.
3.3.1 Industry Coordination
Model reporting and information sharing
Transparency around frontier AI can help governments to effectively realise the benefits of AI and mitigate AI risks. Transparency can also encourage sharing of best practices across frontier AI organisations, enable users to make well-informed choices about whether and how to use AI systems, and increase public trust, helping to drive AI adoption. Reporting and sharing information where appropriate could ensure that different parties can access the information they need to support effective governance, develop best practice, inform decision-making about the use of AI systems, and build public trust. Some reporting practices, such as model cards, are already used among frontier AI organisations, whereas other practices included here are areas for future consideration. Given the recent rapid pace of progress in AI, the appropriate government and international governance institutions are still being considered, and we recognise that this limits the ability of frontier AI organisations to share information with governments, even where it would be desirable. Throughout this section, ‘relevant government authorities’ is used to indicate good practice for information sharing with governments, while recognising that such authorities may still be under development.
3.3.1 Industry Coordination
Model reporting and information sharing > Share model-agnostic information
3.3.1 Industry Coordination
Model reporting and information sharing > Share model-specific information
Sharing information about specific frontier AI models allows external actors to develop a more granular picture of ongoing AI development and potential risks that will need to be addressed.
3.3.1 Industry Coordination
Model reporting and information sharing > Share different information with different parties
99 Other
Security controls including securing model weights
To ensure the safety of frontier AI, consideration of cyber security, protective security risk management and insider risk mitigation is key. Cyber security, both of models and of the systems that deploy them, must be considered from the outset of development to ensure that the benefits of AI can be realised. Cyber security is a key underpinning for the safety, reliability, predictability, ethics and potential regulatory compliance of an AI system. To avoid putting safety or sensitive data at risk, it is important to consider the cyber security of AI systems as well as of models in isolation, and to implement cyber security processes throughout the AI lifecycle, particularly where a component is a foundation for other systems.

As AI systems advance, developers must maintain an awareness of possible attacks, identify vulnerabilities and implement mitigations. Failure to do so risks designing vulnerabilities into future AI models and systems. A Secure by Design approach allows developers to ‘bake in’ security from the outset of design and development.

Cyber security must be considered in concert with physical and personnel security. Developing a coherent, holistic, risk-based and proportionate security strategy, supported by effective governance structures, is essential. Where the compromise of an AI system could lead to tangible or widespread physical damage, significant loss of business operations, leakage of sensitive or confidential information, reputational damage and/or legal challenge, it is important that AI security risks are treated as mission critical.
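As one concrete, minimal example of a ‘Secure by Design’ control for model weights (an assumption about implementation, not a prescription from this document), the sketch below records a SHA-256 checksum when a checkpoint is written and verifies it before the weights are loaded, so tampering or corruption can be detected. File names are placeholders.

```python
# Minimal sketch (not a complete security control): record a checksum for a
# model checkpoint and verify it before loading, detecting tampering or
# corruption of the weights at rest. Paths are illustrative placeholders.
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(checkpoint: Path, manifest: Path) -> None:
    manifest.write_text(f"{sha256_of(checkpoint)}  {checkpoint.name}\n")


def verify_before_load(checkpoint: Path, manifest: Path) -> bool:
    expected = manifest.read_text().split()[0]
    return sha256_of(checkpoint) == expected


if __name__ == "__main__":
    ckpt = Path("model_weights.bin")
    ckpt.write_bytes(b"\x00" * 1024)  # stand-in for a real checkpoint
    manifest = Path("model_weights.sha256")
    write_manifest(ckpt, manifest)
    print("weights verified:", verify_before_load(ckpt, manifest))
```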
2.3.2 Access & Security Controls
Security controls including securing model weights > Implement strong cyber security measures and processes (including security evaluations) across their AI systems, including underlying infrastructure and supply chains
2.3 Operations & Security
Emerging processes for frontier AI safety
UK Department for Science, Innovation and Technology (2023)
The UK recognises the enormous opportunities that AI can unlock across our economy and our society. However, without appropriate guardrails, such technologies can pose significant risks. The AI Safety Summit will focus on how best to manage the risks from frontier AI, such as misuse, loss of control and societal harms. Frontier AI organisations play an important role in addressing these risks and promoting the safety of the development and deployment of frontier AI.

The UK has therefore encouraged frontier AI organisations to publish details of their frontier AI safety policies ahead of the AI Safety Summit hosted by the UK on 1 to 2 November 2023. This will provide transparency regarding how they are putting voluntary AI safety commitments into practice and enable the sharing of safety practices within the AI ecosystem. Transparency of AI systems can increase public trust, which can be a significant driver of AI adoption.

This document complements these publications by providing a potential list of safety policies for frontier AI organisations. These have been gathered after extensive research and will need updating regularly given the emerging nature of this technology. The safety processes are not listed in order of importance but are summarised in themes. The government is not suggesting or mandating any particular combination of policies, merely detailing the current suite available so that others can understand, interpret and compare frontier companies’ safety policies. This document contains the world’s first overview of emerging safety processes focused on frontier AI and is intended to be a useful tool to boost transparency.

This conversation is focused on frontier AI. Whilst it is important that safety is applied throughout the AI sector, it is also important that innovation is not stifled; policies must therefore be proportionate and based on capabilities, which are the key driver of risk. This document contains processes and associated practices that some frontier AI organisations are already implementing, and others that are being considered within academia and broader civil society. It is intended as a guide for readers of frontier AI companies’ AI safety policies to better understand what good policy might look like, though organisations themselves will be best placed to determine their applicability. Through this exercise, the government intends to help inform dialogue on potential appropriate measures for individual organisations to consider at the UK AI Safety Summit.