Managing security

We use the SAIF framework to mitigate known and novel AI security risks. The latter category includes risks such as data poisoning, model exfiltration, and rogue actions. We apply security controls, or repeatable mitigations, to these risks. For example, for prompt injections and jailbreaks, we apply robust filtering and processing of inputs and outputs. Additionally, thorough training, tuning, and evaluation processes help fortify the model against prompt injection attacks. For data poisoning, we implement data sanitization, secure AI systems, enable access controls, and deploy mechanisms to ensure data and model integrity. We have published a full list of our controls for AI security risks. In addition, we continue to research new ways to help mitigate a model’s susceptibility to security attacks. For example, we’ve developed an AI agent that automatically detects security risks in real-world code.
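The report does not describe how this input and output filtering is implemented. The sketch below is a minimal, hypothetical illustration of the kind of control it names for prompt injection: the pattern list, blocked terms, and the `call_model` stand-in are all invented for this example, and a production system would rely on trained classifiers rather than simple pattern matching.

```python
import re

# Hypothetical deny-list of prompt-injection patterns; a real system would use
# trained classifiers rather than regular expressions.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal your system prompt", re.IGNORECASE),
]

def screen_input(user_text: str) -> bool:
    """Return True if the input looks like a prompt-injection attempt."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)

def screen_output(model_text: str, blocked_terms: list[str]) -> bool:
    """Return True if the model output contains disallowed content."""
    lowered = model_text.lower()
    return any(term in lowered for term in blocked_terms)

def guarded_generate(user_text: str, call_model) -> str:
    """Wrap a model call with input and output checks."""
    if screen_input(user_text):
        return "Request blocked: possible prompt injection."
    response = call_model(user_text)  # call_model stands in for the real model API
    if screen_output(response, blocked_terms=["example disallowed term"]):
        return "Response withheld by safety filter."
    return response
```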
Reasoning
The mitigation name "Managing security" lacks sufficient detail to identify the focal activity or implementation mechanism.
Managing content safety
We leverage the expertise our Trust & Safety teams have honed over decades of abuse fighting to establish model and application-level mitigations for a wide range of content safety risks. A critical piece of our safety strategy is a pre-launch risk assessment that identifies which applications have sufficiently great or novel risks that require specialized testing and controls. We also employ guardrails in our models and products to reduce the risk of generating harmful content, for example:

• Safety filters. We build safety classifiers to prevent our models from showing users harmful outputs such as suicide content or pornography.
• System instructions. We steer our models to produce content that aligns with our safety guidelines by using system instructions — prompts that tell the model how to behave when it responds to user inputs.
• Safety tuning. We fine-tune our models to produce helpful, high-quality answers that align to our safety guidelines.
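As a rough illustration of how the first two guardrails in the list above could fit together, here is a hypothetical sketch: the system instruction text, the score categories, and the thresholds are invented for the example and are not taken from the report.

```python
from dataclasses import dataclass

# Hypothetical system instruction; real products use carefully reviewed policy text.
SYSTEM_INSTRUCTION = (
    "You are a helpful assistant. Follow the safety guidelines: do not produce "
    "content that encourages self-harm or is sexually explicit."
)

@dataclass
class SafetyScores:
    """Per-category scores from a safety classifier, in the range [0, 1]."""
    self_harm: float
    sexual: float

# Illustrative per-category blocking thresholds.
THRESHOLDS = {"self_harm": 0.5, "sexual": 0.5}

def build_prompt(user_text: str) -> list[dict]:
    """Prepend the system instruction to the user's message."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": user_text},
    ]

def passes_safety_filter(scores: SafetyScores) -> bool:
    """Block the response if any category score exceeds its threshold."""
    return (
        scores.self_harm < THRESHOLDS["self_harm"]
        and scores.sexual < THRESHOLDS["sexual"]
    )
```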
1.2.1 Guardrails & Filtering

Managing privacy
We have invested deeply in mitigations for privacy risks, as well as in researching new risks that might emerge from evolving agentic capabilities. For example, we published a paper on how AI assistants can better protect privacy by using a “contextual integrity” framework to steer AI assistants to only share information that is appropriate for a given context.
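The core idea of contextual integrity is that whether sharing information is appropriate depends not only on the data itself but on who receives it and in what context. The toy sketch below illustrates that idea only; the categories, rules, and default-deny policy are invented for this example and are not the paper's method.

```python
# Toy contextual-integrity check. The norms table is illustrative, not from the paper.
SHARING_NORMS = {
    # (information_type, recipient, context) -> allowed
    ("home_address", "delivery_service", "scheduling a delivery"): True,
    ("home_address", "stranger_in_chat", "casual conversation"): False,
    ("calendar_availability", "coworker", "scheduling a meeting"): True,
    ("medical_history", "coworker", "scheduling a meeting"): False,
}

def may_share(information_type: str, recipient: str, context: str) -> bool:
    """Default-deny: share only when an explicit norm permits it."""
    return SHARING_NORMS.get((information_type, recipient, context), False)

# Example: an assistant may share an address with a courier, but not in casual chat.
assert may_share("home_address", "delivery_service", "scheduling a delivery")
assert not may_share("home_address", "stranger_in_chat", "casual conversation")
```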
1.1.2 Learning Objectives

Phased launches
A gradual approach to deployment is a critical risk mitigation. We have a multi-layered approach — starting with testing internally, then releasing to trusted testers externally, then opening up to a small portion of our user base (for example, Gemini Advanced users first). We also phase our country and language releases, constantly testing to ensure mitigations are working as intended before we expand. And finally, we have careful protocols and additional testing and mitigations required before a product is released to under 18s. To give an example, as Gemini 2.0’s multimodality increases the complexity of potential outputs, we have been careful to release it in a phased way via trusted testers and subsets of countries.
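One hypothetical way to encode such a phased launch is as an ordered set of audience gates, widened only after the previous stage's mitigations have been verified. The stage names, groups, and percentages below are invented for the sketch and do not reflect Google's actual rollout parameters.

```python
import hashlib

# Illustrative rollout stages; each stage widens the eligible audience.
ROLLOUT_STAGES = [
    {"name": "internal",        "groups": {"employees"},       "fraction": 1.00},
    {"name": "trusted_testers", "groups": {"trusted_testers"}, "fraction": 1.00},
    {"name": "limited_launch",  "groups": {"advanced_users"},  "fraction": 0.05},
    {"name": "general",         "groups": {"all_users"},       "fraction": 1.00},
]

def in_rollout(user_id: str, user_group: str, stage_name: str) -> bool:
    """Deterministically decide whether a user is included at the current stage."""
    stage = next(s for s in ROLLOUT_STAGES if s["name"] == stage_name)
    if user_group not in stage["groups"]:
        return False
    # A stable hash keeps a user's bucket consistent across requests.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < stage["fraction"] * 10_000

# Example: only 5% of advanced users are included during the limited launch.
print(in_rollout("user-123", "advanced_users", "limited_launch"))
```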
2.3.1 Deployment Management

Monitoring and rapid remediation
We design our applications to promote user feedback on both quality and safety, through user interfaces that encourage users to provide thumbs up/down and give qualitative feedback where appropriate. Our teams closely monitor user feedback via these channels, as well as feedback delivered through other channels. We have mature incident management and crisis response capabilities to rapidly mitigate and remediate where needed, and we feed this back into our risk identification efforts. Importantly, teams are equipped with rapid-remediation mechanisms to block content flagged as illegal.
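A minimal sketch of the feedback-and-remediation loop described above might look as follows; the record fields and function names are invented for illustration, not drawn from Google's tooling.

```python
import time
from collections import deque

# Illustrative in-memory feedback log and block list.
feedback_log = deque()
blocked_content_ids = set()

def record_feedback(content_id: str, rating: str, comment: str = "") -> None:
    """Store a thumbs-up/down event for later review by monitoring teams."""
    feedback_log.append({
        "content_id": content_id,
        "rating": rating,            # "up" or "down"
        "comment": comment,
        "timestamp": time.time(),
    })

def flag_illegal(content_id: str) -> None:
    """Rapid remediation: block flagged content immediately, then investigate."""
    blocked_content_ids.add(content_id)

def is_blocked(content_id: str) -> bool:
    return content_id in blocked_content_ids

record_feedback("resp-42", "down", comment="inaccurate answer")
flag_illegal("resp-99")
assert is_blocked("resp-99")
```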
2.3.4 Incident Response

Provenance
Outputs of our generative AI products typically carry watermarking (via our SynthID technology) and, for imagery, relevant metadata (per IPTC standards). As an example, About This Image in Google Image Search started identifying and labeling AI-generated images with SynthID in 2023, alongside other image metadata. We’ve open-sourced SynthID to make it easier for any developer to apply watermarking to their own generative AI models, and shared our analysis of how labeling AI-generated content helps people make informed decisions about the content they see online. Google Search, Ads, and YouTube are also implementing the latest version of the Coalition for Content Provenance and Authenticity (C2PA) authentication standard. And moving forward, we plan to continue investing in the deployment of C2PA across our services.
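For illustration, the sketch below shows how an AI-generated image could be labeled with the IPTC digital source type for trained algorithmic media and how a downstream surface could check that label. The record structure and function names are invented for this example; the IPTC term value is the published one, but this is not SynthID's or Google's actual API.

```python
# Published IPTC term for media created by a trained algorithm (used to label AI-generated imagery).
IPTC_TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)

def provenance_record(image_id: str, generator: str, watermarked: bool) -> dict:
    """Build a metadata record indicating the image is AI-generated."""
    return {
        "image_id": image_id,
        "digital_source_type": IPTC_TRAINED_ALGORITHMIC_MEDIA,
        "generator": generator,
        "watermarked": watermarked,  # e.g. an imperceptible watermark such as SynthID
    }

def is_ai_generated(record: dict) -> bool:
    """A downstream surface (e.g. an image search UI) can check the label."""
    return record.get("digital_source_type") == IPTC_TRAINED_ALGORITHMIC_MEDIA

record = provenance_record("img-001", generator="example-image-model", watermarked=True)
print(is_ai_generated(record))  # True
```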
1.2.5 Provenance & Watermarking

Explainability
Explainability is about helping people understand how an AI application operates. Products use disclaimers to set clear expectations — such as reminding people that the AI-generated outputs may contain inaccuracies and that they should take steps to verify information generated by the tool. These disclosure policies are backed up by research, and codified into explainability guidelines for our teams.
2.4.2 Design Standards

Other (multiple stages): Applies across multiple lifecycle stages
Developer: Entity that creates, trains, or modifies the AI system
Manage: Prioritising, responding to, and mitigating AI risks