Microsoft Copilot was found to be exposing sensitive data from over 20,000 private GitHub repositories through cached content that remained accessible even after repositories were made private or deleted.
Security researchers at Lasso discovered that Microsoft Copilot could access and return content from private GitHub repositories via Bing's caching mechanism. The issue surfaced when Lasso found data from its own private repository appearing in Copilot responses, even though the repository was no longer accessible on GitHub. Further investigation revealed that any GitHub repository that had been public even briefly could be indexed by Bing and remain retrievable through Copilot long after being made private or deleted.

Lasso extracted data from more than 20,000 since-privatized GitHub repositories affecting over 16,000 organizations, including major companies such as Google, IBM, PayPal, and Microsoft itself. The exposed data included intellectual property, sensitive corporate information, access keys, tokens, and over 300 private credentials.

Lasso reported the findings to Microsoft in November 2024, but Microsoft classified the issue as 'low severity' and stated that the caching behavior was 'acceptable.' Microsoft disabled Bing's cached-link feature in December 2024, yet Copilot continued to have access to the cached data even after the fix, indicating only a partial resolution.
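Because content cached while a repository was public can persist after the repository is made private, one basic defensive step is to confirm that a repository really is hidden from unauthenticated clients. The sketch below is not Lasso's tooling or any Microsoft API; it is an illustrative check, assuming only GitHub's public REST endpoint (`https://api.github.com/repos/{owner}/{repo}`), which returns 404 for private or deleted repositories when queried without credentials.

```python
# Illustrative visibility audit (an assumption, not Lasso's method):
# after privatizing a repository, verify that anonymous access now fails.
# Content indexed while it was public may still live in external caches.
import urllib.error
import urllib.request

GITHUB_API = "https://api.github.com/repos"

def classify_visibility(status_code: int) -> str:
    """Map an unauthenticated GitHub API status code to a verdict."""
    if status_code == 200:
        return "public"       # still readable without credentials
    if status_code == 404:
        return "not public"   # GitHub hides private/deleted repos as 404
    return "unknown"          # e.g. 403 rate limiting, redirects

def check_repo(owner: str, repo: str) -> str:
    """Request repository metadata anonymously and classify the result."""
    url = f"{GITHUB_API}/{owner}/{repo}"
    try:
        with urllib.request.urlopen(url) as resp:
            return classify_visibility(resp.status)
    except urllib.error.HTTPError as err:
        return classify_visibility(err.code)
```

Note that even a 'not public' verdict only confirms GitHub's own access control; it says nothing about copies already held by search-engine caches or AI assistants, which is precisely the gap this incident exposed.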
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, assist identity theft, or cause loss of confidential intellectual property.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed