Accidental Exposure of 38TB of Data by Microsoft's AI Research Team

Jun 22, 20231 reportSeverity: MinorToolHigh confidence

Microsoft's AI research team accidentally exposed 38 terabytes of private data including employee workstation backups, secrets, and internal communications when sharing open-source AI training data through a misconfigured Azure SAS token that granted access to their entire storage account.

Microsoft's AI research division was sharing open-source AI models for image recognition through a GitHub repository called 'robust-models-transfer'. The researchers used Azure Shared Access Signature (SAS) tokens to provide download links to their training data. However, the SAS token was misconfigured to grant access to the entire Azure storage account rather than just the intended files. This exposed an additional 38 terabytes of private data including disk backups of two employees' workstations containing passwords, secret keys, private keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees. The token also had 'full control' permissions allowing attackers to potentially delete or modify files, and was set to expire in 2051, effectively making it permanent. The security research team at Wiz discovered this exposure on June 22, 2023, while scanning for misconfigured cloud storage containers. Microsoft invalidated the token on June 24, 2023, and completed their internal investigation by August 16, 2023. The incident highlights risks in AI development workflows where researchers handle massive datasets and the challenges of properly securing cloud storage sharing mechanisms.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

2Privacy & Security

2.1Compromise of privacy by leaking or correctly inferring sensitive information

AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise user expectation of privacy, assist identity theft, or cause loss of confidential intellectual property.

Causal Classification

Entity

Human

Due to a decision or action made by humans

Intent

Unintentional

Due to an unexpected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score:2: Minor(Loss of Privacy, direct)

National Security Assessment

Overall Score

Stakeholders

: Microsoft's AI Research Division
: Microsoft
: Microsoft, Microsoft Employees, Third Parties Relying On The Confidentiality Of The Exposed Data

AI System Classification

: Image Classification
: Tool
: 4 Minimal or No Risk
: 1

Population Impact

: 359
: 359

External Links

View on AI Incident Database