Researchers discovered that word embeddings trained on Google News articles exhibit significant gender stereotypes, and that these biases can be amplified when the embeddings are used in downstream machine learning applications.
Researchers analyzed word embeddings, a popular framework for representing text as vectors in machine learning and natural language processing tasks. They found that embeddings trained on Google News articles exhibited female/male gender stereotypes to a disturbing extent. The study showed that gender bias can be captured geometrically as a direction in the embedding space, and that gender-neutral words are linearly separable from gender-definitional words. The embeddings encoded problematic associations, such as 'receptionist' with 'female', alongside legitimate ones, such as 'queen' with 'female'. To address this, the researchers developed methodology and algorithms to 'debias' embeddings by removing gender stereotypes while preserving useful properties such as the clustering of related concepts and performance on analogy tasks. They defined metrics quantifying both direct and indirect gender bias, and used crowd-worker evaluations and standard benchmarks to demonstrate that their debiasing algorithms significantly reduced gender bias while maintaining the embeddings' usefulness for downstream applications.
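The geometric idea can be sketched in a few lines of code. Below is a minimal Python illustration, not the authors' implementation: it estimates a gender direction by averaging normalized difference vectors of definitional pairs (the paper uses PCA over such pairs), neutralizes a gender-neutral word by removing its component along that direction, and equalizes a definitional pair so its two words differ only along the gender direction. The word list and toy random vectors are assumptions for demonstration only.

```python
import numpy as np

def gender_direction(emb, pairs):
    # Simplified stand-in for the paper's PCA-based estimate:
    # average the unit difference vectors of definitional pairs.
    diffs = [emb[a] - emb[b] for a, b in pairs]
    g = np.mean([d / np.linalg.norm(d) for d in diffs], axis=0)
    return g / np.linalg.norm(g)

def neutralize(v, g):
    # Remove the component of v along the (unit) gender direction g.
    v_debiased = v - np.dot(v, g) * g
    return v_debiased / np.linalg.norm(v_debiased)

def equalize(v1, v2, g):
    # Re-center a definitional pair so both words share the same
    # gender-neutral part and differ only along g (1-d bias subspace).
    mu = (v1 + v2) / 2
    mu_orth = mu - np.dot(mu, g) * g
    scale = np.sqrt(max(1.0 - np.linalg.norm(mu_orth) ** 2, 0.0))
    sign = np.sign(np.dot(v1 - v2, g))
    return mu_orth + sign * scale * g, mu_orth - sign * scale * g

# Toy 4-d unit embeddings, for illustration only.
rng = np.random.default_rng(0)
emb = {}
for w in ["she", "he", "woman", "man", "receptionist"]:
    v = rng.normal(size=4)
    emb[w] = v / np.linalg.norm(v)

g = gender_direction(emb, [("she", "he"), ("woman", "man")])
before = np.dot(emb["receptionist"], g)
after = np.dot(neutralize(emb["receptionist"], g), g)
print(f"bias component before: {before:+.3f}, after: {after:+.3f}")  # after = 0
```

The printed projection onto g plays the role of a (simplified) direct-bias measurement: a gender-neutral word like 'receptionist' should score near zero after debiasing, while equalized pairs like 'she'/'he' retain their gender component by design.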
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)
No population impact data reported.