These visualizations are experimental and still being refined. Designs and data presentation may change.

Predicted vs Observed Severity

This chart compares expert BAU severity assessments (how severe experts predict each risk subdomain will be) against observed incident severity (average direct harm across real-world incidents). Both scales are z-score normalized before comparison. Each dot represents one subdomain, positioned by its standardized prediction gap: the subdomain's normalized expert rating minus its normalized observed severity. Positive values mean experts rate a subdomain as relatively more severe than incidents show; negative values mean incidents are relatively worse than predicted.
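A minimal Python sketch of the gap computation described above. It assumes each subdomain has one predicted and one observed severity value and that the z-scores use the population standard deviation. The function names and input numbers are hypothetical and are not the site's code or data.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize to mean 0 and standard deviation 1 (population SD, an assumption)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def prediction_gaps(predicted, observed):
    """Per-subdomain gap z(predicted) - z(observed).

    Positive: experts rate the subdomain as relatively more severe than incidents show.
    Negative: incidents are relatively worse than predicted.
    """
    return [p - o for p, o in zip(zscores(predicted), zscores(observed))]

# Hypothetical values for three subdomains, not real survey or incident data.
predicted = [3.0, 4.0, 5.0]  # expert BAU severity ratings
observed = [2.0, 2.0, 5.0]   # average direct harm across incidents
print([round(g, 2) for g in prediction_gaps(predicted, observed)])
# → [-0.52, 0.71, -0.19]
```

Each set of z-scores sums to zero, so the gaps also sum to zero across the compared subdomains. Some subdomains will therefore always fall on each side of 0. A negative gap means a subdomain ranks worse in incident data than in expert ratings relative to the other subdomains. It does not by itself mean its absolute severity was underestimated.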


Chart: standardized prediction gap by subdomain, on an axis from -2.5σ to +2.5σ (left of 0: observed > predicted; right of 0: predicted > observed).

Subdomain                             Gap
1.1 Discrimination                    -1.08σ
1.2 Toxic content                     -2.29σ
1.3 Unequal performance               -1.86σ
2.1 Loss of privacy                   -0.68σ
2.2 AI security vulnerabilities       +1.12σ
3.1 False information                 +2.16σ
4.1 Disinformation & influence        +0.07σ
4.2 AI weapons & cyberattacks         +0.58σ
4.3 AI fraud & scams                  -0.73σ
5.1 Overreliance & unsafe use         -0.91σ
6.1 Power centralization              +0.77σ
6.2 Inequality & unemployment         +0.60σ
6.3 Devaluation of human creativity   +0.83σ
7.3 Capability & robustness           +0.02σ
7.4 Transparency & interpretability   -0.99σ

Key Takeaways

1. Subdomain 3.1 (False or misleading information) has the largest relative overprediction (+2.16σ): experts rate it as relatively more severe than incidents show.
2. Subdomain 6.6 (Environmental harm) has the largest relative underprediction (-2.72σ): incidents are relatively more severe than experts predicted.
3. Of the 21 subdomains with qualifying data, 12 show experts rating relatively higher severity and 9 show incidents relatively exceeding predictions.
4. Of 24 subdomains, 21 have qualifying incident data; 3 are excluded (no severity data), and 6 have small samples (fewer than 5 incidents).

Both scales are z-score normalized (μ=0, σ=1) before comparison to account for different measurement scales.
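The counts in the takeaways come from a filter-then-compare step. The Python sketch below illustrates one way to do this under stated assumptions. Subdomains with no incident severity data are dropped. Subdomains with fewer than 5 incidents are flagged but still compared, which is consistent with the 12 + 9 = 21 split. Z-scores are computed over the qualifying subdomains only. The record layout, `summarize`, and the sample data are hypothetical.

```python
from statistics import mean, pstdev

MIN_INCIDENTS = 5  # subdomains with fewer incidents are flagged as small samples

def zscores(values):
    # Population SD assumed; raises ZeroDivisionError if all values are equal.
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def summarize(subdomains):
    """subdomains: hypothetical dict of name -> (expert severity, incident severities)."""
    # Drop subdomains with no incident severity data.
    qualifying = {n: rec for n, rec in subdomains.items() if rec[1]}
    excluded = sorted(set(subdomains) - set(qualifying))
    # Flag, but keep, subdomains below the sample-size threshold (assumption).
    small = sorted(n for n, (_, inc) in qualifying.items() if len(inc) < MIN_INCIDENTS)

    names = list(qualifying)
    z_pred = zscores([qualifying[n][0] for n in names])
    z_obs = zscores([mean(qualifying[n][1]) for n in names])  # average harm per subdomain
    gaps = {n: p - o for n, p, o in zip(names, z_pred, z_obs)}

    return {
        "qualifying": len(qualifying),
        "excluded": excluded,
        "small_sample": small,
        "over": sum(g > 0 for g in gaps.values()),   # predicted > observed
        "under": sum(g < 0 for g in gaps.values()),  # observed > predicted
        "gaps": gaps,
    }

# Hypothetical data, not real incident records.
subs = {
    "A": (4.0, [1, 2, 2, 3, 2]),
    "B": (2.0, [3, 4, 3, 4, 4, 3]),
    "C": (3.0, [2, 3]),
    "D": (5.0, []),
}
r = summarize(subs)
print(r["qualifying"], r["excluded"], r["small_sample"], r["over"], r["under"])
# → 3 ['D'] ['C'] 2 1
```

Keeping small-sample subdomains in the z-score pool changes the mean and spread that every other subdomain is compared against. Excluding them instead would shift all gaps, so the choice matters when reading the chart with or without low-sample subdomains shown.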