AI-powered interview software from MyInterview and Curious Thing showed significant reliability issues when tested: both systems assigned personality traits and favorable scores to a candidate who answered entirely in German, transcribing her speech as nonsensical English, and one system rated her English competency highly despite her never speaking the language.
Journalists tested AI-powered interview software from two companies, MyInterview and Curious Thing, during the COVID-19 pandemic, when demand for such technologies surged. The systems analyze candidates' responses to determine personality traits and job-matching scores. During testing, a candidate completed interviews on both platforms while speaking German instead of English. Curious Thing, which conducts phone interviews, awarded the candidate a 6 out of 9 for English competency even though she had read aloud from a German Wikipedia entry about psychometrics. MyInterview not only produced a personality assessment but also rated the candidate a 73% match for a fake office administrator position, placing her in the top half of applicants. The system's transcript showed it had interpreted her German words as English, producing nonsensical text. MyInterview's industrial psychologist explained that the algorithm derived personality traits from voice intonation rather than content, though experts noted that intonation is not a reliable indicator of personality. Both companies acknowledged the testing results, with Curious Thing noting this was their first encounter with German-language input.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
AI system: Due to a decision or action made by an AI system
Unintentional: Due to an unexpected outcome from pursuing a goal
Post-deployment: Occurring after the AI model has been trained and deployed