New York City's teacher evaluation system used value-added models and group measures to rate teachers based on standardized test scores, resulting in inconsistent and potentially unfair evaluations that affected thousands of educators.
New York City implemented a teacher evaluation system that relied heavily on value-added measurement (VAM) and standardized test scores to rate teachers from 2012 onwards. The system evaluated over 12,000 teachers who taught fourth- through eighth-grade English or math between 2007 and 2010, and in the 2015-16 school year, 53 percent of NYC teachers were evaluated using group measures, meaning they were judged by test scores from subjects or students they did not teach. The VAM system used complex statistical models to predict student performance and rated teachers on how their students performed relative to those predictions. In practice, it produced highly inconsistent results, with teachers receiving vastly different ratings for teaching the same students in the same year: one teacher, for example, scored 97 out of 100 in language arts but only 2 out of 100 in math with identical students. The average confidence interval for these estimates was 35 percentile points in math and 53 in English Language Arts, indicating substantial measurement error. Teachers like Sheri Lederman, whose students consistently scored above state averages, were nonetheless rated 'ineffective'. The system led hundreds of teachers in districts such as Syracuse and Rochester to plan appeals of their evaluations, with union leaders reporting that 40 percent of teachers in those districts received the two lowest ratings.
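The core VAM mechanic described above (predict each student's score, attribute the residual to the teacher, and attach a confidence interval) can be sketched in a few lines. This is a minimal illustration under assumed parameters, not the actual NYC model, which was far more complex; the prediction function, class data, and coefficients here are all hypothetical.

```python
# Minimal sketch of value-added measurement (VAM) logic.
# Everything here is hypothetical illustration, not the NYC model.
import random
import statistics

random.seed(0)

def predict_score(prior_score):
    # Hypothetical prediction model: expected current-year score
    # as a function of the student's prior-year score.
    return 0.8 * prior_score + 10

def value_added(prior_scores, actual_scores):
    # A teacher's value-added estimate: the mean of the
    # (actual - predicted) residuals across their students,
    # with a 95% confidence interval from the standard error.
    residuals = [a - predict_score(p)
                 for p, a in zip(prior_scores, actual_scores)]
    mean = statistics.mean(residuals)
    se = statistics.stdev(residuals) / len(residuals) ** 0.5
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# With a class of 25 simulated students, the interval is wide --
# the kind of measurement error the summary describes.
priors = [random.gauss(70, 10) for _ in range(25)]
actuals = [predict_score(p) + random.gauss(2, 8) for p in priors]
va, (low, high) = value_added(priors, actuals)
print(f"value-added: {va:.1f}, 95% CI: ({low:.1f}, {high:.1f})")
```

Because the confidence interval shrinks only with the square root of class size, estimates from a single classroom of students remain noisy, which is consistent with the wide intervals reported above.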
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing users to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Intentional (due to an expected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)