It seems that performance pay based on gains in student achievement may not be as good at identifying good teachers as its advocates claim.
It has been delivered a body blow by a major new study published in the United States. The study shows that “value-added” methods for determining the effectiveness of classroom teachers are built on very shaky assumptions and may be highly unreliable and misleading.
The study tested “value-added” performance pay models by analysing the effect of 5th grade teachers on their students’ 3rd and 4th grade achievement. Astoundingly, it showed a large impact. It sounds crazy, but it was all part of a clever test of the reliability of the models. The study has been lauded as a major advance in the literature on value-added modelling of school and teacher effectiveness.
Performance pay for teachers based on student achievement is increasingly proposed to improve the quality of teaching and schools. Many propose that merit pay should be based on “value-added” measures of student achievement rather than absolute levels of achievement.
The idea is that value-added models provide a fairer measure of teacher effectiveness because they track students’ year-to-year learning gains. That way, teachers are not getting undeserved blame for the learning deficits that students bring with them to the classroom or undue rewards for being blessed with a classroom of high achievers. It is proposed that teachers who achieve high “value added” on students’ test performance should be rewarded with higher salaries.
The new study challenges the assumption that teachers’ salaries can be successfully linked to value-added in student achievement. The study by Jesse Rothstein, Assistant Professor of Economics and Public Affairs at the Woodrow Wilson School at Princeton University, finds that current models used to measure teacher quality through standardized testing do not accurately capture the contribution of teachers to student development. The study is to be published in the Quarterly Journal of Economics next year. It can be downloaded here.
Rothstein’s study focuses on the challenge of distinguishing a teacher’s contribution from pre-existing differences among students. Teachers face very different jobs: some teach gifted students and classes, others teach students with limited English skills or with special needs. For accountability policies to produce improvements in teacher quality, it is essential to ensure that teachers assigned the “right” students, those who test well, do not gain an unfair advantage, and teachers assigned the “wrong” students, those who test poorly, are not unfairly disadvantaged. It would be harmful and unfair to implement a merit pay system that rewards teachers who work with more gifted students and penalizes those who work with more challenging students.
Accordingly, an effective performance pay system would have to adjust for the difficulty of tasks faced by teachers, comparing a teacher only with others working with similar students. So-called “value-added models” (VAMs) purport to do this using students’ own past test scores to adjust their current scores. Only the deviation of student achievement from what could have been predicted based on the previous year’s scores is attributed to the teacher’s contribution.
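The value-added calculation described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the actual models Rothstein tested: the teacher names, effect sizes, and score distributions below are all invented for the example. Current scores are regressed on the previous year’s scores, and each teacher’s “value added” is the average deviation of their students from that prediction.

```python
# Minimal sketch of a value-added calculation on synthetic data.
# All teachers, effects, and score distributions here are hypothetical.
import random

random.seed(0)

# Simulate students with a prior-year score and a current-year score,
# each taught by one of three teachers with a known "true" effect.
true_effect = {"A": 2.0, "B": 0.0, "C": -2.0}  # hypothetical teacher effects
students = []
for t, effect in true_effect.items():
    for _ in range(200):
        prior = random.gauss(50, 10)                       # last year's score
        current = 5 + 0.9 * prior + effect + random.gauss(0, 3)
        students.append((t, prior, current))

# Fit current = a + b * prior by ordinary least squares over all students.
n = len(students)
mx = sum(s[1] for s in students) / n
my = sum(s[2] for s in students) / n
b = (sum((s[1] - mx) * (s[2] - my) for s in students)
     / sum((s[1] - mx) ** 2 for s in students))
a = my - b * mx

# A teacher's estimated value added is the mean residual of their students:
# how far their students' scores deviate from what prior scores predicted.
va = {}
for t in true_effect:
    resids = [cur - (a + b * prior)
              for (tt, prior, cur) in students if tt == t]
    va[t] = sum(resids) / len(resids)

for t in sorted(va):
    print(t, round(va[t], 2))
```

With random assignment, as here, the mean residuals recover the true teacher effects reasonably well; Rothstein’s point is that real classrooms are not assigned randomly, which is where the estimates break down.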
A key issue in the assignment of students to teachers is whether last year’s test score is a sufficient control for the difficulty of this year’s teaching task. Rothstein shows that it is not. Much of the variation in the difficulty of the task that teachers face is not adequately accounted for by existing VAMs. As a consequence, the VAMs do not accurately capture teachers’ true value added. They tend to credit and penalize teachers in part for pre-existing differences among the students that they teach. This is a fatal flaw in merit pay systems based on student achievement.
To show this, Rothstein employed a “falsification” test using student test data from North Carolina. He tests whether value-added models would show that 5th grade teachers have effects on their students’ test scores in 3rd and 4th grades. Since it is impossible for students’ future teachers to cause their previous achievement outcomes, the models should show no such effects. But in fact they do. Rothstein tested three VAMs currently used for teacher accountability and each one indicated that 5th grade teachers have large effects on students’ 3rd and 4th grade achievement.
Rothstein says that value-added calculations are based on the assumption that the assignment of students to different classrooms is random, overlooking the day-to-day reality of what happens in schools. This reality is that students are often systematically sorted into classrooms on the basis of past achievement. A principal, for example, might assign a large number of students with behaviour problems to a teacher who is known to have a way with problem students or parents of high achievers might lobby to get their child in a class with the “best” teacher. When that happens, it biases the results of value-added calculations.
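The logic of the falsification test can be illustrated with a toy simulation (synthetic data in the spirit of Rothstein’s test, not his actual North Carolina analysis). Each student’s 3rd-to-4th grade gain is fixed before they ever meet their 5th grade teacher, so a 5th grade teacher cannot cause it. Yet if a principal tracks students into 5th grade classrooms by past performance, the classrooms differ sharply in past gains, which a naive model would read as a “teacher effect” on the past.

```python
# Toy illustration of the falsification idea: 5th grade teachers cannot
# affect 4th grade gains, yet sorting students by past achievement makes
# it look as if they do. All numbers are synthetic and illustrative.
import random

random.seed(1)

def apparent_past_effect(tracked):
    # Observed 3rd->4th grade gains, determined before 5th grade begins.
    gains = [random.gauss(5, 2) for _ in range(300)]
    if tracked:
        gains.sort()           # principal tracks students by past gains
    else:
        random.shuffle(gains)  # random assignment to classrooms
    # Split into three 5th grade classrooms of 100 students each.
    classes = [gains[0:100], gains[100:200], gains[200:300]]
    means = [sum(c) / len(c) for c in classes]
    # The spread in classroom means is the spurious "effect" a naive
    # model would attribute to the 5th grade teachers.
    return max(means) - min(means)

spurious = apparent_past_effect(tracked=True)
clean = apparent_past_effect(tracked=False)
print(round(spurious, 2), round(clean, 2))
```

Under tracking, the spread in past gains across the three classrooms is large; under random assignment it is close to zero. A model that credits classroom differences to the teacher would pass the falsification test only in the second case.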
The study finds that performance pay systems based on available value-added estimates will, to a substantial degree, reward and punish teachers for the students they are assigned, rather than for what they are able to accomplish with those students. Moreover, it found that pay systems relying on measures of short-term value added do an extremely poor job of rewarding the teachers who are best for students’ longer-run outcomes.
Rothstein’s results cast doubt on the ability to use value-added as a basis for performance pay. There is such large variation in the difficulty of the tasks faced by teachers with different sorts of students that it is virtually impossible to adequately control for it in a statistical model. Consequently, use of value-added estimates in hiring, firing, and compensation decisions may reward and punish teachers for the students they are assigned, as much as for their actual effectiveness in the classroom.