Statisticians, psychometricians, and economists who have studied the use of test scores for high-stakes teacher evaluation, including its most sophisticated form, value-added modelling (VAM), mostly concur that such use should be pursued only with great caution.
Among the concerns raised by researchers is that value-added methods can misidentify both successful and unsuccessful teachers and, because the estimates are unstable and do not fully disentangle other influences on learning, can create confusion about the relative sources of influence on student achievement. Used for high-stakes purposes, such as individual personnel decisions or merit pay, test-based metrics could also create disincentives for teachers to take on the neediest students, to collaborate with one another, or even to stay in the profession.
Donald Rubin, a leading statistician in the area of causal inference, reviewed a range of value-added techniques and concluded:
We do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions. [Rubin et al. 2004: 113]
A research team at RAND has cautioned:
The estimates from VAM modelling of achievement will often be too imprecise to support some of the desired inferences. [McCaffrey et al. 2004: 96]
The research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools. [McCaffrey et al. 2003: xx]
Henry Braun, then of the Educational Testing Service, concluded in a review of VAM research:
VAM results should not serve as the sole or principal basis for making consequential decisions about teachers. There are many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts. We still lack sufficient understanding of how seriously the different technical problems threaten the validity of such interpretations. [Braun 2005: 17]
In a letter to the Department of Education, commenting on the Department’s proposal to use student achievement to evaluate teachers, the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences wrote:
…VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable. [BOTA 2009]
And a recent report of a workshop conducted jointly by the National Research Council and the National Academy of Education concluded:
Value-added methods involve complex statistical models applied to test data of varying quality. Accordingly, there are many technical challenges to ascertaining the degree to which the output of these models provides the desired estimates. Despite a substantial amount of research over the last decade and a half, overcoming these challenges has proven to be very difficult, and many questions remain unanswered… [Braun 2010: vii]
These quotes are from Baker et al. 2010: 7-8.
Baker, E. et al. 2010. Problems with the Use of Student Test Scores to Evaluate Teachers. Briefing Paper 278, Economic Policy Institute, Washington, DC.
BOTA (Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education, National Academy of Sciences) 2009. Letter Report to the U.S. Department of Education on the Race to the Top Fund. October 5.
Braun, H. 2005. Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models. Educational Testing Service, Princeton, NJ.
Braun, H.; Chudowsky, N. & Koenig, J. (Eds.) 2010. Getting Value Out of Value-Added: Report of a Workshop. Committee on Value-Added Methodology for Instructional Improvement, Program Evaluation and Accountability, National Research Council, Washington, DC.
McCaffrey, D. F. et al. 2003. Evaluating Value-Added Models for Teacher Accountability. RAND Corporation, Santa Monica.
McCaffrey, D. F. et al. 2004. Models for Value-Added Modeling of Teacher Effects. Journal of Educational and Behavioral Statistics, 29 (1): 67-101.
Rubin, D. B.; Stuart, E. A. & Zanutto, E. L. 2004. A Potential Outcomes View of Value-Added Assessment in Education. Journal of Educational and Behavioral Statistics, 29 (1): 103-116.