The following is a review of research studies published in 2010 on market-based education reforms in the United States. It shows that performance pay for teachers, charter schools and value-added measures of school and teacher performance are not succeeding in raising student achievement. It is a slightly edited version of an article originally published by the Albert Shanker Institute and then in The Washington Post.
“Race to the Top” and Waiting for Superman made 2010 a banner year for the market-based education reforms that dominate our national discourse. By contrast, a look at the “year in research” presents a rather different picture for the three pillars of this paradigm: merit pay, charter schools, and using value-added estimates in high-stakes decisions.
There will always be exceptions (especially given the sheer volume of reports generated by think tanks, academics, and other players), and one year does not a body of research make. But a quick review of high-quality studies from independent, reputable researchers shows that 2010 was not a particularly good year for these policies.
First and perhaps foremost, the earliest and best experimental evaluation of teacher merit pay (by the National Center on Performance Incentives) found that teachers eligible for bonuses did not increase their students’ test scores more than those not eligible. Earlier in the year, a Mathematica study of Chicago’s Teacher Advancement Program (which includes data on the first two of the program’s three years) reached the same conclusion.
The almost universal reaction from the market-based reformers was that merit pay is not supposed to generate short-term increases in test scores, but rather to improve the quality of applicants to the profession and their subsequent retention.
In the area of charter schools, a US Department of Education report found no test score benefits for charter middle schools. An article published in the American Journal of Education not only found no achievement advantage for charters, but also found that a measure of “innovation” was actually negatively associated with score gains.
So, some of the best work ever on charters and merit pay came in 2010, with very lackluster results, just as a massive wave of publicity and funding awarded these policy measures a starring role in our national education policy.
This past year was also bountiful for value-added research. Strangely, the value-added analysis that got the most attention by far – and which became the basis for a series of Los Angeles Times stories – was also among the least consequential. The results were very much in line with more than a decade of studies on teacher effects. The questionable decision to publish teachers’ names and scores, on the other hand, generated intense public controversy.
From a purely research perspective, other studies were far more important. Perhaps most notable in terms of practical implications was a report published by the US Education Department showing high error rates that persisted even with multiple years of data. On a similar note, a look at teacher value-added scores in New York City – the largest district in the nation – found strikingly large error margins.
The news wasn’t all bad, of course, and value-added research almost never lends itself to simple “yes/no verdicts.” For instance, the recently released preliminary results from the Gates-funded research study on Measures of Effective Teaching (MET) provide new evidence that alternative measures of teacher quality, most notably student perceptions of their teachers’ effectiveness, maintain modest but significant correlations with value-added scores. Similarly, another 2010 paper published by the National Bureau of Economic Research found an association between principal evaluations and value-added scores, and also demonstrated that principals may use this information productively.
In contrast to absurd mass media claims that these preliminary MET results validate the use of value-added scores in high-stakes decisions, the first round of findings represents the beginning of a foundation for building composite measures of teacher effectiveness. The final report of this effort (scheduled for release this fall) will be of greater consequence.
There were also some very interesting teacher quality papers that didn’t get much public attention, all of which suggest that our understanding of teacher effects on test scores is still very much a work in progress. There are too many to list, but one particularly clever and significant working paper from the NBER found that the “match quality” between teachers and schools explains about one-quarter of the variation in teacher effects (i.e., teachers would get different value-added scores in different schools).
A related, important paper from another research institute found that teachers in high-poverty schools get lower value-added scores than those in more affluent schools, but that the differences are small and do not arise among the top teachers (and cannot be attributed to higher attrition in poorer schools). The researchers also found that the effects of experience are less consistent in higher-poverty schools, which may explain the discrepancies by school poverty.
These contextual variations in value-added estimates carry substantial implications for the use of these estimates in high-stakes decisions. They also show how we’re just beginning to address some of the most important questions about these measures’ use in actual decisions.
In the longer term, though, the primary contribution of the value-added literature has been to show that teachers vary widely in their effect on student test scores, and that most of the variation is unexplained by conventional variables. These findings remain well established. But whether we can use value-added estimates to identify persistently high- and low-performers is still very much an open question.
Nevertheless, 2010 saw hundreds of states and districts move ahead with incorporating heavily weighted value-added measures into their evaluation systems. The reports above (and many others) sparked important debates about the imprecision of all types of teacher quality measures, and about how to account for this error while building new, more useful evaluation systems. The Race to the Top-fueled rush to design these new systems might have benefited from this discussion, and from more analysis to guide it.
Overall, while 2010 will certainly be remembered as a watershed year for market-based reforms, this wave of urgency and policy changes unfolded concurrently with a steady flow of solid research suggesting that extreme caution, not haste, is in order.
Matthew Di Carlo
Albert Shanker Institute