Labor’s cash bonuses for schools and teachers will be determined in part by improvements in test scores. There is little evidence that such schemes increase student achievement, and they are likely to do further harm to education. The plan also glosses over the difficulty of reliably and accurately measuring the effect that teachers and schools have on students’ learning growth.
The following is a slightly edited version of a submission made last year by the renowned US academic Professor Helen Ladd to the US Department of Education, criticising President Obama’s plan to require states to use test scores to evaluate the performance of teachers and principals.
Comments on Proposed Regulations for Education Stimulus Funds
I am writing to object to the heavy emphasis in the regulations on using student test scores for the formal evaluation of teachers and school principals. While student test scores clearly have a role to play in the overall effort to improve schools, they need to be kept in their place. The regulations you are proposing give them pride of place, which will lead to little good and is likely to do much harm.
As an academic researcher with experience working with longitudinal data on students, teachers and principals, I have estimated value-added models examining the effects of teacher credentials, examined teacher and principal labour markets, and evaluated school-based accountability programs.
Potential for harm
The main problem with the proposed test-based approach is that it ratchets up the pernicious, narrow, test-based approach to education represented by No Child Left Behind (NCLB). That approach is narrow in part because the requirement that all students be tested every year means that students can be tested in only a limited number of subjects.
The result is a heavy emphasis on the basic skills of math and reading, to the detriment of other skills and orientations that young people need to become effective participants in the global society. Further, the emphasis on test results for individual teachers will exacerbate the well-documented incentives for teachers to focus on narrow test-taking skills and drilling. It is time to move beyond this misplaced emphasis on test scores in a few subjects and return to the broader goals of education that have been such an important part of our history.
Any positive effects are likely to be limited at best
One theory of action seems to be that holding teachers more accountable for the gains in their students’ test scores will induce them to become better teachers. At this point, I am not aware of any credible evidence in support of that proposition.
Indirect evidence from schools’ experiences with the test-based pressures of NCLB or its state-level precursors suggests that any positive effects are likely to be small at best. While such school-based accountability pressures, which apply to all teachers in each school rather than to individual teachers, have undoubtedly raised scores on the high-stakes tests on which they are based, there is far less evidence that they have raised scores on lower-stakes tests such as NAEP (the National Assessment of Educational Progress).
The best recent evidence on the achievement effects of NCLB comes from Tom Dee and Brian Jacob. Using NAEP data, they conclude that the federal legislation appears to have increased student achievement in fourth-grade math, but not in fourth-grade reading or in eighth-grade math or reading. Such findings, which are fully consistent with other research, provide little or no support for the view that test-based incentives for individual teachers will lead to significant gains in student achievement, especially beyond the most basic skills, which are the easiest to measure with standardized tests.
My own recent overview of the research on teacher effects highlights the tremendous difficulties of using student test scores to evaluate teachers fairly. Simple approaches that focus on whether a teacher’s students in one year perform at higher levels than her students in the previous year are clearly inappropriate because the mix of students changes from year to year.
The “solution” proposed by researchers such as Thomas Kane is to use longitudinal data that link students to their teachers to measure the achievement gains of the specific students in a teacher’s class, which can then be compared with some standard expected gain. But this “value-added” approach is fraught with difficulties.
Even the most sophisticated approaches typically cannot distinguish the contribution of teachers from the classroom context, and they generate estimates of a teacher’s quality that jump around from one year to the next, largely because of the small sample sizes for individual teachers.
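To illustrate how small samples alone can make a teacher’s estimated quality jump around, here is a minimal Python sketch under deliberately simplified, hypothetical assumptions: each student’s gain is a fixed expected gain plus the teacher’s true effect plus independent student-level noise, and the yearly estimate is the class’s mean gain minus the expected gain. The function name `yearly_value_added` and all parameter values are illustrative inventions, not any state’s actual model.

```python
import random
import statistics


def yearly_value_added(true_effect, class_size, years, noise_sd, seed=0):
    """Simulate one teacher's value-added estimate for each of several years.

    Hypothetical model (an assumption, not any real evaluation system): each
    student's gain equals a fixed expected gain plus the teacher's true effect
    plus independent Normal(0, noise_sd) student-level noise. The yearly
    estimate is the class's mean gain minus the expected gain.
    """
    rng = random.Random(seed)
    expected_gain = 10.0  # hypothetical "standard expected gain", in score points
    estimates = []
    for _ in range(years):
        # A fresh class each year; the teacher's true effect never changes.
        gains = [
            expected_gain + true_effect + rng.gauss(0.0, noise_sd)
            for _ in range(class_size)
        ]
        estimates.append(statistics.mean(gains) - expected_gain)
    return estimates


if __name__ == "__main__":
    # Same teacher, same true effect (+2 points); only the sample size differs.
    for size in (25, 2500):
        est = yearly_value_added(true_effect=2.0, class_size=size, years=5, noise_sd=15.0)
        print(size, [round(e, 1) for e in est])
```

Under these assumptions, a class of 25 with student-level noise of 15 points gives yearly estimates with a standard deviation of roughly 15/√25 = 3 points, larger than the assumed true effect of 2 points, so the same teacher can look above or below average from one year to the next purely by chance. The sketch ignores the classroom-context problem entirely, which in practice would add bias on top of this noise.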
Many states and districts are currently experimenting with various other options, including peer evaluations. If the Department of Education is seriously interested in improving the evaluation of teachers and in making those evaluations serve the constructive purpose of improving student achievement, the Department should actively encourage states to experiment with a range of approaches that differ in the extent to which they rely on student test scores.
Moreover, those experiments should all be fully evaluated. It is far too early in the development of new evaluation systems for the Department to impose its preferred solution on the states.
For both teachers and principals, it is neither fair nor constructive to try to hold them accountable for factors over which they have little control, using statistical measures that are based on a narrow range of outcomes, and that are subject to large amounts of random variability.
Helen F. Ladd is the Edgar Thompson Professor of Public Policy, Sanford School of Public Policy, Duke University. She is co-editor of the Handbook of Research in Education Finance and Policy, Routledge, New York, 2008. The full submission is available here.