Value-added ratings of teachers have been slammed in a new report by education measurement experts in the United States.
The report says that the ratings are highly error-prone, will lead to unreliable and unfair assessments and will have significant harmful consequences. It says that they should not be used as a major factor in teacher assessment and calls for a more comprehensive approach to teacher evaluation.
Value-added measurement is the big new recipe to improve the quality of teaching. It uses test scores to track the learning growth of individual students as they progress through the grades to see how much “value” a teacher has added. It is claimed this is a better way to evaluate, reward and remove teachers than existing methods.
More and more schools in the US are using test scores to evaluate teachers, determine their pay, and in some cases, terminate them. The Obama administration has required states to change laws to allow teachers to be evaluated primarily by value-added measures in order to be eligible for federal education grants.
The Los Angeles Times recently used a value-added measure to rate more than 6,000 California teachers. The Washington DC School Chancellor, Michelle Rhee, said she would consider making these value-added assessments public as well. The US Secretary of Education, Arne Duncan, has called for all school districts to make such results public.
The new report, Problems with the Use of Student Test Scores to Evaluate Teachers, was published by the Economic Policy Institute, a non-partisan, non-profit think tank based in Washington DC.
The report has unusual credibility because its ten authors are prominent education and testing experts. They include four former presidents of the American Educational Research Association; two former presidents of the National Council on Measurement in Education; and the current and two former chairs of the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences.
The report says evaluating teachers by their value-added is “unwise” because it cannot reliably and accurately identify more or less effective teachers. It cites research studies that show the results are prone to large statistical error and are highly unstable across statistical models, years and classes.
One recent study prepared for the US Department of Education found that the errors likely with value-added measurement are large enough to lead to the misclassification of many teachers. For example, it found an error rate of 26% when three years of data are used to measure teacher performance. This means that more than one in four average teachers would be misclassified as either outstanding or very poor, and more than one in four who are either outstanding or very poor would be misclassified as average.
Another study cited in the report found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. A third study found that teachers’ effectiveness ratings in one year could predict only 4% to 16% of the variation in such ratings in the following year. Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year. The same dramatic fluctuations were found for teachers ranked at the bottom in the first year of analysis.
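To give a feel for what predicting 4% to 16% of the variation means, the sketch below converts those shares into correlations and then simulates two years of ratings. It is an illustration only, not the method of any study cited in the report: the function names, the assumption of normally distributed ratings, the number of simulated teachers and the random seed are all hypothetical choices made for this example.

```python
import math
import random


def implied_correlation(r_squared):
    """Correlation implied by a share of variance explained (single predictor)."""
    return math.sqrt(r_squared)


def top_quintile_persistence(corr, n_teachers=20000, seed=0):
    """Simulate two years of ratings with the given correlation.

    Returns the share of teachers in the year-1 top 20% who are also in the
    year-2 top 20%. Ratings are assumed to be normally distributed, which is
    an assumption of this sketch and not something the report states.
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_teachers):
        a = rng.gauss(0, 1)
        b = rng.gauss(0, 1)
        year1.append(a)
        # Standard construction of a second variable correlated `corr` with the first.
        year2.append(corr * a + math.sqrt(1 - corr ** 2) * b)

    # Cut-offs for the top 20% in each year.
    cut1 = sorted(year1)[int(0.8 * n_teachers)]
    cut2 = sorted(year2)[int(0.8 * n_teachers)]

    top1 = [i for i in range(n_teachers) if year1[i] >= cut1]
    stayed = sum(1 for i in top1 if year2[i] >= cut2)
    return stayed / len(top1)


for r2 in (0.04, 0.16):
    r = implied_correlation(r2)
    share = top_quintile_persistence(r)
    print(f"variance explained {r2:.2f} -> correlation {r:.2f}, "
          f"share staying in top 20%: {share:.2f}")
```

A share of variance of 4% to 16% corresponds to a year-to-year correlation of only 0.2 to 0.4. For comparison, if ratings were pure chance, 20% of top-ranked teachers would remain in the top group the next year.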
A further study used value-added methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that value-added results are based on factors other than teachers’ actual effectiveness.
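The logic of this backwards test can be illustrated with a small simulation. The sketch below is not the study's model. It assumes, hypothetically, that students differ in a persistent factor that affects their scores and that they are either assigned to fifth grade classes at random or grouped by a noisy measure of that factor. Teachers have no effect at all in this simulation. The function name and all parameters are invented for the example.

```python
import random
from statistics import mean, pvariance


def share_explained_by_future_teacher(tracked, n_students=3000,
                                      n_teachers=30, seed=1):
    """Share of fourth grade score variance 'explained' by fifth grade class.

    No teacher effect is simulated, so any share well above chance comes
    purely from how students are assigned to classes.
    """
    rng = random.Random(seed)
    # Hypothetical persistent student factor plus test noise.
    ability = [rng.gauss(0, 1) for _ in range(n_students)]
    grade4 = [a + rng.gauss(0, 1) for a in ability]

    order = list(range(n_students))
    if tracked:
        # Non-random assignment: students grouped by a noisy view of ability.
        order.sort(key=lambda i: ability[i] + rng.gauss(0, 1))
    else:
        # Random assignment to fifth grade classes.
        rng.shuffle(order)

    size = n_students // n_teachers
    classes = [order[k * size:(k + 1) * size] for k in range(n_teachers)]
    class_means = [mean(grade4[i] for i in c) for c in classes]

    # With equal class sizes, the variance of class means is the
    # between-class variance of fourth grade scores.
    return pvariance(class_means) / pvariance(grade4)


print("random assignment: ", round(share_explained_by_future_teacher(False), 3))
print("tracked assignment:", round(share_explained_by_future_teacher(True), 3))
```

Under random assignment the future teacher explains almost none of the earlier scores, as common sense requires. Under grouped assignment the future teacher appears to "explain" a substantial share, even though no teacher in the simulation has any effect.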
The report also found that even though value-added measures purport to take socio-economic factors into account, many other factors can skew the estimates. These include the influences of other teachers, both previous teachers and, in secondary schools, current teachers of other subjects. Schools that pull out students for specialist instruction and use team teaching will not be able to accurately isolate individual teacher “effects” for evaluation, reward pay, or sanctions.
Other factors include school conditions such as the quality of leadership, curriculum materials, specialist support, class size, student welfare support and many other factors that affect student learning.
Student test score gains are also strongly influenced by school attendance and a variety of out-of-school learning experiences at home, with peers and in the community. Well-educated and supportive parents can help their children with homework and secure a wide variety of other advantages for them. Other children have parents who, for a variety of reasons, are unable to support their learning academically. Student test score gains are also influenced by family resources, student health, family mobility, and the influence of neighbourhood peers and of classmates who may be relatively more advantaged or disadvantaged.
The report says that value-added measurement generally fails to take proper account of these other factors influencing student progress in school:
…even when methods are used to adjust statistically for student demographic factors and school differences, teachers have been found to receive lower “effectiveness” scores when they teach new English learners, special education students, and low-income students than when they teach more affluent and educationally advantaged students. The non random assignment of students to classrooms and schools – and the wide variation in students’ experiences at home and at school – mean that teachers cannot be accurately judged against one another by their students’ test scores, even when efforts are made to control for student characteristics in statistical models. [p.3]
All this means that a large proportion of teachers may be wrongly assessed – some average teachers will be identified as highly or poorly performing, and some highly effective teachers and some poor teachers will be identified as average. For these reasons, the report concludes:
Recognizing the technical and practical limitations of what test scores can accurately reflect, we conclude that changes in test scores should be used only as a modest part of a broader set of evidence about teacher practice. [p.4]
It also says that using such a flawed statistical approach will have significant negative consequences including: narrowing the curriculum further; causing teachers to focus more heavily on areas that are likely to improve scores; and discouraging teachers from working with low-income students with lower test scores.
It cites research studies which show that excessive focus on basic reading and mathematics test scores can lead to narrowing and over-simplifying the curriculum to only the subjects and formats that are tested, reducing the attention to science, history, the arts, civics, and foreign language, as well as to writing, research, and more complex problem-solving tasks.
Using test scores to evaluate teachers also unfairly disadvantages teachers of students with the largest learning needs. Because of the inability of value-added methods to fully account for the differences in student characteristics and in school support measures, teachers who teach the neediest students will appear to be less effective than they are. This could exacerbate disincentives for teachers to teach students with high levels of need because it will reduce their teacher effectiveness scores.
Individual teacher rewards based on comparative student test results can also create disincentives for teacher collaboration. Better schools are collaborative institutions where teachers work across classroom and grade-level boundaries toward the common goal of educating all children to their maximum potential. A school will be more effective if its teachers are more knowledgeable about all students and can coordinate efforts to meet students’ needs.
Evaluation of teachers by value-added test scores may also have major implications for the future of the profession. The report says that the large, unpredictable variation in the results and their perceived unfairness can undermine teacher morale. It notes surveys which have found that teacher attrition and demoralization have been associated with test-based accountability efforts, particularly in high-need schools. It says:
Adopting an invalid teacher evaluation system and tying it to rewards and sanctions is likely to lead to inaccurate personnel decisions and to demoralize teachers, causing talented teachers to avoid high-needs students and schools, or to leave the profession entirely, and discouraging potentially effective teachers from entering it. [p.4]
The report concludes that there is “no shortcut” to evaluating and improving teaching. Sound evaluation of teachers necessarily involves a balancing of many factors that provide a more accurate view of what teachers do in the classroom and how that contributes to student learning.
It says that approaches other than relying on test scores and value-added measurement have been found to improve teachers’ practice while identifying differences in teachers’ effectiveness. These approaches use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and lesson plans, assignments, and samples of student work. Multiple measures of student learning gains can be used in a supplementary role.
The report says that evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems. It says that no system is perfect, but a more comprehensive system than value-added measurement is needed:
What is now necessary is a comprehensive system that gives teachers the guidance and feedback, supportive leadership, and working conditions to improve their performance, and that permits schools to remove persistently ineffective teachers without distorting the entire instructional program by imposing a flawed system of standardized quantification of teacher quality. [p.21]