A new paper from the OECD Directorate of Education [Morris 2011] highlights the negative impacts of high-stakes testing on education. It says that the evidence on whether standardised tests improve student outcomes is unclear, but there is greater certainty that they increase strategic behaviour by schools and teachers, which may nullify any positive effects.
The paper synthesises the findings of research studies on the impact of standardised tests on student outcomes and on education more broadly. It finds that test results are distorted by strategic responses from schools and teachers, such as teaching to the test, narrowing of the curriculum, teacher cheating and the exclusion of students from the tests.
It says that the problems with teaching to the test are twofold. When schools and teachers emphasise test-taking skills and concentrate on tested content, test scores tend to be inflated without any corresponding increase in students' understanding of concepts. It cites another OECD paper [Shewbridge et al. 2011], which states:
Research from the United States has shown that if national tests are considered to be ‘high stakes’ for teachers and schools, teaching to the test can easily lead to an artificial over-inflation of results and thus render the results useless as a measure of real progress.
Furthermore, when instruction is narrowly focused on specific knowledge, skills and question formats, test results become an increasingly misleading measure of student achievement. For example, teaching to the test can take the form of a greater focus on tested content areas and less attention to creative, innovative and oral skills.
Another limitation of standardised tests is that they cannot assess achievement across the whole curriculum. In evaluation systems where incentives are attached to test results and teachers and schools are pressured to improve test scores, schools and teachers tend to focus on the aspects of the curriculum that will be tested.
The paper distinguishes curriculum narrowing from teaching to the test. Curriculum narrowing refers to an increasingly unbalanced focus on content areas that will be tested, at the expense of non-tested areas, whereas teaching to the test refers to using test items as teaching material. Standardised tests often assess literacy and mathematics but rarely assess other subject areas such as the sciences, civics, history, the arts and foreign languages.
Curriculum narrowing leads to more time spent on tested areas, like mathematics and reading, and less time on non-tested content, such as history. As a consequence, teachers may overemphasise the subjects that are tested.
When standardised tests are linked to accountability measures, schools and teachers are pressured by incentives and sanctions to improve student test scores. Researchers have found that one response to this pressure is to manipulate the student population and exclude low-performing students from taking standardised tests. For example, the paper says that there is clear evidence that schools have responded to accountability pressures by reclassifying low-performing students as students with disabilities so that they can be excluded from the tests.
Using standardised tests in accountability systems can also lead to teacher cheating. Teachers have been found to engage in a number of activities which manipulate student test data, including changing student responses, filling in answers that were left blank, allowing additional time for testing, providing correct answers to students or obtaining copies of the exam prior to the test date. Many recent US news reports have brought to light teacher cheating scandals in several US states, further highlighting the prevalence of this problem.
Other negative consequences are also well-documented. The paper says that the repercussions of standardised tests can extend beyond immediate student outcomes and affect the school system and the teacher labour market in complex and pervasive ways.
One such impact is on the teacher labour market. Studies have shown that test-based accountability systems have increased teacher attrition in schools classified as failing, relative to other schools. The paper says that the effect of standardised tests on the labour market can be detrimental if effective teachers shy away from working in ‘low-performing’ schools.
In contrast to the studies on the negative consequences of high-stakes testing, there is little consensus about its impact on student achievement. Reviews of research studies have come to different conclusions. For example, one review [Figlio & Loeb 2011] concludes:
While, in general, the findings of the available studies indicate achievement growth in schools subject to accountability pressure, the estimated positive achievement effects of accountability systems emerge far more clearly and frequently for mathematics than for reading.
Another review [Mons 2009] finds similar differences in the impact of accountability pressure on mathematics and reading scores:
The research revealed inconsistencies in the results for mathematics and reading: in some cases, reforms were linked to performance improvements, and others saw a decline. Evidence of the relationship between improved student outcomes and standardised testing is “unpredictable” and is a product of a number of complex policy decisions and implementation structures.
A report by the U.S. National Research Council [Hout & Elliott 2011] summarises the literature on the effect of the No Child Left Behind (NCLB) legislation on student outcomes and comes to a more sobering conclusion than the Figlio & Loeb review. While Figlio & Loeb highlight studies in which NCLB had a positive impact on mathematics outcomes, the NRC report finds that studies of the impact of NCLB show positive, negative and non-significant results. It finds that statistically significant effects are concentrated in one area: Year 4 mathematics. Results for Year 8 mathematics and for reading in both Year 4 and Year 8 are not significant or, in some cases, negative.
The NRC report concludes that the gains in student outcomes from test-based incentives such as NCLB are small:
…the evidence related to the effects on achievement of test-based incentives to schools appears to be modest, limited in both size and applicability.
Test-based incentive programs, as designed and implemented in the programs that have been carefully studied, have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries. When evaluated using relevant low-stakes tests, which are less likely to be inflated by the incentives themselves, the overall effects on achievement tend to be small and are effectively zero for a number of programs.
The OECD paper concludes from its review of such studies:
In the U.S. case, which tends to dominate the literature due to the controversial effects of the NCLB policy, the impact on student outcomes appears to be positive in terms of elementary mathematics, yet not statistically significant for reading and not great enough to produce the improvement in student learning that the policy aims to achieve.
All this goes to show that the case for high-stakes accountability testing is not strong. The impact on student outcomes is uncertain and, at best, small, while the negative consequences are much more apparent and are likely to nullify any positive effects.
Figlio, D. and S. Loeb 2011, School Accountability. In E. Hanushek, S. Machin and L. Woessmann (eds.), Handbook of the Economics of Education, Vol. 3, North-Holland, The Netherlands, pp. 383-421.
Hout, M. and S. Elliott (eds.) 2011, Incentives and Test-Based Accountability in Education, National Research Council, The National Academies Press, Washington, D.C.
Mons, N. 2009, Theoretical and Real Effects of Standardised Assessment, Background paper to the study: National Testing of Pupils in Europe, Eurydice Network.
Morris, A. 2011, Student Standardised Testing: Current Practices in OECD Countries and a Literature Review, Working Paper No. 65, OECD Directorate of Education, Paris, 11 October.
Shewbridge, C., E. Jang, P. Matthews and P. Santiago 2011, OECD Reviews of Evaluation and Assessment in Education: Denmark, OECD, Paris.