Caution Needed in Interpreting PISA 2015 Results

Last year, the Director of Education at the OECD, Andreas Schleicher, admitted that the switch from pen-and-paper to computer tests for the PISA 2015 assessments may have contributed to significant falls in results amongst higher performing countries. A new research paper published by the Centre for Education Economics in the UK provides more evidence for this.

The study by Professor John Jerrim from the Institute of Education at University College London found that students completing the computer-based test performed substantially worse than students completing the paper-based test. It also found that differences remained after an OECD adjustment of the results to compensate for the change in the mode of the tests.

There were widespread declines in the PISA 2015 science and mathematics scores of high performing countries. The average OECD science score in PISA 2015 was around eight points lower than in 2012, and 11 of the top 30 countries in 2012, including Australia, had a significant decline in achievement in 2015. For example, the Hong Kong science score fell by 32 points, which is the equivalent of a year of learning; Korea’s score fell by 22 points and Finland’s by 26 points. Korea’s mathematics score fell by 30 points and Hong Kong’s fell by 13 points.

Australia’s science score fell by 11 points after a small decline from 2006 to 2012. Australia’s reading score dropped by 9 points after virtually no change between 2006 and 2012. Its mathematics score fell by 10 points, although this followed a steady decline between 2006 and 2012.

The study examined the impact of the switch to computer-based assessment by using data from the OECD field trial for PISA 2015 carried out in 2014 in all countries that made the switch. Students taking part in the field trial were randomly assigned to complete the same PISA questions on a computer or using paper and pen, so it was possible to draw causal inferences. The study used data from Germany, Ireland and Sweden.

It found that students who took the computer version of the PISA test scored much lower than their peers who were randomly assigned to the paper version. The average difference across reading, mathematics and science results in Germany was 22 points, or over six months of schooling, including a 26-point difference in science. The average difference was 15 points in Ireland and 11 points in Sweden.

The OECD had recognised the problem of a possible test mode effect and adjusted the PISA 2015 results to compensate for the change. However, the study found that while the correction applied by the OECD did reduce the test mode effect, statistically significant differences remained in the German and Irish results, but not in the Swedish results. After the correction, the difference in science was 15 points in Germany and 11 points in Ireland. The average difference across reading, mathematics and science was 13 points in Germany, 7 points in Ireland and 2 points in Sweden.
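
To put the before and after figures side by side, the short Python sketch below works out how much of each country's average gap was still present after the OECD correction. It uses only the average point differences quoted in this article, and it assumes the unadjusted and adjusted averages are directly comparable; the function name `remaining_share` is made up for this illustration and does not come from the study.

```python
# Average paper-vs-computer gaps in PISA points, as quoted in this article.
# Assumption: the unadjusted and adjusted averages are directly comparable.
unadjusted = {"Germany": 22, "Ireland": 15, "Sweden": 11}
adjusted = {"Germany": 13, "Ireland": 7, "Sweden": 2}


def remaining_share(before, after):
    """Fraction of the original gap still present after the adjustment."""
    return after / before


shares = {
    country: remaining_share(unadjusted[country], adjusted[country])
    for country in unadjusted
}

for country, share in shares.items():
    print(
        f"{country}: {adjusted[country]} of {unadjusted[country]} points "
        f"remain ({share:.0%})"
    )
```

On these figures, a little over half of the German gap and a little under half of the Irish gap were left after the correction, compared with under a fifth of the Swedish gap. This is simple arithmetic on the published averages and says nothing about statistical significance, which the study assessed separately.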

The OECD PISA 2015 report itself noted that it is not possible to rule out small and moderate effects of the mode of delivery on the mean performance of countries [p.188]. Last year, Schleicher told The Times Educational Supplement that the move to computer-based tests might be part of the reason behind the fall in results in some high performing East Asian countries. He said:

It remains possible that a particular group of students – such as students scoring [high marks] in mathematics on paper in Korea and Hong Kong – found it more difficult than [students with the same marks] in the remaining countries to perform at the same level on the computer-delivered tasks.
Such country-by-mode differences require further investigation – to understand whether they reflect differences in computer familiarity, or different effort put into a paper test compared to a computer test.

Jerrim concluded that policy makers should take great care when comparing the computer-based PISA 2015 results to the paper-based results from previous PISA cycles. The change in test mode could have contributed to the fall in rankings of some countries, but there is not enough evidence available to draw a firm conclusion.

Trevor Cobbold

