New York City Schools Chancellor, Joel Klein, was exposed as a dissembler at his National Press Club address in Canberra last week. Under forensic questioning from The Canberra Times’ education reporter, Emma Macdonald, Klein resorted to lies and deceptions to justify his claims of increases in student achievement in New York City schools.
Macdonald challenged Klein on his claims by citing national reading and mathematics assessments which show that there has been no improvement in student achievement in NYC since 2003, except for 4th grade mathematics. She questioned him on whether the grades given to schools in this year’s school progress reports had been manipulated by reducing the cut-off scores to achieve an A or B.
Klein denied both charges. He said that Macdonald was wrong on both facts. His response was to falsely assert that the cut-off scores for school grades had not been reduced, to falsely claim that New York State tests were a better measure of student achievement than the independent national assessments, and to selectively cite evidence about the success of African-American students.
Cut-off Scores for School Grades
Klein’s answer that the cut-off scores for school grades were not reduced in 2008 is an outright lie. The Educator Guides for school progress reports published by the New York City Department of Education [2007a, 2007b, 2008a, 2008b] provide the actual cut-off scores for each grade level and show that they were reduced for the 2007-08 reports. For example, the cut-off score for an elementary school to achieve an A was reduced from 64 for the 2006-07 progress reports to 59.6 for 2007-08 and the score for a B was reduced from 49.9 to 45.8. The cut-off score for a high school to achieve an A was reduced from 67.6 to 64.2 and that for a B reduced from 48.8 to 43.5.
Not surprisingly, there was a large increase in schools achieving an A or B in 2007-08 compared to the previous year. This year, 79% of elementary schools received an A or B compared to 71% last year and 83% of high schools received an A or B compared to 65% in 2006-07.
Virtually all of the increase in elementary schools achieving an A or B and about two-thirds of the increase for high schools appears due to the reduction in grade cut-off scores. If last year’s cut-off points had been used, only 72% of elementary schools would be rated as A or B, almost no change from last year, and only 71% of high schools.
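The attribution above follows from simple arithmetic on the percentages quoted in this article. The following is a minimal sketch in Python; the function name is chosen for illustration only and the inputs are the figures given in the text.

```python
def share_due_to_cutoffs(previous_pct, actual_pct, pct_under_old_cutoffs):
    """Fraction of the rise in A/B schools that disappears under the old cut-offs."""
    total_increase = actual_pct - previous_pct
    increase_under_old_cutoffs = pct_under_old_cutoffs - previous_pct
    return (total_increase - increase_under_old_cutoffs) / total_increase


# Percentages of schools rated A or B, as quoted in the text.
elementary = share_due_to_cutoffs(previous_pct=71, actual_pct=79, pct_under_old_cutoffs=72)
high_school = share_due_to_cutoffs(previous_pct=65, actual_pct=83, pct_under_old_cutoffs=71)

print(f"Elementary schools: {elementary:.0%} of the increase")
print(f"High schools: {high_school:.0%} of the increase")
```

For elementary schools, seven of the eight percentage points of increase are accounted for by the lower cut-offs. For high schools, it is twelve of the eighteen points, or two-thirds.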
The reduction in cut-off scores is not even mentioned in the list of changes to the school progress reports appended to the new technical guides to the reports published by the Department of Education. The guides for the 2006-07 reports stated the cut-off scores ‘will be used for the next several years’. They lasted only one year before being revised down.
It appears that the cut-off scores are to be increased for 2008-09. For example, the Department has announced that the new cut-off score for a high school to achieve an A is 70 and for a B it is 54. These cut-off scores are even higher than they were for the 2006-07 school progress reports.
In answering Macdonald’s question, Klein attempted to use this increase as a smokescreen to avoid admitting that the cut-off scores had been reduced for this year’s progress reports. Not only did it expose him as a dissembler, but it also demonstrated the unreliability of his school reporting system for comparing school performance from one year to the next.
The changes in cut-off scores are among many changes made to the methodology for calculating school scores and grades for the school progress reports. Several other changes were made for the 2007-08 reports. They invalidate comparisons of school performance over time and show that the reporting system lauded by Julia Gillard lacks integrity and credibility.
National vs. State Tests
Klein claims that the New York State tests are a better accountability measure because they are mandatory whereas the National Assessment of Educational Progress (NAEP) conducted by the US Department of Education is based on only a sample of students. This claim is false on several grounds.
First, NAEP is based on a properly constructed, valid statistical sample of the student population. It is widely seen by experts as the best measure of student achievement in the United States. It is very carefully designed to measure trends in student achievement, it reflects a consensus about what students should know, and its coverage of the tested curriculum areas is broad [Koretz 2008: 92]. As such, the NAEP is no less valid than the State tests, despite Klein’s claim. Any statistics textbook will say that a well-constructed random sample can estimate the true result for a population to within a small, quantifiable margin of error, without an exhaustive count of all potential participants. This is the whole point of statistical sampling.
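The sampling point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is simulated, not NAEP data, and it uses a simple random sample, which is a simplification of any real survey design.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated test scores (not NAEP data).
population = [random.gauss(250, 35) for _ in range(100_000)]

# Simple random sample of 3,000 students, drawn without replacement.
sample = random.sample(population, 3_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# Standard error of the sample mean: sample standard deviation / sqrt(sample size).
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"Population mean: {population_mean:.1f}")
print(f"Sample mean:     {sample_mean:.1f} (standard error {standard_error:.2f})")
```

With a sample of only 3 per cent of this simulated population, the standard error is well under one scale point, so the sample mean would be expected to fall very close to the mean of the full population.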
Second, the NAEP has the advantage of providing consistent comparisons over time. The New York State tests do not provide such comparisons as there have been significant changes to these tests in recent years.
The State test data are not strictly comparable between 2003 and 2007/8. In 2006, the New York State Department of Education expanded its English and mathematics testing program to grades 3 – 8. Previously, the State tests were administered in grades 4 and 8 and tests for grades 3, 5, 6 and 7 were administered by the New York City Department of Education. The State tests for grades 3 – 8 include both multiple choice and extended response questions while the City tests comprised multiple choice questions only. As a result of these changes, the results from 1999 to 2005 cannot be compared to those from 2006 to 2008 because the State Department changed the scale scores with the expansion of the State tests for English and mathematics.
Other changes have also been made since then. For example, there are now more stringent requirements for participation of English Language Learners in the State tests.
These changes make consistent comparisons over time impossible, and it is therefore impossible to determine whether student achievement increased or declined over the period. This lack of comparability in the State test data over time is not acknowledged in Klein’s claims about improvement in student achievement on the basis of the State results.
Third, Klein demonstrates no understanding of the protocols for reporting statistics accurately. Because all tests are samples of student achievement there are random errors involved in interpreting the results as estimates of true student achievement. For this reason, a fundamental aspect of reporting test results is to report estimates of the margins of error in order to aid accurate interpretation of the results. Margins of error are reported for the NAEP tests but not for the New York State tests. As a result, it is impossible to accurately interpret changes in the New York State test results.
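To show why margins of error matter, the sketch below applies a common textbook check: a two-sided z-test for the difference between two independent estimates. The scores and standard errors are hypothetical numbers for illustration only. They are not NAEP or New York State results, and the function name is an assumption of this example.

```python
import math


def change_is_significant(score_before, se_before, score_after, se_after, critical_z=1.96):
    """Two-sided z-test for a difference between two independent estimates."""
    difference = score_after - score_before
    # Standard error of a difference between independent estimates.
    se_difference = math.sqrt(se_before ** 2 + se_after ** 2)
    z = difference / se_difference
    return abs(z) > critical_z, z


# Hypothetical scale scores and standard errors, for illustration only.
small_change = change_is_significant(206.0, 1.2, 208.0, 1.3)
large_change = change_is_significant(226.0, 1.0, 231.0, 1.1)

print("2-point rise significant at the 5% level?", small_change[0])
print("5-point rise significant at the 5% level?", large_change[0])
```

In this illustration the 2-point rise cannot be distinguished from random error, while the 5-point rise can. Without published standard errors, no such check can be made.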
The National Center for Education Statistics, which administers the NAEP, has criticised the failure of the New York City Department of Education to report margins of error on test results. It has said that Klein’s conclusions about progress on student achievement are “incomplete” and “do not take into account whether the changes or differences are statistically significant” [Green 2008]. Klein’s response was that confidence limits do not matter and that statistical significance is “playing something of a game”. This demonstrates a cavalier and irresponsible disregard for the importance of reporting test results accurately to ensure that those using the results are not misinformed or misled.
Fourth, New York State tests are not independently administered and are open to manipulation in various ways. The tests are administered by teachers and schools, and the “high stakes” associated with publishing school results create incentives to cheat and rort the system. For example, schools and teachers are able to score parts of their own students’ tests. US research shows that teachers who administered the tests of their own students were approximately 50 percent more likely to cheat [Jacob & Levitt 2003]. Several allegations of cheating and other rorting of the State tests have been published in the New York press in recent years, including in 2008.
In contrast, the NAEP is designed and administered in a way that makes it more robust against manipulation [Thiessen et al. 2008]. It is not used to make “high stakes” decisions about schools, principals or individual teachers so there is less incentive to try to manipulate the results. Teachers do not have access to the test items before the tests are administered. The US Department of Education hires staff to administer the tests and classroom teachers can monitor the administration.
Thus, contrary to the claims of the New York City Schools Chancellor that the State tests are a better accountability measure than the national tests, it is the State test results which are the less reliable measure.
In his response to Emma Macdonald, Klein said she was wrong on the facts she presented about no overall improvement in student achievement in New York City since 2003, except for grade 4 mathematics. It is Klein who is wrong.
The NAEP tests show no statistically significant change in average student scores for reading in Grades 4 and 8 between 2003 and 2007 in New York City [Lutkus, Grigg & Donahue 2007; Lutkus, Grigg & Dion 2007]. They show a small improvement in Grade 4 mathematics but no improvement in Grade 8.
Nor was there any general improvement for disadvantaged students. There was no improvement in average reading scores for low income, Black and Hispanic students in either Grade 4 or 8. There were small improvements in average mathematics scores in Grade 4 for low income, Black and Hispanic students. In Grade 8 mathematics there was no improvement for Black and Hispanic students, but a slight improvement for low income students.
In his attempt to demonstrate improvement in student achievement, Klein focused on the achievement of African-American students. He stated that New York City is at the top of the list nationally for African-American 4th grade students in reading and mathematics. This was a case of using selective evidence.
While it is largely true that 4th grade African-Americans performed above their counterparts nationally in 2007, this was not the case in 8th grade reading and mathematics where the New York City results were similar to those nationwide (see table below). In terms of changes in the relative position of New York City since 2003, there was an improvement in 4th grade mathematics, but a deterioration in 8th grade reading and mathematics. Moreover, the advantage held by New York City over other large cities for the 4th grade cohort in 2003 had disappeared by the time it reached 8th grade in 2007. This is hardly the record of success claimed by Klein.
The details of this rebuttal of Klein’s claims are as follows.
Average scale scores in 4th grade mathematics for New York City African-American students in 2007 were higher than the averages for large US cities and the nation [Lutkus, Grigg & Donahue 2007; Lutkus, Grigg & Dion 2007]. However, contrary to Mr. Klein’s claim, the average scale scores in reading for 4th grade African-American students in 2007 were statistically similar to the average for the nation, although they were higher than the average for large US cities.
In addition, there was no statistical difference between the average scale scores for NYC African-American students in 8th grade reading and mathematics and the average for large US cities and the nation.
The position of NYC African-Americans in reading was largely unchanged since 2003, when Klein’s changes began to be implemented. In that year, the average scale score in 4th grade reading was higher than the average for the large cities but similar to that of the nation, as was the case in 2007 [Lutkus & Weiner 2003a]. The average scale score in 4th grade mathematics was higher than the average for the large cities in 2003 [Lutkus & Weiner 2003b] as it was in 2007, but it improved from being similar to the national average in 2003 to above the national average in 2007.
In 8th grade, the position of NYC African-American students slightly deteriorated between 2003 and 2007. In 2003, the average scale scores in reading and mathematics were higher than the averages for large US cities and statistically similar to the national averages, but by 2007 the NYC scores were similar to those of both the large cities and the nation.
Moreover, the position of the 4th grade cohort of African-Americans deteriorated by the time it got to 8th grade. As noted above, the average scale score in 4th grade reading and mathematics in 2003 was higher than the average for the large cities but similar to that of the nation. However, by 2007, the average scale score of the same cohort in 8th grade was similar to both other averages.
Test Scores of New York City African-American Students Compared to African-American Students in Large US Cities and the Nation
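| |Compared to large US cities, 2003|Compared to large US cities, 2007|Compared to the nation, 2003|Compared to the nation, 2007|
|---|---|---|---|---|
|4th Grade Reading|Higher|Higher|Similar|Similar|
|4th Grade Mathematics|Higher|Higher|Similar|Higher|
|8th Grade Reading|Higher|Similar|Similar|Similar|
|8th Grade Mathematics|Higher|Similar|Similar|Similar|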
Note: The relative positions take account of statistical margins of error.
Sources: Lutkus & Weiner 2003a, 2003b; Lutkus, Grigg & Dion 2007; Lutkus, Grigg & Donahue 2007.
Jacob, B. and Levitt, S. 2003. Rotten apples: an investigation of the prevalence and predictors of teacher cheating. Quarterly Journal of Economics 118 (3): 843–878.
Koretz, Daniel 2008. Measuring Up: What Educational Testing Really Tells Us. Harvard University Press, Cambridge, MA.
Lutkus, A.; Grigg, W. and Dion, G. 2007. The Nation’s Report Card: Trial Urban District Assessment Mathematics 2007 (NCES 2008-452). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.
Lutkus, A.; Grigg, W. and Donahue, P. 2007. The Nation’s Report Card: Trial Urban District Assessment Reading 2007 (NCES 2008-455). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.
Lutkus, A.D. and Weiner, A.W. 2003a. The Nation’s Report Card: Trial Urban District Assessment, Reading Highlights 2003 (NCES 2004-459). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.
Lutkus, A.D. and Weiner, A.W. 2003b. The Nation’s Report Card: Trial Urban District Assessment, Mathematics Highlights 2003 (NCES 2004-458). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.
New York City Department of Education 2007b. Educator Guide: The New York City Progress Report, High School. [Copy available from author]
Thiessen, Brad; Magda, Tracey and Ho, Andrew 2008. Nonparametric Comparisons of High-Stakes and Low-Stakes Trends: 2003 – 2007. Paper presented at the Annual Meeting of the American Educational Research Association, New York City, March 24-28.