Questioning the validity and policy uses of international tests

In the most recent issue of Curriculum Inquiry, educational researchers debate the uses and abuses of international tests, particularly the Third International Mathematics and Science Study (TIMSS). In nontechnical language they describe the technical challenges of such large-scale testing programs and debate the way policy makers interpret and use the results. In his introduction, Joseph P. Farrell, Ontario Institute for Studies in Education/University of Toronto, outlines the debate. Now that large-scale achievement testing is used for high-stakes accountability purposes in the No Child Left Behind Act, Farrell believes it is important for educators to understand what these tests set out to measure and the limits of their usefulness.

Researchers in many countries – both those that do well on international tests and those that do less well – are trying to find out why their students perform as they do. The degree to which a country’s curriculum corresponds to the test certainly influences its performance. Not surprisingly, students do better when they have studied the material a test measures than when much of that material was never covered in their classes. Some researchers contend that these tests measure an underlying trait not closely aligned with any specific curriculum, but they have been unable to define this trait.

Educators know that overall performance on a test can be improved by increasing the amount of time devoted to that subject in school or by increasing the amount of time devoted to those aspects of the subject that are included in the test. Performance can also be improved by removing students likely to do poorly on the test. There are examples around the world of schools using both of these “solutions” to raise their scores.

Another important point, made by the researchers who developed these tests, is that cross-national tests “were never intended to, and indeed cannot, relate in any direct way to local policy/practice issues among the participating places.” They contend that international comparisons were intended as research enterprises to build knowledge and theory, not policy. Tension always exists, says Farrell, between those who want to build knowledge about how people learn and how we might promote learning in various national or cultural circumstances, and those who want quick fixes for local educational policy.

Farrell argues that these international tests were developed as a scientific and systematic comparative study of educational systems – as a way to test hypotheses about factors that influence how children learn in all parts of the world. The founders of the International Association for the Evaluation of Educational Achievement did not believe that these tests could produce useful cross-national comparisons of test scores, and they did not believe that their work would have any direct or immediate policy relevance for participating nations. Test performance between and within countries varies with factors such as the society’s technological level and urbanization, the social background of children, the education of parents, the training of teachers, the money spent on education, and the number of classes taken and overall time spent studying a subject. Performance also varies significantly with the structure of school systems and with how many children reach each level of the educational ladder. The test designers’ main purpose was to test hypotheses about education, not to compare achievement between countries.

Targeted follow-up studies of international results, however, are a good and proper use of these data. When results are surprising, designing a detailed local study to provide information about methodologies may lead local educators to rethink educational practice. One example of a local follow-up study was undertaken by French educators interested in finding out why French students do considerably better than American students in mathematics at some grade levels. The study involved classroom observation, interviews with teachers and teacher-educators, and analyses of French mathematics curricula and teaching materials. The French researchers compared their findings to similar data from U.S. elementary classrooms. The results indicate that French culture holds mathematics in high regard: mathematics plays an important role in the elementary curriculum, both in the time allocated to it and the importance assigned to it. In addition, French teachers occupy a prestigious role in their society, and teacher education is highly competitive.

Farrell asserts that disagreements about the proper meaning and use of comparative tests have existed since the tests were first developed. He urges us to remember that the developers’ purpose was to study the science behind learning, and not to compare the quality of education in different countries. He cautions that if we ignore the original purpose for which these tests were designed, we are likely to draw unwarranted conclusions.

“The Use and Abuse of Comparative Studies of Educational Achievement,” Curriculum Inquiry, Volume 34, Number 3, Fall 2004, pp. 255–265.

Published in ERN November/December 2004 Volume 17 Number 8
