Public understanding of the limitations of tests

There is a need to increase transparency and openness about educational assessments and to improve public confidence in them, writes Paul E. Newton of the Research and Statistics Team, Qualifications and Curriculum Authority, London. He notes that many believe increased public understanding is incompatible with public confidence because error is inherent in assessment.

Newton argues that not understanding measurement inaccuracy is a far greater threat than understanding it, since it will cause the system to be repeatedly held accountable for more than it can possibly deliver. Newton explains his ethical and practical arguments in favor of educating the public about the inherent limitations of educational measurement.

Understanding measurement inaccuracy

One key point is that users who fail to understand measurement inaccuracy will not be equipped to draw valid conclusions from results. Another important ethical point is that diagnostic error is not only extremely common but can also carry very high stakes.

A certain amount of measurement error is due to factors beyond the control of schools, for example, when a student fails to put in sufficient effort or misreads a question. When a question is effective for the vast majority of students with no evidence of bias against prominent subgroups of students, this is probably as good as we can expect.

However, some decisions made for pragmatic reasons can raise the amount of inaccuracy in scores. An example is the decision to have essays scored by a single examiner. Grading every essay twice will produce more reliable scores, but the education department may not have the resources to pay for multiple scoring.
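
The gain from double scoring can be illustrated with the Spearman-Brown formula from classical test theory, which predicts the reliability of an average of several examiners' scores from the reliability of a single examiner. This is a minimal sketch and is not taken from Newton's paper; the function name and the single-examiner reliability of 0.70 are hypothetical values chosen only for illustration.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability of the mean of n_raters independent scorings.

    Assumes each examiner is equally reliable (classical test theory).
    """
    r = single_rater_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical figure: one examiner's essay scores have reliability 0.70.
single = 0.70
double = spearman_brown(single, 2)
print(f"one examiner: {single:.2f}, two examiners: {double:.2f}")
```

Under these assumed numbers the predicted reliability rises but does not reach 1, which matches the point that extra scoring reduces error without removing it.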

Sometimes errors are apparent only with the benefit of hindsight. Even when agencies have agreed on procedures and exercised due diligence, unforeseen circumstances can make results less accurate than expected. Sometimes, however, error is the result of failing to observe due process or to exercise necessary care and attention. Newton points out that it is important to understand that inaccuracy does not imply fault or culpability, nor does lack of culpability imply accuracy.

Error is an inevitable characteristic of measurement. If we accept the need for educational measurement, then we must accept the inevitability of error. There is no such thing as perfect validity, reliability or comparability. These technical concerns are traded off against each other and against pragmatic concerns including manageability, security and cost-effectiveness.
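
One common way to express this inevitability in numbers is the standard error of measurement (SEM) from classical test theory, computed as the score standard deviation times the square root of one minus the reliability. The sketch below is illustrative only and does not come from Newton's paper; the function names, the score scale, and the reliability figure are all assumptions.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), per classical test theory."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, score_sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate band around an observed score (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test: scores have SD 10 and reliability 0.91.
low, high = score_band(observed=60.0, score_sd=10.0, reliability=0.91)
print(f"observed 60, plausible range about {low:.1f} to {high:.1f}")
```

Even with the fairly high reliability assumed here, the band spans several score points, so two students a point or two apart cannot be confidently ranked.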

Weighing technical and pragmatic concerns

The weighing of technical and pragmatic concerns must be informed by the consequences of error for those who will use the results for a variety of purposes. Low-stakes purposes (instructional planning in a classroom) cushion technical limitations; high-stakes purposes (graduation or admission requirements) exacerbate them. The significance of measurement inaccuracy, and the extent to which it can be tolerated, is always a function of the use to which results will be put and cannot be discussed independently of this.

Encourage reporting of error

Assessment agencies can do more to address the problem of error. Behind many errors lie conditions that magnify their effect. Such conditions include legislation requiring the use of results from a single test to determine graduation; the requirement for scores to be reported faster than can be done dependably; the use of tests to track achievement trends without any external confirmation of validity; and rushing the piloting process for test questions. Inaccuracy can be reduced by creating a culture that actively encourages and facilitates the reporting of error and actively discourages assigning blame. New technologies may also help address the problems of measurement inaccuracy.

Newton calls for more research on the reliability and validity of national tests. Research should aim to generate findings that can be usefully communicated to educators and the public. Evidence from research studies could be synthesized into defensible arguments and published for a lay audience. Agencies could publish more general statements on the strengths and weaknesses of their assessments, outlining the kinds of inferences that may and may not be drawn from the results. In addition, Newton recommends that test publishers and users avoid simplistic statements about tests when speaking to the media.

It is essential, in Newton’s opinion, that measurement inaccuracy be understood by all those who use tests. Measurement inaccuracy is not something educational agencies should try to hide. Tests and examinations are blunt instruments; assessment results should be thought of as estimates. Finding the right trade-off between technical and practical concerns should be a matter of public debate. Assessments represent an important advance in ensuring that very large numbers of students are judged and rewarded fairly. To gain the public trust, assessment agencies should be proactive in increasing transparency and educating the public about the inevitability of some measurement error.

“The Public Understanding of Measurement Inaccuracy”, British Educational Research Journal, Volume 31, Number 4, August 2005, pp. 419-442.

Published in ERN October 2005 Volume 18 Number 8
