Problems in educational measurement

Robert L. Brennan, University of Iowa, is a strong advocate of testing, but he expresses some reservations about inappropriate uses of tests or unwarranted claims about them. In the current climate, he asserts, the tremendous emphasis on testing is fraught with political issues, and it is often difficult to disentangle the practical issues in educational measurement from political considerations.

In a recent article in Educational Measurement: Issues and Practice, Brennan outlines the limits of our capabilities and questions some current practices.

In practical terms, Brennan says, one of the difficulties facing the measurement community is that everyone involved in the testing debate — educators, the public and politicians — exaggerates their understanding of the complexities of developing, maintaining and administering a rigorous testing program.

The fact that test scores are used to make decisions necessitates setting standards. The ultimate responsibility for setting standards seldom resides with measurement professionals; it is usually the prerogative of policy makers and those who make decisions using test scores. These people need to understand that their authority makes them responsible for the inevitable value judgments that are part of setting standards.

Brennan believes that measurement professionals should play an important role in helping policy makers choose and implement standard-setting procedures and interpret their results. Standards cannot be equated with truth; every aspect of standard setting is a value-laden activity that is subject to disagreement.

Norm-referenced or criterion-referenced

A good example of such a value judgment is whether to use a criterion-referenced standard, a norm-referenced standard, or a combination of both. Norm-referenced standards are easier to develop and use, especially for measuring progress, but they may not answer questions about what students should be able to do. One standard is not better than another; the type of standard should match the type of interpretation or decision to be made.
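The difference between the two interpretations can be illustrated with a small sketch. The scores, the norm group and the cut score below are hypothetical values chosen for illustration, not data from Brennan's article, and the percentile-rank formula (ties counted as half) is one common convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Norm-referenced view: percent of the norm group scoring below
    `score`, with tied scores counted as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: does the score reach the cut score
    that policy makers have set?"""
    return score >= cut_score


# Hypothetical norm group and cut score, for illustration only.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
cut_score = 28
student_score = 27

print(percentile_rank(student_score, norm_group))  # 65.0
print(meets_criterion(student_score, cut_score))   # False
```

In this made-up case the same student scores above the middle of the norm group yet falls short of the cut score, which is why the choice of standard should follow from the decision being made.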

Brennan contends that these debates would be more reasoned if everyone accepted the idea that there is no gold standard in either theory or practice. Some methods are better than others in specific cases, but there is no one right way. Choosing a standard-setting method involves making subjective judgments. Debates about standard setting are doomed to incoherence when policy considerations masquerade as methodology.

High-stakes accountability

In Brennan’s opinion, K-12 norm-referenced standardized-testing programs are best used to guide, inform and ultimately improve instruction. He is less convinced that such tests should be used as the sole basis for high-stakes, externally imposed accountability purposes involving rewards and sanctions.

He is not against accountability; in fact, he believes that when tests are used for instructional purposes they serve an important accountability function. However, when a K-12 testing program assumes the sole or primary burden of high-stakes accountability, unintended negative consequences are likely.

As the stakes get higher, educators tend to narrow the focus of instruction more than they would otherwise deem appropriate. This can cause tests to lead the curriculum rather than reflect important components of it.

Tests can only measure a sampling of needed skills in a subject area. Therefore, while it is appropriate to use a test as one element in decision-making, it should not be the sole element.

Test security

In addition, when high stakes are attached to test performance, onerous and costly test-security issues inevitably arise. When tests are used for instructional improvement, the items should not be widely distributed, although teachers need access in order to know what is being measured and how these skills compare to their own curricula.

This moderate level of security is impossible to maintain in a high-stakes environment. With high-stakes tests, security can be so tight that tests are effectively kept secret from teachers, who will be skeptical of an accountability system in which the basis for the decision is hidden.

Or, if forms of the test are released after use, there is a high cost in money and time to continually develop new forms of the test. New forms need to be equated with old forms, and that demands expensive data collection. If the equating is not well done, furthermore, judgments about changes in student performance over time will be questionable.
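To give a sense of what equating involves, here is a minimal sketch of linear equating. This summary does not say which equating method Brennan has in mind; linear equating is used here only as one common textbook approach, and it assumes that the new and old forms were given to randomly equivalent groups of students. The function name and all score data are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(new_form_score, new_form_scores, old_form_scores):
    """Place a new-form score on the old form's scale by matching the
    mean and standard deviation of the two score distributions.
    Assumes the two groups of examinees are randomly equivalent."""
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_old, sd_old = mean(old_form_scores), pstdev(old_form_scores)
    return mu_old + (sd_old / sd_new) * (new_form_score - mu_new)


# Hypothetical scores from two randomly equivalent groups.
new_form = [10, 12, 14, 16, 18]
old_form = [14, 17, 20, 23, 26]

# A new-form score of 16 maps to roughly 23 on the old-form scale.
print(round(linear_equate(16, new_form, old_form), 2))
```

Even this simple version needs score data from comparable groups on both forms, which is the data collection cost the article refers to. If the groups are not comparable, the equated scores and any trend built on them become questionable.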

In conclusion, Brennan does not believe that norm-referenced, standardized K-12 testing programs are best used as the sole basis for high-stakes accountability purposes. They may play a role in an accountability system, but that role should not undercut the use of the test for instructional improvement.

In Brennan’s opinion, a good achievement test can and should cover important aspects of the curriculum, but no single test — no matter how carefully designed, developed and researched — can cover schools’ full subject-matter curricula. Multiple measures in different formats are essential if educators want assessments to reflect their curricula.

Brennan summarizes his overall belief about measurement by saying that any test should be judged by its contribution to good educational decisions. He also suggests that testing and assessment skills should be required for teacher certification. Explicit coursework or experience in testing and assessment is seldom required, and yet teachers are expected to be competent in these areas.

“Some Problems, Pitfalls, and Paradoxes in Educational Measurement,” Educational Measurement: Issues and Practice, Volume 20, Number 4, Winter 2001, pp. 6-19.

Published in ERN, March 2002, Volume 15, Number 3
