Testing has been used as a key element in educational reform for 50 years. Robert L. Linn, University of Colorado/Boulder, summarizes the history of assessment in the United States, questioning the validity and generalizability of reported gains and the credibility of the results used for high-stakes accountability. Linn writes that assessments are appealing to policymakers because they are relatively inexpensive compared to programmatic changes. Test results are visible, and testing is easier than trying to change what happens inside the classroom. Increasing instructional time, reducing class size, attracting able teachers, hiring teaching aides, or providing substantial professional development are all very expensive and require long-term commitment. Testing, on the other hand, can be implemented quickly.
Linn cautions, however, against drawing conclusions from short-term results. Past experience shows that scores commonly rise in the first few years of a new testing program, yet this often occurs without real improvement in the broader achievement outcomes that the tests are intended to measure. When the stakes are high, poor testing practices increase, reducing the validity and reliability of test results. For example, districts may use the same form of a test, with increasingly out-of-date norms, year after year. In addition, students who would lower schools’ test averages are often excluded from testing. And classroom instruction increasingly focuses on the narrow set of skills and question types used in that one test. All these practices tend to subvert the intention of the assessment program and to inflate scores. Although focusing instruction on the general concepts and skills measured by a test may be acceptable if the test measures instructionally important objectives, Linn condemns narrowing teaching to the specific content sampled by a test because it invalidates the results and often overemphasizes basic skills.
Improving Assessment Programs
Recently, educators have been examining the concept of “tests worth teaching to” and the issue of “opportunity to learn.” They focus on developing ambitious content standards for all students as the basis of assessment and accountability.
Linn stresses the importance of identifying the features of assessment and accountability systems that influence the trustworthiness of test results and the likely impact of the systems on educational practices and student learning. In Linn’s opinion, high-stakes tests can be designed to sidestep most of the pitfalls of earlier tests, but they will be expensive because new and comparable forms of the test will be needed each year. Using a new form each year avoids the repeated use of a single form, which leads to inflated results. The question remains of how high to set expectations for all students without penalizing those from poor, minority, low-achieving schools. Kentucky is testing one approach: setting a common goal for all schools to achieve within 20 years. Because every school is expected to reach the same goal, the state has established faster growth targets for initially low-achieving schools.
Linn concludes that, in the past, assessments have not been up to the demands that have been placed on them by high-stakes accountability. Tests can be useful monitors of student performance, but they lose much of their dependability and credibility when there are high stakes for the schools using them. The unintended negative effects of high-stakes accountability often outweigh the intended positive effects. Linn argues for more modest claims about uses that can validly be made of our best assessments and warns against overreliance on them. He offers the following seven suggestions for enhancing the validity, credibility, and positive impact of assessment systems while minimizing their negative effects:
- Provide safeguards against selective exclusion of low-achieving students from assessments. Include all students in accountability calculations.
- Make the case that high-stakes accountability requires new high-quality assessments each year that are comparable to those of previous years.
- Use multiple indicators so that important decisions are made on the basis of more than one assessment. More valid inferences can be drawn from gains in achievement observed on more than one test.
- Place more emphasis on comparisons of performance from year to year than from school to school. This allows for differences in starting points while maintaining an expectation of improvement for all.
- To avoid institutionalizing low achievement expectations, set the same high standards for everyone.
- Recognize, evaluate, and report on the degree of uncertainty in test results.
- Put in place a system for evaluating both the intended positive effects and the unintended negative effects of the assessment system.
“Assessment and Accountability,” Educational Researcher, Volume 29, Number 2, March 2000, pp. 4-16.
Published in ERN, April 2000, Volume 13, Number 4.