Standards for high-stakes tests

Pressure to raise test scores can push educational administrators to make decisions that may not be in the best interests of students, say researchers at the American Educational Research Association (AERA). Many states and school districts mandate testing programs to gather data about student achievement over time and to hold teachers and students accountable. When these tests carry serious consequences for students or educators, they are called “high-stakes” tests. The intention of these tests is to improve education by inspiring greater efforts by students, teachers and administrators. AERA published the following research-based guidelines for policymakers, test publishers and school personnel concerning the use of high-stakes testing in pre-K-12 education. The guidelines describe a set of conditions essential to sound implementation of high-stakes testing programs. It is AERA’s position that every high-stakes achievement-testing program should meet all of the following conditions.

Protection against high-stakes decisions based on a single test

Decisions that influence a student’s life chances or educational opportunities should not be made on the basis of test scores alone. At the very least, students must be given multiple chances to pass a high-stakes test. And when there is credible evidence that a test score may not adequately reflect a student’s true proficiency, alternative forms of assessment must be provided.

Adequate resources and opportunity to learn

Before students, schools and districts can be passed or failed by high-stakes tests, they must have access to the material, curriculum, and instruction that would enable them to meet the new standards. When standards and associated tests are introduced as a reform to improve educational practice, students must have meaningful opportunities to learn the content and cognitive processes before they are sanctioned for failing to meet the new standards.

Validation for each separate intended use

Tests valid for one use may be invalid for another. Each use of a high-stakes test (for individual certification, for school evaluation, for curricular improvement, for increasing student motivation) must be subject to a separate evaluation of the strengths and limitations of both the testing program and the test itself.

Full disclosure of likely negative consequences of testing programs

AERA recommends that where credible scientific evidence suggests that a type of testing program is likely to have negative side effects, test developers and educators should make a serious effort to explain these possible effects to policymakers.

Alignment between the test and the curriculum

Both the content of the test and the cognitive processes engaged in taking the test should adequately represent the curriculum. In addition, high-stakes tests should not be limited to that portion of the relevant curriculum that is easiest to measure. Test developers and administrators must ensure the test is aligned with the curriculum if the test is to be used for school accountability. To avoid teaching to the test, multiple test forms should be used or new test forms should be introduced on a regular basis. Without this, high-stakes tests can lead to a narrowing of the curriculum toward just the content sampled on one form of a test.

Validity of passing scores and achievement levels

When testing programs set passing scores or proficiency levels, the validity of these specific scores must be established. To begin with, the meaning of passing scores or achievement levels must be clearly stated. There is often confusion, for example, among minimum competency levels (usually required for grade promotion), grade level (a range of scores around the national average on standardized tests), and “world class” standards (set anywhere from the 70th to the 99th percentile). Once the test purpose is defined, sound and appropriate procedures must be followed in setting passing scores or proficiency levels. Finally, validity evidence that is consistent with the stated purpose of the test must be reported.

Meaningful remediation for students who fail

Students who fail a high-stakes test should get a second chance to take the test after they have had adequate opportunities for remediation. Remediation should focus on the knowledge and skills the test is intended to measure, not the test content itself. Before retaking the test, students should be allowed enough time to remedy any weaknesses discovered.

Language differences

If a student lacks mastery of the language in which a test is given, then that test becomes, in part, a test of language proficiency. Unless a primary purpose of a test is to evaluate language proficiency, it should not be used with students who cannot understand the instructions or the language of the test itself. If English language learners are tested in English, their performance should be interpreted in the light of their language proficiency. Special accommodations for English language learners may be necessary to obtain valid scores.

Students with disabilities

In testing individuals with disabilities, steps should be taken to ensure that the test accurately reflects students’ knowledge and skills rather than characteristics associated with their disability.

Adherence to explicit rules for determining which students are to be tested

When schools, districts or states are compared to one another or when changes in scores are tracked over time, there must be explicit policies specifying which students are to be tested and under what conditions students may be exempted from testing. Such policies must be uniformly enforced to assure the validity of score comparisons. In addition, reporting of test score results should accurately portray the percentage of students exempted.

Sufficient reliability for each intended use

Reliability refers to the accuracy or precision of test scores. It must be shown that scores reported for individuals or for schools are accurate enough to support each interpretation. Information about the reliability of raw scores may not adequately describe the accuracy of percentiles; information about the reliability of school averages may be insufficient if scores for subgroups are also used in reaching decisions about schools.
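The point about subgroups can be illustrated with a little arithmetic. The sketch below is not part of the AERA statement; it uses the textbook formula for the standard error of a mean (standard deviation divided by the square root of the group size), and the function name, the score spread of 15 points and the group sizes are all hypothetical. It also assumes, for simplicity, that the subgroup's scores are as spread out as the whole school's.

```python
import math


def standard_error_of_mean(score_sd: float, n_students: int) -> float:
    """Standard error of a group's average score: sd / sqrt(n).

    Hypothetical helper for illustration only; it accounts for
    student-to-student variation and nothing else.
    """
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return score_sd / math.sqrt(n_students)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
score_sd = 15.0          # assumed spread of individual student scores
school_size = 400        # assumed number of students tested in the school
subgroup_size = 25       # assumed number of students in one subgroup

school_se = standard_error_of_mean(score_sd, school_size)
subgroup_se = standard_error_of_mean(score_sd, subgroup_size)

print(f"School average:   +/- {school_se:.2f} points")
print(f"Subgroup average: +/- {subgroup_se:.2f} points")
```

Under these assumed numbers the subgroup average is four times less precise than the school average, so a change that looks meaningful for the school as a whole could easily be noise for the subgroup.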

Ongoing evaluation

Ongoing evaluation of both intended and unintended consequences of any high-stakes testing program is essential. In most cases, the governmental body that mandates the test should provide resources for a continuing program of research and for dissemination of research findings concerning both the positive and negative effects of the testing program.

AERA stresses that every one of these criteria must be in place in order for testing to be fair and accurate. Currently, no state meets this standard.

American Educational Research Association position statement, July 2000.

Published in ERN October 2000 Volume 13 Number 7.
