In his presidential address to the National Council on Measurement in Education, H.D. Hoover of the University of Iowa discussed standardized testing and addressed three common misconceptions about achievement tests.
Misconception 1: In general, males outscore females on achievement tests. Hoover reports that this misconception persists because average scores on college entrance tests are generally higher for males than for females, even though these scores underpredict females’ success in college. He attributes the difference to selection: the males and females who take the tests are not comparable groups. The primary reason for this differential selection is high-school grades. Except in math, females earn significantly higher grades than males, while males show greater variability in performance. Among the highest-scoring students, however, gender differences are relatively small. In several states, students with excellent high-school grades are admitted to state universities without taking college boards. This policy admits more females (56 percent of whom are in the top half of their class vs. 44 percent of males), leaving fewer high-scoring females in the college test-taking pool.
Hoover promotes a system that combines high-school rank and test scores in a compensatory model. College boards overpredict college grade point average (GPA) for males and underpredict it for females. High-school grades have the opposite effect, underpredicting GPA for males and overpredicting it for females. Because the two errors run in opposite directions, a combination of high-school grades and test scores would seem an equitable way to make college entrance decisions.
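The compensatory idea can be illustrated with a minimal sketch. It assumes that class rank and test score are each standardized and then combined with a fixed weight. The equal weighting, the function names and the sample figures are hypothetical and do not come from Hoover's address.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def compensatory_index(rank_percentiles, test_scores, rank_weight=0.5):
    """Weighted sum of standardized class rank and standardized test score.

    rank_weight is a hypothetical choice; an institution would have to
    estimate a suitable weight from its own admissions data.
    """
    zr = z_scores(rank_percentiles)
    zt = z_scores(test_scores)
    return [rank_weight * r + (1 - rank_weight) * t for r, t in zip(zr, zt)]

# Made-up applicants: class-rank percentile and entrance test score.
ranks = [90, 70, 50]
tests = [20, 24, 28]
index = compensatory_index(ranks, tests)
```

In this made-up data the applicant with the highest rank has the lowest test score and vice versa, so under equal weighting the three applicants end up with essentially the same index: strength on one measure compensates for weakness on the other.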
Misconception 2: The primary role of norms is to compare one student to another. Norms give educators an external frame of reference for interpreting a given test score. This frame of reference supports two kinds of comparisons that are critical in evaluating the achievements of students and schools: comparisons among a student’s own skill areas and comparisons of achievement over time. The most beneficial role that norms play is in identifying relative strengths and weaknesses. The “No Child Left Behind” legislation is geared to making all students proficient in reading. Approximately 70 percent of grade 4 students are below the proficient level as measured by the NAEP at the present time.
However, teachers and parents of students who are failing to achieve have very little diagnostic information that would help them design better instruction for such students. Norm-referenced results, as opposed to standards-based data, point out a student’s relative strengths and weaknesses in different areas, allowing teachers to apply different teaching methods based on the individual profile of each student. For example, a poor reader with a strong vocabulary and good listening skills but poor reading comprehension needs different instruction than a student with poor comprehension but strong word analysis and spelling skills. These kinds of comparisons are possible only when using norms. Normative comparisons, Hoover argues, tell teachers and parents more about a student’s knowledge and skills than achievement levels do. Norms also allow educators to measure academic growth over time. Adequate yearly progress relies on normative data. Hoover points out some major problems with standards-based achievement tests: achievement levels are set independently by each publisher, and state curricula vary widely. It is therefore difficult, using standards-based tests, to compare results across grades and between different test areas for individual students. In addition, the lack of comparability of state standards makes it difficult to compare the proportions of students in each state who have achieved proficiency. Norm-referenced tests emphasize strengths and weaknesses and estimate the growth of individual students across subject areas and in comparison to a typical student in their grade.
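As a rough illustration of how norms expose a student's profile, the sketch below converts raw scores to percentile ranks within a norm group. The norm-group scores, the skill areas and the function name are hypothetical, and counting ties as half is one common convention rather than a method attributed to Hoover.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

# Hypothetical norm-group raw scores for two skill areas.
norms = {
    "vocabulary": [10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
    "comprehension": [10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
}

# Hypothetical raw scores for one student.
student = {"vocabulary": 26, "comprehension": 12}

# Percentile rank in each area gives the student's relative profile.
profile = {area: percentile_rank(student[area], norms[area]) for area in student}
```

Here the student lands at the 85th percentile in vocabulary and the 15th in comprehension, the kind of contrast between skill areas that a single proficient or not-proficient label would hide.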
Misconception 3: It is not possible to reliably and accurately measure the achievement of kindergarten and primary-grade students with group-administered tests. Hoover presents evidence that the reliability and validity of group-administered tests in the early primary grades are adequate. While the kindergarten reliabilities are somewhat lower than the others, the grade 1 and 2 data are very similar to those of grades 3 through 8. He disagrees with the National Education Goals Panel’s conclusion that “before age 8, standardized achievement measures are not sufficiently accurate to be used for high-stakes assessments.” As demands for monitoring students’ achievement increase, Hoover argues, it is important to determine the best ways to monitor young students’ development. He does not believe that the physical, social and emotional requirements of testing situations, such as sitting still and listening, using good manners and observing certain rules, are different from those of many other primary-school activities.
“Some Common Misconceptions About Tests and Testing,” Educational Measurement: Issues and Practice, Volume 22, Number 1, Spring 2003, pp. 5-14.
Published in ERN June 2003 Volume 16 Number 5