Rigorous studies of education programs are essential to build confidence in educational research and to improve schools, writes Robert E. Slavin. In the DeWitt Wallace-Reader’s Digest Distinguished Lecture of 2002, Slavin describes “the need to establish the highest possible standard of evidence, on par with the standards in other fields, to demonstrate what educational research can accomplish.”
Slavin reports that while the U.S. Department of Education’s strategic plan calls for 75 percent of all funded research to address causal questions using randomized research designs by 2004, fewer than 5 percent of current studies use such rigorous methods.
Slavin contends that the kind of progressive, systematic improvement seen over the last century in fields such as medicine, agriculture, transportation, and technology is possible in education. In those fields, development, rigorous evaluation, and dissemination have produced an unprecedented level of innovation and improvement.
Physicians cannot ignore research findings; randomized trials have produced dramatic improvements in medical practice. Education, on the other hand, moves from fad to fad, Slavin writes. Its “change process resembles the pendulum swings of art or fashion rather than the progressive improvements characteristic of science and technology.” Educational research findings are respected only when they correspond with current educational or political fashions.
Research is common in education, but studies are usually brief and artificial and not always of practical significance. Experiments studying instructional programs over at least a full school year are rare. Educational research has produced meaningful studies of basic principles but very few rigorous studies of programs that could provide a solid base for policy and practice.
Because of this, policy makers rarely see the relevance of research to the decisions they have to make and therefore provide minimal funding for research. This inadequate investment in research is responsible for the lack of large-scale, definitive studies that could inform policy decisions. A scientific revolution in education is only possible if research begins to focus on replicable programs and practices central to education policy and teaching, and if it employs research methods that meet the highest standards.
What kinds of research are necessary to produce findings of sufficient rigor to justify faith in the meaning of their outcomes? Slavin asserts that “nothing less than randomized experiments will provide effective evaluation of educational interventions and policies.” Random assignment of subjects to experimental and control groups provides evidence that any differences seen in the outcomes of an experiment are due to the experimental treatment and not to other factors.
When random trials are replicated with different populations, the effectiveness of a treatment can be established beyond any reasonable doubt. Even with well-matched experiments (where two groups of subjects that are judged to be similar are compared), it is always possible that the observed differences are due to factors other than the treatment being studied. Often the differences are influenced by the fact that one set of schools or teachers was willing to implement a given treatment while another was not, or that some students agreed to participate in the study while others did not.
When selection bias is a possibility at the student level, Slavin contends, there are few, if any, alternatives to random assignment, because unmeasured, pre-existing differences are highly likely to be alternative explanations for study findings. Random assignment of willing teachers or schools is preferable to matching, as matching leaves open the possibility that volunteer teachers or staffs are better than non-volunteers.
However, average pretest scores for an entire school should indicate how effective the current staff has been to date, so controlling for pretests in matched studies of existing schools or classes would control much of the potential impact of having more willing teachers.
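The selection-bias problem Slavin describes can be made concrete with a small simulation. The sketch below is purely illustrative (the function names, the size of the "staff quality" trait, and the volunteering model are all assumptions, not Slavin's data): schools with more capable staff are more likely to volunteer for a program, so a matched comparison of volunteers against non-volunteers shows a large apparent effect even when the program itself does nothing, while random assignment of the same schools shows roughly no effect.

```python
import math
import random

random.seed(0)

def simulate(n_schools=2000, true_effect=0.0):
    """Compare a self-selected design with a randomized design.

    Each school has a hidden 'staff quality' trait drawn from a
    standard normal distribution. The treatment's true effect is
    zero, so any apparent effect is pure bias.
    """
    qualities = [random.gauss(0, 1) for _ in range(n_schools)]

    # Self-selected design: schools with better staff are more
    # likely to volunteer for the treatment (logistic model).
    treated, comparison = [], []
    for q in qualities:
        p_volunteer = 1 / (1 + math.exp(-2 * q))
        if random.random() < p_volunteer:
            treated.append(q + true_effect)
        else:
            comparison.append(q)
    selection_gap = (sum(treated) / len(treated)
                     - sum(comparison) / len(comparison))

    # Randomized design: a coin flip decides assignment, so hidden
    # staff quality is balanced across the two groups on average.
    rand_t, rand_c = [], []
    for q in qualities:
        if random.random() < 0.5:
            rand_t.append(q + true_effect)
        else:
            rand_c.append(q)
    random_gap = (sum(rand_t) / len(rand_t)
                  - sum(rand_c) / len(rand_c))

    return selection_gap, random_gap

sel, rnd = simulate()
print(f"apparent effect with self-selection: {sel:.2f}")
print(f"apparent effect with randomization:  {rnd:.2f}")
```

Under these assumptions the self-selected comparison reports a sizable spurious "effect" (roughly one standard deviation of outcome), while the randomized gap hovers near zero. This is exactly the inference Slavin draws: matching volunteers against non-volunteers leaves unmeasured differences as a plausible explanation for any result, whereas randomization removes them by design.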
Randomized experiments of interventions applied to entire classrooms can be difficult and expensive to do and are sometimes impossible. For the cost of doing one randomized study, two or three equally large, well-matched studies can be carried out.
Slavin contends that if such matched studies were repeated by different investigators in different places, they might still produce valid and meaningful results. However, research in other fields has found that matched studies generally report larger effects than randomized studies do, suggesting that matching leaves some bias uncorrected. For this reason, Slavin recommends that randomized studies be used for policy-relevant program evaluation whenever possible.
From personal experience
In his work at Johns Hopkins University, Slavin has found that schools can be unwilling to take a chance on being assigned to a control group for several years and missing the potential positive effects of treatment, or they may not be able to afford the experimental program costs.
Slavin and colleagues have gotten around this problem by offering schools the study program free of cost, either in the first year or the second. In this way, schools assigned to the program in the second year serve as the control group for the first-year experiment. This delayed treatment control-group design involving no cost to the school enables schools to participate and provides researchers with a randomized trial to evaluate educational programs. Random assignment is possible, therefore, with enough resources and cooperation from policy makers.
An opportunity for rigorous educational research
Given the current political push for rigorous research, Slavin believes this is the time to reverse the poor reputation that educational research has among policy makers. He believes “it makes sense to concentrate resources on a set of randomized experiments of impeccable quality and clear policy importance to demonstrate that such studies can be done.”
Over the longer run, Slavin says, a mixture of study types will probably serve the field best. Correlational and descriptive research is essential, in his opinion, in theory building and for suggesting variables worthy of experiments. But at this moment, it is important to “establish the highest possible standard of evidence, on par with standards in other fields, to demonstrate what educational research can accomplish.”
Slavin points out a key distinction that is often lost in the current enthusiasm for science as a basis for practice. This is the distinction between programs and practices that have “scientifically based research” and programs that have themselves been rigorously evaluated.
Slavin points out that in the No Child Left Behind legislation, the phrase “based on scientifically based research” was used 110 times. He contends that any program can find some research that supports the principles it incorporates. The fact that a program is based on scientific research does not mean that it is effective. An example Slavin uses is that before the Wright brothers, many inventors built airplanes based on exactly the same scientifically based aviation research as the Wrights used at Kitty Hawk, but the other airplanes never got off the ground.
Currently, because there is little existing research on replicable programs in education, it is not possible to require that federal funds be limited to programs that have been rigorously evaluated. However, programs that have such rigorous evidence should be emphasized over those that are only based on valid principles, and there needs to be a strong effort to invest in the development and evaluation of replicable programs.
It is essential, in Slavin’s opinion, that independent review commissions representing diverse viewpoints be constituted to review research and produce consensus on what works best in education. Consensus panels operating on a broad range of policy-relevant questions will help practitioners and policy makers cut through all the competing claims and isolate research findings that represent the evidence fairly and completely.
Finally, Slavin points out that accountability is a necessary but insufficient strategy for school reform. Rewards and sanctions based on test score gains can be very inexact in fostering good practice. Year-to-year changes in an individual school are unreliable indicators of that school’s quality.
It is essential that schools focus both on the evidence base for their programs and on the outcomes in their particular school. Schools should be expected to use methods known to be effective in general and then to make certain that their school’s implementation of those methods is of sufficient quality to ensure progress on state assessments. Investment in rigorous research and development will produce progressive, step-by-step improvements that, over time, will substantially improve educational practice and student outcomes.
“Evidence-Based Educational Policies: Transforming Educational Practice and Research,” Educational Researcher, Volume 31, Number 7, October 2002, pp. 15–21.
Published in ERN December 2002/January 2003 Volume 16 Number 1