New research indicates that district-wide portfolio assessment can yield reliable scores. Early results from Vermont’s statewide assessment program (ERN November/December 1994) revealed the difficulty of meeting both the instructional and measurement goals of portfolios.
The Vermont experience led researchers to speculate that obtaining comparable results would require either including a larger number of each student’s pieces in portfolios or standardizing the pieces included. Neither solution seemed useful. Educators believed that further standardization of samples would limit the instructional benefits of individually designed portfolios, and administrators concluded that scoring more pieces per student would make the cost of the assessment prohibitive.
Pittsburgh’s experience with portfolios
Researchers Paul G. LeMahieu, University of Delaware, Drew H. Gitomer, Educational Testing Service, and JoAnne T. Eresh, Pittsburgh Board of Public Education, recently reported on a writing portfolio assessment that is proving successful in the Pittsburgh Public Schools. The goal was to achieve reliable scores for an acceptable cost while maintaining the instructional benefits of portfolios.
The portfolio assessment as designed by the Pittsburgh educators controlled costs and achieved reliability by using a small cadre of highly trained raters to score a random sample of the writing portfolios. All raters underwent repeated training using benchmark samples of student portfolios.
Once trained to a high level of consistency, raters’ scoring was recalibrated with a sample portfolio twice a day during portfolio evaluation. This method provided highly reliable scores despite the complete freedom given teachers and students in choosing the writing to be included.
The Portfolio Program
In the Pittsburgh schools, all students in grades 6-12 created portfolios of their writing. No assignments were common to all students or even to one grade. Teachers created their own writing programs. Students chose four pieces to include in their portfolios. However, drafts leading to the final piece as well as student-written reflection on the work had to be included with each selection.
The students’ reflections on their writing included their reasons for selecting each piece and their responses to a series of questions about the piece and the experience of writing it. The four selections included: a piece judged important by student or class standards; a piece the student found satisfying; one the student found unsatisfying; and any other piece of the student’s choice. In addition to these selections, each portfolio contained a table of contents, a writing inventory describing the student’s experience as a writer, and a final reflection on the year’s writing.
The scoring system for this portfolio assessment was developed through an inductive process involving teachers’ review and evaluation of a large volume of student work. Student portfolios were evaluated on three dimensions. Each dimension was rated on a six-point scale from inadequate to outstanding.
First, writing was judged on what the students accomplished: whether they met worthwhile challenges; established and maintained a purpose; used the techniques of the genre; controlled conventions, vocabulary, and sentence structure; and showed awareness of the needs of the audience. Raters also judged how well students used language, sound, images, tone, voice, humor, and metaphor.
Second, raters judged the use of writing processes: rewriting strategies, use of drafts, conferencing, and revision.
Last, students were evaluated on their growth, development and engagement as writers.
Results of the Pittsburgh assessment were low overall. The vast majority of middle and high school students were found to have less than adequate writing performance. Girls tended to get higher ratings than boys, and white students higher ratings than minority students, but there was no evidence that the sex or race of the rater influenced the scoring of student portfolios.
Follow-up classroom observations revealed that differences in writing performance among students were due to differences in the intensity of the writing programs they experienced. Portfolios seemed to be more sensitive than standardized tests to differences in instructional programs.
Highly trained raters needed
These researchers do not believe that reliability can be ensured through a standardized scoring system alone. They conclude that the reliable results of this portfolio assessment are due to the use of a small number of highly trained raters. They stress that a shared understanding of the scoring system is central to reliable portfolio use and that this can only be achieved through repeated scoring of sample portfolios.
District teachers, once they had sufficient experience with portfolios, viewed them as part of the instructional process. They reported that over the course of the year students took increasing responsibility for assessing their own work. Portfolio assessment did, however, lead to a fundamental restructuring of teaching routines. Teachers modeled for students the processes of reflecting on and selecting among writings in order to create a portfolio.
During the development and training for portfolio assessment, teachers worked with each other and with supervisors to examine sample portfolios which represented low, average and high levels of student performance at their grade level. Discussions focused on what could be seen in the writing and what instructional decisions could be made. Teachers used these same techniques with students.
LeMahieu et al. conclude from the Pittsburgh experience that portfolio assessment can have sufficient reliability and validity for public reporting purposes. However, attaining this quality of information requires that the purpose of the assessment be clear and that instructional practices be consistent with that goal. A shared understanding and coherence throughout the community using the assessment is necessary so that accountability goals are consistent with classroom goals.
“Portfolios in Large-Scale Assessment: Difficult But Not Impossible,” Educational Measurement: Issues and Practice, Volume 14, Number 3, Fall 1995, pp. 11–28.
Published in ERN November/December 1995 Volume 8 Number 5