It is now commonplace for students to evaluate their professors, usually by filling out bubble sheets late in the term. A typical form has a dozen or more questions, many with multiple parts. For example, one question in a form I have seen asks, "Does the instructor meet his/her classes regularly? Does he/she keep his/her office hours? Are there any regrettable irregularities in his/her behavior?"
Staff members typically enter the students' ratings into a computer and give them (without the students' names) to faculty members the following term, in the somewhat wistful hope that improvements in teaching will follow. Because the ratings often are the only evidence available about how well a professor can teach, they are also considered seriously by committees charged with evaluating the professor for retention, promotion, and tenure.
The value of students' evaluations seems clear, but opinions differ on the best ways to gather and use them. Part craft, part art, teaching is extremely complex and difficult to assess accurately. But many people both inside and outside the field believe that teaching can be evaluated with a few simple measurements. Although we cannot easily assess the art involved in teaching, the argument goes, we can measure certain behaviors--like distributing the syllabus early in the term, and being cordial during office hours--that may be associated with good teaching. Most teaching-evaluation forms thus ask about those behaviors, as well as students' reactions to them and to the teacher's style. But nobody involved seems confident that the measurement is getting at the essence of teaching.
The most important concept to emerge in writing assessment has been the development of holistic scoring as a reliable way to assess students' performance on essay examinations. That development, begun under the aegis of the Educational Testing Service in the 1960's, allows raters to give each student essay a single numerical score, in accordance with a brief, focused scoring guide, for the overall quality of the writing. Training in the use of the scoring guide has become so sophisticated that raters give very consistent scores to the same student essays. If two or more raters evaluate two or more samples from the same student, the scoring can be extremely reliable. The scores are regularly used by admissions officers at colleges and graduate and professional schools, as well as by faculty members who are deciding which courses to place students in, or whether a student has met a writing requirement for graduation.
Holistic scoring of student writing has come under attack, however, because it provides little useful information for teachers, yielding only ratings of performance according to the criteria set out in the scoring guide.
That is, a numerical score is fine for placement, but teachers want assessments that will yield more detailed diagnostic information for students in their classes. Teachers need to know if a particular paper's low score results from faulty sentence structure, lack of evidence to support assertions, incoherent paragraphs, misuse of sources, or some other flaw. Does the problem lie in the mechanics of writing, or in prior knowledge of the subject? Administrators find the scores valuable and cost-effective as measures of educational outcomes. But because students and teachers need more information to improve the students' writing, teachers seldom use holistic scoring, instead writing comments about specific points on individual papers.
Using different methods of assessment for different purposes should work as well in evaluating teaching as it does in evaluating writing. The failure to distinguish between assessments of overall outcomes and analyses of specific problems has been a sore point from the start in teacher evaluation. The assumption has been that the same information that allows committees and administrators to judge teaching ability will also lead to improved teaching. However, evidence from writing assessment (not to mention the lessons of experience) shows that that is not the case.
Committees do not need or use much of the evidence collected by the questionnaires that students usually fill out to evaluate their teachers. In fact, often a committee needs to know only if a professor is one of the truly exceptional teachers, an average teacher, or one of the really poor teachers who need drastic improvement if they are to stay on the campus. There are so many different ways of being a good teacher, so many ways in which strengths in one area compensate for weaknesses in another, that the detailed questions on the bubble sheets about a professor's attendance, say, or how promptly he or she returns papers wind up as distractions from the holistic assessment that is really called for.
A very few questions about the overall quality of the class, or whether students would recommend the teacher to a friend, would produce enough information about students' learning experience in the class.
On the other hand, teachers often pay little attention to the ratings because they are designed for general purposes and often do not deal with the questions that a particular teacher is most concerned about: Did I use the right mix of lecture and discussion? Were the assignments and the reading sufficiently challenging, but not too hard to get through? What concepts, ideas, and attitudes did the students learn?
Some institutions have already separated the two purposes of evaluating teaching. The University of Arizona, for instance, has an Office of Assessment and Enrollment Research, to which teachers may apply for an evaluation; a facilitator from the office discusses the instructor's strengths and weaknesses with the students and then reports to the teacher alone what the students have to say. In that way, suggestions for how to improve a professor's teaching are kept wholly separate, as they should be, from the evaluation on which salary and professional advancement depend.
When we look at the ways institutions have misused assessments of students' writing ability, we find other interesting parallels to the evaluation of teaching ability. For example, when money and time are short, an institution may substitute a multiple-choice test of usage, grammar, or mechanics for actual assessment of students' writing. An even worse abuse occurs when institutions use aptitude tests like the SAT or ACT--which are not designed to measure what a student has learned--in an attempt to evaluate the effectiveness of a writing program.
Students' bubble sheets may be equally invalid in assessing teaching ability if the questionnaire asks students to rate surface characteristics of one kind of good teaching--like "returns papers promptly" and "leads discussions"--without allowing students to give a holistic assessment.
The use of such questionnaires distorts teaching. When an institution uses a multiple-choice test to evaluate students' writing, professors tend to drill their students in taking that kind of test, instead of giving them practice in writing. Similarly, when professors' careers depend on getting high scores on student evaluations of particular behaviors, professors will tend to adopt those behaviors whether or not they are appropriate to a particular subject or teaching style.
One recent development in both writing assessment and teacher evaluation is especially encouraging. Portfolio assessment has begun to make inroads in both areas as a more flexible and responsible form of measurement. Portfolios of students' writing or of teachers' syllabuses, exams, assignments, and statements of teaching philosophy can be evaluated holistically, when a single score or rating is all that is needed. But they can also receive much more detailed (and expensive) analyses, should the purpose of the evaluation be to improve the writing or teaching.
As with assessments of students' writing, evaluations of professors' teaching ought to be designed to meet specific goals of the assessment--either measuring overall teaching ability for tenure and promotion committees, or identifying ways to improve particular aspects of teaching for the professor. Colleges and universities now put a great deal of energy and expense into elaborate bubble sheets that serve nobody's ends well. Instead, institutions should devote their efforts to producing holistic information for administrative purposes, and detailed analyses of teaching performance for the teachers' own use.
If that happened, we might actually see systematic improvement of college teaching--which is supposed to be the goal of the entire operation.
Posted March 9, 2001
"Bursting the Bubble Sheet: How to Improve Evaluations of Teaching" appeared in the November 10, 2000 issue of The Chronicle of Higher Education and is reprinted here by permission of the author. All material appearing in this journal is subject to applicable copyright laws.