Task Force on Teaching Evaluation - Overview and Analysis of Techniques for Evaluating Teaching Effectiveness


This section borrows extensively from the "Teaching Documentation Guide" and "Teaching Evaluation Guide" prepared by the faculty of York University, Canada (http://teachingcommons.yorku.ca/).

This chapter describes techniques that faculty members may use to evaluate their teaching. The choice of technique depends on the intent of the evaluation. The following techniques are included: self-evaluation; classroom observation; questionnaires; measures of student achievement; letters and individual interviews; focus groups; and review of instructional materials.

Self-evaluation

Source: Self

Description and purpose: A self-evaluation may take the form of an informal self-reflection exercise, classroom assessment such as a minute paper, or a formal written appraisal compiled for the teaching portfolio. Self-evaluation can be carried out for both formative and summative purposes. It gives faculty members an opportunity to articulate their teaching philosophy, review their teaching goals and objectives, and assess their areas of strength and difficulty.

Benefits: Self-evaluation encourages faculty members to become monitors of their own performance and promotes reflective practice. It is an excellent first step in planning a thoughtful and comprehensive teaching development program. Informal self-evaluation involves little formal data collection and takes very little time. A formal written appraisal provides a context for assessing data about teaching gathered using other methods. This can be especially important in the tenure and promotion process as it puts other types of data in perspective. In addition, a teaching portfolio may be sent to colleagues at another institution for appraisal.

Limitations: Self-evaluation is, by its very nature, biased and can rarely stand alone if used for summative evaluation. Some individuals find it very difficult to engage in critical self-evaluation or to be honest with themselves and others about their difficulties.

Classroom observation

Source: Peers (faculty members from the same unit) and colleagues (faculty members from another unit)

Description and purpose: Classroom observations complement student assessment of teaching. Although peers and colleagues are unlikely to know the full extent of the teaching situation, they can comment on subject matter and/or teaching methodology from the perspective of a professional. Before visiting a class, the observer should meet with the faculty member to discuss the faculty member's teaching philosophy, the specific teaching objectives, and the teaching strategies to be employed during the observation session.

Classroom observation may be carried out for both summative and formative purposes. For summative evaluation, more than one person should observe, and each observer should visit more than one class. This counteracts observer bias toward a particular teaching approach and the possibility that an observation takes place on a bad day. These precautions also provide for greater objectivity and hence reliability of the results. To ensure that they obtain a full picture of a faculty member's strengths and weaknesses, some observers find checklists useful (see Figure 1 for an example). Some units may delegate classroom observations to a committee. As the range of activities in a class can be overwhelming, some observers find it helpful to focus on specific aspects (e.g., presentation). In this way, colleagues unfamiliar with the content can provide a different perspective from that of the faculty member's disciplinary peers.

Figure 1. Peer Evaluation Checklist

Introduction

_____ Used good interest approach

_____ Reviewed previous instruction

Content

_____ Covered objectives -- no less, no more

_____ Class was logical and organized

Teaching Methods/Techniques

_____ Well planned and organized

_____ Used procedures appropriate to objectives

_____ Used procedures appropriate to students

_____ Provided stimulus variation

_____ Questioning was appropriate and skillful

_____ Encouraged student participation

_____ Used effective reinforcement techniques

_____ Used effective examples

_____ Maintained effective pace

_____ Made effective use of time

_____ Maintained enthusiasm and interest level

_____ Provided smooth transitions

Closure

_____ Review/summary

_____ Application

_____ Context to future instruction

Effectiveness

_____ Achieved objectives

Questions for Reflection

1. What aspects of the lecture and the written materials (assignments, tests, handouts) make a positive contribution to learning and should be retained?

2. What aspects of the lecture and written materials could be improved? How should they be improved?

Classroom observation is especially useful for formative evaluation. In this case, it is important that the results of the observations are confidential, and used for summative evaluation only with the faculty member's consent. The process of observation should take place over time, allowing the faculty member to implement changes, practice improvements and obtain feedback on whether progress has been made. It may also include videotaping the faculty member's class. This process is particularly helpful to faculty members who are experimenting with new teaching methods.

A particularly valuable form of classroom observation for formative purposes is peer-pairing. With this technique, two faculty members give each other feedback on their teaching on a rotating basis, each evaluating the other over time (anywhere between two weeks and a full year). Each learns from the other and may learn as much in the observing role as when being observed.

Benefits: Classroom observations can complete the picture of a faculty member's teaching obtained through other less direct methods of evaluation. Observations are an important supplement to inconsistent student ratings in situations, for example, where a faculty member's teaching is controversial because of experimentation, where non-traditional teaching methods are being used, or where other unique situations exist within the classroom context. Peers are better able than students to comment upon the difficulty of the material, the relevance of examples chosen, knowledge of subject matter, and integration of topics. Colleagues are better able than peers to place the teaching within a wider context and to suggest alternative teaching formats and ways of communicating the material.

Limitations: There are several limitations to using classroom observations for summative purposes. Observation is costly in terms of faculty time, since several observations are necessary to ensure reliability and validity of findings. Faculty members may find observations threatening, and they and their students may behave differently when an observer is present. Some evidence suggests that peers can be relatively generous evaluators. Since observers vary in their definitions of effective teaching, and considerable tact is required in providing feedback on observations, it is desirable that observers receive training before becoming involved in providing formative evaluation.

Questionnaires

Source: Students and, sometimes, alumni

Description and purpose: Student questionnaires (e.g., the CCQ) are the most commonly used source of summative evaluation data. For purposes such as tenure and promotion, data should be obtained over time using standardized questionnaires. Information obtained from questionnaires can also be used by faculty members for improving subsequent incarnations of the course, and for identifying strengths and weaknesses by comparison to those teaching similar courses. Questionnaires are also useful in a program of formative evaluation if designed and administered by a faculty member during a course.

Benefits: The use of a mandatory, standardized questionnaire puts all teaching evaluations on a common footing, and facilitates comparison between faculty members, courses and academic units. The data gathered also serves the purpose of assessing whether the educational goals of the unit are being met. Structured questionnaires are particularly appropriate where there are many students involved, and where there are several sections of a single course, or several courses with similar teaching objectives using similar teaching approaches.

Questionnaires are relatively economical to administer, summarize and interpret. If students are asked to comment only on items with which they have direct experience, student responses to questionnaires have been found to be valid. Research has identified the following seven dimensions of a faculty member's teaching as especially important in identifying exemplary teaching: stimulation of interest in the course and its subject matter; preparation and organization; clarity and understandability; sensitivity to and concern with students' level of understanding and progress; clarity of course objectives and requirements; impact of instruction; and encouragement of questions and discussion together with openness to the opinions of others.

While questionnaire forms with open-ended questions are more expensive to administer, they often provide more reliable and useful sources of information in small classes and for the tenure and promotion process. Open-ended questions may provide insight into the numerical ratings.

Limitations: Faculty members have such different perspectives, approaches, and objectives that a standardized questionnaire cannot adequately or fairly compare their performance. For example, the implicit assumption behind the design of many evaluation forms is that the primary mode of instruction is the lecture method. Such a form will be inadequate in evaluating the performance of a faculty member who uses collaborative or feminist teaching methods. One way to overcome this limitation, and to tailor the form to the objectives and approaches of a specific course or faculty member, is to design an evaluation form with a mandatory core set of questions and space for inserting questions chosen by the faculty member.

Recent research on the effects of gender on student ratings suggests that female faculty members tend to be judged more rigidly than male faculty members on a variety of dimensions, particularly in questions relating to students' interpersonal experiences with the faculty member. Further, there is some evidence to suggest that student evaluations are biased against non-traditional teaching methods and curriculum, and against faculty members from under-represented groups. Required courses are generally rated lower than elective courses. Care should therefore be taken to create an appropriate context for interpreting the data in comparison with other courses. Another way to ensure fairness and equity is to ask students to identify the strengths of the faculty member's approach as well as weaknesses and to ask for specific suggestions for improvement.

Validity of Student Evaluations of Teaching: Two types of evidence are necessary to support the validity of student evaluations of teaching (SET). The first, convergent validity, would indicate that SETs correlate well with other measures of instructional quality. In general, research has supported the convergent validity of SETs (see Marsh & Roche (1997) for a review). They have been shown to account for approximately 20 percent of the variance in student achievement as measured by standardized final exams. In addition, instructor self-evaluations correlate moderately well with SETs, and there is a high correlation between the ratings of current and former students. SETs are also related to ratings made by trained external observers. However, there is no relation between ratings by colleagues and administrators and SETs. (This null relationship is due to unreliability in peer ratings; ratings by different peers do not correlate well with each other.)

Biasing Influences on Evaluations. The second type of evidence, discriminant validity, would indicate that SETs do not correlate well with measures unrelated to instructional quality. Although the convergent validity of SETs has been supported, the same cannot be said of discriminant validity. Several factors correlate with SETs that are, on the surface at least, unrelated to the quality of instruction. Factors associated with higher SETs include prior subject interest, elective courses, faculty member expressiveness, and higher expected grades (Marsh & Roche, 1997). The extent to which these factors correlate with SETs, if at all, varies across studies. Many of these studies suffer from a lack of methodological rigor (e.g., small samples, single institutions, etc.) which reduces the confidence that the results are valid and generalizable.

One biasing factor that has been extensively examined is expected grades or grading leniency. Reviews of data gathered from multiple universities, courses, and faculty members indicated a correlation of .20 as a best estimate of the relationship, with a range of approximately .10 to .30 (Feldman, 1976; 1997). The biasing effect of grading leniency has also been demonstrated in experimental studies where student grades were intentionally manipulated upward or downward in natural classroom settings. In a meta-analysis of these studies, Greenwald (1997) reported that the manipulation of grades had moderate to large effects on SETs.
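As a reading aid for the correlations quoted above, the following minimal Python sketch converts a Pearson correlation into the proportion of variance two measures share (the square of the correlation). The function name is hypothetical and chosen only for illustration; the three input values are the low, best-estimate, and high correlations cited in the text. Under this standard conversion, correlations of .10, .20, and .30 correspond to roughly 1, 4, and 9 percent of shared variance.

```python
def shared_variance(r: float) -> float:
    """Return the proportion of variance shared by two variables,
    given their Pearson correlation r (this is simply r squared)."""
    return r * r


# Correlations between expected grades and SETs cited in the text.
for r in (0.10, 0.20, 0.30):
    percent = shared_variance(r) * 100
    print(f"r = {r:.2f} -> about {percent:.0f}% of variance shared")
```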

A recent article that garnered attention from The Chronicle of Higher Education (January 16, 1998) used a causal modeling technique (covariance structure analysis) to examine the effects of expected grades on SETs. In this article Greenwald and Gillmore (1997a) analyzed student evaluations of instruction in approximately 200 courses at the University of Washington during three different time periods. At the end of each course, but before the final exam, students responded to survey items measuring evaluations of the course and faculty member, workload, and grade expectations. Their final model included a significant path from expected grade to evaluation of instruction (.44), indicating that higher expected grades were associated with higher evaluations. Interestingly, the model also included a significant path from expected grade to workload (-.49), indicating that students expected higher grades in courses with a lighter workload. Overall, expected grade explained approximately 20 percent of the variance in student evaluations. Greenwald and Gillmore (1997b) suggested that the unwanted influence of grading leniency could be statistically removed from SETs if measures of expected grades were included on student surveys.
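To illustrate what "statistically removing" the influence of expected grades might look like in its simplest form, the sketch below fits a straight line predicting course ratings from expected grades and subtracts the predicted grade effect from each rating. This is a generic linear adjustment offered as an assumption; it is not the procedure used by Greenwald and Gillmore, and the function names and all data values are hypothetical.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def adjust_for_expected_grade(ratings, expected_grades):
    """Remove the linear effect of expected grade from course ratings.

    Each rating is shifted by the amount a least-squares line attributes
    to that course's expected grade, so the overall mean is unchanged.
    """
    mx, my = mean(expected_grades), mean(ratings)
    sxx = sum((x - mx) ** 2 for x in expected_grades)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected_grades, ratings))
    slope = sxy / sxx
    return [y - slope * (x - mx) for x, y in zip(expected_grades, ratings)]


# Hypothetical course-level data: mean expected grade and mean rating.
expected_grades = [2.5, 3.0, 3.2, 3.6, 3.8]
ratings = [3.1, 3.4, 3.9, 4.0, 4.4]

adjusted = adjust_for_expected_grade(ratings, expected_grades)
print("before adjustment:", round(pearson_r(expected_grades, ratings), 2))
print("after adjustment: ", round(abs(pearson_r(expected_grades, adjusted)), 2))
```

After the adjustment, the ratings no longer correlate linearly with expected grades. Whether such adjusted ratings are fairer is a separate question that the arithmetic cannot answer.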

Conclusion. Evidence supporting the validity of SETs is equivocal. On the one hand, there is some consistency between ratings made by students and ratings made by others, such as trained evaluators. Most importantly, SETs are related to objective measures of student achievement. On the other hand, SETs are biased by several factors, including grading leniency. This is a serious concern because it feeds a culture of grade inflation. Although it is possible to statistically remove the biasing effects of grading leniency, there are no studies examining the relation of "corrected" ratings to important outcomes such as student achievement. It is also possible that excellent faculty members who motivate students to achieve high grades would be unfairly punished if their ratings were lowered by a statistical correction. A simple alternative is to recognize that SETs are not "true" indicators of teaching quality and to weight them for personnel decisions with this in mind.

Measures of student achievement

Source: Faculty and appropriate administrators

Description and purpose: In some courses, a test or examination can be an explicit measure of teaching effectiveness. Ideally, data collected at the beginning of the course is compared with data collected at the end of the course to measure students' improvement on some relevant scale of knowledge, ability, etc. These profiles of achievement may be a good source of summative evaluation information. They are particularly effective in a situation characterized by many students and/or multiple sections working from a common syllabus with a common examination where the course goals are very specific.
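The comparison of beginning-of-course and end-of-course data described above can be sketched in a few lines of Python. The text does not prescribe a formula, so the average per-student gain used here is an assumption, and the function name, section labels, and scores are all hypothetical.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average per-student improvement from the start to the end of a course.

    Scores must be listed in the same student order in both lists.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each student needs both a pre and a post score")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical sections taught from a common syllabus with a common exam.
sections = {
    "Section A": ([40, 55, 60], [70, 75, 80]),
    "Section B": ([45, 50, 65], [60, 70, 75]),
}

for name, (pre, post) in sections.items():
    print(f"{name}: mean gain of {mean_gain(pre, post):.1f} points")
```

Comparing gains rather than final scores goes some way toward allowing for differences in where students started, but it is only meaningful when sections share the same test and similar student entry characteristics.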

Benefits: Given similar student entry characteristics and teaching situations, this method provides perhaps the most objective evidence of teaching effectiveness.

Limitations: The "sameness" required for this method to be meaningful limits the situations in which it can be used. As differences in expectations or student assessment procedures enter the equation, the usefulness of this method for summative evaluation declines. The use of examination results to evaluate teaching can lead to instruction being geared to the examination.

Letters and individual interviews

Source: Students, alumni, peers

Description and purpose: Interviews and/or letters can lead to greater depth of information for improving teaching, or for providing details and examples of a faculty member's impact on students for the purposes of teaching award nominations, and the tenure and promotion process. Letters from students and alumni are best solicited by the chairperson or mentor, and individuals asked to write letters should be randomly selected.

Benefits: Interviews and letters elicit information not readily available through questionnaires or student achievement records. Insights, success stories, and thoughtful analyses are often outcomes of an interview or request for a written assessment of a faculty member's teaching. Students who are reluctant to give information on a rating scale often respond well to a skilled, probing interviewer.

Limitations: The disadvantage of letters is that the response rate can be quite low. The major disadvantage of interviews is time. Interviews can take approximately one hour to conduct, about 30 minutes to arrange, and another block of time must be allocated to coding and interpretation. A structured interview schedule can be used to eliminate the bias that may result when an untrained interviewer asks questions randomly of different students.

Focus groups

Source: Students

Description and purpose: Focus group discussions, involving six to eight students chosen randomly from a faculty member's class, provide a rich description of a faculty member's teaching, since they are based on students' individual opinions and their reflections on, and reactions to, the opinions of others.

The discussion is carried out at a mutually convenient time outside class and should preferably be conducted by a colleague or peer. At the beginning of the group meeting, students are given about five minutes to write independently, describing which teaching behaviors they would like to see the faculty member maintain and which they think the faculty member should change or improve. The items generated are gathered and prioritized under the headings "maintain" and "improve." The facilitator then moves from item to item, alternating between the two columns, asking for clarification and examples to illustrate the points. At the conclusion of the discussion, the facilitator prepares an oral or written report.
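For a facilitator who wants to organize the written items before the discussion, the procedure above might be sketched as follows. The text says the items are prioritized but not how; ranking by the number of students who mentioned an item is an assumption made here for illustration, and the function names and sample items are hypothetical.

```python
from collections import Counter
from itertools import zip_longest


def prioritize(items):
    """Order items by how many students wrote them, most frequent first."""
    return [item for item, _count in Counter(items).most_common()]


def discussion_order(maintain, improve):
    """Alternate between the 'maintain' and 'improve' columns."""
    order = []
    for keep, change in zip_longest(prioritize(maintain), prioritize(improve)):
        if keep is not None:
            order.append(("maintain", keep))
        if change is not None:
            order.append(("improve", change))
    return order


# Hypothetical items collected from the five-minute writing exercise.
maintain = ["clear examples", "pace", "clear examples"]
improve = ["more discussion", "more discussion", "handouts earlier"]

for heading, item in discussion_order(maintain, improve):
    print(f"{heading}: {item}")
```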

Benefits: Data generated in this way provides a very rich description of the strengths and weaknesses of a faculty member's teaching and is probably the most effective way to generate constructive criticism and positive reinforcement for successful strategies. This can be particularly helpful to faculty members who are experiencing problems with their teaching, in which case it is important that the students are selected openly in front of the whole class and in the faculty member's absence. This technique also provides useful feedback for faculty members who are experimenting with new methodology or are engaged in a program of self-improvement. This method may be used for summative purposes and supplements well the quantitative data generated by teaching evaluation forms.

Limitations: The only limitation of this method is that it is time-consuming.

Review of instructional materials

Source: Self

Description and purpose: Instructional materials typically include the following: course outlines, examinations, quizzes, assignments, reading lists, student manuals, practicum requirements, various audiovisual materials (overhead transparencies, videos, slides, computer software, etc.). Many academic units require a course outline which highlights teaching objectives along with student performance expectations.

The content contained within the resource materials can reflect the quality of thought and effort put into the planning and preparation for teaching. The materials may provide insight into the guidance and supervision provided to students outside the classroom setting. Gathering this material gives the faculty member an opportunity to assemble a teaching portfolio that can be an essential component of a tenure and promotion file or teaching award nomination. The portfolio may also be useful to units in upgrading or reforming their curriculum.

Benefits: A review of instructional materials can instigate a professional exchange of information regarding the content being taught and research that might be integrated into the course. The data collected in this way provides a perspective on teaching not obtainable through classroom observations and may also enable an academic unit to maintain a curriculum focus. Evaluation of instructional materials by peers is a more reliable and valid measure of a faculty member's teaching effectiveness than that obtained by asking students to assess the course materials.

Limitations: This initiative is time-consuming and costly. It is also open to individual bias, so a standing committee within an academic unit could provide a formal, consistent, and systematic approach to carrying it out.