Task Force on Teaching Evaluation - Background & Rationale
"An important and welcome change is taking place on college and university campuses: Teaching is being taken more seriously" (Seldin, 1993, p. 1).
Teaching is central to our job as university professors. We should thus develop and evaluate teaching with the same rigor and care that we bring to the evaluation of scholarship and service. Some faculty members may ask: why evaluate teaching? The most direct response is that the UW System requires it. A more complex answer is that since teaching evaluation will continue to be a critical part of the personnel evaluation process (retention, promotion, tenure, merit, post-tenure review), it is in the best interests of the entire academic community to develop a process that is fair and reliable. A more formative answer is that we need to understand how well we are doing in order to improve. Evaluation can lead to significant improvements in teaching (Centra, 1993), and as Seldin (1991) notes, "there is no greater purpose for performance evaluation than to improve performance" (p. 28).
At UW-Green Bay, we have traditionally relied on student end-of-course evaluations to evaluate teaching. Because this taps only one source of information, it is quite possible that the way we evaluate teaching promotes a biased assessment of teaching effectiveness (Centra, 1979; Miller, 1987; Seldin, 1991, 1993). Certainly, much that goes into teaching is not directly visible in the classroom. Teaching is a process, but students primarily see the performance. We do not train students to evaluate teaching, and they cannot address such things as the appropriateness and currency of material or the choice of topics. On the other hand, students can judge a faculty member's ability to get an idea across, whether assignments have been explained, and so on.
If we view teaching as a complex process and not as a simple "input-output" equation, we must move beyond the use of student end-of-course evaluations and gather additional information that gives a more accurate picture of the process and of the faculty member's efforts. Gathering additional information and determining how to combine this information for summative and formative purposes is difficult and time-consuming. However, given that it is a necessity in some form, the central question should be: what sort of work has the best payoff?
We believe the answer is found in a portfolio approach to the evaluation and development of teaching effectiveness. A well-prepared portfolio contains documents and material that describe the scope and quality of a faculty member's teaching. By its very nature, it recognizes that teaching situations vary from one individual to the next and provides a means to incorporate these individual differences into the evaluation process.
What is Good Teaching?
All evaluation methods contain implicit assumptions about the characteristics of good teaching. These assumptions should be made explicit and should become part of the evaluation process itself in that we recognize faculty members' rights to be evaluated within the context of their own teaching philosophies and goals. Thus, teaching is not right or wrong, good or bad, effective or ineffective in any absolute, fixed or determined sense. Faculty members emphasize different domains of learning (cognitive, affective, psychomotor). They work at different sites (classrooms, labs, seminar rooms, studios, field locations, etc.), using different techniques and resources (lecturing, demonstrating, etc.) with students of diverse backgrounds and levels of preparedness. They may also employ different theories of education and teaching methodologies (feminist, anti-racist, humanistic, etc.). In one situation, faculty members may see their role as imparting information, and in another as promoting critical thinking.
As variable and diverse as effective teaching might be, we can generalize about its basic characteristics. Effective teaching brings about the most productive and beneficial learning experience for students and promotes their independence as learners. This experience may include, along with the content base of the course, such factors as intellectual growth, change in outlook and attitude toward the discipline and its place in the academic endeavor, and skill improvement (e.g., critical reading and writing). The criteria for evaluating teaching can vary with the discipline and within the discipline, depending on the course's level, the faculty member's objectives and style, and the teaching methodology. Regardless, the primary criterion must be improved student learning. Research shows that students, faculty and administrators agree on the following qualities of effective teaching: ability to motivate and establish a positive learning environment; providing appropriate challenges; concern for students' needs and welfare; sensitivity to students' different learning styles; and fairness. For some situations (e.g., lecture), the following may also indicate effective teaching: organization of subject and course; effective communication skills; knowledge of and enthusiasm for the subject; availability to students; choice of materials; and openness to student concerns and opinions. Some characteristics are more easily measured than others. Finally, we know that not everyone will display the same strengths -- excellent faculty members may be strong in many areas but not all.
Use of a portfolio with multiple sources of information moves us dramatically beyond the student-driven evaluation of teaching model. We are not suggesting that student evaluations be discontinued. Rather, we argue for an evaluation process that recognizes and captures the complexities of teaching. A teaching portfolio does this. Further, as demonstrated in our later discussion of the portfolio process, a portfolio approach is consistent with (1) the principles espoused earlier in this report; (2) the policies on teaching evaluation in place at UW-Green Bay (see the UW-Green Bay references in the Bibliography); and (3) the perspectives on teaching evaluation shared by the faculty and instructional academic staff at UW-Green Bay.

Perspectives on Teaching Evaluation at UW-Green Bay
In January 1998, we mailed surveys to all of UW-Green Bay's teaching academic staff and faculty. The surveys were accompanied by a cover letter explaining the charge of the Task Force, the statement of the problem, and assurance that responses were completely anonymous. Respondents were given two weeks to return the survey. The survey focused on three aspects of teaching evaluation at UW-Green Bay: (1) what faculty members think about the existing evaluation system; (2) an examination of the different teaching evaluation tools at UW-Green Bay; and (3) specific comments relating to the UW-Green Bay Course Comments Questionnaire (CCQ). A five-point Likert scale was used for the first two sections of the survey and open-ended comments were collected for the third section. See Appendix A (Perspectives on Teaching Evaluation at UW-Green Bay) for a copy of the cover letter and survey.
Table 1 shows the respondent demographics compared with the demographics for all university faculty and instructional academic staff. Respondents were well-distributed across the academic areas, with the exception of the Natural Sciences, which were under-represented. We had a representative sample of males and females. Associate Professors and Lecturers/Instructors were slightly under-represented and Professors were slightly over-represented. The average length of service was 14.1 years. Thirty-five (35) percent of the respondents had served or were serving as budgetary unit chairs; fifty (50) percent had served or were serving as disciplinary chairs. Overall, we believe the sample is sufficiently representative of the total faculty population.

Table 1, Survey Respondent Demographics
76 respondents of 190 possible (40% of total). [The body of Table 1, which broke down respondents versus all UW-Green Bay faculty and instructional academic staff by rank, academic area, and gender, is not recoverable in this copy.]

Survey Results and Discussion
We have organized the results of the study around the three sections of the survey.

Faculty Assessment of UW-Green Bay's Teaching Evaluation Process
Frequencies, means, and standard deviations were calculated for the first set of questions (1a-1n). Table 2 summarizes these findings. Based on the distribution of scores, we can conclude:
- Units should consider a wider variety of kinds and sources of evidence about the quality of teaching (question 1b). Using additional materials (course syllabi, examinations, etc.) should help (question 1m). There is too great a reliance on student evaluations of teaching for personnel decisions (question 1j); however, approximately two-thirds of the respondents (68 percent) believe that student evaluation of teaching should be part of that process (question 1g). Over half the sample believe that students take evaluation of faculty members seriously (question 1k).
- Student evaluations should be used for improvement processes (question 1c), and student evaluations have helped to improve teaching (question 1f). However, the reliability and validity of student evaluations are an issue for approximately half the respondents (question 1d). The same number believe that the use of student evaluations for promotion and tenure has contributed to grade inflation (question 1l).
- Faculty members agree on the need to improve the quality of student evaluations. Specifically, 78 percent of the respondents report that "efforts to improve the quality and reliability of student evaluations should be given a high priority."
An important question to ask is: do the responses for the first set of questions vary according to rank, gender, department, chairperson status, and years of service? To answer this question, we employed a one-way Analysis of Variance (ANOVA), with rank, gender, department, years of service and chairperson status as the independent variables and responses to the first 14 questions as the dependent variables. We chose a p value of .05 as our threshold for significance.
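For readers unfamiliar with the procedure, a one-way ANOVA compares the variance between group means to the variance within groups. The following is a minimal illustrative sketch using made-up 4-point Likert responses for two hypothetical groups (not the actual survey data):

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: mean square between groups
    divided by mean square within groups."""
    observations = [x for g in groups for x in g]
    grand_mean = mean(observations)
    k, n = len(groups), len(observations)
    # Between-group sum of squares: how far each group mean sits
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of responses inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical 4-point scale responses (4 = strongly agree) for two groups:
group_a = [4, 4, 3, 4, 3]
group_b = [3, 2, 3, 2, 3]
f_stat = one_way_anova_f([group_a, group_b])  # larger F -> groups differ more
```

The F statistic is then compared against an F distribution with (k-1, n-k) degrees of freedom to obtain the p value reported in the tables.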
Gender is a powerful predictor (see Table 3). Nine of the 14 relationships are significant at the p<.10 level. As evidenced in Table 3, females are less confident in the use of student evaluations and argue most strongly for strengthening them.
Table 3, One Way Analysis of Variance
Gender as the Independent Variable
Statements re: teaching evaluations (columns: Mean Score, Females; Mean Score, Males; F Value; P Value -- numeric values not recoverable in this copy):
- Teaching weighted more in personnel process
- Depts consider greater variety of kinds and sources of evidence
- Student evaluation of teaching used for improvement purposes
- Little confidence in student teaching evaluations
- Dept has evidence to decide quality of teaching
- Student evaluation results improved quality of teaching
- Student evaluations used for personnel decisions
- Made changes in teaching as result of student evaluations
- High priority given to improvement of student evaluations
- Too much reliance on student evaluations for personnel decisions
- Student evaluations not taken seriously by most students
- Importance placed on student evaluations led to grade inflation
- Instructional materials evaluated by colleagues
- Personnel review system decisions -- fair and equitable
NOTE: * = statistically significant at p<.10. In calculating the means, a four-point scale was used (4 = strongly agree, 1 = strongly disagree).
Whether one served as a budgetary chair is not related to variations in our dependent variables. In a similar vein, disciplinary chair experience predicts little. Only for question 1e (sufficiency of evidence to make reasonable personnel decisions) did we find a significant relationship (x1 = 2.66; x2 = 2.23; F = 5.14; p<.03). These results suggest that present and past disciplinary chairs more strongly believe that reasonable personnel decisions are made with the available evidence.
The results for title and area are quite similar. For title, we found one significant relationship. Professors are more convinced that student evaluations of teaching should be used for improvement processes (question 1c) than Associate Professors (x1 = 3.4; x2 = 3.0; F = 2.7; p<.05). With area as the independent variable, two statistically significant relationships were found. People in the Natural Sciences are less convinced than their counterparts that efforts to improve the reliability of student evaluations should be a top priority (question 1i, F = 3.46, p<.013). Also, a Natural Science mean score of 2.2 for question 1m suggests disagreement with the statement that "instructional materials . . . should be evaluated with the same care as is used in reviewing scholarly publications" (F = 6.01, p<.0004).
Years of service is inversely and significantly related to three questions: "I made changes in my teaching as a result of student evaluations" (question 1h, r = -.39, p<.01); "Efforts to improve the quality and reliability of student evaluations of teaching should be given high priority" (question 1i, r = -.39, p<.01); and "The current personnel review system results in decisions which are fair and equitable vis a vis teaching" (question 1n, r = -.30, p<.05). Given the scaling of the questionnaire and the negative relationships, we can conclude that as years of service increase, (1) faculty members are less likely to make changes in their teaching because of student evaluations, (2) they are less likely to believe that improving the reliability of student evaluations should be a top priority, and (3) they are less likely to believe that the current review system results in decisions that are fair vis a vis teaching.
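The correlations above are Pearson product-moment coefficients; a negative r on a 4-point agreement scale means agreement declines as years of service rise. A minimal sketch with invented illustrative numbers (not the survey data):

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient between two
    equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: years of service vs. agreement (4 = strongly agree).
years = [2, 5, 10, 20, 30]
agreement = [4, 4, 3, 2, 2]
r = pearson_r(years, agreement)  # negative: agreement falls with seniority
```

The sign carries the interpretation in the paragraph above; the magnitude (|r| near 1) indicates how tightly the two variables move together.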
Methods to Evaluate Teaching at UW-Green Bay
The second section of the questionnaire (questions 2a through 2z) examines the different tools/methods used to evaluate teaching effectiveness at UW-Green Bay and the usefulness of these tools. We report the survey results in Table 4. An examination of the table reveals that relatively few tools/methods are used to evaluate teaching. For the most part, these methods parallel those stated on the personnel addendum as evidence of teaching effectiveness. A statistical summary of student evaluations is the only method used in all units. The eight most commonly used tools/methods are, in descending order:
- Statistical summaries of student evaluations (100%);
- Student comments (83%);
- Number of independent studies (77%);
- Improvement in instructional activities (73%);
- Publication of text books (64%);
- Course materials such as exams and syllabi (58%);
- Involvement in instructional activities with an interdisciplinary focus (54%); and
- Number of thesis committees (52%).
Of special interest are the responses to the question: "Should this information be used in your area to evaluate faculty members' teaching?" A pattern becomes visible when one examines responses to that question given the relatively small number of tools that are currently and regularly used to evaluate teaching. Of the 26 methods listed, over half the respondents gave a rating of "should definitely be used" or "should probably be used" to 17 of these methods. Clearly, there is a desire to broaden the evaluation process and use additional methods/tools. Combining the "should definitely be used" category with the "should probably be used" category, the five methods receiving the highest ratings are:
- Course material (95%);
- Involvement in instructional improvement (86%);
- Statistical summaries of student evaluations of teaching (81%);
- Analysis of course materials by colleagues who know the field (78%); and
- Publication of textbook/instructional material (78%).
The following 12 methods/tools received combined ratings of over 50 percent, suggesting that units may want to consider adopting these methods:
- Surveys/interviews of graduating students or recent alumni (73%)
- Student comments from teaching evaluations (73%)
- Student self-assessment of skills (72%)
- Instructional improvement with interdisciplinary focus (68%)
- Colleague evaluations of teaching (67%)
- Incorporation of Instructional Technology (66%)
- Colleague evaluations in consultative/clinical settings (60%)
- Student learning in subsequent courses (59%)
- Faculty self-assessment (55%)
- Comparison of student course ratings with ratings of other faculty teaching similar courses (53%)
- Videotape samples of classes (52%)
- Number of thesis committees chaired or research projects supervised (51%)
Finally, we employed a one-way ANOVA to see whether the "usefulness" of a given evaluation method varies according to gender, age, area, and rank. The analysis uncovered many such differences significant at the p<.05 level. For our purposes, the exact differences are not important. Rather, these many differences strongly suggest that individual units should develop evaluation methods that suit their own purposes and unique circumstances. Also, based on reported differences across years of service, age, rank, and gender, the data suggest that variation in method should be permitted or encouraged across individuals in a given unit. Hence, a "portfolio" approach is an appropriate strategy. With this approach, each faculty member's portfolio would be different.
Table 4, Evaluation Methods
Frequencies for each evaluation method (columns: Yes; No; SDU; SPU; SPNU; SDNU; NO -- numeric values not recoverable in this copy). Methods listed:
- Statistical summary of student evaluations
- Survey of graduating students
- Comparison of student course ratings
- Student comments
- Student self-assessment
- Colleague observation: consultative settings
- Course materials
- Expert analysis of materials
- Faculty self-assessment
- Alumni placement/performance
- Thesis committees chaired
- Letters from students
- Letters from alumni
- Student learning in subsequent courses
- Colleague evaluation of teaching
- Attraction of high-quality students
- Colleague evaluation of student work
- Productivity/reputation of former students
- Colleague observation: colloquia
- Instructional improvement
- Instructional improvement with interdisciplinary focus
- Textbook publication
- Incorporation of instructional technologies
- Number of independent studies
- Number of students advised
- Video of class

NOTE: Percentages may not add up to 100% due to rounding. CODING: SDU = Should definitely be used; SPU = Should probably be used; SPNU = Should probably not be used; SDNU = Should definitely not be used; and NO = No opinion.
Course Comments Questionnaire (CCQ)
Section three of the survey asked respondents to comment directly on the Course Comments Questionnaire (CCQ), which is used in most units across campus. We asked two questions: (1) Why do you use the CCQ?, and (2) What should be done to change the CCQ? The open-ended comments are summarized below.
The major reasons I use the CCQ are:
- Requirement/widespread use/tradition 49%
- Ease of use/convenience 29%
- Gives standards for comparison 12%
- Provides feedback 4%
If I could change the CCQ, I would:
- Add to or alter the questions 55%
- OK as is 8%
- Not use it 8%
- Validate it 6%
Respondents offered a number of helpful suggestions for improving the CCQ:
- ask for students' overall evaluation of the teaching as opposed to overall assessment of the course
- add questions that give the faculty member specific ideas on how to improve his or her teaching
- delete items (the most frequently requested deletion is the fifth question, which deals with course difficulty)
- require comments
- add questions that are more specific to areas, the size of the course, required versus elective, and the nature of the course (disciplinary/interdisciplinary)
- add questions that assess student effort
- add questions that delineate student demographics (e.g., expected grade, whether the student is majoring in discipline, students' perceptions of the amount learned, students' expectations of a reasonable workload)
- add questions that better assess course content
In summary, the open-ended comments suggest that faculty members use the CCQ because of tradition and ease of use. Over half the respondents want to alter the CCQ, adding items that are more descriptive of what actually happens in the classroom.