Education Policy Analysis Archives
Volume 5 Number 18
August 25, 1997
ISSN 1068-2341
Editor: Gene V Glass, Glass@ASU.EDU, College of Education, Arizona State University, Tempe AZ 85287-2411. Copyright 1997, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.
Academic Freedom, Promotion, Reappointment, Tenure And The Administrative Use of Student Evaluation of Faculty (SEF):
(Part III)
Analysis And Implications of Views From The Court in Relation to Accuracy and Psychometric Validity
Robert E. Haskell 1
University of New England
This is the third of four articles by Haskell on this subject. The other articles can be found at
Abstract: In two previous papers, it was noted that while a controversial history of research on the reliability and validity of student evaluation of faculty (SEF) exists, SEF have not typically been viewed as an infringement on academic freedom, promotion, reappointment, and tenure rights. As a consequence, legal aspects of SEF are neither readily apparent nor readily available. This paper reviews legal rulings in cases where SEF are integral to the denial of academic freedom, tenure, promotion, and reappointment, together with the implications and assumptions of those rulings regarding the accuracy and psychometric validity of SEF. It also reviews the legal principles of Disparate Treatment and Disparate Impact, and the scientific Precautionary Principle in policy decisions.
..........As I indicated in previous papers on SEF (Haskell, 1997a, 1997b), the history of legal rights demonstrates that issues not considered to have legal standing come to have it only after a long process of advocacy. The evolution of a policy or legal principle requires the accumulation of data, coalescing judgements, and arguments. To this end, this paper will continue to examine court reasoning and rulings on SEF in cases involving the denial of academic freedom, tenure, promotion, and reappointment (AFTPR), with attention to the implications and assumptions of those rulings regarding accuracy and psychometric validity.2
..........In the second paper on SEF (Haskell, 1997b), I abstracted from the text of located legal cases the views from the court pertinent to SEF. The appendix of that paper provided a verbatim abstract of the text of each case relative to its SEF content. Accordingly, in summarizing the pertinent findings of that paper for the present one, and for convenient referencing, the specific case material for each section will be placed in a footnote indicated at the beginning of each section heading, and the indented "Summaries" are carried over from Part II. For convenience, I will use these abstracted legal views and rulings to examine their implications for the courts' use of SEF in relation to accuracy and psychometric validity. A final paper will address the implications of court rulings for academic freedom and instruction. As I noted in my second paper, not only are legal cases complex on their face, but when specific legal definitions (e.g., disparate treatment and impact) and special Congressional Acts (e.g., those enforced by the EEOC) are superimposed on them, they become logically unwieldy, not just to the non-legal scholar, but apparently to the Courts as well.3
..........Finally, I would like to point out that the issues examined in this series of papers are not primarily concerned with individual faculty rights but with the implications of the administrative use of SEF for academic freedom, educational quality, standards, and, ultimately, the competence of graduates.4
Brief Overview of the Validity of SEF
..........As shown in Haskell (1997b) and reiterated below, views from the court on the appropriate use of SEF vary so greatly that the concept of variation might more descriptively be replaced by the concept of "randomness," were it not for the consistent trend by the courts to accept SEF data as presented to them by institutions. In presenting an analysis and the implications of these views in relation to validity, the detailed research literature on the validity of SEF will largely have to be bracketed; to do otherwise would take this article too far afield. Nevertheless, because the issue of validity is so central to this paper, an overview of the SEF validity literature is a necessary foundation for the analyses that follow.
..........There is a long and controversial research history on SEF, with most early reviews and extant opinion, though certainly not all, suggesting their general validity, where validity refers to the accuracy with which SEF measure teaching effectiveness. More recently, however, sophisticated statistical reviews of this past literature strongly suggest that earlier reviews of the SEF literature were not rigorously analyzed and methodologically controlled, casting serious doubt on their conclusions. As Barnett (1996), Greenwald (1997), and Greenwald and Gillmore (1996) demonstrate, past reviews have tended not to be sophisticated critiques. Positions suggesting cautious support for the validity of SEF, while at the same time expressing concerns about the adequacy of that support, include Abrami, Dickens, Perry, & Leventhal (1980). Reviews and empirical critiques that are critical of the validity of SEF include Chacko (1983), Dowell and Neal (1982), Powell (1977), Snyder and Clair (1976), Vasta and Sarmiento (1979), and Worthington and Wong (1979). Some of the past reviews that have categorized the significant research finding SEF to be essentially a valid measure of the quality of instruction are: Cashin (1995), Cohen (1981), Franklin & Theall (1990), Holmes (1972), Howard, Conway, and Maxwell (1985), Howard and Maxwell (1980, 1982), Marsh (1980, 1982, 1984), Marsh and Dunkin (1992), and McKeachie (1979).
..........Cahn (1987) suggests that student ratings do not measure the instructional effectiveness or the intellectual achievement of students. Rather, SEF measure student satisfaction, attitudes toward instructors and courses, student personality, and students' psychosocial needs. Cahn further suggests that students know if instructors are likeable, not if they are knowledgeable; they know if lectures are enjoyable, not if they are reliable. In a meta-analysis, Cohen (1983), who basically accepts the validity of SEF, concludes, "While the magnitude of the average rating/achievement correlation for the thirty-three multisection courses is not overwhelming [14.4% of shared variance between ratings and the criteria], the relationship is certainly stronger and more consistent than we were led to believe..." (p. 455). And Dowell & Neal (1982) conclude that "The research literature can be seen as yielding unimpressive estimates of the validity of student ratings. At their most valid, then, validity of SEF refers to only 14% of the total variance. The literature does not, therefore, support claims that the validity of student ratings is a consistent quantity across situations. Rather, the evidence suggests that the validity of student ratings is modest at best and quite variable...The variability in obtained validity coefficients even in studies with reasonable methodological requirement lead us to suspect that the validity of student ratings is influenced by situational factors to such an extent that a meaningful, generalizable estimate of their validity does not exist. In general . . .no meaningful estimate of the validity of student ratings can be provided with confidence that is generalizable enough to be useful..." (pp. 59-61).
For example, studies demonstrate the following confounding variables: (1) age, (2) gender, (3) class size, (4) year of student, (5) level of student, (6) instructor style, (7) subject matter, (8) major or elective course, (9) student interest in subject matter, (10) instructor grading difficulty, (11) anonymous vs. signed ratings, (12) whether students are informed of their use, (13) instructor present vs. instructor absent while students complete the evaluation (see, for example, Divoky and Rothermel, 1988), (14) length of class period, and a host of other variables.
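The 14 percent figure quoted above can be made concrete with a short calculation. Shared variance between two measures is the square of their correlation coefficient, so 14.4 percent shared variance corresponds to a rating/achievement correlation of roughly .38 and leaves about 86 percent of the variance in ratings not shared with the achievement criterion. The Python sketch below only performs that arithmetic; the function name is hypothetical, and the single input is the figure quoted from Cohen (1983).

```python
import math


def implied_correlation(shared_variance):
    """Return (r, unexplained) for a given proportion of shared variance.

    Shared variance between two measures is the squared correlation r**2,
    so r is its square root; the remainder is variance that is not shared.
    (Hypothetical helper, written only to illustrate the arithmetic.)
    """
    r = math.sqrt(shared_variance)
    unexplained = 1.0 - shared_variance
    return r, unexplained


# 14.4% shared variance between ratings and achievement, as quoted from Cohen (1983).
r, unexplained = implied_correlation(0.144)
print(f"implied correlation r = {r:.2f}")            # 0.38
print(f"variance not shared   = {unexplained:.1%}")  # 85.6%
```

Read this way, even the most favorable estimate leaves the large majority of what SEF measure unrelated to the learning criterion, which is the point Dowell and Neal press.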
..........Finally, the philosopher of science Michael Scriven has conducted rigorous work on evaluation procedures (1988, 1991, 1993, 1995), particularly on the justification for inferring conclusions about the merit of teaching from statistical correlations between ratings and student learning gains. He suggests that such inferences are invalid unless a number of stringent conditions are met in the design, administration, and use of such ratings. He further suggests of faculty evaluation in general that "All are face-invalid and certainly provide a worse basis for adverse personnel action than the polygraph in criminal cases. Based on examination of some hundreds of forms that are or have been used for personnel decisions (as well as professional development), the previous considerations entail that not more than one or two could stand up in a serious hearing." Given this highly questionable state of affairs regarding the validity of SEF, the question is how courts view validity in relation to the use of SEF for administrative purposes.
The Courts' Approach to the General Accuracy and Psychometric Validity of SEF 5
..........An issue directly related to the reliance on SEF for administrative purposes is its validity. Presumably, the more valid the SEF data in a given case, the more justifiable the reliance on them for administrative purposes. From the legal cases reviewed (in Haskell, 1997b), it is clear that the courts tend to accept SEF data as presented to them by institutions.
Summary: With regard to requiring the general and statistical accuracy of SEF, legal reasoning and rulings can be summarized (see Haskell, 1997b) as ranging from: (1) accepting statistical analyses as part of a plaintiff's effort to establish discriminatory treatment if the disparity reaches proportions comparable to those in cases establishing prima facie racial discrimination, (2) cautioning that statistics are not irrefutable, their usefulness depending on the surrounding facts and circumstances of a case, (3) maintaining that the court need not consider validity and is under no obligation to establish the accuracy of administrative interpretations of SEF, (4) holding that tenure criteria are not drawn with "mathematical nicety," (5) holding that an administrator's failure to perform statistical comparisons is not arbitrary but reasonable, (6) especially if such comparisons are not required by a Faculty Association Contract, (7) holding that nearly any use made of SEF, regardless of its validity, is acceptable if it followed the standard practice of the university, to (8) holding that creativity, rapport with students and colleagues, teaching ability, and other qualities are intangibles which cannot be measured by objective standards. Some courts (e.g., Fields v. Clark University, 1987) have held that even when SEF are not gathered and evaluated according to accepted standards of scientific polling procedures, their use is nevertheless acceptable if the process followed the standard practice in other tenure decisions at the university (p. 671).
..........While there does exist a "substantial evidence" standard, which gauges whether an institution's decision-making body carefully considered the evidence and had a substantial body of evidence on which to base its decision, and an "arbitrary and capricious" standard, which gauges whether a deciding body acted without reason or irrationally (see Kaplin, 1995, section 1.4.3.6, Standards of Judicial Review and Burdens of Proof 35), it appears that these standards are frequently ignored in relation to decisions based on SEF.
..........In general, the exception to the courts' almost total disregard for the validity of SEF has been in cases involving EEOC issues, in which the courts require precise accuracy. I will address this issue in more detail in the section on disparate treatment and impact below.
Historical Overview of the Courts' Approach to the Validity of Faculty Evaluation Data
..........As noted previously, unlike general performance evaluations of faculty, SEF does not have a categorical legal history. Since SEF is but a subset of faculty performance evaluation in general, it is appropriate to briefly review the history of this more general area. Given that relationship, it is not surprising that the courts' view of the validity of SEF parallels their view of faculty performance evaluation.
..........Historically, in terms of faculty evaluation instruments in general (at both the secondary and postsecondary levels), it is widely agreed by legal scholars (Baez, Benjamin, and Centra, 1995) that "Despite the subjectivity of measuring the quality of a faculty member's scholarship, service and teaching accomplishments, courts will rarely, if ever, question the appropriateness of an institution's criteria (or how they measure them) for granting reappointment, promotion, or tenure....they will rarely substitute their judgments for those of peer review committees....Although juries may have less deference" (p. 139). It might also be added that courts will seldom question administrative judgements of evaluations; faculty who challenge institutional evaluation tools very rarely succeed. Although the legal "competent and substantial evidence" standard places a significant burden of proof on the educational organization, it has not generally required that faculty assessment instruments be professionally validated (Rebell, 1990; Kaplin and Lee, 1995). Such rulings do, however, appear to vary by state or federal jurisdiction.
..........Psychometric standards of validity and reliability, and specific evaluation techniques, are rarely incorporated in state laws, regulations, or common-law standards. Accordingly, cases that involve evaluation have tended to focus on adherence to specific procedural requirements set forth in state law, or on general common-law notions of fairness and due process, rather than on expert psychometric standards. Although state courts will require strict adherence to the procedural aspects of these requirements and will strike down an arbitrary failure to use any apparent evaluative criteria, they tend not to probe the substance of evaluation criteria or methods (Rebell, 1990; Kaplin and Lee, 1995). As Copeland and Murry (1996) have put it, "the judiciary has generally behaved as though it believed that evaluations were made only after careful deliberation and with procedural due process protections. In short, the judiciary has tended to act as if colleges and universities could be trusted to act in good faith" (p. 246).
..........Rebell (1990) outlines what he describes as a "striking example of the courts' traditional deferential attitude toward teacher evaluation" data (p. 337). The decision of the United States Court of Appeals for the Eighth Circuit in Scheelhaase v. Woodbury Central Community School District (1973) involved the dismissal of a teacher whose contract had previously been renewed over a ten-year period. The stated reason for her termination was incompetence, as indicated by the low scores of her students on the Iowa Test of Basic Skills (ITBS) and the Iowa Test of Educational Development (ITED). Although a number of expert witnesses testified that it was inappropriate to use such test scores as a basis for evaluating a teacher's performance, the court dismissed Scheelhaase's claim. The claims were considered basically irrelevant by the court because "such matters as the competence of teachers and the standards of its measurement" are not matters of constitutional dimension.
..........This early case involving a public school teacher is significant (a) because of the Court's apparent lack of concern with the serious psychometric issues raised by reliance on student achievement scores as the sole stated basis for termination and (b) because of the Court's almost total reliance on a school administrator's psychometrically unsubstantiated, and quite possibly equally erroneous, evaluation. One of the concurring judges in Scheelhaase bluntly stated:
The Board was entitled to rely upon the recommendation of conclusions of its superintendent, not-withstanding the existence of strong opinions contrary to his regarding the use of the ITBS or ITED tests as a tool of teacher evaluation...Thus, its decision, even though premised upon an apparently erroneous 'expert opinion' cannot be faulted as arbitrary and capricious. The Board's mere mistake in judgment or in weighing the evidence does not demonstrate any violation of substantive due process. (Emphasis added).
Thus, even when states use student achievement scores as an index of faculty proficiency,6 courts have shown an "apparent lack of concern with serious psychometric issues raised by reliance on student achievement scores as a sole stated basis for termination," again relying on administrators' unsubstantiated evaluations (Rebell, 1990). Courts have historically adopted the position that they are not qualified to second-guess peer-review committees, at least as long as committees do not act arbitrarily and instruments are consistently and fairly applied (Baez, Benjamin, and Centra, 1995; Kaplin and Lee, 1995; Rebell, 1990). Traditionally, notes Rebell, most other courts have taken a similarly deferential stance in teacher evaluation cases.
..........There seem to be two exceptions to this. The first is in discrimination cases: in general, courts have tended to require precise accuracy only in cases involving EEOC issues (see below). The second involves claims of unfair treatment for the exercise of First Amendment free speech rights, including union-organizing activities, or allegations that tenured teachers or others with a reasonable expectation of continued employment were denied their Fourteenth Amendment right to due process; such claims trigger federal court jurisdiction and greater scrutiny of the data (Rebell, 1990).
Acceptance of Administrative Subjective and Untrained Evaluator Judgements of SEF Data 7
..........An issue directly related to both the reliance on SEF and their statistical accuracy is how the courts view subjective administrative judgements of faculty teaching effectiveness.
Summary: With regard to accepting administrators' subjective judgements in evaluating SEF, the legal reasoning and rulings can be summarized as ranging from: (1) accepting administrative subjective judgements if (2) they are deemed sincere, (3) are grounded on some evidentiary basis, (4) are made on the "vigor and variety of student criticisms," (5) are "not arbitrary or capricious and were exercised honestly upon due consideration," (6) are based upon "much experience in reviewing student evaluations," and (31) reasonably draw on that experience, to (7) ruling that Presidents are not bound by factual findings made by majority members of a faculty.
Not only have the courts traditionally not examined faculty evaluations rigorously, they have also tended not to require that evaluators be trained in the use, analysis, and interpretation of evaluation instruments. In general, state courts reviewing teacher evaluation practices will not directly analyze either the substantive criteria used to evaluate teachers or the qualifications of the raters (Rebell, 1990). There are exceptions, however.
..........Some states, such as Florida and Pennsylvania, now mandate such training. Florida specifically requires school boards to provide training programs to "ensure that all individuals with evaluation responsibilities understand the proper use of the assessment criteria and procedures" (Fla. Educ. Code, § 231.29(2)). In Pennsylvania (Rebell, 1990), employees must be evaluated "by an approved rating system which shall give due consideration to personality, preparation, technique and pupil reaction in accordance with standards and regulations for such scoring as determined by rating cards to be prepared by the Department of Public Education...." (pp. 345-346).
SEF as Social Judgement and Diagnosis
..........Given the courts' assumptions regarding validity and the untrained judgement of those making decisions based on SEF, part of influencing the courts lies in presenting relevant research. The research on social judgement and clinical diagnosis makes clear that the manner in which nearly all SEF data are analyzed is a special case of social judgement and clinical diagnosis, involving the same logical and cognitive biases and distortions that result in the pervasive inaccuracy of social judgement in general and clinical diagnosis in particular. The findings of the judgement research literature apply both to students making such judgements in evaluating faculty and to those interpreting the results; both are, in fact, making diagnoses.
..........Psychological research has recognized the severe cognitive problems and limitations of "intuitive" and "experience-informed" everyday judgements for over thirty years (Dawes, Faust, and Meehl, 1989; Faust, Guilmette, Hart, Arkes, Fishburne and Davey, 1988; Garb, 1989; Hayes, 1991; Larkin, McDermott, Simon, and Simon, 1980; Rabinowitz, 1993), yet the mistakes continue in everyday practice. Interpretation of SEF is no different. As two authors who consider the SEF literature valid (Franklin & Theall, 1990) point out:
Even given the inherently less than perfect nature of ratings data and the analytical inclinations of academics, the problem of unskilled users, making decisions based on invalid interpretations of ambiguous or frankly bad data, deserves attention. According to Thompson (1988, p. 217) "Bayes Theorem shows that anything close to an accurate interpretation of the results of imperfect predictors is very elusive at the intuitive level. Indeed, empirical studies have shown that persons unfamiliar with conditional probability are quite poor at doing so (that is, interpreting ratings results) unless the situation is quite simple." It seems likely that the combination of less than perfect data with less than perfect users could quickly yield completely unacceptable practices, unless safeguards were in place to insure that users knew how to recognize problems of validity and reliability, understood the inherent limitations of rating data and knew valid procedures for using ratings data in the contexts of summative and formative evaluation. (pp. 79-80).
The authors conclude by noting, "It is hard to ignore the mounting anecdotal evidence of abuse. Our findings, and the evidence that ratings use is on the increase, taken together, suggest that ratings malpractice, causing harm to individual careers and undermining institutional goals, deserves our attention." (pp. 79-80). Such problems are not matters of methodological nit-picking; they are pragmatic, paradigmatic, and scientifically fundamental.
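Thompson's point about Bayes' theorem can be illustrated with a worked example. Suppose, purely hypothetically (none of these figures come from the SEF literature or from the cases reviewed), that 10 percent of a faculty are ineffective teachers, that a low-rating cutoff flags 80 percent of them, and that the same cutoff also flags 20 percent of effective teachers. Intuition tends to suggest that a flagged instructor is probably ineffective; Bayes' theorem shows that under these assumptions fewer than a third of flagged instructors actually are. The Python sketch below performs that calculation; the function and parameter names are illustrative only.

```python
def posterior_ineffective(base_rate, sensitivity, false_positive_rate):
    """Bayes' theorem: P(ineffective | flagged by low ratings).

    base_rate           -- hypothetical P(ineffective) among all faculty
    sensitivity         -- hypothetical P(flagged | ineffective)
    false_positive_rate -- hypothetical P(flagged | effective)
    """
    # Total probability of being flagged, summed over both groups.
    p_flagged = (sensitivity * base_rate
                 + false_positive_rate * (1.0 - base_rate))
    # Posterior: share of flagged faculty who are actually ineffective.
    return sensitivity * base_rate / p_flagged


# Hypothetical scenario described above.
p = posterior_ineffective(base_rate=0.10, sensitivity=0.80,
                          false_positive_rate=0.20)
print(f"P(ineffective | flagged) = {p:.2f}")  # 0.31
```

The gap between the intuitive answer (about .80) and the computed one (about .31) is the kind of conditional-probability error that Franklin and Theall warn unskilled users of ratings are prone to make.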
Variables Affecting Validity Not Taken Into Account When Assessing SEF 8
..........In conducting any research, it is a given that a host of variables affect outcomes. Put in experimental terms, a host of independent variables affect the dependent variable (here, teaching effectiveness). The question is how courts have addressed this crucial issue, which bears so centrally on the validity of SEF data.
Instructional Variables
..........Legal cases concerned with the validity of SEF occasionally note various instructional factors that were not controlled in the faculty evaluation process.
Summary: The variables noted in the legal cases reviewed include (55) not controlling for class size, i.e., not separating SEF obtained in small seminars from those obtained in large lecture classes, (56) not separating those obtained from tenured faculty from those obtained from non-tenured junior faculty, (57) not performing appropriate comparisons of SEF with other faculty, (58) noting SEF in all courses, not just in problem courses, (59) not mistaking student 'response' figures for actual student enrolment figures when using them to determine student attraction to a course, (60) using all courses taught, (61) taking into consideration faculty teaching a wide range of courses versus those with lighter teaching loads, (62) the number of new courses taught in a year, (63) whether graduate courses were taught at the same time as undergraduate courses, (64) selectively mentioning only negative student comments, or (65) overly weighting negative comments, and (66) different procedures for gathering student opinion.
Courts sometimes weigh these variables heavily; in most cases, however, they either ignore them or give them little weight in the total context of a particular case.9
Student Bias Variables 10
..........A significant issue is how courts view student biases in assessing the reliability and validity of SEF.
Summary: Student bias variables include reactions to (48) academically demanding faculty who (49) thus thwart student expectations, (50) difficult examinations, (51) tough grading policies, and (52) heavy workloads in a course. (53) While most courts ignore these student biases in SEF, (54) occasionally a court will recognize that difficult courses have to be given to students and that such material is difficult for even the best teacher to get across.
In general, however, it is overwhelmingly clear that courts seldom take these variables into account, despite the fact that such reactions often function as generalized affective overlays on SEF (see below).
Popularity Variables and Effectiveness 11
..........A related student variable issue is the extent to which SEF measures popularity, not teaching effectiveness. Accordingly, it is instructive to see how courts view this issue.
Summary: Court rulings range from holding (9) that, in cases of exceptional research faculty, popularity should not play a role in termination for teaching, to (10) that, in normal cases, a measure of popularity is related to teaching effectiveness.
While not noted frequently, popularity appears to be generally assumed to be involved in teaching effectiveness. But again, the courts are mixed on this issue as well. In terms of the research literature there is little to no support for popularity being a measure of teaching effectiveness in higher education. 12
The Courts' Reliance on Both Quantitative Data and Qualitative Comments in SEF 13
Reliance on SEF v. Peer Evaluation
..........Is it considered acceptable, for example, to rely heavily or even solely on SEF, or must they be used in conjunction with other evaluative methods?
Summary: From the cases analyzed, it can be seen that court rulings range from saying that (1) relying primarily or solely on student evaluations is acceptable, to (2) placing little exclusive reliance on SEF, (3) in rare cases SEF can not be permitted to stand in the way of promoting or retaining professors who are excellent in non teaching areas, (4) tenure decisions can not be based solely on SEF by students who have not been made aware of the ramifications of their evaluations, (5) anonymous documents or those "based on hearsay" should not be included in a faculty member's file, (6) students should be made aware of the purpose and ramifications of their evaluations of faculty, (7) anonymous student evaluations should not be used, (8) peer evaluations must also be a part of evaluating teaching.
Again, courts range widely on whether SEF may be relied on exclusively, even though books on how to conduct faculty evaluation (by authors who basically accept the validity of SEF, e.g., Seldin, 1984; Theall and Franklin, 1990) have for some time consistently emphasized that SEF should not be used as the only or primary method for assessing teaching effectiveness.
Numerical Ranking of Faculty 14
..........An important issue is how the courts view the relative weighting of SEF in administrative decisions of teaching competence. It seems to be common practice to ordinally rank and compare faculty to each other according to average SEF numerical scores.
Summary: From the cases reviewed, numerical scores from SEF often result in faculty (22) being compared with other faculty, (23) being ranked relative to other faculty, (24) with distinctions often being made on the basis of tenths of a point, (25) with most courts accepting these fine decimal distinctions.
Despite the research reviewed above on the highly questionable validity of SEF, institutional administrators and the courts continue to make and accept fine numerical distinctions in faculty scores from student evaluation questionnaires in order to rank faculty ordinally. Even granting that SEF are valid to the extent of accounting for 14% of the variance, it is not psychometrically appropriate to accept such rankings.
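Even setting validity aside, sampling error alone limits how finely mean ratings can be distinguished. The Python sketch below is a hypothetical illustration, not an analysis drawn from the cases reviewed: it assumes ratings on a 5-point scale with a standard deviation of 1.0, 25 respondents per class, independent classes, and a normal approximation. Under those assumptions, two class means must differ by roughly half a point before the difference is distinguishable at the 95 percent level, so distinctions of a tenth of a point fall well within sampling noise.

```python
import math


def se_of_mean(sd, n):
    """Standard error of one class's mean rating (simple random-sampling assumption)."""
    return sd / math.sqrt(n)


def min_detectable_difference(sd, n_a, n_b, z=1.96):
    """Approximate smallest difference between two class means distinguishable
    at the 95% level, using a normal approximation and assuming independent
    classes with equal rating standard deviations."""
    se_diff = math.sqrt(se_of_mean(sd, n_a) ** 2 + se_of_mean(sd, n_b) ** 2)
    return z * se_diff


# Hypothetical values: 5-point scale, SD = 1.0, 25 respondents in each class.
sd, n = 1.0, 25
threshold = min_detectable_difference(sd, n, n)
print(f"SE of one class mean:      {se_of_mean(sd, n):.2f}")  # 0.20
print(f"95% difference threshold:  {threshold:.2f}")          # 0.55
print("0.1-point gap distinguishable?", 0.1 >= threshold)     # False
```

Smaller classes or more variable ratings would widen this threshold further; the calculation addresses only reliability of the means and says nothing about whether the ratings measure teaching effectiveness at all.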
..........It should be noted that SEF rate the majority of faculty as above average, whatever this means.
..........Ordinal scales do not tell us whether a faculty member halfway down the scale is only half as good as the top-ranked member. Thus, without a criterion-referenced standard, we have no way of knowing whether everyone on the scale is an effective teacher or, conversely, an ineffective one. Moreover, should all faculty who fall below the statistical "average" be eliminated? If so, by the same logic, should we rank order and thereby eliminate all Olympic team members who fall below the team average? If the answer is 'yes,' then (a) we eliminate highly functioning athletes, and (b) repeating the procedure leads to a regress that ends with only one or two members on any given team. Currently, we have no idea whether the "statistical average" means good, bad, or indifferent teaching in terms of instructional effectiveness.
Use of Qualitative Written Student Comments 15
..........Over and above quantitative data, the use of written comments by students on their SEF forms, often single instances, seems widespread among educational administrators, faculty evaluation committees, and the courts.
Summary: Court views on the use of student comments range from (33) placing importance on a single comment, (34) treating several comments as significant information, (35) maintaining that statistical analyses of SEF need to be bolstered by individual comments, (36) maintaining that while some very negative (e.g., racist, sexist) comments may be found, the court may find that they do not render SEF unreliable, (18) holding that such instances or "impressions" may be validated after the fact, (37) allowing negative comments to outweigh positive ones, (38) often even outweighing numerical data to the contrary, and (39) holding that negative comments need not be verified before acting on them, to (40) holding that negative comments cannot be used to undermine otherwise generally favorable comments received in an annual performance review.
Clearly, the views from the court suggest that it is legitimate not only to use what are in fact anecdotal data, but often to raise them above more systematic (averaged) data.
Mixed Student Comments 16
..........Just as quantitative SEF data may be bimodal, written student comments may also be bimodal or mixed. How do courts (and, indeed, educational administrators and faculty evaluation committees) view and pronounce on such data?
Summary: Non-numerically assessed written student comments are often qualitatively characterized as (41) a few were ambivalent, (42) a considerable number, (43) of mixed result, and are selectively recognized: (44) it would only be fair to add that there were a number of comments in favor, (45) there were also some negative comments, (46) sometimes placing greater weight on past evaluations of teaching than on current comments, and (47) sometimes placing greater weight on current comments than on past positive evaluations of teaching.
Again, with regard to single and mixed comments on SEF, the courts (administration, and faculty evaluation committees, See Appendix) tend to weigh them far above their non representative and anecdotal-data value.
..........It seems to be generally assumed by most faculty and administrators that SEF are used by virtually all schools in the U.S. It is further assumed by many that SEF are necessary for evaluating faculty teaching effectiveness and thus for quality control of student learning. While their use is clearly widespread in the U.S. (see Seldin, 1984; Crumbley and Fliedner, 1995) and is increasing in Europe (Husbands and Fosh, 1993), what is not generally recognized is that some schools preclude the use of SEF in salary, promotion, and tenure decisions, either totally or in part (by precluding the use of qualitative student comments).17
Transcendent Value of a Professor Over Teaching Quality 18
..........Despite the importance placed on teaching, there is precedent for both school policy and the courts, under certain conditions, to ignore poor teaching as indicated by SEF.
Summary: (11) Courts and educational administrations cannot allow low SEF to stand in the way of promoting or retaining professors who may be world-renowned scientists; (12) where faculty are deemed nationally or internationally exceptional as researchers, courts may disregard SEF; (13) in at least these two cases, however, the courts did not find the faculty exceptional. It would be interesting to see whether what the courts seem to accept in principle exists in fact.
The collective categories abstracted above from court cases are illustrated by a (non-litigated) denial-of-tenure case, described in the Appendix below, that contains an interesting difference from most of the cases reviewed here.
Procedural, Burden of Proof, and Policy-Decision Criteria in Assessment of SEF
..........Other overlooked issues involving SEF and its validity concern content versus process: whether the assessment of SEF data constitutes (a) a process or procedural issue or (b) simply a content issue.
Validity Assessment of SEF as Procedural or Process Issue
..........The content versus procedural/process distinction is often exemplified by the difference between trial and appellate courts: the latter often judge only whether correct procedure and due process were followed by a lower court. The content versus procedural/due process distinction is also typically used by college campus grievance committees. When a tenure committee, for example, renders a decision that a faculty member finds unacceptable, the faculty member may challenge it. A grievance or appeal committee may then review the decision only in terms of whether the correct process or procedures were followed in making it. The point here is that many such appeal committees do not regard an examination of how a tenure committee or administrative evaluator gathered and analyzed SEF data as a matter of procedure or due process. Such an examination might ask whether the tenure committee simply "eyeballed" the data and student comments, or whether it compared the data with the SEF of other, similar faculty. Committees instead treat it as content, and therefore as outside their purview. Grievance committees therefore often will not review the substantive content of SEF data, on the grounds that it is not a procedural or process issue.
..........In general, given the courts' tendency to accept the validity of SEF data, at least by default, how SEF data are assessed and used is often not considered a process/procedural issue. At least one court has, however, considered how SEF data are assessed and used as procedural. This is evident in Christopher Turner v. The President of the University of British Columbia (1993), where it was stated that the Dean said "there were few students in undergraduate literature courses since 1986/7---(3, 8, and 6 respectively)," thus mistaking student 'response' figures for actual student enrolment. The Board concluded that (5) "This misunderstanding is in our opinion sufficient in itself for a reconsideration, since teaching was the focus..." (p.3), and (7) "we think that the comments and emphasis on the size of Dr. Turner's classes as evidence of poor teaching are open to objection and constitute errors of procedure and/or evidence" (p.6). [italics added]
As noted above, however, it appears that most courts, and indeed perhaps most faculty grievance committees (see Appendix below), have not treated the way SEF data are analyzed as a procedural/due process issue. The issue of the validity of SEF, then, would appear to have legal "due process" implications.
Decision Criteria and the Scientific Precautionary Principle
..........Because SEF has evolved haphazardly, along with a general acceptance of its validity as an appropriate measure of faculty teaching effectiveness, the burden of proof has somehow been placed on faculty who challenge such data. They must prove scientifically that SEF data are not valid, a strange state of affairs, at least in science. And the standard of proof required has typically been high. In effect, faculty are guilty until proven innocent. So the process that exists is:
- Either (a) a legal abdication by the court of the assessment of SEF, relying instead on the institution's good-faith evaluation of SEF data, or (b) the court simply assuming their validity.
- The placing of the burden of proof on faculty who challenge the data to demonstrate, with scientific levels of certainty (statistical significance or confidence level), that the data are not valid.
..........Given, at the very least, the controversial assessment of how valid SEF are as a measure of teaching effectiveness, perhaps we should err on the side of caution in applying such data in administrative decisions and policy. In the field of environmental science, Lemons (1996) and Lemons, Shrader-Frechette and Cranor (in press) have proposed a Precautionary Principle for making policy decisions. In essence, this principle concerns policy decisions about environmental harm. It holds that policy should not be made to wait for the level of proof (typically a 95 per cent confidence level) required for a scientific finding to be accepted by scientists, given (a) a certain level of possible harm, (b) the complexity/uncertainty of the data, and (c) how high that level of proof is. The reason is this: waiting for such a confidence level may be too risky, given the level of harm that data at a lower confidence level may indicate. In short, the criteria adopted for doing science may often not be appropriate criteria for making policy decisions.
..........The reasoning surrounding the Precautionary Principle is too complex to delineate fully here; the reader is referred to the citations. In the meantime, consider the following analogy, which in broad outline captures the spirit of the Precautionary Principle: A dangerous tiger has escaped from a local zoo a few miles from your house. Behind your house is a wooded area. Your child wants to go out and play in the woods. No one has actually seen the tiger in the woods or anywhere else in the neighborhood. In other words, there is no scientific level of evidence that the tiger is anywhere around, or that your child would be in immediate danger playing in the woods. Do you let your child out to play in the woods?
In most areas of science, the rule is to avoid type-I error (asserting there is an effect when there is none). The burden of proof is therefore placed on those who postulate an effect rather than on those who postulate no effect. There is much less concern with avoiding type-II error (asserting no effect when there is one). In adopting SEF data as indicating teaching effectiveness, administrators, faculty evaluation committees, and the courts have, given both the level and the burden of proof, in effect engaged in type-I error.
..........The Precautionary Principle has two implications when applied to SEF in relation to faculty and instructional quality. The first concerns the courts. Consider (a) the haphazard way SEF have been introduced and accepted by the courts and (b) the level of possible harm to faculty from accepting SEF for the administrative purposes of salary, promotion, denial of tenure, or non-reappointment. More importantly, consider (c) the effects that SEF used for such purposes have on the quality standards of higher education (see Haskell, 1997a). Given these, should the courts demand such a burden of proof of faculty challenging SEF data? Certainly, as shown below in disparate treatment and disparate impact cases, a kind of Precautionary Principle is already in effect. Second, given the at least clearly conflicting evidence on whether SEF demonstrate a faculty member's teaching effectiveness, should not administrators and faculty evaluation committees, for the same reasons, adopt a similar precautionary stance?
The Court's Approach to Validity of SEF in Relation to the Principles of Disparate Treatment and Disparate Impact 19
..........Given the above findings on how the courts have tended to treat SEF validity issues, I would now like to look further at the implications. Federal courts, and to a lesser degree state courts, have adopted a more stringent approach to testing in teacher evaluation cases, at least those involving primary and secondary teachers. According to Rebell (1990), there are four main reasons for this change:
(1) institutions have made wider use of more stringent evaluation techniques, largely stemming from legislative reform initiatives, which has led to an increased number of denials of teacher certification and of terminations;
(2) a disproportionate number of these certification denials and terminations involve members of minority groups;
(3) legal developments have broadened the jurisdiction of the federal courts to consider issues of social reform; and
(4) judges have gained increased experience in assessing psychometric techniques in employment discrimination cases.
It is perhaps reasons (2) and (3), however, that have had the most impact on the courts (Kaplin and Lee, 1995; Rebell, 1990).
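The asymmetry between the two error types can be made concrete with a standard power calculation. The sketch below assumes a simple two-sided z-test with known unit variance. It uses a hypothetical effect size (0.3 standard deviations) and a hypothetical sample size (30); neither figure comes from SEF research. It shows that holding evidence to the 95 per cent level can leave a real effect undetected most of the time, which is the risk the Precautionary Principle asks policy-makers to weigh.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def type_ii_error(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Probability of failing to detect a real effect (type-II error, beta)
    for a two-sided one-sample z-test with known unit variance.

    effect_size: assumed true standardized effect (mean shift in SD units).
    n: sample size.
    alpha: type-I error rate (0.05 corresponds to the 95 per cent level).
    """
    z = NormalDist()                     # standard normal distribution
    critical = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = effect_size * sqrt(n)        # expected value of the test statistic
    # The test fails to reject when the statistic falls between -critical and +critical.
    return z.cdf(critical - shift) - z.cdf(-critical - shift)


# Hypothetical case: a modest real effect (0.3 SD) studied with 30 observations.
beta = type_ii_error(effect_size=0.3, n=30)
print(f"type-II error rate: {beta:.2f}")  # roughly 0.62
```

Under these assumptions, a real effect would be missed about six times in ten. Larger samples shrink that risk, but demanding scientific-level confidence before acting still places the cost of any missed effect on those it harms.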
..........Litigation over educational reform issues, from desegregation to special education and other school-based matters, has made the courts more experienced and more inclined to scrutinize educational testing requirements. As a consequence of federal Equal Employment Opportunity Commission (EEOC) criteria, in today's civil rights climate courts are more likely to scrutinize the validity of a faculty evaluation instrument, especially in terms of racial, gender, and age discrimination.
Disparate Impact
..........In teacher evaluation cases generally, judicial scrutiny tends to be more probing and stringent where discrimination is claimed under the equal protection clause of the Fourteenth Amendment. The same holds for claims under the antidiscrimination statutes enacted to protect racial and ethnic minorities, women, persons with handicapping conditions, older persons, and other protected groups. Such cases are of two basic kinds: (1) those involving discriminatory intent, called disparate treatment claims, and (2) those involving no intent, called disparate impact claims (see Kaplin and Lee, 1995, section 3.3.2.1.).
..........A disparate impact claim in personnel evaluation challenges assessment procedures that are facially neutral (neutral on their surface, or methodologically) in their treatment of different groups. Such procedures nonetheless produce evaluation outcomes that fall more harshly, even if inadvertently, on one group than on another. Thus, proof of a discriminatory motive is not necessary to establish a disparate impact claim. To establish a prima facie case of such adverse impact, a member of a protected group need only show a causal connection between the facially neutral employment practice and its disproportionate adverse effects on him or her as a member of that group. For example, a university tenure process may be found to discriminate against women because the evaluation process or evaluation criteria favor male faculty over female faculty. In such cases, rigorous statistical analysis is typically used to establish disparate impact.
..........Discriminatory treatment and disparate impact claims have made courts more inclined to analyze educational testing instruments specifically for validity, and this increased involvement by the courts is predicted to grow. As of 1990, 41 states had mandated some form of standardized testing as part of their teacher certification process. Because many of these exams are claimed to have a disproportionate negative impact on minority candidates, competency tests have triggered a number of large-scale federal class action suits. Again, Rebell (1990) notes:
In June 1988, the United States Supreme Court issued a ruling which is likely to accelerate the trend toward increased judicial involvement in teacher evaluation matters. That case, Watson v. Fort Worth Bank and Trust (1988), extended to judgmental employment practices the Court's 1971 holding in Griggs v. Duke Power Company (1971) that standardized employment tests having a disparate impact on minorities must be shown to be job-related. Although the Court's ruling in Watson was unanimous, there was substantial disagreement among the Justices as to how closely courts should scrutinize particular practices and validation techniques. Whatever the precise standard of review ultimately implemented, there is little doubt that the federal courts will be more likely to scrutinize nonobjective evaluation procedures as a result of Watson (p.339).
Thus any instrument or evaluation criterion that in effect places an unfair burden on members of a protected group being evaluated has been judged to exhibit what is legally termed disparate impact.
..........The present point is that while the courts have not rigorously scrutinized SEF, and continue not to, they have for some time applied fairly rigorous standards to evaluations, both in the workplace and in academia, in cases involving discrimination against protected groups. This holds whether the discrimination is purposeful or arises through disparate impact. Not every indication of racism, however, will be considered by a court to be proof of discrimination.
..........For example, in Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), some students had written that the grievant was a "slant eyed bitch" and that she should "go back to China." With respect to these comments, the Labor Relations Board said: "....We also are not persuaded that the racism evident in the student evaluations of Grievant made student evaluation results unreliable. The percentage of evaluations in which racism by students was evident was approximately one percent of the total evaluations" (p.306). Assuming some level of covert racism, how does one disentangle a generalized, affective racist and sexist overlay from students' evaluations across the questionnaire as a whole?20
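As an illustration of the kind of statistical comparison referred to above, the sketch below compares the rate of a favorable outcome (here, tenure) for two groups. It reports the ratio of the lower rate to the higher one and a pooled two-proportion z statistic. The counts are hypothetical, and the choice of this particular test is an assumption made for illustration. Actual litigation may rely on other or additional methods, such as regression analyses that control for qualifications.

```python
from math import sqrt


def compare_selection_rates(selected_a: int, total_a: int,
                            selected_b: int, total_b: int) -> dict:
    """Compare favorable-outcome rates for two groups.

    Returns each group's rate, the ratio of the lower rate to the higher rate,
    and a pooled two-proportion z statistic for the difference in rates.
    """
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    pooled = (selected_a + selected_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "ratio": min(rate_a, rate_b) / max(rate_a, rate_b),
        "z": (rate_a - rate_b) / std_err,
    }


# Hypothetical tenure outcomes: 30 of 40 male and 15 of 30 female candidates tenured.
result = compare_selection_rates(30, 40, 15, 30)
print({k: round(v, 3) for k, v in result.items()})
# {'rate_a': 0.75, 'rate_b': 0.5, 'ratio': 0.667, 'z': 2.16}
```

A |z| above about 1.96 corresponds to conventional 5 per cent significance. The ratio is included because the federal Uniform Guidelines on Employee Selection Procedures generally treat a selection rate below four-fifths of the highest group's rate as evidence of adverse impact.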
The Disparate Treatment and Impact Principles Generalized
..........In the evolution of any legal policy or principle, extension often occurs by generalization or analogical transfer: a principle thought to apply to only one area is extended to other areas (see, for example, Anderson and Schadewald, 1991; Golding, 1984; Levi, 1949; Marchant, Robinson, Sunstein, 1993). Currently, both corporate and academic cases of straightforward discrimination, and the more inadvertent discrimination cases based on disparate impact, often prompt the courts to scrutinize the methodology and statistical data of such evaluations. That rigor is not typically accorded to nondiscrimination cases. Presumably, in discrimination cases the courts' interest is in establishing validity and in using rigorous statistical methods to ascertain the "truth." If this is the case, then by logical implication, and as we have seen, in generic evaluation cases the courts could be said not to be in the truth business. As documented above, in nondiscrimination cases courts have assumed that the "truth" lies in the appropriateness of an institution's criteria. They rarely substitute their judgments for those of peer review committees, taking the position that they are not qualified to second-guess such committees. This deference holds at least as long as the committees do not act arbitrarily and instruments are consistently and fairly applied. The burden of proof is on the faculty member challenging an institutional decision. Some courts have been concerned only with consistency and fairness of application, even when the methods of evaluation are clearly defective.21 Generally, however, the courts have acted as though they believed that institutional evaluations were made only after careful deliberation and with procedural due process protections.
..........The question is: why not make the same assumptions regarding discrimination? The answer is that, understandably, the courts have accepted that widespread conscious and nonconscious bias has existed in society, based on ethnicity, gender, age, religious belief, sexual orientation, and handicap. Because of this, they cannot simply rely on the "truth" or good-faith behavior of an institution or its data. Given this, the argument is made that herein lies the distinction, and the reason, for treating nondiscrimination cases differently from cases where discrimination has been charged (treatment claim) or has been inadvertent (disparate impact claim).
..........Thus, outside discrimination cases, the courts have tended to accept the judgment and "good faith" motivations of organizations. Unlike in the past, however, the data on discrimination against protected groups in academia are now sufficient to cast serious doubt on the courts' assumption that "truth" resides in corporate and academic data. So too are the data now in on (a) the questionable validity of SEF, (b) the internal politics of administration and faculty relations, which can revolve around student retention and unpopular ideas, and (c) the economic pressures on institutions not to tenure faculty and sometimes to terminate tenured faculty. All of these can have serious contaminating consequences for institutional decisions.
..........The importance of this is that while courts have scrutinized SEF for evidence when civil rights discrimination has been alleged or suspected, they have not otherwise applied the same rigor to the validity of evaluation instruments, nor have they held other institutional biasing variables suspect. The courts continue to assume a kind of pre-1960s academic Camelot. Whether such a round table of academic knights ever existed historically or was merely mythical, it certainly now exists only in myth.
..........One compendium of legal findings in higher education specifically discusses SEF and recognizes the accepted application of the principles of disparate treatment and impact along racial and gender lines in SEF. It should be noted that disparate treatment and impact issues, when applied to SEF, are of course no different from those in any other disparate impact case, except that in such cases the student evaluation data will be scrutinized by the courts. The authors (Baez and Centra, 1995) suggest that SEF research in the area of race and gender discrimination has been inconsistent. They add that while the area deserves more attention, the inconsistency of the research makes it unlikely that the courts will sustain such a claim. Some courts, however, have found in favor of faculty in such cases. For example:
In Cynthia J. Fisher v. Vassar College (1995), after a bench trial, the district court found that, in denying Fisher tenure, Vassar had discriminated against her by reason of (a) her sex, in violation of Title VII of the Civil Rights Act of 1964, and (b) her age, in violation of the Age Discrimination in Employment Act. The court found that the termination of Fisher's employment resulted not from any inadequacy of her performance, qualifications, or service, but from a pretextual, bad-faith evaluation of her qualifications. The court scrutinized Vassar's report on Fisher's teaching ability, which included reviews of her student evaluations. Those evaluations were said to reflect "consistent problems with clarity and her ability to illuminate difficult material" but were otherwise generally positive. The district court found that Vassar's biology department had distorted her teaching recommendations by "selectively exclud[ing] favorable ratings," by selectively "focus[ing] on the two courses in which she had difficulties," and by "applying different standards to her than were applied to other tenure candidates" (Id. at 1209).
The court further observed that "the males tenured while Dr. Fisher was on the faculty were praised for their fine teaching while Dr. Fisher was criticized, although the facts on which the Committee's determinations were based (student evaluations, Biology Majors Reports and [Student Advisory Committee] reports) revealed that Dr. Fisher's evaluations were superior to theirs" (Id. at 1211). The court noted that statistical analysis may be a part of a plaintiff's effort to establish discrimination under a theory of disparate treatment.
The point here is that if this had not been a disparate treatment case, the biases and distortions in the student evaluation data about her teaching would likely have gone unexamined. 22
..........It would seem, then, that this discrepancy in the discrimination-based search for "truth" should serve as, and provide justification for, a kind of generalized disparate impact principle. This would be a fairness principle, legally invoked, requiring that the same rigor be applied in non-civil-rights cases such as those involving SEF. As Kaplin points out, however, current law generally prohibits courts from such generalization. 23 So the issue of change apparently becomes not so much one for the courts as a policy issue, both for higher education administrators to use as a guideline and for legislatures to act on.24
..........Age discrimination in SEF is another possible bridge in this potential extension of disparate impact. The Age Discrimination in Employment Act of 1967 (ADEA) requires employers to evaluate persons on their qualifications or ability to perform their jobs competently, and not on the basis of age. Like other employers, colleges and universities are prohibited from considering a faculty member's age in making decisions about employment, salary increases, promotion, tenure, and retention. Yet there is evidence that SEF do discriminate on the basis of age, with older faculty receiving lower student ratings (Feldman, 1983). A host of other variables also create a kind of default "disparate impact" if they are not controlled for in the analysis of SEF data. These include class size, teaching courses within students' majors as opposed to elective courses, and teaching freshmen versus upper-level students.
..........What is being suggested here is that in the interest of justice, equity, truth, and "fact finding," the courts and institutions should scrutinize all SEF data as rigorously as they do in disparate treatment and disparate impact cases. Currently, SEF data and the conclusions drawn from them are seldom scrutinized as they are in discrimination and disparate impact cases (nor, indeed, are other issues in the denial of tenure or promotion). Justice, however, is blind not only to ethnicity, gender, age, sexual orientation, religious belief, and handicap status, but also to institutional economic pressures and other biasing variables within academic institutions. Thus, biases and distortions of SEF data are not revealed in nondiscrimination cases as they are in disparate treatment and disparate impact cases. As a consequence, in terms of revealing unfair attributions based on SEF data, those covered under EEOC guidelines have a "truth finding" advantage over those who are not covered.
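A toy example shows how an uncontrolled variable such as class size can produce this kind of default disparity. In the sketch below, the section ratings are invented. Instructors X and Y are assumed to earn identical ratings within each class-size band, and X simply teaches more large sections. Stratifying by band with equal band weights is one simple adjustment chosen for illustration; a real analysis would more likely use regression with several controls.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical section-level SEF means: (instructor, class-size band, mean rating).
SECTIONS = [
    ("X", "large", 3.0), ("X", "large", 3.0), ("X", "large", 3.0), ("X", "small", 4.0),
    ("Y", "large", 3.0), ("Y", "small", 4.0), ("Y", "small", 4.0), ("Y", "small", 4.0),
]


def raw_mean(sections, instructor):
    """Average over all of an instructor's sections, ignoring class size."""
    return mean(rating for who, _, rating in sections if who == instructor)


def stratified_mean(sections, instructor):
    """Average within each class-size band first, then weight the bands equally."""
    by_band = defaultdict(list)
    for who, band, rating in sections:
        if who == instructor:
            by_band[band].append(rating)
    return mean(mean(ratings) for ratings in by_band.values())


for name in ("X", "Y"):
    print(name, raw_mean(SECTIONS, name), stratified_mean(SECTIONS, name))
# X 3.25 3.5
# Y 3.75 3.5
```

The raw averages make X look half a point worse than Y. Once class size is held constant, the difference disappears.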
Beyond Statistical Significance of SEF Research
..........Having reviewed SEF cases and examined the significance of validity, I would now like to turn the issue of validity on its head. Underlying statistical research that attempts to establish the validity of SEF is a complex of contextual variables and assumptions seldom addressed.25 In this section, I will address some of these contextual variables and assumptions. I suggest that they cut through even the best statistical research showing that SEF measure teaching effectiveness, rendering it nearly irrelevant. Understanding is not acquired by statistical significance alone. Certainly, showing the statistical validity of SEF is a necessary condition, but it is not a sufficient condition for understanding what SEF mean or for using them in administrative decisions. Educational policy-makers and the courts need to understand the contexts and assumptions underlying statistical validity research on SEF. They need to think long and hard before accepting SEF for assessing instructional competence and using them for promotion, tenure, and reappointment decisions.26
Assumption # 1: Statistical Significance of Indicators of Teaching Effectiveness
..........An assumption underlying statistical analyses of SEF is that we know what the indicators of effective teaching are. To my knowledge, the research does not support this assumption. What makes us so sure that many of the questions we ask on SEF questionnaires are all that related to effective student learning? Consider, for example, the typical question "Was your instructor organized?" This question in turn entails a myriad of assumptions and conditions about effective teaching. Would most students, for example, perceive Socrates, peripatetic and just asking a lot of questions, as organized? Consider an instructor who just goes into class, is Socratic, asks a lot of provocative questions, and confronts students by challenging their belief systems. What makes us think that, at least for some students and some kinds of subject matter, this may not be the most effective instructional and learning method in the long run for getting students engaged and thinking critically?27 What evidence is there that either being perceived as organized or actually being organized is a necessary condition for effective instruction? I know of no rigorous supporting evidence. Indeed, many of my friends in the humanities, much to the dismay of my behaviorist and cognitive colleagues (and sometimes to my own), would suggest that systematic and sequentially structured teaching methods are simply structural analogues of our technological society (see, for example, the classic by Jacques Ellul, 1964).
..........Consider, too, a question that, while not asked directly on SEF questionnaires, is implied in various forms by other questions: "Does your instructor mainly lecture?" Though there is precious little rigorous evidence showing that lecturing is inherently an ineffective teaching method, it is clearly persona non grata among many educational theorists. Lecturing is "out" while collaborative learning is "in," though apparently not so for many faculty (for both valid and invalid reasons).
..........I happen to agree that being organized is generally good, and that collaborative learning is perhaps good for certain student populations, subject matters, and desired outcomes. The question is whether they are appropriate indicators of effective teaching when applied to individual faculty, as claimed. The answer is that they are not, and this holds even if the statistical research strongly supported the claim. This is an important point that, as far as I recall, is addressed in the faculty evaluation literature only by Scriven (1988). I will quote Scriven at some length.
..........Scriven observes that, in the attempt to render teacher evaluation more scientific, the field rushed to focus on research-based indicators. These were teaching behaviors that sound research had supposedly shown to be positively correlated with successful student learning. Of these indicators he writes:
Popular envies are structured presentations, active involvement, emphasis on positive reinforcement, high eye contact, high frequency of question asking, provision of learning objectives, frequent feedback, use of multi-media (p.4)....the provision of a brief outline of topics to be covered in a day's lesson can be justified on administrative grounds, since substitute teachers must get some guidance; but the requirement that anything like that be provided to students, for pedagogical reasons--a claim often said to be supported by research---cannot be justified. The use of instructional objectives or any other kind of advance organizer is simply a characteristic of one's style of teaching, not a duty of the teacher. Nor can such an outline be required as evidence of preparation (arguably a duty), since a teacher using a textbook--or for that matter, memory--may do as well or better than one with lengthy lesson plans listing activities and testing procedures (p.7-8).
The presence or absence of these factors, says Scriven, defines a style of teaching. He maintains that any reference to a 'teaching style' in teacher evaluation is not valid, regardless of whether there exists a research basis for thinking the style is correlated with teaching effectiveness.28 He goes on to explain:
A major source of confusion in discussing the use of indicators is that the research is often presented as showing that 'the best way to teach' is by using high eye contact (or whatever), whereas all it really shows is that there is a slight tendency for better teachers to exhibit this characteristic, for reasons which might include the fact that they were taught to use it, although in fact it's not a help at all. The reader is seduced by the relative plausibility of the style recommendations, whereas you'd never buy the idea of using eye color or skin color. But plausibility isn't necessity, and absent necessity, you're just a stylist. Our kids don't need stylists, they need good teachers; and if you can't distinguish the two, you're in the wrong business (p.7).29
Scriven is not denying the validity of statistical inference. Useful information is contained in a statistical correlation, and there are circumstances in which that information can be put to good use. It can even be put to good use in making decisions about people, but only when no better data are available because of limitations on time or resources. 30 Scriven maintains that such teaching effectiveness indicators are invalid
for essentially the same kind of reason that the evaluation of personnel by the color of their skin or their church affiliation is necessarily invalid. While it is true that much racial prejudice, sexism, etc., is based on false beliefs about the groups discriminated against, the essential flaw in it goes deeper than that. The essential flaw is that even if women in general are less strong than men, you shouldn't use gender to discriminate against a particular candidate for a position as a luggage-handler, but only a job-related strength test or series of observations in a trial period on the job. And this is not just for ethical/legal reasons, but also for scientific reasons and reasons of efficiency (p.4)....Which means you can't discriminate against a teacher on the grounds that s/he exhibits some approach to teaching that research has shown is less likely to be successful. Whites are statistically less likely to be good basketball players than blacks, but you can't kick the whites off the squad the day you discover that the statistics are worse than you thought. Nor would you be any good as a coach if you used skin color as a criterion for selection. You have to look at the individual's success, not at the success of groups to which the individual belongs (p.4).
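Scriven's distinction between a group-level tendency and a judgment about an individual can be illustrated with a toy example. The scores below are invented, and "effectiveness" is assumed to come from some direct measure of student learning; the point is only arithmetic. Teachers who show a hypothetical "research-based" indicator (say, high eye contact) average 6 points higher. Yet in about a third of head-to-head pairings, the teacher lacking the indicator is the more effective one. Ranking individuals by the indicator would misjudge them in exactly those cases.

```python
from itertools import product
from statistics import mean

# Hypothetical effectiveness scores for teachers who do and do not exhibit
# a "research-based" indicator such as high eye contact.
WITH_INDICATOR = [62, 70, 55, 80, 68]
WITHOUT_INDICATOR = [60, 72, 50, 65, 58]


def group_gap(with_group, without_group):
    """Difference in group averages: the 'slight tendency' the research reports."""
    return mean(with_group) - mean(without_group)


def share_reversed(with_group, without_group):
    """Share of cross-group pairs in which the teacher WITHOUT the indicator
    is actually the more effective one."""
    pairs = list(product(with_group, without_group))
    reversed_pairs = sum(1 for w, wo in pairs if wo > w)
    return reversed_pairs / len(pairs)


print(group_gap(WITH_INDICATOR, WITHOUT_INDICATOR))       # 6
print(share_reversed(WITH_INDICATOR, WITHOUT_INDICATOR))  # 0.32
```

A real group difference coexists here with frequent individual reversals. This is why a correlation can describe groups accurately and still be an invalid basis for judging a particular teacher.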
Finally, Scriven suggests a reason for the almost total disregard by the courts of the validity of SEF documented in this paper (and in my previous paper, Haskell, 1997b). He understands the implications of courts' recognizing the fallacy of such indicators. Using such statistical indicators is, he says, "as certain to crash in the courts---eventually---as the most blatantly racist hiring practices. We may have only a short breathing space before the courts and defense attorneys begin to see the underlying similarity of these two approaches....The consequences for states and districts will be chaotic; old decisions may be reversed on appeal, huge damages may be awarded, those hearings will clog the system, and there will be no legitimate process to take the place of the illicit one (it is because of this potentiality for disaster that we are giving a longer-than-usual treatment of the issue here)" (p.5).
Assumption # 2: Statistically Significant SEF Measures of Teaching Effectiveness Reflect Appropriate Learning
..........An assumption that is virtually unnoted in the literature is that, given that SEF is eventually found to measure teaching effectiveness (and this "given" is only for the sake of the current argument), what is thereby being measured is appropriate learning. This assumption is arguably incorrect for at least two reasons. I say "arguably" because whether the assumption is correct depends on other, differing assumptions about higher education.
..........First, let us not fool ourselves into thinking that we know what effective teaching is for all populations of students and all subject matters. There is no shortage of possible indicators of effective instruction and learning, but most are not articulated within an adequate theory of effective instruction or learning. At the very least, "effective" is relative to a given student population. And when we speak of teaching effectiveness, are we referring to short-term or long-term learning?31
..........In addition, as Abrami (1989) and others (see Cohen, 1983) have suggested, most studies on the relationships between student ratings and instructor-generated student learning have used learning outcomes collected largely from freshman classes and, more importantly, learning at the lowest level of Bloom's taxonomy. Similarly, the literature on transfer of learning shows that when student transfer of learning is found, it reflects the lowest level of concrete transfer. So even if we are effective at this level, what have we achieved? This brings me to my main point.
..........I suggest that teaching effectiveness and appropriate learning in higher education are two different logical and empirical entities. I shall now address these two differing assumptions together. If one accepts the data showing that (a) student level of unpreparedness, (b) student ability level as measured by most national tests, (c) unrealistic student expectations about learning, (d) grading, (e) feeling of entitlement, (f) motivation level, (g) good-faith motivation for evaluating faculty, (h) maturity level, and (i) hours spent studying have either been in decline for years or have become increasingly inappropriate, then effectiveness in teaching most of these students does not necessarily (and most likely does not) mean appropriate learning. For purposes of clarity (and at some risk of seeming not only insensitive but also a right-wing radical, which I assure the reader I am not), let me demonstrate why teaching effectiveness is separate from appropriate learning by using what may be considered an extreme scenario: Suppose that the Americans with Disabilities Act as applied to higher education is amended to require admitting the mentally retarded, thus requiring whatever instructional adjustments are needed to accommodate their disability.
.......... Now assume that such adjustments are made, e.g., speaking more slowly, simplifying and otherwise decreasing the amount of content to be mastered, along with the depth of understanding and critical thinking required. In addition, assume that where such adjustments are not made, and classroom behaviors that were once appropriate for a previous level of student are not modified, this is reflected in low SEF scores. Now assume that, because of these pressures, such adjustments have been made and that SEF findings for those teaching the disabled students unequivocally show teaching effectiveness. The question then becomes: is this appropriate learning for a higher education course? 32
..........Most will likely respond to this question with a resounding "no." Some, on the first assumption noted above, may say "yes." Others will maintain that the above scenario is extreme and inappropriate. The fact is, however, that this scenario is simply a quantitative extension of, not something qualitatively different from, the lowering of admission and course-requirement standards that has been occurring for some time. So the question now becomes not simply teaching effectiveness, but teaching effectiveness at what level of learning and, by implication, under what academic standards. This is an issue that needs to be addressed nationally by faculty. Being well versed in logic as well as statistics, Scriven (1988), of course, understands this. And herein lies the ghost in the machine of most statistical validation studies of SEF, even at their very best: there is nothing wrong with the statistics, only with the meaning of what they purportedly measure. Thus the problem is not a flaw in the data or the measurement instrument, but a flaw in the measurer. In a similar context Scriven notes, "It's not even true that 'it all boils down to how much the students learn from the teacher': if it did, the teachers of mentally-retarded students would automatically be the worst teachers. In fact, they are often much better teachers than those teaching smart students, because smart students survive bad teaching better. (How many of the research studies naively treated 'amount learned' as the criterion against which they 'validated' the indicators?)" (p.7).
..........To conclude this section, the implications for SEF in general and for the issue of validity seem clear. The validity of SEF, then, is not the primary issue it appears to be; focusing on it inadvertently hides the more significant issue of academic standards.33 I will address academic freedom in relation to academic standards in more detail in my final paper. 34
Conclusion
..........From most of the above cases, and even given that faculty, as challengers, have borne the burden of proof, it seems clear that the courts have not been kind to faculty with regard to student evaluations.35 Some clearly see the courts' various involvements in academic matters as detrimental to academic freedom. Arguably, rulings often seem to shape academic freedom in inappropriate ways and, less arguably, in inconsistent and contradictory ways. "It is not clear, however," suggests Rebell (1990b), "that increased judicial involvement will have such a detrimental impact. In some measurement situations, courts have exhibited a sophisticated understanding of the complex judgmental factors at stake, and their insistence on thorough-going implementation of improved, fairer assessment devices has enhanced, rather than impeded, the development of professional standards" (p.340). He goes on to point out that, "because the state of the art concerning teacher-evaluation practices is at a sensitive developmental stage, extensive court intervention at this point can substantially influence---for better or worse---the future direction of basic practice in the field" (Rebell, 1990b, p.344). Thus whether increased judicial intervention in faculty matters will have a positive or a negative impact on professional evaluation practice depends on providing the courts with appropriate psychometric data and other scientific procedures.
..........Given the above rulings and the courts' propensity to accept faculty/institutional agreements, Kaplin and Lee's advice regarding academic freedom, that "it is especially crucial for institutions to develop their own guidelines on academic freedom and to have internal systems for protecting academic freedom in accordance with institutional policy" (p. 192), would seem to apply with special force to a detailed SEF policy, one that specifies in particular how the data are to be assessed.
..........The fourth and final paper will address the implications of court reasoning and rulings for academic freedom, standards, and instructional decisions.
Notes
1.
Address correspondence to: Robert E. Haskell, Ph.D., Professor of Psychology, Department of Social and Behavioral Sciences, University of New England, Biddeford, Me. 04005. Email: rhaskel1@maine.rr.com. I would like to thank Professor John Damron of Douglas College for continually providing me with sources, support, and advice, and especially Professor William A. Kaplin, School of Law, Catholic University of America, for his invaluable legal counsel and for reading a draft of this paper. Interpretive liberties with the legal material, omitted legal nuances, and any other problems are my responsibility.
2.
As with my second paper (Haskell, 1997b), the focus here will be limited to how the reviewed courts have addressed SEF issues within various legal challenges to the denial of academic freedom, tenure, promotion, and reappointment by institutions of higher education. There are multiple legal variables that define an action or influence an outcome in a particular case. Among them are the statutes or other sources of law being applied, the cause of action being asserted, the prescribed prima facie case, the allocation of burdens of proof, and the standards of judicial review (see, e.g., Kaplin and Lee, 1995, section 1.3 & section 1.4.3.6). For my purposes here, I will not be concerned with these variables. Accordingly, this paper is concerned neither with the outcome of the legal rulings nor with the complex legal reasoning on which the rulings were based. My purpose is to review the general reasoning of the courts on SEF from a "reasonable man" standard and from a policy point of view.
3.
To the layman, legal rulings regarding SEF are a veritable thicket; it often seems that the use of context to differentiate one apparently similar case from another functions as a kind of ad hoc carte blanche to justify preconceptions and positions.
4.
An important but largely neglected, or ignored, function of education is its social function. Education is not just for the benefit of the individual but for the benefit of society. Like it or not, we in higher education have accepted the social function of certifying the competence of our students as they enter an increasingly complex world. The certifying function has become especially important since the introduction of vocational programs into university curricula.
5.
..........In Johnson v. University of Pittsburgh (1977), the court said (7) "We have repeatedly approved the use of statistical proof where it reached proportions comparable to those in this case to establish a prima facie case of racial discrimination in jury selection cases . . . Statistics are equally competent in proving employment discrimination. We caution only that statistics are not irrefutable. They come in an infinite variety and, like any other kind of evidence, they may be rebutted. In short, their usefulness depends on all of the surrounding facts and circumstances." (8) The court further said in Footnote # 20: "Considerations such as small sample size may of course detract from the value of such evidence" (p.1361).
5...........In Peters v. Middlebury College (1977), it was maintained that (5) "A professor's value depends upon his creativity, his rapport with students and colleagues, his teaching ability, and numerous other intangible qualities which cannot be measured by objective standards" (p.860).
5...........In Fields v. Clark University (1987), the court noted that (10) Fields "attacks" the university's use of her student evaluations because they were not gathered and evaluated according to accepted standards of scientific polling procedures. In response, the court agreed, saying, "She is probably correct. The use made of the student evaluations in her case, however, followed the practice at the defendant's university in other tenure decisions" (p.671).
5...........In Cynthia J. Fisher v. Vassar College (1995), the court noted that (7) "statistical analyses may be a part of a plaintiff's effort to establish discriminatory treatment" (p.1209).
5...........In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), the court ruled that (4) "The Court need not consider the accuracy of these administrative determinations," and that (24) tenure criteria "are not drawn with mathematical nicety." The board further ruled that (25) "the Dean and the President both reviewed Grievant's student evaluations carefully. Their failure to take it a step further, and perform a statistical comparison of Grievant's student evaluations with those of other faculty members who have been granted tenure was not arbitrary and was reasonable; (26) Such a comparison is nowhere required by the Contract, [and] (27) we decline to hold such an involved comparison is necessary before a reasonable tenure determination can be made" (p.311).
5...........In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court concluded (38) "that the instrument was not perfect, that it had flaws, and that the very limited number of samples (because of the very limited number of courses and students surveyed over the period) impaired its reliability" (p.30). (39) "However, we accept the evidence of Dr. [X] that the instrument has some value, directed toward the specified factors." The court noted that (28) "One problem with the questionnaire is that it solicits bad points as well as good points. Despite that caveat, we conclude that the inclusion of the qualitative comments was not a significant error" (p.32).
5...........In Robert Kramer v. The President of the University of British Columbia (1992), the Board said (19) that, given certain Departmental procedures, "there is a danger that some negative class commentary will dominate the discussion and will not be the 'independent' opinion of all of the students. (20) This is especially true in the context of the direction to assess 'effectiveness' versus 'popularity'" (p.10). They further noted (18) that "There was no peer review at all; no member of the Department audited any of Dr. Kramer's lectures. There was, therefore, nothing to guide the Department but the student comments," and that there was "no way to test the accuracy or fairness of the undoubtedly disturbing comments in Asian Studies" (p.10).
5...........In University of Regina Faculty Association v. University of Regina (1993), the Board argued (6) that "the University was under an obligation to verify negative comments before acting on them" (p.4).
5...........In Christopher Turner v. The President of the University of British Columbia (1993), the Board said, (7) "while not ignoring some student unhappiness with Dr. Turner's teaching style, we think that the comments and emphasis on the size of Dr. Turner's classes as evidence of poor teaching are open to objection and constitute errors of procedure and/or evidence" (p.6).
6.
This is an important area but will not be dealt with here, because student achievement scores are used as a measure of teaching effectiveness almost exclusively at the secondary level of education.
7.
..........In Dyson v. Lavery (1976), the court concluded that, despite questionable errors, administrative judgements were acceptable because "they were sincere and grounded on some evidentiary basis" (p.111); and (5) "In the absence of a finding that same were sexually motivated, the administration's professional judgment must be respected" (p.111, all italics added).
7...........In William Sypher v. Vermont State Colleges Faculty Federation (1982), it was found that (7) sufficient evidence existed from which the Dean and President could have reasonably concluded that Sypher was not above average in his teaching effectiveness; (8) the Board went on to say that if they adopted the Colleges' view that Sypher was not reappointed because of his teaching effectiveness, no argument advanced by him defending his teaching was likely to persuade the President, because his decision was made on the "vigor and variety of student criticisms" (p.135).
7...........In Carley v. Arizona Board of Regents (1987), the court ruled that (18) the University president was free to consider factual findings made by minority members of the academic freedom and tenure committee, and any other evidence he found relevant, in determining whether to deny renewal of a teaching contract to a nontenured instructor. The president was not bound by factual findings made by majority members of the committee (p.1103).
7...........In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), it was noted that (28) "The Dean and the President obviously had much experience in reviewing student evaluations, and could reasonably draw on that experience in each tenure review" (p.311), and that their judgements "were not arbitrary or capricious and were exercised honestly upon due consideration" (p.311).
7...........In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court said, (40) The relevance and quality of the scores are "a matter of weight for the various decision-makers, and we assume that they were reasonably aware of the limitations of student evaluations and gave them the weight they deserve" (p.30).
7...........In Robert Kramer v. The President of the University of British Columbia (1992), the board concluded, "In the final analysis, we feel that this review of the Head's comments on teaching, which would be the sole evidence upon which the Dean and the President could rely, shows that it was incomplete and might have been misleading" (p.12-14).
7...........In University of Regina Faculty Association v. University of Regina (1993), the Board said teaching was wrongfully evaluated, but upheld denial of tenure on grounds of inadequate scholarship.
7...........In Christopher Turner v. The President of the University of British Columbia (1993), the board concluded that (11) "there were sufficient errors of procedure and/or evidence to return the case for reconsideration" (p.11).
8.
In Lieberman v. Grant (1979), Lieberman attempted to introduce approximately ten personnel files concerning the tenure proceedings of other faculty in the English department for comparison. (6) Recognizing that such evidence would have had some minimal probative value, the Court exercised its discretion under Fed. R. Ev. 403 and excluded it on the ground that "such probative value would be substantially outweighed by the delay and waste of time, which introduction of such evidence would have necessarily entailed....The plaintiff's case without such evidence seemed almost interminable, consuming 52 trial days over a two-year period. That is long enough" (p.873).
8...........In Fields v. Clark University (1987), the court notes, but does not criticize, the failure to separate student remarks about small seminar courses from those about large lecture classes.
8...........In Cynthia J. Fisher v. Vassar College (1995), the district court found (2) that the biology department distorted Fisher's teaching recommendations by (3) "selectively exclud[ing] favorable ratings," by "focus[ing] on the two courses in which Dr. Fisher had difficulties" and (4) by "applying different standards to her than were applied to other tenure candidates" (p.1209).
8...........In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), it was noted that (19) "The statistical comparison demonstrates that Grievant was evaluated higher by students than her [male colleague] with respect to upper level classes, but that (20) [male colleague] was evaluated higher than Grievant in lower level classes. Given (21) this "mixed" result, the statistical comparison of evaluations does not demonstrate by a preponderance of the evidence that Grievant's students rated her the same, or better, than [male colleague]" (p.305). Wei maintained that (16) her students rated her the same or higher than the male colleague's students rated him. The Board disagreed, saying, (19) "We note that the comparison offered by Grievant is somewhat weak since [male colleague] was tenured in 1988, and those student evaluations of his which were compared with Grievant post-dated his tenure review by a number of years," and further saying, "we decline to hold such an involved comparison is necessary before a reasonable tenure determination can be made" (p.305).
8...........In Dr. Brian Maclean v. President of The University of British Columbia (1991), the Board noted that (19) the reviewing faculty held in-class discussions about his teaching.
8...........In Robert Kramer v. The President of the University of British Columbia (1992), Kramer argued that the most significant mistake was the failure to consider all aspects of his teaching. For example, only his teaching in 1989-90 was considered, whereas (9) he had taught a wide range of courses over the previous three years, (10) had three new courses that year, (11) plus a graduate course. Moreover, (17) the department head indicated that his teaching was not up to the departmental "standard." The standard appeared to be the performance of the tenure-track faculty, though Kramer was one of the most junior faculty members (p.8). (15) Only one of the more than thirty numerically rated questions was used: "Rate instructor bad to good." (16) While a number of negative student comments were quoted in the department Head's letter, there were a number of very positive comments, and these were not mentioned at all.
8...........In Christopher Turner v. The President of the University of British Columbia (1993), the Dean said, "there were few students in undergraduate literature courses since 1986/7---(3, 8, and 6 respectively)," thus mistaking student 'response' figures for actual student enrolment. The Board concluded that (5) "This misunderstanding is in our opinion sufficient in itself for a reconsideration, since teaching was the focus..." (p.3), and (7) "we think that the comments and emphasis on the size of Dr. Turner's classes as evidence of poor teaching are open to objection and constitute errors of procedure and/or evidence" (p.6).
9.
Given the extensive variation of rulings in SEF cases, from the perspective of a non-legal professional it seems that legal reasoning carries the use of contextual analysis and variables to an extreme, making it possible, and legally justifiable, to rule just about any way a court wants to rule. The logical extension of such reasoning would lead to each case being unique and not significantly related to any other case.
10.
..........In Johnson v. University of Pittsburgh (1977), the court noted that (10) "It has also been pointed out that in some cases difficult courses have to be given to the students and the material is such that it is difficult for even the best teacher to get it across."
10...........In Carley v. Arizona Board of Regents (1987), Carley (7) characterized his professional style as being that of a "demanding teacher contrary to some student expectations"; (8) because of this, he maintained, his popularity suffered, resulting in low student evaluations. (9) Examination of his student comments indicated that Carley was correct in his assessment: 61% (49 out of 80) of the negative student comments focused on these values. The court ignored these findings.
10...........In Dr. Brian Maclean v. President of The University of British Columbia (1991), it was noted that (21) While the knowledge, interest and enthusiasm of Dr. MacLean were acknowledged, "the problem appeared to be one of style or personality."
10...........In Robert Kramer v. The President of the University of British Columbia (1992), the Board noted that (26) it was obvious that almost all of the classes were upset about an examination which was considered more geography than Asian Studies, and (27) they didn't like the marking. (28) They also felt the workload was far too heavy for an "introductory" course. The Board apparently merely noted this variable.
11.
..........In Johnson v. University of Pittsburgh (1977), the court said, "It is also obvious that the court and the administration of universities cannot permit students to exercise a veto over professors who may be world renowned scientists and yet if the students rate them unfavorably can be terminated at any time because of unpopularity" (p.1366-7).
11...........In Carley v. Arizona Board of Regents (1987), Carley (8) maintained that his popularity suffered, as reflected in his low student evaluations.
11...........In Robert Kramer v. The President of the University of British Columbia (1992), he maintained that (14) Student evaluations were considered from the standpoint of his popularity, not his effectiveness.
11...........In Brian Maclean v. President of The University of British Columbia (1991), (35) The Faculty Agreement specified that "Evaluation of teaching shall be based on the effectiveness rather than the popularity of the instructor." Courts have ruled in various directions on this issue.
11...........In Robert Kramer v. The President of the University of British Columbia (1992), the board noted (21) "As for the 'popularity vs. effectiveness' debate, a discouraging or hostile attitude is a part of effectiveness as much as it is of popularity" (p.8).
11...........In Christopher Turner v. The President of the University of British Columbia (1993), the Board ruled (8) that "while popularity is not competence nor effectiveness, to the extent that it encourages students it has some relation to both" (p.7).
12.
While there may well be research showing that being a popular teacher affects learning at the elementary and secondary levels of education, I know of no such rigorous research at the postsecondary level. In my view, one of the problems is that all too often we automatically transfer findings from the elementary and secondary levels to higher education.
13.
..........In Johnson v. University of Pittsburgh (1977), the court noted that it (5) "has placed little reliance on students' surveys....students in a given course rating a teacher, or professor, some of them as excellent, others as terrible and in between, many who say passable, mediocre etc.... we cannot say it was unreasonable for the tenured faculty to consider this along with other matters" (p.1359). (8) "It is also obvious that the court and the administration of universities cannot permit students to exercise a veto over professors who may be world renowned scientists" (p.1366-7). A similar view was expressed in Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995).
13...........In Peters v. Middlebury College (1977), the court gave some weight to an administrative devaluing of a set of positive student evaluations of a faculty member: (2) the department chair sent a letter to the president of the college, saying, "The course of action I recommend is not likely to be popular with students who, though they in part recognize her intellectual limitation, are warmly responsive to her enthusiasm, energy, openness and ready human concern" (p.860).
13...........In Carley v. Arizona Board of Regents (1987), the court said, (23) "Carley has cited no authority that relying primarily or solely on student evaluations would be impermissible. We have found none" (p.1105, italics added).
13...........In Guam Federation of Teachers v. The University of Guam (1990), the Guam Federation of Teachers challenged the use of SEF in tenure and promotion decisions (Blum, 1990). The Board (1) ruled to remove anonymous student evaluations from professors' tenure files. (2) The union said the use of SEF violated the union's contract with the university, (3) which provides that anonymous documents or those "based on hearsay" should not be included in a faculty member's file. (4) It was further ruled that (5) students should be made aware of the purpose and ramifications of their evaluations, and (6) anonymous student evaluations should not be used.
13...........In Robert Kramer v. The President of the University of British Columbia (1992), the Board noted that (18) "The most important perceived error in the teaching evaluation, in the opinion of the Board, is the reliance solely upon the student evaluations and written comments for the 1989 course evaluations. There was no peer review at all; no member of the Department audited any of Dr. Kramer's lectures" (p.10).
13...........In University of Regina Faculty Association v. University of Regina (1993), a Canadian Arbitration Board ruled that (3) "With respect to teaching, it is our opinion that the evidence of unsatisfactory performance is very weak indeed ...It is important to note that the basis of the comments, particularly the negative ones in the fall of 1992, were written student assessments... [and] Although these assessments are expressly recognized in Art. 17.19 of the collective agreement, to base important career decisions on them only does not seem justified" (p.4). The Board further ruled (4) that tenure decisions could not be based solely on assessments which were completed by students who had never been made aware of the ramifications of their statements. (5) "[I]f evaluations are to be used for serious career development purposes those completing them should be aware of the potential consequences of their participation" (p.4). (8) "To base serious career decisions narrowly on student evaluations is not to be encouraged... (9) If teaching is to be seriously evaluated for career purposes, whether for positive or negative purposes, it seems incumbent upon Faculties not to rely only on classroom administered evaluations but to broaden the base of assessment" (p.4).
13...........In Christopher Turner v. The President of the University of British Columbia (1993), the Board ruled (9) that "while the [Faculty Association] Agreement permits, but does not mandate either student reviews or peer reviews, and the methods of assessment 'may vary', we do conclude that the reliance placed on these very limited student reviews must have been great, since there was no other evaluation referred to. Where there is no other evidence sought, student comments will have an apparent importance and credibility that they may not deserve... (10) We would strongly recommend peer review in the reconsideration which we are requiring" (p.7). The board further noted that (8) "This board has been asked on a number of occasions to pass judgment on the relevance of student evaluations to the [Faculty Association] Agreement criteria for good teaching. Good teaching is an elusive concept. Students may not be good judges during a course; their judgment might be quite different several years later in life" (p.7).
14.
..........In Dyson v. Lavery (1976), a student evaluation ranked her 46th of 48 teachers.
14...........In Lieberman v. Grant (1979), the court noted (4) a compilation of student ratings showed that the cumulative ratings for members of the department ranged from a low of 4.09 to a high of 8.95. She had a cumulative rating of 7.06, which ranked her 12th out of the 15 junior faculty members. The 7.06 figure included the ratings from a previous semester in which the plaintiff received a rating of 8.18. Prior to this rating in the spring of 1972, the plaintiff's cumulative rating was 6.7.
14...........In Carley v. Arizona Board of Regents (1987), it was noted that (1) of the 13 faculty in his department of art, he was ranked fifth, (2) by his chairman he was ranked 7th, (3) student evaluations, however, ranked him last: 13th of 13 (p.1105).
14...........In Robert Kramer v. The President of the University of British Columbia (1992), the court noted (24) scores in the other two courses were higher---3.45 in one, 3.91 in another, against a "faculty average" of 4.22. The board further noted, "In the result, one got a 2.82 and one got a 3.07...the difference is statistically invalid in any event" (p.10).
15.
..........In Dyson v. Lavery (1976), the court said (1) "A number of students apparently had voiced displeasure over the quality of her class preparation and presentation" (p.111). (3) "These impressions," said the court, "were largely confirmed after the initial decision to not rehire her had been made, by a student evaluation that ranked her 46th of 48 teachers in the Business Department" (p.111, italics added).
15...........In Johnson v. University of Pittsburgh (1977), the court said, (3) "we have the instance referred to in Finding 27" (p.1359, italics added).
15...........In Lieberman v. Grant (1979), the court noted (3) based on complaints received from "several students," to the effect that Lieberman's interest in feminism caused her to ignore other themes in literature (p.873, italics added).
15...........In William Sypher v. Vermont State Colleges Faculty Federation (1982), (1) some of the student comments noted that, "When students try to disagree he shoots you down and tries to degrade you in front of the class" (p.115), while others said, "encourages student participation as much as possible... encourages student to express their ideas freely and not worrying how 'dumb' it may sound...always wants you point of view" (p.115). (2) With regard to the numerical ratings, the Board's opinion was that (3) "regardless of a strong majority of students' rating his teaching as above average, (4) the existence of a significant minority of students feeling degraded, humiliated, and embarrassed can reasonably lead an evaluator to question a teacher's effectiveness" (p.115).
15...........In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), the Board said, (22) "the statistical comparison does not take account of the comments made by students on the evaluation forms. Grievant's student evaluations are striking in how often mention is made of Grievant's communication difficulties, particularly language difficulties" (p.304-5). The board further noted with respect to comments that while some students had written that she was a "slant eyed bitch," and that she should "go back to China," (30) "We also are not persuaded that the racism evident in the student evaluations of Grievant made student evaluation results unreliable. The percentage of evaluations in which racism by students was evident was approximately one percent of the total evaluations" (p.306).
15...........In Robert Kramer v. The President of the University of British Columbia (1992), (2) the department Head viewed Kramer's 1989-90 course evaluations "with some alarm".... (4) Even more disturbing to the department Head was that a considerable number of students in their written comments stated that Dr. Kramer was biased, sarcastic, and hostile to the material and that a number of students had stated that Dr. Kramer's teaching would cause them to stay away from the Asian Studies department. (5) There were also some diametrically opposed positive comments" (p.10).
15...........In University of Regina Faculty Association v. University of Regina (1993), the Board argued (6) that the University was under an obligation to verify negative comments before acting on them. Consequently, (7) "the fact that Dr Jalan had received some negative evaluations from students could not be used to undermine the otherwise generally favorable comments he had received in his annual performance reviews" (p.4).
15...........In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court noted that (25) "With respect to the 'qualitative' scores, i.e., the 'comments,' there was a clear error. The qualitative comments from a number of courses were read and commented on, and conclusions were drawn from them which went into the 'file.' Both Reviewing faculty read and commented on them, as did the Department Chair in her letter to the Dean. Yet the Dean had clearly stated in a departmental memo that the qualitative comments were not to be used for administrative or promotion purposes. (26) While in the abstract there is no reason why such comments would not be relevant, if the Department had a rule against their use, or in other words if they were 'for the professor's eyes only,' then it was a significant breach of Departmental rules to use them" (p.31). (27) "In the opinion of the Board, so long as the comments were fairly presented, they offered the PAT [Promotion and Tenure Committee] and others a better balanced view of the teaching qualities and problems of Dr. MacLean than the quantitative statements alone" (p.31). (28) The court noted that "One problem with the questionnaire is that it solicits bad points as well as good points. Despite that caveat, we conclude that the inclusion of the qualitative comments was not a significant error" (p.32).
16.
16...........In Johnson v. University of Pittsburgh (1977), the court noted (2) they "approached this question of teaching ability with considerable doubt, in view of the fact that in prior years there does not appear to have been any criticism of her teaching and also in view of the fact that...there was evidence that the department chairman, had informed her after one of her lectures in 1971 what a great lecture it had been;" On the other hand, the court said (3) "we have the instance referred to in Finding 27" (p.1359, italics added).
16...........In Fields v. Clark University (1987), it was observed (3) that a few of the evaluations, from students in Fields' seminars, were "wildly enthusiastic" about her enthusiasm, commitment and presentations; (4) a few were ambivalent; (5) with a considerable number being extremely negative, particularly (6) with regard to her large lecture classes in basic courses in sociology.
16...........In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), moreover, they said, (19) "The statistical comparison demonstrates that Grievant was evaluated higher by students than [her male colleague] with respect to upper level classes, but that (20) [the male colleague] was evaluated higher than Grievant in lower level classes. Given (21) this 'mixed' result, the statistical comparison of evaluations does not demonstrate by a preponderance of the evidence that Grievant's students rated her the same, or better, than [male colleague]" (p.305).
16...........In Dr. Brian Maclean v. President of The University of British Columbia (1991), it was noted that (20) in general, the in-class peer reports were mixed but favourable. The in-class discussions were more problematic (p.30). (21) While the knowledge, interest and enthusiasm of Dr. MacLean were acknowledged, "the problem appeared to be one of style or personality." It was further noted that (29) "As against the low figures, they disclosed a number of good qualities in Dr. MacLean: enthusiasm for his subject, wide knowledge of the literature, much out of class assistance to students, and a commitment to seeking good work from students" (p.31). (30) The reviewing faculty report noted the comments about Dr. MacLean's "derogatory manner, biased opinion, unwillingness to listen," were matched by "clear, stimulating, very helpful after class." And, (31) "some students have told us that the comments made were not representative of the class as a whole and were unduly influenced by the process" (p.41). (32) "A number of students, both from earlier years and from his current classes, furnished letters of support, and in preparation for the appeal, some furnished affidavits with respect to particular matters such as the 'intimidation' discussion in Soc. 250 and events in Soc. 490 and 520 in the fall of 1989" (p.33).
16...........In Robert Kramer v. The President of the University of British Columbia (1992), the board noted that (16) while a number of negative student comments were quoted in the department Head's letter, there were a number of very positive comments, and these were not mentioned at all. (25) "We have examined all of these written comments. There was a very wide range of comments. There were not 29 comments saying sarcastic and biased comments; but there were certainly 29 comments which included either cynical, sarcastic, biased, insulting, negative, condescending, belittling, opinionated, arrogant, nihilist, and destructive...." (29) "However, it would only be fair to add that there were a number of comments in favour of Dr. Kramer, stating that the student 'liked the course immensely,' 'now interested in Asian Studies;' 'helps create a relaxed atmosphere,' 'really enjoyed him,' 'very approachable and knowledgeable,' 'very enthusiastic,' 'captivates audiences with his humour,' 'very effective'" (p.12). (30) "In the other two courses, both small, both Japanese language, there were also some negative comments" (p.12).
16...........In Christopher Turner v. The President of the University of British Columbia (1993), the board noted that (6) "While there is no question of Dr. Turner's competence as a teacher at all levels, teaching evaluations for the last several years show that his effectiveness is marred by what students perceive as excessive formality, lack of enthusiasm and dullness....In a previous promotion attempt, his teaching was briefly described as 'very competent' but student evaluations indicate further improvement to be 'better than adequate'" (p.2).
17.
..........I wish to thank Patrick B. Shaw, Attorney for AAUP, for referring me to Ms. Linda Lott, Administrative Coordinator, Hofstra University Chapter, AAUP, who conducted a search for me of a faculty collective bargaining contract database being developed there. Ms. Lott searched the database with "several key words that relate to academic freedom, teaching methodology and student evaluations. The only word that was identified in some of the contract provisions was 'student evaluation'" (Personal communication, March 21, 1997). It should be noted that very few explicit references in the contracts to the use of signed/unsigned SEF or the use/nonuse of comments were found in this developing database. Some of the instances found are:
17...........At Rider University, the agreement stated "The College may not use course evaluations for purposes of discipline, promotion, or tenure, unless introduced for such purposes by the faculty member."
17...........At Western Michigan University, the agreement stated "Only the ratings shall be included in all promotion, reappointment, merit, and tenure recommendations, together with such other evaluations of teaching competence as may be employed by faculty members and made available. Western agrees to consider all the evidence of teaching competence that is presented in evaluating teaching faculty and shall not use unsubstantiated structured comments in personnel decisions."
17...........I have already noted the ruling at the University of Guam (Blum, 1990), which cited (1) students not being made aware of the purpose and ramifications of their evaluations, (2) the anonymous nature of student evaluations, (3) the invalid analysis of SEF, and therefore (4) SEF in effect being anecdotal and hearsay data. Since most SEF results are prepared anonymously, an instructor has no recourse to confront his/her evaluators. As will be addressed below, the anonymous nature of SEF is also beginning to be questioned by arbitration boards.
17...........I am informed by a colleague at St. John's University (New York) that, though SEF are mandated there, they are not used administratively. I suspect there are many more schools (likely those with union contracts) that do not use SEF administratively or that limit their use. Those who maintain that without the administrative use of SEF there is no quality control over instruction, and that student learning will therefore suffer, might check their assumption against the experience of schools that do not use SEF administratively.
18.
..........In Johnson v. University of Pittsburgh (1977), the court said, "It is also obvious that the court and the administration of universities cannot permit students to exercise a veto over professors who may be world renowned scientists" (p.1366-7), noting, "It is obvious that a professor may be possessed of ex