~ EPAA Vol. 7 No. 4: Haney, Fowler, Wheelock, Bebell & Malec
"Massachusetts Teacher Test" ~
Appendix 2
Richardson v. Lamar County Bd. of Educ.
729 F. Supp. 806 (M.D. Ala. 1989) (Excerpted)
United States District Court, M.D. Alabama, Northern Division.
Nov. 30, 1989.
MEMORANDUM OPINION
MYRON H. THOMPSON, District Judge.
Plaintiff Alice Richardson, an African-American, has brought this lawsuit claiming that defendant Lamar County Board of Education [FN1] wrongfully refused to renew her teaching contract in violation of Title VII of the Civil Rights Act of 1964, as amended. [FN2] Richardson charges the school board with two types of discrimination under Title VII. First, she asserts a claim of "disparate treatment": [FN3] that the school board refused to renew her contract because of her race. Second, she asserts a claim of "disparate impact": that the board's stated reason for not renewing her contract--that she had failed to pass the Alabama Initial Teacher Certification Test--is impermissible because the test has had a disparate impact on African-American teachers. The court's jurisdiction has been properly invoked pursuant to 42 U.S.C.A. § 2000e-5(f)(3).
FN1. Richardson has sued not only the Lamar County Board of Education but also its superintendent and members. However, because Richardson may obtain full relief from the school board the court has not treated the board members and the superintendent separately from the school board.
FN2. Title VII is codified at 42 U.S.C.A. §§ 2000e through 2000e-17.
FN3. Richardson's disparate treatment claim is also based on 42 U.S.C.A. § 1981 and the fourteenth amendment, as enforced by 42 U.S.C.A. § 1983, Jett v. Dallas Independent School District, 491 U.S. 701, 109 S.Ct. 2702, 105 L.Ed.2d 598 (1989), with jurisdiction premised on 28 U.S.C.A. §§ 1331, 1343. Because a plaintiff must prove intentional discrimination to establish a disparate treatment claim under § 1981, § 1983 and the fourteenth amendment as well as under Title VII, Stallworth v. Shuler, 777 F.2d 1431, 1433 (11th Cir.1985), and because Richardson is seeking the same relief under all these statutory provisions, the court need not address separately her theories under §§ 1981, 1983, and the fourteenth amendment. The court also need not address whether Richardson has stated a cognizable claim under § 1981. Patterson v. McLean Credit Union, 491 U.S. 164, 109 S.Ct. 2363, 105 L.Ed.2d 132 (1989).
Based on the evidence presented at a nonjury trial, the court concludes that Richardson may recover on her disparate impact claim but not on her disparate treatment claim. The court's disposition of Richardson's disparate treatment claim is simple and direct. The court simply applies the procedure set forth by the Supreme Court in Texas Department of Community Affairs v. Burdine, 450 U.S. 248, 101 S.Ct. 1089, 67 L.Ed.2d 207 (1981). The court's disposition of her disparate impact claim is, however, much more difficult. The court first addresses and finds meritless two defenses raised by the school board: that Richardson's disparate impact claim is barred by principles of collateral estoppel and res judicata; and that under the framework set forth in Price Waterhouse v. Hopkins, 490 U.S. 228, 109 S.Ct. 1775, 104 L.Ed.2d 268 (1989), Richardson would not have been reemployed even if she had passed the state certification test. The court then goes through a lengthy application of the disparate impact analysis outlined by the Supreme Court in Wards Cove Packing Co., Inc. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).
I. BACKGROUND
Richardson taught in the Lamar County School System for three years, from 1983 to 1986. She was, however, unable to obtain a permanent teaching certificate and therefore had to teach with temporary and provisional certificates. To obtain a permanent certificate, Richardson, like all other teachers in the state at that time, had to *809 pass the Alabama Initial Teacher Certification Test, which consisted of a "core" examination and an examination aimed at the specific area in which the teacher sought to teach. Richardson wanted to teach in the areas of early childhood education and elementary education, and thus could meet the certification test's specific area requirement by passing the examination in either area. Between 1984 and 1986, Richardson failed the early childhood education examination twice and the elementary education examination three times.
In the spring of 1986, the Lamar County Board of Education decided that the elementary school where Richardson taught should be consolidated with another school. Because fewer teachers would be needed, the school board informed 15 nontenured teachers, including Richardson, that their contracts would not be renewed for the 1986-87 school year. Four of the 15 teachers were, however, rehired. Richardson, who would have acquired tenure if she had been rehired, was not one of the four.
Approximately a year later, in May 1987, this court enforced a consent decree requiring the State Board of Education to issue permanent teaching certificates to a court-defined class of black teachers who had failed the state teacher certification test. [FN4] Richardson received her certification pursuant to the consent decree.
FN4. Allen v. Alabama State Board of Education, 816 F.2d 575 (11th Cir.1987) (directing district court to enforce consent decree); Allen v. Alabama State Board of Education, Civil Action No. 81-697-N (M.D.Ala. May 14, 1987) (enforcing the consent decree).
[NOTE: Omitted from this reproduction of Judge Thompson’s opinion are several pages in which he discussed: II. DISPARATE TREATMENT CLAIM; and III. DISPARATE IMPACT CLAIM. The remainder of the opinion is reproduced in its entirety.]
[15] Since Richardson has established that the early childhood education and elementary education examinations had an adverse racial impact, the burden shifts to the Lamar County Board of Education to produce evidence of employment justification. An understanding of the history of the Alabama Initial Teacher Certification Test is important to determining whether the school board has met its burden and, if so, whether Richardson has, in turn, shown that the school board's justification for the certification test has no basis in fact.
a. History of the Early Childhood Education and Elementary Education Examinations
In 1979, amidst a national groundswell in favor of teacher competency testing, the Alabama State Board of Education placed development of a uniform certification test at the head of its agenda. It retained a professor at Auburn University to conduct a feasibility study regarding implementation of a teacher testing program in Alabama; the state's Assistant Superintendent for Teacher Certification also participated in the study. After a rather cursory investigation, the two educators recommended implementation of a testing program similar to one designed by a private test developer for the State of Georgia.
The State Board agreed with the recommendation. In January 1980, it awarded a contract to the private test developer on a noncompetitive basis. [FN29] While the board did not always express its purpose for imposing the test requirement with perfect clarity, both the test developer and the board understood that the test would measure whether a teacher possessed enough minimum content knowledge to be competent to teach in the classrooms of Alabama.
FN29. Board members anticipated that the test requirement would adversely impact against African-American applicants for teaching certificates. However, the same decision would have been reached without consideration of that factor. The board's action was predicated on a legitimate concern for improving the quality of education in Alabama.
The time frame for development of the Alabama Initial Teacher Certification Test, as it came to be known, was quite short. The test developer had one year to complete development and implementation of 36 separate examinations. The test developer created a "core" examination and 35 additional examinations that covered specific subject areas. As stated, a teacher had to pass the core examination and one subject area examination in order to receive certification.
The Assistant State Superintendent, the sole ranking state official charged with oversight of the private test developer's contract compliance, had a doctorate in educational administration; but neither he nor anyone on his staff had any expertise in test development. And no outside experts were retained to monitor the test developer's work. The developer's work product was accepted by the state largely on the basis of faith.
The test developer began by preparing a preliminary planning document. It next asked the State Department of Education to appoint Alabama educators to the various committees and panels necessary for completion of the project. According to criteria provided by the developer, these educators were selected to represent a fair cross section of persons from different geographic areas throughout the state. They were also selected in such a way that African-Americans and women were fairly represented overall; however, not all committees and panels had minority representatives.
The test developer's technical staff and subject area consultants then formulated topic outlines for the various examinations. They consulted state education standards, state courses of study, materials related to Alabama's student competency tests, and examples of textbooks used in Alabama public schools. They also developed actual test objectives. These objectives were more explicit statements of concepts embedded in the topic outlines. The objectives were reviewed by the developer's editors and management. The developer's in-house work was far below average.
*818 In October of 1980, approximately 200 Alabama educators attended a two-day conference to review the topic outlines and objectives for 36 examinations. They had previously been mailed orientation materials. After additional orientation, they were divided into curriculum committees to review the topic outlines for comprehensiveness, organization, accuracy, and absence of bias. The committees then reviewed the objectives to ensure that they matched the topic outlines. Taxonomic level, significance of content, accuracy, level of specificity, suitability, and lack of bias were considered. Decisions were reached by consensus during both stages of review. Modifications and deletions were recorded by the test developer's personnel assigned to each committee. In some cases, however, the developer made additional changes, or ignored suggested changes, without obtaining clearance from committee members. No effort was made at any time to link the topic outlines and objectives to the state-mandated curriculum for teacher training programs.
The test developer then sent a job analysis survey packet to approximately 3,000 in-service teachers throughout Alabama. The purpose of this survey was to determine the job relatedness of the test objectives. [FN30] However, in nine fields where there were fewer than 200 teachers throughout the state, the test developer's process resulted in very small response rates. The survey packet was sent to persons certified and teaching in specific content areas. The packet included a set of objectives for that content area, a survey form, and a set of instructions. The teachers were asked whether they had taught or used each objective in the past two school years. If the answer was yes, they were asked to rate the objective in terms of time and essentiality. The scales used to record those responses were balanced in favor of indicating that an objective was job related, and teachers were instructed to resolve doubts in favor of job relatedness. The results of the job analysis survey were tallied in such a way that responses from only those who indicated that an objective had been used in the last two years were reflected in the data. Those who indicated that an objective had not been used were ignored.
FN30. A stratified random sampling technique was employed to select survey respondents and a fair cross section of teachers was generally achieved.
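[NOTE: The following sketch is not part of Judge Thompson's opinion. It illustrates, under stated assumptions, how tallying only the responses of teachers who reported using an objective can make that objective look more job related than the full set of responses suggests. The 1-5 rating scale, the sample data, and the function names are hypothetical; the opinion does not reproduce the actual survey scales or data.]

```python
from statistics import mean

# Hypothetical responses for ONE test objective. None means the teacher
# reported NOT using the objective in the past two school years; an integer
# is an illustrative 1-5 essentiality rating (assumed scale).
responses = [4, 5, None, None, None, 4, None, None, 5, None]

def tally_users_only(ratings):
    """Average rating among teachers who used the objective.

    Non-users are dropped, which mirrors the tallying described above.
    """
    used = [r for r in ratings if r is not None]
    return mean(used) if used else None

def share_who_used(ratings):
    """Fraction of ALL respondents who reported using the objective."""
    return sum(r is not None for r in ratings) / len(ratings)

users_only_mean = tally_users_only(responses)
usage_share = share_who_used(responses)

# The users-only average looks strong even though most respondents in this
# made-up sample never used the objective at all.
print(users_only_mean, usage_share)
```

In this made-up sample the users-only average is high although fewer than half of the respondents used the objective, which is the kind of information a users-only tally leaves out.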
In January of 1981, the curriculum committees met for a second time. They were provided results from the job analysis survey and were asked to determine which objectives should generate questions to appear on the examinations. This step was called "objective selection." The survey results were a major determinant of which objectives were ultimately selected.
The test developer then prepared a "blueprint" for each examination. These blueprints specified the number of test questions, or items, necessary to measure each objective. Test items were drafted by the test developer's content area consultants and edited by its staff. Again, the developer's in-house work was far below average.
In March of 1981, the test items were reviewed by Alabama curriculum committees for "item/objective" match, significance of content, accuracy, clarity, and absence of bias. This "item review" process lasted for two days. Committee revisions were recorded by the test developer's personnel. However, in some cases, the developer ignored the suggested changes, or made additional revisions, without consulting committee members for approval. In other cases, the developer simply added new items that had never been reviewed by committee members. As many as 20 items for each 120-question examination fell into one of these categories.
In late April of 1981, the test developer convened a separate group of educators to review the test items once again for content validity. The purpose of this session, which lasted one day, was to provide an independent check against the judgments already rendered by the previous committees of Alabama educators. The new panelists reflected a fair cross section of persons in their field and were qualified to make content validity judgments in their *819 field. Each educator worked separately, but votes were tallied as if educators had served on a committee. After orientation, the educators were asked to judge whether each item matched its objective, was accurate, was free of bias, and was not tricky, misleading, or ambiguous. If the item met these criteria, the item was rated content valid by that judge. If the item was deemed invalid, the judge's reason for rejecting that item was recorded. The test developer compiled these content validity ratings; a level of agreement among judges greater than 50% was required for an item to be deemed content valid. While a majority of items appearing on the final test instruments reflected the judgment of Alabama educators that those items were content valid, a significant number of items appearing on the tests did not reflect that judgment. These included those items that had been revised by the developer without obtaining clearance from the panelists. [FN31]
FN31. The test developer did not convene separate panels of minority educators at any stage of the item review or content validity process to screen items for possible bias.
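[NOTE: The following sketch is not part of Judge Thompson's opinion. It shows one plain reading of the rule that agreement among judges "greater than 50%" was required for an item to be deemed content valid. The function name and the sample votes are hypothetical, and the developer's actual tallying procedure is not reproduced in the opinion.]

```python
def is_content_valid(votes):
    """Return True when strictly more than half of the judges rated the item valid.

    votes: list of booleans, one per judge (True = rated content valid).
    """
    if not votes:
        raise ValueError("at least one judge rating is required")
    return sum(votes) / len(votes) > 0.5

# Two of three judges agree: the item clears the threshold.
two_of_three = is_content_valid([True, True, False])

# An even split is exactly 50%, which is not "greater than 50%".
even_split = is_content_valid([True, False])

print(two_of_three, even_split)
```

Under this reading an evenly split panel does not validate an item. A rating of this kind speaks only to the item as the judges saw it, not to a version revised afterward.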
The judges were also asked to make cut-score decisions for those items they had rated content valid. For these items, and those items only, judges were asked whether a teacher with minimum content knowledge in the field should be able to answer the item correctly. A yes-no response was requested. Judges were disqualified from making that same cut-score determination for any item they had previously rated content invalid. In essence, their expert judgment as to those items was ignored.
The test developer then assembled and produced the actual test instruments for all 36 examinations. Each examination had 100 items tentatively designated as scoreable and 20 items tentatively designated as nonscoreable. The examinations were first administered to a group of actual candidates. The test developer had originally contemplated a separate field tryout, but time constraints prohibited such a course. After the first administration, the developer examined item statistics to flag problem questions. Based on this item analysis, it selected 100 scoreable items and 20 nonscoreable items for each examination. The developer did not conduct empirical bias studies to determine whether the difficulty of items varied according to the race of examinees.
The test developer then set a minimum cut score for each examination. The developer's original plan was to take the panelists' cut-score ratings and subject them to a 10% non-cumulative binomial algorithm. This level of agreement among judges would then determine the minimum cut score. However, the developer's procedure yielded cut scores that were so astoundingly high that they signaled, on their face, an absence of correlation to minimum competence. For example, of the more than 500 teachers who took the first administration of the core examination, none would have passed if the original cut-score methodology had been followed.
Faced with this problem, the test developer made various mathematical "adjustments" to the original cut score. First, the developer applied a 10% cumulative binomial algorithm. When the cut scores still remained too high, it applied a 5% cumulative binomial algorithm. This process of applying successively stricter algorithms was referred to at trial as a "binomial twist." The developer engaged in this process without consulting the State Department of Education or any Alabama educators. In two fields--that of Music and that of Speech, Communication, and Theatre--the 5% binomial twist yielded cut scores that were much too low. The developer simply applied a different mathematical algorithm to those examinations; again, the developer consulted no one. For all special education and school counseling examinations, the developer recommended a uniform cut score cap of 80 to the State Department of Education. This recommendation was based on the developer's experience in the Georgia testing program. However, in Georgia, the decision to place a cap on cut scores was reached by state officials in conjunction with Georgia educators. *820 [FN32]
FN32. Although the cut scores in the special education area were intended to serve as an upper limit, the cut scores on five of those examinations were actually raised to 80 to achieve uniformity.
The State Department of Education was then given the option of dropping the cut scores, as set by the developer, by two or three standard errors of measurement (SEM's). It was clear at that time that cut scores, even after the various adjustments catalogued above, were not measuring competence. For example, even after the developer's 5% binomial twist, 78% of the teachers taking the first administration of the core examination would have failed. The same would have been true for 93% of those taking the school counseling examination, 89% of those taking the learning disability examination, and 97% of those taking the library media examination. Instead of challenging what the developer had done, the state simply dropped the cut scores three SEM's in order to arrive at a "politically" acceptable pass rate. In so doing, the state knew that the examinations were not measuring competency.
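[NOTE: The following sketch is not part of Judge Thompson's opinion. It illustrates the arithmetic of lowering a cut score by a number of standard errors of measurement, using the conventional classical-test-theory formula SEM = SD * sqrt(1 - reliability). The score standard deviation, reliability, and starting cut score are hypothetical; the opinion does not report the actual values for any examination.]

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)

def lowered_cut_score(cut_score, sem, n_sems):
    """Cut score after dropping it by n_sems standard errors of measurement."""
    return cut_score - n_sems * sem

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)
adjusted = lowered_cut_score(cut_score=75.0, sem=sem, n_sems=3)

print(round(sem, 2), round(adjusted, 2))
```

A drop of this kind changes how many examinees pass. It adds no new judgment from practitioners about what a minimally competent score is.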
In 1982, the test developer formulated nine additional examinations. Its test construction procedures and quality of execution were essentially the same, with the following exceptions. First, the developer's job analysis survey form contained a rating scale with additional errors. Second, a more restrictive binomial table was used to calculate agreement among panelists on content validity questions. Third, a more accurate cut-score methodology was employed.
In 1983, the developer conducted a "topicality review" to update ten of the examinations already in use. A curriculum committee performed item and objective review. The committee's tasks were to determine whether items had become stale because of changes in the teaching field and to identify problems with items by reference to item statistics for the first eight administrations of the certification test. On average, 50% of the items in any given examination were replaced or revised. The developer did not convene a separate panel, as it had during the initial test development, to provide an independent screen for content validity, nor was an independent cut-score panel convened. The curriculum committee provided ratings used to set cut scores.
b. Validity of the Early Childhood Education and Elementary Education Examinations
The Lamar County Board of Education contends that the state teacher certification test was designed to determine whether a teacher is competent to teach in Alabama's classrooms. Richardson claims, as stated, that the early childhood education and elementary education examinations were invalid, that they did not measure competency.
Generally, validity is defined as the degree to which a certain inference from a test is appropriate and meaningful. APA Standards at 94. [FN33] It is suggested that validity evidence must necessarily be restricted to success on the job; and, to be sure, there are Title VII decisions that have approached the question of validity by asking whether a given score on a test yields an appropriate and meaningful inference about successful performance on the job. See, e.g., Contreras v. City of Los Angeles, 656 F.2d 1267, 1271-1272 (9th Cir.1981), cert. denied, 455 U.S. 1021, 102 S.Ct. 1719, 72 L.Ed.2d 140 (1982); Guardians Association of New York City Police Dept., Inc. v. Civil Service Commission, 630 F.2d 79, 91 (2d Cir.1980), cert. denied, 452 U.S. 940, 101 S.Ct. 3083, 69 L.Ed.2d 954 (1981). However, there is no magic to using success on the job as an anchor point for validity. Success on the job is just one of many constructs that a test can measure. Thus, a sound inference as to a different construct, such as minimal competence, may also form the basis for a finding of validity. In short, a test will be valid so *821 long as it is built to yield its intended inference and the design and execution of the test are within the bounds of professional standards accepted by the testing industry. APA Standards at 9; cf. Washington v. Davis, 426 U.S. 229, 247 & n. 13, 96 S.Ct. 2040, 2051 & n. 13, 48 L.Ed.2d 597 (1976) (validity need not be limited to inference about success on the job).
FN33. The term APA Standards is a shorthand reference for the American Educational Research Association, American Psychological Association, National Council on Measurement in Education, Standards for Educational and Psychological Testing (1985).
In order to be valid, a licensure or certification test must support the inference that persons passing the test possess knowledge necessary to protect the public from incompetents. APA Standards at 63. Part of an appropriate validation strategy for licensure and certification tests is to define clearly and correctly the domain of minimum content knowledge necessary for competence. The test domain, once defined, must then be translated into actual test questions that measure competence. At all stages, validity flows from the expert judgment of practitioners in the field being tested. The test developer's role is to employ professionally accepted practices that accurately marshal the expert judgment of those practitioners. When the questions on a given test actually measure what practitioners in the field consider to be content knowledge associated with competency, the test instrument is held to possess content validity. However, mere content validity does not alone establish test validity. No matter how valid the test instrument, an inference as to competence or incompetence will be meaningless if the cut score, or decision point, of the test does not also reflect what practitioners in the field deem to be a minimally competent level of performance on that test. Again, the test developer's role in setting a cut score is to apply professionally accepted techniques that accurately marshal the judgment of practitioners.
In assessing the overall validity of the Alabama Initial Teacher Certification Test, the court must therefore address both content and cut-score validity. The test developer retained by the State Board of Education followed a multi-step procedure to build 36 teacher certification examinations in 1981. With minor variations, it followed the same procedure when it built nine additional examinations in 1982. The developer then applied a third procedure when it revised ten examinations in 1983. The content validity of each of these examinations turns on whether the developer's procedures were adequate, or were outside the bounds of professional judgment. For reasons that follow, the court concludes that the developer's procedures violated the minimum requirements for professional test development. Accordingly, none of the examinations, including the early childhood education and elementary education examinations, possesses content validity.
The test development process was outside the realm of professionalism due to the cumulative effect of several serious errors committed by the developer when it formulated the 45 examinations in 1981 and 1982. First, while practicing teachers were asked to offer their judgment about the job relatedness of test objectives, it is clear that the test developer's survey instrument distorted that judgment. Scales were balanced in favor of finding job relatedness and respondents were specifically instructed to resolve all doubts in favor of job relatedness. Moreover, the response of those teachers who indicated that they had not used an objective was ignored.
Second, Alabama educators serving on curriculum committees selected test objectives based on those survey results. It has been suggested that the survey was used only in an advisory capacity and that any survey errors were offset by the overall judgmental process undertaken by committee members. However, it is plain that the survey was conducted to solicit critical firsthand knowledge from in-service teachers. It is equally plain that curriculum committee members, aware that the survey had been conducted for that purpose, took the survey results quite seriously. The court concludes that the overall judgmental process for determining job relatedness of test objectives was distorted significantly by survey error.
Third, a significant number of items appearing on the examinations failed to reflect accurately the collective judgment of curriculum committee members. In some *822 cases, changes to actual test items were not implemented. In other cases, items that had never been reviewed by a curriculum committee appeared on examinations. It is suggested that, in any testing program of this size, a certain number of errors of this type will be found. The court agrees with this proposition in principle; however, the evidence reflects that the error rate per examination was simply too high.
Fourth, Alabama educators were never asked to determine whether the test items themselves were job related, even though such an approach is standard practice in the testing industry.
Fifth, many items appeared on the examinations even after they had been rated content invalid by the requisite number of Alabama panelists. It is suggested that, before any such item appeared on a final test form, it was revised by the test developer, and that all revisions were approved by Alabama panelists. However, neither the State Board of Education nor the test developer produced any documentation of this alleged revision and approval process. Moreover, not a single panelist was called at trial to confirm that the process had actually occurred. The court finds that no such process occurred and that the test developer simply substituted its own judgment for that of Alabama educators.
In 1983, the test developer conducted a topicality review for ten of the examinations already in use. It is suggested that, even if those ten examinations were previously content invalid, they gained content validity by way of the topicality review process. The court does not agree. The topicality review process resulted in changes to, or replacement of, only about 50% of any given examination's 120 items. Items that were not revised or replaced therefore remained just as invalid as they were at birth. Moreover, as to items that were revised or replaced, there was no separate content validity determination. The court agrees with Richardson's experts that, on balance, these two factors rendered the ten examinations subjected to the 1983 topicality review to be content invalid as well. [FN34]
FN34. The court does not agree that the test developer's multi-step test development process was inherently self-correcting. There is substantial support in the record for the view that errors at one step not only survived the next step, but also created new errors.
Moreover, the fact that a validity study for the National Teachers Examination was upheld in United States v. South Carolina, 445 F.Supp. 1094 (D.S.C.1977), aff'd 434 U.S. 1026, 98 S.Ct. 756, 54 L.Ed.2d 775 (1978), does not mandate the same result here. The validity of the present examinations must be assessed on the basis of evidence now before the court. Cf. York v. Alabama State Board of Education, 581 F.Supp. 779, 786 (M.D.Ala.1983) ("tests are not valid or invalid per se ...; the fact that the validity of a particular test has been ruled upon in prior litigation is not necessarily determinative in a different factual setting").
Richardson advances an array of challenges to the cut-score methodology employed by the test developer. It is clear that, as to the 35 examinations developed in 1981, the cut scores bear no rational relationship to competence as that construct was defined by Alabama educators. [FN35] The *823 evidence reveals a cut-score methodology so riddled with errors that it can only be characterized as capricious and arbitrary. There was no well-conceived, systematic process for establishing cut scores; nor can the test developer's decisions be characterized as the good faith exercise of professional judgment. The 1981 cut scores fall far outside the bounds of professional judgment.
FN35. The court must point out that three of Richardson's arguments with respect to the 1981 cut scores clearly lack merit. First, she asserts that Nassiff's 1978 "Two-Choice Angoff" method for yielding an original cut score was and is "without professional endorsement." However, professional literature published well before the initiation of Alabama's testing program endorsed methodologies similar to Nassiff's approach. See R. Thorndike, Educational Measurement at 514-515 (1971). Moreover, while current professional literature does not grant Nassiff's method the highest possible marks, it certainly does not condemn it as being wholly outside the bounds of professional judgment. See Berk, A Consumer's Guide to Setting Performance Standards on Criterion Referenced Tests, 56 Rev. of Educ. Research 137, 148 (1986). Second, Richardson contends that Nassiff's method was largely unproven and that an alternative cut-score methodology should have been used at the same time as a backup. While the court agrees that this might have been advisable, there is no evidence that the failure to use a backup cut-score method was unprofessional. Third, Richardson argues that the test developer's recent adoption of a more sophisticated cut-score methodology signals the bankruptcy of Nassiff's 1981 method. The court does not agree. The fact that, with new developments in the field, the test developer later changed its methodology should not be held against it as an admission of error.
First and foremost, it is undeniable that cut scores for the 35 examinations developed in 1981 do not reflect the judgment of Alabama educators who served as panelists on the minimum cut score committees. This is a crucial error, because competence to teach is a construct that can only be given meaning by the judgment of experts in the teaching profession. Here, expert panelists who rated an item invalid as to content were automatically disqualified from going on to indicate whether that item should be counted toward the minimum cut score. This means that when a panelist indicated that an item should be excluded--because it contained inaccurate content, did not measure an objective, was tricky, ambiguous, or misleading, or was biased--that panelist's opinion was ignored for purposes of determining whether the item measured competence and should contribute to the cut score. The exclusion of such opinions resulted in a series of cut scores that reflected a distorted notion of competence.
Second, the court has no doubt that, after the results from the first administration of those 35 examinations were tallied, the test developer knew that its cut-score procedures had failed. The proof of this fact is that none of the more than 500 teachers who took the first administration of the core examination would have passed if the original cut score, calculated according to the developer's original plan, had been utilized. The court cannot conclude that all Alabama teachers who took that examination were totally and completely incompetent. It follows, therefore, that the developer knew that its cut-score procedure had utterly failed to reflect a valid construct of competence.
Third, instead of notifying the State Department that its cut-score procedure had malfunctioned, the test developer attempted to mask the presence of system failure by making various unilateral mathematical "adjustments" to the original cut score until an "acceptable" score had been reached. The most common adjustment was application of a "binomial twist" to the data collected from Alabama educators. This adjustment tended to lower cut scores. It is argued that lowering cut scores offset any system failure that might have occurred previously. This argument, however, misses the mark. The critical factor with respect to cut-score validity is not whether there was a net change in cut-score level, but whether the cut score itself accurately reflected the expert judgment of Alabama educators about whether examinees possess the competence to teach. This construct of competence cannot be guessed at by out-of-state test makers. It is also argued that the developer's resort to the "binomial twist" was an exercise of "tempered judgment" in light of actual examination data. Again, however, the fatal error is that it was the developer, and not Alabama educators, that exercised this judgment. [FN36]
FN36. It is argued that the binomial twist was, in fact, implemented in consultation with the State Department of Education, and that such consultation somehow injects the judgment of Alabama educators into the cut-score process. However, the evidence is clear that the developer never consulted any official at the State Department of Education with respect to the binomial twist. In fact, the State Department was not advised of that twist until shortly before trial.
Fourth, in two fields--that of music, and that of speech, communication and theatre--the 5% binomial twist yielded cut scores that were much too low. In those areas, the developer simply applied a different mathematical algorithm to arrive at an acceptable cut score. Again, the developer substituted its judgment about competence for that of Alabama educators.
Fifth, for all special education and school counseling examinations, a uniform cut score of 80 was adopted. To be sure, the *824 State Department of Education made this decision, based on a policy judgment that no score should exceed 80. However, it is clear that the developer played an advisory role in that decision and that its advice was completely irresponsible. The developer recommended holding the scores at 80 based on its experience in the Georgia testing program. However, in Georgia, the decision to place a cap on cut scores was reached by the State Department of Education in conjunction with Georgia educators. The test developer never suggested that the State Department consult Alabama educators, and there is no evidence that such consultations in fact occurred. In effect, the developer assumed that the judgment of Georgia educators in a different testing program would be good enough for the people of Alabama. Once again, cut scores bore no relation to the expert judgment of Alabama educators. Moreover, if the rationale for adopting a cut score of 80 was to place a cap on such scores, it is difficult to understand why the cut scores for five special education examinations were actually raised to 80.
Sixth, the State Board did not drop the cut scores, as set by the developer, to advance bona fide psychometric or policy purposes. The board did not drop the scores three SEM's to account for measurement error; the developer recommended a drop of only two SEM's for that purpose. Nor were scores dropped three SEM's to reduce adverse impact against blacks; the State Assistant Superintendent in charge of the certification test was vehemently opposed to taking race into account in setting the cut scores. Finally, while cut scores may have been lowered by three SEM's in part for the permissible purpose of maintaining an adequate teacher supply, the court is convinced that the primary purpose for dropping three SEM's was to mask the obvious system failure generated by the developer's cut-score methodology. For example, even after the developer's binomial twist, 78% of the teachers taking the first administration of core examinations would have failed, and the same would have been true for 93% of those taking the school counseling examination, 89% of those taking the learning disability examination, and 97% of those taking the library media examination. It is apparent that these pass rates did not reflect a fair construct of minimal competence. Further adjustments were employed to back into a passing rate that would appear tolerable and reasonable. The State Board of Education and the test developer in effect abandoned their cut-score methodology, with the result that arbitrariness, and not competence, became the touchstone for standard setting.
The court would be inclined to uphold the cut-score procedures employed for the nine examinations developed in 1982 and the ten examinations subjected to topicality review in 1983; however, each of these examinations has already been shown to be content invalid. Since a valid cut score cannot be generated by items that lack content validity, the validity of the cut-score procedure itself is not enough. Accordingly, the cut scores for the 1982 and 1983 examinations are also invalid.
In reaching the above conclusions, the court has been sensitive to a number of factors. First of all, as stated earlier, close scrutiny of any testing program of this magnitude will inevitably reveal numerous errors, and these errors will not be of equal footing. Secondly, cut scores cannot be determined with mathematical certainty, and political considerations may properly enter into cut-score decisions. The court's task therefore is to assess the sum gravity of the defects found, and to determine whether, as a result of these defects, the examinations are invalid as to content and cut scores. The court recognizes that, in carrying out this task, it must proceed with caution, and even deference. Although the court must assess the credibility of testimony advanced by each side and arrive at an independent judgment, the court should not readily set aside the findings of those who developed a test; the mere fact that the court sees things differently should not, by itself, be considered sufficient to impeach such findings. But while a court should eschew an idealistic view of test validity, it should also be careful not to apply an "anything *825 goes" view. In other words, the mere presence of conflict in expert testimony does not prove that a test fails to meet minimum standards; nor does it prove that a test meets such standards. A court should find a test invalid only if the evidence reflects that the test falls so far below acceptable and reasonable minimum standards that the test could not be reasonably understood to do what it purports to do. The court is convinced that this was the case with the Alabama Initial Teacher Certification test, and in particular with the early childhood education and elementary education examinations. [FN37]
FN37. The court recognizes that it has focussed not so much on the early childhood education and elementary education examinations, but on the Alabama Initial Teacher Certification Test as a whole. The court has done this because the history of the two examinations challenged by Richardson is the same as the history of the teacher certification test as a whole; the conclusions reached by the court regarding the certification test are also applicable to the two challenged examinations. Moreover, in order to appreciate fully the invalidity of the two challenged examinations, one must also understand just how bankrupt the overall methodology used by the State Board and the test developer was.
The court also recognizes that it has focussed on the development and implementation of several individual examinations which have not been challenged by Richardson. The court has included these examinations as additional evidence of the invalidity of the State Board and test developer's overall methodology.
IV. RELIEF
Since Richardson is entitled to prevail on her disparate impact claim, the court must now determine her relief. The court will require that the Lamar County Board of Education reemploy Richardson as an elementary school teacher at a salary and with such employment benefits and job security as would normally accompany the position had she been employed in the school system since 1983. The court will also require that the school board pay her all backpay and other employment benefits she would have received had the school board reemployed her for the 1986-87 school year. The court will also require that the school board pay reasonable attorney's fees to her attorney. 42 U.S.C.A. § 2000e-5(k). The court will give Richardson and the school board an opportunity to agree, between themselves, to the appropriate amount of attorney's fees, present pay, backpay, and other employment benefits to which Richardson is entitled. If the parties are unable to agree, the court will then set these matters down for a hearing.
An appropriate judgment will be entered.
JUDGMENT AND INJUNCTION
In accordance with the memorandum opinion entered this date, it is the ORDER, JUDGMENT, and DECREE of the court:
- That judgment be and it is hereby entered in favor of plaintiff Alice Richardson and against defendants Lamar County Board of Education and its superintendent and members;
- That it be and it is hereby DECLARED that plaintiff Richardson may recover on her "disparate impact" claim but not on her "disparate treatment" claim against defendants Lamar County Board of Education and its superintendent and members;
- That defendants Lamar County Board of Education and its superintendent and members, their officers, agents, servants, employees, attorneys, and those persons in active concert or participation with them who receive actual notice of this injunction by personal service or otherwise, be and they are each hereby ENJOINED and RESTRAINED from failing to reemploy forthwith plaintiff Richardson as an elementary school teacher in the Lamar County School System at a salary and with such employment benefits and job security as would normally accompany the position had she been employed in the school system since 1983;
- That plaintiff Richardson be and she is hereby awarded from defendants Lamar County Board of Education and its superintendent and members all backpay and other employment benefits she would have received had said defendants not illegally refused to reemploy her;
- That plaintiff Richardson and defendants Lamar County Board of Education *826 and its superintendent and members be and they are hereby allowed 21 days from the date of this order to file a request for the court to determine the appropriate amount of present pay, backpay and other employment benefits to which plaintiff Richardson is entitled, should the parties be unable to agree to these matters;
- That plaintiff Richardson be and she is hereby allowed 28 days from the date of this order to file a request for reasonable attorney's fees; and
- That all other relief sought by plaintiff Richardson that is not specifically granted be and it is hereby denied.
It is further ORDERED that this court retains jurisdiction of this case until further order.
It is further ORDERED that all costs of these proceedings be and they are hereby taxed against defendants Lamar County Board of Education and its superintendent and members, for which execution may issue.
The clerk of the court is DIRECTED to issue a writ of injunction.