Education Policy Analysis Archives
Volume 1 Number 11
November 2, 1993
ISSN 1068-2341
A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, Glass@ASU.EDU. College of Education, Arizona State University, Tempe AZ 85287-2411
Copyright 1993, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.
Why Production Function Analysis is Irrelevant in Policy Deliberations Concerning Educational Funding Equity
Jim C. Fortune
FORTUNE@VTVM1.BITNET
College of Education
Virginia Tech University
Abstract: Hanushek and Walberg use production function methodology to contend that there is no relationship between school expenditures and student achievement. Production function methodology uses correlational methods to demonstrate relationships between input and output in an economic system. These correlational methods may serve to hide rather than reveal these relationships. In this paper threats to the validity of these correlational methods for analysis of expenditure-achievement data are discussed and an alternative method of investigation is proposed. The proposed method is illustrated using data from two states (Ohio and Missouri). The method demonstrates relationships between expenditures and achievement that were overlooked by the production function method.
Introduction
"On 26 February 1988 Bennett remarked, `Money doesn't cure school problems.' On 29 February 1988 he was more explicit: `We've done 147 studies at the Department of Education. We cannot show a strong, positive correlation between spending more and getting a better result.' In an earlier reference to those studies, he had said on 13 April 1987 that `in only two or three do we find even a weak correlation between spending and achievement.'" (Baker, 1991) The 147 studies referred to by Bennett are those summarized by Hanushek (1986) using the production function technique.
Hanushek (1989) contended that "Variations in school expenditures are not systematically related to variations in student performance" and that "... schools are operated in an economically inefficient manner." He suggested that "increased school expenditures by themselves offer no overall promise for improving education" and that "school decision making must move away from the traditional `input directed' policies to ones providing performance incentives." To support his contentions, Dr. Hanushek relied on the 26-year-old, much maligned study by Coleman et al., Equality of Educational Opportunity, and his summary of 187 studies using educational production functions.
Walberg appears to base his case contending no relationship between achievement and productivity on his theory of causal influences on student learning and the resulting nine productivity factors (1982), on the triad relationship of socio-economic status, productivity, and expenditures (1989), and on Hanushek's model and the early literature related to production function analysis (1984).
POLICY RELEVANCE OF THE PRODUCTION FUNCTION METHODOLOGY
Monk (1992) described production function analysis as the relating of an input measure to an output measure using correlation or multivariate analysis (regression analysis). He reported that production research began in education some 30 years ago. The process involves the study of relationships between purchased schooling inputs and educational outcomes. The research, according to Monk, is deductively driven, although the deductive arguments tend to be abbreviated. He suggested that the approach has limited utility in policy research because of methodological and conceptual limitations. Monk pointed out that recent research includes more complex multivariate models which have greater potential for illuminating policy.
Both traditional production function analyses and the modern multivariate version to which Monk alluded are based on correlational methods which are inadequate to deal with causation. In the simple linear correlation model, a single input variable (often expenditures, but sometimes other school related inputs such as teacher experience or teacher preparation) is correlated with a single output variable (usually achievement, but sometimes percent passing minimum competency tests or rate of graduation). The multiple dimensionality of schooling suggests that such simple representations of either input or output are inadequate to describe the production relationships.
The second major production function analysis model is based on regression procedures, where a single output variable is predicted by one or more input variables (chosen from expenditure data, teacher experience or teacher preparation) and by intervening variables (such as socio-economic variables, school size, and the like). The purpose of using the intervening variables is to control factors which may confound the actual input-output relationship. In some applications the researcher permits the intervening variables to enter the regression prior to the entry of the input variables. There exists a serious problem with shared variance among the three sets of variables that will be discussed later. Regression based on the prediction of the output variable residual (which has been created by regressing the partial correlation residual of the intervening variables controlling for their relationship to the input variables with the output variable) by the input variables is a more appropriate application to control for confounding variables.
Problems with the Simple Linear Correlation Approach
The Assumptions of the Linear Correlation Approach. Application of the simple correlation model must meet the data assumptions required by correlation, the limitations to inference assumed in the use of the model, and the implicit assumptions about the relationship between correlates inherent in the production function methodology. For the application of Pearson's Product Moment Correlation it is required that one have near or better than interval data for paired cases and that the full range of each variable be present. The coefficient attained measures the linear relationship between the two variables and indicates association, but not necessarily causation.
What Constitutes Differences in Expenditures? "Throwing a bucket of water on a raging fire will not keep a building from burning to the ground, but no one would argue on the basis of this experience that water has no value in fire-fighting. The value of water is apparent only when enough is applied to overcome the fire by reducing the heat below a critical point, degrading the fuel, or temporarily removing the air needed for combustion. An analogous situation often occurs in education. Frequently, we judge an intervention strategy to be ineffective before we have really implemented a program that is intense enough to achieve the desired effects. "Compensatory education" is a case in point." (Bridge, Judd, and Moock, 1979)
The above phenomenon has been labeled a threshold effect. One reason why the correlation method of production function analysis does not show effects of small differences of funding on achievement is the threshold effect. A one-dollar difference in funding will not purchase a commensurate or observable difference in achievement. Instead, some larger, aggregate difference in funding, perhaps $600 or $700, is needed to purchase observable differences in achievement.
Perhaps the greatest problem in the use of the simple, linear correlation method, beyond variable specification, is the absence of the cost disparities that are essential to demonstrate differences in educational purchasing power. An ordering of districts by amount of instructional expenditures does not necessarily order the same districts by their educational purchasing power. One district may have five dollars less in per pupil expenditure than a second district, but may have to pay on the average ten dollars more per teacher than does the second district. Ordering of districts by dollar differences which are less than the measurement error associated with expenditures results in gross underestimation of the true relationship between costs and achievement.
The Truncated Variable (Attenuation). Percent passing a test as a measure of achievement represents a somewhat unusual truncation of a variable in that the variance on the achievement measure is limited to variation of dichotomies rather than variation across the full set of test scores. Variable truncation also occurs when the tests have either floor or ceiling effects, when only one specific segment of the enrollment is used (such as at-risk students or college students) or when data are not available for the entire sample being analyzed.
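The attenuation produced by a pass/fail measure can be illustrated with a small simulation. The sketch below is not drawn from any of the studies reviewed; the data are synthetic, the variable names are hypothetical, and the passing score is arbitrarily set at the median. It compares the correlation of a resource measure with the full test score against its correlation with the dichotomized pass/fail indicator.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
n = 2000

# Synthetic (assumed) data: a resource measure and a test score related to it.
resource = [rng.gauss(0, 1) for _ in range(n)]
score = [x + rng.gauss(0, 1) for x in resource]

# Hypothetical passing score: the median of the observed scores.
cut = sorted(score)[n // 2]
passed = [1 if s >= cut else 0 for s in score]

r_full = pearson_r(resource, score)    # correlation using the full score
r_pass = pearson_r(resource, passed)   # correlation using pass/fail only

print(f"full score: r = {r_full:.3f}")
print(f"pass/fail:  r = {r_pass:.3f}")
```

Under these assumptions the pass/fail correlation comes out smaller than the full-score correlation, even though the underlying relationship is identical in both calculations.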
Potential Non-Linear Relationships. The simple, linear correlation method will not identify non-linear relationships between the input and output variables. In one of the two states discussed later in this paper, I found a quadratic relationship in exploring the data. A state department report in the second state also alluded to a potential quadratic relationship between input-output variables.
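A deliberately extreme, hypothetical example shows how a linear coefficient can miss a quadratic relationship entirely. The figures below are invented for illustration and are not the Ohio or Missouri data; achievement is constructed as an exact quadratic function of expenditure, symmetric about the middle of the range.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per pupil expenditures (in hundreds of dollars), 30 through 50.
expenditure = list(range(30, 51))

# Hypothetical achievement: an exact quadratic in expenditure, peaking at 40.
achievement = [100 - (x - 40) ** 2 for x in expenditure]

# Linear correlation between expenditure and achievement.
r_linear = pearson_r(expenditure, achievement)

# Correlation once the squared (centered) expenditure term is used instead.
squared_term = [(x - 40) ** 2 for x in expenditure]
r_quadratic = pearson_r(squared_term, achievement)

print(f"linear term:  r = {r_linear:.3f}")
print(f"squared term: r = {r_quadratic:.3f}")
```

Here achievement is completely determined by expenditure, yet the linear coefficient is essentially zero; only the squared term reveals the relationship. Real data would be far less tidy, but the direction of the problem is the same.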
Problems with the Multiple Regression Approach
The Assumptions of the Multiple Regression Approach. The application of the regression approach is characterized by a single output variable (some form of achievement measurement or percent reaching an educational standard) being predicted by one or more input variables (expenditures, teacher characteristics, and the like) and controlling for one or more background variables (such as socio-economic variables or school size). Two ways are used to control for the background variables. The first way is to permit the background variables to enter first in the prediction equation. The second is a residualizing technique. The residualization process involves creating the residual of the output variable by regressing the first partial of the controlling variables with the vector of predictors on the output variable. The linear combination of the predictor variables is then regressed on the output residual. The regression approach requires that the researcher meet all of the assumptions that have to be met in the simple, linear correlation analysis. In addition, the researcher is required to have a theory or rationale for establishing the order of variable entry and an understanding of the shared variance problem.
The Order of Variable Entry Problem. The order of variable entry in the calculation of the correlation is important in handling shared variance or commonality of explanation. If two correlated independent variables (predictors) are related to a dependent variable (outcome or criterion), the first variable to enter into the regression calculation gets credit for all of its correlation with the dependent variable. When the second variable is entered into the regression calculation, it gets credit only for the correlation that it has with the dependent variable that has not been explained by the first variable entered. Hence, the first variable gets credit for the correlation with the dependent variable that is shared by the second variable. Critics of Coleman showed that his order of effects does not hold up across applications of different regression models (Pedhazur, 1982).
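The dependence of credit on entry order can be made concrete for the two-predictor case. The following sketch uses synthetic data with hypothetical variable names (`ses`, `spend`, `ach`) and the standard closed-form expression for the squared multiple correlation with two predictors; it is an illustration of the arithmetic, not a reanalysis of any study discussed here.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the illustration is repeatable
n = 1000

# Synthetic (assumed) data: two correlated predictors, both related to achievement.
ses = [rng.gauss(0, 1) for _ in range(n)]
spend = [0.7 * s + 0.7 * rng.gauss(0, 1) for s in ses]
ach = [0.5 * s + 0.5 * e + rng.gauss(0, 1) for s, e in zip(ses, spend)]

r_ys = pearson_r(ach, ses)
r_ye = pearson_r(ach, spend)
r_se = pearson_r(ses, spend)

# Squared multiple correlation for two predictors (standard closed form).
r2_full = (r_ys ** 2 + r_ye ** 2 - 2 * r_ys * r_ye * r_se) / (1 - r_se ** 2)

# Credit given to expenditures when entered first versus entered last.
credit_spend_first = r_ye ** 2
credit_spend_last = r2_full - r_ys ** 2

print(f"total R^2:              {r2_full:.3f}")
print(f"expenditures first:     {credit_spend_first:.3f}")
print(f"expenditures after SES: {credit_spend_last:.3f}")
```

The total explained variance is the same under either order; only its allocation between the two predictors changes.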
The Shared Variance Problem. In dealing with this triad relationship created by the output variable, the input variables and the controlling variables, Walberg (1989) simply failed to discuss how he handled the shared variance problem inherent in the triad. His regression model enters socio-economic status as the first predictor of students' test performances, size as the second prediction variable, and finally expenditures as the third predictor variable. The amount of explanation shared by socio-economic status and size and the amount of explanation shared by socio-economic status and expenditures are credited to socio-economic status solely; the amount of explanation shared by size and expenditures is then attributed to size alone. Certainly not much variance remains to be explained by expenditures. A different order of entry would produce markedly different results. Pedhazur (1982) credits Mayeske with the development of commonality analysis to address this problem, but this methodology has been subjected to some criticism. There is in fact no effective statistical method that will unconfound shared predictive relationships. The only appropriate treatment of the shared relationship problem is, perhaps, a straight-forward admission that it is the cause of the unresolvable ambiguity.
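The three-variable case can be sketched in the same spirit. The code below is a hypothetical illustration on synthetic data, not Walberg's model or data: each predictor is residualized on the predictors entered before it, and the squared correlation of the outcome with that residual is the increment in explained variance credited to the predictor at its point of entry. The names `sequential_credit`, `ses`, `size`, `spend`, and `score` are assumptions introduced for the example.

```python
import random

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sequential_credit(outcome, predictors):
    """Increment in R^2 credited to each predictor, in the order entered."""
    y = center(outcome)
    total = dot(y, y)
    entered = []   # residualized predictors already in the equation
    credit = {}
    for name, values in predictors:
        x = center(values)
        # Remove whatever this predictor shares with earlier entries.
        for prior in entered:
            slope = dot(x, prior) / dot(prior, prior)
            x = [xi - slope * pi for xi, pi in zip(x, prior)]
        credit[name] = dot(y, x) ** 2 / (dot(x, x) * total)
        entered.append(x)
    return credit

rng = random.Random(2)  # fixed seed so the illustration is repeatable
n = 1000

# Synthetic (assumed) district data with mutually correlated predictors.
ses = [rng.gauss(0, 1) for _ in range(n)]
size = [0.5 * s + rng.gauss(0, 1) for s in ses]
spend = [0.5 * s + 0.3 * z + rng.gauss(0, 1) for s, z in zip(ses, size)]
score = [0.4 * s + 0.2 * z + 0.4 * e + rng.gauss(0, 1)
         for s, z, e in zip(ses, size, spend)]

# Entry order described above: SES, then size, then expenditures.
walberg_order = sequential_credit(
    score, [("ses", ses), ("size", size), ("spend", spend)])
# The reverse order, for comparison.
reverse_order = sequential_credit(
    score, [("spend", spend), ("size", size), ("ses", ses)])

print("SES, size, expenditures:", walberg_order)
print("expenditures, size, SES:", reverse_order)
```

In this constructed example the credit given to expenditures is several times larger when expenditures enter first than when they enter last, while the total explained variance is unchanged. Nothing in the arithmetic indicates which allocation is the correct one.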
Other Design Problems for Both Correlational Models
Inadequate Variable Specification. In addition to the difficulty created by trying to represent multiple inputs and outputs by single variables, there is the additional difficulty of including confounding data elements in the input and output variable measurement. Selected single variables may provide inadequate description of key inputs or outputs, may be unlikely to have the relationship assumed by the production function paradigm, and may not be accurately measured.
Inclusion of Confounded Data Elements. Federal dollars are included in school expenditures as unrestrained expenditures. Some federal dollars are likely ear-marked for efforts that do not contribute to student performance on achievement tests and some federal funds are not involved in instruction. The inclusion of federal funds is not nearly as serious a potential problem as some districts testing special education students and including their scores in the test results. Hence, when there is random confounding of the performance measure or the selection of a weak input variable, each serves to reduce the size of relationships. The choice of percent passing a basic competency test is an unfortunate choice of measure for an output variable. Percent passing immediately sets up a ceiling effect for those passing the test. The use of a dichotomized scoring process reduces the amount of variance to be explained and attenuates the observed relationship.
Inadequate Determination of the Input Variables. Variable specification problems occur three ways in the determination of input variables. Problems occur when input measures are chosen that are not related to instruction. Perhaps, the most frequent example of this problem occurs in the use of teacher salary as an input measure. Teacher salary is based on seniority and is likely not related to quality of instruction. The second way that problems occur in the selection of input measures is the selection of an input which cannot be measured adequately across all districts. An example of this can be seen where school district size varies enough that economy of scale enters into the accuracy of the measure. Very small districts require more dollars per pupil to provide educational services equivalent to those of larger districts. The third way that selection of input variables can create problems is when in some districts the input variable has larger investment in special students than do other districts. Such cases are generated when districts have a large number of "At Risk" students or where a district invests highly in advanced placement instruction.
Inadequate Determination of the Output Variables. Variable specification problems occur in at least four ways in the determination of output variables. The first way is when the output variable that was chosen was a minor emphasis of many schools. Such may be the case when school districts focus more on emotional, attitudinal, behavioral, or vocational outcomes. The second way that dependent variable specification problems can occur is when there are floor and ceiling effects to the measures. If the achievement measure has a ceiling or a floor effect, then many of the students making a perfect or a zero score have accomplishments that are not being measured. The third way that variable specification problems can occur is when the output variables have no logical linkage to either the selected input variables or to school quality. An example of this problem is the "Efficiencies" notion used by Walberg (1989). "Efficiencies" are school expected output developed by the use of prediction based on socio-economic status. The variable can be argued to better represent an error of measurement of the socio-economic construct than an actual measure of school output. The fourth way that variable specification problems can occur is the selection of an output measure that does not pertain to the whole student body. An example of this is the selection of freshman grade point averages for their first year of college. Differential proportions of students across districts go to college, college curricula differ in difficulty, and colleges differ in difficulty.
Crossing Economic Eras. Production function studies are often grouped for interpretation and for the making of policy recommendations. The 38 publications from which Hanushek extracted his review range from the late 1950s to the early 1980s. This means that several of the studies were conducted in different economic eras. In the 1950s, there was a dearth of federal funding, but there was a wave of post-war resources and the early beginning of inflation. The 1960s brought the Elementary and Secondary Education Act, increased federal funding, escalation of inflation, baby boom growth beginning to enter schools and the emergence of civil rights as major issues in education. The 1970s brought a slowing of federal funding, abatement of inflation and more focus on growing enrollment. The 1980s marked a reduction in federal funds, the beginning of a recession, the start of program retrenchment and the end of growing enrollment. It is quite likely that input-output relationships differ across these four decades.
Inconsistent Determination of What is to be Considered a Production Function Study. Several of the studies included in Hanushek's (1989) reviews lack one or more of the elements required to be classified as production function analyses. One such study was conducted in a large school district where teacher experience and differential teacher salaries were used as input variables (Murnane, 1975). In another study, college freshman grade-point-averages were used as the output variable (Raymond, 1968). It seems necessary that every study called a production function analysis must have at a minimum an input variable, an output variable, an assumption of a logical linkage between the school, total group and unbiased estimates for both variables across the units of comparison and the computation of a correlational analysis.
Inadequate Sampling Representation. Problems with sampling representation occur in two ways: through lack of disclosure and through inadequate sample size. Sampling becomes very important in making an inference to a given population. In most production function analyses, the intent appears to be that the researcher wishes to generalize to all of the school districts in the United States. Not a single study or collection of studies appears to meet sampling requirements for this inference.
Criticism of the Work of Hanushek
As Spencer and Wiley (1981, p. 44) suggested, "Hanushek offers a provocative interpretation of the last two decades of research on educational productivity." Unfortunately, "Hanushek misinterprets the data on which he bases his conclusion and draws inappropriate policy implications from them." (Spencer and Wiley, 1981, p. 41) After reading a sampling of Hanushek's articles, I concur with Hughes (1992) that one could quote from 20 years of Hanushek and destroy his current argument with his own words. However, I choose here to look at his current thesis and see if it stands on its own foundation or falls.
Hanushek contended that "There is no systematic relationship between school expenditures and student performance" (Hanushek, 1991, p. 425) and that "... schools are economically inefficient." (Hanushek, 1986, p. 1166) He suggests that "increased school expenditures by themselves offer no overall promise for improving education" (Hanushek, 1986, p. 1167) and that "school decision making must move away from the traditional `input directed' policies to ones providing performance incentives." (Hanushek, 1989, p. 49) To support his contentions, Dr. Hanushek relies on the 26-year-old, greatly criticized study by Coleman et al., Equality of Educational Opportunity, Washington, D. C., Government Printing Office, 1966; and his own summary of 187 studies of educational production functions (147 of these studies are those referred to by Bennett). (Hanushek, 1989, p. 46)
The Coleman Study as Support
The Coleman Study did indeed highlight input-output relationships across a large number of districts, using a regression model. Coleman et al concluded that family characteristics and peer group characteristics were more instrumental in promoting student achievement than were school system characteristics. Critics of the study suggested that this ordering of effects may be due to the analytic model used. Because the nature of regression analysis requires theory to specify models and order of variable entry into the computations, Coleman received considerable criticism, some of which resulted in George Mayeske's contributions to a new analytic technique, commonality analysis (Pedhazur, 1982).
The order of variable entry in the calculation of the correlation is important in handling shared variance or commonality of explanation. If two independent variables (predictors) are related to a dependent variable (outcome or criterion) and are related to each other, the first variable to enter into the computation gets credit for the correlation to the dependent variable that it shares with the second variable. Hence, if a family variable enters first in the computation of the correlation being used in predicting reading performance and then a peer variable enters into the calculation, the regression results will show for the family variable its unique correlation with reading performance plus the correlation to reading performance that it shares with the peer variable. For the peer variable only its unique correlation to reading performance is shown. Critics of Coleman show that his ranking of effects does not hold up across applications of different regression models. A second criticism of using Coleman as a primary research foundation lies in the age of the Coleman data. Any economist should be able to see that time has likely made relationships in the Coleman data obsolete with regard to today's economy.
Hanushek's Summary of Production Functions
Hanushek's (1989) summary of 187 studies of educational production functions is a continuing theme throughout his publications. This summary began in 1981 with 29 articles and 130 studies, it was continued in 1986 with 33 articles and 147 studies, and it was completed in 1989 with 38 articles and 187 studies. The summary is the research foundation for Hanushek's assertion of no relationship between school districts' expenditures and student performance on standardized achievement tests.
There are several serious omissions and research flaws in the description and logic of Hanushek's summary. These include the lack of disclosure of sample sizes in the studies that were reviewed, inadequacy in size and representativeness of the 187 case studies, misinterpretation of the results of the hypothesis testing, potential misinterpretation of the summary, failure to use selected research that is not consistent with the ideas being promoted (Glass and Smith, 1979; Spencer and Wiley, 1981; Burstein, 1980), and inadequate specification of the key variables.
Lack of Information on Sample Sizes in the Studies that Were Reviewed. The studies that were reviewed by Hanushek were qualified in some unspecified manner. It appears that the primary criterion for qualification was publication. Hanushek stated that at least one study deals with a district or districts in all regions of the United States, with different grade levels, and across different performance measures. He provided two tables that are purported to describe the sample. In Table 1 of his 1989 article, "The Impact of Differential Expenditures on School Performance," Hanushek showed the number of studies dealing with single districts (60) and the number dealing with multiple districts (127), but he failed to provide any information on the number of districts involved in the multiple districts. In his Table 2, Hanushek showed that 90 studies deal with at least one grade level in the range of grades from 1 to 6 and that 97 studies deal with at least one grade level in the range of grades from 7 to 12. No attempt is made to show replication across grade levels, number of students involved at each grade level or for each district. With so few cases, the reader must wonder where the holes are in the sample.
Inadequate Size and Lack of Representativeness of the 187 Case Studies. There are approximately 15,000 public school districts in the United States. These districts are characterized by a large variance in total enrollment. Samples that include a majority of the students and provide a confidence band of 95 percent are usually selected randomly using a stratified sampling frame that involves the selection of approximately 800 districts (See the Condition of Education Annual Reports by the National Center for Education Statistics). A simple random sample without control for the number of students covered requires approximately 400 districts for a 95 percent confidence level and for representation (Schaeffer, Mendenhall and Ott, 1986). The sample used by Hanushek was not random and was likely smaller than either required sample size. The size is less bothersome than the scant likelihood of randomness. The 187 studies were likely to have been conducted in reaction to some problem or inquiry. Hence, are the relationships found in these unusual districts representative of those that exist in the other 15,000? No evidence is presented to allow the reader to judge the generalizability of the results.
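The order of magnitude of the simple random sample figure can be checked with the usual textbook formula for estimating a proportion, including the finite population correction. The margin of error (plus or minus 5 percentage points) and the conservative proportion of 0.5 used below are assumptions made for this sketch; the sources cited above may have used different design parameters.

```python
import math

def srs_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion from a simple random sample.

    margin, z and p are assumed design values: a +/-5 point margin of error,
    the normal deviate for 95 percent confidence, and the most conservative
    proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)

# Approximate number of public school districts cited in the text.
n_districts = srs_sample_size(15000)
print(n_districts)  # 375
```

Under these assumed parameters the requirement is about 375 districts, which is consistent with the figure of approximately 400 given above.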
Misinterpretation of the Results of the Hypothesis Testing. In hypothesis testing, the researcher assumes the null hypothesis and seeks reason to reject it. Failure to find such evidence does not permit one to accept the null hypothesis, but only permits one to fail to accept the alternative hypothesis. Failure to gather evidence that will lead to the acceptance of the alternative hypothesis and the subsequent rejection of the null hypothesis may be due to inadequate sample size, measurement errors or inaccurate model specification.
Spencer and Wiley (1981) used the 109 studies analyzed by Hanushek in 1981, with which he sought to argue for the conclusion of no relationship between teacher-pupil ratio and the performance of students, as an example that illustrates another of Hanushek's difficulties with the interpretation of significance tests on regression coefficients. Their argument showed that the null hypothesis can be rejected for positive results and can also be rejected for negative results, pointing to difficulty with the model used and the data set.
Potential Misinterpretation of the Summary. Baker (1991) discussed Hanushek's absence of a decision rule in his summary of the literature for the 147 studies (Hanushek, 1986). He stated that a synthesis of literature as reported by Hanushek can be conducted in one of two ways: either by the vote counting method with a stated expectancy or decision rule or by the meta-analysis method. Hanushek did not compute effect sizes, so his review must have entailed the vote counting method. Given the absence of the statement of a decision rule by Hanushek, Baker assumed a decision rule that 5% of the studies will be significant by chance. He then showed that 20% of the studies are significant, thus ruling out a chance relationship (Baker, 1991).
In Table 3 of his 1989 article, "The Impact of Differential Expenditures on School Performance," Hanushek showed the expenditure parameters for the 187 studies for seven educational inputs as they relate to student achievement test performance. Although he reported number of studies, he did not report number of districts, number of students, or grade levels to which the studies pertain. For the various components he reports the number of non-significant studies found. Hence, 82% of the 152 studies relating teacher/pupil ratio to student performance were found not significant (p<0.05); 88% of the 113 studies relating teacher education to student performance were found not significant (p<0.05); 64% of the 140 studies relating teacher experience to student performance were found not significant (p<0.05); 78% of the 69 studies relating teacher salary to student performance were found not significant (p<0.05); 75% of the 65 studies relating expenditures per pupil to student performance were found not significant (p<0.05); 87% of the 61 studies relating administrative inputs to student performance were found not significant (p<0.05); and 84% of the 74 studies relating facilities to student performance were found not significant (p<0.05). For four of these seven inputs (Teacher experience, Teacher salary, Expenditures/pupil, and Administrative inputs) ratios of the significant to non-significant studies are equal to or exceed 11 to 4 odds in favor of positive relationships.
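Baker's decision rule can be applied to the percentages just listed. The sketch below converts each reported percentage of non-significant studies into an approximate count of significant studies and asks how probable a count that large would be if only 5% of studies were significant by chance. It assumes, as a simplification, that the studies are independent and that the rounded percentages recover the counts closely enough; neither assumption comes from Hanushek or Baker.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# (input, number of studies, percent not significant) as reported above.
table3 = [
    ("teacher/pupil ratio", 152, 82),
    ("teacher education", 113, 88),
    ("teacher experience", 140, 64),
    ("teacher salary", 69, 78),
    ("expenditures per pupil", 65, 75),
    ("administrative inputs", 61, 87),
    ("facilities", 74, 84),
]

chance_rate = 0.05  # Baker's assumed decision rule
counts = {}
tails = {}
for name, n_studies, pct_not_sig in table3:
    # Approximate count of significant studies implied by the percentage.
    k = round(n_studies * (100 - pct_not_sig) / 100)
    counts[name] = k
    tails[name] = binom_upper_tail(k, n_studies, chance_rate)
    print(f"{name:24s} {k:3d} of {n_studies:3d} significant, "
          f"chance probability {tails[name]:.2g}")
```

For every one of the seven inputs, the number of significant studies is larger than chance alone would plausibly produce under these assumptions, which is the substance of Baker's point.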
Failure to Cite Research that is not Consistent with the Ideas Being Promoted and Inadequate Specification of the Key Study Variables. Given Hanushek's liberal qualification of studies and his reliance on the Coleman study, his rejection of the Glass study as being subject to too much criticism for attempting to calculate effect sizes for different class size intervals is surprising and unaccountable. Hanushek's failure to address the criticisms of Spencer and Wiley was also surprising. In his discussion of aggregation effects, the work of Burstein was overlooked. This work demonstrates the potential danger of aggregated data and correlation.
Inclusion of Confounded Data Elements. Federal dollars are included in school expenditures as unrestrained expenditures. Some federal dollars are ear-marked for efforts that do not contribute to student performance scores. An even more serious potential problem is that some districts test special education students and other districts fail to test special education students. Hence, there is random confounding of the performance measure, reducing the sizes of correlations possible.
Choice of Performance Measure. The choice of percent passing the basic competency test (bct) is an unfortunate choice of measure for a performance indicator. Percent passing immediately sets up a ceiling effect for those passing the test. Even if they benefit from additional or redistributed expenditures, their gains can never be shown in the scattergrams. Gains shown by those who pass and by those who continue to fail are not reflected in the measure.
Baker (1991) noted that another major problem is Hanushek's failure to correct correlations for attenuation arising from the fact that per pupil expenditures are truncated. Baker stated that the correlation between achievement and expenditures is greatly reduced because "no schools spend a great deal more or less than others. ... It is quite easy for a significant finding to be overlooked, if the observed data come from the center of a scattergram, where the attenuated data often appear to be random." (Baker, 1991, p. 4)
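Baker's remark about the center of the scattergram can be illustrated with synthetic data. In the sketch below, which uses invented values and hypothetical variable names, the same linear relationship is examined twice: once across the full range of expenditures, and once using only the districts whose expenditures fall in a narrow band around the mean.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)  # fixed seed so the illustration is repeatable
n = 5000

# Synthetic (assumed) standardized expenditures and achievement scores.
spend = [rng.gauss(0, 1) for _ in range(n)]
score = [x + rng.gauss(0, 1.5) for x in spend]

# Keep only districts within half a standard deviation of mean expenditure.
middle = [(x, y) for x, y in zip(spend, score) if abs(x) < 0.5]

r_full = pearson_r(spend, score)
r_middle = pearson_r([x for x, _ in middle], [y for _, y in middle])

print(f"full range:  r = {r_full:.3f} (n = {n})")
print(f"center only: r = {r_middle:.3f} (n = {len(middle)})")
```

The relationship built into the data is the same in both calculations; restricting the range of expenditures alone is enough to shrink the observed coefficient substantially.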
Criticism of the Work of Walberg
Walberg appears to base the case for no relationship between achievement and expenditures on his theory of causal inferences on student learning and the nine productivity factors (1982); on the triad relationship of socio-economic status, productivity, and expenditures (1989); and on reliance on Hanushek's model and on the early literature related to production function analysis (1984).
Theory of causal inferences on student learning and the nine productivity factors
Walberg's review of productivity research and his development of the "theory" of school learning have received much professional praise. I am in agreement with this praise in that the model appears to synthesize a large body of research clearly and usefully. Walberg's model includes a paradigm connecting Aptitude (ability, development and motivation), Instruction (amount and quality), and Environment (home, classroom, peers and television) as inputs to Learning (affective, behavioral and cognitive). I believe that this model is an accurate picture of a subset of variables that are precursors of productivity. My experience suggests that curriculum probably should not be ignored and left out of the model. Also, note that no variable entitled "expenditure" is included directly in the model. Yet, expenditures are represented indirectly in both Instruction and Environment. Walberg recognized this role in the following statement, "... and expenditure levels of schools and districts, and their political and sociological organization - are less alterable in a democratic, pluralistic society; are less consistently and powerfully linked to learning; and appear to operate mainly through the nine factors in the determination of achievement." (Walberg, 1982, p. 120) What is puzzling about this statement is that Walberg appears to be trying to stretch logic to agree with Hanushek's weak and inconsistent position, and reasons that higher expenditures follow quality instruction rather than serving as mediating factors in the purchase of quality instruction.
The triadic relationship of socio-economic status, productivity, and expenditures
Walberg appears to be interested in the triadic relationship of socio-economic status, productivity (or at least efficiency of student test performance), and expenditures. This interest is expressed in several studies and reviews authored by Walberg. In several of the studies, Walberg appears to have problems in the specification of at least two or perhaps all three of the variables of the triad. Perhaps, one of the major problems with how Walberg has set out to study these variables is his lack of control of certain key school variables. In the discussion of studies of the relationship of class size to achievement test performances nothing is said as to how many of the small classes were made up of special education students or were composed for remediation. The overlooking of these two common practices in school certainly confounds the study of class size and the inclusion of special education students confounds the measure of student performance in reading, mathematics, science or other standard school curricula criteria used to define school productivity. In his studies of district size, he permits urbanism to confound his variable. Walberg is frequently unclear as to what is being measured as a variable representing productivity. Sometimes his productivity variable is measured as percent passing. The method of measurement clearly restricts the range of the achievement construct and serves to reduce the observed correlation. At other times, Walberg uses what he refers to as an efficiency measure, which is made up of the predicted achievement score using socio-economic status in the prediction equation divided by the observed achievement score. This configuration called "efficiency" appears to more closely represent a measure of prediction error for socio-economic status. Clearly, his expenditure data include funds for transportation, lunch, special education, and similar programs which do not bear directly on instruction.
In dealing with this triadic relationship, Walberg simply fails to discuss how he has handled the shared variance problem inherent in the relationship. His regression model enters socio-economic status as the first predictor of students' test performances, size as the second predictor variable, and finally expenditures as the third predictor variable. The amount of explanation shared by socio-economic status and size and the amount of explanation shared by socio-economic status and expenditures are credited to socio-economic status solely; the amount of explanation shared by size and expenditures is then attributed to size alone. Certainly, not much explanation remains to be credited to expenditures. A different order of entry would produce markedly different results. Mayeske developed commonality analysis to address this problem, but the methodology has been subjected to some criticism. In actuality there is no effective statistical method that will unconfound shared predictive relationships. Appropriate treatment of the shared relationship is perhaps a straightforward discussion of the irresolvability of the problem.
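The dependence on order of entry can be shown with a minimal two-predictor sketch (socio-economic status and expenditures only; district size is omitted for brevity). It uses the standard formula for R-squared with two standardized predictors. The three correlations are hypothetical values chosen for illustration and are not taken from Walberg's data.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for y regressed on two predictors, computed from their correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical correlations, chosen only for illustration.
r_y_ses = 0.6    # achievement with socio-economic status
r_y_exp = 0.5    # achievement with expenditures
r_ses_exp = 0.7  # socio-economic status with expenditures

total = r_squared_two_predictors(r_y_ses, r_y_exp, r_ses_exp)

# Order A: socio-economic status entered first, expenditures second.
ses_first = r_y_ses ** 2
exp_increment = total - ses_first

# Order B: expenditures entered first, socio-economic status second.
exp_first = r_y_exp ** 2
ses_increment = total - exp_first

print(f"total R^2 = {total:.3f}")
print(f"SES first: SES {ses_first:.3f}, expenditures add {exp_increment:.3f}")
print(f"expenditures first: expenditures {exp_first:.3f}, SES adds {ses_increment:.3f}")
```

With these assumed correlations, expenditures are credited with about 1 percent of the variance when entered second and with 25 percent when entered first, although the total explained variance (about 37 percent) is identical in both orders.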
Reliance on Hanushek's model
Walberg depends in several literature reviews on the productivity analyses reported by Hanushek. He appears to rely on them without critical scrutiny and uses Hanushek's work as rationale for demoting the role of expenditures in his model and in further analyses. Walberg's acceptance without question of Hanushek's work raises some concern about the other studies that he uses in his argument.
Regression Analyses of New Jersey Data
The analyses performed for the New Jersey hearings (Walberg, 1989) appear to duplicate many of the faults discussed in Walberg's triad studies, and potentially contain a few new departures from standard research practice. On page 43, lines 4 and 5, of the 1989 document, Walberg's description of regression analysis is misleading. Regression analysis does not provide a method of simultaneous analysis of the predictive contribution of three variables. Order of entry attributes the shared variance of two variables to the first one entered into the prediction process. Observed relations are most likely not independent; only the last variable to enter the equation is likely to be independent.
Variable specification is again a problem as confounding other school factors such as special education, remediation processes, transportation costs, and the lunchroom expenditures have not been removed from the studies. It appears that the variable "expenditures" rather than "expenditures per student" was run in the correlations. The "Efficiencies" prediction is still used as a dependent variable and the truncated measurement of productivity (such as percent passing) is used in several of the achievement measures.
Order of entry and shared variance are again problems in these analyses. One wonders what kind of discussion would ensue if an appropriate expenditure variable were entered first in the prediction of test performances that had not been truncated or obscured by the use of ratios.
Demonstration of the lack of validity of the production function methodology
A Suggested Alternative Approach
The production function method must be altered in three ways to make it policy relevant. To identify the effects of large versus small expenditures, the research task appears to demand a comparison rather than an association. Rather than asking if there is a consistent relationship across the whole population, it is better to ask for what kinds of districts do such effects exist within a state. A third change is to create a discrepancy in expenditures large enough to reveal differences in the purchasing power of educational services.
Finding Homogeneous Sets of Districts. Districts within a state differ on many dimensions. Furthermore, the dimensions that are most discriminating in one state may not be so in another. By grouping districts in a particular state into classes (e.g., rich vs. poor) according to the key dimension for that state (e.g., wealth), homogeneous subgroups can be obtained for further analysis. Size of districts, rural/urban, and number of exceptional children (either gifted or at risk) are variables whose subdivisions are likely to establish subsets of homogeneous groups. In states like Montana and Missouri, size is the dimension which creates homogeneous subgroups. In Alabama rural/urban is the variable that yields homogeneous subgroups. In Ohio, income levels or socio-economic status creates homogeneous subgroups. In some cases, there are one or two large, poor, urban districts which have to be considered as outliers so as to establish homogeneous subgroups.
Creating the Disparity in Funding. In 1970 a study conducted for the Office of Planning and Program Evaluation/Bureau of Elementary and Secondary Education/United States Office of Education found that approximately 300 dollars was needed to improve elementary school children's reading scores one month over the course of a year. A proration of this finding suggests that a disparity of 600 to 700 dollars is needed between districts compared. Within each homogeneous subgroup, the districts are ordered by instructional expenditures and then divided into two groups where one is formed by the upper 30% and the other is defined by the lower 30%. The two groups are equal with regard to sample size, and differences between the groups on expenditures should exceed 600 dollars. Given the satisfaction of these conditions, differences in achievement scores should be apparent, if they exist.
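The grouping step can be sketched as follows. The district records and their expenditures are hypothetical; the function name `split_upper_lower` and the dictionary layout are assumptions introduced for the illustration.

```python
def split_upper_lower(districts, percent=30):
    """Return (upper, lower): the top and bottom `percent` of districts by expenditure."""
    ordered = sorted(districts, key=lambda d: d["expenditure"])
    k = len(ordered) * percent // 100  # equal group sizes, rounded down
    if k == 0:
        raise ValueError("too few districts to form groups")
    return ordered[-k:], ordered[:k]


def mean_expenditure(group):
    """Average instructional expenditure per pupil for a group of districts."""
    return sum(d["expenditure"] for d in group) / len(group)


# Hypothetical instructional expenditures per pupil for one homogeneous subgroup.
spending = [1200, 1250, 1300, 1500, 1600, 1700, 1800, 1950, 2000, 2100]
districts = [{"name": f"D{i + 1}", "expenditure": e} for i, e in enumerate(spending)]

upper, lower = split_upper_lower(districts)
disparity = mean_expenditure(upper) - mean_expenditure(lower)
meets_threshold = disparity >= 600  # the 600 dollar disparity suggested in the text

print(len(upper), len(lower), round(disparity, 2), meets_threshold)
```

If the disparity falls short of the threshold, the subgroup would presumably not be suitable for the contrast; how such cases should be handled is not specified in the method as described.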
Using t-Tests to Investigate the Results of the Disparity. Given the creation of the two groups (upper and lower 30%) from a single homogeneous subgroup and the verification of a 600 dollar disparity, the independent t-test with pooled variance can be used to discover achievement test differences. If more than three homogeneous subgroups are to be analyzed, methods to deal with the inflation of the family-wise error rate should be considered. Such methods include the recalculation of the significance levels to compensate for the use of several t-tests (the Bonferroni procedure) or the use of the family of t-tests notion (e.g., the Tukey procedure).
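A minimal sketch of the pooled-variance t-test computed from summary statistics, followed by a Bonferroni comparison, is given here. It uses the fourth grade reading means, standard deviations, and group sizes reported in Table 2. Several choices are assumptions rather than details stated in the text: the family-wise level of 0.05, the two-sided test, and the use of a normal approximation to the t distribution (reasonable only because the degrees of freedom are large).

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(mean_hi, sd_hi, n_hi, mean_lo, sd_lo, n_lo):
    """Independent-samples t statistic with pooled variance, from summary statistics."""
    df = n_hi + n_lo - 2
    pooled_var = ((n_hi - 1) * sd_hi ** 2 + (n_lo - 1) * sd_lo ** 2) / df
    se = sqrt(pooled_var * (1 / n_hi + 1 / n_lo))
    return (mean_hi - mean_lo) / se, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; adequate only for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Fourth grade reading summary statistics as reported in Table 2.
t, df = pooled_t(316.32, 25.64, 154, 309.36, 24.07, 154)
p = approx_two_sided_p(t)

n_tests = 20                        # twenty grade-by-subject contrasts
alpha = 0.05                        # assumed family-wise level
bonferroni_alpha = alpha / n_tests  # per-test level under the Bonferroni procedure

print(f"t = {t:.3f}, df = {df}, approximate p = {p:.4f}")
print("significant as a single test:", p < alpha)
print("significant after Bonferroni:", p < bonferroni_alpha)
```

The statistic computed this way is about 2.46, close to the 2.441 reported in Table 2; the small difference may reflect rounding in the published summary statistics. Under the assumptions above, the contrast would pass as a single test but not after the Bonferroni adjustment.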
The proposed model can be used to investigate either a family of dependent or independent variables or both. The use of several t-tests provides the method for including a number of dependent or output variables. The ordering of districts for the determination of the upper 30% and lower 30% with regard to the input or independent variables permits the consideration of any number of independent variables.
Application of the Alternative Approach to Two States
Data for the states of Missouri and Ohio were obtained through Education Policy Research, Incorporated, which participated in the suits involving equity of the state system for funding the public schools. These data included the per pupil expenditure data, the proxy data for socio-economic status of the attendance area of the districts, district enrollment, and achievement data which were used in the preparation of the cases by both sides in the lawsuit. The achievement data for Missouri are the Missouri Mastery Achievement Test (MMAT) prepared by the state to measure state objectives for the year 1990-91. The achievement data for Ohio are NCEs from standardized achievement tests selected by the districts for the year 1989-90. Both sets of achievement data are judged to have adequate reliability.
In Table 1 are shown the production function correlations for the achievement data for the school districts in Missouri. Note that there is only one correlation, the one for tenth grade mathematics, that is large enough to be judged statistically significantly different from zero. Since there are twenty production functions, one would conclude from such an analysis that the production function shows no relationship between instructional costs and achievement in Missouri.
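The significance of a single correlation coefficient can be checked with the usual t transformation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies it to two coefficients reported in Table 1. Whether this exact test was used for the table is not stated, and the two-sided normal approximation to the p-value is an assumption that is reasonable here only because n is large.

```python
from math import sqrt
from statistics import NormalDist


def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; adequate only for large n."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Tenth grade mathematics (r = 0.117, n = 433), as reported in Table 1.
t_10 = correlation_t(0.117, 433)
p_10 = approx_two_sided_p(t_10)

# Ninth grade mathematics (r = 0.077, n = 392), as reported in Table 1.
t_9 = correlation_t(0.077, 392)
p_9 = approx_two_sided_p(t_9)

print(f"10th mathematics: t = {t_10:.3f}, approximate p = {p_10:.3f}")
print(f"9th mathematics:  t = {t_9:.3f}, approximate p = {p_9:.3f}")
```

Under this approximation the tenth grade mathematics coefficient falls below the 0.05 level and the ninth grade mathematics coefficient does not, which agrees with the asterisk pattern in Table 1.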
Table 1: Correlations Between Expenditures per Student and Student Performance on MMAT Achievement Tests

                           SUBJECT AREA
GRADE           Reading   Mathematics   Science   Soc Studies
4th (n=509)      0.050      0.073       -0.008     -0.025
6th (n=522)     -0.026     -0.044       -0.108     -0.062
8th (n=519)     -0.024     -0.019        0.027      0.012
9th (n=392)     -0.005      0.077        0.077      0.072
10th (n=433)     0.049      0.117*       0.027      0.065

* denotes p<0.05

In Table 2 are shown the t-tests resulting from a partial application of the alternative approach, which creates the funding threshold not included in the production function analyses for the twenty distributions of achievement data. The creation of the threshold results in two of the distributions showing significant positive relationships using the Bonferroni procedure. Ten of the twenty t-tests reach significant levels for single applications of the t-test. Given the family-wise results, it remains risky to conclude a positive relationship between achievement and per pupil expenditures at this time.
Table 2: Contrasts of High and Low Funded Districts on the Missouri MMAT for 1990-1991

Per Pupil Expenditure Averages: Upper 30% = $2056.79   Lower 30% = $1248.48

Subject                 Group   Mean     Std Dev   n          t   Sign.
4th Grade Reading       High    316.32   25.64     154    2.441   ns
                        Low     309.36   24.07     154
4th Grade Math          High    313.47   33.75     154    1.934   ns
                        Low     306.87   25.49     154
4th Grade Science       High    330.33   41.01     154    0.367   ns
                        Low     329.48   32.05     154
4th Grade Soc. Studies  High    336.18   36.79     154    0.529   ns
                        Low     334.14   34.04     154
6th Grade Reading       High    309.83   27.54     158    0.737   ns
                        Low     307.47   23.56     158
6th Grade Math          High    360.12   42.67     158    0.298   ns
                        Low     358.82   34.39     158
6th Grade Science       High    340.02   41.81     158   -0.942   ns
                        Low     353.27   38.28     158
6th Grade Soc. Studies  High    323.94   32.54     158    0.175   ns
                        Low     323.31   31.19     158
8th Grade Reading       High    325.98   24.26     156    1.088   ns
                        Low     322.97   24.30     156
8th Grade Math          High    341.92   40.07     156    1.318   ns
                        Low     336.19   36.16     156
8th Grade Science       High    365.41   44.25     156    0.955   ns
                        Low     360.96   37.45     156
8th Grade Soc. Studies  High    326.32   27.26     156    1.764   ns
                        Low     321.08   24.84     156
9th Grade Reading       High    294.13   22.59     131    2.198   ns
                        Low     287.63   18.94     131
9th Grade Math          High    312.61   35.81     131    2.961   0.05
                        Low     299.64   23.17     131
9th Grade Science       High    367.99   37.51     131    2.143   ns
                        Low     357.41   31.98     131
9th Grade Soc. Studies  High    316.89   24.85     131    2.295   ns
                        Low     309.49   20.34     131
10th Grade Reading      High    311.82   24.52     144    1.693   ns
                        Low     306.89   18.36     144
10th Grade Math         High    339.80   32.30     144    2.525   0.10
                        Low     330.52   20.31     144
10th Grade Science      High    347.97   29.23     144    1.180   ns
                        Low     343.79   23.54     144
10th Grade Soc. Studies High    309.53   24.59     144    1.196   ns
                        Low     306.03   18.47     144

Application of the full alternative model involves not only the creation of the threshold, but also the elimination of outliers or of extreme scores which may have an unusual relationship between instructional expenditures and achievement. Such scores come from economies of scale effects in small districts, the concentrating of at-risk students, or the amassing of more than essential wealth. In order to complete the comparison, production function analyses were performed on the twenty distributions after the outliers had been eliminated. In Table 3 are reported the results of these production function analyses. Significant non-zero correlations are found for four of the twenty coefficients: fourth grade reading, eighth grade reading and social studies, and ninth grade mathematics. The significant correlation for tenth grade mathematics was lost in the elimination of the outliers. However, only three of the correlations in Table 3 are negative, while nine are negative in Table 1. Still, these four non-zero correlations make concluding a relationship between instructional expenditures and achievement too risky. The outliers removed were school districts with enrollments less than 300 and enrollments of greater than 25,000 students.
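The enrollment rule for removing outliers can be sketched as a simple filter. The bounds of 300 and 25,000 come from the text; the district records and the function name are hypothetical and serve only to illustrate the rule.

```python
MIN_ENROLLMENT = 300      # districts below this enrollment are removed
MAX_ENROLLMENT = 25_000   # districts above this enrollment are removed


def remove_enrollment_outliers(districts):
    """Keep districts whose enrollment lies within the stated bounds."""
    return [
        d for d in districts
        if MIN_ENROLLMENT <= d["enrollment"] <= MAX_ENROLLMENT
    ]


# Hypothetical districts for illustration.
districts = [
    {"name": "A", "enrollment": 120},
    {"name": "B", "enrollment": 300},
    {"name": "C", "enrollment": 4200},
    {"name": "D", "enrollment": 25000},
    {"name": "E", "enrollment": 43000},
]

kept = remove_enrollment_outliers(districts)
kept_names = [d["name"] for d in kept]
print(kept_names)
```

The correlations and the upper and lower 30% contrasts would then be recomputed on the retained districts only.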
Table 3: Correlations Between Expenditures per Student and Student Performance on MMAT Achievement Tests with Outliers Removed

                           SUBJECT AREA
GRADE           Reading   Mathematics   Science   Soc Studies
4th (n=329)      0.142**    0.107        0.019      0.096
6th (n=329)      0.048     -0.026       -0.052      0.015
8th (n=329)      0.132*     0.066        0.078      0.121*
9th (n=268)      0.063      0.146**      0.055      0.080
10th (n=318)     0.023      0.052       -0.029      0.023

* denotes p<0.05
** denotes p<0.01

In Table 4 are reported the results of the full application of the alternative model. Note that the threshold is about $620 and that the number of districts has now been reduced to 331. Eight of the twenty t-tests are significant for Bonferroni calculated alpha levels. Fourteen of the twenty t-tests reach the level of significance for unadjusted t-test probabilities. These results permit the conclusion of a positive relationship between expenditures per student and achievement on the MMAT. Missouri school districts can be characterized by a large number of districts with fewer than 300 students enrolled, a few extremely large districts which have a majority of high risk students and high expenditures, and a handful of rich districts that have extremely high expenditures.
Table 4: Contrasts of High and Low Funded Districts with Outliers Removed on Missouri MMAT for 1990-1991

Per Pupil Expenditure Averages: Upper 30% = $1906.43   Lower 30% = $1284.22

Subject                 Group   Mean     Std Dev   n          t   Sign.
4th Grade Reading       High    321.17   23.21     99     3.451   0.01
                        Low     310.44   19.20     99
4th Grade Math          High    317.13   24.14     99     3.012   0.05
                        Low     307.06   21.26     99
4th Grade Science       High    336.67   28.91     99     0.914   ns
                        Low     332.89   27.15     99
4th Grade Soc. Studies  High    345.71   27.57     99     2.764   0.05
                        Low     334.78   26.05     99
6th Grade Reading       High    312.33   20.66     99     1.921   ns
                        Low     306.98   18.16     99
6th Grade Math          High    363.47   34.60     99     1.020   ns
                        Low     358.70   30.54     99
6th Grade Science       High    358.25   36.77     99     0.748   ns
                        Low     354.46   34.01     99
6th Grade Soc. Studies  High    327.97   26.53     99     1.472   ns
                        Low     322.62   24.13     99
8th Grade Reading       High    327.68   16.69     99     3.280   0.05
                        Low     319.13   17.67     99
8th Grade Math          High    344.05   34.44     99     2.338   ns
                        Low     333.20   30.18     99
8th Grade Science       High    371.37   34.25     99     2.544   0.10
                        Low     359.66   32.64     99
8th Grade Soc. Studies  High    329.59   21.13     99     3.419   0.01
                        Low     319.24   21.13     99
9th Grade Reading       High    293.30   17.25     81     2.848   0.05
                        Low     288.01   18.21     81
9th Grade Math          High    311.95   26.83     81     2.808   0.05
                        Low     300.58   23.32     81
9th Grade Science       High    366.42   28.53     81     2.014   ns
                        Low     357.01   29.60     81
9th Grade Soc. Studies  High    316.33   19.74     81     2.275   ns
                        Low     309.24   19.15     81
10th Grade Reading      High    311.73   17.13     93     1.263   ns
                        Low     308.55   17.09     93
10th Grade Math         High    338.46   21.65     93     2.089   ns
                        Low     332.03   19.87     93
10th Grade Science      High    347.79   19.55     93     0.755   ns
                        Low     345.40   23.34     93
10th Grade Soc. Studies High    308.67   17.31     93     0.689   ns
                        Low     306.85   18.53     93

A similar sequence of analyses has been performed for data obtained for the state of Ohio. Production function analyses were performed on the number of school districts in the state and contrasted with the results of t-tests performed after a threshold had been created. This sequence comparing production functions with t-test contrasts was then repeated after outliers were removed.
In Table 5 are reported the nine production function analyses for Ohio. None of the nine achievement areas shows a significantly non-zero correlation. In Table 6 are reported the t-test contrasts for the same nine Ohio distributions. None of the nine contrasts reaches the Bonferroni significance levels.
Table 5: Correlations Between Instructional Expenditures and Selected Variables in Ohio Database

Selected Variables         District Instructional Expenditures per Student   n
4th Grade Reading          -0.012                                            608
4th Grade Language Arts    -0.065                                            608
4th Grade Mathematics      -0.024                                            608
6th Grade Reading           0.008                                            608
6th Grade Language Arts    -0.019                                            608
6th Grade Mathematics      -0.006                                            608
8th Grade Reading           0.004                                            608
8th Grade Language Arts    -0.028                                            608
8th Grade Mathematics      -0.002                                            608
Table 6: Contrasts (t-tests) of School District Expenditures on Achievement Scores

Per Pupil Expenditure Averages: Upper 30% = $2442.62 (n = 183)   Lower 30% = $1578.16 (n = 183)

Achievement Area  Group   Mean    St Dev        t   Sign.
4th Reading       high    54.95   5.93      1.133   ns
                  low     54.27   5.45
6th Reading       high    54.27   5.74      1.514   ns
                  low     53.34   5.90
8th Reading       high    54.79   5.41      1.264   ns
                  low     54.07   5.36
4th Language      high    53.82   6.79      0.041   ns
                  low     53.18   6.29
6th Language      high    53.05   6.21      1.057   ns
                  low     52.36   6.30
8th Language      high    53.73   6.25      0.648   ns
                  low     53.30   6.23
4th Math          high    52.73   7.50      1.081   ns
                  low     51.88   7.41
6th Math          high    53.46   7.03      1.740   ns
                  low     52.15   7.29
8th Math          high    53.70   7.31      1.712   ns
                  low     52.43   6.89

In Tables 7 and 8 are reported the same analyses after the outliers have been removed from the achievement distributions. In Table 7 are reported the production functions.
Table 7: Correlations Between Instructional Expenditures and Selected Achievement Variables in Ohio Database with Outliers Removed, 1989-1990

Selected Variables         District Instructional Expenditures per Student   n
4th Grade Reading           0.053                                            458
4th Grade Language Arts     0.034                                            458
4th Grade Math              0.071                                            458
6th Grade Reading           0.055                                            458
6th Grade Language Arts     0.037                                            458
6th Grade Math              0.074                                            458
8th Grade Reading           0.072                                            458
8th Grade Language Arts     0.024                                            458
8th Grade Math              0.091*                                           458

* denotes p<0.05

The nine production functions reported in Table 7 include only one significantly non-zero correlation, for eighth grade mathematics. From these analyses one is led to conclude no relationship between instructional expenditures and achievement in Ohio. In Table 8 five of the nine t-test contrasts show positive relationships, leading to the conclusion that instructional expenditures are related to achievement and demonstrating the inefficiency and inappropriateness of production function analyses.
Table 8: Contrasts (t-tests) of School District Expenditures on Achievement Scores with Outliers Removed, Ohio Database, 1989-90

Per Pupil Expenditure Averages: Upper 30% = $2187.07 (n = 106)   Lower 30% = $1544.76 (n = 106)

Achievement Area  Group   Mean    St Dev        t   Sign.
4th Reading       high    55.09   6.23      1.714   ns
                  low     53.64   5.44
6th Reading       high    54.42   5.91      2.253   0.10
                  low     52.59   5.80
8th Reading       high    55.26   5.24      2.703   0.05
                  low     53.25   5.52
4th Language      high    53.84   6.91      1.042   ns
                  low     52.89   6.27
6th Language      high    53.23   6.29      1.805   ns
                  low     51.64   6.42
8th Language      high    53.94   6.17      1.603   ns
                  low     52.56   6.33
4th Math          high    53.44   7.85      2.258   0.10
                  low     51.08   7.21
6th Math          high    53.94   7.36      2.759   0.05
                  low     51.15   7.26
8th Math          high    54.00   7.60      2.454   0.10
                  low     51.49   7.18

Conclusion
Production function analyses have been used to assist policy deliberations concerning educational funding equity. These analyses are based on correlational methods which can be misleading in the investigation of the relationship between student achievement and instructional expenditures. The correlation process fails to create a threshold of dollars needed to demonstrate differences in achievement. An alternate method has been developed for the investigation of the relationship. This method is based on creating homogeneous subgroups of districts which are then ordered by expenditures per student. The achievement mean for the group created by the highest funded 30% of the districts is compared to the mean for the group created by the lowest funded 30% of the districts using a t-test. This method was used to demonstrate relationships missed by production function analyses in two states.
References
Baker, Keith, "Yes, Throw Money at Schools." Phi Delta Kappan, April, 1991, 72(8), pp. 4-6.
Bridge, G.R., C.M. Judd, and P.R. Moock, The determinants of educational outcomes: The impact of families, peers, teachers, and schools, Cambridge, Mass., Ballinger Publishers, 1979.
Burstein, Leigh, "Issues in the Aggregation of Data," in Berliner, David, (ed), Review of research in education, Vol. 8, Washington, D. C., American Educational Research Association, 1980, pp. 158-63.
Coleman et al, Equality of educational opportunity, Washington, D. C., Government Printing Office, 1966.
Glass, G. V. and M. L. Smith, "Meta-analysis of Research on Class Size and Achievement," Educational Evaluation and Policy Analysis, 1979, I(1), pp. 2-16.
Hanushek, Eric A., "Conceptual and Empirical Issues in the Estimation of Educational Production Functions," The Journal of Human Resources, Summer, 1979, 14(3), pp. 351-88.
Hanushek, Eric A., "Throwing Money at Schools," Journal of Policy Analysis and Management, Fall, 1981, I(1), pp. 19-41.
Hanushek, Eric A., "The Economics of Schooling: Production and Efficiency in Public Schools," Journal of Economic Literature, September, 1986, XXIV, pp. 1141-1177.
Hanushek, Eric A., "The Impact of Differential Expenditures on School Performance," Educational Researcher, May, 1989, 18(4), pp. 45-51.
Hanushek, Eric A., "When School Finance "Reform" May Not Be Good Policy," Harvard Journal on Legislation, Summer, 1991, 28(2), pp. 423-456.
Hughes, Mary F., "Review of the Literature: Education Production- Function Studies and Eric A. Hanushek," fugitive document, West Virginia Education Fund, Charleston, West Virginia, 1992.
Monk, David H., "Education Productivity Research: An Update and Assessment of its Role in Education Finance Reform." Educational Evaluation and Policy Analysis, Winter, 1992, 14(4), pp 307-332.
Murnane, Richard, Impact of School Resources on the Learning of Inner City Children. Ballinger, Cambridge, MA, 1975.
Pedhazur, Ezekiel, Multiple Regression in Behavioral Research. Holt, Rinehart and Winston, New York, 1982.
Raymond, Richard, "Determinants of the Quality of Primary and Secondary Public Education in West Virginia." Journal of Human Resources, Fall, 1968, 3(4), pp. 450-470.
Schaeffer, Richard L., W. Mendenhall, and L. Ott. Elementary Survey Sampling. Duxbury Press, Boston, MA, 1986.
Spencer, Bruce D. and David E. Wiley, "The Sense and Nonsense of School Effectiveness," Journal of Policy Analysis and Management. Fall, 1981, pp. 43-52.
Walberg, H. J., and W. J. Fowler, Jr., "Expenditure and Size Efficiencies of Public School Districts." apparently prepared for the New Jersey hearing, ERIC ED 274 471, RC 015 786, 1989.
Walberg, H. J., "Educational Productivity: Theory, Evidence, and Prospects" Australian Journal of Education, 26(2), 1982, pp. 115-122.
Walberg, H. J., D. L. Harnisch, and S. L. Tsai, "Elementary School Mathematics Productivity in Twelve Countries." British Educational Research Journal, 12(3), 1986, pp. 237-248.
Walberg, H. J., "Improving the Productivity of America's Schools." Educational Leadership, May, 1984, 41(8), pp. 19-27.
Walberg, H. J., and K. Marjoribanks, "Family Environment and Cognitive Development: Twelve Analytic Models." Review of Educational Research, 46(4), 1976, pp. 527-550.
Walberg, H. J., and T. Weinstein, "The Production of Achievement and Attitude in High School Social Studies." Journal of Educational Research, 75(5), 1982, pp. 285-292.