
Technical Brief on the 1999 Statistical Model


Multiple Regression and the RI Model

Up to this point, we have discussed the computer modeling of student achievement in terms of the relationship between only two variables. Research shows, however, that student achievement is the product of a wide variety of factors. These range from factors that are clearly definable and collectable -- like eligibility for free and reduced lunch and receipt of certain kinds of special services -- to intangible but important factors such as an individual student's motivation to perform well on a state test or the general climate of a school. (For example, studies show that most students perform better on state tests when there are local consequences attached to their performance.)

Statisticians over the years have built increasingly sophisticated models to relate various factors simultaneously. One of the most powerful of these methods is Hierarchical Linear Modeling (HLM). When applied to schools, HLM would consider several characteristics of a school as well as several characteristics of individual students. Researchers at URI attempted to use HLM but ultimately rejected it because our sets of schools that look similar to each other are too small to yield reliable results. HLM can predict scores for individual students with certain characteristics, but Information Works! has no intention of focusing on individual students, teachers, or administrators. The school as a whole is the important unit of analysis and improvement for state accountability purposes.

Instead, the URI research team used hierarchical regression analysis, which is a specialized form of multivariate analysis (see note 6). Multivariate analysis, as its name implies, looks at how multiple variables, acting separately or combined in various ways, bear on the variable of interest (in this case, student achievement on selected state tests). Broadly speaking, multiple regression analysis is a method of analyzing the variability of a dependent variable by using information available on a set of independent variables. Unfortunately, no simple drawing can illustrate this relationship the way the simple regression model was illustrated earlier in this paper.
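For readers who want the general form, a multiple regression model is conventionally written as shown below. The symbols are standard textbook notation and are assumptions here, not taken from the URI analysis: y_i is the dependent variable for school i, x_i1 through x_ik are its k independent variables, the betas are coefficients estimated from the data, and epsilon_i is the part of y_i the model does not account for.

```latex
\[
  y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i
\]
```

With k = 1 this reduces to the simple (two-variable) regression line; with more than one independent variable the fitted relationship is a plane or higher-dimensional surface, which is why it cannot be drawn as a single line on a page.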

Five independent variables, known in advance to be likely to relate statistically to student achievement on state tests, are collected annually from Rhode Island schools and students. These variables are:

» The percentage of students within a school eligible for free or reduced lunch
» The percentage of minority students (i.e., non-white) within a school
» The highest education level of the child's mother, as reported by the student, or
     the highest education level achieved by the most educated parent. (Which
     question was asked depended on the test. Next year, student responses
     to the latter question will be collected on all state tests.)
» The percentage of students in a school enrolled in LEP or bilingual education
     programs
» The percentage of students within a school receiving services under special
     education law

Because of the small number of schools in Rhode Island, comparisons were made among all schools in the state rather than just among groups of schools with similar demographic characteristics. Additionally, in 1999 we used two years of test data (96-97 and 97-98) to add greater precision to the model. These two years of test data were treated independently rather than being averaged. Essentially, this doubled the number of students within the sample. The correlation of each individual variable with student achievement is shown in Table 1. Overall, there is a strong relationship between these five student characteristics and academic performance across all grade levels of the state assessments.
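The difference between treating the two years independently and averaging them can be sketched as follows. This is a minimal illustration with made-up school records; the function names, record layout, and numbers are hypothetical and do not come from the URI analysis.

```python
# Hypothetical records: (school_id, test_year, percent_proficient).
year_9697 = [("A", "96-97", 40.0), ("B", "96-97", 55.0)]
year_9798 = [("A", "97-98", 44.0), ("B", "97-98", 51.0)]


def stack_years(*years):
    """Keep every year's record as its own observation (no averaging)."""
    stacked = []
    for records in years:
        stacked.extend(records)
    return stacked


def average_years(*years):
    """The alternative the brief says was NOT used: one mean per school."""
    scores = {}
    for records in years:
        for school, _year, pct in records:
            scores.setdefault(school, []).append(pct)
    return {school: sum(v) / len(v) for school, v in scores.items()}


stacked = stack_years(year_9697, year_9798)     # 4 observations
averaged = average_years(year_9697, year_9798)  # 2 observations
```

Stacking keeps twice as many observations for the regression, whereas averaging would collapse each school to a single value.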

The first three variables listed above (free and reduced lunch status, minority status, and parent education level) were found to have "multicollinearity." In other words, when one variable shifted, the others also shifted in a similar manner. The correlation between the percentage of students eligible for free and reduced lunch and the percentage of non-white students in a school, for example, was greater than .9 across all three tested grade levels. (Recall the discussion above about correlations approaching 1.00.) One of the remedies for multicollinearity is to group variables in blocks. (We elected not to use factor analysis since it does not lead to a numerical result that can be easily understood by non-statisticians.) Therefore, an equally weighted average of the variables for eligibility for free and reduced lunch, mother's (parent's) level of education, and minority status was used as a single block. For mother's (parent's) education, the percentage of mothers (parents) whose education was reported as beyond high school was recoded to run in the same direction as the other two variables, which is to say that the lower the education level of groups of mothers (parents), the lower the educational achievement of corresponding groups of students. The resulting equally weighted variable can be thought of as a poverty index (low SES), which is more stable across grade levels than any of the three variables viewed individually. The larger the index number, the more poor students there are within the school.
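The construction of the combined index can be sketched as below. The brief does not state the exact recoding formula, so subtracting the "beyond high school" percentage from 100 is an assumption made for illustration, as are the function name and the example values.

```python
def poverty_index(pct_lunch, pct_minority, pct_parent_beyond_hs):
    """Equally weighted average of three school-level percentages (0-100).

    Assumption: parent education is recoded as 100 minus the percentage
    reported as beyond high school, so that a higher value means more
    disadvantage on all three components.
    """
    pct_parent_hs_or_less = 100.0 - pct_parent_beyond_hs
    return (pct_lunch + pct_minority + pct_parent_hs_or_less) / 3.0


# Hypothetical school: 60% lunch-eligible, 30% minority,
# 40% of parents educated beyond high school.
index = poverty_index(60.0, 30.0, 40.0)  # (60 + 30 + 60) / 3
```

Because each component carries the same weight, a one-point change in any of the three percentages moves the index by one third of a point.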

A second block was created using the percentage of students receiving bilingual or LEP services and the percentage of students within a school receiving special education services. These variables were each introduced separately into the model, but only after the SES variable described above had been entered. The researchers are aware that some students within a school receive both types of services and would be counted in this model twice. Because students with multiple learning needs require more support, counting them twice factors in the cumulative effects of multiple challenges.
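Entering predictors in a fixed order and recording how much the explained variance grows at each step can be sketched as follows. This is a minimal illustration in plain Python on synthetic data, not the researchers' software or results; the helper names, the order of the LEP and special education variables, and all numbers are assumptions.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least squares fit of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(ordered_columns, y):
    """Enter predictors one at a time, in order; return cumulative R^2 per step."""
    return [r_squared(ordered_columns[: i + 1], y)
            for i in range(len(ordered_columns))]


# Synthetic school-level data, for illustration only (not Rhode Island data).
rng = random.Random(0)
n_schools = 60
ses = [rng.uniform(0, 100) for _ in range(n_schools)]
lep = [rng.uniform(0, 30) for _ in range(n_schools)]
sped = [rng.uniform(5, 25) for _ in range(n_schools)]
score = [80 - 0.5 * s - 0.2 * l - 0.1 * p + rng.gauss(0, 5)
         for s, l, p in zip(ses, lep, sped)]

steps = hierarchical_r2([ses, lep, sped], score)  # SES first, then the others
increments = [steps[0]] + [b - a for a, b in zip(steps, steps[1:])]
```

Each entry in `increments` is the additional share of variance explained by a variable once everything entered before it is already in the model, which is why the order of entry matters.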

Due to the limitations inherent in the relatively small data sets (the small number of schools in RI and the small number of students per tested grade level in some RI schools), the researchers chose to use overall building-level data rather than data associated solely with the grade tested. So, for example, we used eligibility for free and reduced lunch data for the entire school rather than just the grade tested. Although the two data sets were empirically highly correlated, the researchers were more interested in overall school context than in the context specific to particular grades, consistent with RI's focus on school (not grade-level) accountability. This signals, for example, that grade four student achievement is not only the responsibility of the teachers and administrators specific to that grade, but is also conceptually and educationally linked to learning experiences in the prior grades. Table 2 shows the basic descriptive statistics for RI schools taken as a whole, with the number of schools, range (how wide the scores were), mean (arithmetic average), and standard deviation (a measure of how spread out the scores are).

The dependent variables in the model are student academic achievement in each grade level tested across different subject areas. Researchers employed a separate regression model for each subject area and grade level, specifically:

» New Standards Reference Examination in Mathematics (Grades 4, 8, and 10)
» New Standards Reference Examination in English Language Arts (Grades 4 and 8)
» RI Writing Performance Assessment (Grade 10)

For both the New Standards Reference Examinations and the Rhode Island Writing Performance Assessment, the percentage of students who achieved proficiency or above was computed and used as the dependent variable of interest. Mainly because of space limitations, the school reports show only the results of New Standards Mathematics Skills and Problem Solving, rather than all three components (Skills, Concepts, and Problem Solving). Problem Solving was selected over Concepts because it represents a more complex set of skills. Regression models were based on a total of 51 schools for grade 10 (including Area Career and Technical Centers), 53 schools for grade 8, and 185 schools for grade 4.
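Computing a school's dependent variable from student results can be sketched as below. The numeric performance levels and the cutoff are hypothetical placeholders; the brief does not describe how the tests' performance levels are coded.

```python
def pct_proficient(levels, cutoff):
    """Percentage of tested students at or above the proficiency cutoff.

    `levels` holds one ordinal performance level per tested student;
    both the coding and the cutoff are assumptions for illustration.
    """
    if not levels:
        raise ValueError("no tested students")
    at_or_above = sum(1 for level in levels if level >= cutoff)
    return 100.0 * at_or_above / len(levels)


# Hypothetical school with six tested students, proficiency at level 4.
school_pct = pct_proficient([1, 2, 4, 5, 3, 4], cutoff=4)  # 3 of 6 students
```

One such percentage per school, subject area, and grade level would then serve as the value being predicted in the corresponding regression model.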

Table 3 shows the results from the hierarchical regression analyses. Across all grade levels, the school SES variable produced significant effects in relation to student achievement. Schools with more economically disadvantaged students have lower student achievement across all subject areas and grades. Once the variation in school SES is accounted for in the model, indicators of school context related to special needs (special education services and LEP/bilingual education programs) rarely have significant effects on student achievement. Where such effects do appear, the negative effects of higher percentages of special needs students on student achievement appear to become stronger as students get older. The last column of Table 3 represents the proportion of variance (fluctuation) in student achievement scores that was "statistically" explained (rather than "causally" explained) by the independent variables in the model. The model, for example, explained almost 75% of the variance in student achievement on Mathematics Skills in Grade 10. The variance explained by the model appears to be smaller for younger students, as the relationship between school SES and elementary student achievement is less prominent.
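The "proportion of variance explained" is conventionally written as R-squared. The notation below is standard textbook notation and is an assumption here, not taken from Table 3: y_i is the observed percentage proficient for school i, y-hat_i is the value fitted by the model, and y-bar is the mean across schools.

```latex
\[
  R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
\]
```

On this scale, the "almost 75%" reported for Grade 10 Mathematics Skills corresponds to a value just under 0.75: the squared differences left over after fitting the model are about one quarter of the total squared differences around the mean.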
