Technical Brief
Multiple Regression and the RI Model
Up to this point, we have discussed the computer modeling of student
achievement in terms of the relationship between only two variables. Research shows, however, that
student achievement is the product of a wide variety of factors, ranging from
things that are clearly definable and collectable, like eligibility for free and
reduced lunch and receipt of certain kinds of special services, to intangible but
important factors such as an individual student's motivation to perform well on a state
test or the general climate of a school. (For example, studies show that most students
perform better on state tests when there are local consequences attached to their
performance.)
Statisticians over the years have built increasingly sophisticated models to relate
various factors simultaneously. One of the most powerful of these methods is Hierarchical
Linear Modeling (HLM). When applied to schools, HLM would consider several characteristics
of a school as well as several characteristics for individual students. Researchers at URI
attempted to use HLM but ultimately rejected it as an approach because our sets of schools
that look similar to each other are too small to yield reliable results. HLM can predict
scores for individual students with certain characteristics, but Information Works! has no
intention of focusing on individual students, teachers, or administrators. The school as
a whole is the important unit of analysis and improvement for state accountability
purposes.
Instead, the URI research team used hierarchical regression analysis, which is a
specialized form of multivariate analysis.6 Multivariate analysis, as its name implies, looks at how multiple variables, acting
separately or combined in various ways, affect the variable of interest (in this case,
student achievement on selected state tests). Broadly speaking, multiple regression
analysis is a method of analyzing the variability of a dependent variable by using
information available on a set of independent variables. Unfortunately, no drawing can
illustrate the relationship the way the simple regression model was illustrated earlier in
this paper.
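Although the relationship cannot be drawn, it can be written down. The following is the standard textbook form of a multiple regression equation, offered as a sketch rather than as the researchers' exact specification. The symbols are our own notation: y_i is the achievement measure for school i, x_i1 through x_ik are that school's values on the k independent variables, the betas are the coefficients estimated from the data, and epsilon_i is the part of the score the variables do not account for.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i
```

With k = 1 this reduces to the simple two-variable regression line discussed earlier in this paper.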
Five independent variables, known in advance to be likely to relate statistically to student
achievement on state tests, are collected annually from Rhode Island schools and students.
These variables are:
- The percentage of students within a school eligible for free
or reduced lunch
- The percentage of minority students (i.e., non-white) within
a school
- The highest education level achieved by the most educated
parent (reported by the student test taker)
- The percentage of students in a school enrolled in LEP or
bilingual education programs
- The percentage of students within a school receiving services
under special education law
Because of the small number of schools in Rhode Island,
comparisons were made among all schools in the state rather than just among groups of
schools with similar demographic characteristics. Differences between the tests given in 1997-98 and
1998-99 did not allow us to combine two years' worth of test data for the 2000 model.
Instead, we used only data from the 1998-99 school year. The correlation of each individual
variable with student achievement is shown in Table 1. Overall,
there is a strong relationship between these five student characteristics and academic
performance across all grade levels of the state assessments.
The first three variables listed above (free and reduced lunch status, minority status,
and parent education level) were found to exhibit "multicollinearity." In other
words, when one variable shifted, the others also shifted in a similar manner. The
correlation between the percentage of students eligible for free and reduced lunch and
the percentage of non-white students in a school, for example, was greater than .9 across all
three tested grade levels. (Recall the discussion above about correlations approaching
1.00.) One of the remedies for multicollinearity is to group variables in blocks. (We
elected not to use factor analysis since it does not lead to a numerical result that can
be easily understood by non-statisticians.) Therefore, an equally weighted average of
eligibility for free and reduced lunch, mother's (parent's) level of
education, and minority status was used as a single block. For mother's (parent's)
education, the percentage of mothers (parents) whose education was reported as beyond high
school was recoded to run in the same direction as the other two variables, which is to
say that the lower the education level of groups of mothers (parents), the lower the
educational achievement of corresponding groups of students. The resulting equally weighted
variable can be thought of as a poverty index (low SES) that is more stable
across grade levels than any of the three variables viewed individually. The larger the index
number, the more poor students there are within the school.
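The construction of the index can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' actual procedure: the three schools are invented, the function names are ours, and the recoding of parent education is assumed to be a simple "100 minus the percentage beyond high school," which the brief does not specify.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ses_index(pct_lunch, pct_minority, pct_parent_beyond_hs):
    """Equally weighted average of three school-level percentages (0-100).

    ASSUMPTION: parent education is recoded as 100 - pct, so that a higher
    value means lower parent education, matching the direction of the other
    two variables. The brief does not state the exact recoding used.
    """
    pct_parent_hs_or_less = 100.0 - pct_parent_beyond_hs
    return (pct_lunch + pct_minority + pct_parent_hs_or_less) / 3.0


# Invented data for three hypothetical schools (not Rhode Island figures).
schools = [
    {"lunch": 10.0, "minority": 5.0, "parent_beyond_hs": 80.0},
    {"lunch": 45.0, "minority": 40.0, "parent_beyond_hs": 50.0},
    {"lunch": 85.0, "minority": 90.0, "parent_beyond_hs": 20.0},
]

lunch = [s["lunch"] for s in schools]
minority = [s["minority"] for s in schools]

# A correlation near 1 between two predictors is the symptom of
# multicollinearity that motivates combining them into one block.
r_lunch_minority = pearson(lunch, minority)

index = [
    ses_index(s["lunch"], s["minority"], s["parent_beyond_hs"]) for s in schools
]

print(f"correlation(lunch, minority) = {r_lunch_minority:.3f}")
print("SES index per school:", [round(v, 1) for v in index])
```

Because the three inputs move together, averaging them loses little information while giving the model a single, more stable predictor.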
A second block was created using the percentage of students receiving bilingual or LEP
services and the percentage of students within a school receiving special education
services. These variables were each introduced separately into the model, but only after the
SES variable described above had been entered. The researchers are aware that some
students within a school receive both types of services and would be counted in this model
twice. Because students with multiple learning needs require more support, this double counting
factors in the cumulative effects of multiple challenges.
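The order of entry described above (SES first, then each special-needs variable in turn) is what makes the regression "hierarchical." The sketch below shows the idea using ordinary least squares written in plain Python. Everything in it is an assumption made for illustration: the eight schools and their values are invented, the helper names are ours, and the brief does not say whether LEP or special education was entered first.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R^2 of a least-squares fit of y on the predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented school-level data (percentages), eight hypothetical schools.
ses = [12.0, 20.0, 35.0, 41.0, 55.0, 63.0, 78.0, 90.0]   # poverty index
lep = [1.0, 0.0, 4.0, 2.0, 9.0, 5.0, 14.0, 11.0]         # % LEP/bilingual
sped = [10.0, 14.0, 12.0, 18.0, 15.0, 20.0, 17.0, 22.0]  # % special education
pct_proficient = [72.0, 65.0, 58.0, 50.0, 44.0, 35.0, 30.0, 21.0]

# Step 1: SES block alone. Steps 2 and 3: add one variable at a time.
# ASSUMPTION: LEP before special education; the source gives no order.
r2_step1 = r_squared([ses], pct_proficient)
r2_step2 = r_squared([ses, lep], pct_proficient)
r2_step3 = r_squared([ses, lep, sped], pct_proficient)

print(f"Step 1 (SES):            R^2 = {r2_step1:.3f}")
print(f"Step 2 (+ LEP):          R^2 = {r2_step2:.3f}, change = {r2_step2 - r2_step1:.3f}")
print(f"Step 3 (+ special ed.):  R^2 = {r2_step3:.3f}, change = {r2_step3 - r2_step2:.3f}")
```

The quantity of interest at each step is the change in R^2: how much additional variance the newly entered variable explains beyond what SES already accounts for.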
Due to the limitations inherent in the relatively small data sets (the small number of
schools in RI and the small number of students per tested grade level in some RI schools),
the researchers chose to use overall building-level data rather than data
associated solely with the grade tested. So, for example, we used eligibility for free and
reduced lunch data for the entire school rather than just the grade tested. Although
the two sets of data were empirically highly correlated, the researchers were more interested in
overall school context than in the context specific to particular grades,
consistent with RI's focus on school (not grade-level) accountability. This signals, for
example, that grade four student achievement is not only the responsibility of the
teachers and administrators specific to that grade, but is also conceptually and
educationally linked to learning experiences in the prior grades. Table
2 shows the basic descriptive statistics for RI schools taken as a whole with the
number of schools, range (how wide the scores were), mean (arithmetic average), and
standard deviation (a measure of how spread out the scores are).
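For readers who want to see how these four summary figures are obtained, here is a minimal Python sketch. The input values are invented, the function name is ours, and the use of the sample (n - 1) standard deviation is an assumption, since the brief does not say which form was reported in Table 2.

```python
from statistics import mean, stdev


def describe(values):
    """Return the number of cases, range, mean, and standard deviation."""
    return {
        "n": len(values),
        "range": max(values) - min(values),  # how wide the scores are
        "mean": mean(values),                # arithmetic average
        "sd": stdev(values),                 # ASSUMPTION: sample SD (n - 1)
    }


# Invented percentages for four hypothetical schools.
summary = describe([10.0, 20.0, 30.0, 40.0])
print(summary)
```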
The dependent variables in the model are student academic achievement in each grade level
tested across different subject areas. Researchers employed a separate regression model
for each subject area and grade level, specifically:
- New Standards Reference Examination in Mathematics: Skills
(Grades 4, 8, and 10)
- New Standards Reference Examination in Mathematics: Problem
Solving (Grades 4, 8, and 10)
- New Standards Reference Examination in English Language Arts:
Reading Analysis (Grades 4 and 8)
- New Standards Reference Examination in English Language Arts:
Writing Effectiveness (Grades 4 and 8)
- RI Writing Performance Assessment (Grade 10)
For both the New Standards Reference Examinations and the
Rhode Island Writing Performance Assessment, the percentage of students who achieved
proficiency or above was computed and used as the dependent variable of interest. Mainly
because of space limitations, the school reports show only the results of New Standards
Mathematics Skills and Problem Solving, and omit results from
Mathematics Concepts. Problem Solving was selected over Concepts because it represents a
more complex set of skills. Regression models were based on a total of 52 schools for
grade 10 (including Area Career and Technical Centers), 52 schools for grade 8, and
185-186 schools for grade 4.
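The dependent variable itself is a simple school-level percentage. A minimal sketch of the computation follows; the performance-level labels and the five students are hypothetical placeholders, not the actual reporting categories of either examination.

```python
def percent_at_or_above(levels, proficient_levels):
    """Percentage of tested students whose level counts as proficient or above."""
    if not levels:
        raise ValueError("no tested students")
    hits = sum(1 for level in levels if level in proficient_levels)
    return 100.0 * hits / len(levels)


# Hypothetical labels for one school's tested students.
student_levels = ["below", "proficient", "advanced", "below", "proficient"]
pct = percent_at_or_above(student_levels, {"proficient", "advanced"})
print(pct)  # 60.0
```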
Table 3 shows the results from the hierarchical regression
analyses. Across all grade levels, the school SES variable produced significant effects in
relation to student achievement. Schools with more economically disadvantaged students
have lower student achievement across all subject areas and grades. Once the variation in
school SES is accounted for in the model, indicators of school context related to special
needs (special education services and LEP/bilingual education programs) rarely have
significant effects on student achievement. The last column of Table
3 represents the proportion of variance (fluctuation) in student achievement scores
that was "statistically" explained (rather than "causally" explained)
by the independent variables in the model. The model, for example, explained almost 58% of
the variance in student achievement on Mathematics Skills in Grade 10. The
variance explained by the model appears to be smaller for younger students, as the relationship
between school SES and elementary student achievement is less prominent.
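The "proportion of variance explained" is conventionally written as R-squared. The definition below is the standard one, given in our own notation as an aid to reading the table: y_i is a school's observed percentage, y-hat_i is the percentage the model predicts for that school, and y-bar is the average across all schools.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

On this scale, the Grade 10 Mathematics Skills result cited above corresponds to an R-squared of almost 0.58, meaning that roughly 42% of the school-to-school variation in that score is not accounted for by the variables in the model.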