Statisticians over the years have built increasingly sophisticated models to relate the various factors presented above. One of the most powerful of these methods is hierarchical linear modeling (HLM). When applied to schools, HLM considers the characteristics of a school as well as the characteristics of its individual students. Researchers at NCPE attempted to use HLM but ultimately rejected it because the sets of schools that resemble one another are too small to yield reliable results. The NCPE research team therefore used hierarchical regression analysis, which is a specialized form of multivariate regression.
5.1 The Outcome (Dependent) Variable
The outcome variable, or the variable to predict, is student academic achievement in each subject area across grade levels. Researchers fit a separate hierarchical regression model for each subtest in each subject area (Mathematics and English Language Arts) at each tested grade level (grades 4, 8, and 10). For each school, the percentage of students who achieved or exceeded proficiency was computed and used as the outcome (dependent) variable.
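As a minimal sketch of how the outcome variable could be computed for one subtest at one grade level, the Python below tallies, for each school, the share of tested students at or above proficiency. The record layout and the performance-level labels (`"meets"`, `"exceeds"`) are hypothetical; the report does not specify its scoring categories or data format.

```python
from collections import defaultdict

# Hypothetical performance-level labels that count as "at or above proficiency".
PROFICIENT_OR_ABOVE = {"meets", "exceeds"}


def percent_proficient(records):
    """Return {school: % of tested students at or above proficiency}.

    `records` is an iterable of (school_id, performance_level) pairs
    for a single subtest at a single grade level.
    """
    tested = defaultdict(int)
    proficient = defaultdict(int)
    for school, level in records:
        tested[school] += 1
        if level in PROFICIENT_OR_ABOVE:
            proficient[school] += 1
    return {s: 100.0 * proficient[s] / tested[s] for s in tested}


# Hypothetical student records for two schools
records = [
    ("School A", "meets"),
    ("School A", "below"),
    ("School A", "exceeds"),
    ("School A", "meets"),
    ("School B", "meets"),
    ("School B", "below"),
]
print(percent_proficient(records))  # {'School A': 75.0, 'School B': 50.0}
```

One such percentage per school, per subtest and grade, would then serve as the dependent variable in the corresponding regression.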
5.2 The Explanatory (Independent) Variables
Five explanatory variables known to relate statistically to student achievement on state tests are collected annually from Rhode Island schools and districts. These variables are:
The percentage of students within a school eligible for free or reduced lunch
The percentage of minority students (i.e., non-white) within a school
The percentage of students whose most educated parent has at least some college education
The percentage of students in a school enrolled in ESL or bilingual education programs
The percentage of students within a school receiving services under special education law
Because Rhode Island has a small number of schools, comparisons were made among all schools in the state at a given grade level rather than within districts. Research has shown that some or all of these variables may be highly correlated (see Table 2: Correlation Analysis). The researchers at NCPE have therefore created a Poverty Index variable that will be used as an explanatory variable. The creation of the Poverty Index is discussed in Section 5.3.2.
5.3 A Glimpse Into the Rhode Island Data
A descriptive analysis of the Rhode Island data is presented in Table 1.
Table 2 presents the correlations among the explanatory variables.
Table 3 presents the correlations between each response variable and the explanatory variables.
Table 4 presents the results of the hierarchical regression analysis.
5.3.1 Descriptive Analysis
Table 1 shows the descriptive statistics for the seven response variables (assessment subtests) and the five explanatory variables that will be investigated in the development of the RI school model. Specifically, it shows the minimum and maximum values of each variable (range), the mean (arithmetic average), and the standard deviation (a measure of dispersion in the variable).
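The four statistics reported in Table 1 can be computed for any one variable with Python's standard `statistics` module, as sketched below. The input values are invented for illustration, and the report does not state whether Table 1 uses the sample (n - 1) or population (n) standard deviation; the sketch assumes the sample version.

```python
import statistics


def describe(values):
    """Minimum, maximum, mean, and standard deviation of one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator)
        "sd": statistics.stdev(values),
    }


# Hypothetical school-level percentages for one explanatory variable
pct_lunch = [12.0, 35.0, 48.0, 80.0, 95.0]
print(describe(pct_lunch))
```

Running `describe` on each column and comparing the `"sd"` entries is the comparison the next paragraph makes when it ranks variables by variability.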
Analyzing the information in Table 1, we find that among the five explanatory variables, the highest variability (standard deviation) is in the percentages of minority students and of students eligible for free or reduced lunch enrolled in a particular grade. In grades 4 and 8, the lowest variability is in the percentage of students in special education programs; in grade 10, however, it is in the percentage of students enrolled in LEP programs. Among the seven response variables, the subtest with the highest variability changes by grade; the only consistent pattern is that the highest variability always occurs in one of the ELA subtests. Table 1 also shows that, in general, the percentage of students who meet or exceed the standards decreases as grade level increases. That is, fewer students meet or exceed the standards in Grade 10 than in Grade 4, as evidenced by the decreasing means (Mean column).
5.3.2 Creating the Poverty Index Variable
The three explanatory variables that were highly correlated were % Eligible for Subsidized Lunch, % Minority Students, and % With at Least One Parent Who Had Some College Education (see Table 2). These variables exhibit multicollinearity: when one of them changes from school to school, the others tend to change with it. For example, the correlation between % Eligible for Subsidized Lunch and % Minority Students was greater than 0.84 at all three tested grade levels. One remedy for multicollinearity is to group correlated variables into blocks. Therefore, the researchers computed an equally weighted average of the three variables: % Eligible for Subsidized Lunch, % Minority Students, and % Parents with at Least Some College Education. These variables point in different directions: a high % of students eligible for free or reduced lunch and a high % of minority students indicate greater poverty, whereas a high % of students whose most educated parent has some college education indicates less poverty. The variables were therefore aligned in direction before being combined, so that a low value of the Poverty Index indicates greater poverty in a school and a high value indicates lower poverty.
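A minimal sketch of the Poverty Index as an equally weighted average follows. It assumes all three inputs are school-level percentages on a 0 to 100 scale that are averaged directly (not standardized first), and it reverses the two variables that rise with poverty as `100 - value` so that, as the text states, a low index value indicates greater poverty. The exact reversal and scaling NCPE used are not given here, so both are assumptions.

```python
def poverty_index(pct_lunch, pct_minority, pct_some_college):
    """Equally weighted average of three school-level percentages (0-100).

    Assumption: % eligible for subsidized lunch and % minority students are
    reversed as (100 - value) so that all three terms point the same way;
    a LOW result then indicates GREATER poverty.
    """
    return ((100.0 - pct_lunch) + (100.0 - pct_minority) + pct_some_college) / 3.0


# Hypothetical schools
high_poverty = poverty_index(85.0, 70.0, 30.0)
low_poverty = poverty_index(10.0, 5.0, 90.0)
print(high_poverty)  # 25.0
print(low_poverty)
```

Averaging the aligned variables into a single column means only one of the three correlated measures enters the regression, which is the point of the blocking remedy described above.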
5.3.3 Correlation Analysis
Table 3 shows very high correlations between achievement on the selected state tests and most of the explanatory variables. The only exception is the variable that measures the percentage of students in special education programs; for this variable, most of the correlations are not significant at the 95% confidence level.
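Significance of a correlation like those in Table 3 is commonly tested with the statistic t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. The sketch below computes Pearson's r and applies that two-tailed test. The report does not say which test NCPE used, so treat this as one standard option; the data are hypothetical, and the critical value must be looked up in a t table (3.182 is the two-tailed 0.05 value for 3 degrees of freedom).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, t_critical):
    """Two-tailed test of H0: rho = 0 (undefined when r is exactly +/-1).

    `t_critical` is the tabled value for n - 2 degrees of freedom at the
    chosen significance level (0.05 for the 95% confidence level).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return abs(t) > t_critical


# Hypothetical data for five schools
index_values = [20, 40, 60, 80, 100]
pct_proficient = [30, 45, 50, 70, 85]
r = pearson_r(index_values, pct_proficient)
print(round(r, 3), is_significant(r, len(index_values), t_critical=3.182))
```

With the few dozen to few hundred schools per grade in a statewide analysis, the degrees of freedom are much larger and the critical value correspondingly closer to 1.96.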
5.3.4 Hierarchical Regression Analysis
After careful study of the correlation tables, a hierarchical regression analysis was performed; the results are presented in Table 4. The Poverty Index was entered into the model first, followed by the percentage of students in LEP programs and then the percentage of students in special education programs. (For a brief overview of hierarchical regression, see Appendix A7.3.) In the Rhode Island Statistical Model, the Poverty Index was entered first because, of the three variables, it accounts for the most variation in student achievement. Analyses with several hierarchical models suggested that the next most important variable was % Students in LEP. The remaining (residual) variance was picked up by % Students in Special Education programs. The researchers are aware that some students within a school receive both types of services and are therefore “counted” twice in this model. This reflects the view that students with multiple learning needs require more support, and that the effects of multiple challenges are cumulative.
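The entry order described above can be sketched as a sequence of nested ordinary least-squares fits, recording R-squared after each block is entered and the change in R-squared that the block contributes. This is a minimal, standard-library-only illustration with hypothetical data for six schools, not NCPE's software; it solves the normal equations directly and will fail with a division by zero if the predictors are perfectly collinear.

```python
def ols_r_squared(predictors, y):
    """R-squared from an OLS fit of y on the given predictor columns.

    An intercept is added; the normal equations (X'X) b = X'y are solved
    by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] for i in range(k)]  # fitted coefficients
    y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_regression(blocks, y):
    """Enter (name, column) blocks in order; return (name, R2, R2 change)."""
    steps, entered, prev = [], [], 0.0
    for name, column in blocks:
        entered.append(column)
        r2 = ols_r_squared(entered, y)
        steps.append((name, r2, r2 - prev))
        prev = r2
    return steps


# Hypothetical school-level data, entered in the order used in the text
blocks = [
    ("Poverty Index", [20, 35, 50, 65, 80, 95]),
    ("% LEP", [15, 10, 12, 5, 8, 2]),
    ("% Special Education", [14, 12, 16, 10, 13, 11]),
]
achievement = [28, 40, 47, 60, 70, 83]
for name, r2, change in hierarchical_regression(blocks, achievement):
    print(f"{name:20s} R2 = {r2:.3f}  change = {change:.3f}")
```

Because each step adds predictors to the previous model, R-squared can only stay the same or rise; the size of each change shows how much residual variance the newly entered block picks up.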
Table 4 presents the results of the hierarchical regression. Across all grade levels, the Poverty Index explained most of the variation in student achievement: schools with more economically disadvantaged students have lower student achievement across all subject areas and grades. The residual variation (variation not explained by the Poverty Index) is then explained in part by the percentages of students in LEP programs and of students receiving special education services, although, as Table 4 indicates, these two variables rarely have significant effects on student achievement. The last column of Table 4 gives the proportion of variance (R-squared) in student achievement scores that is statistically explained by the set of explanatory variables. For example, the model explains 70.2% of the variance in Reading: Basic Understanding for Grade 4, 69.7% for Grade 8, and 63.8% for Grade 10.
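For reference, the R-squared values in the last column of Table 4 follow the usual definition below, where y_i is the outcome (percent proficient) for school i, ŷ_i is the model's prediction for that school, ȳ is the mean outcome across schools, and R²_k is the value after the k-th block of variables has been entered (with R²_0 = 0). Under this definition, the Grade 4 Reading: Basic Understanding figure of 70.2% corresponds to R² = 0.702.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\Delta R^2_k = R^2_k - R^2_{k-1}
```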