
Technical Brief on the Statistical Model
5. The Rhode Island Model⁴


Statisticians over the years have built increasingly sophisticated models to relate the various factors presented above. One of the most powerful of these methods is hierarchical linear modeling (HLM). When applied to schools, HLM considers the characteristics of each school as well as the characteristics of individual students. Researchers at NCPE attempted to use HLM but ultimately rejected it, because the groups of schools that resemble one another are too small to yield reliable results. The NCPE research team therefore used hierarchical regression analysis. This is a specialized form of multivariate analysis in which the explanatory variables are entered into a multiple regression in a predetermined order.⁵

5.1 The Outcome (Dependent) Variable

The outcome variable, or the variable to predict, is academic achievement in each subject area at each tested grade level. Researchers fit a separate hierarchical regression model for each subtest in each subject area (Mathematics and English Language Arts) in each tested grade (grades 4, 8, and 10). For each school, the percentage of students who met or exceeded the proficiency standard was computed and used as the outcome (dependent) variable.
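
A minimal sketch of how a school-level outcome of this kind can be computed from student results for one subtest in one grade. The achievement-level labels ("proficient", "advanced", "below") and the function name percent_proficient are hypothetical. The brief does not list Rhode Island's actual level names or describe NCPE's code.

```python
from collections import defaultdict

# Hypothetical achievement-level labels; the brief does not name the actual levels.
PROFICIENT_OR_ABOVE = {"proficient", "advanced"}


def percent_proficient(records):
    """Return {school: percent of tested students at or above proficiency}.

    `records` is an iterable of (school_id, achievement_level) pairs
    for a single subtest in a single grade.
    """
    tested = defaultdict(int)
    proficient = defaultdict(int)
    for school, level in records:
        tested[school] += 1
        if level in PROFICIENT_OR_ABOVE:
            proficient[school] += 1
    return {s: 100.0 * proficient[s] / tested[s] for s in tested}


# Hypothetical records for two schools.
records = [
    ("School A", "proficient"), ("School A", "below"),
    ("School A", "advanced"), ("School A", "below"),
    ("School B", "proficient"), ("School B", "advanced"),
]
print(percent_proficient(records))  # {'School A': 50.0, 'School B': 100.0}
```

Each school then contributes one outcome value per subtest and grade, and these values are what the regression models try to predict.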

5.2 The Explanatory (Independent) Variables

Five explanatory variables known to relate statistically to student achievement on state tests are collected annually from Rhode Island schools and districts. These variables are:

  1. The percentage of students within a school eligible for free or reduced lunch

  2. The percentage of minority students (i.e., non-white) within a school 

  3. The percentage of students whose most educated parent has at least some college education 

  4. The percentage of students in a school enrolled in ESL or bilingual education programs

  5. The percentage of students within a school receiving services under special education law

Because Rhode Island has a small number of schools, comparisons were made among all schools in the state at a given grade level rather than within districts. Research has shown that some or all of these variables may be highly correlated (see Table 2: Correlation Analysis). The NCPE researchers therefore created a Poverty Index variable that combines the most highly correlated of them and is used as an explanatory variable. The creation of the Poverty Index is discussed in Section 5.3.2.

5.3 A Glimpse Into the Rhode Island Data

A descriptive analysis of the Rhode Island data is presented in Table 1. Table 2 presents the correlations among the explanatory variables. Table 3 presents the correlations between each response variable and the explanatory variables. Table 4 presents the results of the hierarchical regression analysis.

5.3.1 Descriptive Analysis

Table 1 shows the descriptive statistics for the seven response variables (assessment subtests) and the five explanatory variables investigated in developing the Rhode Island school model. Specifically, it reports the minimum and maximum values of each variable (the range), the mean (arithmetic average), and the standard deviation (a measure of dispersion).
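
The statistics in a Table 1-style row can be reproduced with Python's standard statistics module, as in the minimal sketch below. It assumes the sample standard deviation (n - 1 denominator); the brief does not say which form Table 1 reports. The five values are hypothetical.

```python
from statistics import mean, stdev


def describe(values):
    """Min, max, mean, and standard deviation for one variable.

    Assumption: sample standard deviation (n - 1 denominator), since the
    brief does not state which form Table 1 uses.
    """
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),
    }


# Hypothetical % proficient for five schools on one subtest.
summary = describe([40, 55, 60, 75, 70])
print(summary["min"], summary["max"], summary["mean"])
print(round(summary["sd"], 2))  # 13.69
```

Switching to statistics.pstdev would give the population standard deviation instead, which is slightly smaller for the same data.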

Table 1 shows that, among the five explanatory variables, the highest variability (standard deviation) is in the percentages of minority students and of students eligible for free or reduced lunch enrolled in a given grade. The lowest variability is in the percentage of students in special education programs. Among the seven response variables, the highest variability at Grades 4 and 8 is in mathematics, whereas at Grade 10 it is in Writing: Conventions. In fact, the variability in Writing: Conventions increases with grade level. Table 1 also shows that, in general, the percentage of students who meet or exceed the standards decreases as grade level increases. That is, fewer students meet or exceed the standards in Grade 10 than in Grade 4, as the decreasing values in the Mean column show.

5.3.2 Creating the Poverty Index Variable

The three highly correlated explanatory variables were % Eligible for Subsidized Lunch, % Minority Students, and % With at Least One Parent Who Had Some College Education (see Table 2). These variables exhibit multicollinearity: when one of them changes, the others tend to change with it. For example, the correlation between % Eligible for Subsidized Lunch and % Minority Students was greater than 0.85 at all three tested grade levels. One remedy for multicollinearity is to group correlated variables into a block. The researchers therefore computed an equally weighted average of these three variables. However, the variables point in different directions. A high percentage of students eligible for free or reduced lunch and a high percentage of minority students indicate greater poverty, whereas a high percentage of students whose most educated parent has some college education indicates less poverty. Therefore, one variable was reversed when computing the combined Poverty Index. A low value on the Poverty Index indicates greater poverty in a school, and a high value indicates lower poverty.
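
A minimal sketch of an equally weighted index of this kind, under stated assumptions. Raw percentages (0 to 100) are averaged without standardization. The college variable is treated as the one reversed, as 100 - x; the brief does not say which variable NCPE reversed or how. With that choice a higher value means more poverty. Taking 100 minus the result is algebraically the same as reversing the other two variables instead, and yields the orientation described above, in which low values indicate greater poverty. The function name poverty_index is hypothetical.

```python
def poverty_index(pct_lunch, pct_minority, pct_some_college):
    """Equally weighted average of three school-level percentages (0-100).

    Assumption: the college variable is the one reversed (as 100 - x), so all
    three terms point toward greater poverty. Under this choice a HIGHER index
    means MORE poverty; 100 - index gives the opposite orientation.
    """
    reversed_college = 100.0 - pct_some_college
    return (pct_lunch + pct_minority + reversed_college) / 3.0


# Hypothetical school: 60% subsidized lunch, 45% minority, 40% some college.
idx = poverty_index(60, 45, 40)
print(idx)          # 55.0
print(100.0 - idx)  # 45.0 (orientation in which low values mean more poverty)
```

If the three variables had very different spreads, converting each to a z-score before averaging would keep one variable from dominating. Whether NCPE did so is not stated.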

5.3.3 Correlation Analysis

Table 3 shows very high correlations between achievement on the selected state tests and most of the explanatory variables. The only exception is the percentage of students in special education programs. For this variable, most of the correlations are not statistically significant at the 95% confidence level.
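
Whether a Pearson correlation is significant at the 95% confidence level can be checked with the standard t test for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The minimal sketch below assumes a two-tailed test and a hypothetical 30 schools. The critical value 2.048 (28 degrees of freedom) is taken from a standard t table. The brief does not state which test NCPE used.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing H0: rho = 0 from a Pearson r on n schools.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom;
    undefined when |r| == 1.
    """
    return r * sqrt((n - 2) / (1.0 - r * r))


def is_significant(r, n, t_crit):
    """Two-tailed test: significant at the chosen level if |t| exceeds t_crit."""
    return abs(t_from_r(r, n)) > t_crit


# Two-tailed 5% critical value for 28 degrees of freedom (n = 30 hypothetical schools).
T_CRIT_DF28 = 2.048
print(is_significant(0.50, 30, T_CRIT_DF28))  # True  (t is about 3.06)
print(is_significant(0.30, 30, T_CRIT_DF28))  # False (t is about 1.66)
```

With few schools per grade, a moderate correlation can fail this test. That is consistent with a weakly related variable, such as the special education percentage, producing mostly non-significant correlations.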

5.3.4 Hierarchical Regression Analysis

After careful study of the correlation tables, a hierarchical regression analysis was performed; the results are presented in Table 4. The Poverty Index was entered into the model first, followed by the percentage of students in LEP (Limited English Proficiency) programs and then the percentage of students in special education programs. (For a quick overview of hierarchical regression, see Appendix A7.3.) In the Rhode Island Statistical Model, the Poverty Index was entered first because, of the three variables, it accounts for the most variation in student achievement. Trials with several hierarchical models suggested that the next most important variable was % Students in LEP. Some of the remaining variance was then picked up by % Students in Special Education programs. The researchers are aware that some students within a school receive both types of services and are therefore counted twice in this model. Students with multiple learning needs require more support, so counting them under both variables captures the cumulative effect of multiple challenges.
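
The entry order described above can be sketched as a sequence of nested ordinary least squares fits. Each step adds one variable and records the cumulative R-squared and its change. This is a minimal pure-Python illustration with invented data for eight hypothetical schools; it is not NCPE's software. The helper names (_solve, r_squared, hierarchical) are hypothetical. A statistics package would normally be used instead.

```python
def _solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    X = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = _solve(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical(y, ordered_blocks):
    """Enter predictors one at a time; return (name, cumulative R^2, change in R^2)."""
    steps, entered, previous = [], [], 0.0
    for name, column in ordered_blocks:
        entered.append(column)
        r2 = r_squared(y, entered)
        steps.append((name, r2, r2 - previous))
        previous = r2
    return steps


# Hypothetical school-level data for one subtest in one grade (not Rhode Island data).
poverty = [70, 55, 40, 30, 65, 20, 50, 35]
pct_lep = [12, 8, 5, 3, 15, 2, 6, 4]
pct_sped = [14, 12, 11, 16, 13, 10, 15, 12]
pct_proficient = [35, 48, 62, 70, 38, 82, 50, 66]

# Entry order follows the brief: Poverty Index, then LEP, then special education.
blocks = [("Poverty Index", poverty), ("% LEP", pct_lep), ("% Special Ed", pct_sped)]
for name, r2, delta in hierarchical(pct_proficient, blocks):
    print(f"{name:<15} R2 = {r2:.3f}   change in R2 = {delta:.3f}")
```

Because each model contains the previous one, R-squared can only stay the same or rise as variables are added. The change at each step is the share of variance attributed to the newly entered variable, given the variables already in the model.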

Table 4 presents the results of the hierarchical regression. At all grade levels, the Poverty Index explained most of the variation in student achievement: schools with more economically disadvantaged students have lower achievement in every subject area and grade. Part of the residual variation (the variation not explained by the Poverty Index) is then explained by the percentage of students in LEP programs and the percentage receiving special education services. As Table 4 indicates, these two variables rarely have significant effects on student achievement. However, the special education variable appears to have significant negative effects more often in the higher grades, as students get older. The last column of Table 4 gives the proportion of variance (R-squared) in student achievement that is statistically explained by the set of explanatory variables. For example, the model explains 60% of the variance in Reading: Basic Understanding for Grade 4, 65% for Grade 8, and 51.6% for Grade 10.
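
For reference, the R-squared in the last column of Table 4 can be written as follows. This assumes the reported value is the ordinary (unadjusted) R-squared; the brief does not say whether an adjusted form was used. Here y_s is the outcome for school s, \hat{y}_s is the model's prediction, \bar{y} is the mean across schools, and k indexes the steps at which variables are entered. The last line works through the Grade 4 Reading: Basic Understanding figure.

```latex
\begin{aligned}
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \quad
SS_{\text{res}} = \sum_{s} (y_s - \hat{y}_s)^2, \quad
SS_{\text{tot}} = \sum_{s} (y_s - \bar{y})^2 \\
\Delta R^2_k &= R^2_k - R^2_{k-1} \\
R^2 = 0.60 &\;\Longrightarrow\; \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 0.40
\end{aligned}
```

In other words, 40% of the school-to-school variation in that outcome is left unexplained by the three explanatory variables.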

 

For further information call the Rhode Island Department of Education  
at 401-222-4600 x2231.
Information Works! is produced in collaboration with the National Center on Public Education & Social Policy,
Robert D. Felner, Ph.D., Director.