
Technical Brief on the 1999 Statistical Model


Simple Regression

Diagram "A"

The simplest visual description of a relationship between two variables is a straight line. Imagine making a scatter plot from a whole set of spending-per-student data and then drawing a straight line that comes as close as possible to all the points in the plot. (See diagram "A".)4  We call this procedure "regression," the resulting line the "regression line," and the formula that describes the line the "regression equation." The word "regression" originated in Francis Galton's work in the late 1800s, when he realized that many relationships showed "regression" (reversion) toward what he termed "mediocrity." We now call this frequently seen statistical phenomenon "regression toward the mean." Human height data, for example, demonstrate that if two parents both have above-average heights, their children tend to be taller than average but, on average, closer to the mean height than their parents are.

Diagram "B"

Imagine plotting achievement scores for grade eight students on a particular achievement test. The vertical axis shows the achievement scores, recorded as numbers. The horizontal axis shows the education level of each child's mother as reported by the child, also expressed as numbers assigned to each level. (Of course, this axis could be any other variable for which you have consistent data that you believe to be reliable.) The question, then, is: where is the best straight line relating these two variables (achievement score and mother's education level) to each other? You could take a ruler and try to fit a line through the scatter plot. However, different people would draw different lines, based on their best visual guess as to which line is closest to most of the points.

To find the one line, out of the infinite possibilities, that is as close as mathematically possible to all of the points, statisticians commonly use a procedure called "least squares"; the line it produces is the "least squares line." (See diagram "B".)5 To determine the least squares line, closeness is measured along the vertical axis (in this case achievement scores): for each point, take the vertical distance between the point and the line. Those distances are then squared and added up for all of the points in the sample. For the least squares line, that sum is smaller than it would be for any other line. Vertical distances are used because the equation is often used to predict the variable on the vertical axis when the one on the horizontal axis (mother's education level) is known.
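The least squares idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five data points are made up (they are not taken from the brief), the function names are hypothetical, and the closed-form formulas for the slope and intercept are the standard textbook ones, which the brief itself does not spell out.

```python
# Hypothetical illustration data (not from the brief): mother's education
# level coded 1-5 and an achievement score for five students.
education_levels = [1, 2, 3, 4, 5]
achievement_scores = [20, 40, 50, 40, 50]


def sum_squared_vertical_distances(xs, ys, intercept, slope):
    """Add up the squared vertical gaps between each point and the line."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x
        total += (y - predicted) ** 2
    return total


def least_squares_line(xs, ys):
    """Return (intercept, slope) from the standard closed-form formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares_line(education_levels, achievement_scores)
best = sum_squared_vertical_distances(
    education_levels, achievement_scores, intercept, slope
)

# Lines drawn "by eye" with a slightly different intercept or slope
# each give a larger sum of squared vertical distances.
by_eye_lines = [(20.0, 6.0), (22.0, 7.0), (25.0, 5.0)]
for a, b in by_eye_lines:
    other = sum_squared_vertical_distances(
        education_levels, achievement_scores, a, b
    )
    print(f"a={a}, b={b}: sum of squares = {other}")
print(f"least squares line a={intercept}, b={slope}: sum of squares = {best}")
```

With this made-up data, each ruler-style guess produces a larger sum than the fitted line does, which is the sense in which the least squares line is "closest" to the points.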

Every non-vertical straight line, including the least squares line, can be expressed by the same formula. The standard mathematical convention is to write the equation for the line relating the two variables as y = a + bx, where y represents the vertical axis (achievement scores in our example), x represents the horizontal axis (education level of the mother in this example), and a and b are replaced by numbers, i.e., two constants unique to this particular regression line. The number represented by a is called the intercept and the number represented by b is called the slope. The intercept is the point where the line crosses the vertical axis, that is, the value of y when the variable on the horizontal axis is zero. The slope describes how much the variable on the vertical axis (here achievement scores) changes when the variable on the horizontal axis (education level of the mother) increases by one unit. A positive slope indicates an increase; a negative slope indicates a decrease in one variable as the other one increases. Thus, for example, as a school's population becomes poorer in the overall data set of all RI schools (e.g., an increase in the number of students eligible for free and reduced lunch), achievement tends to decline (a decrease in scores).
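To make the roles of the intercept and slope concrete, here is a small Python sketch. The values of a and b are hypothetical numbers chosen for illustration, not estimates reported in the brief, and the function name `predict` is likewise an assumption.

```python
# Hypothetical constants for illustration only; they are not estimates
# reported in the brief.
A = 22.0  # intercept: predicted y when x is zero
B = 6.0   # slope: change in predicted y per one-unit increase in x


def predict(x, intercept=A, slope=B):
    """Evaluate the regression equation y = a + b*x."""
    return intercept + slope * x


# At x = 0 the line crosses the vertical axis at the intercept.
at_zero = predict(0)

# Moving one unit along the horizontal axis changes y by exactly the slope.
step_up = predict(4) - predict(3)

# With a negative slope, y falls as x rises.
step_down = predict(4, slope=-6.0) - predict(3, slope=-6.0)

print(at_zero, step_up, step_down)
```

Here `at_zero` equals the intercept, `step_up` equals the positive slope, and `step_down` equals the negative slope, mirroring the three interpretations given in the paragraph above.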
