WARNING! Using only four data points keeps a pedagogical example simple, while demonstrating principles that apply just the same with hundreds of data points. Be aware, however, that fitting a straight line to so few data points is almost always a bad idea.
[Table: observed X and Y for the four data points, with three columns for each of Line 1 and Line 2: Predicted Y, Error (Observed Y − Predicted Y), and Squared Error. The table's values were lost in extraction and are not recoverable; the column sums of squared errors are 1.26 for Line 1 and 1.44 for Line 2.]
Actual data almost never lie exactly on a straight line, but we sometimes want to approximate them with a straight line that fits as well as possible. The most commonly used fitting technique, called least squares, evaluates how well a line fits the data by calculating the sum of the squares of its prediction errors. We will illustrate this with two different straight lines: Y = 1.2 + .3X and Y = .4 + .6X.
The table has three columns for each of the two lines: a column of predicted values of Y, obtained by plugging the observed value of X into the equation; a column of prediction errors, calculated as (Observed Y − Predicted Y); and a column of squared prediction errors, whose sum is the overall index of how poorly the straight line fits the data. We would like the sum of squared errors to be as small as possible, so Line 1's sum of 1.26 is preferable to Line 2's 1.44, and Line 1 fits better, in the least squares sense, than Line 2.
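The comparison above can be sketched in a few lines of code. Since the original table's values did not survive, the four (X, Y) points below are hypothetical stand-ins; the function reproduces the table's logic (predict, subtract, square, sum), not the original numbers.

```python
# Hypothetical (X, Y) observations -- placeholders for the four data points.
data = [(1.0, 2.0), (2.0, 1.5), (3.0, 2.5), (4.0, 2.0)]

def sse(intercept, slope, points):
    """Sum of squared prediction errors for the line Y = intercept + slope*X."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

line1 = sse(1.2, 0.3, data)  # Line 1: Y = 1.2 + .3X
line2 = sse(0.4, 0.6, data)  # Line 2: Y = .4 + .6X
print(line1, line2)          # the smaller sum marks the better-fitting line
```

Whichever line yields the smaller sum fits better in the least squares sense.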
Still, there might be other lines that fit better than either of the two we tried. The computational algorithm used in regression programs (such as Stata) determines the coefficients of the best-fitting straight line directly, so we can concentrate on other aspects of the research problem rather than on computational chores.
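The algorithm those programs use has a closed form for one predictor: the slope is the sum of cross-products of deviations from the means divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. A minimal sketch (the data points here are hypothetical, not the book's):

```python
def least_squares(points):
    """Closed-form least-squares coefficients (a, b) for the line Y = a + b*X."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    # Slope: covariance of X and Y over variance of X (n's cancel).
    b = (sum((x - xbar) * (y - ybar) for x, y in points)
         / sum((x - xbar) ** 2 for x, _ in points))
    # Intercept: the fitted line passes through (xbar, ybar).
    a = ybar - b * xbar
    return a, b

# Hypothetical data for illustration.
data = [(1.0, 2.0), (2.0, 1.5), (3.0, 2.5), (4.0, 2.0)]
a, b = least_squares(data)
print(a, b)
```

No other straight line can achieve a smaller sum of squared errors on the same data than the line these coefficients define.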