PURPOSE: to show geometrically how the estimator formulas for the slope and intercept of an ordinary least squares regression are derived.
The numerical example here is drawn from Problem Set #3 from Fall 1998. In that homework set, students were asked to work through the algebra for a regression through the points (X,Y) = (1,6), (2,5), (3,1) and (4,0). Virtual Handout #5 gives the OLS formulas for the slope and intercept of the best-fitting line. These parameters are found by minimizing the sum of squared errors, where each error is the vertical distance between an observed point and the chosen regression line.
To review, the "fitted" values of Y are given by b1 + b2*Xi. The errors are measured as the actual values of Yi minus the fitted values of Y at each Xi. b1 and b2 are chosen so as to minimize the sum of the squares of these error terms, with the sum taken over all observations. If we use the fully written-out version of this objective function, it will read:
(Y1 - b1 - b2*X1)² + (Y2 - b1 - b2*X2)² + (Y3 - b1 - b2*X3)² + (Y4 - b1 - b2*X4)²
If we substitute into this long version of the formula the actual individual values of Xi and Yi for each of the four observations, we will get:
(6 - b1 - b2*1)² + (5 - b1 - b2*2)² + (1 - b1 - b2*3)² + (0 - b1 - b2*4)²
Now, we are going to use a Java applet called the SurfacePlotter, by Yanto Suryono, to illustrate the shape of this objective function. This is much easier than learning specialized software that would allow us to look at the shape of this function. Note that the function is defined over values of two parameters, b1 and b2. We want to find the values of b1 and b2 that minimize this sum of squared errors.
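For readers who would rather evaluate the objective function numerically before plotting it, here is a minimal Python sketch. It is not part of the original handout; the function name sse and the trial (b1, b2) pairs are assumptions chosen only for illustration.

```python
# The four (X, Y) observations from the problem set.
X = [1, 2, 3, 4]
Y = [6, 5, 1, 0]


def sse(b1, b2):
    """Sum of squared vertical errors for the line Y = b1 + b2*X."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))


# Trial (b1, b2) pairs, picked arbitrarily to show how the sum changes.
for b1, b2 in [(0, 0), (6, -1), (8, -2)]:
    print(f"b1 = {b1}, b2 = {b2}: sum of squared errors = {sse(b1, b2)}")
```

Each trial pair gives a different height on the surface; the OLS estimates are the pair for which this height is smallest.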
The SurfacePlotter only knows how to use three labels for its axes: x and y for the floor of the function, and z for the height. This means we are going to have to work within this notation constraint. IMPORTANT: we will let the x in SurfacePlotter stand for b1 and the y in SurfacePlotter stand for b2. The z in SurfacePlotter will be the value of the function we are interested in, namely the sum of squared errors. Translating into SurfacePlotter notation, then, we want to look at the function:
(6 - x - y*1)² + (5 - x - y*2)² + (1 - x - y*3)² + (0 - x - y*4)²
Since most students are not yet familiar with using the SurfacePlotter, I include below a screen shot of one freeze-frame of the rotating surface. The 3D shape is like a deep, sideways-squashed cereal bowl. This is a "quadratic surface" defined over b1 and b2, here denoted by x and y. The pair of values for b1 and b2 that jointly minimize the sum of squared errors depicted here is the pair that corresponds to the lowest point on the cereal bowl.
To get a better idea of the x,y coordinates of the lowest point in this cereal bowl, we can take advantage of the ability of SurfacePlotter to give us a contour plot--a two-dimensional graph that shows the level curves of this surface. The color gradient key on the side of this plot helps us figure out where the lowest values are (the blue ones). You can eyeball this plot to check whether the OLS parameter estimates you find for these particular data lie at the lowest point of the quadratic surface represented by the sum of squared errors.
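Eyeballing the contour plot amounts to a coarse search over the floor of the surface. The sketch below mimics that search in Python by evaluating the function on a grid and keeping the lowest point. The grid limits and the step of 0.1 are assumptions made for this illustration, not settings taken from SurfacePlotter.

```python
X = [1, 2, 3, 4]
Y = [6, 5, 1, 0]


def sse(b1, b2):
    """Height of the surface (sum of squared errors) at the point (b1, b2)."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))


# Assumed search region: b1 from 0 to 12 and b2 from -5 to 1, in steps of 0.1.
b1_grid = [i / 10 for i in range(0, 121)]
b2_grid = [j / 10 for j in range(-50, 11)]

# min() compares the tuples by their first element, the surface height.
best_sse, best_b1, best_b2 = min(
    (sse(b1, b2), b1, b2) for b1 in b1_grid for b2 in b2_grid
)

print(f"lowest grid point: b1 = {best_b1}, b2 = {best_b2}")
print(f"height there: {best_sse:.4f}")
```

A grid search only locates the minimum to within the step size, so it is a check on the OLS formulas rather than a substitute for them.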
Just to help you verify, I include the basic SHAZAM output for this regression. Does everything look consistent? Imagine you are trying to explain this to someone else. Can you do it?
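If you do not have SHAZAM at hand, the estimates can also be cross-checked with a few lines of Python. This sketch uses the usual textbook formulas (slope equal to the sum of cross-deviations divided by the sum of squared X-deviations, intercept equal to the mean of Y minus the slope times the mean of X); it is an assumption that these match the form given in Virtual Handout #5.

```python
X = [1, 2, 3, 4]
Y = [6, 5, 1, 0]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of cross-deviations and of squared X-deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)

b2 = s_xy / s_xx           # slope estimate
b1 = y_bar - b2 * x_bar    # intercept estimate

# Height of the surface at the estimates: the minimized sum of squared errors.
min_sse = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, minimized SSE = {min_sse:.4f}")
```

For these four points the formulas give an intercept of 8.5 and a slope of -2.2, which you can compare against the lowest region of the contour plot.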
If you want to experiment with the SurfacePlotter yourself, you can check the Lab listings for a guided tour of the Applet's capabilities. You will find a link to a local version of the program there. Note that the syntax for exponents in SurfacePlotter is ^2. You should copy the following function into the space for "function z1" in the SurfacePlotter:
(6 - x - y*1)^2 + (5 - x - y*2)^2 + (1 - x - y*3)^2 + (0 - x - y*4)^2
If you had a different data set, with more than four observations, you would have additional terms in the objective function, but still only the two unknown parameter values to be found by minimizing this objective function over all possible values of x (i.e. b1) and y (i.e. b2).
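Typing out one term per observation gets tedious for larger data sets. The Python sketch below builds the function string from any list of observations. The helper names fmt and surfaceplotter_expression are hypothetical, and wrapping negative numbers in parentheses is an assumption about what the applet's parser will accept.

```python
def fmt(value):
    """Render a number as text, wrapping negatives in parentheses.

    Assumption: SurfacePlotter accepts parenthesized negative numbers.
    """
    text = str(value)
    return f"({text})" if value < 0 else text


def surfaceplotter_expression(xs, ys):
    """Build the sum-of-squared-errors string, one term per observation.

    In the result, x stands for b1 and y stands for b2, as in the handout.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of observations")
    terms = [f"({fmt(yi)} - x - y*{fmt(xi)})^2" for xi, yi in zip(xs, ys)]
    return " + ".join(terms)


# The four observations from the problem set reproduce the function above.
print(surfaceplotter_expression([1, 2, 3, 4], [6, 5, 1, 0]))
```

However many observations go in, the resulting string still contains only the two unknowns x and y.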