UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Classroom Handout #15: Quadratic Forms


This example shows the simplest case of a quadratic (nonlinear) model for Y as a function of X. We use the data in n:lifecyc.dat and the program in n:lifecyc.sha.

 |_* example to illustrate quadratic functions in multiple regression

 |_sample 1 10
 |_read(lifecyc.dat) y x
 UNIT 88 IS NOW ASSIGNED TO: lifecyc.dat
    2 VARIABLES AND       10 OBSERVATIONS STARTING AT OBS       1
 
 |_genr x2=x*x
 |_* we will call y income and x age (life-cycle earnings)
 
Note that the relationship between y and x is decidedly NOT linear.

 |_plot y x
                    *=Y
    21280.        |                      *
    20791.        |                  *       *
    20303.        |              *
    19815.        |                              *
    19326.        |
    18838.        |
    18349.        |           *
    17861.        |                                  *
    17372.        |       *
    16884.        |
    16396.        |
    15907.        |                                      *
    15419.        |
    14930.        |
    14442.        |
    13954.        |
    13465.        |   *
                   ________________________________________
 
              15.000    30.000    45.000    60.000    75.000
                                X

 |_ols y x
 
  R-SQUARE =   0.0836     R-SQUARE ADJUSTED =  -0.0310
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.59109E+07
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   2431.2
 SUM OF SQUARED ERRORS-SSE=  0.47287E+08
 MEAN OF DEPENDENT VARIABLE =   18816.
 LOG OF THE LIKELIHOOD FUNCTION = -91.0352
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.43115E+07      1.       0.43115E+07             0.729
 ERROR            0.47287E+08      8.       0.59109E+07           P-VALUE
 TOTAL            0.51598E+08      9.       0.57332E+07             0.418
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR       8 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 X          38.101      44.61      0.8541     0.418 0.289     0.2891     0.0952
 CONSTANT   17025.      2233.       7.624     0.000 0.938     0.0000     0.9048

If we imposed a linear relationship, it would appear that Y is not affected
by the level of X (the t-ratio on X is only 0.85, with a p-value of 0.418),
when in fact there is a very strong relationship.  It is simply not a LINEAR
relationship, so it will not be picked up by a LINEAR model.
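
The summary statistics for the linear model can be checked by hand from the
ANALYSIS OF VARIANCE table. The Python sketch below is not part of the SHAZAM
program; it simply copies the rounded sums of squares printed above and
recomputes R-SQUARE, R-SQUARE ADJUSTED, and F from them. The variable names
are made up for this illustration.

```python
# Figures copied (as rounded in the printout) from the ANOVA table
# for the linear regression of y on x.
ss_regression = 0.43115e7
ss_error = 0.47287e8
ss_total = 0.51598e8
n_obs = 10    # observations in the sample
n_coef = 2    # estimated coefficients: slope on X and the constant

# Share of the total variation in y explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R-square penalizes for the number of estimated coefficients.
# It can be negative when the fit is very poor, as it is here.
r_square_adj = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_coef)

# F statistic: mean square for regression over mean square for error.
f_stat = (ss_regression / (n_coef - 1)) / (ss_error / (n_obs - n_coef))

print(f"R-square          = {r_square:.4f}")
print(f"R-square adjusted = {r_square_adj:.4f}")
print(f"F                 = {f_stat:.3f}")
```

Up to rounding, these should agree with the values SHAZAM reports for the
linear model.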

 |_* look at the relationship between x-squared and x:
 
Since x-squared is a NONLINEAR function of x, there is no problem of perfect
multicollinearity, even though the same variable shows up in two places on
the right-hand side of the next regression.  Perfect multicollinearity arises
only when one regressor is an exact LINEAR function of the others.  (x and
x-squared can still be highly correlated over the range of the data, as the
next plot suggests.)
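
To see the distinction numerically, the Python sketch below compares the
correlation of x with x-squared against the correlation of x with an exact
linear function of x. The ages used are hypothetical values chosen for
illustration; they are NOT the observations in lifecyc.dat, and the helper
function is written here only for this example.

```python
import math

# Hypothetical ages for illustration only (not the values in lifecyc.dat).
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
ages_squared = [a * a for a in ages]       # nonlinear function of age
ages_doubled = [2 * a + 5 for a in ages]   # exact linear function of age


def correlation(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


r_squared_term = correlation(ages, ages_squared)
r_linear_term = correlation(ages, ages_doubled)

# High, but strictly below 1: the regression can still be estimated.
print("corr(x, x*x)   =", r_squared_term)
# Equal to 1 up to floating-point error: perfect multicollinearity.
print("corr(x, 2*x+5) =", r_linear_term)
```

A correlation close to (but below) one inflates standard errors without
preventing estimation; only an exact linear dependence does that.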

 |_plot x2 x
 
    5368.4        |                                      *
    5052.6        |
    4736.8        |
    4421.1        |                                  *
    4105.3        |
    3789.5        |                              *
    3473.7        |
    3157.9        |
    2842.1        |                          *
    2526.3        |
    2210.5        |                      *
    1894.7        |                  *
    1578.9        |
    1263.2        |              *
    947.37        |           *
    631.58        |       *
    315.79        |   *
                   ________________________________________
 
              15.000    30.000    45.000    60.000    75.000
                                X

 |_ols y x x2 / predict=yhat coef=b
 
Note that we are saving the fitted coefficients in a three-element vector
called b.  We are also saving the fitted values of y in a vector called
yhat.

  R-SQUARE =   0.9826     R-SQUARE ADJUSTED =   0.9776
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.12822E+06
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   358.07
 SUM OF SQUARED ERRORS-SSE=  0.89751E+06
 MEAN OF DEPENDENT VARIABLE =   18816.
 LOG OF THE LIKELIHOOD FUNCTION = -71.2134
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.50701E+08      2.       0.25350E+08           197.717
 ERROR            0.89751E+06      7.       0.12822E+06           P-VALUE
 TOTAL            0.51598E+08      9.       0.57332E+07             0.000
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR       7 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 X          812.06      41.22       19.70     0.000 0.991     6.1610     2.0284
 X2        -8.2336     0.4329      -19.02     0.000 -0.990   -5.9480    -1.0966
 CONSTANT   1282.6      890.6       1.440     0.193 0.478     0.0000     0.0682
 
The estimated coefficients are stored in the order shown: b:1 is the
coefficient on X, b:2 is the coefficient on X2, and b:3 is the constant.

Note that now, both x and x-squared figure very strongly in explaining the
level of y.  Since the curve we plotted above looks very much like a quadratic
relationship, this strong quadratic association is not surprising.
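
One way to quantify the improvement is to compare the sums of squared errors
from the two regressions. The Python sketch below is an illustration, not
part of the SHAZAM run: it forms the F statistic for adding the single
regressor X2 from the rounded SSE values printed in the two outputs. With
one added regressor, the square root of this F should match the absolute
t-ratio on X2, up to rounding.

```python
import math

# SSE values copied (as rounded in the printouts) from the two regressions.
sse_linear = 0.47287e8      # y on x only (restricted model)
sse_quadratic = 0.89751e6   # y on x and x2 (unrestricted model)
df_quadratic = 7            # 10 observations minus 3 coefficients
n_added = 1                 # one regressor (x2) added

# F statistic for the null hypothesis that the coefficient on x2 is zero.
f_added = ((sse_linear - sse_quadratic) / n_added) / (sse_quadratic / df_quadratic)

# With a single restriction, F equals the square of the t-ratio.
t_implied = math.sqrt(f_added)

print(f"F for adding x2     = {f_added:.2f}")
print(f"implied |t| on x2   = {t_implied:.2f}")
```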

 |_plot yhat x
 
Here is the "fitted curve" described by the yhat values as a function of x.

    21302.        |                      *
    20917.        |                  *       *
    20533.        |
    20149.        |              *
    19764.        |                              *
    19380.        |
    18996.        |
    18612.        |           *
    18227.        |                                  *
    17843.        |
    17459.        |
    17074.        |
    16690.        |       *
    16306.        |
    15921.        |                                      *
    15537.        |
    15153.        |
    14769.        |
    14384.        |
    14000.        |   *
                   ________________________________________
 
              15.000    30.000    45.000    60.000    75.000
                                X

To find the value of X that maximizes the value of fitted y, take the
derivative of the regression relationship between y and x, and set it
equal to zero.  When this slope goes to zero, we are at the maximum
(provided, of course, that the second derivative is negative, indicating
that the slope is decreasing as we move from left to right).  In this simple
model, the second derivative of y with respect to x is just two times the
coefficient on x-squared.

 |_* solve for the value of x that maximizes y:
 |_gen1 x_ymax=-b:1/(2*b:2)

 |_* plug this value of x into the equation for y to get largest fitted y:
 |_gen1 y_ymax=b:1*x_ymax + b:2*x_ymax*x_ymax + b:3

 |_print x_ymax y_ymax
     X_YMAX
    49.31375
     Y_YMAX
    21305.43
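
The same calculation can be reproduced outside SHAZAM. The Python sketch
below uses the coefficients as rounded in the regression printout, so its
results can differ slightly in the last digits from the SHAZAM values above,
which are presumably computed from the unrounded coefficients. The Python
variable names are chosen for this illustration.

```python
# Coefficients copied (as rounded in the printout) from the quadratic model.
b_x = 812.06       # b:1, coefficient on X
b_x2 = -8.2336     # b:2, coefficient on X2
b_const = 1282.6   # b:3, constant

# Set the derivative b_x + 2*b_x2*x equal to zero and solve for x.
x_ymax = -b_x / (2 * b_x2)

# Evaluate the fitted quadratic at that value of x.
y_ymax = b_x * x_ymax + b_x2 * x_ymax * x_ymax + b_const

# The second derivative is 2*b_x2; a negative value confirms a maximum.
second_derivative = 2 * b_x2

print(f"x at turning point  = {x_ymax:.3f}")
print(f"fitted y at turning = {y_ymax:.1f}")
print(f"second derivative   = {second_derivative:.4f}")
```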

THINGS TO CHECK:  

1.  Is X_YMAX within the range of the observed data on X?  
If not, the "turning point" of the function is merely an artifact of
using the shoulder of the quadratic function to capture the degree of
curvature in the data.  We do not really know where the function
Y=f(X) goes outside the range of the observed data.

2.  The derivative of Y with respect to X in this model is b:1 + 2*b:2*X.
The interpretation of b:1 is therefore the slope of the function
when X=0 (whereas b:3, the intercept in the model, is the height of the
function when X=0).  The quantity 2*b:2 is the amount by which the slope of
the function changes as X increases by one unit.  If b:2 (the regression
coefficient on the X-squared term) is negative, the function will have
a maximum; if it is positive, the function will have a minimum.

3.  If b:1 and b:2 (the coefficients on X and X-squared) are of opposite
signs, the minimum or maximum of the function will occur at a positive
value of X.  If these two coefficients have the same signs, the minimum
or maximum will occur at a negative value of X.
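
The three checks above can be collected into a short routine. The Python
sketch below is one possible way to do so, using the rounded coefficients
from the quadratic regression. The function names are invented for this
example, and the bounds 18 and 73 are only a rough, assumed reading of the
range of X from the plots, not values taken from lifecyc.dat.

```python
def describe_turning_point(b_x, b_x2, x_low, x_high):
    """Return (turning point, 'maximum' or 'minimum', within-range flag)
    for a fitted quadratic y = b_x*x + b_x2*x*x + constant."""
    if b_x2 == 0:
        raise ValueError("no turning point: the coefficient on x-squared is zero")
    x_turn = -b_x / (2 * b_x2)
    kind = "maximum" if b_x2 < 0 else "minimum"
    in_range = x_low <= x_turn <= x_high
    return x_turn, kind, in_range


def slope_at(b_x, b_x2, x):
    """Derivative of the fitted quadratic with respect to x."""
    return b_x + 2 * b_x2 * x


# Rounded coefficients from the quadratic regression printout.
b_x, b_x2 = 812.06, -8.2336

# ASSUMED range of the observed X values (approximate reading of the plots).
x_low, x_high = 18, 73

x_turn, kind, in_range = describe_turning_point(b_x, b_x2, x_low, x_high)
print(f"turning point at x = {x_turn:.2f} ({kind}); inside assumed range: {in_range}")

# Opposite signs on b_x and b_x2 put the turning point at a positive x.
print("turning point at positive x:", x_turn > 0)

# The slope is positive before the turning point and negative after it.
for age in (0, 30, 60):
    print(f"slope at x = {age}: {slope_at(b_x, b_x2, age):.2f}")
```

If the turning point fell outside the observed range, the in-range flag
would be False, signaling that the estimated maximum or minimum should not
be taken literally.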

Updated: February 17, 1998
Prepared by: Trudy Ann Cameron