UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Computing Lab Session #5: Multicollinearity


Goals of this lab:
  1. Explore a multivariate relationship for evidence of multicollinearity among the regressors
  2. See what happens when SHAZAM encounters perfect multicollinearity
  3. Compare statistical significance of individual slope coefficients with overall significance of a regression
  4. Reiterate the relationship between multicollinearity and potential omitted variables bias in regression coefficients
  5. See what correlation among explanatory variables does to the shape of the joint confidence ellipse for hypotheses concerning pairs of slope coefficients
  6. Use "auxiliary regressions" to reveal the potential source of multicollinearity problems (AUXRSQR option on OLS command)
  7. When can one safely delete one member of a pair of correlated variables that are leading to insignificant t-ratios?
  8. How (and when) might additional data be used to sort out a multicollinearity problem?
  9. How might outside information about the relationships among slopes be exploited to allow statistically significant estimates of at least some slope coefficients, provided this outside information is valid in the context of the current model?
  10. Is there always a viable solution for a multicollinearity problem?

Example

The main empirical example we will use to illustrate these points is contained in the file n:wtsize.sha, which uses the data in n:wtsize.dat. These stylized data consist of twenty observations on the quantity demanded (Q) of a commodity (which we will view as portable computing devices) as a function of the unit price (P), the sales tax per unit (TAX, equal to a constant fraction of price), the weight of the device (WT), and the size of the device (SZ).
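Because TAX is an exact multiple of P, including both in the same regression produces perfect multicollinearity. The short Python sketch below illustrates the mechanics on synthetic numbers; it does not read n:wtsize.dat, and the 8 percent tax rate, the price and weight ranges, and the random seed are all assumptions made purely for illustration.

```python
import numpy as np

# Synthetic stand-ins for the lab variables (NOT the values in wtsize.dat).
rng = np.random.default_rng(5)          # arbitrary seed, assumed
n = 20                                  # twenty observations, as in the lab
P = rng.uniform(500.0, 1500.0, size=n)  # hypothetical unit prices
TAX = 0.08 * P                          # hypothetical 8% tax: exact multiple of P
WT = rng.uniform(1.0, 4.0, size=n)      # hypothetical weights

# Regressor matrices with and without the redundant TAX column.
X_full = np.column_stack([np.ones(n), P, TAX, WT])
X_drop = np.column_stack([np.ones(n), P, WT])

rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)

print("columns with TAX:", X_full.shape[1], "rank:", rank_full)
print("columns without TAX:", X_drop.shape[1], "rank:", rank_drop)
```

With TAX included, the regressor matrix has four columns but only three linearly independent ones, so the normal equations have no unique solution: only the combination (slope on P) + 0.08 x (slope on TAX) can be estimated. Dropping TAX restores full column rank.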


Additional Numerical Example

It is somewhat unusual to stumble upon multiple regression examples where all pairwise correlations among the explanatory variables are relatively small, but higher-order multicollinearity exists and compromises our ability to obtain statistically significant slope estimates.

To demonstrate, however, that this can happen, I have created a contrived data set with exactly these properties. Consider the data in n:multicol.dat and the program that uses them in n:multicol.sha. Read through the program, noting the comments that have been included. You will want to run the program and send output to a file in the first pass (make sure you have adjusted the read statement appropriately). Then, you may want to run the program interactively, sending the output to the screen, so you can look at the gnuplot confidence ellipses and see clearly how correlations among the variables lead to results such that (a) individually, some slope coefficients are not statistically different from zero, but (b) jointly, these slopes are significantly different from zero.
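For readers who want to see the same phenomenon outside SHAZAM, here is a minimal Python sketch on a separate, made-up data set. It is not the data in n:multicol.dat. Assumptions: eight independent regressors plus a ninth that is nearly their sum, all true slopes equal to 1, and a disturbance that is projected off the regressors so that the fitted slopes equal the assumed ones and the outcome does not hinge on the random seed. The helper names `ols` and `r_squared` are hypothetical.

```python
import numpy as np

def ols(y, X):
    """OLS via least squares: coefficients, standard errors, residual SS."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    rss = float(resid @ resid)
    cov = (rss / (n - k)) * np.linalg.inv(X.T @ X)
    return b, np.sqrt(np.diag(cov)), rss

def r_squared(y, X):
    """R-squared of y on X (X must contain a constant column)."""
    _, _, rss = ols(y, X)
    return 1.0 - rss / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(143)         # arbitrary seed, assumed
n, m = 200, 8
Z = rng.standard_normal((n, m))          # eight independent regressors
x_last = Z.sum(axis=1) + 0.05 * rng.standard_normal(n)  # nearly their sum
R = np.column_stack([Z, x_last])         # nine regressors in total
X = np.column_stack([np.ones(n), R])     # add the constant

# Disturbance made exactly orthogonal to X (a contrivance for reproducibility).
raw = rng.standard_normal(n)
g, *_ = np.linalg.lstsq(X, raw, rcond=None)
u = raw - X @ g
y = X @ np.ones(X.shape[1]) + u          # intercept and all slopes equal 1

# (a) Largest pairwise correlation among the regressors.
corr = np.corrcoef(R, rowvar=False)
max_pairwise = np.abs(corr[np.triu_indices_from(corr, k=1)]).max()

# (b) Auxiliary regressions: each regressor on a constant and all the others.
aux_r2 = np.array([
    r_squared(R[:, j], np.column_stack([np.ones(n), np.delete(R, j, axis=1)]))
    for j in range(R.shape[1])
])

# (c) Individual t-ratios versus the overall F statistic.
b, se, rss = ols(y, X)
t_stats = b[1:] / se[1:]
k = X.shape[1]
tss = float(((y - y.mean()) ** 2).sum())
f_stat = ((tss - rss) / (k - 1)) / (rss / (n - k))

print("largest pairwise |correlation|:", round(float(max_pairwise), 3))
print("smallest auxiliary R-squared:  ", round(float(aux_r2.min()), 4))
print("largest |t| on a slope:        ", round(float(np.abs(t_stats).max()), 2))
print("overall F statistic:           ", round(float(f_stat), 1))
```

In this construction no pair of regressors is strongly correlated, yet every auxiliary R-squared is close to one. Each slope's t-ratio falls well short of the 5% critical value of roughly 2, while the overall F statistic is far above its 5% critical value (also roughly 2 with these degrees of freedom). This is the pattern described in points (a) and (b) above.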


Update date: February 16, 1998
Prepared by: Trudy Ann Cameron