UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics
Economics 143 (Cameron) - Applied Regression Analysis
SHAZAM code [wtsize.sha]
* SHAZAM code from the UCLA Econ 143 (CAMERON) web site: the HTML file
* is called e143sh13.htm, and it should have been downloaded as
* wtsize.sha
*--------------------------------------------------------------
* EXAMPLE OF ORDINARY MULTICOLLINEARITY SITUATION
* DATA IN wtsize.dat CONSIST OF TWENTY OBSERVATIONS ON EACH OF FIVE
* VARIABLES: Q (quantity demanded), P (price), TAX (sales tax paid),
* WT (weight of product) and SZ (size of product).
* THIS STYLIZED EXAMPLE IS MEANT TO BE A DEMAND FUNCTION FOR A HETEROGENEOUS
* COMMODITY, WHERE ATTRIBUTES OF THE COMMODITY (NAMELY WEIGHT AND SIZE)
* ALSO INFLUENCE QUANTITY DEMANDED.
sample 1 20
read(wtsize.dat) q p tax wt sz
* run the "kitchen sink" regression of q on everything else: explain
* what happens
ols q p tax wt sz
* if you don't know already, check the correlations among the variables
* to see if there is a perfect multicollinearity problem anywhere;
* explanatory variables should be "scattered widely across the floor"
* of the regression model, not lying too closely along some line.
stat p tax wt sz / pcor
plot p tax / gnu
plot p wt / gnu
plot p sz / gnu
plot wt sz / gnu
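* A minimal sketch (not part of the original exercise): one way to check
* for an exact linear dependence such as tax = 0.08*p is to generate the
* difference and summarize it. The variable name taxdiff is made up here
* for illustration. If the dependence is exact, the mean, standard
* deviation, minimum and maximum of taxdiff should all be zero (up to
* rounding in the data file).
genr taxdiff = tax - 0.08*p
stat taxdiff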
* leave out the variable that doesn't add any extra information;
* given that tax is always 0.08*price, it is not possible in these
* data to ascertain the separate influence of price changes without tax
* changes, or tax changes without price changes...but maybe people only
* think of the gross price of things anyway, and these would have the
* same marginal effects on demand.
* check the statistical significance of the individual slope coefficients.
* check the R-squared value...is it consistent with the t-ratios?
ols q p wt sz
* do an F-test of the joint contribution of the three explanatory variables.
test
test p=0
test wt=0
test sz=0
end
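* A sketch, assuming SHAZAM's temporary variables $R2, $N and $K hold the
* R-squared, the number of observations and the number of estimated
* coefficients (intercept included) after an OLS command: the overall
* F-statistic can be recovered from R-squared alone, so a high R-squared
* implies a large joint F even when the individual t-ratios are small.
* The regression is re-run so the temporary variables are fresh; the
* name fall is made up for illustration.
ols q p wt sz
gen1 fall = ($r2/($k-1))/((1-$r2)/($n-$k))
print fall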
* see if individual variables have any effect (but beware of omitted
* variables bias in the estimated coefficients in these simple regressions)
ols q p
ols q wt
ols q sz
* try pairs of variables (note what happens to coefficient on first
* variable as second variable is added--order irrelevant, of course).
* check the correlations between the estimated coefficients in each
* regression model---different thing than the correlations among the
* variables used in the model
ols q p wt / pcor
confid p wt / gnu lineonly
ols q p sz / pcor
confid p sz / gnu lineonly
ols q wt sz / pcor
confid wt sz / gnu lineonly
* the confidence ellipses above show the set of plausible joint
* hypotheses about the true coefficients; the rectangles are the
* intersection of the plausible individual hypotheses, that is, the
* region where the two marginal confidence intervals overlap. NOTE that
* some plausible joint hypotheses are rejected individually, and vice versa.
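* A sketch of the same point using tests rather than pictures: first test
* each coefficient on its own (a single TEST command), then test both
* together (TEST commands grouped between TEST and END). With highly
* correlated regressors, the individual tests may fail to reject while
* the joint test rejects.
ols q p wt
test p=0
test wt=0
test
test p=0
test wt=0
end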
* find the probable source of the multicollinearity problem by running
* auxiliary regressions: Problems among the regressors in the main model
* are evidenced by high R-squared values in these auxiliary regressions--
* high t-ratios in these models indicate the most likely culprit variable
* subsets
* NOTE the dependent variable in each of these regressions:
ols p wt sz
ols wt p sz
ols sz p wt
* SHAZAM is so sure you might want to do this sort of exploration that a
* special option on the "main" regression does these auxiliary regressions
* for you and reports the most interesting results: the AUXRSQR -
* auxiliary R-squared option
ols q p wt sz / auxrsqr
* check the output to see which explanatory variables can be really
* accurately represented by some linear function of the others.
* since p and wt are highly correlated, these are where you have to focus
* your attention in dealing with the multicollinearity problem.
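* A sketch: the variance inflation factor (VIF) for a regressor is
* 1/(1 - auxiliary R-squared). Here it is computed for p from the
* auxiliary regression of p on wt and sz, using SHAZAM's temporary
* variable $R2. The name vifp is made up for illustration. A VIF of 10
* corresponds to an auxiliary R-squared of 0.90.
ols p wt sz
gen1 vifp = 1/(1-$r2)
print vifp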
* a.) if you aren't primarily interested in dq/dp or dq/dwt, but are most
* interested in how q is affected by variations in sz, you could probably
* drop one of p or wt and leave the remaining variable in this pair to
* capture the influence of both of them on q. This should keep your
* estimate of dq/dsz largely free of omitted variables bias.
* To check the distortion in the coefficient on sz when neither p nor wt
* is included, compare with the earlier regression of q on sz alone.
ols q p sz
* in the above regression, the coefficient on p is really the combined effect
* due to changes in p and that portion of the change in wt that "moves
* similarly" to p in your sample
* b.) you could go out and collect data on demand for other variants of this
* commodity for which p and wt are not so closely correlated. If price is
* higher when weight is lower, for example, you would want to include extra
* observations for cases where the commodity is heavy but still expensive,
* or light but cheap in spite of this.
* c.) you might be able to get an approximate value for the price derivative
* from some other study, and impose this on the model. Estimate weight
* effect separately.
* d.) you may just have to give up trying to get separate weight and price
* effects from this data set.
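* A sketch of option c.), assuming a purely hypothetical outside estimate
* of -0.5 for the price derivative (this number is invented for
* illustration and is not from any study or from these data). The
* RESTRICT option on OLS imposes the value, and the remaining
* coefficients are estimated subject to it.
ols q p wt sz / restrict
restrict p = -0.5
end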
Updated: 11:25 AM 11/2/98; Prepared by: Trudy Ann Cameron