UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

SHAZAM code [wls_eff.sha]


If you are downloading this SHAZAM code for use on your own computer, select "File", then "Save As...", and save on your own diskette (a:) or your own hard drive (c:\) using the same filename e143sh25.sha.

IMPORTANT: you must then use an editor (like TED) to delete all of the HTML code from the top and the bottom of the file, leaving only the SHAZAM code. The line which reads "* SHAZAM code (e143sh25.sha) downloaded from UCLA Econ 143 (CAMERON) WebSite" should be the first line of your edited program file. Save the edited program as wls_eff.sha



* SHAZAM code downloaded from UCLA Econ 143 (CAMERON) WebSite: 
* HTML file called e143sh25.htm, and should have 
*  been downloaded as wls_eff.sha

********************** CONTENTS OF  n:wls_eff.sha  FILE *********************
* This exercise illustrates the greater efficiency of weighted least squares,
* as opposed to OLS, in estimating regression parameters when there is
* heteroscedasticity in the data.

set nocolor
* We will each create our own "population."  By construction, these data
* are heteroscedastic.  The true population error variance is approximately
* proportional to the squared value of the explanatory variable.  

popsize:300
sample 1 [popsize]
* CREATE SOME "POPULATION" DATA:
* Let non-profit organization revenues (r), measured in thousands of
* dollars, be distributed uniformly between $50,000 and $1,050,000.
genr r=50+uni(1000)
* Let fund-raising expenditures or "development" costs (d) be given by
* a known "data-generating process" plus an error u that depends on revenue.
* The standard deviation of the PRF regression error is "sig" times r.
* Thus, E(u-squared) is proportional to the square of r.
sig:.05
genr sigma=[sig]*r
genr u=nor(sigma)
*
* Create values of the "dependent variable" that have heteroscedasticity.
genr d = 5 + 0.2*r + u 
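*
* A worked check of the variance structure implied by the commands above
* (a sketch that assumes nor(sigma) draws a normal error with mean zero
* and standard deviation sigma, as the comments above describe):
*     Var(u) = ([sig]*r)**2 = 0.0025*r*r
* so the standard deviation of u is 2.5 when r=50 and 50 when r=1000.
* Dividing the data-generating process through by r gives
*     d/r = 5*(1/r) + 0.2 + u/r
* and Var(u/r) = [sig]**2 = 0.0025, the same for every observation.
* This is the reasoning behind the weight 1/(r*r) used for WLS later on.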
pause
* Look at the "true" population data we've created.....
* Depending on what "population" you've created, it may have more or
* less actual heteroscedasticity.  The more apparent heteroscedasticity
* your population exhibits, the more likely you will be to observe
* the outcome that WLS estimates have smaller variance than OLS estimates.
pause
plot d r

pause
* write out these data each time, in case you want to look at them later
* write(develop.dat) d r
*pause

* check out the "true" population regression function; note its parameters.
ols d r
* recall that these data were created assuming an intercept of 5 and a
*  slope of 0.2.  How close do we come in the set of data you created?
* Jot down these new numbers; these are YOUR true PRF parameters.
pause
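*
* If you would like a formal check for heteroscedasticity in your
* population before sampling from it, one option is SHAZAM's DIAGNOS
* command, which must come directly after the OLS command it refers to.
* This is an optional sketch; enter the commands without the comment
* characters:
*
* ols d r
* diagnos / het
* pause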

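* Now, explore what happens when you draw repeated random samples from this
* population of 300.  Each sample is drawn without replacement; the whole
* population becomes eligible again before the next sample is drawn.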

********* HERE'S WHERE YOU DEFINE SAMPLE SIZE AND # SIMULATED SAMPLES *******
* if the exercise takes too long with nsim (number of simulated samples)
*   equal to 100, try 50, or 25.  Or, make the number of simulations larger... 

* set nsamp to the number of observations in each sample..must be <300
nsamp:60

* set nsim to the number of simulated samples to "draw" from population
nsim:50

*****************************************************************************

* make enough space to hold simulated quantities that will be appended
*   in each simulated sample and its regression:

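* bb1 and bb2 hold the OLS intercept and slope estimates from each sample,
* et1 and et2 hold the t-ratios from the regression of e2 on rr, and
* bb1w and bb2w hold the WLS intercept and slope estimates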
dim bb1 [nsim]      bb2 [nsim] 
dim et1 [nsim]      et2 [nsim]
dim bb1w [nsim]     bb2w [nsim]

* In what follows, for each regression:
*  - save estimated coefficients in bb (OLS) and bbw (WLS):
*      element 1 = slope, element 2 = intercept
*  - save the OLS residuals in e for the heteroscedasticity check

********************** HERE ARE THE SAMPLE SIMULATIONS ***********************

do #=1,[nsim]

* first make whole "population" of [popsize] eligible to be sampled
sample 1 [popsize]

* generate a uniformly distributed random number vector length [popsize]
genr rndm=uni(1)

* sort all of the data according to the values of this random number vector
sort rndm d r

* take the first [nsamp] obs. These will be a RANDOM sample from the [popsize]

sample 1 [nsamp]
* print d r


* run an ordinary least squares regression, save fitted coefficients
* also save the residuals e for each observation in this sample
ols d r / coef=bb resid=e
genr e2=e*e
genr rr=r*r
ols e2 rr / tratio=tt
genr wt=1/(r*r)
ols d r / coef=bbw weight=wt
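*
* What the regressions above do, written out (a sketch based on the
* data-generating process at the top of this file):
*   - the auxiliary regression is  e2 = a0 + a1*rr + v.  With no
*     heteroscedasticity a1 would be zero; in this population Var(u) is
*     [sig]**2 times rr, so a1 should be near 0.0025 and its t-ratio
*     (tt:1) will tend to be positive.
*   - the weight wt=1/(r*r) is the reciprocal of the factor to which the
*     error variance is proportional, so the noisier observations (those
*     with large r) count for less in the WLS regression.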

* begin to fill in the vectors of intercept results, one entry per sample
matrix bb1(#)=bb:2
matrix et1(#)=tt:2
matrix bb1w(#)=bbw:2

* begin to fill in the vectors of slope results, one entry per sample
matrix bb2(#)=bb:1
matrix et2(#)=tt:1
matrix bb2w(#)=bbw:1

endo
*************************** END OF SIMULATIONS *****************************
pause

* now look at the joint distribution and marginal distns of b1 and b2, and
* of bw1 and bw2

sample 1 [nsim]
genr b1=bb1
genr b2=bb2
genr bw1=bb1w
genr bw2=bb2w

* For each of the [nsim] different samples drawn from the "population,"
* the point estimates produced by OLS are b1 and b2.  The point estimates
* produced by the more efficient estimator WLS are bw1 and bw2.
* When the program has finished executing, you might want to explore
* further by entering the following commands (without comment characters):

sample 1 [nsim]

* plot b1 / histo groups=60
* pause
* plot b2 / histo groups=60
* pause
* plot bw1 / histo groups=60
* pause
* plot bw2 / histo groups=60
* pause

* plot b1 b2 / gnu
* pause
* plot bw1 bw2 / gnu
* pause
* pause

* To compare the efficiency of the two estimators in the presence
* of heteroscedasticity where u has variance approximately 
* ([sig]*r)-squared, compare the standard deviations of the OLS 
* and the WLS point estimates across the [nsim] different samples used


stat b1 b2 et1 et2 bw1 bw2 
*
*      1.) b1 and b2 are the naive OLS intercept and slope estimates from 
* each sample.  et1 and et2 are the t-ratios on the intercept and slope
* coefficients for the corresponding regressions of e2 on r*r to detect
* heteroscedasticity. bw1 and bw2 are WLS estimates using wt=1/(r*r) which
* means that all data are multiplied by 1/r before the regression on
* transformed data is run.
*      2.) if WLS is more efficient than OLS under heteroscedasticity,
* we would expect to see the std dev of bw2 being less than the std dev of
* b2 (and similarly for the intercepts).  Is this borne out in your case?
*      3.) if OLS remains unbiased despite presence of heteroscedasticity,
* are you surprised that average point estimates differ between OLS and
* WLS?  Why or why not?
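*
* To see the "multiplied by 1/r" transformation described in item 1
* directly, you could run OLS on the transformed data for the last sample
* drawn and compare it with the WLS regression.  This is a sketch to enter
* without comment characters; dstar and rinv are new names used only here:
*
* sample 1 [nsamp]
* genr dstar=d/r
* genr rinv=1/r
* ols dstar rinv
* ols d r / weight=wt
*
* In the transformed regression the roles switch: the coefficient on rinv
* estimates the intercept (5) and the constant estimates the slope (0.2).
* The two sets of point estimates should agree apart from this relabeling.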




Updated: February 27, 1998
Prepared by: Trudy Ann Cameron