UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Problem Set #4: Los Angeles Apartment Rental Rate Example

Outline of Solutions


If you have not reached this page using a link from the associated problem set questions page, you may want to link to the questions for this homework set. The character [] has been added to the end of each question to provide a link to the relevant part of this solution set.

The first question on this problem set was the subject of study.sha and colds.sha examples. These were covered in the lab sessions (in Lab #4) so I will not duplicate the discussion here.

This outline of solutions pertains to the larent.sha program and larent.dat data set for multiple regression. See comments in red in the following SHAZAM output. These will help you address some of the questions in the problem set. They will also increase your familiarity with interpreting SHAZAM output in general.

PLEASE keep sight of the fact that problem sets are intended to "stretch" you. Struggling with how to do something makes the achievement more rewarding (in theory). Since I always detested "regurgitation" homeworks, these are intentionally challenging. I would be truly amazed if anyone actually achieved these answers based on a cold start (so to speak). If you are privy to solution sets from prior quarters, of course, you will have an advantage over some of your classmates. Depending upon how much time I have, homeworks are revised relatively more or less from previous editions of the class.

I am pleased to see that so many of you are meeting the challenges of these homeworks. Remember, the adage is "learn it any way you can." Collaboration is strongly advised.

 |_sample 1 26

 |_* nowarnskip suppresses endless messages that certain observations are
 |_* being skipped
 |_set nowarnskip

 |_read(larent.dat) rent sqkld bed sqbed bath sqbath pkg beach ucla
 UNIT 88 IS NOW ASSIGNED TO: larent.dat
    9 VARIABLES AND       26 OBSERVATIONS STARTING AT OBS       1

  
 |_* get descriptive statistics and data covariance matrix
 |_stat / pcor
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 RENT        26   1128.8       278.60       77619.       757.00       1709.0
 SQKLD       26   412.69       68.224       4654.5       330.00       600.00
 BED         26   1.9615      0.87090      0.75846       1.0000       3.0000
 SQBED       26   214.69       112.45       12646.       76.000       420.00
 BATH        26   1.4904      0.50240      0.25240      0.75000       2.5000
 SQBATH      26   57.346       19.424       377.28       26.000       90.000
 PKG         26   1.5769      0.64331      0.41385      0.00000       2.0000
 BEACH       26   3.6538       1.9327       3.7354      0.00000       7.0000
 UCLA        26   4.3269       2.2492       5.0588      0.00000       8.0000
 
  CORRELATION MATRIX OF VARIABLES -       26 OBSERVATIONS
 
 
 RENT       1.0000
 SQKLD     0.90355       1.0000
 BED       0.79161      0.63128       1.0000
 SQBED     0.97359      0.86365      0.88332      1.0000
 BATH      0.85260      0.88772      0.47908      0.79912       1.0000
 SQBATH    0.81551      0.80990      0.47611      0.78374       0.93596
            1.0000
 PKG       0.61452      0.47357      0.54096      0.59032      0.54385
           0.58200       1.0000
 BEACH    -0.70540E-01  0.18785     -0.82261E-02  0.17343E-01  0.23331
           0.22601      0.10270       1.0000
 UCLA      0.36064E-01 -0.17801      0.31298      0.92693E-01 -0.26260
          -0.16430      0.12706     -0.53193       1.0000
              RENT         SQKLD        BED          SQBED        BATH
              SQBATH       PKG          BEACH        UCLA  
 
 |_plot beach ucla

Observe the fairly strong negative correlation (about -0.53 in the correlation matrix above) between distance from the beach and distance from UCLA, with a few "exceptions" highlighted in red.
 

 REQUIRED MEMORY IS PAR=     3 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=     3
        26 OBSERVATIONS
                    *=BEACH
                    M=MULTIPLE POINT
    8.0000        |
    7.5789        |
    7.1579        |
    6.7368        |    *
    6.3158        |
    5.8947        |*   M    *                        *    *
    5.4737        |
    5.0526        |
    4.6316        |         *                        *
    4.2105        |
    3.7895        |                   *    *
    3.3684        |                 * *
    2.9474        |              M    M    M
    2.5263        |
    2.1053        |
    1.6842        |                        M    *
    1.2632        |
   0.84211        |                             *    *    *
   0.42105        |
   0.44409E-14    |                             *
                   ________________________________________
 
               0.000     2.000     4.000     6.000     8.000
 
                                UCLA  


 
 |_* regress rent on everything available
 |_*   save the relevant sums of squares and degrees of freedom to use in
 |_*   explicit F-tests later, but do automated F-tests following the ols.
 
 |_ols rent sqkld bed sqbed bath sqbath pkg beach ucla / pcov

Note that when you ask for "pcov" on OLS, you get the variance-covariance matrix for the vector of fitted coefficients. If you were explicitly calculating a variance (and standard error) for some linear combination of estimated coefficients, you could use these covariances in the formulas.

Coefficient on PKG is the effect on expected rent of an additional parking space. Coefficient on BEACH is the effect on expected rent of being one mile further from the beach. We would expect this to be negative, since the beach is considered an amenity.

 
 REQUIRED MEMORY IS PAR=     6 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9964     R-SQUARE ADJUSTED =   0.9947
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   410.13
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   20.252
 SUM OF SQUARED ERRORS-SSE=   6972.1
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -109.583

Note that the overall F-test for the joint significance of the complete set of slope coefficients soundly rejects the hypothesis that all slopes could be simultaneously zero. The P-value indicates that the probability out in the tail of the relevant F-distribution (beyond the 589.306 cutoff) is smaller than 0.0005.

                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.19335E+07      8.       0.24169E+06           589.306
 ERROR             6972.1         17.        410.13               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
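
As a rough check on the ANOVA table above, the overall F statistic can be rebuilt by hand from the sums of squares and degrees of freedom. The sketch below is in Python (not part of the original SHAZAM session), and the variable names are illustrative only. Because the printed sums of squares are rounded, the result agrees with SHAZAM's 589.306 only to a few digits.

```python
# Values copied from the "ANALYSIS OF VARIANCE - FROM MEAN" table.
ss_regression = 0.19335e7   # explained sum of squares (rounded in the printout)
ss_error = 6972.1           # residual sum of squares
df_regression = 8           # number of slope coefficients
df_error = 17               # 26 observations minus 9 estimated coefficients

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error          # reported as SIGMA**2 in the output
f_overall = ms_regression / ms_error

print(f"MS regression = {ms_regression:.1f}")
print(f"MS error      = {ms_error:.2f}")
print(f"overall F     = {f_overall:.2f}")
```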
 
We generally ignore the ANOVA from ZERO...
                     ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35061E+08      9.       0.38956E+07          9498.627
 ERROR             6972.1         17.        410.13               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000

Slope coefficient P-values of less than 0.05 tell us that the corresponding t-test statistic is far enough out in the tail of the relevant t-distribution (df=17 here) that less than 5% of the probability lies beyond the symmetric pair of cutoffs defined by this t-ratio value. Thus, we reject the null hypotheses that the associated coefficients are individually equal to zero; these coefficients are individually statistically significantly different from zero.

 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      17 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 SQKLD     0.95126     0.1714       5.550     0.000 0.803     0.2329     0.3478
 BED       -16.049      23.19     -0.6919     0.498-0.166    -0.0502    -0.0279
 SQBED      1.7343     0.2703       6.416     0.000 0.841     0.7000     0.3299
 BATH       55.194      33.97       1.625     0.123 0.367     0.0995     0.0729
 SQBATH   -0.25117     0.7350     -0.3417     0.737-0.083    -0.0175    -0.0128
 PKG        43.974      8.817       4.988     0.000 0.771     0.1015     0.0614
 BEACH     -27.332      3.247      -8.418     0.000-0.898    -0.1896    -0.0885
 UCLA      -7.6989      2.784      -2.766     0.013-0.557    -0.0622    -0.0295
 CONSTANT   391.31      56.36       6.943     0.000 0.860     0.0000     0.3467
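
To see where the T-RATIO column comes from, each t-ratio is just the estimated coefficient divided by its standard error. The Python sketch below (names are illustrative, not SHAZAM's) recomputes the slope t-ratios from the table and compares them with 2.110, the two-sided 5% critical value that SHAZAM reports for 17 d.f. in the CONFID output.

```python
T_CRIT_5PCT = 2.110  # two-sided 5% critical value, t-distribution with 17 d.f.

# (coefficient, standard error) pairs copied from the table above.
estimates = {
    "SQKLD": (0.95126, 0.1714),
    "BED": (-16.049, 23.19),
    "SQBED": (1.7343, 0.2703),
    "BATH": (55.194, 33.97),
    "SQBATH": (-0.25117, 0.7350),
    "PKG": (43.974, 8.817),
    "BEACH": (-27.332, 3.247),
    "UCLA": (-7.6989, 2.784),
}

# t-ratio = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

# A slope is individually significant at 5% if |t| exceeds the critical value.
significant = [name for name, t in t_ratios.items() if abs(t) > T_CRIT_5PCT]

for name, t in t_ratios.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:7s} t = {t:8.3f}  {flag} at 5%")
```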

Including numbers of bedrooms (BED) and numbers of bathrooms (BATH) in addition to the areas in bedrooms and bathrooms (SQBED, SQBATH) does not individually add much to the explanatory power of the model, since the coefficients on BED and BATH are not individually significant.


 VARIANCE-COVARIANCE MATRIX OF COEFFICIENTS
 SQKLD     0.29376E-01
 BED       0.70850       537.96
 SQBED    -0.17334E-01  -5.9025      0.73077E-01
 BATH      -2.3503       279.40      -3.0318       1153.6
 SQBATH    0.35786E-01   8.1927     -0.99788E-01  -11.715      0.54024  
 PKG       0.19753      -58.038      0.44427      -60.824      -1.4257
            77.731
 BEACH    -0.11681      -45.642      0.53296      -12.199     -0.97854
            1.1534       10.541
 UCLA      0.54307E-02  -24.756      0.19110       15.180     -0.65328
           -1.8470       5.1032       7.7499
 CONSTANT  -8.2488      -601.24       9.8087       100.05      -14.286
           -9.4428       35.193      -29.136       3176.0
              SQKLD        BED          SQBED        BATH         SQBATH
              PKG          BEACH        UCLA         CONSTANT

While the individual coefficients on BED and BATH are not significantly different from zero, let's see whether they could be jointly equal to zero. If there is multicollinearity between the variables BED and BATH, it is possible we simply cannot distinguish their separate contributions.

 
 |_test
 |_test bed=0
 |_test bath=0
 |_end

This F-test shows that the null hypothesis that the slopes on BED and BATH are simultaneously zero cannot be rejected at the 5% level of significance (nor at the 10% level, although at the 14% level, we could reject). There is roughly 13.7% of the probability out in the right-hand tail of the relevant F-distribution if the null hypothesis is true.

 F STATISTIC =   2.2402820     WITH    2 AND   17 D.F.  P-VALUE= 0.13691
 WALD CHI-SQUARE STATISTIC =   4.4805640     WITH    2 D.F.  P-VALUE= 0.10643
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.44637
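
The three lines of test output appear to be related in a simple way, which the following Python sketch checks numerically. It assumes (this is inferred from the printed numbers, not stated in the output) that the Wald statistic is the F statistic times the number of restrictions, and that the Chebychev bound is the number of restrictions divided by the Wald statistic, i.e. 1/F. The closed form exp(-x/2) for the upper-tail probability of a chi-square variable holds only for 2 degrees of freedom.

```python
import math

f_stat = 2.2402820   # F statistic from the TEST block above
q = 2                # number of restrictions (BED=0 and BATH=0)

wald = q * f_stat                  # assumed relationship: Wald = q * F
p_wald = math.exp(-wald / 2.0)     # chi-square upper tail, valid for 2 d.f. only
chebychev_bound = q / wald         # assumed relationship; equals 1 / F

print(f"Wald chi-square = {wald:.7f}")
print(f"Wald P-value    = {p_wald:.5f}")
print(f"Chebychev bound = {chebychev_bound:.5f}")
```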

Test whether the coefficients on all the square-footage variables could be simultaneously zero (i.e. the null hypothesis that none of the square-footage variables matters when these are included in addition to the numbers of rooms of each type).

 |_test
 |_test sqkld=0
 |_test sqbed=0
 |_test sqbath=0
 |_end

Given the individual significance of two of the three square-footage coefficients (SQKLD and SQBED), it is not surprising that this joint hypothesis is rejected soundly. Check the P-value. If the null hypothesis were true, this F-test statistic value would be virtually impossible to observe.

 
 F STATISTIC =   40.012118     WITH    3 AND   17 D.F.  P-VALUE= 0.00000
 WALD CHI-SQUARE STATISTIC =   120.03635     WITH    3 D.F.  P-VALUE= 0.00000
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.02499

Test whether the effect of an extra mile from the beach is identical to the effect on rent of being an extra mile from campus. This hypothesis could be expressed as BEACH=UCLA, or as BEACH-UCLA=0.

 |_test beach-ucla=0
 TEST VALUE =  -19.633     STD. ERROR OF TEST VALUE   2.8434
 T STATISTIC =  -6.9046602     WITH   17 D.F.    P-VALUE= 0.00000
 F STATISTIC =   47.674333     WITH    1 AND   17 D.F.  P-VALUE= 0.00000
 WALD CHI-SQUARE STATISTIC =   47.674333     WITH    1 D.F.  P-VALUE= 0.00000
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.02098

Since the P-value associated with this test is extremely small, the null hypothesis is not supported by the data. This test value would be very unlikely to be observed if the null hypothesis were true.
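
This test is a convenient place to see how the "pcov" covariances are used for a linear combination of coefficients. The Python sketch below (illustrative names, not SHAZAM code) uses Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2) with entries copied from the variance-covariance matrix printed earlier. Small discrepancies from the SHAZAM figures reflect rounding in the printed inputs.

```python
import math

# Point estimates from the unrestricted regression.
b_beach = -27.332
b_ucla = -7.6989

# Entries from the VARIANCE-COVARIANCE MATRIX OF COEFFICIENTS.
var_beach = 10.541
var_ucla = 7.7499
cov_beach_ucla = 5.1032

# Test value and its standard error for the restriction BEACH - UCLA = 0.
test_value = b_beach - b_ucla
se_test_value = math.sqrt(var_beach + var_ucla - 2.0 * cov_beach_ucla)

t_stat = test_value / se_test_value
f_stat = t_stat ** 2   # with one restriction, F is the square of t

print(f"test value = {test_value:.3f}")
print(f"std. error = {se_test_value:.4f}")
print(f"t = {t_stat:.3f}, F = {f_stat:.2f}")
```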

 
 |_confid beach ucla

Try a joint confidence ellipse for these two coefficients.

 USING 95% AND 90% CONFIDENCE INTERVALS
 
 CONFIDENCE INTERVALS BASED ON T-DISTRIBUTION WITH  17 D.F.
 5% and 10% critical values for relevant t-distribution are given first; we usually
 use the 5% critical value.
      - T CRITICAL VALUES =   2.110 AND   1.740 
 NAME   LOWER 2.5%   LOWER 5%   COEFFICIENT   UPPER 5%   UPPER 2.5%   STD. ERROR
 BEACH     -34.18      -32.98      -27.332      -21.68      -20.48       3.247
 UCLA      -13.57      -12.54      -7.6989      -2.855      -1.825       2.784

The above table lets you read off the one-dimensional confidence intervals for each coefficient by itself (point estimate, plus lower and upper bounds). The 95% bounds are indicated by the "box" formed by the "+" signs in the plot below, and the pair of point estimates is the "*" in the center of the ellipse.
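
The interval bounds in the table can be reproduced as coefficient plus or minus (critical t value times standard error). A minimal Python sketch, with illustrative names, using the critical values and standard errors printed above:

```python
T_CRIT_95 = 2.110   # for the 95% interval (2.5% in each tail), 17 d.f.
T_CRIT_90 = 1.740   # for the 90% interval (5% in each tail), 17 d.f.

# (coefficient, standard error) from the unrestricted regression.
estimates = {"BEACH": (-27.332, 3.247), "UCLA": (-7.6989, 2.784)}


def interval(coef, se, t_crit):
    """Return the (lower, upper) bounds of a symmetric t-based interval."""
    half_width = t_crit * se
    return (coef - half_width, coef + half_width)


intervals_95 = {k: interval(c, s, T_CRIT_95) for k, (c, s) in estimates.items()}
intervals_90 = {k: interval(c, s, T_CRIT_90) for k, (c, s) in estimates.items()}

for name in estimates:
    lo95, hi95 = intervals_95[name]
    lo90, hi90 = intervals_90[name]
    print(f"{name:5s} 95%: [{lo95:.2f}, {hi95:.2f}]   90%: [{lo90:.2f}, {hi90:.2f}]")
```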

 CONFIDENCE REGION PLOT FOR beach    AND ucla
  USING F DISTRIBUTION WITH 2 AND   17 D.F.     F-VALUE =   3.590
 
 REQUIRED MEMORY IS PAR=     3 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=     6
       205 OBSERVATIONS
                    M=MULTIPLE POINT
   -18.000        |
   -19.263        |                        ** *  M  ***
   -20.526        |     +             *M*M*          +*MM
   -21.789        |               *MMM                  MM
   -23.053        |            MMM*                      M
   -24.316        |         *MM                         MM
   -25.579        |       *MM                           M
   -26.842        |     *MM                           *M
   -28.105        |    MM              *             MM
   -29.368        |   MM                           MMM
   -30.632        |  M*                          MM*
   -31.895        |  M                        *MM
   -33.158        |  M                     *MMM
   -34.421        |  M  +               MMM*         +
   -35.684        |   MM**       *****M*
   -36.947        |       * M   *
   -38.211        |
   -39.474        |
   -40.737        |
   -42.000        |
                   ________________________________________
 
           -0.16E+02 -0.12E+02 -0.80E+01 -0.40E+01  0.00E+00
 
                                UCLA  

A fancy gnuplot version of the confidence region reveals the following:

In the above diagram, pairs of coefficient values in the "++++" box are individually acceptable hypotheses about the two population coefficients, but only those pairs in the ellipse are jointly acceptable hypotheses. The lesson is that some pairs which are individually acceptable (technically, are not rejected) are not jointly acceptable (technically, are jointly rejected).
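
To make the box-versus-ellipse point concrete, here is a Python sketch that checks hypothetical coefficient pairs against both. It assumes the plotted region is the usual F-based ellipse, (b - bhat)' V^(-1) (b - bhat) / 2 <= 3.590, where V is the 2x2 block of the variance-covariance matrix for BEACH and UCLA; SHAZAM's exact construction is not shown in the output, and the two test pairs are made up for illustration.

```python
F_CRIT = 3.590                    # F-VALUE printed with the confidence region plot
B_HAT = (-27.332, -7.6989)        # point estimates (BEACH, UCLA)
V11, V12, V22 = 10.541, 5.1032, 7.7499   # Var(BEACH), Cov(BEACH, UCLA), Var(UCLA)

# Individual 95% intervals from the table above.
BOX = {"BEACH": (-34.18, -20.48), "UCLA": (-13.57, -1.825)}


def in_box(beach, ucla):
    """True if each value lies inside its own 95% interval."""
    return (BOX["BEACH"][0] <= beach <= BOX["BEACH"][1]
            and BOX["UCLA"][0] <= ucla <= BOX["UCLA"][1])


def ellipse_stat(beach, ucla):
    """Quadratic form d' V^(-1) d divided by 2 (the number of coefficients)."""
    d1, d2 = beach - B_HAT[0], ucla - B_HAT[1]
    det = V11 * V22 - V12 ** 2
    quad = (V22 * d1 * d1 - 2.0 * V12 * d1 * d2 + V11 * d2 * d2) / det
    return quad / 2.0


def in_ellipse(beach, ucla):
    return ellipse_stat(beach, ucla) <= F_CRIT


for pair in [(-22.0, -4.0), (-22.0, -12.0)]:   # hypothetical (BEACH, UCLA) pairs
    print(pair, "in box:", in_box(*pair), " in ellipse:", in_ellipse(*pair))
```

The second pair lies inside the box but outside the ellipse: each value is individually acceptable, yet the pair is jointly rejected under these assumptions.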

SHAZAM has "temporary" variables that you can render permanent by copying them into explicitly named scalars or variables. This allows you to use these values later on. The temporary variables (beginning with $...) are overwritten by subsequent OLS runs so that they always contain the current values for the most recent regression.

Let's call the most recent model the "unrestricted" model, and save its explained sum of squares as urexss, its residual sum of squares as urress, and its degrees of freedom as urdf.

 |_gen1 urexss=$ssr
 ..NOTE..CURRENT VALUE OF $SSR =  0.19335E+07
 |_gen1 urress=$sse
 ..NOTE..CURRENT VALUE OF $SSE =   6972.1
 |_gen1 urdf=$df
 ..NOTE..CURRENT VALUE OF $DF  =   17.000


 
 |_* regress rent on just the number of rooms of each type
 
 |_ols rent bed bath pkg beach ucla

We don't use a variable for number of kitchens, living rooms and dining rooms because all apartments presumably have just one of each. This variable would be collinear with the intercept term.

 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26

R-squared just tells us whether one model gives a better fit than another with the same dependent variable. We do not know the distribution of R-squared under any null hypothesis (which null hypothesis would it be?), so it is not used directly for statistical tests.

  R-SQUARE =   0.9710     R-SQUARE ADJUSTED =   0.9638
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   2810.1
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   53.010
 SUM OF SQUARED ERRORS-SSE=   56202.
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -136.714
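
For reference, the R-SQUARE and R-SQUARE ADJUSTED lines can be recovered from the sums of squares that SHAZAM prints. The Python sketch below (illustrative names) does this for the unrestricted model and for this rooms-only model; k is the number of slope coefficients in each.

```python
N_OBS = 26
SS_TOTAL = 0.19405e7   # total sum of squares about the mean (same for both models)

# model label -> (SSE, number of slope coefficients)
models = {
    "unrestricted": (6972.1, 8),
    "rooms only": (56202.0, 5),
}


def r_squared(sse):
    """Share of the variation in RENT explained by the regression."""
    return 1.0 - sse / SS_TOTAL


def adj_r_squared(sse, k):
    """R-squared penalized for the degrees of freedom used up by k slopes."""
    return 1.0 - (sse / (N_OBS - k - 1)) / (SS_TOTAL / (N_OBS - 1))


results = {name: (r_squared(sse), adj_r_squared(sse, k))
           for name, (sse, k) in models.items()}

for name, (r2, adj) in results.items():
    print(f"{name:13s} R-square = {r2:.4f}   adjusted = {adj:.4f}")
```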
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.18843E+07      5.       0.37686E+06           134.108
 ERROR             56202.         20.        2810.1               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35011E+08      6.       0.58352E+07          2076.520
 ERROR             56202.         20.        2810.1               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      20 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 BED        159.63      16.91       9.439     0.000 0.904     0.4990     0.2774
 BATH       345.57      30.46       11.35     0.000 0.930     0.6232     0.4563
 PKG        20.637      21.75      0.9487     0.354 0.208     0.0477     0.0288
 BEACH     -39.560      6.667      -5.934     0.000-0.799    -0.2744    -0.1281
 UCLA      -13.440      6.782      -1.982     0.061-0.405    -0.1085    -0.0515
 CONSTANT   470.78      56.24       8.370     0.000 0.882     0.0000     0.4171

The coefficients on the parking spot variable (PKG) and the distance from UCLA variable are individually statistically insignificant at the 5% level, although UCLA (P-value 0.061) would be significant at the 10% level.

 |_gen1 r2exss=$ssr
 ..NOTE..CURRENT VALUE OF $SSR =  0.18843E+07

We label this the second restricted model (it restricts the coefficients on the square-footage variables to be zero), so we save the explained sum of squares (alias regression sum of squares) as r2exss, for later use in a special F-test.


 
 |_* regress rent on just the square feet variables

This restricts the coefficients on the "number of rooms" variables all to be zero.

 |_ols rent sqkld sqbed sqbath pkg beach ucla
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9955     R-SQUARE ADJUSTED =   0.9940
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   463.67
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   21.533
 SUM OF SQUARED ERRORS-SSE=   8809.7
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -112.624

The information in the next table is the stuff for the "restricted model."

                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.19317E+07      6.       0.32195E+06           694.343
 ERROR             8809.7         19.        463.67               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35059E+08      7.       0.50084E+07         10801.659
 ERROR             8809.7         19.        463.67               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      19 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 SQKLD      1.1436     0.1545       7.402     0.000 0.862     0.2800     0.4181
 SQBED      1.5561     0.9696E-01   16.05     0.000 0.965     0.6281     0.2960
 SQBATH    0.99921     0.4287       2.331     0.031 0.472     0.0697     0.0508
 PKG        44.176      8.930       4.947     0.000 0.750     0.1020     0.0617
 BEACH     -29.418      2.719      -10.82     0.000-0.928    -0.2041    -0.0952
 UCLA      -10.203      2.568      -3.974     0.001-0.674    -0.0824    -0.0391
 CONSTANT   347.40      51.38       6.761     0.000 0.840     0.0000     0.3078

All of the slope coefficients are individually statistically significant at the 5% level. The least significant is SQBATH: the +2.331 and -2.331 values of a t-distribution with 19 degrees of freedom would leave 3.1% of the probability out in the tails of the distribution.

 |_gen1 r1exss=$ssr
 ..NOTE..CURRENT VALUE OF $SSR =  0.19317E+07

This saves the explained sum of squares for this particular restricted model.

 |_*  ---  now do some F-tests of interest explicitly ----- *

You can certainly use a block of test commands enclosed by TEST and END to do these joint hypothesis tests the long-hand way. What we are doing the old-fashioned way is what SHAZAM goes and does when you issue a block of TEST commands.

 |_* test whether joint contribution of bed and bath is statistically significant
 |_gen1 f1=((urexss-r1exss)/2)/(urress/urdf)
 |_print f1
     F1
    2.240282

This number has to be compared to the 5% critical value of an F-distribution with (2,17) degrees of freedom (i.e. 2 restrictions, 17 unrestricted model df). This critical value, from the back of your text, is 3.59. Our test value cannot "beat" this critical value, so we cannot reject the restrictions embodied in the restricted model saved as r1exss above, i.e. the model that restricts the "numbers of rooms" coefficients (BED and BATH) to be jointly zero.

 |_* test whether joint contribution of sqkld,sqbed and sqbath is significant
 |_gen1 f2=((urexss-r2exss)/3)/(urress/urdf)
 |_print f2
     F2
    40.01212

This number has to be compared to the 5% critical value of an F-distribution with (3,17) degrees of freedom. The critical value is 3.20. We readily beat this value with a test statistic of over 40, so we conclude that the null hypothesis--that the square-footage variables do not need to be in the model--is implausible.
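
The two explicit F-tests can be retraced outside SHAZAM. The Python sketch below (illustrative names) works with residual sums of squares rather than explained sums of squares: all three models share the same total sum of squares, so the drop in explained SS equals the rise in SSE, and the SSE values are printed with less rounding relative to their size.

```python
UR_SSE = 6972.1   # unrestricted model: residual sum of squares
UR_DF = 17        # unrestricted model: residual degrees of freedom

SSE_SQFT_ONLY = 8809.7    # drops BED and BATH (2 restrictions)
SSE_ROOMS_ONLY = 56202.0  # drops SQKLD, SQBED and SQBATH (3 restrictions)


def f_statistic(restricted_sse, n_restrictions):
    """((SSE_r - SSE_u) / q) / (SSE_u / df_u), the usual restriction F-test."""
    numerator = (restricted_sse - UR_SSE) / n_restrictions
    denominator = UR_SSE / UR_DF
    return numerator / denominator


f1 = f_statistic(SSE_SQFT_ONLY, 2)    # compare with 3.59, the 5% value for (2,17)
f2 = f_statistic(SSE_ROOMS_ONLY, 3)   # compare with 3.20, the 5% value for (3,17)

print(f"f1 = {f1:.4f}  reject at 5%: {f1 > 3.59}")
print(f"f2 = {f2:.4f}  reject at 5%: {f2 > 3.20}")
```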

 
 |_* try some of the set of auxiliary regressions to look for sources of
 |_*   multicollinearity

Remember that the AUXRSQR option on the main regression of RENT on all of these explanatory variables would cycle through each of these regressors treating each one alternately as the "dependent" variable and regressing it on all of the others.

 
 |_ols sqkld bed sqbed bath sqbath pkg beach ucla

 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = SQKLD
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 

The high R-squared value suggests that SQKLD, for example, is pretty well explained by some linear combination of the other variables on the RHS of the main unrestricted regression.

 R-SQUARE =   0.8800     R-SQUARE ADJUSTED =   0.8334
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   775.64
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   27.850
 SUM OF SQUARED ERRORS-SSE=   13961.
 MEAN OF DEPENDENT VARIABLE =   412.69
 LOG OF THE LIKELIHOOD FUNCTION = -118.610
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.10240E+06      7.        14629.                18.860
 ERROR             13961.         18.        775.64               P-VALUE
 TOTAL            0.11636E+06     25.        4654.5                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.45306E+07      8.       0.56632E+06           730.141
 ERROR             13961.         18.        775.64               P-VALUE
 TOTAL            0.45446E+07     26.       0.17479E+06             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      18 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 BED       -24.119      31.39     -0.7685     0.452-0.178    -0.3079    -0.1146
 SQBED     0.59009     0.3448       1.712     0.104 0.374     0.9726     0.3070
 BATH       80.009      42.73       1.872     0.078 0.404     0.5892     0.2889
 SQBATH    -1.2182     0.9692      -1.257     0.225-0.284    -0.3468    -0.1693
 PKG       -6.7242      12.02     -0.5594     0.583-0.131    -0.0634    -0.0257
 BEACH      3.9763      4.366      0.9109     0.374 0.210     0.1126     0.0352
 UCLA     -0.18487      3.828     -0.4829E-01 0.962-0.011    -0.0061    -0.0019
 CONSTANT   280.80      40.32       6.964     0.000 0.854     0.0000     0.6804

There may be more multicollinearity among these "pseudo-regressors" that obscures the individual contributions of these variables to explaining the variation in SQKLD, but it looks like the number of baths (and perhaps the square feet of bedrooms) could be correlated with SQKLD. If all are simultaneously included in the same regression, it might be hard to sort out their individual contributions to explaining RENT.

 |_ols bed sqkld sqbed bath sqbath pkg beach ucla  
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = BED
 ...NOTE..SAMPLE RANGE SET TO:      1,     26

Another high auxiliary R-squared value...BED included alone in any regression will pick up systematic variations in the other variables in this regression.

 R-SQUARE =   0.9598     R-SQUARE ADJUSTED =   0.9442
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.42354E-01
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.20580
 SUM OF SQUARED ERRORS-SSE=  0.76237
 MEAN OF DEPENDENT VARIABLE =   1.9615
 LOG OF THE LIKELIHOOD FUNCTION =  8.99009
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        18.199          7.        2.5999                61.385
 ERROR            0.76237         18.       0.42354E-01           P-VALUE
 TOTAL             18.962         25.       0.75846                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        118.24          8.        14.780               348.958
 ERROR            0.76237         18.       0.42354E-01           P-VALUE
 TOTAL             119.00         26.        4.5769                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      18 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 SQKLD    -0.13170E-02 0.1714E-02 -0.7685     0.452-0.178    -0.1032    -0.2771
 SQBED     0.10972E-01 0.9267E-03   11.84     0.000 0.941     1.4167     1.2009
 BATH     -0.51937     0.3227      -1.609     0.125-0.355    -0.2996    -0.3946
 SQBATH   -0.15229E-01 0.6550E-02  -2.325     0.032-0.481    -0.3397    -0.4452
 PKG       0.10788     0.8591E-01   1.256     0.225 0.284     0.0797     0.0867
 BEACH     0.84843E-01 0.2624E-01   3.233     0.005 0.606     0.1883     0.1580
 UCLA      0.46018E-01 0.2613E-01   1.761     0.095 0.383     0.1188     0.1015
 CONSTANT   1.1176     0.5085       2.198     0.041 0.460     0.0000     0.5698
 
 |_ols beach sqkld bed sqbed bath sqbath pkg ucla  
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = BEACH
 ...NOTE..SAMPLE RANGE SET TO:      1,     26

Not as high an R-squared as in the other auxiliary regressions.

 R-SQUARE =   0.5834     R-SQUARE ADJUSTED =   0.4214
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   2.1614
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   1.4702
 SUM OF SQUARED ERRORS-SSE=   38.906
 MEAN OF DEPENDENT VARIABLE =   3.6538
 LOG OF THE LIKELIHOOD FUNCTION = -42.1321
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        54.479          7.        7.7827                 3.601
 ERROR             38.906         18.        2.1614               P-VALUE
 TOTAL             93.385         25.        3.7354                 0.013
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        401.59          8.        50.199                23.225
 ERROR             38.906         18.        2.1614               P-VALUE
 TOTAL             440.50         26.        16.942                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      18 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 SQKLD     0.11081E-01 0.1217E-01  0.9109     0.374 0.210     0.3911     1.2515
 BED        4.3298      1.339       3.233     0.005 0.606     1.9510     2.3244
 SQBED    -0.50558E-01 0.1559E-01  -3.243     0.005-0.607    -2.9417    -2.9707
 BATH       1.1573      2.451      0.4722     0.642 0.111     0.3008     0.4720
 SQBATH    0.92828E-01 0.4867E-01   1.907     0.073 0.410     0.9329     1.4569
 PKG      -0.10942     0.6395     -0.1711     0.866-0.040    -0.0364    -0.0472
 UCLA     -0.48411     0.1668      -2.902     0.009-0.565    -0.5634    -0.5733
 CONSTANT  -3.3385      4.015     -0.8315     0.417-0.192     0.0000    -0.9137
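
One common way to summarize auxiliary regressions like the three above is the variance inflation factor, VIF = 1 / (1 - R-squared), where R-squared comes from the auxiliary regression. This goes slightly beyond what the SHAZAM output reports, so treat the Python sketch below as a supplementary illustration; the names are made up for the example.

```python
# Auxiliary R-squared values copied from the three regressions above.
aux_r_squared = {"SQKLD": 0.8800, "BED": 0.9598, "BEACH": 0.5834}

# VIF: factor by which collinearity with the other regressors inflates
# the variance of that regressor's coefficient in the main regression.
vif = {name: 1.0 / (1.0 - r2) for name, r2 in aux_r_squared.items()}

for name, value in vif.items():
    print(f"{name:6s} auxiliary R-square = {aux_r_squared[name]:.4f}  VIF = {value:.2f}")
```

A larger value means the coefficient on that variable in the main regression is estimated less precisely than it would be if the regressor were unrelated to the others.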

 
 
 |_* create total square feet and total distance, also to be used to test whether
 |_*  the coefficients on each set of variables are identical within the group

You could test whether the coefficients on each square foot variable were the same by using a TEST...END block of commands including TEST SQKLD=SQBED and TEST SQKLD=SQBATH. This would be two restrictions, since SQKLD could be whatever the data suggest. Likewise, testing the equality of the mileage variables could be done in a single test command, TEST BEACH=UCLA or TEST BEACH-UCLA=0. This would be one restriction.

 |_genr feet=sqkld+sqbed+sqbath
 |_genr dist=beach+ucla

[joint confidence ellipse for beach and ucla coefficients in prior model]

 |_* regress rent on only the two aggregated variables, without number of rooms
 |_*   of each type

You can impose the restriction that the coefficients on all of the square footage variables are identical by summing the variables (amounts to collecting terms with identical coefficients).

 |_ols rent feet pkg dist
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
 R-SQUARE =   0.9739     R-SQUARE ADJUSTED =   0.9704
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   2297.9
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   47.936
 SUM OF SQUARED ERRORS-SSE=   50553.
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -135.337
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.18899E+07      3.       0.62998E+06           274.157
 ERROR             50553.         22.        2297.9               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35017E+08      4.       0.87543E+07          3809.729
 ERROR             50553.         22.        2297.9               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      22 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEET       1.3455     0.6162E-01   21.84     0.000 0.978     0.9227     0.8162
 PKG        47.853      18.79       2.546     0.018 0.477     0.1105     0.0669
 DIST      -15.647      4.852      -3.225     0.004-0.567    -0.1147    -0.1106
 CONSTANT   256.88      50.30       5.107     0.000 0.737     0.0000     0.2276

The above model suggests that you pay an extra $1.35 per month, on average, for each extra square foot of apartment space. For each extra parking spot, you pay $47.85 per month. For each mile further from "amenities" (either campus or the beach) you get an apartment that is cheaper, on average, by $15.65. Do these effects seem plausible?
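
One way to get a feel for these coefficients is to plug values into the fitted equation. The Python sketch below (illustrative names) evaluates the fitted line at the sample means, where the mean of FEET is the sum of the three square-footage means and the mean of DIST is the sum of the two distance means from the descriptive statistics. An OLS fit with an intercept passes through the means, so the prediction should come out close to the mean rent of 1128.8, up to rounding.

```python
# Coefficients from "ols rent feet pkg dist".
CONSTANT, B_FEET, B_PKG, B_DIST = 256.88, 1.3455, 47.853, -15.647


def predicted_rent(feet, pkg, dist):
    """Fitted monthly rent for given square feet, parking spots and total miles."""
    return CONSTANT + B_FEET * feet + B_PKG * pkg + B_DIST * dist


# Sample means from the STAT output.
mean_feet = 412.69 + 214.69 + 57.346   # SQKLD + SQBED + SQBATH
mean_pkg = 1.5769
mean_dist = 3.6538 + 4.3269            # BEACH + UCLA

rent_at_means = predicted_rent(mean_feet, mean_pkg, mean_dist)

# Effect of 100 extra square feet, holding parking and distance fixed.
extra_100_feet = predicted_rent(mean_feet + 100, mean_pkg, mean_dist) - rent_at_means

print(f"predicted rent at the means: {rent_at_means:.1f}")
print(f"effect of 100 extra sq. ft.: {extra_100_feet:.2f}")
```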

 |_* regress rent on feet and distance with number of rooms
 
 |_ols rent feet bed bath pkg dist  
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9793     R-SQUARE ADJUSTED =   0.9741
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   2009.4
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   44.827
 SUM OF SQUARED ERRORS-SSE=   40189.
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -132.355
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.19003E+07      5.       0.38006E+06           189.136
 ERROR             40189.         20.        2009.4               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35027E+08      6.       0.58379E+07          2905.228
 ERROR             40189.         20.        2009.4               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      20 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEET       1.3206     0.2416       5.467     0.000 0.774     0.9056     0.8011 
 BED        29.407      30.98      0.9492     0.354 0.208     0.0919     0.0511
 BATH      -34.426      64.58     -0.5331     0.600-0.118    -0.0621    -0.0455
 PKG        48.744      18.71       2.605     0.017 0.503     0.1126     0.0681
 DIST      -20.364      5.099      -3.994     0.001-0.666    -0.1493    -0.1440
 CONSTANT   303.78      59.58       5.099     0.000 0.752     0.0000     0.2691

 |_* regress rent on feet, bed, bath, pkg and separate distance variables
 
 |_ols rent feet bed bath pkg beach ucla
 
 REQUIRED MEMORY IS PAR=     6 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9952     R-SQUARE ADJUSTED =   0.9937
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   487.15
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   22.071
 SUM OF SQUARED ERRORS-SSE=   9255.8
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -113.266
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.19312E+07      6.       0.32187E+06           660.728
 ERROR             9255.8         19.        487.15               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35058E+08      7.       0.50083E+07         10280.960
 ERROR             9255.8         19.        487.15               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      19 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEET       1.1804     0.1202       9.817     0.000 0.914     0.8094     0.7160
 BED        26.753      15.26       1.753     0.096 0.373     0.0836     0.0465
 BATH       43.613      33.27       1.311     0.206 0.288     0.0786     0.0576
 PKG        40.256      9.275       4.340     0.000 0.706     0.0930     0.0562
 BEACH     -31.737      2.888      -10.99     0.000-0.930    -0.2202    -0.1027
 UCLA      -9.6086      2.850      -3.371     0.003-0.612    -0.0776    -0.0368
 CONSTANT   297.13      29.35       10.12     0.000 0.918     0.0000     0.2632

Being a mile closer to the beach costs more, on average (about $31.74 per month), than being a mile closer to UCLA (about $9.61 per month), holding the other variables constant.

Compared to the original main model, this is yet another possible set of restrictions on the parameters. Save the explained sum of squares and construct an F-test of the restrictions embodied in this model.

 |_gen1 r3exss=$ssr
 ..NOTE..CURRENT VALUE OF $SSR =  0.19312E+07
 |_gen1 f3=((urexss-r3exss)/2)/(urress/urdf)
 |_print f3
     F3
    2.784101

Compare this to the 5% critical value of an F-distributed random variable with (2,17) degrees of freedom (which is 3.59). Since 2.78 is less than 3.59, we are not "out in the tail" of this F-distribution, so we cannot reject the two restrictions embodied in the last regression (relative to the first one).
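
The same kind of test can be computed from error sums of squares instead of explained sums of squares: when two models share the same dependent variable and sample, the total sum of squares is identical, so the two forms give the same numerator. The sketch below is an illustration in Python rather than SHAZAM, and the function name is made up for this example. It compares two other regressions printed above: `ols rent feet pkg dist` (restricted) against `ols rent feet bed bath pkg dist` (unrestricted).

```python
def f_statistic(sse_restricted, sse_unrestricted, num_restrictions, df_unrestricted):
    """F statistic for linear restrictions, computed from error sums of squares."""
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / df_unrestricted
    return numerator / denominator

# Restricted:   ols rent feet pkg dist           -> SSE = 50553., 22 df
# Unrestricted: ols rent feet bed bath pkg dist  -> SSE = 40189., 20 df
# Restrictions: coefficients on BED and BATH are both zero (2 restrictions).
f_bed_bath = f_statistic(50553.0, 40189.0, 2, 20)
print(round(f_bed_bath, 3))
```

The result is about 2.58, which is below the 5% critical value for F with (2,20) degrees of freedom (3.49). By this calculation, BED and BATH do not add significantly to the model once FEET is included, which agrees with their individual t-ratios.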

 

Activist looks at relationship between rent and distance from ucla, ignoring other explanatory variables.

 |_*  part (h.)
 
 |_plot rent ucla
 This plot looks pretty much like a "blob" with little systematic relationship.
 
 REQUIRED MEMORY IS PAR=     3 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=     4
        26 OBSERVATIONS
                    *=RENT
                    M=MULTIPLE POINT
    1800.0        |
    1736.8        |
    1673.7        |         *              *
    1610.5        |
    1547.4        |              *
    1484.2        |                        *
    1421.1        |
    1357.9        |    *              *
    1294.7        |                 *
    1231.6        |                        *
    1168.4        |                             *         M
    1105.3        |                                  *
    1042.1        |                   M    *
    978.95        |                        *
    915.79        |    *                        M
    852.63        |    *    *                        M
    789.47        |                   *
    726.32        |*             *
    663.16        |
    600.00        |
                   ________________________________________
 
               0.000     2.000     4.000     6.000     8.000
 
                                UCLA  

 |_ols rent ucla
 
 REQUIRED MEMORY IS PAR=     4 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.0013     R-SQUARE ADJUSTED =  -0.0403
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   80748.
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   284.16
 SUM OF SQUARED ERRORS-SSE=  0.19380E+07
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -182.740
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        2523.7          1.        2523.7                 0.031
 ERROR            0.19380E+07     24.        80748.               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.861

We cannot reject the null hypothesis that "all slopes are simultaneously zero." In this case, that means just one slope (the one on UCLA), so the F-test is equivalent to the t-test (the F-statistic is the square of the t-ratio). Check that 0.031 is roughly the square of 0.1768. This means that there is no statistically significant relationship between distance from UCLA and rents when other variables are ignored. And the point estimate is even positive, suggesting that, if anything, rents are lower closer to UCLA.
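
The check is a one-liner. Here is a small Python sketch (not SHAZAM) using the UCLA coefficient and standard error reported in the coefficient table for this regression.

```python
coefficient = 4.4671     # UCLA slope from the output
standard_error = 25.27   # its standard error

t_ratio = coefficient / standard_error
f_from_t = t_ratio ** 2  # with a single slope, F equals t squared

print(round(t_ratio, 4), round(f_from_t, 3))
```

Up to rounding in the printed output, the squared t-ratio reproduces the F-statistic in the ANALYSIS OF VARIANCE - FROM MEAN table.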

 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.33130E+08      2.       0.16565E+08           205.141
 ERROR            0.19380E+07     24.        80748.               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      24 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 UCLA       4.4671      25.27      0.1768     0.861 0.036     0.0361     0.0171
 CONSTANT   1109.4      122.7       9.041     0.000 0.879     0.0000     0.9829

Activist now controls for distance from beach before looking at effect of distance from ucla on rental rates.

 |_ols rent ucla beach

This regression controls for distance from the beach before trying to determine the effect of an additional mile from UCLA on RENT.

 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
 NOTE crummy R-squared value.
   R-SQUARE =   0.0050    R-SQUARE ADJUSTED = -0.0815
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   83949.
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   289.74
 SUM OF SQUARED ERRORS-SSE=  0.19308E+07
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -182.692
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        9661.5          2.        4830.7                 0.058
 ERROR            0.19308E+07     23.        83949.               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.944

Cannot reject null hypothesis that slopes on UCLA and BEACH are simultaneously equal to zero--namely that neither of these two variables explains RENT.

 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.33137E+08      3.       0.11046E+08           131.575
 ERROR            0.19308E+07     23.        83949.               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      23 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 UCLA     -0.25200      30.43     -0.8283E-02 0.993-0.002    -0.0020    -0.0010
 BEACH     -10.324      35.41     -0.2916     0.773-0.061    -0.0716    -0.0334
 CONSTANT   1167.6      235.4       4.960     0.000 0.719     0.0000     1.0344

The failure of the F-test to reject zero slopes is borne out by the t-tests of the two coefficients individually. When you cannot reject the hypothesis that both slopes are jointly zero, you will usually (though not inevitably) be unable to reject the hypotheses that each is individually zero. It still looks like distance from UCLA has no statistically significant effect on rental rates. Now, however, the point estimate is negative, suggesting that greater distance means lower rents (if anything).

 |_ols rent ucla beach bed
 Include number of bedrooms to see if this explains rents.

 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
 R-squared goes up a lot...
   R-SQUARE =   0.7277     R-SQUARE ADJUSTED =   0.6906
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   24014.
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   154.97
 SUM OF SQUARED ERRORS-SSE=  0.52831E+06
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -165.844
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.14122E+07      3.       0.47072E+06            19.602
 ERROR            0.52831E+06     22.        24014.               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000

Hypothesis that all slopes simultaneously zero is now soundly rejected by the F-test for the "overall significance of the regression."

 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.34539E+08      4.       0.86348E+07           359.571
 ERROR            0.52831E+06     22.        24014.               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      22 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 UCLA      -48.924      17.47      -2.800     0.010-0.513    -0.3950    -0.1875
 BEACH     -39.371      19.32      -2.038     0.054-0.399    -0.2731    -0.1274
 BED        292.06      38.22       7.642     0.000 0.852     0.9130     0.5075
 CONSTANT   911.42      130.3       6.995     0.000 0.831     0.0000     0.8074

Variables UCLA and BED are now strongly significant at the 5% level and BEACH approaches significance at the 5% level (it is significant at the 10% level).

 |_ols rent ucla beach bed pkg
 Examine what happens as we control for number of parking spots as well.... 

 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.7846     R-SQUARE ADJUSTED =   0.7436
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   19904.
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   141.08
 SUM OF SQUARED ERRORS-SSE=  0.41798E+06
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -162.799
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.15225E+07      4.       0.38063E+06            19.123
 ERROR            0.41798E+06     21.        19904.               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.34650E+08      5.       0.69299E+07           348.171
 ERROR            0.41798E+06     21.        19904.               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      21 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 UCLA      -49.748      15.91      -3.126     0.005-0.564    -0.4016    -0.1907
 BEACH     -44.295      17.71      -2.501     0.021-0.479    -0.3073    -0.1434
 BED        243.16      40.52       6.001     0.000 0.795     0.7601     0.4226
 PKG        123.82      52.59       2.354     0.028 0.457     0.2859     0.1730
 CONSTANT   833.64      123.1       6.770     0.000 0.828     0.0000     0.7385

Now all coefficients are individually significant at the 5% level, so it is no surprise that the F-test readily rejects the hypothesis that all slopes are jointly zero. Controlling for BED and PKG finally reveals a strongly significant negative relationship between rental rates and the distance from the apartment to campus. In other words, the closer you rent to campus, the more you will pay. The activist's intuition has finally been confirmed by the data. Check the STAT / PCOR output to see why the coefficients change as BED and PKG are added to the model.
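
One way to see the mechanism: if an omitted variable (here BED) raises rent and is positively correlated with distance from UCLA, the slope on UCLA in the short regression picks up part of the BED effect and can even change sign. The Python sketch below uses made-up data, not the larent.dat sample, and an assumed "true" relationship chosen only for illustration. It checks the standard omitted-variable identity: short slope = long slope + (coefficient on omitted variable) x (slope from regressing the omitted variable on the included one).

```python
def simple_slope(x, y):
    """Slope from a least-squares regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Made-up data (NOT the larent.dat sample): larger apartments sit farther out.
bed = [1, 1, 2, 2, 3, 3]
ucla = [1, 3, 3, 5, 5, 7]

# Assumed "true" relationship, chosen only for illustration.
TRUE_BED, TRUE_UCLA = 300.0, -50.0
rent = [1000.0 + TRUE_BED * b + TRUE_UCLA * u for b, u in zip(bed, ucla)]

short_slope = simple_slope(ucla, rent)   # rent on ucla alone, BED omitted
delta = simple_slope(ucla, bed)          # how BED moves with UCLA
implied = TRUE_UCLA + TRUE_BED * delta   # omitted-variable identity

print(round(short_slope, 2), round(implied, 2))
```

In this toy example the short-regression slope on distance is positive even though the assumed effect of distance is negative. The STAT output for the bedroom subsets in this solution set (mean UCLA of 3.4, 4.79 and 5.0 miles for one-, two- and three-bedroom apartments) suggests the same kind of positive association in the real data.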


 
 |_* part (i.)
 |_* for a point estimate, plug the values of the variables into an 
 |_* appropriate fitted regression model, see what predicted rent emerges.
 |_* Do not worry about the confidence interval for now, although there are
 |_* ways of getting SHAZAM to produce this for you, we have not covered
 |_* this explicitly for multiple regression.  

The gist of the idea is to view the predicted value of rent for a certain apartment profile as a linear combination of the estimated coefficients (random variables with individual variances and covariances), where the weights are the "numbers" (apartment characteristics) that you plug into the fitted formula. To construct a confidence interval for the mean prediction, which gives you a range of plausible hypotheses about the expected value of RENT for these characteristics, you first need the variance of this linear combination; its square root is the standard error to use in the usual confidence interval formula.

You can get the required parameter variances and covariances by using the / PCOV option on the OLS command. Unlike the use of this option on the STAT command, you will get the variances and covariances for the parameter estimates, rather than for the variables. These variances and covariances get plugged into the usual general formula for the variance of a linear combination of random variables.
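
As a rough sketch of the arithmetic (in Python, not SHAZAM), the example below uses the simple `ols rent ucla` regression because it has only two coefficients. The coefficients and standard errors are taken from that output; the covariance between the two estimates is NOT shown in this solution set, so the value used here is a placeholder assumption, and the apartment profile (2 miles from UCLA) is hypothetical. The function name is also made up for this illustration.

```python
import math

def linear_combination_variance(weights, covariance):
    """Var(sum_j w_j * b_j) = sum_j sum_k w_j * w_k * Cov(b_j, b_k)."""
    k = len(weights)
    return sum(weights[i] * weights[j] * covariance[i][j]
               for i in range(k) for j in range(k))

# Coefficients and standard errors from "ols rent ucla" (order: UCLA, CONSTANT).
coefficients = [4.4671, 1109.4]
std_errors = [25.27, 122.7]

# ASSUMPTION: the covariance between the two estimates is a placeholder.
# The real value would come from the / PCOV output, which is not shown here.
assumed_cov = -2759.0
covariance = [[std_errors[0] ** 2, assumed_cov],
              [assumed_cov, std_errors[1] ** 2]]

# Hypothetical apartment profile: 2 miles from UCLA (the 1 multiplies CONSTANT).
profile = [2.0, 1.0]

prediction = sum(w * b for w, b in zip(profile, coefficients))
variance = linear_combination_variance(profile, covariance)
std_error_of_mean = math.sqrt(variance)

t_critical = 2.064  # two-sided 5% critical value of t with 24 df
lower = prediction - t_critical * std_error_of_mean
upper = prediction + t_critical * std_error_of_mean
print(round(prediction, 2), round(lower, 2), round(upper, 2))
```

For a multiple regression the steps are the same: the profile simply has one entry per coefficient (plus a 1 for the intercept), and the covariance matrix is the full matrix reported by / PCOV.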

 
 |_* part (j.)
 |_* You want to be really careful about making "out of sample" predictions.
 |_* These data are outside the range of the data used for estimation.
 

 
 |_*  part (k.)
 
 |_plot rent pkg

All apartments in the sample have either 0, 1, or 2 parking spots.

Average rent appears to go up as number of parking spaces increases.

 REQUIRED MEMORY IS PAR=     3 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=     4
        26 OBSERVATIONS
                    *=RENT
                    M=MULTIPLE POINT
    1800.0        |
    1736.8        |
    1673.7        |                                       M
    1610.5        |
    1547.4        |                                       *
    1484.2        |                                       *
    1421.1        |
    1357.9        |                                       M
    1294.7        |                                       *
    1231.6        |                   *
    1168.4        |                                       M
    1105.3        |                                       *
    1042.1        |                                       M
    978.95        |                                       *
    915.79        |                   M                   *
    852.63        |                   M                   *
    789.47        |                   *
    726.32        |M
    663.16        |
    600.00        |
                   ________________________________________
 
               0.000     0.500     1.000     1.500     2.000
 
                                PKG
 
 
 |_ols rent beach ucla bed bath pkg
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       26 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9710     R-SQUARE ADJUSTED =   0.9638
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   2810.1
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   53.010
 SUM OF SQUARED ERRORS-SSE=   56202.
 MEAN OF DEPENDENT VARIABLE =   1128.8
 LOG OF THE LIKELIHOOD FUNCTION = -136.714
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.18843E+07      5.       0.37686E+06           134.108
 ERROR             56202.         20.        2810.1               P-VALUE
 TOTAL            0.19405E+07     25.        77619.                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.35011E+08      6.       0.58352E+07          2076.520
 ERROR             56202.         20.        2810.1               P-VALUE
 TOTAL            0.35068E+08     26.       0.13488E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      20 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 BEACH     -39.560      6.667      -5.934     0.000-0.799    -0.2744    -0.1281
 UCLA      -13.440      6.782      -1.982     0.061-0.405    -0.1085    -0.0515
 BED        159.63      16.91       9.439     0.000 0.904     0.4990     0.2774
 BATH       345.57      30.46       11.35     0.000 0.930     0.6232     0.4563
 PKG        20.637      21.75      0.9487     0.354 0.208     0.0477     0.0288
 CONSTANT   470.78      56.24       8.370     0.000 0.882     0.0000     0.4171

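P-values for the zero hypothesis for the slopes on UCLA and PKG are too large; we cannot reject the zero hypothesis for these two coefficients at the 5% level (although UCLA, with a p-value of 0.061, would be significant at the 10% level).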

 |_* use the subset with one bedroom
 |_skipif(bed.ne.1)  Note only ten observations  
 |_stat rent sqkld sqbed sqbath pkg beach ucla
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 RENT        10   886.60       96.814       9372.9       757.00       1064.0
 SQKLD       10   372.00       28.694       823.33       330.00       420.00
 SQBED       10   102.20       24.621       606.18       76.000       150.00
 SQBATH      10   45.600       14.010       196.27       26.000       70.000
 PKG         10   1.2000      0.78881      0.62222      0.00000       2.0000
 BEACH       10   3.8000       2.3944       5.7333      0.00000       7.0000
 UCLA        10   3.4000       2.3664       5.6000      0.00000       7.0000
 
 |_ols rent sqkld sqbed sqbath pkg beach ucla
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       10 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9833     R-SQUARE ADJUSTED =   0.9499
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   469.40
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   21.666
 SUM OF SQUARED ERRORS-SSE=   1408.2
 MEAN OF DEPENDENT VARIABLE =   886.60
 LOG OF THE LIKELIHOOD FUNCTION = -38.9268
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        82948.          6.        13825.                29.452
 ERROR             1408.2          3.        469.40               P-VALUE
 TOTAL             84356.          9.        9372.9                 0.009
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.79435E+07      7.       0.11348E+07          2417.557
 ERROR             1408.2          3.        469.40               P-VALUE
 TOTAL            0.79450E+07     10.       0.79450E+06             0.000
 
 These results are for the subset of 1-bedroom apartments in the sample.
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR       3 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 SQKLD      1.6706     0.5404       3.091     0.054 0.872     0.4951     0.7009
 SQBED      3.1643      1.261       2.510     0.087 0.823     0.8047     0.3648
 SQBATH    -3.0520      3.161     -0.9656     0.405-0.487    -0.4416    -0.1570
 PKG        62.660      17.84       3.513     0.039 0.897     0.5105     0.0848
 BEACH     -37.074      11.43      -3.244     0.048-0.882    -0.9169    -0.1589
 UCLA      -19.556      10.62      -1.841     0.163-0.728    -0.4780    -0.0750
 CONSTANT   213.11      168.6       1.264     0.295 0.590     0.0000     0.2404

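Slopes on SQBATH and UCLA are not individually statistically significantly different from zero, even at the 10% level. (SQKLD and SQBED are significant at the 10% level but just miss at the 5% level.)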

 |_delete skip$
 VARIABLE SKIP$    IS DELETED        26 WORDS RELEASED
 This gets back the observations dropped for the preceding regression.

 |_* use the subset with three bedrooms
 |_skipif(bed.ne.3) Note that there are nine observations in this subset.
 |_stat rent sqkld sqbed sqbath pkg beach ucla
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 RENT         9   1393.2       232.72       54157.       1118.0       1709.0
 SQKLD        9   471.67       79.530       6325.0       360.00       600.00
 SQBED        9   330.00       70.578       4981.3       240.00       420.00
 SQBATH       9   66.667       19.685       387.50       35.000       90.000
 PKG          9   2.0000      0.00000      0.00000       2.0000       2.0000
 BEACH        9   3.7778       1.9861       3.9444       1.0000       6.0000
 UCLA         9   5.0000       2.5495       6.5000       1.0000       8.0000

Here, there is no variance in the number of parking spaces. It is 2 for all three-bedroom apartments in the sample.

 |_ols rent sqkld sqbed sqbath pkg beach ucla
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
        9 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 ...WARNING...VARIABLE PKG      IS A CONSTANT
 ...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW   7

This cryptic comment from SHAZAM means that somehow, you have got perfect multicollinearity in your explanatory variables. The matrix to which SHAZAM is referring is the matrix inner product (X'X), which has to be inverted to solve for the vector of OLS parameter point estimates in the matrix version of regression. If this matrix is not "positive definite," then it cannot be inverted. "ROW 7" refers to the implicit 7th "variable" on the right hand side: the intercept term, which is a column of ones. Since SHAZAM has already included PKG (2 for all observations) in the regression, when it gets to the intercept (1 for all observations), it detects perfect multicollinearity. SHAZAM 8.0 is now nice enough to tell you the identity of the culprit variable(s): PKG in this case.
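
A small Python sketch (not SHAZAM) shows why inversion fails. It builds only the part of X'X that involves PKG and the intercept for the nine three-bedroom apartments and computes its determinant. Looking at just these two columns is a simplification made for illustration; the full matrix has seven rows and columns.

```python
n = 9
pkg = [2.0] * n        # PKG is 2 for every three-bedroom apartment
const = [1.0] * n      # the intercept's column of ones

# Elements of X'X for just these two columns.
a = sum(p * p for p in pkg)                 # sum of PKG squared
b = sum(p * c for p, c in zip(pkg, const))  # cross product
d = sum(c * c for c in const)               # sum of ones squared

determinant = a * d - b * b
print(a, b, d, determinant)
```

Because PKG is exactly twice the column of ones, the determinant is zero, and the full X'X matrix is singular as well.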

So now we drop the constant PKG variable and see if the rest of the model works. The effect of parking is now absorbed into the intercept, but the other coefficients can be estimated.

 |_ols rent sqkld sqbed sqbath beach ucla
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
        9 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
  R-SQUARE =   0.9936     R-SQUARE ADJUSTED =   0.9831
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   917.70
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   30.294
 SUM OF SQUARED ERRORS-SSE=   2753.1
 MEAN OF DEPENDENT VARIABLE =   1393.2
 LOG OF THE LIKELIHOOD FUNCTION = -38.5251
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.43050E+06      5.        86100.                93.822
 ERROR             2753.1          3.        917.70               P-VALUE
 TOTAL            0.43326E+06      8.        54157.                 0.002
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.17900E+08      6.       0.29834E+07          3250.909
 ERROR             2753.1          3.        917.70               P-VALUE
 TOTAL            0.17903E+08      9.       0.19892E+07             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR       3 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 SQKLD      1.5025     0.8502       1.767     0.175 0.714     0.5135     0.5087
 SQBED      1.1701      1.044       1.121     0.344 0.543     0.3549     0.2772
 SQBATH    0.57250      1.238      0.4624     0.675 0.258     0.0484     0.0274
 BEACH     -30.972      8.305      -3.729     0.034-0.907    -0.2643    -0.0840
 UCLA      -11.895      5.817      -2.045     0.133-0.763    -0.1303    -0.0427
 CONSTANT   436.71      125.7       3.475     0.040 0.895     0.0000     0.3135
 |_delete skip$
 VARIABLE SKIP$    IS DELETED        26 WORDS RELEASED


 |_* try the subset with two bedrooms
 |_skipif(bed.ne.2)   Note only seven observations...

 |_stat rent sqkld sqbed sqbath pkg beach ucla
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 RENT         7   1134.7       185.63       34458.       900.00       1364.0
 SQKLD        7   395.00       37.193       1383.3       360.00       450.00
 SQBED        7   227.14       62.640       3923.8       150.00       300.00
 SQBATH       7   62.143       19.334       373.81       40.000       90.000
 PKG          7   1.5714      0.53452      0.28571       1.0000       2.0000
 BEACH        7   3.2857       1.2199       1.4881       1.0000       5.0000
 UCLA         7   4.7857       1.2864       1.6548       3.5000       7.0000
 
 |_ols rent sqkld sqbed sqbath pkg beach ucla
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
        7 OBSERVATIONS     DEPENDENT VARIABLE = RENT
 ...NOTE..SAMPLE RANGE SET TO:      1,     26
 
 ...WARNING..ZERO DEGREES OF FREEDOM LEFT

 Note perfect R-squared value 
  R-SQUARE =   1.0000     R-SQUARE ADJUSTED =   1.0000
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.19359E-24  (essentially zero!) 
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.43999E-12  (essentially zero!)
 SUM OF SQUARED ERRORS-SSE=  0.19359E-24
 MEAN OF DEPENDENT VARIABLE =   1134.7
 LOG OF THE LIKELIHOOD FUNCTION =  196.042
 
                      ANALYSIS OF VARIANCE - FROM MEAN

The program bombs and may even "throw you out." SHAZAM does not take kindly to being asked to do something truly stupid. With seven observations and seven parameters (six slopes plus the intercept) there are zero degrees of freedom, so we cannot calculate variances and everything comes to a halt. Just as two points are enough to fit a line (which has two parameters) perfectly but leave zero error variance around the line, we need more than seven points to fit a "hyperplane" with seven parameters and still estimate the error variance.
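
The two-point analogy can be sketched in a few lines of Python (not SHAZAM). The two data points are hypothetical.

```python
# Two hypothetical points, and a line with two parameters (intercept, slope).
x = [1.0, 3.0]
y = [900.0, 1300.0]

slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(r * r for r in residuals)
degrees_of_freedom = len(x) - 2   # observations minus estimated parameters

print(sse, degrees_of_freedom)
# sigma**2 = SSE / df would be 0 / 0 here, so no standard errors can be formed.
```

The fit is perfect, but with nothing left over there is no information about the error variance, which is the situation in the seven-observation regression above.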


Updated: 11/2/98; Prepared by: Trudy Ann Cameron