If you have not reached this page using a link from the associated problem set questions page, you may want to link to the questions for this homework set. The character [] has been added to the end of each question to provide a link to the relevant part of this solution set.
The first question on this problem set was the subject of study.sha and colds.sha examples. These were covered in the lab sessions (in Lab #4) so I will not duplicate the discussion here.
This outline of solutions pertains to the larent.sha program and larent.dat data set for multiple regression. See comments in red in the following SHAZAM output. These will help you address some of the questions in the problem set. They will also increase your familiarity with interpreting SHAZAM output in general.
PLEASE keep sight of the fact that problem sets are intended to "stretch" you. Struggling with how to do something makes the achievement more rewarding (in theory). Since I always detested "regurgitation" homeworks, these are intentionally challenging. I would be truly amazed if anyone actually achieved these answers based on a cold start (so to speak). If you are privy to solution sets from prior quarters, of course, you will have an advantage over some of your classmates. Depending upon how much time I have, homeworks are revised relatively more or less from previous editions of the class.
I am pleased to see that so many of you are meeting the challenges of these homeworks. Remember, the adage is "learn it any way you can." Collaboration is strongly advised.
|_sample 1 26
|_* nowarnskip suppresses endless messages that certain observations are
|_* being skipped
|_set nowarnskip
|_read(larent.dat) rent sqkld bed sqbed bath sqbath pkg beach ucla
UNIT 88 IS NOW ASSIGNED TO: larent.dat
9 VARIABLES AND 26 OBSERVATIONS STARTING AT OBS 1
|_* get descriptive statistics and data covariance matrix
|_stat / pcor
NAME N MEAN ST. DEV VARIANCE MINIMUM MAXIMUM
RENT 26 1128.8 278.60 77619. 757.00 1709.0
SQKLD 26 412.69 68.224 4654.5 330.00 600.00
BED 26 1.9615 0.87090 0.75846 1.0000 3.0000
SQBED 26 214.69 112.45 12646. 76.000 420.00
BATH 26 1.4904 0.50240 0.25240 0.75000 2.5000
SQBATH 26 57.346 19.424 377.28 26.000 90.000
PKG 26 1.5769 0.64331 0.41385 0.00000 2.0000
BEACH 26 3.6538 1.9327 3.7354 0.00000 7.0000
UCLA 26 4.3269 2.2492 5.0588 0.00000 8.0000
CORRELATION MATRIX OF VARIABLES - 26 OBSERVATIONS
RENT 1.0000
SQKLD 0.90355 1.0000
BED 0.79161 0.63128 1.0000
SQBED 0.97359 0.86365 0.88332 1.0000
BATH 0.85260 0.88772 0.47908 0.79912 1.0000
SQBATH 0.81551 0.80990 0.47611 0.78374 0.93596
1.0000
PKG 0.61452 0.47357 0.54096 0.59032 0.54385
0.58200 1.0000
BEACH -0.70540E-01 0.18785 -0.82261E-02 0.17343E-01 0.23331
0.22601 0.10270 1.0000
UCLA 0.36064E-01 -0.17801 0.31298 0.92693E-01 -0.26260
-0.16430 0.12706 -0.53193
1.0000
RENT SQKLD BED SQBED BATH
SQBATH PKG BEACH UCLA
|_plot beach ucla
Observe the negative correlation (-0.53 in the correlation matrix above) between distance from the beach and
distance from UCLA, with a few "exceptions" highlighted in red.
REQUIRED MEMORY IS PAR= 3 CURRENT PAR= 500
FOR MAXIMUM EFFICIENCY USE AT LEAST PAR= 3
26 OBSERVATIONS
*=BEACH
M=MULTIPLE POINT
8.0000 |
7.5789 |
7.1579 |
6.7368 | *
6.3158 |
5.8947 |* M * * *
5.4737 |
5.0526 |
4.6316 | * *
4.2105 |
3.7895 | * *
3.3684 | * *
2.9474 | M M M
2.5263 |
2.1053 |
1.6842 | M *
1.2632 |
0.84211 | * * *
0.42105 |
0.44409E-14 | *
________________________________________
0.000 2.000 4.000 6.000 8.000
UCLA
|_* regress rent on everything available
|_* save the relevant sums of squares and degrees of freedom to use in
|_* explicit F-tests later, but do automated F-tests following the ols.
|_ols rent sqkld bed sqbed bath sqbath pkg beach ucla / pcov
Note that when you ask for "pcov" on OLS, you get the variance-covariance matrix for the vector of fitted coefficients. If you were explicitly calculating a variance (and standard error) for some linear combination of estimated coefficients, you could use these covariances in the formulas.
Coefficient on PKG is the effect on expected rent of an additional parking space. Coefficient on BEACH is the effect on expected rent of being one mile further from the beach. We would expect this to be negative, since the beach is considered an amenity.
REQUIRED MEMORY IS PAR= 6 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9964 R-SQUARE ADJUSTED = 0.9947
VARIANCE OF THE ESTIMATE-SIGMA**2 = 410.13
STANDARD ERROR OF THE ESTIMATE-SIGMA = 20.252
SUM OF SQUARED ERRORS-SSE= 6972.1
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -109.583
Note that the overall F-test for the joint significance of the complete set of slope coefficients soundly rejects the hypothesis that all slopes could be simultaneously zero. The P-value indicates that the probability out in the tail of the relevant F-distribution (beyond the 589.306 cutoff) is smaller than 0.0005.
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.19335E+07 8. 0.24169E+06 589.306
ERROR 6972.1 17. 410.13 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
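If you want to convince yourself where the 589.306 comes from, here is a minimal sketch that rebuilds the overall F-statistic from the sums of squares and degrees of freedom printed in the table above. The variable names are mine, not SHAZAM's, and the regression sum of squares is only printed to five significant digits, so expect agreement to about that precision.

```python
# Overall F-statistic rebuilt from the "ANALYSIS OF VARIANCE - FROM MEAN" table.
# All inputs are copied from the printed output; names are illustrative only.
ssr = 0.19335e7   # regression (explained) sum of squares, as printed
sse = 6972.1      # error (residual) sum of squares
k = 8             # number of slope coefficients (regression DF)
df_error = 17     # 26 observations - 8 slopes - 1 intercept

ms_regression = ssr / k        # mean square for the regression
ms_error = sse / df_error      # mean square error (SIGMA**2 in the output)
f_overall = ms_regression / ms_error

print(f"MS regression = {ms_regression:.1f}")
print(f"MS error      = {ms_error:.2f}")
print(f"overall F     = {f_overall:.1f}")
```

The mean square error computed here is the same number SHAZAM reports as "VARIANCE OF THE ESTIMATE-SIGMA**2".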
We generally ignore the ANOVA from ZERO...
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35061E+08 9. 0.38956E+07 9498.627
ERROR 6972.1 17. 410.13 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
Slope coefficient P-values of less than 0.05 tell us that the corresponding t-test statistic value is far enough out in the tail of the relevant t-distribution (df=17 here) that less than 5% of the probability lies beyond the symmetric pair of cutoffs defined by this t-ratio value. Thus, we reject the null hypotheses that the associated coefficients are individually equal to zero; these coefficients are individually statistically significantly different from zero.
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 17 DF P-VALUE CORR. COEFFICIENT AT MEANS
SQKLD 0.95126 0.1714 5.550 0.000 0.803 0.2329 0.3478
BED -16.049 23.19 -0.6919 0.498-0.166 -0.0502 -0.0279
SQBED 1.7343 0.2703 6.416 0.000 0.841 0.7000 0.3299
BATH 55.194 33.97 1.625 0.123 0.367 0.0995 0.0729
SQBATH -0.25117 0.7350 -0.3417 0.737-0.083 -0.0175 -0.0128
PKG 43.974 8.817 4.988 0.000 0.771 0.1015 0.0614
BEACH -27.332 3.247 -8.418 0.000-0.898 -0.1896 -0.0885
UCLA -7.6989 2.784 -2.766 0.013-0.557 -0.0622 -0.0295
CONSTANT 391.31 56.36 6.943 0.000 0.860 0.0000 0.3467
Including numbers of bedrooms (BED) and numbers of bathrooms (BATH) in addition to the areas in bedrooms and bathrooms (SQBED, SQBATH) does not individually add much to the explanatory power of the model, since the coefficients on BED and BATH are not individually significant.
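The T-RATIO column is nothing more than each estimated coefficient divided by its standard error. The sketch below recomputes the ratios for the slopes in the full model and compares each one with 2.110, the two-sided 5% critical value for 17 degrees of freedom that SHAZAM reports in the CONFID output for this same regression. The dictionary and variable names are my own; only the numbers come from the output.

```python
# (coefficient, standard error) pairs copied from the coefficient table
# of the full regression of RENT on all eight explanatory variables.
estimates = {
    "SQKLD":  (0.95126, 0.1714),
    "BED":    (-16.049, 23.19),
    "SQBED":  (1.7343, 0.2703),
    "BATH":   (55.194, 33.97),
    "SQBATH": (-0.25117, 0.7350),
    "PKG":    (43.974, 8.817),
    "BEACH":  (-27.332, 3.247),
    "UCLA":   (-7.6989, 2.784),
}

# Two-sided 5% critical value of the t-distribution with 17 DF,
# taken from the "T CRITICAL VALUES" line of the CONFID output.
T_CRITICAL_5PCT = 2.110

# t-ratio = estimated coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

significant = sorted(n for n, t in t_ratios.items() if abs(t) > T_CRITICAL_5PCT)
insignificant = sorted(n for n, t in t_ratios.items() if abs(t) <= T_CRITICAL_5PCT)

for name, t in t_ratios.items():
    print(f"{name:7s} t = {t:8.3f}")
print("significant at 5%:    ", significant)
print("not significant at 5%:", insignificant)
```

Comparing |t| with the critical value gives the same verdicts as comparing the printed P-values with 0.05.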
VARIANCE-COVARIANCE MATRIX OF COEFFICIENTS
SQKLD 0.29376E-01
BED 0.70850 537.96
SQBED -0.17334E-01 -5.9025 0.73077E-01
BATH -2.3503 279.40 -3.0318 1153.6
SQBATH 0.35786E-01 8.1927 -0.99788E-01 -11.715 0.54024
PKG 0.19753 -58.038 0.44427 -60.824 -1.4257
77.731
BEACH -0.11681 -45.642 0.53296 -12.199 -0.97854
1.1534 10.541
UCLA 0.54307E-02 -24.756 0.19110 15.180 -0.65328
-1.8470 5.1032 7.7499
CONSTANT -8.2488 -601.24 9.8087 100.05 -14.286
-9.4428 35.193 -29.136 3176.0
SQKLD BED SQBED BATH SQBATH
PKG BEACH UCLA CONSTANT
While the individual coefficients on BED and BATH are not significantly different from zero, let's see whether they could be jointly equal to zero. If there is multicollinearity between the variables BED and BATH, it is possible we simply cannot distinguish their separate contributions.
|_test
|_test bed=0
|_test bath=0
|_end
This F-test shows that the null hypothesis that the slopes on BED and BATH are simultaneously zero cannot be rejected at the 5% level of significance (nor at the 10% level, although at the 14% level, we could reject). There is roughly 13.7% of the probability out in the right-hand tail of the relevant F-distribution if the null hypothesis is true.
F STATISTIC = 2.2402820 WITH 2 AND 17 D.F. P-VALUE= 0.13691
WALD CHI-SQUARE STATISTIC = 4.4805640 WITH 2 D.F. P-VALUE= 0.10643
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.44637
Test whether the coefficients on all the square-footage variables could be simultaneously zero (i.e., the null hypothesis is that none of the square-footage variables matters when these are included in addition to the numbers of rooms of each type).
|_test
|_test sqkld=0
|_test sqbed=0
|_test sqbath=0
|_end
Given the individual significance of the coefficients, it is not surprising that this joint hypothesis is rejected soundly. Check the P-value. If the null hypothesis were true, an F-test statistic value this large would be virtually impossible to observe.
F STATISTIC = 40.012118 WITH 3 AND 17 D.F. P-VALUE= 0.00000
WALD CHI-SQUARE STATISTIC = 120.03635 WITH 3 D.F. P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.02499
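The three lines SHAZAM prints for each joint test are closely related. Judging from the numbers in the output (I am reading this off the printed values, not quoting the SHAZAM manual), the Wald chi-square statistic is the F statistic multiplied by the number of restrictions q, and the Chebychev upper bound is q divided by the Wald statistic. A quick check, with names of my own choosing:

```python
# (description, number of restrictions q, F statistic) from the two TEST blocks.
tests = [
    ("BED = BATH = 0", 2, 2.2402820),
    ("SQKLD = SQBED = SQBATH = 0", 3, 40.012118),
]

results = {}
for label, q, f_stat in tests:
    wald = q * f_stat                      # Wald chi-square with q D.F.
    chebychev_bound = min(1.0, q / wald)   # bound on the P-value; needs no distributional assumption
    results[label] = (wald, chebychev_bound)
    print(f"{label}: Wald = {wald:.5f}, Chebychev bound = {chebychev_bound:.5f}")
```

The Chebychev bound is very conservative, which is why it is so much larger than the F-based P-value in both tests.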
Test whether the effect of an extra mile from the beach is identical to the effect on rent of being an extra mile from campus. This hypothesis could be expressed as BEACH=UCLA, or as BEACH-UCLA=0.
|_test beach-ucla=0
TEST VALUE = -19.633 STD. ERROR OF TEST VALUE 2.8434
T STATISTIC = -6.9046602 WITH 17 D.F. P-VALUE= 0.00000
F STATISTIC = 47.674333 WITH 1 AND 17 D.F. P-VALUE= 0.00000
WALD CHI-SQUARE STATISTIC = 47.674333 WITH 1 D.F. P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.02098
Since the P-value associated with this test is extremely small, the null hypothesis is not supported by the data. A test value this far from zero would be very unlikely to be observed if the null hypothesis were true.
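This is exactly the kind of calculation the "pcov" output is for. Here is a minimal sketch of the test done by hand, using the variance of the BEACH coefficient, the variance of the UCLA coefficient and their covariance from the VARIANCE-COVARIANCE MATRIX OF COEFFICIENTS printed for the full regression. The variable names are mine; the inputs are rounded as printed, so the last digit or two can differ from SHAZAM's values.

```python
import math

# Point estimates from the full regression.
b_beach = -27.332
b_ucla = -7.6989

# Entries of the variance-covariance matrix of the coefficients (pcov output).
var_beach = 10.541
var_ucla = 7.7499
cov_beach_ucla = 5.1032

# Var(b1 - b2) = Var(b1) + Var(b2) - 2*Cov(b1, b2)
test_value = b_beach - b_ucla
variance = var_beach + var_ucla - 2.0 * cov_beach_ucla
std_error = math.sqrt(variance)

t_stat = test_value / std_error
f_stat = t_stat ** 2   # with a single restriction, F is the square of t

print(f"test value = {test_value:.3f}")
print(f"std. error = {std_error:.4f}")
print(f"t = {t_stat:.3f}, F = {f_stat:.2f}")
```

Note that the covariance enters with a minus sign because we are looking at a difference of two coefficients; for a sum it would enter with a plus sign.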
|_confid beach ucla
Try a joint confidence ellipse for these two coefficients.
USING 95% AND 90% CONFIDENCE INTERVALS
CONFIDENCE INTERVALS BASED ON T-DISTRIBUTION WITH 17 D.F.
5% and 10% critical values for relevant t-distribution are given first; we usually
use the 5% critical value.
- T CRITICAL VALUES = 2.110 AND 1.740
NAME LOWER 2.5% LOWER 5% COEFFICIENT UPPER 5% UPPER 2.5% STD. ERROR
BEACH -34.18 -32.98 -27.332 -21.68 -20.48 3.247
UCLA -13.57 -12.54 -7.6989 -2.855 -1.825 2.784
The above table lets you read off the one-dimensional confidence intervals for each coefficient by itself (midpoint and bottom and top boundaries). These boundaries are indicated by the "box" formed by the "+" signs in the plot below, and the joint mean is the "*" in the center of the ellipse.
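Each boundary in the table is just the coefficient plus or minus the relevant t critical value times the standard error. A minimal sketch that reproduces the table (function and variable names are my own):

```python
# Two-sided critical values for the t-distribution with 17 DF, as printed.
T_5PCT = 2.110    # gives the 95% interval (2.5% in each tail)
T_10PCT = 1.740   # gives the 90% interval (5% in each tail)

# (coefficient, standard error) from the CONFID table.
coefs = {"BEACH": (-27.332, 3.247), "UCLA": (-7.6989, 2.784)}


def interval(coef, se, t_crit):
    """Return (lower, upper) for coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


intervals = {
    name: {"95%": interval(c, se, T_5PCT), "90%": interval(c, se, T_10PCT)}
    for name, (c, se) in coefs.items()
}

for name, by_level in intervals.items():
    for level, (lower, upper) in by_level.items():
        print(f"{name:5s} {level}: [{lower:8.3f}, {upper:8.3f}]")
```

Neither 95% interval contains zero, which agrees with the individual t-tests for BEACH and UCLA in the full regression.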
CONFIDENCE REGION PLOT FOR beach AND ucla
USING F DISTRIBUTION WITH 2 AND 17 D.F. F-VALUE = 3.590
REQUIRED MEMORY IS PAR= 3 CURRENT PAR= 500
FOR MAXIMUM EFFICIENCY USE AT LEAST PAR= 6
205 OBSERVATIONS
M=MULTIPLE POINT
-18.000 |
-19.263 | ** * M ***
-20.526 | + *M*M* +*MM
-21.789 | *MMM MM
-23.053 | MMM* M
-24.316 | *MM MM
-25.579 | *MM M
-26.842 | *MM *M
-28.105 | MM * MM
-29.368 | MM MMM
-30.632 | M* MM*
-31.895 | M *MM
-33.158 | M *MMM
-34.421 | M + MMM* +
-35.684 | MM** *****M*
-36.947 | * M *
-38.211 |
-39.474 |
-40.737 |
-42.000 |
________________________________________
-0.16E+02 -0.12E+02 -0.80E+01 -0.40E+01 0.00E+00
UCLA
A fancy gnuplot version of the confidence region reveals the following:
In the above diagram, pairs of coefficient values in the "++++" box are individually acceptable hypotheses about the two population coefficients, but only those pairs in the ellipse are jointly acceptable hypotheses. The lesson is that some pairs which are individually acceptable (technically, are not rejected) are not jointly acceptable (technically, are jointly rejected).
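To make the box-versus-ellipse point concrete, here is a rough sketch that checks whether a hypothesized pair of coefficient values lies inside the 95% box and inside the 95% joint region. I am assuming the standard textbook form of the joint region, (b - bhat)' V^(-1) (b - bhat) <= q*F, with q = 2 and the F-value of 3.590 from the plot header; V is the 2x2 block of the coefficient covariance matrix for BEACH and UCLA. The two trial points are hypothetical values picked for illustration, not anything from the problem set.

```python
# Point estimates and covariance entries from the full regression output.
B_HAT = {"BEACH": -27.332, "UCLA": -7.6989}
VAR_BEACH, VAR_UCLA, COV = 10.541, 7.7499, 5.1032

F_CRIT = 3.590   # 5% critical value of F(2,17), from the CONFID plot header
Q = 2            # two coefficients are being tested jointly

# Individual 95% intervals from the CONFID table (the "+" box).
BOX_95 = {"BEACH": (-34.18, -20.48), "UCLA": (-13.57, -1.825)}


def in_box(beach, ucla):
    """True if each value lies inside its own 95% confidence interval."""
    lo_b, hi_b = BOX_95["BEACH"]
    lo_u, hi_u = BOX_95["UCLA"]
    return lo_b <= beach <= hi_b and lo_u <= ucla <= hi_u


def ellipse_distance(beach, ucla):
    """Quadratic form d' V^(-1) d using the closed-form 2x2 inverse."""
    d1 = beach - B_HAT["BEACH"]
    d2 = ucla - B_HAT["UCLA"]
    det = VAR_BEACH * VAR_UCLA - COV ** 2
    return (VAR_UCLA * d1 * d1 - 2.0 * COV * d1 * d2 + VAR_BEACH * d2 * d2) / det


def in_ellipse(beach, ucla):
    """True if the pair is inside the 95% joint confidence region."""
    return ellipse_distance(beach, ucla) <= Q * F_CRIT


# Hypothetical trial pairs (BEACH coefficient, UCLA coefficient).
for pair in [(-21.0, -13.0), (-21.0, -2.5)]:
    print(pair, "in box:", in_box(*pair), "in ellipse:", in_ellipse(*pair))
```

The first trial pair sits inside the box but outside the ellipse: each value is individually acceptable, yet the pair is jointly rejected. The second pair is inside both. The difference comes from the positive covariance between the two estimated coefficients, which tilts the ellipse.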
SHAZAM has "temporary" variables that you can render permanent by copying them into explicitly named scalars or variables. This allows you to use these values later on. The temporary variables (beginning with $...) are overwritten by subsequent OLS runs so that they always contain the current values for the most recent regression.
Let's call the most recent model the "unrestricted" model, and save its explained sum of squares as urexss, its residual sum of squares as urress, and its degrees of freedom as urdf.
|_gen1 urexss=$ssr
..NOTE..CURRENT VALUE OF $SSR = 0.19335E+07
|_gen1 urress=$sse
..NOTE..CURRENT VALUE OF $SSE = 6972.1
|_gen1 urdf=$df
..NOTE..CURRENT VALUE OF $DF = 17.000
|_* regress rent on just the number of rooms of each type
|_ols rent bed bath pkg beach ucla
We don't use a variable for the number of kitchens, living rooms and dining rooms because all apartments presumably have just one of each. This variable would be collinear with the intercept term.
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-squared just tells whether a model gives a better fit than another with the same dependent variable. We do not know a distribution for R-squared under the null hypothesis (what null hypothesis?), so it is not used for statistical tests.
R-SQUARE = 0.9710 R-SQUARE ADJUSTED = 0.9638
VARIANCE OF THE ESTIMATE-SIGMA**2 = 2810.1
STANDARD ERROR OF THE ESTIMATE-SIGMA = 53.010
SUM OF SQUARED ERRORS-SSE= 56202.
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -136.714
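For reference, both R-squared figures can be rebuilt from sums of squares. The sketch below uses the SSE just printed for this regression and the total sum of squares around the mean of RENT, 0.19405E+07, which is the same for every regression with RENT as the dependent variable (it appears in the ANOVA FROM MEAN table of the first regression). Variable names are mine.

```python
sse = 56202.0       # sum of squared errors for this regression
tss = 0.19405e7     # total sum of squares of RENT around its mean
n = 26              # observations
k = 5               # slope coefficients: BED, BATH, PKG, BEACH, UCLA

# R-squared: share of the variation in RENT explained by the regression.
r_squared = 1.0 - sse / tss

# Adjusted R-squared: same idea, but each sum of squares is divided by
# its degrees of freedom, which penalizes adding regressors.
adj_r_squared = 1.0 - (sse / (n - k - 1)) / (tss / (n - 1))

print(f"R-SQUARE          = {r_squared:.4f}")
print(f"R-SQUARE ADJUSTED = {adj_r_squared:.4f}")
```

Dropping the three square-footage variables lowers R-squared only from 0.9964 to 0.9710, which looks small; the F-test is what tells you whether that drop is statistically meaningful.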
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.18843E+07 5. 0.37686E+06 134.108
ERROR 56202. 20. 2810.1 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35011E+08 6. 0.58352E+07 2076.520
ERROR 56202. 20. 2810.1 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 20 DF P-VALUE CORR. COEFFICIENT AT MEANS
BED 159.63 16.91 9.439 0.000 0.904 0.4990 0.2774
BATH 345.57 30.46 11.35 0.000 0.930 0.6232 0.4563
PKG 20.637 21.75 0.9487 0.354 0.208 0.0477 0.0288
BEACH -39.560 6.667 -5.934 0.000-0.799 -0.2744 -0.1281
UCLA -13.440 6.782 -1.982 0.061-0.405 -0.1085 -0.0515
CONSTANT 470.78 56.24 8.370 0.000 0.882 0.0000 0.4171
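The coefficient on the parking spot variable is individually statistically insignificant, and the coefficient on the distance from ucla variable is insignificant at the 5% level (its P-value of 0.061 means it would be significant at the 10% level).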
|_gen1 r2exss=$ssr
..NOTE..CURRENT VALUE OF $SSR = 0.18843E+07
This is a second restricted model, so we save the explained sum of squares (alias regression sum of squares) as r2exss, for later use in a special F-test.
|_* regress rent on just the square feet variables
This restricts the coefficients on the "number of rooms" variables all to be zero.
|_ols rent sqkld sqbed sqbath pkg beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9955 R-SQUARE ADJUSTED = 0.9940
VARIANCE OF THE ESTIMATE-SIGMA**2 = 463.67
STANDARD ERROR OF THE ESTIMATE-SIGMA = 21.533
SUM OF SQUARED ERRORS-SSE= 8809.7
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -112.624
The information in the next table is the stuff for the "restricted model."
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.19317E+07 6. 0.32195E+06 694.343
ERROR 8809.7 19. 463.67 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35059E+08 7. 0.50084E+07 10801.659
ERROR 8809.7 19. 463.67 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 19 DF P-VALUE CORR. COEFFICIENT AT MEANS
SQKLD 1.1436 0.1545 7.402 0.000 0.862 0.2800 0.4181
SQBED 1.5561 0.9696E-01 16.05 0.000 0.965 0.6281 0.2960
SQBATH 0.99921 0.4287 2.331 0.031 0.472 0.0697 0.0508
PKG 44.176 8.930 4.947 0.000 0.750 0.1020 0.0617
BEACH -29.418 2.719 -10.82 0.000-0.928 -0.2041 -0.0952
UCLA -10.203 2.568 -3.974 0.001-0.674 -0.0824 -0.0391
CONSTANT 347.40 51.38 6.761 0.000 0.840 0.0000 0.3078
All of the individual slope coefficients are individually statistically significant at the 5% level. The +2.331 and -2.331 values of a t-distribution with 19 degrees of freedom would leave 3.1% of the probability out in the tails of the distribution.
|_gen1 r1exss=$ssr
..NOTE..CURRENT VALUE OF $SSR = 0.19317E+07
This saves the explained sum of squares for this particular restricted model.
|_* --- now do some F-tests of interest explicitly ----- *
You can certainly use a block of test commands enclosed by TEST and END to do these joint hypothesis tests automatically. Here we do them the long-hand way. What we are doing the old-fashioned way is what SHAZAM goes and does when you issue a block of TEST commands.
|_* test whether joint contribution of bed and bath is statistically significant
|_gen1 f1=((urexss-r1exss)/2)/(urress/urdf)
|_print f1
F1
2.240282
This number has to be compared to the 5% critical value of an F-distribution with (2,17) degrees of freedom (i.e. 2 restrictions, 17 unrestricted model df). This critical value, from the back of your text, is 3.59. Our test value cannot "beat" this critical value, so we cannot reject the restrictions embodied in the first restricted model above, i.e. the model that restricts the "numbers of rooms" coefficients to be jointly zero.
|_* test whether joint contribution of sqkld,sqbed and sqbath is significant
|_gen1 f2=((urexss-r2exss)/3)/(urress/urdf)
|_print f2
F2
40.01212
This number has to be compared to the 5% critical value of an F-distribution with (3,17) degrees of freedom. The critical value is 3.20. We readily beat this value with a test statistic of over 40, so we conclude that the null hypothesis--that the square-footage variables do not need to be in the model--is implausible.
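If you want to redo these two F-tests on a calculator, here is a minimal sketch. It uses the sums of squared errors rather than the explained sums of squares: because every model has the same dependent variable, the total sum of squares is identical, so the gain in explained sum of squares equals the drop in SSE. The SSE values are also printed with more significant digits than $SSR in the output, so the hand calculation lands closer to SHAZAM's answer. The function name is my own.

```python
def restriction_f(sse_restricted, sse_unrestricted, n_restrictions, df_unrestricted):
    """F statistic for testing a restricted model against an unrestricted one."""
    numerator = (sse_restricted - sse_unrestricted) / n_restrictions
    denominator = sse_unrestricted / df_unrestricted
    return numerator / denominator


UR_SSE, UR_DF = 6972.1, 17   # unrestricted model: all eight regressors

# Restricted model 1 drops BED and BATH (SSE = 8809.7): 2 restrictions.
f1 = restriction_f(8809.7, UR_SSE, 2, UR_DF)

# Restricted model 2 drops SQKLD, SQBED and SQBATH (SSE = 56202.): 3 restrictions.
f2 = restriction_f(56202.0, UR_SSE, 3, UR_DF)

# 5% critical values quoted in the text for F(2,17) and F(3,17).
F_CRIT_2_17, F_CRIT_3_17 = 3.59, 3.20

print(f"f1 = {f1:.4f}  reject at 5%? {f1 > F_CRIT_2_17}")
print(f"f2 = {f2:.4f}  reject at 5%? {f2 > F_CRIT_3_17}")
```

These agree with the F statistics that the TEST...END blocks reported earlier for the same two hypotheses.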
|_* try some of the set of auxiliary regressions to look for sources of
|_* multicollinearity
Remember that the AUXRSQR option on the main regression of RENT on all of these explanatory variables would cycle through each of these regressors treating each one alternately as the "dependent" variable and regressing it on all of the others.
|_ols sqkld bed sqbed bath sqbath pkg beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = SQKLD
...NOTE..SAMPLE RANGE SET TO: 1, 26
The high R-squared value suggests that SQKLD, for example, is pretty well explained by some linear combination of the other variables on the RHS of the main unrestricted regression.
R-SQUARE = 0.8800 R-SQUARE ADJUSTED = 0.8334
VARIANCE OF THE ESTIMATE-SIGMA**2 = 775.64
STANDARD ERROR OF THE ESTIMATE-SIGMA = 27.850
SUM OF SQUARED ERRORS-SSE= 13961.
MEAN OF DEPENDENT VARIABLE = 412.69
LOG OF THE LIKELIHOOD FUNCTION = -118.610
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.10240E+06 7. 14629. 18.860
ERROR 13961. 18. 775.64 P-VALUE
TOTAL 0.11636E+06 25. 4654.5 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.45306E+07 8. 0.56632E+06 730.141
ERROR 13961. 18. 775.64 P-VALUE
TOTAL 0.45446E+07 26. 0.17479E+06 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 18 DF P-VALUE CORR. COEFFICIENT AT MEANS
BED -24.119 31.39 -0.7685 0.452-0.178 -0.3079 -0.1146
SQBED 0.59009 0.3448 1.712 0.104 0.374 0.9726 0.3070
BATH 80.009 42.73 1.872 0.078 0.404 0.5892 0.2889
SQBATH -1.2182 0.9692 -1.257 0.225-0.284 -0.3468 -0.1693
PKG -6.7242 12.02 -0.5594 0.583-0.131 -0.0634 -0.0257
BEACH 3.9763 4.366 0.9109 0.374 0.210 0.1126 0.0352
UCLA -0.18487 3.828 -0.4829E-01 0.962-0.011 -0.0061 -0.0019
CONSTANT 280.80 40.32 6.964 0.000 0.854 0.0000 0.6804
There may be more multicollinearity among these "pseudo-regressors" that obscures the individual contributions of these variables to explaining the variation in SQKLD, but it looks like the number of baths (and perhaps the square feet of bedrooms) could be correlated with SQKLD. If all are simultaneously included in the same regression, it might be hard to sort out their individual contributions to explaining RENT.
|_ols bed sqkld sqbed bath sqbath pkg beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = BED
...NOTE..SAMPLE RANGE SET TO: 1, 26
Another high auxiliary R-squared value...BED included alone in any regression will pick up systematic variations in the other variables in this regression.
R-SQUARE = 0.9598 R-SQUARE ADJUSTED = 0.9442
VARIANCE OF THE ESTIMATE-SIGMA**2 = 0.42354E-01
STANDARD ERROR OF THE ESTIMATE-SIGMA = 0.20580
SUM OF SQUARED ERRORS-SSE= 0.76237
MEAN OF DEPENDENT VARIABLE = 1.9615
LOG OF THE LIKELIHOOD FUNCTION = 8.99009
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 18.199 7. 2.5999 61.385
ERROR 0.76237 18. 0.42354E-01 P-VALUE
TOTAL 18.962 25. 0.75846 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 118.24 8. 14.780 348.958
ERROR 0.76237 18. 0.42354E-01 P-VALUE
TOTAL 119.00 26. 4.5769 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 18 DF P-VALUE CORR. COEFFICIENT AT MEANS
SQKLD -0.13170E-02 0.1714E-02 -0.7685 0.452-0.178 -0.1032 -0.2771
SQBED 0.10972E-01 0.9267E-03 11.84 0.000 0.941 1.4167 1.2009
BATH -0.51937 0.3227 -1.609 0.125-0.355 -0.2996 -0.3946
SQBATH -0.15229E-01 0.6550E-02 -2.325 0.032-0.481 -0.3397 -0.4452
PKG 0.10788 0.8591E-01 1.256 0.225 0.284 0.0797 0.0867
BEACH 0.84843E-01 0.2624E-01 3.233 0.005 0.606 0.1883 0.1580
UCLA 0.46018E-01 0.2613E-01 1.761 0.095 0.383 0.1188 0.1015
CONSTANT 1.1176 0.5085 2.198 0.041 0.460 0.0000 0.5698
|_ols beach sqkld bed sqbed bath sqbath pkg ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = BEACH
...NOTE..SAMPLE RANGE SET TO: 1, 26
Not as high an R-squared as in the other auxiliary regressions.
R-SQUARE = 0.5834 R-SQUARE ADJUSTED = 0.4214
VARIANCE OF THE ESTIMATE-SIGMA**2 = 2.1614
STANDARD ERROR OF THE ESTIMATE-SIGMA = 1.4702
SUM OF SQUARED ERRORS-SSE= 38.906
MEAN OF DEPENDENT VARIABLE = 3.6538
LOG OF THE LIKELIHOOD FUNCTION = -42.1321
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 54.479 7. 7.7827 3.601
ERROR 38.906 18. 2.1614 P-VALUE
TOTAL 93.385 25. 3.7354 0.013
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 401.59 8. 50.199 23.225
ERROR 38.906 18. 2.1614 P-VALUE
TOTAL 440.50 26. 16.942 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 18 DF P-VALUE CORR. COEFFICIENT AT MEANS
SQKLD 0.11081E-01 0.1217E-01 0.9109 0.374 0.210 0.3911 1.2515
BED 4.3298 1.339 3.233 0.005 0.606 1.9510 2.3244
SQBED -0.50558E-01 0.1559E-01 -3.243 0.005-0.607 -2.9417 -2.9707
BATH 1.1573 2.451 0.4722 0.642 0.111 0.3008 0.4720
SQBATH 0.92828E-01 0.4867E-01 1.907 0.073 0.410 0.9329 1.4569
PKG -0.10942 0.6395 -0.1711 0.866-0.040 -0.0364 -0.0472
UCLA -0.48411 0.1668 -2.902 0.009-0.565 -0.5634 -0.5733
CONSTANT -3.3385 4.015 -0.8315 0.417-0.192 0.0000 -0.9137
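One common way to summarize auxiliary R-squared values is the variance inflation factor, VIF = 1/(1 - R-squared), which says how much the variance of a coefficient is inflated relative to the case where that regressor is uncorrelated with the others. The VIF is not part of the SHAZAM output above; it is my own addition, computed from the three auxiliary R-squared values shown.

```python
import math

# Auxiliary R-squared values from the three auxiliary regressions above.
aux_r_squared = {"SQKLD": 0.8800, "BED": 0.9598, "BEACH": 0.5834}

# Variance inflation factor for each regressor.
vif = {name: 1.0 / (1.0 - r2) for name, r2 in aux_r_squared.items()}

for name, value in vif.items():
    # sqrt(VIF) is the factor by which the standard error is inflated.
    print(f"{name:6s} VIF = {value:6.2f}  std. error inflated by {math.sqrt(value):.2f}x")
```

BED stands out: its variance is inflated roughly 25-fold, which helps explain why its coefficient is individually insignificant in the full regression even though bedrooms clearly matter for rent.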
|_* create total square feet and total distance, also to be used to test whether
|_* the coefficients on each set of variables are identical within the group
You could test whether the coefficients on each square foot variable were the same by using a TEST...END block of commands including TEST SQKLD=SQBED and TEST SQKLD=SQBATH. This would be two restrictions, since SQKLD could be whatever the data suggest. Likewise, testing the equality of the mileage variables could be done in a single test command, TEST BEACH=UCLA or TEST BEACH-UCLA =0. This would be one restriction.
|_genr feet=sqkld+sqbed+sqbath
|_genr dist=beach+ucla
|_* regress rent on only the two aggregated variables, without number of rooms
|_* of each type
You can impose the restriction that the coefficients on all of the square footage variables are identical by summing the variables (amounts to collecting terms with identical coefficients).
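In symbols (the betas are my notation, not SHAZAM's), collecting terms works like this for the square-footage variables; the same argument applies to BEACH and UCLA with DIST:

```latex
\begin{aligned}
\text{RENT} &= \beta_0 + \beta_1\,\text{SQKLD} + \beta_2\,\text{SQBED} + \beta_3\,\text{SQBATH} + \dots + \varepsilon \\
\text{if } \beta_1 = \beta_2 = \beta_3 = \beta:\quad
\text{RENT} &= \beta_0 + \beta\,(\text{SQKLD} + \text{SQBED} + \text{SQBATH}) + \dots + \varepsilon \\
&= \beta_0 + \beta\,\text{FEET} + \dots + \varepsilon
\end{aligned}
```

Three separate coefficients are replaced by one, so equating the square-footage coefficients counts as two restrictions.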
|_ols rent feet pkg dist
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9739 R-SQUARE ADJUSTED = 0.9704
VARIANCE OF THE ESTIMATE-SIGMA**2 = 2297.9
STANDARD ERROR OF THE ESTIMATE-SIGMA = 47.936
SUM OF SQUARED ERRORS-SSE= 50553.
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -135.337
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.18899E+07 3. 0.62998E+06 274.157
ERROR 50553. 22. 2297.9 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35017E+08 4. 0.87543E+07 3809.729
ERROR 50553. 22. 2297.9 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 22 DF P-VALUE CORR. COEFFICIENT AT MEANS
FEET 1.3455 0.6162E-01 21.84 0.000 0.978 0.9227 0.8162
PKG 47.853 18.79 2.546 0.018 0.477 0.1105 0.0669
DIST -15.647 4.852 -3.225 0.004-0.567 -0.1147 -0.1106
CONSTANT 256.88 50.30 5.107 0.000 0.737 0.0000 0.2276
The above model suggests that you pay an extra $1.35 per month, on average, for each extra square foot of apartment space. For each extra parking spot, you pay $47.85 per month. For each mile further from "amenities" (either campus or the beach) you get an apartment that is cheaper, on average, by $15.65. Do these effects seem plausible?
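One way to get a feel for these coefficients is to plug in values. The sketch below evaluates the fitted equation at the sample means taken from the descriptive statistics at the top of the output (FEET and DIST means are the sums of the component means). An OLS regression with an intercept passes through the point of means, so the prediction should come out very close to the mean rent of 1128.8. The function name is my own.

```python
# Fitted coefficients from the regression of RENT on FEET, PKG and DIST.
COEF = {"CONSTANT": 256.88, "FEET": 1.3455, "PKG": 47.853, "DIST": -15.647}

# Sample means from the STAT output.
MEANS = {"SQKLD": 412.69, "SQBED": 214.69, "SQBATH": 57.346,
         "PKG": 1.5769, "BEACH": 3.6538, "UCLA": 4.3269}


def predict_rent(feet, pkg, dist):
    """Predicted monthly rent from the fitted three-variable model."""
    return (COEF["CONSTANT"] + COEF["FEET"] * feet
            + COEF["PKG"] * pkg + COEF["DIST"] * dist)


# FEET and DIST are sums of variables, so their means are sums of means.
feet_mean = MEANS["SQKLD"] + MEANS["SQBED"] + MEANS["SQBATH"]
dist_mean = MEANS["BEACH"] + MEANS["UCLA"]

rent_at_means = predict_rent(feet_mean, MEANS["PKG"], dist_mean)

# Effect of 100 extra square feet, holding parking and distance fixed.
extra_100_feet = predict_rent(feet_mean + 100, MEANS["PKG"], dist_mean) - rent_at_means

print(f"predicted rent at the means = {rent_at_means:.1f}")
print(f"effect of 100 extra sq. ft. = {extra_100_feet:.2f}")
```

The small gap between the prediction at the means and 1128.8 is just rounding in the printed coefficients and means.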
|_* regress rent on feet and distance with number of rooms
|_ols rent feet bed bath pkg dist
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9793 R-SQUARE ADJUSTED = 0.9741
VARIANCE OF THE ESTIMATE-SIGMA**2 = 2009.4
STANDARD ERROR OF THE ESTIMATE-SIGMA = 44.827
SUM OF SQUARED ERRORS-SSE= 40189.
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -132.355
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.19003E+07 5. 0.38006E+06 189.136
ERROR 40189. 20. 2009.4 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35027E+08 6. 0.58379E+07 2905.228
ERROR 40189. 20. 2009.4 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 20 DF P-VALUE CORR. COEFFICIENT AT MEANS
FEET 1.3206 0.2416 5.467 0.000 0.774 0.9056 0.8011
BED 29.407 30.98 0.9492 0.354 0.208 0.0919 0.0511
BATH -34.426 64.58 -0.5331 0.600-0.118 -0.0621 -0.0455
PKG 48.744 18.71 2.605 0.017 0.503 0.1126 0.0681
DIST -20.364 5.099 -3.994 0.001-0.666 -0.1493 -0.1440
CONSTANT 303.78 59.58 5.099 0.000 0.752 0.0000 0.2691
|_* regress rent on feet, bed, bath, pkg and separate distance variables
|_ols rent feet bed bath pkg beach ucla
REQUIRED MEMORY IS PAR= 6 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9952 R-SQUARE ADJUSTED = 0.9937
VARIANCE OF THE ESTIMATE-SIGMA**2 = 487.15
STANDARD ERROR OF THE ESTIMATE-SIGMA = 22.071
SUM OF SQUARED ERRORS-SSE= 9255.8
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -113.266
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.19312E+07 6. 0.32187E+06 660.728
ERROR 9255.8 19. 487.15 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35058E+08 7. 0.50083E+07 10280.960
ERROR 9255.8 19. 487.15 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 19 DF P-VALUE CORR. COEFFICIENT AT MEANS
FEET 1.1804 0.1202 9.817 0.000 0.914 0.8094 0.7160
BED 26.753 15.26 1.753 0.096 0.373 0.0836 0.0465
BATH 43.613 33.27 1.311 0.206 0.288 0.0786 0.0576
PKG 40.256 9.275 4.340 0.000 0.706 0.0930 0.0562
BEACH -31.737 2.888 -10.99 0.000-0.930 -0.2202 -0.1027
UCLA -9.6086 2.850 -3.371 0.003-0.612 -0.0776 -0.0368
CONSTANT 297.13 29.35 10.12 0.000 0.918 0.0000 0.2632
Being a mile closer to the beach costs more, on average, than being a mile closer to ucla.
Compared to the original main model, this is yet another possible set of restrictions on the parameters. Save the explained sum of squares and construct an F-test of the restrictions embodied in this model.
|_gen1 r3exss=$ssr
..NOTE..CURRENT VALUE OF $SSR = 0.19312E+07
|_gen1 f3=((urexss-r3exss)/2)/(urress/urdf)
|_print f3
F3
2.784101
Compare to the 5% critical value of an F-distributed random variable with (2,17) degrees of freedom (which is 3.59). We are not "out in the tail" of this F-distribution, so we cannot reject the two restrictions embodied in the last regression (relative to the first one), namely that the coefficients on the three square-footage variables are equal.
Activist looks at relationship between rent and distance from ucla, ignoring other explanatory variables.
|_* part (h.)
|_plot rent ucla
This plot looks pretty much like a "blob" with little systematic relationship.
REQUIRED MEMORY IS PAR= 3 CURRENT PAR= 500
FOR MAXIMUM EFFICIENCY USE AT LEAST PAR= 4
26 OBSERVATIONS
*=RENT
M=MULTIPLE POINT
1800.0 |
1736.8 |
1673.7 | * *
1610.5 |
1547.4 | *
1484.2 | *
1421.1 |
1357.9 | * *
1294.7 | *
1231.6 | *
1168.4 | * M
1105.3 | *
1042.1 | M *
978.95 | *
915.79 | * M
852.63 | * * M
789.47 | *
726.32 |* *
663.16 |
600.00 |
________________________________________
0.000 2.000 4.000 6.000 8.000
UCLA
|_ols rent ucla
REQUIRED MEMORY IS PAR= 4 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.0013 R-SQUARE ADJUSTED = -0.0403
VARIANCE OF THE ESTIMATE-SIGMA**2 = 80748.
STANDARD ERROR OF THE ESTIMATE-SIGMA = 284.16
SUM OF SQUARED ERRORS-SSE= 0.19380E+07
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -182.740
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 2523.7 1. 2523.7 0.031
ERROR 0.19380E+07 24. 80748. P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.861
We cannot reject the null hypothesis that "all slopes are simultaneously zero." In this case, it means just one slope (that on UCLA), so the F-test is equivalent to a t-test (squared). Check that 0.031 is roughly the square of 0.1768. This means that, in this simple regression, there is no statistically significant relationship between distance from ucla and rents. And the point estimate even seems to be positive, suggesting that, if anything, rents are lower closer to ucla.
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.33130E+08 2. 0.16565E+08 205.141
ERROR 0.19380E+07 24. 80748. P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 24 DF P-VALUE CORR. COEFFICIENT AT MEANS
UCLA 4.4671 25.27 0.1768 0.861 0.036 0.0361 0.0171
CONSTANT 1109.4 122.7 9.041 0.000 0.879 0.0000 0.9829
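With only one regressor, several numbers in this output are tied together. A quick sketch of the checks (variable names are mine): the F statistic is the square of the t-ratio on UCLA, and R-squared is the square of the simple correlation between RENT and UCLA from the correlation matrix at the top of the output.

```python
t_ucla = 0.1768            # t-ratio on UCLA in the simple regression
r_rent_ucla = 0.36064e-1   # correlation of RENT and UCLA from the PCOR output
df_error = 24              # 26 observations - 1 slope - 1 intercept

# With a single slope, the overall F equals the squared t-ratio.
f_from_t = t_ucla ** 2

# With a single regressor, R-squared equals the squared simple correlation.
r2_from_corr = r_rent_ucla ** 2

# The overall F can also be written in terms of R-squared.
f_from_r2 = (r2_from_corr / 1) / ((1.0 - r2_from_corr) / df_error)

print(f"t squared           = {f_from_t:.4f}")
print(f"correlation squared = {r2_from_corr:.4f}")
print(f"F from R-squared    = {f_from_r2:.4f}")
```

All three routes land on the printed values of F = 0.031 and R-SQUARE = 0.0013, up to rounding.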
Activist now controls for distance from beach before looking at effect of distance from ucla on rental rates.
|_ols rent ucla beach
This regression controls for distance from the beach before trying to determine the effect of an additional mile from UCLA on RENT.
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
NOTE crummy R-squared value.
R-SQUARE = 0.0050 R-SQUARE ADJUSTED = -0.0815
VARIANCE OF THE ESTIMATE-SIGMA**2 = 83949.
STANDARD ERROR OF THE ESTIMATE-SIGMA = 289.74
SUM OF SQUARED ERRORS-SSE= 0.19308E+07
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -182.692
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 9661.5 2. 4830.7 0.058
ERROR 0.19308E+07 23. 83949. P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.944
Cannot reject null hypothesis that slopes on UCLA and BEACH are simultaneously equal to zero--namely that neither of these two variables explains RENT.
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.33137E+08 3. 0.11046E+08 131.575
ERROR 0.19308E+07 23. 83949. P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 23 DF P-VALUE CORR. COEFFICIENT AT MEANS
UCLA -0.25200 30.43 -0.8283E-02 0.993-0.002 -0.0020 -0.0010
BEACH -10.324 35.41 -0.2916 0.773-0.061 -0.0716 -0.0334
CONSTANT 1167.6 235.4 4.960 0.000 0.719 0.0000 1.0344
Information in failure to reject zero slopes (by F-test) is borne out by the t-tests of the two coefficients individually. If you cannot reject the hypothesis that both the slopes are jointly zero, you will typically not be able to reject the hypotheses that each is individually zero. It still looks like distance from ucla has no statistically significant effect on rental rates. Now, however, the point estimate is negative, suggesting that greater distance means lower rents (if anything).
|_ols rent ucla beach bed
Include number of bedrooms to see if this explains rents.
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-squared goes up a lot...
R-SQUARE = 0.7277 R-SQUARE ADJUSTED = 0.6906
VARIANCE OF THE ESTIMATE-SIGMA**2 = 24014.
STANDARD ERROR OF THE ESTIMATE-SIGMA = 154.97
SUM OF SQUARED ERRORS-SSE= 0.52831E+06
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -165.844
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.14122E+07 3. 0.47072E+06 19.602
ERROR 0.52831E+06 22. 24014. P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
Hypothesis that all slopes simultaneously zero is now soundly rejected by the F-test for the "overall significance of the regression."
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.34539E+08 4. 0.86348E+07 359.571
ERROR 0.52831E+06 22. 24014. P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 22 DF P-VALUE CORR. COEFFICIENT AT MEANS
UCLA -48.924 17.47 -2.800 0.010-0.513 -0.3950 -0.1875
BEACH -39.371 19.32 -2.038 0.054-0.399 -0.2731 -0.1274
BED 292.06 38.22 7.642 0.000 0.852 0.9130 0.5075
CONSTANT 911.42 130.3 6.995 0.000 0.831 0.0000 0.8074
Variables UCLA and BED are now strongly significant at the 5% level and BEACH approaches significance at the 5% level (it is significant at the 10% level).
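To see where the T-RATIO column and the adjusted R-squared come from, here is a minimal Python sketch that recomputes them from the printed coefficients, standard errors and R-squared for "ols rent ucla beach bed". The helper names are assumptions made for illustration; the arithmetic is the standard textbook definition, not anything specific to SHAZAM.

```python
def t_ratio(coefficient, standard_error):
    """t statistic for H0: the true coefficient is zero."""
    return coefficient / standard_error


def adjusted_r_squared(r_squared, n_obs, n_slopes):
    """R-squared penalized for the number of slopes in the model."""
    df_total = n_obs - 1
    df_error = n_obs - n_slopes - 1
    return 1.0 - (1.0 - r_squared) * df_total / df_error


# (coefficient, standard error) pairs copied from the output above.
estimates = {
    "UCLA": (-48.924, 17.47),
    "BEACH": (-39.371, 19.32),
    "BED": (292.06, 38.22),
}
t_ratios = {name: t_ratio(b, se) for name, (b, se) in estimates.items()}
adj = adjusted_r_squared(0.7277, n_obs=26, n_slopes=3)

for name, t in t_ratios.items():
    print(name, round(t, 3))
print("adjusted R-squared:", round(adj, 4))
```

The same adjusted R-squared function applied to the previous model (R-squared of 0.0050 with two slopes) gives a negative value, which is why SHAZAM printed -0.0815 there.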
|_ols rent ucla beach bed pkg
Examine what happens as we control for number of parking spots as well....
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.7846 R-SQUARE ADJUSTED = 0.7436
VARIANCE OF THE ESTIMATE-SIGMA**2 = 19904.
STANDARD ERROR OF THE ESTIMATE-SIGMA = 141.08
SUM OF SQUARED ERRORS-SSE= 0.41798E+06
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -162.799
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.15225E+07 4. 0.38063E+06 19.123
ERROR 0.41798E+06 21. 19904. P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.34650E+08 5. 0.69299E+07 348.171
ERROR 0.41798E+06 21. 19904. P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 21 DF P-VALUE CORR. COEFFICIENT AT MEANS
UCLA -49.748 15.91 -3.126 0.005-0.564 -0.4016 -0.1907
BEACH -44.295 17.71 -2.501 0.021-0.479 -0.3073 -0.1434
BED 243.16 40.52 6.001 0.000 0.795 0.7601 0.4226
PKG 123.82 52.59 2.354 0.028 0.457 0.2859 0.1730
CONSTANT 833.64 123.1 6.770 0.000 0.828 0.0000 0.7385
Now all coefficients are individually significant at the 5% level (and the F-test of "all slopes jointly zero" is readily rejected). Controlling for BED and PKG finally reveals a strongly significant negative relationship between an apartment's distance from campus and its rent. In other words, the closer you rent to campus, the more you will pay. The activist's intuition has finally been confirmed by the data. Check the STAT / PCOR output to see why the coefficients change as BED and PKG are added to the model.
|_* part (i.)
|_* for a point estimate, plug the values of the variables into an
|_* appropriate fitted regression model, see what predicted rent emerges.
|_* Do not worry about the confidence interval for now, although there are
|_* ways of getting SHAZAM to produce this for you, we have not covered
|_* this explicitly for multiple regression.
The gist of the idea is to view the predicted value of rent for a certain apartment profile as a linear combination of the estimated coefficients (random variables with individual variances and covariances), in which the "numbers" you plug in for the apartment's characteristics act as the fixed weights. To construct a confidence interval for the mean prediction, which gives a range of plausible values for the expected value of RENT for these characteristics, you first need the variance of this linear combination; its square root is the standard error to use in the usual confidence interval formula. You can get the required parameter variances and covariances by using the / PCOV option on the OLS command. Unlike the use of this option on the STAT command, this gives the variances and covariances of the parameter estimates, rather than of the variables. These variances and covariances get plugged into the usual general formula for the variance of a linear combination of random variables.
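The two steps described above can be sketched in Python. The coefficients are copied from the "ols rent ucla beach bed pkg" output, but the apartment profile is hypothetical (it is not the one in the problem set), and the 2x2 covariance matrix is a made-up toy example, since the / PCOV output is not reproduced here. Treat this as an illustration of the arithmetic only.

```python
def predict(coefficients, profile):
    """Point prediction: intercept plus the sum of coefficient * value."""
    return coefficients["CONSTANT"] + sum(
        coefficients[name] * value for name, value in profile.items()
    )


def variance_of_linear_combination(weights, covariance):
    """Var(sum_i w_i * b_i) = sum_i sum_j w_i * w_j * Cov(b_i, b_j)."""
    k = len(weights)
    return sum(
        weights[i] * weights[j] * covariance[i][j]
        for i in range(k)
        for j in range(k)
    )


# Coefficients copied from the fitted model with UCLA, BEACH, BED and PKG.
coefficients = {
    "UCLA": -49.748,
    "BEACH": -44.295,
    "BED": 243.16,
    "PKG": 123.82,
    "CONSTANT": 833.64,
}

# HYPOTHETICAL profile: 2 miles from UCLA, 3 from the beach, 2 bedrooms, 1 parking spot.
hypothetical_profile = {"UCLA": 2.0, "BEACH": 3.0, "BED": 2.0, "PKG": 1.0}
predicted_rent = predict(coefficients, hypothetical_profile)
print(round(predicted_rent, 3))

# TOY covariance matrix for two estimates (made-up numbers, not SHAZAM output).
toy_weights = [1.0, 2.0]
toy_covariance = [[4.0, 1.0], [1.0, 9.0]]
toy_variance = variance_of_linear_combination(toy_weights, toy_covariance)
print(toy_variance)
```

In a real calculation the weights would be the profile values (with a weight of 1 on the intercept) and the covariance matrix would be the one printed by / PCOV; the square root of the resulting variance is the standard error for the confidence interval on the mean prediction.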
|_* part (j.)
|_* You want to be really careful about making "out of sample" predictions.
|_* These data are outside the range of the data used for estimation.
|_* part (k.)
|_plot rent pkg
All apartments in the sample have either 0, 1, or 2 parking spots.
Average rent appears to go up as number of parking spaces increases.
REQUIRED MEMORY IS PAR= 3 CURRENT PAR= 500
FOR MAXIMUM EFFICIENCY USE AT LEAST PAR= 4
26 OBSERVATIONS
[Character plot of RENT (vertical axis, 600.00 to 1800.0) against PKG (horizontal axis, 0.000 to 2.000); *=RENT, M=MULTIPLE POINT]
|_ols rent beach ucla bed bath pkg
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
26 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9710 R-SQUARE ADJUSTED = 0.9638
VARIANCE OF THE ESTIMATE-SIGMA**2 = 2810.1
STANDARD ERROR OF THE ESTIMATE-SIGMA = 53.010
SUM OF SQUARED ERRORS-SSE= 56202.
MEAN OF DEPENDENT VARIABLE = 1128.8
LOG OF THE LIKELIHOOD FUNCTION = -136.714
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.18843E+07 5. 0.37686E+06 134.108
ERROR 56202. 20. 2810.1 P-VALUE
TOTAL 0.19405E+07 25. 77619. 0.000
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.35011E+08 6. 0.58352E+07 2076.520
ERROR 56202. 20. 2810.1 P-VALUE
TOTAL 0.35068E+08 26. 0.13488E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 20 DF P-VALUE CORR. COEFFICIENT AT MEANS
BEACH -39.560 6.667 -5.934 0.000-0.799 -0.2744 -0.1281
UCLA -13.440 6.782 -1.982 0.061-0.405 -0.1085 -0.0515
BED 159.63 16.91 9.439 0.000 0.904 0.4990 0.2774
BATH 345.57 30.46 11.35 0.000 0.930 0.6232 0.4563
PKG 20.637 21.75 0.9487 0.354 0.208 0.0477 0.0288
CONSTANT 470.78 56.24 8.370 0.000 0.882 0.0000 0.4171
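P-values for the zero hypothesis on the slopes of UCLA (0.061) and PKG (0.354) are too large at the 5% level; we cannot reject the zero hypothesis for these two coefficients (although UCLA would be significant at the 10% level).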
|_* use the subset with one bedroom
|_skipif(bed.ne.1)
Note only ten observations
|_stat rent sqkld sqbed sqbath pkg beach ucla
NAME N MEAN ST. DEV VARIANCE MINIMUM MAXIMUM
RENT 10 886.60 96.814 9372.9 757.00 1064.0
SQKLD 10 372.00 28.694 823.33 330.00 420.00
SQBED 10 102.20 24.621 606.18 76.000 150.00
SQBATH 10 45.600 14.010 196.27 26.000 70.000
PKG 10 1.2000 0.78881 0.62222 0.00000 2.0000
BEACH 10 3.8000 2.3944 5.7333 0.00000 7.0000
UCLA 10 3.4000 2.3664 5.6000 0.00000 7.0000
|_ols rent sqkld sqbed sqbath pkg beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
10 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9833 R-SQUARE ADJUSTED = 0.9499
VARIANCE OF THE ESTIMATE-SIGMA**2 = 469.40
STANDARD ERROR OF THE ESTIMATE-SIGMA = 21.666
SUM OF SQUARED ERRORS-SSE= 1408.2
MEAN OF DEPENDENT VARIABLE = 886.60
LOG OF THE LIKELIHOOD FUNCTION = -38.9268
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 82948. 6. 13825. 29.452
ERROR 1408.2 3. 469.40 P-VALUE
TOTAL 84356. 9. 9372.9 0.009
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.79435E+07 7. 0.11348E+07 2417.557
ERROR 1408.2 3. 469.40 P-VALUE
TOTAL 0.79450E+07 10. 0.79450E+06 0.000
These results are for the subset of 1-bedroom apartments in the sample.
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 3 DF P-VALUE CORR. COEFFICIENT AT MEANS
SQKLD 1.6706 0.5404 3.091 0.054 0.872 0.4951 0.7009
SQBED 3.1643 1.261 2.510 0.087 0.823 0.8047 0.3648
SQBATH -3.0520 3.161 -0.9656 0.405-0.487 -0.4416 -0.1570
PKG 62.660 17.84 3.513 0.039 0.897 0.5105 0.0848
BEACH -37.074 11.43 -3.244 0.048-0.882 -0.9169 -0.1589
UCLA -19.556 10.62 -1.841 0.163-0.728 -0.4780 -0.0750
CONSTANT 213.11 168.6 1.264 0.295 0.590 0.0000 0.2404
Slopes on SQBATH and UCLA are not individually statistically significantly different from zero, even at the 10% level. SQKLD and SQBED are significant at the 10% level but not at the 5% level.
|_delete skip$
VARIABLE SKIP$ IS DELETED 26 WORDS RELEASED
This gets back the observations dropped for the preceding regression.
|_* use the subset with three bedrooms
|_skipif(bed.ne.3)
Note that there are nine observations in this subset.
|_stat rent sqkld sqbed sqbath pkg beach ucla
NAME N MEAN ST. DEV VARIANCE MINIMUM MAXIMUM
RENT 9 1393.2 232.72 54157. 1118.0 1709.0
SQKLD 9 471.67 79.530 6325.0 360.00 600.00
SQBED 9 330.00 70.578 4981.3 240.00 420.00
SQBATH 9 66.667 19.685 387.50 35.000 90.000
PKG 9 2.0000 0.00000 0.00000 2.0000 2.0000
BEACH 9 3.7778 1.9861 3.9444 1.0000 6.0000
UCLA 9 5.0000 2.5495 6.5000 1.0000 8.0000
Here, there is no variance in the number of parking spaces. It is 2 for all three-bedroom apartments in the sample.
|_ols rent sqkld sqbed sqbath pkg beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
9 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
...WARNING...VARIABLE PKG IS A CONSTANT
...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW 7
This cryptic comment from SHAZAM means that you have perfect multicollinearity among your explanatory variables. The matrix to which SHAZAM is referring is the matrix inner product (X'X), which has to be inverted to solve for the vector of OLS parameter point estimates in the matrix version of regression. X'X is always at least positive semidefinite, so if it is not "positive definite," it is singular and cannot be inverted. "ROW 7" refers to the implicit 7th "variable" on the right hand side--the intercept term, which is a column of ones. Since SHAZAM has already included PKG (2 for all observations) in the regression, when it gets to the intercept (1 for all observations), it detects perfect multicollinearity. SHAZAM 8.0 is now nice enough to tell you the identity of the culprit variable(s): PKG in this case.
So now we drop the constant PKG variable and see if the rest of the model works. The effect of parking is now absorbed into the intercept, but the other coefficients can be estimated.
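The singularity of X'X when a regressor is constant can be illustrated with a tiny Python sketch. The data below are made up (four observations, one genuine regressor), and the helper names are illustrative; the point is only that a column equal to 2 everywhere is an exact multiple of the column of ones, so the determinant of X'X is zero until that column is dropped.

```python
def cross_product(columns):
    """Build X'X from a list of data columns (each column is a list of values)."""
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
        for i in range(k)
    ]


def determinant(matrix):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for j, pivot in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * pivot * determinant(minor)
    return total


x1 = [1, 2, 3, 4]        # made-up regressor that actually varies
pkg = [2, 2, 2, 2]       # constant regressor, like PKG among three-bedroom units
const = [1, 1, 1, 1]     # implicit intercept column of ones

det_with_pkg = determinant(cross_product([x1, pkg, const]))
det_without_pkg = determinant(cross_product([x1, const]))
print(det_with_pkg, det_without_pkg)
```

With the constant column included the determinant is exactly zero, so X'X has no inverse; once it is dropped the determinant is nonzero and the remaining coefficients can be estimated.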
|_ols rent sqkld sqbed sqbath beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
9 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
R-SQUARE = 0.9936 R-SQUARE ADJUSTED = 0.9831
VARIANCE OF THE ESTIMATE-SIGMA**2 = 917.70
STANDARD ERROR OF THE ESTIMATE-SIGMA = 30.294
SUM OF SQUARED ERRORS-SSE= 2753.1
MEAN OF DEPENDENT VARIABLE = 1393.2
LOG OF THE LIKELIHOOD FUNCTION = -38.5251
ANALYSIS OF VARIANCE - FROM MEAN
SS DF MS F
REGRESSION 0.43050E+06 5. 86100. 93.822
ERROR 2753.1 3. 917.70 P-VALUE
TOTAL 0.43326E+06 8. 54157. 0.002
ANALYSIS OF VARIANCE - FROM ZERO
SS DF MS F
REGRESSION 0.17900E+08 6. 0.29834E+07 3250.909
ERROR 2753.1 3. 917.70 P-VALUE
TOTAL 0.17903E+08 9. 0.19892E+07 0.000
VARIABLE ESTIMATED STANDARD T-RATIO PARTIAL STANDARDIZED ELASTICITY
NAME COEFFICIENT ERROR 3 DF P-VALUE CORR. COEFFICIENT AT MEANS
SQKLD 1.5025 0.8502 1.767 0.175 0.714 0.5135 0.5087
SQBED 1.1701 1.044 1.121 0.344 0.543 0.3549 0.2772
SQBATH 0.57250 1.238 0.4624 0.675 0.258 0.0484 0.0274
BEACH -30.972 8.305 -3.729 0.034-0.907 -0.2643 -0.0840
UCLA -11.895 5.817 -2.045 0.133-0.763 -0.1303 -0.0427
CONSTANT 436.71 125.7 3.475 0.040 0.895 0.0000 0.3135
|_delete skip$
VARIABLE SKIP$ IS DELETED 26 WORDS RELEASED
|_* try the subset with two bedrooms
|_skipif(bed.ne.2)
Note only seven observations...
|_stat rent sqkld sqbed sqbath pkg beach ucla
NAME N MEAN ST. DEV VARIANCE MINIMUM MAXIMUM
RENT 7 1134.7 185.63 34458. 900.00 1364.0
SQKLD 7 395.00 37.193 1383.3 360.00 450.00
SQBED 7 227.14 62.640 3923.8 150.00 300.00
SQBATH 7 62.143 19.334 373.81 40.000 90.000
PKG 7 1.5714 0.53452 0.28571 1.0000 2.0000
BEACH 7 3.2857 1.2199 1.4881 1.0000 5.0000
UCLA 7 4.7857 1.2864 1.6548 3.5000 7.0000
|_ols rent sqkld sqbed sqbath pkg beach ucla
REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 500
OLS ESTIMATION
7 OBSERVATIONS DEPENDENT VARIABLE = RENT
...NOTE..SAMPLE RANGE SET TO: 1, 26
...WARNING..ZERO DEGREES OF FREEDOM LEFT
Note perfect R-squared value
R-SQUARE = 1.0000 R-SQUARE ADJUSTED = 1.0000
VARIANCE OF THE ESTIMATE-SIGMA**2 = 0.19359E-24 (essentially zero!)
STANDARD ERROR OF THE ESTIMATE-SIGMA = 0.43999E-12 (essentially zero!)
SUM OF SQUARED ERRORS-SSE= 0.19359E-24
MEAN OF DEPENDENT VARIABLE = 1134.7
LOG OF THE LIKELIHOOD FUNCTION = 196.042
ANALYSIS OF VARIANCE - FROM MEAN
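The program bombs and may even "throw you out." SHAZAM does not take kindly to being asked to do something truly stupid. With seven observations and seven parameters there are zero degrees of freedom, so the error variance (and hence every coefficient variance) cannot be calculated, and everything comes to a halt. Two points are enough to fit a line (which has two parameters) perfectly, but they leave zero error variance around the line; in the same way, we need more than seven points to fit a "hyperplane" with seven parameters and still have information left over for estimating the error variance.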
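The two-point analogy can be made concrete with a short Python sketch. The two data points are made up, and the function names are illustrative; the error variance is computed as SSE divided by (observations minus parameters), which is the quantity SHAZAM reports as SIGMA**2.

```python
def fit_line_through_two_points(p1, p2):
    """Exact line through two points: returns (intercept, slope)."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


def error_variance(sse, n_obs, n_params):
    """SSE / (n - k); returns None when no degrees of freedom are left."""
    degrees_of_freedom = n_obs - n_params
    if degrees_of_freedom <= 0:
        return None  # 0/0: the error variance is not estimable
    return sse / degrees_of_freedom


# Made-up data: two points, two parameters (intercept and slope).
points = [(1.0, 3.0), (2.0, 5.0)]
intercept, slope = fit_line_through_two_points(*points)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
print(sse, error_variance(sse, n_obs=2, n_params=2))

# One-bedroom subset above: 10 observations, 7 parameters, SSE = 1408.2.
print(error_variance(1408.2, n_obs=10, n_params=7))

# Two-bedroom subset: 7 observations, 7 parameters, so nothing is left over.
print(error_variance(0.19359e-24, n_obs=7, n_params=7))
```

The one-bedroom case reproduces the printed SIGMA**2 of 469.40 with 3 degrees of freedom, while the two-bedroom case has none, which is the situation SHAZAM warns about.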