UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Problem Set #5: Functional Form; Dummy Variables for Seasonality

Outline of Solutions


Below is some SHAZAM output associated with the questions in this problem set. Note that this was the first time you were expected to write your own SHAZAM programs completely from scratch. Prior assignments provided a *.sha program to guide you, but things are now getting more realistic. Some people have discovered that while the order of explanatory variables in an OLS command does not matter, the order of variables in a read statement is crucial, since this must match the order of appearance of variables in the data file being read.

 |_sample 1 36

 |_read(prodtn.dat) q usk sk
 UNIT 88 IS NOW ASSIGNED TO: prodtn.dat
    3 VARIABLES AND       36 OBSERVATIONS STARTING AT OBS       1

Always do a STAT to ensure that your data are what you expect them to be.
 |_stat / pcor
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 Q           36   142.94       61.833       3823.4       14.297       272.70
 USK         36   52.417       31.465       990.02       10.000       110.00
 SK          36   7.0000       2.2678       5.1429       4.0000       10.000
 
  CORRELATION MATRIX OF VARIABLES -       36 OBSERVATIONS
  
 Q          1.0000
 USK       0.80422       1.0000
 SK        0.37603      0.34035E-01   1.0000
              Q            USK          SK

 
 |_* a.) linear model
 
 |_ols q usk sk
 
 REQUIRED MEMORY IS PAR=     3 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = Q
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.7685     R-SQUARE ADJUSTED =   0.7544
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   938.88
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   30.641
 SUM OF SQUARED ERRORS-SSE=   30983.
 MEAN OF DEPENDENT VARIABLE =   142.94
 LOG OF THE LIKELIHOOD FUNCTION = -172.720
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.10283E+06      2.        51417.                54.764
 ERROR             30983.         33.        938.88               P-VALUE
 TOTAL            0.13382E+06     35.        3823.4                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.83838E+06      3.       0.27946E+06           297.654
 ERROR             30983.         33.        938.88               P-VALUE
 TOTAL            0.86937E+06     36.        24149.                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      33 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 USK        1.5571     0.1647       9.454     0.000 0.855     0.7923     0.5710
 SK         9.5176      2.285       4.165     0.000 0.587     0.3491     0.4661
 CONSTANT  -5.2993      18.63     -0.2844     0.778-0.049     0.0000    -0.0371

Individually, the coefficients on both USK and SK are strongly statistically significantly different from zero.

 
 |_* b.) quadratic models
 |_genr usk2=usk*usk
 |_genr sk2=sk*sk
 
 |_ols q usk usk2 sk
 
 REQUIRED MEMORY IS PAR=     4 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = Q
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.9153     R-SQUARE ADJUSTED =   0.9073
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   354.37
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   18.825
 SUM OF SQUARED ERRORS-SSE=   11340.
 MEAN OF DEPENDENT VARIABLE =   142.94
 LOG OF THE LIKELIHOOD FUNCTION = -154.628
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.12248E+06      3.        40826.               115.207
 ERROR             11340.         32.        354.37               P-VALUE
 TOTAL            0.13382E+06     35.        3823.4                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.85803E+06      4.       0.21451E+06           605.322
 ERROR             11340.         32.        354.37               P-VALUE
 TOTAL            0.86937E+06     36.        24149.                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      32 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 USK        4.6751     0.4308       10.85     0.000 0.887     2.3790     1.7144
 USK2     -0.26160E-01 0.3514E-02  -7.445     0.000 -0.796    -1.6317    -0.6790
 SK         9.0153      1.406       6.414     0.000 0.750     0.3306     0.4415
 CONSTANT  -68.166      14.22      -4.793     0.000-0.646     0.0000    -0.4769

The quadratic term in USK makes a statistically significant contribution to the model.

 |_ols q usk sk sk2
 
 REQUIRED MEMORY IS PAR=     4 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = Q
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.7689     R-SQUARE ADJUSTED =   0.7472
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   966.48
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   31.088
 SUM OF SQUARED ERRORS-SSE=   30927.
 MEAN OF DEPENDENT VARIABLE =   142.94
 LOG OF THE LIKELIHOOD FUNCTION = -172.688
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.10289E+06      3.        34297.                35.486
 ERROR             30927.         32.        966.48               P-VALUE
 TOTAL            0.13382E+06     35.        3823.4                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.83844E+06      4.       0.20961E+06           216.881
 ERROR             30927.         32.        966.48               P-VALUE
 TOTAL            0.86937E+06     36.        24149.                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      32 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 USK        1.5569     0.1671       9.317     0.000 0.855     0.7922     0.5709
 SK         5.1606      18.28      0.2823     0.780 0.050     0.1893     0.2527
 SK2       0.31122      1.295      0.2403     0.812 0.042     0.1611     0.1176
 CONSTANT   8.4031      60.08      0.1399     0.890 0.025     0.0000     0.0588

When you add a quadratic term in SK, neither the coefficient on SK nor that on SK2 is individually statistically significantly different from zero.

 |_test
 |_test sk=0
 |_test sk2=0
 |_end

This test automates the process of running two regressions (one unrestricted and one restricted), finding the explained sum of squares from each model's Analysis of Variance - From Mean table, taking the difference, dividing by the number of restrictions by which the models differ, and then dividing the result by the error variance of the unrestricted model.

 F STATISTIC =   8.4544796     WITH    2 AND   32 D.F.  P-VALUE= 0.00113
 WALD CHI-SQUARE STATISTIC =   16.908959     WITH    2 D.F.  P-VALUE= 0.00021
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.11828

Since the p-value is LESS than 0.05, we easily reject the hypothesis that both coefficients (on SK and SK2) could be jointly zero. This is consistent with the finding that the coefficient on SK when it is used alone is strongly different from zero.
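
The F statistic can also be checked by hand from numbers already printed above. The following is an illustrative Python sketch, not part of the SHAZAM run. It assumes the restricted model is q regressed on usk alone (that regression is not shown in the output), so its R-squared is recovered from the simple correlation between Q and USK in the STAT table. All variable names are hypothetical.

```python
# Hand check of the joint F test on SK and SK2, using only printed output.
n = 36                        # observations
tss = 0.13382e6               # TOTAL SS, Analysis of Variance - From Mean
sse_unrestricted = 30927.0    # SSE from: ols q usk sk sk2
k_unrestricted = 4            # estimated coefficients, including the constant
q = 2                         # restrictions: sk=0 and sk2=0

# Assumption: the restricted model is q on usk (plus constant), so its
# R-squared equals the squared simple correlation between Q and USK.
r_q_usk = 0.80422
sse_restricted = tss * (1.0 - r_q_usk ** 2)

# The rise in SSE under the restrictions equals the drop in explained SS.
df_error = n - k_unrestricted
sigma2_unrestricted = sse_unrestricted / df_error
f_stat = ((sse_restricted - sse_unrestricted) / q) / sigma2_unrestricted

# In the printed output the Wald chi-square equals q times the F statistic.
wald = q * f_stat

print(f"F = {f_stat:.3f} with {q} and {df_error} d.f.")
print(f"Wald chi-square = {wald:.3f} with {q} d.f.")
```

Up to rounding in the printed inputs, this should land close to the reported F of 8.4545 and Wald statistic of 16.909.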

 |_ols q usk usk2 sk sk2 / coef=b
 
Saving the fitted coefficients allows us to refer to them later on...

 REQUIRED MEMORY IS PAR=     4 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = Q
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.9155     R-SQUARE ADJUSTED =   0.9046
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   364.61
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   19.095
 SUM OF SQUARED ERRORS-SSE=   11303.
 MEAN OF DEPENDENT VARIABLE =   142.94
 LOG OF THE LIKELIHOOD FUNCTION = -154.569
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.12251E+06      4.        30629.                84.004
 ERROR             11303.         31.        364.61               P-VALUE
 TOTAL            0.13382E+06     35.        3823.4                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.85806E+06      5.       0.17161E+06           470.674
 ERROR             11303.         31.        364.61               P-VALUE
 TOTAL            0.86937E+06     36.        24149.                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      31 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 USK        4.6736     0.4371       10.69     0.000 0.887     2.3782     1.7138
 USK2     -0.26148E-01 0.3564E-02  -7.336     0.000-0.797    -1.6310    -0.6787
 SK         5.4745      11.23      0.4875     0.629 0.087     0.2008     0.2681
 SK2       0.25294     0.7957      0.3179     0.753 0.057     0.1309     0.0956
 CONSTANT  -57.003      37.97      -1.501     0.143-0.260     0.0000    -0.3988

 |_* calculate values of usk and sk where derivative goes to zero

The marginal productivity of USK changes over the range of the data. Holding SK fixed, the quantity of USK that produces the most output is about 89, which lies inside the observed range of USK (10 to 110). While there is no statistical evidence of curvature with respect to SK, we have included a quadratic term anyway. Its fitted minimum occurs at -10.8 units, far below the observed range of SK (4 to 10), so the relevant part of the curve is rising in SK, but the whole curvature story is not really warranted in the SK direction.

 |_gen1 uskstar=-(b:1)/(2*b:2)
 |_gen1 skstar=-(b:3)/(2*b:4)
 |_print uskstar skstar
     USKSTAR
    89.36683
     SKSTAR
   -10.82188
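
The same turning points can be reproduced from the rounded coefficients in the regression table. This is a minimal Python sketch for checking the arithmetic; the function name is hypothetical, and small differences from the SHAZAM values arise because SHAZAM uses the unrounded coefficients stored in b.

```python
def turning_point(b_linear, b_quadratic):
    """Stationary point of b_linear*x + b_quadratic*x**2 (derivative set to zero)."""
    return -b_linear / (2.0 * b_quadratic)

# Rounded coefficients from: ols q usk usk2 sk sk2
usk_star = turning_point(4.6736, -0.26148e-01)
sk_star = turning_point(5.4745, 0.25294)

# The sign of the quadratic coefficient tells us maximum (negative) or minimum (positive).
print(f"USK: {usk_star:.3f} (maximum, since the USK2 coefficient is negative)")
print(f"SK : {sk_star:.3f} (minimum, since the SK2 coefficient is positive)")
```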

 
 |_* c.)
 |_genr usksk=usk*sk

The interaction term means each derivative of q with respect to usk and sk depends on the level of the "other" input.
 
 |_ols q usk usk2 sk usksk
 
 REQUIRED MEMORY IS PAR=     5 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = Q
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.9294     R-SQUARE ADJUSTED =   0.9203
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   304.81
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   17.459
 SUM OF SQUARED ERRORS-SSE=   9449.0
 MEAN OF DEPENDENT VARIABLE =   142.94
 LOG OF THE LIKELIHOOD FUNCTION = -151.344
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.12437E+06      4.        31092.               102.006
 ERROR             9449.0         31.        304.81               P-VALUE
 TOTAL            0.13382E+06     35.        3823.4                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.85992E+06      5.       0.17198E+06           564.238
 ERROR             9449.0         31.        304.81               P-VALUE
 TOTAL            0.86937E+06     36.        24149.                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      31 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 USK        3.9500     0.4944       7.989     0.000 0.820     2.0100     1.4485
 USK2     -0.26178E-01 0.3259E-02  -8.033     0.000-0.822    -1.6329    -0.6795
 SK         3.5029      2.569       1.364     0.182 0.238     0.1285     0.1715
 USKSK     0.10487     0.4211E-01   2.491     0.018 0.408     0.4439     0.2709
 CONSTANT  -30.230      20.15      -1.500     0.144-0.260     0.0000    -0.2115

The interaction term's coefficient is statistically significantly different from zero, so each derivative (marginal productivity) increases in the amount of the other input. Note that with the interaction term, the linear effect of SK drops into insignificance.
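
To see how each marginal productivity depends on the other input, the derivatives of the fitted equation can be evaluated directly. The sketch below is illustrative Python using the rounded coefficients from the table above; evaluating at the sample means (USK = 52.417, SK = 7) is a choice made here for illustration, and the function names are hypothetical.

```python
# Rounded coefficients from: ols q usk usk2 sk usksk
b = {"usk": 3.9500, "usk2": -0.26178e-01, "sk": 3.5029, "usksk": 0.10487}

def mp_usk(usk, sk):
    """dq/dusk = b_usk + 2*b_usk2*usk + b_usksk*sk."""
    return b["usk"] + 2.0 * b["usk2"] * usk + b["usksk"] * sk

def mp_sk(usk):
    """dq/dsk = b_sk + b_usksk*usk (no SK2 term in this model)."""
    return b["sk"] + b["usksk"] * usk

usk_mean, sk_mean = 52.417, 7.0
print(f"MP of USK at the means: {mp_usk(usk_mean, sk_mean):.4f}")
print(f"MP of SK at mean USK  : {mp_sk(usk_mean):.4f}")

# Each marginal product rises with the level of the other input.
print(f"MP of USK at SK=4 vs SK=10: {mp_usk(usk_mean, 4.0):.4f} vs {mp_usk(usk_mean, 10.0):.4f}")
print(f"MP of SK at USK=10 vs USK=110: {mp_sk(10.0):.4f} vs {mp_sk(110.0):.4f}")
```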

 
 |_* d.)
 |_genr lq=log(q)
 |_genr lusk=log(usk)
 |_genr lsk=log(sk)
 |_genr lusk2=lusk*lusk
 |_genr lsk2=lsk*lsk
 |_genr lusklsk=lusk*lsk
 
Including the LOGLOG option on the ols command allows comparison of the maximized log-likelihood values for models that use q and lq as dependent variables. If you don't tell SHAZAM that the dependent variable is a logged quantity, it has no way of knowing. With the option in place, the results of the particular regression are not affected, but the reported maximized log-likelihood (and elasticities) are computed differently.

 |_ols lq lusk lsk / loglog
 
 REQUIRED MEMORY IS PAR=     6 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = LQ
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.8318     R-SQUARE ADJUSTED =   0.8216
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.71946E-01
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.26823
 SUM OF SQUARED ERRORS-SSE=   2.3742
 MEAN OF DEPENDENT VARIABLE =   4.8187
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -175.616
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        11.742          2.        5.8710                81.603
 ERROR             2.3742         33.       0.71946E-01           P-VALUE
 TOTAL             14.116         35.       0.40332                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        847.66          3.        282.55              3927.292
 ERROR             2.3742         33.       0.71946E-01           P-VALUE
 TOTAL             850.04         36.        23.612                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      33 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 LUSK      0.76361     0.6317E-01   12.09     0.000 0.903     0.8636     0.7636
 LSK       0.48040     0.1306       3.679     0.001 0.539     0.2628     0.4804
 CONSTANT   1.0518     0.3384       3.108     0.004 0.476     0.0000     1.0518
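
The effect of the LOGLOG option on the reported log-likelihood can be checked by hand. The Python sketch below is an assumption about what the option does, not a statement from the SHAZAM manual: it applies the usual Jacobian adjustment, subtracting the sum of the logged dependent variable from the log-likelihood computed in log units. With the printed SSE and mean of LQ it should come close to the reported value, which suggests this is the adjustment being made. Function and variable names are hypothetical.

```python
import math

def gaussian_loglik(sse, n):
    """Maximized normal log-likelihood for a regression with error sum of squares sse."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(sse / n))

n = 36

# Linear model (ols q usk sk): SSE = 30983, reported log-likelihood -172.720
llf_linear = gaussian_loglik(30983.0, n)

# Log-log model (ols lq lusk lsk): SSE = 2.3742, mean of LQ = 4.8187
llf_in_log_units = gaussian_loglik(2.3742, n)
jacobian = -n * 4.8187          # minus the sum of log(q) over the sample
llf_loglog = llf_in_log_units + jacobian

print(f"linear model        : {llf_linear:.3f}")
print(f"log-log, log units  : {llf_in_log_units:.3f}")
print(f"log-log, units of q : {llf_loglog:.3f}")
```

Without the adjustment, the log-likelihood in log units is a small negative number that cannot be compared with the linear model's value. After the adjustment both values describe the density of q itself.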
 
 |_ols lq lusk lusk2 lsk / loglog
 
 REQUIRED MEMORY IS PAR=     6 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = LQ
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.9134     R-SQUARE ADJUSTED =   0.9053
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.38197E-01
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.19544
 SUM OF SQUARED ERRORS-SSE=   1.2223
 MEAN OF DEPENDENT VARIABLE =   4.8187
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -163.665
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        12.894          3.        4.2980               112.521
 ERROR             1.2223         32.       0.38197E-01           P-VALUE
 TOTAL             14.116         35.       0.40332                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        848.81          4.        212.20              5555.508
 ERROR             1.2223         32.       0.38197E-01           P-VALUE
 TOTAL             850.04         36.        23.612                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      32 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 LUSK       3.0776     0.4239       7.261     0.000 0.789     3.4806     3.0776
 LUSK2    -0.32650     0.5946E-01  -5.492     0.000-0.697    -2.6326    -0.3265
 LSK       0.48512     0.9515E-01   5.099     0.000 0.670     0.2654     0.4851
 CONSTANT  -2.8802     0.7573      -3.803     0.001-0.558     0.0000    -2.8802
 
 |_ols lq lusk lsk lsk2 / loglog
 
 REQUIRED MEMORY IS PAR=     6 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = LQ
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.8344     R-SQUARE ADJUSTED =   0.8189
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.73056E-01
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.27029
 SUM OF SQUARED ERRORS-SSE=   2.3378
 MEAN OF DEPENDENT VARIABLE =   4.8187
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -175.338
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        11.778          3.        3.9261                53.742
 ERROR             2.3378         32.       0.73056E-01           P-VALUE
 TOTAL             14.116         35.       0.40332                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        847.70          4.        211.92              2900.855
 ERROR             2.3378         32.       0.73056E-01           P-VALUE
 TOTAL             850.04         36.        23.612                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      32 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 LUSK      0.76283     0.6366E-01   11.98     0.000 0.904     0.8627     0.7628
 LSK      -0.78744      1.800     -0.4375     0.665-0.077    -0.4308    -0.7874
 LSK2      0.34549     0.4892      0.7062     0.485 0.124     0.6955     0.3455
 CONSTANT   2.1762      1.628       1.337     0.191 0.230     0.0000     2.1762
 
As before, curvature in the direction of SK does not seem to be present.

 |_ols lq lusk lusk2 lsk lsk2 / loglog
 
 REQUIRED MEMORY IS PAR=     7 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = LQ
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.9159     R-SQUARE ADJUSTED =   0.9051
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.38295E-01
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.19569
 SUM OF SQUARED ERRORS-SSE=   1.1871
 MEAN OF DEPENDENT VARIABLE =   4.8187
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -163.140
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        12.929          4.        3.2323                84.404
 ERROR             1.1871         31.       0.38295E-01           P-VALUE
 TOTAL             14.116         35.       0.40332                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        848.85          5.        169.77              4433.199
 ERROR             1.1871         31.       0.38295E-01           P-VALUE
 TOTAL             850.04         36.        23.612                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      31 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 LUSK       3.0756     0.4244       7.246     0.000 0.793     3.4783     3.0756
 LUSK2    -0.32632     0.5953E-01  -5.481     0.000-0.702    -2.6312    -0.3263
 LSK      -0.76019      1.303     -0.5833     0.564-0.104    -0.4159    -0.7602
 LSK2      0.33935     0.3542      0.9581     0.345 0.170     0.6832     0.3394
 CONSTANT  -1.7736      1.382      -1.284     0.209-0.225     0.0000    -1.7736

Curvature in the USK dimension is present, but apparently not in the SK direction. There is no evidence of a diminishing marginal product of SK within the range of the data.
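
One way to see the curvature is to evaluate the output elasticities implied by the fitted equation at the ends of the observed data range. The sketch below is illustrative Python using the rounded coefficients from the table above and the minimum and maximum values from the STAT output; the function names are hypothetical. Bear in mind that the SK terms are individually insignificant, so the SK numbers are point estimates only.

```python
import math

# Rounded coefficients from: ols lq lusk lusk2 lsk lsk2 / loglog
b_lusk, b_lusk2 = 3.0756, -0.32632
b_lsk, b_lsk2 = -0.76019, 0.33935

def elasticity_usk(usk):
    """d log(q) / d log(usk) = b_lusk + 2*b_lusk2*log(usk)."""
    return b_lusk + 2.0 * b_lusk2 * math.log(usk)

def elasticity_sk(sk):
    """d log(q) / d log(sk) = b_lsk + 2*b_lsk2*log(sk)."""
    return b_lsk + 2.0 * b_lsk2 * math.log(sk)

# Observed ranges: USK from 10 to 110, SK from 4 to 10
for usk in (10.0, 110.0):
    print(f"elasticity w.r.t. USK at USK={usk:5.1f}: {elasticity_usk(usk):.4f}")
for sk in (4.0, 10.0):
    print(f"elasticity w.r.t. SK  at SK ={sk:5.1f}: {elasticity_sk(sk):.4f}")
```

The USK elasticity falls toward zero as USK approaches the top of its range, while the SK elasticity is positive and rising over the whole range of SK.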

 |_ols lq lusk lusk2 lsk lsk2 lusklsk / loglog
 
 REQUIRED MEMORY IS PAR=     7 CURRENT PAR=   500
  OLS ESTIMATION
       36 OBSERVATIONS     DEPENDENT VARIABLE = LQ
 ...NOTE..SAMPLE RANGE SET TO:      1,     36
 
  R-SQUARE =   0.9248     R-SQUARE ADJUSTED =   0.9123
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.35384E-01
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  0.18811
 SUM OF SQUARED ERRORS-SSE=   1.0615
 MEAN OF DEPENDENT VARIABLE =   4.8187
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -161.127
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        13.055          5.        2.6109                73.788
 ERROR             1.0615         30.       0.35384E-01           P-VALUE
 TOTAL             14.116         35.       0.40332                 0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION        848.97          6.        141.50              3998.840
 ERROR             1.0615         30.       0.35384E-01           P-VALUE
 TOTAL             850.04         36.        23.612                 0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      30 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 LUSK       3.5616     0.4827       7.379     0.000 0.803     4.0280     3.5616
 LUSK2    -0.32980     0.5725E-01  -5.760     0.000-0.725    -2.6592    -0.3298
 LSK       0.11856      1.337      0.8869E-01 0.930 0.016     0.0649     0.1186
 LSK2      0.34962     0.3405       1.027     0.313 0.184     0.7038     0.3496
 LUSKLSK  -0.24465     0.1298      -1.884     0.069-0.325    -0.7430    -0.2447
 CONSTANT  -3.5082      1.616      -2.171     0.038-0.368     0.0000    -3.5082

In the log-log model with interaction term, the coefficient on the interaction term is not statistically significantly different from zero at the 5% level.

 |_* e.) we will cover this part in the lab

2. This data set was used in a lab session in some previous years. Here are the relevant bits of output:

 |_* Suppose you have a sample of mid-level managers who have been surveyed
 |_* concerning the number of hours per week they spend on work-related activities,
 |_* either in the office or at home.  (These data are fictional.)  The dependent
 |_* variable is HOURS (per week, averaged over a three-month period) and the
 |_* explanatory variables you are considering are:
 |_*    FEMALE=1 if female; 0 if male
 |_*    SPOUSE=1 if married or equivalent; 0 otherwise
 |_*    SWORK=1 if spouse full-time employed; 0 otherwise

 |_sample 1 60
 |_read(mgr.dat) hours female spouse swork 
 
 |_stat
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 HOURS       60   40.924       5.8742       34.506       26.630       52.760
 FEMALE      60  0.33333      0.47538      0.22599      0.00000       1.0000
 SPOUSE      60  0.76667      0.42652      0.18192      0.00000       1.0000
 SWORK       60  0.35000      0.48099      0.23136      0.00000       1.0000
 
 |_* Q# 2a) what is marginal mean number of hours worked by all managers?
 
 |_ols hours
 
  R-SQUARE =   0.0000     R-SQUARE ADJUSTED =   0.0000
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   34.506
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.8742
 SUM OF SQUARED ERRORS-SSE=   2035.9
 MEAN OF DEPENDENT VARIABLE =   40.924
 LOG OF THE LIKELIHOOD FUNCTION = -190.866
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION      -0.11823E-10      0.       0.00000                 0.000
 ERROR             2035.9         59.        34.506               P-VALUE
 TOTAL             2035.9         59.        34.506                 1.000
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      59 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 CONSTANT   40.924     0.7584       53.96     0.000 0.990     0.0000     1.0000

Y-bar is about 41 hours if we lump all types of managers together. Note that the R-squared value is zero here, because no regressors are used.
 
 |_* Q# 2b) next see how manager hours depends on gender
 
 |_ols hours female
 
  R-SQUARE =   0.1068     R-SQUARE ADJUSTED =   0.0914
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   31.354
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.5995
 SUM OF SQUARED ERRORS-SSE=   1818.5
 MEAN OF DEPENDENT VARIABLE =   40.924
 LOG OF THE LIKELIHOOD FUNCTION = -187.479
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        217.35          1.        217.35                 6.932
 ERROR             1818.5         58.        31.354               P-VALUE
 TOTAL             2035.9         59.        34.506                 0.011
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      58 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEMALE     4.0375      1.533       2.633     0.011 0.327     0.3267     0.0329
 CONSTANT   39.578     0.8854       44.70     0.000 0.986     0.0000     0.9671

For the male manager group, FEMALE takes on a value of zero, so mean hours for male managers are just 39.578, with a standard error of 0.8854 hours. For female managers, the FEMALE variable is always equal to one, so mean hours for female managers are the sum of the "intercept" and the "slope" in this model: (39.578 + 4.0375) = 43.6155. Female managers work, on average, 4.0375 hours more per week than male managers. Is this difference statistically significant? Yes. The P-value on the difference is 0.011, which is less than 0.05, so we can reject the hypothesis that the difference (the slope on FEMALE) is zero.
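
The arithmetic can be checked with a few lines of Python (an illustrative sketch, not part of the SHAZAM run; variable names are hypothetical). As a consistency check, weighting the two group means by the sample share of females from the STAT output should recover the overall mean of HOURS.

```python
# Rounded coefficients from: ols hours female
intercept, b_female, se_female = 39.578, 4.0375, 1.533

mean_male = intercept                # FEMALE = 0
mean_female = intercept + b_female   # FEMALE = 1

# Mean of FEMALE in the STAT output is the share of female managers.
share_female = 0.33333
overall_mean = (1.0 - share_female) * mean_male + share_female * mean_female

t_ratio = b_female / se_female       # test of "no male/female difference"

print(f"male mean   : {mean_male:.3f}")
print(f"female mean : {mean_female:.4f}")
print(f"overall mean: {overall_mean:.3f}")
print(f"t-ratio     : {t_ratio:.3f}")
```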
 
 |_* Q# 2c) does marital status affect expected work hours?
 
 |_ols hours female spouse
 
  R-SQUARE =   0.2289     R-SQUARE ADJUSTED =   0.2019
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   27.540
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.2478
 SUM OF SQUARED ERRORS-SSE=   1569.8
 MEAN OF DEPENDENT VARIABLE =   40.924
 LOG OF THE LIKELIHOOD FUNCTION = -183.066
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        466.11          2.        233.06                 8.463
 ERROR             1569.8         57.        27.540               P-VALUE
 TOTAL             2035.9         59.        34.506                 0.001
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      57 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEMALE     1.8862      1.606       1.175     0.245 0.154     0.1526     0.0154
 SPOUSE    -5.3783      1.789      -3.005     0.004-0.370    -0.3905    -0.1008
 CONSTANT   44.418      1.812       24.52     0.000 0.956     0.0000     1.0854

If we control for whether a manager is male or female and then look at the average difference in work hours across managers without spouses and managers with spouses, the point estimate of the difference between these two groups is -5.3783, and this difference is statistically significant. It appears that having a spouse, on average across the sample, lowers your weekly hours by more than 5. When interpreting this regression, the intercept gives mean work hours for the group with zero values for both FEMALE and SPOUSE--namely, single male managers. For single female managers, expected work hours are (44.418+1.8862). For married male managers, expected work hours are (44.418-5.3783). For married female managers, expected work hours are (44.418+1.8862-5.3783). Since there are no interaction terms, the effect of FEMALE on expected work hours is the same regardless of whether a manager is married or not. Likewise, absent any interaction terms, the effect of SPOUSE on expected work hours is the same for males and females. However, notice that when you control for marital status, the difference between male and female manager hours becomes statistically insignificant at the 5% level. This means that FEMALE and SPOUSE must be somewhat correlated.
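
The four group means implied by this additive model can be tabulated as follows. This is an illustrative Python sketch using the rounded coefficients above; the function name is hypothetical.

```python
from itertools import product

# Rounded coefficients from: ols hours female spouse
coef = {"const": 44.418, "female": 1.8862, "spouse": -5.3783}

def expected_hours(female, spouse):
    """Fitted E[HOURS] for a manager with the given dummy values (0 or 1)."""
    return coef["const"] + coef["female"] * female + coef["spouse"] * spouse

for female, spouse in product((0, 1), repeat=2):
    label = ("married" if spouse else "single") + " " + ("female" if female else "male")
    print(f"{label:15s} {expected_hours(female, spouse):8.4f}")

# With no interaction term, the SPOUSE effect is identical for males and females.
effect_male = expected_hours(0, 1) - expected_hours(0, 0)
effect_female = expected_hours(1, 1) - expected_hours(1, 0)
print(f"spouse effect: males {effect_male:.4f}, females {effect_female:.4f}")
```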
 
 |_* Q# 2d) difference in effect of having a spouse according to whether
 |_*        the spouse works or not
 
 |_ols hours female spouse swork
 
  R-SQUARE =   0.2562     R-SQUARE ADJUSTED =   0.2163
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   27.042
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.2002
 SUM OF SQUARED ERRORS-SSE=   1514.4
 MEAN OF DEPENDENT VARIABLE =   40.924
 LOG OF THE LIKELIHOOD FUNCTION = -181.988
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        521.52          3.        173.84                 6.429
 ERROR             1514.4         56.        27.042               P-VALUE
 TOTAL             2035.9         59.        34.506                 0.001
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      56 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEMALE    0.88742      1.737      0.5108     0.611 0.068     0.0718     0.0072
 SPOUSE    -6.9729      2.094      -3.330     0.002-0.407    -0.5063    -0.1306
 SWORK      2.4060      1.681       1.431     0.158 0.188     0.1970     0.0206
 CONSTANT   45.132      1.863       24.22     0.000 0.955     0.0000     1.1028

We only observe whether a spouse works or not if there IS a spouse. If there is no spouse, you should find that the SWORK variable takes on a value of zero, rather than being undefined. In a sense, the undefined values are set to zero because we have implicitly "multiplied" the SWORK data by the SPOUSE variable. If SPOUSE=1, SWORK equals 1 or zero according to whether or not the spouse is employed. If SPOUSE=0, (0*undefined) is set equal to zero, avoiding the problem of undefined variable values. In this model, the effect of having a spouse on manager hours is the derivative of HOURS with respect to spouse, which equals the coefficient on SPOUSE plus the coefficient on SWORK times SWORK. Conceptually, you want to know what happens to E[HOURS] when SPOUSE goes from zero to one. The answer depends on whether the spouse works or not. If not, SWORK is zero and the answer is "hours fall by 6.9729." If the spouse works, you get not only the -6.9729, but also the +2.4060 term. Thus, the answer to the question posed is just "+2.4060 hours," although this number is not statistically significantly different from zero. Statistically, there is no difference in the effect of having a spouse according to whether that spouse works or not.
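
The two cases can be written out explicitly. The sketch below is illustrative Python with the rounded coefficients from the table above; the function name is hypothetical.

```python
# Rounded coefficients from: ols hours female spouse swork
b_spouse, b_swork = -6.9729, 2.4060

def spouse_effect(swork):
    """Change in E[HOURS] when SPOUSE goes from 0 to 1, given the spouse's work status."""
    return b_spouse + b_swork * swork

non_working = spouse_effect(0)
working = spouse_effect(1)

print(f"effect of a non-working spouse: {non_working:.4f}")
print(f"effect of a working spouse    : {working:.4f}")
print(f"difference between the two    : {working - non_working:.4f}")
```

The difference between the two cases is just the SWORK coefficient, which is why its t-ratio answers the question posed.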
 
 |_* Q# 2e) generate some interesting interaction terms:
 |_genr spousef=spouse*female
 |_genr sworkf=swork*female
 
 |_ols hours female spouse spousef swork sworkf
 
  R-SQUARE =   0.3501     R-SQUARE ADJUSTED =   0.2900
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   24.501
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   4.9498
 SUM OF SQUARED ERRORS-SSE=   1323.0
 MEAN OF DEPENDENT VARIABLE =   40.924
 LOG OF THE LIKELIHOOD FUNCTION = -177.936
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION        712.84          5.        142.57                 5.819
 ERROR             1323.0         54.        24.501               P-VALUE
 TOTAL             2035.9         59.        34.506                 0.000
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      54 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEMALE     6.9590      2.928       2.376     0.021 0.308     0.5632     0.0567
 SPOUSE    -2.2954      2.673     -0.8587     0.394-0.116    -0.1667    -0.0430
 SPOUSEF   -14.589      5.839      -2.498     0.016-0.322    -0.9334    -0.0594
 SWORK      2.8296      1.750       1.617     0.112 0.215     0.2317     0.0242
 SWORKF     6.7337      5.503       1.224     0.226 0.164     0.4128     0.0247
 CONSTANT   40.795      2.475       16.48     0.000 0.913     0.0000     0.9969

In this model, the intercept applies to a single male manager. A single female manager, on average, works an amount given by the sum of the intercept and the slope on the FEMALE dummy: (40.795+6.959). For a male with a non-working spouse, expected hours are (40.795-2.2954), although the differential for this male manager having a spouse (-2.2954) is not significantly different from zero. For a male manager, the difference between having a non-working and a working spouse is 2.8296 (although this difference is not statistically significant either). For a female with a non-working spouse, expected hours are (40.795+6.9590-2.2954-14.589). For a female with a working spouse, the coefficients on SWORK and SWORKF must be added, changing the total by (+2.8296+6.7337). Thus, the non-working/working spouse differential for males and females differs by the amount of the estimated coefficient on the interaction term SWORKF. Here, the point estimate is 6.7337 and it is not statistically significantly different from zero.

These data suggest that female managers with non-working spouses lose about 17 hours (2.2954+14.589) relative to single female managers, whereas male managers with non-working spouses lose only about 2 hours. According to the point estimates, a male manager with a working spouse actually works more hours than a male manager with no spouse at all. A female manager with a working spouse still works about 7 hours less per week than a female manager with no spouse at all. Possible interpretations of results like these could be quite entertaining.
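
To keep the arithmetic straight, here is a minimal Python sketch (not part of the SHAZAM run) that adds up the fitted coefficients for each type of manager. It assumes the coding implied by the discussion above: SPOUSE=1 if the manager has a spouse, SWORK=1 if that spouse works, SPOUSEF=SPOUSE*FEMALE and SWORKF=SWORK*FEMALE. The function name expected_hours is just my own label.

```python
# Coefficients copied from the regression output above.
coef = {"CONSTANT": 40.795, "FEMALE": 6.9590, "SPOUSE": -2.2954,
        "SPOUSEF": -14.589, "SWORK": 2.8296, "SWORKF": 6.7337}

def expected_hours(female, spouse, swork):
    """Fitted weekly hours for one cell; all arguments are 0/1 indicators."""
    # Assumed coding: SPOUSEF = SPOUSE*FEMALE and SWORKF = SWORK*FEMALE.
    return (coef["CONSTANT"]
            + coef["FEMALE"] * female
            + coef["SPOUSE"] * spouse
            + coef["SPOUSEF"] * spouse * female
            + coef["SWORK"] * swork
            + coef["SWORKF"] * swork * female)

cells = {
    "single male":                 (0, 0, 0),
    "single female":               (1, 0, 0),
    "male, non-working spouse":    (0, 1, 0),
    "male, working spouse":        (0, 1, 1),
    "female, non-working spouse":  (1, 1, 0),
    "female, working spouse":      (1, 1, 1),
}
for label, args in cells.items():
    print(f"{label:28s} {expected_hours(*args):7.3f}")
```

The differences between rows of this little table are the differentials discussed in the text; they are point estimates only and say nothing about statistical significance.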

Note that, in retrospect, the question might be viewed as a trifle ambiguous. As it is written, the event in question concerns "having a non-working spouse." I should have specified "compared to what": (i) having no spouse at all, in which case the answer would concern the magnitude of the coefficient on SPOUSEF; or (ii) having a working spouse, in which case the answer would concern the magnitude of the coefficient on SWORK...the way I have interpreted it here.


 
 |_* Problem 3 -----------------------------------------------------------
 |_sample 1 208

 |_read(credit.dat) year month credit
 UNIT 88 IS NOW ASSIGNED TO: credit.dat
    3 VARIABLES AND      208 OBSERVATIONS STARTING AT OBS       1

 |_stat / pcor
 NAME        N    MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 YEAR        208   85.173      5.0187      25.187       77.000       94.000
 MONTH       208   6.4231      3.4744      12.071       1.0000       12.000
 CREDIT      208   29671.      7565.5     0.57237E+08   14592.       54943.

  CORRELATION MATRIX OF VARIABLES -      208 OBSERVATIONS

 YEAR       1.0000
 MONTH    -0.39128E-01   1.0000
 CREDIT    0.85276      0.52290E-01   1.0000
              YEAR         MONTH        CREDIT

  
 |_* a.)
 |_genr t=time(0)
 
 |_plot credit t
 
 REQUIRED MEMORY IS PAR=     7 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=    10
       208 OBSERVATIONS
                    *=CREDIT
                    M=MULTIPLE POINT

This crummy plot suggests that it would be a good time to do a gnuplot plot, with a nice crisp line connecting the points:

    54943.        |                    *
    52709.        |
    50475.        |
    48241.        |
    46008.        |
    43774.        |                       *
    41540.        |
    39306.        |                     *   * * *   *
    37072.        |                 * * MMM * *     M
    34838.        |               M * MM* M*MMM M ***
    32605.        |             * MMMMM*  *M**MM MMM
    30371.        |             MMM              M*
    28137.        |           * M
    25903.        |         M MM*
    23669.        |       * MMM
    21435.        |     *MMM*
    19202.        |   *MMMM
    16968.        | **MM
    14734.        |MMMM
    12500.        |M
                   ________________________________________
 
               0.000    60.000   120.000   180.000   240.000
 
                                T
 
This is how gnuplot works on my stand-alone machine, as opposed to the format we covered for the PS lab.

 |_plot credit t / gnu lineonly commfile=cre.gnu datafile=cre.dat &
 |   output=cre.ps


"Logical operators" include .eq. .ne. .gt. .lt. .ge. .le. SHAZAM evaluates the expression in the parentheses and executes the associated generate-type command if it is true. Otherwise, the variable is set equal to zero.

 |_if(month.eq.1) jan=1
 |_if(month.eq.2) feb=1
 |_if(month.eq.3) mar=1
 |_if(month.eq.4) apr=1
 |_if(month.eq.5) may=1
 |_if(month.eq.6) jun=1
 |_if(month.eq.7) jul=1
 |_if(month.eq.8) aug=1
 |_if(month.eq.9) sep=1
 |_if(month.eq.10) oct=1
 |_if(month.eq.11) nov=1
 |_if(month.eq.12) dec=1
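
For anyone who wants to check the dummy-variable logic outside SHAZAM, the following is a rough Python equivalent of the twelve IF commands. It is only a sketch: the month values in the example are made up, and month_dummies is a hypothetical helper name.

```python
MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]

def month_dummies(months):
    """Return {name: [0/1, ...]} with one indicator list per calendar month."""
    return {name: [1 if m == k else 0 for m in months]
            for k, name in enumerate(MONTH_NAMES, start=1)}

months = [1, 2, 3, 12, 1]      # hypothetical MONTH values, not the real data
d = month_dummies(months)
print(d["jan"])
print(d["dec"])
```

Each observation gets a 1 in exactly one of the twelve lists, which is why only 11 of them can be used alongside an intercept.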
 
This is a regression with 11 monthly dummies only.

 |_ols credit feb mar apr may jun jul aug sep oct nov dec 
 
 REQUIRED MEMORY IS PAR=    52 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.0304     R-SQUARE ADJUSTED =  -0.0240
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.58610E+08
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   7655.7
 SUM OF SQUARED ERRORS-SSE=  0.11488E+11
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -2149.15
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.36036E+09     11.       0.32760E+08             0.559
 ERROR            0.11488E+11    196.       0.58610E+08           P-VALUE
 TOTAL            0.11848E+11    207.       0.57237E+08             0.860

Since the p-value for the F-test of "all slopes simultaneously zero" is not small enough to reject the hypothesis, the 11 dummies, by themselves, are not particularly helpful for explaining the observed variation in retail credit balances.
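
As a check on where that F statistic comes from, here is a small Python sketch that rebuilds it two ways from numbers printed above: from the ANOVA sums of squares and from R-squared. The variable names are mine; the inputs are the rounded values in the output, so the results agree with SHAZAM only to about three digits.

```python
n, k = 208, 11                 # observations and number of slope coefficients
ss_reg, ss_err = 0.36036e9, 0.11488e11
r2 = 0.0304

# F = (explained SS per slope) / (error SS per residual degree of freedom)
f_from_ss = (ss_reg / k) / (ss_err / (n - k - 1))

# Equivalent form in terms of R-squared
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(f_from_ss, 3), round(f_from_r2, 3))
```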
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.18347E+12     12.       0.15289E+11           260.865
 ERROR            0.11488E+11    196.       0.58610E+08           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     196 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEB       -1259.7      2552.     -0.4936     0.622-0.035    -0.0469    -0.0037
 MAR       -1611.5      2552.     -0.6315     0.528-0.045    -0.0600    -0.0047
 APR       -1617.3      2552.     -0.6338     0.527-0.045    -0.0603    -0.0047
 MAY       -854.19      2589.     -0.3299     0.742-0.024    -0.0310    -0.0024
 JUN       -1976.8      2589.     -0.7635     0.446-0.054    -0.0718    -0.0054
 JUL       -2066.0      2589.     -0.7979     0.426-0.057    -0.0750    -0.0057
 AUG       -1857.7      2589.     -0.7175     0.474-0.051    -0.0674    -0.0051
 SEP       -1842.2      2589.     -0.7115     0.478-0.051    -0.0669    -0.0051
 OCT       -1574.2      2589.     -0.6080     0.544-0.043    -0.0571    -0.0043
 NOV       -637.66      2589.     -0.2463     0.806-0.018    -0.0231    -0.0018
 DEC        2907.0      2589.       1.123     0.263 0.080     0.1055     0.0080
 CONSTANT   30705.      1804.       17.02     0.000 0.772     0.0000     1.0349

Likewise, none of the individual monthly dummy variables is individually significant (or, more accurately, none of the coefficients on the individual monthly dummy variables is individually statistically significantly different from zero).

 
 |_* b.)  
Now we have included a linear time trend variable, t, in the model.
 
 |_ols credit feb mar apr may jun jul aug sep oct nov dec t / predict=credthat
 
 REQUIRED MEMORY IS PAR=    56 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.7599     R-SQUARE ADJUSTED =   0.7452
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.14585E+08
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   3819.1
 SUM OF SQUARED ERRORS-SSE=  0.28441E+10
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -2003.96
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.90038E+10     12.       0.75032E+09            51.443
 ERROR            0.28441E+10    195.       0.14585E+08           P-VALUE
 TOTAL            0.11848E+11    207.       0.57237E+08             0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.19212E+12     13.       0.14778E+11          1013.215
 ERROR            0.28441E+10    195.       0.14585E+08           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     195 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 FEB       -1367.1      1273.      -1.074     0.284-0.077    -0.0509    -0.0040
 MAR       -1826.3      1273.      -1.435     0.153-0.102    -0.0680    -0.0053
 APR       -1939.6      1273.      -1.524     0.129-0.108    -0.0723    -0.0057
 MAY       -639.35      1292.     -0.4950     0.621-0.035    -0.0232    -0.0018
 JUN       -1869.4      1292.      -1.447     0.149-0.103    -0.0679    -0.0051
 JUL       -2066.0      1292.      -1.600     0.111-0.114    -0.0750    -0.0057
 AUG       -1965.1      1292.      -1.521     0.130-0.108    -0.0713    -0.0054
 SEP       -2057.1      1292.      -1.593     0.113-0.113    -0.0747    -0.0057
 OCT       -1896.4      1292.      -1.468     0.144-0.105    -0.0688    -0.0052
 NOV       -1067.3      1292.     -0.8263     0.410-0.059    -0.0387    -0.0029
 DEC        2370.0      1292.       1.835     0.068 0.130     0.0860     0.0065
 T          107.42      4.413       24.34     0.000 0.867     0.8546     0.3783
 CONSTANT   19641.      1008.       19.48     0.000 0.813     0.0000     0.6620

Things are quite a bit different. In particular, the F-test for the joint significance of all of the slopes (now including that on t) strongly rejects the hypothesis that "none of the explanatory variables matters". Individually, the monthly dummy variables are not statistically significant at the 5% level but the December dummy coefficient has a positive, as opposed to a negative, point estimate and is significant at the 10% level (and even at the 6.8% level).

 
 |_* c.) 
Now consider a model that is quadratic in "time."
 |_genr t2=t*t
 
 |_ols credit t t2 / coef=b
 
 REQUIRED MEMORY IS PAR=    40 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.8948     R-SQUARE ADJUSTED =   0.8938
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.60807E+07
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   2465.9
 SUM OF SQUARED ERRORS-SSE=  0.12465E+10
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -1918.17
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.10601E+11      2.       0.53007E+10           871.723
 ERROR            0.12465E+10    205.       0.60807E+07           P-VALUE
 TOTAL            0.11848E+11    207.       0.57237E+08             0.000

Since t was individually significant, even in conjunction with the set of dummies, it is not surprising that t in conjunction with just t2 will jointly be significant.
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.19371E+12      3.       0.64571E+11         10618.964
 ERROR            0.12465E+10    205.       0.60807E+07           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     205 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 T          304.26      11.44       26.59     0.000  0.880     2.4206     1.0716
 T2       -0.94081     0.5303E-01  -17.74     0.000 -0.778    -1.6151    -0.4606
 CONSTANT   11541.      517.9       22.28     0.000 0.841     0.0000     0.3890

Both the linear AND the quadratic terms in t are individually strongly significant. There is strong evidence of curvature in the relationship between credit and time. Furthermore, since the coefficient on t squared is negative, we know that the slope is decreasing over time, so the quadratic shape opens downwards and there may be a maximum value of fitted credit somewhere within the sample range of time (t) values.

 
 |_* d.)
 |_gen1 tstar=-(b:1)/(2*b:2)
 |_print tstar
     TSTAR
    161.7017
 
We know that the t variable ranges from 1 to 208 in the sample (there are 208 observations), so the peak of fitted credit occurs between months 161 and 162. If you used PRINT YEAR MONTH T, you could look for the year and month that corresponds to t=161 and see whether this is somewhere around the 1986 change in tax law concerning the deductibility of retail credit interest payments.
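
Here is a short Python sketch of the same turning-point calculation, plus a helper that converts an observation number into a calendar date. The helper assumes the series starts in January 1977 and has no gaps; that is my inference from the YEAR and MONTH summary statistics, not something stated in the data file, so confirm it with PRINT YEAR MONTH T. The coefficients are the rounded values from the output.

```python
b1, b2 = 304.26, -0.94081      # coefficients on t and t2 from part c

# The fitted quadratic b0 + b1*t + b2*t^2 has zero slope where b1 + 2*b2*t = 0.
tstar = -b1 / (2 * b2)

def t_to_year_month(t, start_year=1977):
    """Map a 1-based observation index to (year, month), assuming monthly
    data with no gaps that begin in January of start_year."""
    years, month0 = divmod(t - 1, 12)
    return start_year + years, month0 + 1

print(round(tstar, 2))
print(t_to_year_month(int(tstar)))
```

Under the same assumption, t_to_year_month(125) returns (1987, 5), which agrees with observation 125 being the May 1987 outlier skipped in part g.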

 |_ols credit t t2 feb mar apr may jun jul aug sep oct nov dec / coef=bb
 
Now control for seasonality as well as a curvilinear time trend.

 REQUIRED MEMORY IS PAR=    59 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
Look at that R-squared value now!

  R-SQUARE =   0.9218     R-SQUARE ADJUSTED =   0.9166
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.47746E+07
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   2185.1
 SUM OF SQUARED ERRORS-SSE=  0.92627E+09
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -1887.29
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.10922E+11     13.       0.84013E+09           175.958
 ERROR            0.92627E+09    194.       0.47746E+07           P-VALUE
 TOTAL            0.11848E+11    207.       0.57237E+08             0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.19403E+12     14.       0.13860E+11          2902.759
 ERROR            0.92627E+09    194.       0.47746E+07           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     194 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 T          304.61      10.16       29.99     0.000 0.907     2.4233     1.0728
 T2       -0.94348     0.4708E-01  -20.04     0.000-0.821    -1.6197    -0.4619
 FEB       -1369.0      728.4      -1.880     0.062-0.134    -0.0510    -0.0040
 MAR       -1828.2      728.4      -2.510     0.013 -0.177    -0.0681    -0.0053
 APR       -1939.6      728.4      -2.663     0.008 -0.188    -0.0723    -0.0057
 MAY       -1026.2      739.3      -1.388     0.167-0.099    -0.0372    -0.0028
 JUN       -2261.8      739.3      -3.060     0.003 -0.215    -0.0821    -0.0062
 JUL       -2462.2      739.3      -3.331     0.001 -0.233    -0.0894    -0.0068
 AUG       -2363.2      739.3      -3.197     0.002 -0.224    -0.0858    -0.0065
 SEP       -2455.2      739.3      -3.321     0.001 -0.232    -0.0891    -0.0068
 OCT       -2292.7      739.3      -3.101     0.002 -0.217    -0.0832    -0.0063
 NOV       -1459.8      739.3      -1.975     0.050 -0.140    -0.0530    -0.0040
 DEC        1983.1      739.4       2.682     0.008  0.189     0.0720     0.0055
 CONSTANT   12997.      665.4       19.53     0.000 0.814     0.0000     0.4380

Retail credit balances are not statistically different from the January level in February or in May, but they are statistically significantly lower in all other months, except December, where the holiday shopping effect seems to come through loud and clear. The higher balances in May might be driven by the outlier in May of 1987 (and supported by people using credit cards to cover their income tax payments in other years as well).

 |_gen1 tstar2=-(bb:1)/(2*bb:2)
 |_print tstar2
     TSTAR2
    161.4265

The historical fitted peak month for credit balances is again around month 161-162.


 |_genr t3=t*t*t
 
 |_ols credit t t2 t3 feb mar apr may jun jul aug sep oct nov dec / coef=bb
 
 REQUIRED MEMORY IS PAR=    63 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.9294     R-SQUARE ADJUSTED =   0.9243
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.43347E+07
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   2082.0
 SUM OF SQUARED ERRORS-SSE=  0.83659E+09
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -1876.70
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.11011E+11     14.       0.78653E+09           181.450
 ERROR            0.83659E+09    193.       0.43347E+07           P-VALUE
 TOTAL            0.11848E+11    207.       0.57237E+08             0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.19412E+12     15.       0.12942E+11          2985.585
 ERROR            0.83659E+09    193.       0.43347E+07           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     193 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 T          203.02      24.34       8.341     0.000 0.515     1.6152     0.7151
 T2        0.26849     0.2702      0.9937     0.322 0.071     0.4609     0.1314
 T3       -0.38659E-02 0.8499E-03  -4.549     0.000-0.311    -1.3164    -0.2960
 FEB       -1349.1      694.0      -1.944     0.053-0.139    -0.0503    -0.0039
 MAR       -1788.4      694.1      -2.577     0.011-0.182    -0.0666    -0.0052
 APR       -1879.9      694.2      -2.708     0.007-0.191    -0.0700    -0.0055
 MAY       -1049.0      704.4      -1.489     0.138-0.107    -0.0381    -0.0029
 JUN       -2269.6      704.4      -3.222     0.001-0.226    -0.0824    -0.0063
 JUL       -2454.9      704.4      -3.485     0.001-0.243    -0.0891    -0.0068
 AUG       -2340.9      704.4      -3.323     0.001-0.233    -0.0850    -0.0064
 SEP       -2417.9      704.4      -3.432     0.001-0.240    -0.0878    -0.0067
 OCT       -2240.3      704.5      -3.180     0.002-0.223    -0.0813    -0.0062
 NOV       -1392.4      704.6      -1.976     0.050-0.141    -0.0505    -0.0038
 DEC        2065.7      704.7       2.931     0.004 0.206     0.0750     0.0057
 CONSTANT   14759.      743.0       19.86     0.000 0.819     0.0000     0.4974

Adding the cubed term makes the squared term statistically insignificant, so there is probably some collinearity between them. This could be verified with another STAT / PCOR command. The negative coefficient on t3 implies that, beyond the first couple of years of the sample, the slope of the time trend is decreasing at an increasing rate as time passes. A useful exercise at this point would be to use the fitted coefficients on the t terms to plot the shape of a cubic function with these parameters.
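
As a starting point for that exercise, this Python sketch evaluates the slope and the curvature of the fitted cubic time trend at a few values of t, using the rounded coefficients printed above. The seasonal dummies and the intercept only shift the level, so they are left out. The function names are mine.

```python
b1, b2, b3 = 203.02, 0.26849, -0.38659e-2   # coefficients on t, t2, t3

def slope(t):
    """First derivative of b1*t + b2*t^2 + b3*t^3 with respect to t."""
    return b1 + 2 * b2 * t + 3 * b3 * t ** 2

def curvature(t):
    """Second derivative: how fast the slope itself is changing."""
    return 2 * b2 + 6 * b3 * t

# The curvature changes sign where 2*b2 + 6*b3*t = 0.
t_inflect = -b2 / (3 * b3)

print("inflection at t =", round(t_inflect, 2))
for t in (1, 50, 100, 150, 208):
    print(t, round(slope(t), 1), round(curvature(t), 3))
```

After the inflection point the curvature is negative and keeps getting more negative, so the slope falls faster and faster as t increases.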

 
 |_* e.)
 
 |_ols credit t t2 feb mar apr may jun jul aug sep oct nov dec / coef=bb
 
 REQUIRED MEMORY IS PAR=    61 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.9218     R-SQUARE ADJUSTED =   0.9166
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.47746E+07
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   2185.1
 SUM OF SQUARED ERRORS-SSE=  0.92627E+09
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -1887.29
 
                      ANALYSIS OF VARIANCE - FROM MEAN
                       SS         DF             MS                 F
 REGRESSION       0.10922E+11     13.       0.84013E+09           175.958
 ERROR            0.92627E+09    194.       0.47746E+07           P-VALUE
 TOTAL            0.11848E+11    207.       0.57237E+08             0.000
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.19403E+12     14.       0.13860E+11          2902.759
 ERROR            0.92627E+09    194.       0.47746E+07           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     194 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 T          304.61      10.16       29.99     0.000 0.907     2.4233     1.0728
 T2       -0.94348     0.4708E-01  -20.04     0.000-0.821    -1.6197    -0.4619
 FEB       -1369.0      728.4      -1.880     0.062-0.134    -0.0510    -0.0040
 MAR       -1828.2      728.4      -2.510     0.013-0.177    -0.0681    -0.0053
 APR       -1939.6      728.4      -2.663     0.008-0.188    -0.0723    -0.0057
 MAY       -1026.2      739.3      -1.388     0.167-0.099    -0.0372    -0.0028
 JUN       -2261.8      739.3      -3.060     0.003-0.215    -0.0821    -0.0062
 JUL       -2462.2      739.3      -3.331     0.001-0.233    -0.0894    -0.0068
 AUG       -2363.2      739.3      -3.197     0.002-0.224    -0.0858    -0.0065
 SEP       -2455.2      739.3      -3.321     0.001-0.232    -0.0891    -0.0068
 OCT       -2292.7      739.3      -3.101     0.002-0.217    -0.0832    -0.0063
 NOV       -1459.8      739.3      -1.975     0.050-0.140    -0.0530    -0.0040
 DEC        1983.1      739.4       2.682     0.008 0.189     0.0720     0.0055
 CONSTANT   12997.      665.4       19.53     0.000 0.814     0.0000     0.4380

Since we have other variables besides just the dummies, we cannot use the automatic F-test that SHAZAM produces for every OLS regression in the "Analysis of Variance - From Mean" table. We need a special F-test that asks whether just the dummy variable coefficients could jointly be zero. You could always do this the old-fashioned way: run both this unrestricted regression and the restricted regression corresponding to the null hypothesis being true, then check the ANOVA tables for both to find the ingredients for constructing the F-test statistic yourself.

 |_test
 |_test feb=0
 |_test mar=0
 |_test apr=0
 |_test may=0
 |_test jun=0
 |_test jul=0
 |_test aug=0
 |_test sep=0
 |_test oct=0
 |_test nov=0
 |_test dec=0
 |_end

The null hypothesis is soundly rejected.

 F STATISTIC =   6.0980951     WITH   11 AND  194 D.F.  P-VALUE= 0.00000
 WALD CHI-SQUARE STATISTIC =   67.079046     WITH   11 D.F.  P-VALUE= 0.00000
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.16399
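
For comparison, here is the "old-fashioned" version of the same test as a Python sketch. The restricted model is the regression of credit on t and t2 alone (part c) and the unrestricted model adds the 11 monthly dummies. Both SSE values are the rounded figures from the ANOVA tables above, so the result matches SHAZAM's F statistic only to about three digits.

```python
sse_restricted = 0.12465e10    # credit on t, t2 (205 error d.f.)
sse_unrestricted = 0.92627e9   # credit on t, t2 and 11 dummies (194 error d.f.)
q = 11                         # number of restrictions (dummy coefficients = 0)
df_unrestricted = 194

# F = [(SSE_R - SSE_U) / q] / [SSE_U / df_U]
f_stat = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / df_unrestricted)

# The Wald chi-square statistic reported by SHAZAM is q times the F statistic.
wald = q * f_stat

print(round(f_stat, 3), round(wald, 2))
```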


 
 |_* f.) just interpretation.  Huge tax bills came due on April 15, 1987, for
 |_*  1986 tax year.  People realized retail credit interest was no longer
 |_*  deductible.   It was more expensive, so less of it began to take place.

 
 |_* g.)  
Using regression to net out expected seasonal differences in credit

Note that we are now suppressing the intercept term and using the full set of 12 dummy variables, so each coefficient is now an "intercept" for each month, as opposed to the regular intercept being the January value and the dummy variable coefficients being the differentials between January and each other month.
 
 |_ols credit jan feb mar apr may jun jul aug sep oct nov dec / noconstant resid=e
 
A key feature of this regression is that we save, for each observation, the residual (the amount by which that observation differs from what we would expect for a January or a February, etc.).

 REQUIRED MEMORY IS PAR=    59 CURRENT PAR=   500
  OLS ESTIMATION
      208 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.0304     R-SQUARE ADJUSTED =  -0.0240
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.58610E+08
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   7655.7
 SUM OF SQUARED ERRORS-SSE=  0.11488E+11
 MEAN OF DEPENDENT VARIABLE =   29671.
 LOG OF THE LIKELIHOOD FUNCTION = -2149.15
 RAW MOMENT R-SQUARE =   0.9411
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.18347E+12     12.       0.15289E+11           260.865
 ERROR            0.11488E+11    196.       0.58610E+08           P-VALUE
 TOTAL            0.19496E+12    208.       0.93731E+09             0.000
 
 
All the individual dummy coefficients are now statistically significantly different from zero, since we are asking if each monthly average credit balance could be zero, NOT whether each month's balance differs from the usual January balance.

 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     196 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 JAN        30705.      1804.       17.02     0.000 0.772     1.1438     0.0896
 FEB        29445.      1804.       16.32     0.000 0.759     1.0969     0.0859
 MAR        29093.      1804.       16.12     0.000 0.755     1.0838     0.0849
 APR        29088.      1804.       16.12     0.000 0.755     1.0836     0.0848
 MAY        29851.      1857.       16.08     0.000 0.754     1.0835     0.0822
 JUN        28728.      1857.       15.47     0.000 0.741     1.0428     0.0791
 JUL        28639.      1857.       15.42     0.000 0.740     1.0395     0.0789
 AUG        28847.      1857.       15.54     0.000 0.743     1.0471     0.0795
 SEP        28863.      1857.       15.54     0.000 0.743     1.0477     0.0795
 OCT        29131.      1857.       15.69     0.000 0.746     1.0574     0.0802
 NOV        30067.      1857.       16.19     0.000 0.756     1.0914     0.0828
 DEC        33612.      1857.       18.10     0.000 0.791     1.2201     0.0926
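
To see that this is the same model as in part a, just written differently, the Python sketch below rebuilds each monthly "intercept" as the part a constant plus that month's dummy coefficient and compares it with the coefficient reported here. All numbers are copied from the two outputs; agreement is only up to the rounding in the printed values.

```python
intercept = 30705.0            # CONSTANT from part a (the January level)
diffs = {"FEB": -1259.7, "MAR": -1611.5, "APR": -1617.3, "MAY": -854.19,
         "JUN": -1976.8, "JUL": -2066.0, "AUG": -1857.7, "SEP": -1842.2,
         "OCT": -1574.2, "NOV": -637.66, "DEC": 2907.0}

# Coefficients from the no-constant regression with all 12 dummies.
reported = {"JAN": 30705.0, "FEB": 29445.0, "MAR": 29093.0, "APR": 29088.0,
            "MAY": 29851.0, "JUN": 28728.0, "JUL": 28639.0, "AUG": 28847.0,
            "SEP": 28863.0, "OCT": 29131.0, "NOV": 30067.0, "DEC": 33612.0}

rebuilt = {"JAN": intercept}
rebuilt.update({m: intercept + d for m, d in diffs.items()})

for m in reported:
    print(m, round(rebuilt[m], 1), reported[m])
```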

We then calculate the overall mean credit balance over the entire series, use that as the baseline, and then add back in the "unexpected" amount of credit associated with each observation, controlling for what usually happens in each month of the year.

 |_stat credit / mean=mcredit
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 CREDIT     208   29671.       7565.5      0.57237E+08   14592.       54943.

 |_genr creditsa=mcredit+e
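
The logic of this adjustment can be sketched in a few lines of Python. The data here are a made-up toy series with only three "months" observed over two years, not the credit data. The fitted value from the no-constant dummy regression is just each month's own average, so the residual is the observation minus that average.

```python
# Hypothetical toy series: (month, value) pairs over two "years".
months = [1, 2, 3, 1, 2, 3]
credit = [100.0, 90.0, 110.0, 120.0, 110.0, 130.0]

# Month-specific means (what the 12-dummy, no-constant regression estimates).
month_mean = {}
for m in set(months):
    vals = [c for mm, c in zip(months, credit) if mm == m]
    month_mean[m] = sum(vals) / len(vals)

overall_mean = sum(credit) / len(credit)

# Residual = observation minus its month's mean; add back the overall mean.
resid = [c - month_mean[m] for m, c in zip(months, credit)]
creditsa = [overall_mean + e for e in resid]

print(creditsa)
```

In the toy series the within-year seasonal pattern disappears and only the shift in level between the two years remains, which is what we want the seasonally adjusted credit series to show.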
 
Another plot that will look much nicer as a gnuplot line plot.

 |_plot credit creditsa t
 
 REQUIRED MEMORY IS PAR=    37 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=    42
       208 OBSERVATIONS
                    *=CREDIT
                    +=CREDITSA
                    M=MULTIPLE POINT
    54943.        |                    *
    52709.        |                    +
    50475.        |
    48241.        |
    46008.        |
    43774.        |                       *
    41540.        |
    39306.        |                     * + * * *   *
    37072.        |                 * * MMM * *     M
    34838.        |               M M+MMM MMMMM M *M*
    32605.        |             * MMMMM*  *M**MM+MMM
    30371.        |             MMM              MM
    28137.        |           *MM
    25903.        |         MMMM*
    23669.        |       *MMMM
    21435.        |     *MMMM
    19202.        |   *MMMM
    16968.        | *MMM
    14734.        |MMMM
    12500.        |M+
                   ________________________________________
 
               0.000    60.000   120.000   180.000   240.000
 
                                T
 
 |_plot credit creditsa t / gnu lineonly commfile=cred.gnu datafile=cred.dat &
 |   output=cred.ps
 
 

See if the seasonal adjustment process is being unduly influenced by the extreme outlier in May of 1987.

 |_* try deleting influential outlier in May 1987...
 |_skipif(credit.gt.50000)
 OBSERVATION   125 WILL BE SKIPPED
 
 |_ols credit jan feb mar apr may jun jul aug sep oct nov dec / noconstant resid=e
 
 REQUIRED MEMORY IS PAR=    63 CURRENT PAR=   500
  OLS ESTIMATION
      207 OBSERVATIONS     DEPENDENT VARIABLE = CREDIT
 ...NOTE..SAMPLE RANGE SET TO:      1,    208
 
  R-SQUARE =   0.0346     R-SQUARE ADJUSTED =  -0.0199
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  0.55480E+08
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   7448.5
 SUM OF SQUARED ERRORS-SSE=  0.10819E+11
 MEAN OF DEPENDENT VARIABLE =   29549.
 LOG OF THE LIKELIHOOD FUNCTION = -2133.10
 RAW MOMENT R-SQUARE =   0.9436
 
                      ANALYSIS OF VARIANCE - FROM ZERO
                       SS         DF             MS                 F
 REGRESSION       0.18112E+12     12.       0.15094E+11           272.053
 ERROR            0.10819E+11    195.       0.55480E+08           P-VALUE
 TOTAL            0.19194E+12    207.       0.92725E+09             0.000
 
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     195 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
 JAN        30705.      1756.       17.49     0.000 0.781     1.1759     0.0904
 FEB        29445.      1756.       16.77     0.000 0.768     1.1276     0.0867
 MAR        29093.      1756.       16.57     0.000 0.765     1.1142     0.0856
 APR        29088.      1756.       16.57     0.000 0.765     1.1139     0.0856
 MAY        28282.      1862.       15.19     0.000 0.736     1.0265     0.0740
 JUN        28728.      1807.       15.90     0.000 0.751     1.0720     0.0798
 JUL        28639.      1807.       15.85     0.000 0.750     1.0687     0.0796
 AUG        28847.      1807.       15.97     0.000 0.753     1.0764     0.0802
 SEP        28863.      1807.       15.98     0.000 0.753     1.0770     0.0802
 OCT        29131.      1807.       16.13     0.000 0.756     1.0870     0.0810
 NOV        30067.      1807.       16.64     0.000 0.766     1.1220     0.0836
 DEC        33612.      1807.       18.61     0.000 0.800     1.2542     0.0934
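
Only the MAY coefficient changes, because each coefficient in this regression is just that month's average. A quick Python check of the new May value follows. It assumes there are 17 May observations in the full sample (208 observations being 17 full years plus four extra months), which is my inference rather than something printed in the output, and it uses the rounded May mean from the earlier regression.

```python
may_mean_all = 29851.0   # MAY coefficient with all 208 observations
outlier = 54943.0        # the skipped May 1987 value (the sample maximum)
n_may = 17               # assumed number of May observations in the full sample

# Remove one observation from a mean: (n*mean - x) / (n - 1)
may_mean_without = (may_mean_all * n_may - outlier) / (n_may - 1)

print(round(may_mean_without, 1))
```

The result is within a dollar or so of the 28282 reported above, so the single outlier was raising the May average by roughly 1,570.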

 |_stat credit / mean=mcredit
 NAME        N   MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
 CREDIT     207   29549.       7375.6      0.54399E+08   14592.       43833.
 |_genr creditsa=mcredit+e
 
 |_plot credit creditsa t
 
 REQUIRED MEMORY IS PAR=    39 CURRENT PAR=   500
 FOR MAXIMUM EFFICIENCY USE AT LEAST PAR=    44
       207 OBSERVATIONS
                    *=CREDIT
                    +=CREDITSA
                    M=MULTIPLE POINT
    43833.        |                       *
    41921.        |
    40008.        |                     *           *
    38096.        |                   * MMM * * *   *
    36184.        |               * * * MMM MMM * **M
    34272.        |               *+MMMM* MMMMM+M+*M
    32359.        |             *+MMMMM        M MMM
    30447.        |             MMM              MM
    28535.        |           *+M
    26623.        |         * *MM
    24710.        |         MMMM
    22798.        |     * MMM*
    20886.        |     *MMM
    18974.        |   *MMMM
    17061.        | *MMM
    15149.        |MMMM
    13237.        |MM
    11325.        |
    9412.3        |
    7500.0        |
                   ________________________________________
 
               0.000    60.000   120.000   180.000   240.000
 
                                T
 
Or, a much cleaner plot by gnuplot:

 |_plot credit creditsa t / gnu lineonly commfile=cre1.gnu datafile=cre1.dat &
 |   output=cre1.ps




Updated: 11/21/98; Prepared by: Trudy Ann Cameron