UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics
Economics 143 (Cameron) - Applied Regression Analysis
Problem Set #5: Functional Form; Dummy Variables
November 10, 1998
Due: Thursday, November 19, 1998
NETWORK FILES NEEDED: prodtn.dat, mgr.dat, credit.dat
1. The first data set we will examine is a set of production data for a
hypothetical firm. There are 36 observations on three variables in the file
prodtn.dat. The variables are output (q), unskilled labor (u), and
skilled labor (s). Your task is to calibrate a "production function" using
these data.
a.) Begin by assuming a linear functional form, q = b1 + b2 u + b3 s + e, and
report the marginal productivity of unskilled and skilled labor for this
short-run production function. Are these marginal productivities the same
at all levels of employment? Does the marginal productivity of either
factor depend upon the quantity of the other being utilized?
b.) Now consider a form that is quadratic in u and s (i.e.,
genr u2=u*u and genr s2=s*s).
- Comparing this specification to one
without the quantity of skilled labor appearing in the model at all, use an F-test
to determine whether s and s2 make a statistically significant joint
contribution to explaining q (even if they are not now individually
significant).
- Does the evidence
suggest that the marginal productivity of unskilled labor is constant across
all quantities of unskilled labor, u? How about the marginal productivity of
skilled labor, s?
- Considering the estimated parameters,
does either input eventually exhibit negative marginal productivity? Beyond
what level of that input? (HINT: Set the derivative with respect to that
input equal to zero and solve for the input level at which output is
maximized.)
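The hint above can be tried out numerically. The following is a minimal sketch in Python (not SHAZAM); the function name turning_point and the coefficient values are hypothetical placeholders, not estimates from prodtn.dat. It assumes the quadratic model q = b1 + b2 u + b3 s + b4 u2 + b5 s2 + e, so that the derivative with respect to u is b2 + 2 b4 u.

```python
def turning_point(b_linear, b_squared):
    """Input level x at which b_linear*x + b_squared*x**2 has zero slope.

    The derivative b_linear + 2*b_squared*x equals zero at
    x = -b_linear / (2*b_squared). This is a maximum only if b_squared < 0.
    """
    if b_squared == 0:
        raise ValueError("no turning point: the squared term has a zero coefficient")
    return -b_linear / (2.0 * b_squared)


# Placeholder coefficients (NOT estimates from prodtn.dat).
b2, b4 = 12.0, -0.5
u_star = turning_point(b2, b4)
mp_at_u_star = b2 + 2 * b4 * u_star

print(u_star)        # level of u beyond which the marginal product turns negative
print(mp_at_u_star)  # marginal product of u evaluated at u_star
```

Substitute your own estimated coefficients for the placeholders, and check the sign of the squared-term coefficient before calling the result a maximum.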
c.) Now generate a so-called "interaction" term: genr us=u*s. Include this
variable in the model with quadratic terms. What happens?
- Write down the
algebraic formulas (in general) for the marginal productivity of each type
of labor input for this type of specification (HINT: These are the
derivatives of output with respect to the quantity of each type of labor
input.)
- Is the marginal productivity of each type of input now independent
of the quantity of the other input? Why or why not?
- There appears to be one superfluous variable in the fully quadratic
model with an interaction term. Which is it? What "restricted model" represents
an adequate specification for these data? How do you know?
- Can you "intuit" the shape of the production function surface that you
have managed to fit?
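One way to check a hand-derived marginal productivity formula is to compare it with a finite-difference slope of the fitted surface. The following is a minimal sketch in Python (not SHAZAM); the function names and the coefficient vector b are hypothetical placeholders, not estimates from prodtn.dat.

```python
def output(u, s, b):
    """Fully quadratic production function with an interaction term.

    b = (b1, b2, b3, b4, b5, b6) multiplies (1, u, s, u*u, s*s, u*s).
    """
    b1, b2, b3, b4, b5, b6 = b
    return b1 + b2 * u + b3 * s + b4 * u * u + b5 * s * s + b6 * u * s


def numerical_mp(u, s, b, wrt, h=1e-4):
    """Central-difference approximation to the marginal product of u or s."""
    if wrt == "u":
        return (output(u + h, s, b) - output(u - h, s, b)) / (2 * h)
    if wrt == "s":
        return (output(u, s + h, b) - output(u, s - h, b)) / (2 * h)
    raise ValueError("wrt must be 'u' or 's'")


# Placeholder coefficients (NOT estimates from prodtn.dat).
b = (10.0, 4.0, 6.0, -0.2, -0.3, 0.5)

# Marginal product of u at the same u, evaluated at two different levels of s.
mp_u_low_s = numerical_mp(5.0, 2.0, b, "u")
mp_u_high_s = numerical_mp(5.0, 8.0, b, "u")
print(mp_u_low_s, mp_u_high_s)
```

Evaluate your own algebraic formula at the same points and compare it with the printed values; for a quadratic surface the central difference should agree up to rounding error.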
d.) Researchers in economics frequently find it convenient to use log-log
specifications for production functions because the coefficients can be
interpreted as elasticities. (Why?)
- Redo the models in parts (a.) through
(c.) using the logs of all variables, rather than their levels.
- A model
which is linear in logs of output and inputs is called a "Cobb-Douglas"
production function. A specification that is quadratic in the logs of the
inputs is called a "translog" production function. Identify the type of
production function on your output.
- Compare these logarithm-based models with the models in parts (a.) through
(c.). Which are "better"
and why? [HINT: we've learned one way to compare models where the dependent
variable is in "levels" in one case and in "logs" in the other.]
- Compare the implications of the levels and logs models concerning
the elasticity of output with respect to skilled labor input and the elasticity of
output with respect to unskilled labor input. Comment.
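To compare elasticities across the two families of models, it helps to compute both kinds by hand at least once. The following is a minimal sketch in Python (not SHAZAM); the function names, the parameter values, and the evaluation point are hypothetical placeholders, not estimates from prodtn.dat. It assumes the usual point-elasticity definition (dq/dx)(x/q) for a levels model and a Cobb-Douglas form q = a u^alpha s^beta for the log-log case.

```python
import math


def elasticity_levels(mp, x, q):
    """Point elasticity of output with respect to an input: (dq/dx) * (x / q)."""
    return mp * x / q


def cobb_douglas(u, s, a, alpha, beta):
    """q = a * u**alpha * s**beta, which is linear in the logs of u and s."""
    return a * u ** alpha * s ** beta


# Placeholder parameters (NOT estimates from prodtn.dat).
a, alpha, beta = 2.0, 0.6, 0.3
u, s = 10.0, 4.0

# Numerical d ln(q) / d ln(u): change ln(u) by +h and -h and difference ln(q).
h = 1e-6
log_slope = (math.log(cobb_douglas(u * math.exp(h), s, a, alpha, beta))
             - math.log(cobb_douglas(u * math.exp(-h), s, a, alpha, beta))) / (2 * h)
print(log_slope)  # compare with alpha

# In a levels model the elasticity depends on where it is evaluated.
levels_elasticity = elasticity_levels(mp=3.0, x=10.0, q=60.0)
print(levels_elasticity)
```

For the levels models, a common (but not the only) choice is to evaluate the elasticity at the sample means of the input and of output.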
e.) OPTIONAL (and more challenging): Draw on your knowledge of microeconomic
theory to figure out the formulas for the isoquants of this production
function (based on the
point estimates of the parameters from, say, the "best fitting" model from
the linear-in-q set of models). Recall that the slope of an isoquant is
referred to as the "marginal rate of technical substitution." Can you give
an algebraic expression for the MRTS? If you knew the relative prices of
the two types of labor inputs, how would you determine the cost-minimizing
input quantities for a given desired output level? Nice plots of this
calibrated production function will form part of the subject matter of a later lab
session, using the SurfacePlotter Applet.
2. The data file mgr.dat contains a hypothetical
sample of mid-level managers who have been surveyed concerning the number of hours
per week they spend on work-related activities, either in the office or at home.
The dependent variable is HOURS (per week, averaged over a three-month period) and
the explanatory variables you are considering are: FEMALE=1 if female, 0 if male;
SPOUSE=1 if married or equivalent, 0 otherwise; SWORK=1 if spouse full-time
employed, 0 otherwise.
sample 1 60
read(mgr.dat) hours female spouse swork
There are other variables in the data set, but we will ignore them for now.
a.) Use ols hours to estimate the marginal mean number of hours worked
by all managers, regardless of gender, marital status, and spouse's labor force
participation.
b.) Use ols hours female and test whether, on average, male or female
managers work more hours.
c.) In a linear model, does marital status affect expected work hours (without
controlling for spouse's labor force participation)?
d.) swork is actually an interaction term. Explain. What is the
difference in the effect of having a spouse on manager hours according to whether
the spouse works or not?
e.) Does the presence of a non-working spouse have a different effect on male
manager hours than it does on female manager hours? [HINT: you will have to
create some interaction terms in order to test the hypothesis of no
difference.]
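When reading the output for the dummy-variable regressions in this question, it may help to see how a regression on a single 0/1 variable relates to group means. The following is a minimal sketch in Python (not SHAZAM); the data are made up and are NOT from mgr.dat, and the names simple_ols, y, and d are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def simple_ols(y, x):
    """Intercept and slope from regressing y on a constant and one regressor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up observations (NOT from mgr.dat); d is a 0/1 dummy.
y = [50.0, 46.0, 54.0, 44.0, 48.0, 52.0]
d = [0, 1, 0, 1, 1, 0]

intercept, slope = simple_ols(y, d)
mean_d0 = mean([yi for yi, di in zip(y, d) if di == 0])
mean_d1 = mean([yi for yi, di in zip(y, d) if di == 1])

print(intercept, slope)            # OLS estimates
print(mean_d0, mean_d1 - mean_d0)  # mean of the d=0 group, and the difference in means
```

The two printed lines agree: the intercept is the mean of the d=0 group and the slope is the difference in group means. With several dummies and interactions, each combination of coefficients corresponds to the mean for one cell.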
3. The third data set we will examine is a simple time series of consumer
installment credit outstanding (revolving credit with retailers, in millions
of dollars, counted at month-end, not seasonally adjusted). This is the
variable CCIURR on the CITI1 subdirectory of the CITIBASE online data base
maintained by SSC, but the information has been downloaded for you into the
file credit.dat. This data set contains 208 monthly observations
(January 1977 to March 1994) on three variables: year, month and credit (in
that order). Unfortunately, CITIBASE has not reported data for this variable
since March 1994.
a.) Create a "time trend" variable, using genr t=time(0). Examine a
simple plot of credit against t. Do "crummy plots" have adequate resolution for
this task? Use a set of twelve commands like if(month.eq.1) jan=1 to create
"dummy" variables for January, February, .... [HINT: .eq. means "equal to",
.ne. means "not equal to", .gt. means "greater than", etc.] Observations for
which the if statement is not true will be assigned a value of zero for the
variable, which is just fine. Run an OLS regression of credit on all month
dummies except January. (Why exclude January?) Comment on the results. If you
are clever, you may be able to figure out some special features of SHAZAM that
make the process of generating sets of dummy variables much easier, but it is
all right to do it inelegantly.
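The logic of the twelve if commands can be sketched outside SHAZAM. The following is a minimal illustration in Python; the helper month_dummies and the two-year month sequence are hypothetical and are not read from credit.dat.

```python
MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def month_dummies(months):
    """Map a list of month numbers (1-12) to a dict of 0/1 lists, one per month."""
    return {name: [1 if m == k else 0 for m in months]
            for k, name in enumerate(MONTH_NAMES, start=1)}


# Two hypothetical years of monthly observations.
months = list(range(1, 13)) * 2
dummies = month_dummies(months)

# Sum the twelve dummies observation by observation.
row_sums = [sum(dummies[name][i] for name in MONTH_NAMES)
            for i in range(len(months))]

print(dummies["jan"][:13])
print(set(row_sums))
```

Look at what the twelve dummies add up to for every observation, and compare that with the column of ones that the regression constant represents.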
b.) Now include the time trend variable, t, in the model with the
dummies used in part (a.). Comment on the effect.
c.) Now create a squared term in the time trend: genr t2=t*t. Include
this variable, along with t itself, in a model with no dummy variables and
in a model with the set of dummies used in (a.). What happens?
d.) Based on your estimates for the two models in part (c.), solve for the
time period (value of t) where retail revolving credit appears to have
peaked. (HINT: set the derivative of E(credit) with respect to t
equal to zero.) Does the presence of "seasonal dummies" in the model affect the
apparent timing of the peak?
e.) Using the models with a quadratic time trend and seasonal dummies in part
(c.), test whether there is statistically significant "seasonal
variation" in the level of revolving retail credit in the U.S. (HINT: you are
testing the joint contribution of the whole set of seasonal dummies.)
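A joint test of this kind can be computed by hand from the sums of squared residuals of the restricted and unrestricted regressions. The following is a minimal sketch in Python (not SHAZAM); the function name f_statistic and the two SSR values are hypothetical placeholders, not results from credit.dat. It assumes the unrestricted model has 14 parameters (constant, t, t2, and eleven month dummies) and that the restriction drops the eleven dummies.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, n_restrictions,
                n_obs, n_params_unrestricted):
    """F = [(SSR_r - SSR_u) / J] / [SSR_u / (n - k)].

    J is the number of restrictions; k is the number of estimated
    parameters in the unrestricted model, including the constant.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / (n_obs - n_params_unrestricted)
    return numerator / denominator


# Placeholder sums of squared residuals (NOT results from credit.dat).
f = f_statistic(ssr_restricted=500.0, ssr_unrestricted=400.0,
                n_restrictions=11, n_obs=208, n_params_unrestricted=14)
print(f)
```

The resulting statistic would be compared with a critical value from the F distribution with J and n - k degrees of freedom.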
f.) Comment upon the pattern of seasonal variation in revolving retail
credit that you observe in these data over the period of 1977 to 1994. Can
you think of any reasons why credit should "behave" this way? Also, what
happened in May of 1987? (HINT: tax reform; lots of people discovered that their
income taxes due were a lot higher than they used to be.) What do you think
happened around this period?
g.) OPTIONAL: Create a "seasonally adjusted" credit variable as follows:
ols credit jan feb mar apr may jun jul aug sep oct nov dec / noconstant resid=e
stat credit / mean=mcredit
genr creditsa=mcredit+e
plot credit creditsa t
This leaves creditsa showing only "unexpected" deviations from typical
levels. Try plotting the unadjusted and seasonally adjusted time
series. Does all of the seasonal regularity disappear? Now try the same process
leaving out the obvious outlying observation in May of 1987. Have things
improved? Why? [HINT: this will be too hard to see in a "crummy plot." You may
turn in "fancy gnuplot" output, or, you may describe what you see on the screen
when you run the program interactively and issue the appropriate plot .../ gnu
commands at the end.]
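The arithmetic behind the adjustment in this part can be sketched outside SHAZAM. The following is a minimal illustration in Python; the function name seasonally_adjust and the short two-"month" series are made up and are NOT credit.dat. It relies on the fact that a regression on a full set of month dummies with no constant fits each month's own mean, so the residual is the deviation from that mean.

```python
def seasonally_adjust(values, months):
    """Overall mean plus each observation's deviation from its own-month mean."""
    overall_mean = sum(values) / len(values)

    # Group the observations by month and compute each month's mean.
    by_month = {}
    for v, m in zip(values, months):
        by_month.setdefault(m, []).append(v)
    month_mean = {m: sum(vs) / len(vs) for m, vs in by_month.items()}

    # e = value - month mean; adjusted value = overall mean + e.
    return [overall_mean + (v - month_mean[m]) for v, m in zip(values, months)]


# Made-up series with only two "months" for brevity (NOT credit.dat).
values = [10.0, 20.0, 12.0, 22.0, 14.0, 24.0]
months = [1, 2, 1, 2, 1, 2]
adjusted = seasonally_adjust(values, months)
print(adjusted)
```

In this made-up series the gap between the two "months" is constant, so the adjustment removes it completely and only the upward drift remains. Real data need not behave so neatly, which is what the question asks you to examine.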
Updated: 4:16 PM 11/9/98; Prepared by: Trudy Ann Cameron