UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Problem Set #5: Functional Form; Dummy Variables


February 24, 1998

Due:  Tuesday, March 3, 1998

NETWORK FILES NEEDED:  n:prodtn.dat, n:mgr.dat, n:credit.dat

1. The first data set we will examine is a set of production data for a hypothetical firm. There are 36 observations on three variables in the file n:prodtn.dat. The variables are output (q), unskilled labor (u) and skilled labor (s). Your task is to calibrate a "production function" using these data.

a.) Begin by assuming a linear functional form: q = b1 + b2 u + b3 s + e; report the marginal productivity of unskilled and skilled labor for this short-run production function. Are these marginal productivities the same at all levels of employment? Does the marginal productivity of either factor depend upon the quantity of the other being utilized?

b.) Now consider a form that is quadratic in u and s (i.e. genr u2=u*u and genr s2=s*s).

- Comparing this specification to one without the quantity of skilled labor appearing in the model at all, use an F-test to determine whether s and s2 make a statistically significant joint contribution to explaining q (even if they are not now individually significant).

- Does the evidence suggest that the marginal productivity of unskilled labor is constant across all quantities of unskilled labor, u? How about the marginal productivity of skilled labor, s?

- Considering the estimated parameters, does either input eventually exhibit negative marginal productivity? Beyond what level of that input? (HINT: Set the derivative with respect to that input equal to zero and solve for the input level at which output is maximized.)

c.) Now generate a so-called "interaction" term: genr us=u*s. Include this variable in the model with quadratic terms. What happens?

- Write down the algebraic formulas (in general) for the marginal productivity of each type of labor input for this type of specification. (HINT: These are the derivatives of output with respect to the quantity of each type of labor input.)

- Is the marginal productivity of each type of input now independent of the quantity of the other input? Why or why not?

- There appears to be one superfluous variable in the fully quadratic model with an interaction term. Which is it? What "restricted model" represents an adequate specification for these data? How do you know?

- Can you "intuit" the shape of the production function surface that you have managed to fit?

d.) Researchers in economics frequently find it convenient to use log-log specifications for production functions because the coefficients can be interpreted as elasticities. (Why?)

- Redo the models in parts (a.) through (c.) using the logs of all variables, rather than their levels.

- A model which is linear in logs of output and inputs is called a "Cobb-Douglas" production function. A specification that is quadratic in the logs of the inputs is called a "translog" production function. Identify the type of production function on your output.

- Compare these logarithm-based models with the models in parts (a.) through (c.). Which are "better" and why? [HINT: we've learned one way to compare models where the dependent variable is in "levels" in one case and in "logs" in the other.]

- Compare the implications of the levels and logs models concerning the elasticity of output with respect to skilled labor input and the elasticity of output with respect to unskilled labor input. Comment.

e.) OPTIONAL (and more challenging): Draw on your knowledge of microeconomic theory to figure out the formulas for the isoquants of this production function (based on the point estimates of the parameters from, say, the "best fitting" model from the linear-in-q set of models). Recall that the slope of an isoquant is referred to as the "marginal rate of technical substitution." Can you give an algebraic expression for the MRTS? If you knew the relative prices of the two types of labor inputs, how would you determine the cost-minimizing input quantities for a given desired output level?

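
The joint F-test in part (b.) and the "set the derivative to zero" hint can be illustrated outside SHAZAM. What follows is only a minimal sketch in Python with NumPy, not the course's SHAZAM workflow. The data are synthetic (invented coefficients, not n:prodtn.dat), and the helper names sse, joint_f and turning_point are hypothetical names chosen for this illustration.

```python
import numpy as np

def sse(y, X):
    """Sum of squared residuals from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def joint_f(y, X_restricted, X_unrestricted):
    """Return (F, numerator df, denominator df) for two nested models."""
    n, k = X_unrestricted.shape
    j = k - X_restricted.shape[1]            # number of excluded regressors
    sse_r = sse(y, X_restricted)
    sse_u = sse(y, X_unrestricted)
    f = ((sse_r - sse_u) / j) / (sse_u / (n - k))
    return f, j, n - k

def turning_point(b_linear, b_squared):
    """Input level at which b_linear*x + b_squared*x**2 has zero slope."""
    return -b_linear / (2.0 * b_squared)

# Synthetic data only: these numbers are invented and are NOT prodtn.dat.
rng = np.random.default_rng(0)
n = 36
u = rng.uniform(1, 10, n)
s = rng.uniform(1, 10, n)
q = 5 + 4*u - 0.2*u**2 + 3*s - 0.15*s**2 + rng.normal(0, 1, n)

const = np.ones(n)
X_unres = np.column_stack([const, u, u**2, s, s**2])   # quadratic in u and s
X_res = np.column_stack([const, u, u**2])              # s and s2 left out

f_stat, df_num, df_den = joint_f(q, X_res, X_unres)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")

# Turning point in u implied by the estimated coefficients on u and u2.
beta, *_ = np.linalg.lstsq(X_unres, q, rcond=None)
print("estimated turning point in u:", turning_point(beta[1], beta[2]))
```

The F statistic would then be compared with the critical value for (2, 31) degrees of freedom at the chosen significance level. The turning point marks a maximum only if the estimated coefficient on the squared term is negative.
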
Nice plots of this calibrated production function will form part of the subject matter of a later lab session.


2. The data file n:mgr.dat contains a hypothetical sample of mid-level managers who have been surveyed concerning the number of hours per week they spend on work-related activities, either in the office or at home. The dependent variable is HOURS (per week, averaged over a three-month period) and the explanatory variables you are considering are: FEMALE=1 if female, 0 if male; SPOUSE=1 if married or equivalent, 0 otherwise; SWORK=1 if spouse full-time employed, 0 otherwise.
sample 1 60
read(n:mgr.dat) hours female spouse swork
There are other variables in the data set, but we will ignore them for now.

a.) Use ols hours to estimate the marginal mean number of hours worked by all managers, regardless of gender, marital status, and spouse's labor force participation.

b.) Use ols hours female and test whether, on average, male or female managers work more hours.

c.) In a linear model, does marital status affect expected work hours (without controlling for spouse's labor force participation)?

d.) swork is actually an interaction term. Explain. What is the difference in the effect of having a spouse on manager hours according to whether the spouse works or not?

e.) Does the presence of a non-working spouse have a different effect on male manager hours than it does on female manager hours? [HINT: you will have to create some interaction terms in order to test the hypothesis of no difference.]

We will be working with this example in more detail in one of the labs.
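
As background for parts (a.), (b.) and (e.), the following is a minimal sketch in Python with NumPy (not SHAZAM) of how OLS on 0/1 variables relates to group means, and of how an interaction term is built as a product of dummies. The data are synthetic and the coefficients invented (this is not n:mgr.dat). The helper ols and the variable fem_spouse are hypothetical names used only for this illustration.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients from regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic data only: invented numbers, NOT mgr.dat.
rng = np.random.default_rng(1)
n = 60
female = (np.arange(n) % 2).astype(float)          # alternating 0, 1
spouse = ((np.arange(n) // 2) % 2).astype(float)   # 0, 0, 1, 1, ...
hours = 45 + 2*female - 3*spouse + rng.normal(0, 2, n)

const = np.ones(n)

# Constant only: the single coefficient is the overall sample mean.
b_mean = ols(hours, const[:, None])

# Constant plus one dummy: the intercept is the mean of the 0 group and
# the slope is the difference between the two group means.
b_gender = ols(hours, np.column_stack([const, female]))

# An interaction of two dummies equals 1 only when both dummies equal 1.
fem_spouse = female * spouse
b_inter = ols(hours, np.column_stack([const, female, spouse, fem_spouse]))

print("overall mean:", b_mean[0])
print("male mean, female minus male:", b_gender)
print("coefficients with interaction:", b_inter)
```

With two dummies and their interaction, the four coefficients reproduce the four cell means exactly, so a test on the interaction coefficient asks whether the effect of one dummy differs across the groups defined by the other.
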


3. The third data set we will examine is a simple time series of consumer installment credit outstanding (revolving credit with retailers, in millions of dollars, counted at month-end, not seasonally adjusted). This is the variable CCIURR on the CITI1 subdirectory of the CITIBASE online data base maintained by SSC, but the information has been downloaded for you into the file n:credit.dat. This data set contains 208 monthly observations (January 1977 to March 1994) on three variables: year, month and credit (in that order). Unfortunately, CITIBASE has not reported data for this variable since March 1994.

a.) Create a "time trend" variable, using genr t=time(0). Examine a simple plot of credit against t. Do "crummy plots" have adequate resolution for this task? Use a set of twelve commands like if(month.eq.1) jan=1 to create "dummy" variables for January, February,.... [HINT: .eq. means "equal to", .ne. means "not equal to", .gt. means "greater than", etc.] Observations for which the if statement is not true will be assigned a value of zero for the variable, which is just fine. Run an OLS regression on all month dummies except January. (Why exclude January?) Comment on the results. If you are clever, you may be able to figure out some special features of SHAZAM that make the process of generating sets of dummy variables much easier, but it is alright to do it inelegantly.

b.) Now include the time trend variable, t, in the model with the dummies used in part (a.). Comment on the effect.

c.) Now create a squared term in the time trend: genr t2=t*t. Include this variable, along with t itself, in a model with no dummy variables and in a model with the set of dummies used in (a.). What happens?

d.) Based on your estimates for the two models in part (c.), solve for the time period (value of t) where retail revolving credit appears to have peaked. (HINT: set the derivative of E(credit) with respect to t equal to zero.) Does the presence of "seasonal dummies" in the model affect the apparent timing of the peak?

e.) Using the model with a quadratic time trend and seasonal dummies in part (c.), test whether there is statistically significant "seasonal variation" in the level of revolving retail credit in the U.S. (HINT: you are testing the joint contribution of the whole set of seasonal dummies.)

f.) Comment upon the pattern of seasonal variation in revolving retail credit that you observe in these data over the period of 1977 to 1994. Can you think of any reasons why credit should "behave" this way? Also, what happened in May of 1987? (HINT: tax reform; lots of people discovered that their income taxes due were a lot higher than they used to be.) What do you think happened around this period?

g.) OPTIONAL: Create a "seasonally adjusted" credit variable as follows:
ols credit jan feb mar apr may jun jul aug sep oct nov dec / noconstant resid=e
stat credit / mean=mcredit
genr creditsa=mcredit+e
plot credit creditsa t

This leaves creditsa showing only "unexpected" deviations from typical levels. Try plotting the unadjusted and seasonally adjusted time series. Does all of the seasonal regularity disappear? Now try the same process leaving out the obvious outlying observation in May of 1987. Have things improved? Why? [HINT: this will be too hard to see in a "crummy plot." You may turn in "fancy gnuplot" output, or, you may describe what you see on the screen when you run the program interactively and issue the appropriate plot .../ gnu commands at the end.]
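
The mechanics of this problem (month dummies, a quadratic trend, the implied peak, and the residual-based seasonal adjustment) can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions, not SHAZAM output. The series is synthetic with invented parameters (it is not n:credit.dat), it is assumed to start in January, and all variable names below are hypothetical.

```python
import numpy as np

# Synthetic monthly series: invented numbers, NOT credit.dat.
n = 208
t = np.arange(1, n + 1, dtype=float)        # time trend 1, 2, ..., n
month = (np.arange(n) % 12) + 1             # 1..12, assuming a January start
rng = np.random.default_rng(2)
december_bump = np.where(month == 12, 8.0, 0.0)
credit = 100 + 1.2*t - 0.004*t**2 + december_bump + rng.normal(0, 1, n)

# One 0/1 column per month; column m-1 is the dummy for month m.
D = (month[:, None] == np.arange(1, 13)[None, :]).astype(float)

# Constant, quadratic trend and eleven dummies (January omitted, as in part a.).
X = np.column_stack([np.ones(n), t, t**2, D[:, 1:]])
b, *_ = np.linalg.lstsq(X, credit, rcond=None)
t_peak = -b[1] / (2.0 * b[2])               # where d E(credit)/dt = 0
print("estimated peak at t =", round(t_peak, 1))

# Mirror of the optional part (g.): all twelve dummies, no constant.
month_means, *_ = np.linalg.lstsq(D, credit, rcond=None)
e = credit - D @ month_means                # residuals from the dummy regression
creditsa = credit.mean() + e                # add back the overall mean
print("mean of credit:", credit.mean(), "mean of creditsa:", creditsa.mean())
```

In the no-constant regression each dummy coefficient is simply that month's sample mean, so creditsa is the original series with month-specific means removed and the overall mean restored. Any trend in the data is untouched by this adjustment.
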


Updated: February 24, 1998
Prepared by: Trudy Ann Cameron