IMPORTANT: Some people will still be under the mistaken impression that they will actually be gathering data and doing an original regression analysis as a term paper for Economics 143. This is not the case. A "research proposal" is just that--a plan for what you might do in the way of gathering data and estimating a model. Your job is to be persuasive regarding the fundamental importance of the quantitative relationships you want to measure, and to convince the reader of the proposal that you are a competent regression analyst.
Before you prepare your final proposal, you are strongly encouraged to undertake the following. This is a valuable exercise in that it gets you thinking clearly about the likely nature of your proposal. In very brief point form, be sure you can provide the following information.
1. Identify the population that your proposed sample is intended to represent, and be sure you know what constitutes an "observation". For example:
a. Economics majors at UCLA (observation = a student) cross-sectional data
b. Firms in the S&P 500 (observation = a firm) cross-sectional data
c. Canada in specified years (observation = a year) annual time-series data
d. Games played by professional basketball teams (observation = a team in a particular game) pooled cross-sectional data over time
e. Counties in California (observation = a county) cross-sectional data
2. Identify the dependent variable and be sure it is likely to display some variation over your proposed sample (so that there will be something to explain using a regression model). Make sure it can be measured at the level of the individual observation.
a. Example: avoid combining county-level average data for Y (e.g. proportion of homeowners) with individual-level data for the X variables (e.g. age of household head, ethnic identity of household head, household income). The Y variable cannot be more highly aggregated (over observations or time) than the X variables.
b. Example: if data for Y are at the level of the individual household, but household data are not available for some X variables, it is sometimes possible to use, say, zip code median values for those variables (with the clear caveat that these variables only capture "neighborhood" characteristics, which may or may not be highly correlated with the desired but missing X variable for the household).
c. Be aware that it IS possible to use categories of outcomes as a dependent variable. Models of this kind are called "discrete choice models." If your proposal concerns a YES/NO dependent variable, proceed for now as though it were a conventional continuous dependent variable. We will cover procedures for these models (better than OLS) in the last couple of lectures.
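To make point 2b concrete, here is a minimal sketch of attaching a zip-code-level proxy to household-level observations. Everything in it is hypothetical and for illustration only: the household records, the placeholder zip codes, the income figures, and the helper name attach_zip_proxy are not taken from any real data source.

```python
from statistics import pvariance

# Hypothetical household-level sample: Y (owns_home) and one X (age_head)
# are observed per household, but household income is not available.
households = [
    {"id": 1, "zip": "ZIP_A", "owns_home": 1, "age_head": 52},
    {"id": 2, "zip": "ZIP_A", "owns_home": 0, "age_head": 29},
    {"id": 3, "zip": "ZIP_B", "owns_home": 0, "age_head": 34},
    {"id": 4, "zip": "ZIP_B", "owns_home": 1, "age_head": 61},
]

# Made-up zip-code medians standing in for the missing household income.
zip_median_income = {"ZIP_A": 85000, "ZIP_B": 62000}


def attach_zip_proxy(rows, proxy_by_zip, name):
    """Return copies of the rows with a zip-level proxy variable added."""
    return [dict(row, **{name: proxy_by_zip[row["zip"]]}) for row in rows]


sample = attach_zip_proxy(households, zip_median_income, "zip_median_income")

# Y must vary across observations, or there is nothing to explain.
y_var = pvariance([row["owns_home"] for row in sample])
print("variance of Y:", y_var)
for row in sample:
    print(row)
```

Note that every household in the same zip code receives the same proxy value, which is exactly why such a variable captures only "neighborhood" characteristics. Y stays at the household level, so it is no more aggregated than any X.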
3. Identify some plausible explanatory (X) variables. Some of these will be key to the main hypotheses you are hoping to test; others will be incidental determinants of the dependent (Y) variable, included in order to avoid omitted variables bias in the coefficients on the key variables.
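The following simulation is a rough sketch of why the incidental variables matter. It assumes a made-up model, y = 1 + 2*x1 + 3*x2 + error, in which x2 is correlated with the key variable x1. All coefficients, the sample size, and the function names are assumptions chosen for illustration. Leaving x2 out of the regression pushes the estimated slope on x1 away from its true value of 2.

```python
import random


def demean(values):
    """Subtract the sample mean from each element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Sum of element-by-element products."""
    return sum(a * b for a, b in zip(u, v))


def short_slope(x1, y):
    """OLS slope on x1 when x2 is (wrongly) omitted."""
    x1d, yd = demean(x1), demean(y)
    return dot(x1d, yd) / dot(x1d, x1d)


def long_slope(x1, x2, y):
    """OLS slope on x1 when x2 is also included in the regression."""
    x1d, x2d, yd = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(x1d, x1d), dot(x2d, x2d), dot(x1d, x2d)
    s1y, s2y = dot(x1d, yd), dot(x2d, yd)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]  # x2 is correlated with x1
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

b_short = short_slope(x1, y)
b_long = long_slope(x1, x2, y)
print(f"slope on x1, x2 omitted:  {b_short:.3f}")
print(f"slope on x1, x2 included: {b_long:.3f}")
```

In this setup the short regression should land near 2 + 3*0.5 = 3.5 rather than 2, because x1 picks up part of the effect of the omitted x2.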
4. Identify at least one interesting hypothesis that can be tested using your proposed model. Remember that for reliable hypothesis testing, you need precise (small standard error) and unbiased estimates of the relevant slope coefficients. It is not possible to get a fix on the slope of a regression function with respect to an explanatory variable if that variable does not display sufficient variation across observations. For example, if you are trying to explain the effect of business taxes on firm location within a city, you will not get far if business taxes are the same everywhere in that city.
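The point about variation in an explanatory variable can be illustrated with a small simulation. This is only a sketch under assumed values: the true model y = 1 + 2*x + error, the sample size, the two spreads of x, and the function names are all made up for the example. In a simple regression, the standard error of the slope is sqrt(s^2 / Sxx), where Sxx is the sum of squared deviations of x from its mean, so a nearly constant x makes the standard error very large.

```python
import math
import random


def slope_and_se(x, y):
    """OLS slope and its standard error for y = a + b*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # estimated error variance
    return b, math.sqrt(s2 / sxx)


def simulate(x_spread, n=200, seed=0):
    """Draw a sample whose x has standard deviation x_spread, then fit OLS."""
    rng = random.Random(seed)
    x = [rng.gauss(5.0, x_spread) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return slope_and_se(x, y)


b_wide, se_wide = simulate(x_spread=2.0)      # x varies a lot
b_narrow, se_narrow = simulate(x_spread=0.05)  # x is almost constant
print(f"wide x:   slope = {b_wide:.3f}, standard error = {se_wide:.4f}")
print(f"narrow x: slope = {b_narrow:.3f}, standard error = {se_narrow:.4f}")
```

With x almost constant, the slope estimate is too imprecise to support any hypothesis test, which is the situation in the business tax example when taxes are the same everywhere in the city.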