MISSING DATA RECODES AND UNIVARIATE STATISTICS.
 
In this assignment you will create a personal dataset that
includes only the variables you have selected, and save it on
your personal diskette. You will then give Stata the appropriate
missing data codes, and have Stata produce certain univariate
statistics to help you determine whether your data reorganization
has been successful.
 
Elementary statistics is concerned mainly with univariate
distributions, i.e., distributions of variables considered one at
a time. It asks questions such as: What are the mean and
standard deviation of respondents' political attitudes (on the
POLVIEWS variable)?

In this course we are concerned mainly with questions about the
relationships among variables considered two or more at a time. We
ask questions such as: How do respondents' political attitudes
(on the POLVIEWS variable) depend on their educational levels
(the EDUC variable) and income levels (the RINCOM91 variable)?

We use univariate statistics not for their own sake, but to help
us make sure we have properly downloaded and cleaned the data.  

1.  Before downloading your personal dataset, use the GSS
codebook appendix to ascertain that all the variables you want
are available in the same year, and note what year that is.

2.  Download, in ASCII text format (not the default SPSS format),
those variables, and copy them from the default workspace area to
your personal preferences diskette.  Keep track of the order in
which you specify your choice of variables.

3.  Use the Stata 'infile' command to read in the ASCII format
dataset, and save it in Stata's .dta format.
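
As a minimal sketch of this step, assuming a free-format text
file: the file names gss.dat and mygss.dta and the particular
variable list are hypothetical placeholders. List the variables
in exactly the order in which you specified them for download.

```stata
* Hypothetical file names and variable list; substitute your own.
* Variables must appear in the same order as in the download.
clear
infile polviews educ rincom91 age using "gss.dat"

* Save in Stata's .dta format, overwriting any earlier copy.
save "mygss.dta", replace
```

If the variables are listed out of order, Stata will still read
the file without complaint, so the frequency checks in the later
steps are what catch this kind of mistake.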

4.  Use the Stata 'replace' command to recode missing values.
Replace 98 and 99 (or whatever values denote missing data on a
particular variable) with a period, ., the code Stata uses for
missing data.  Save the modified dataset.
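
A sketch of what the recodes might look like. The file name
mygss.dta is a hypothetical placeholder, and the codes 8 and 9
for POLVIEWS are an assumption for illustration only; check the
codebook for the values that actually denote missing data on
each of your variables.

```stata
* Hypothetical file name; use the .dta file you saved earlier.
use "mygss.dta", clear

* EDUC: 98 and 99 denote missing data (the example in the text).
replace educ = . if educ == 98 | educ == 99

* POLVIEWS: codes 8 and 9 are ASSUMED here; verify in the codebook.
replace polviews = . if polviews == 8 | polviews == 9

* Save the modified dataset.
save "mygss.dta", replace
```

Stata reports how many changes each 'replace' made, which gives
a first rough check against the counts of missing responses
shown in the codebook.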

5. Check each variable, comparing the frequencies shown in the
codebook with the corresponding frequencies in your dataset,
calculated using the Stata 'tabulate' command.  Check that you
have the right variable and that you have removed all of
the responses that should be treated as missing.
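
For example, assuming your dataset is already in memory and
contains EDUC and POLVIEWS (substitute your own variables), the
check might be run as follows. The 'missing' option adds a row
for missing values, so you can see how many cases were recoded.

```stata
* One-way frequency tables, including a row for missing (.).
tabulate educ, missing
tabulate polviews, missing
```

If a code such as 98 or 99 still appears as a row in the table,
that value has not yet been recoded to missing.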

6. Some two-digit variables, such as AGE, have only their first
digit tabulated in the codebook. For such variables, temporarily
create a grouped variable, following the pattern of the AGE-TEMP
variable on the Selected Stata Commands page.
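
One possible way to build such a grouped variable is sketched
below. This is an assumption, not necessarily the pattern shown
on the Selected Stata Commands page, and the name agetemp is a
hypothetical placeholder (Stata variable names cannot contain a
hyphen).

```stata
* Keep only the tens digit of AGE (e.g., 37 becomes 3).
* Missing values of age remain missing in agetemp.
generate agetemp = int(age/10)

* Compare these frequencies with the codebook's first-digit table.
tabulate agetemp, missing

* The grouped variable is temporary, so drop it when done.
drop agetemp
```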

7. Use the Stata 'pwcorr' command with the ', obs' option, and
look at the number of cases shown for each pair of variables.
Make sure you aren't trying to study the relationship between two
variables that weren't asked of the same respondents.
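
For instance, with the three variables used as examples earlier
in this handout (substitute your own), the command would be:

```stata
* Pairwise correlations; 'obs' prints the number of cases
* used for each pair beneath its correlation.
pwcorr polviews educ rincom91, obs
```

A pair whose number of cases is zero or very small suggests that
the two questions were asked of different groups of respondents.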

8.  Go back and fix any problems this exercise has uncovered.
Now your .dta file should contain the variables you want, with
missing data properly recoded.  This completes the preliminaries,
and you are now ready to analyze the data for the
relationships you hypothesized.