29oct2000


Soc. 210a, McFarland

Assignment 6.  Study Design


1. After reading Sections 3.1 and 3.2 of Moore and McCabe, do the
following exercises beginning on page 250.  Explain how you
arrived at each answer; do not merely state the answers.
        3.9, 3.11, 3.13, 3.18, 3.23, 3.29, 3.31

2. After reading Section 3.3 of Moore and McCabe, do the
following exercises beginning on page 262. Explain how you
arrived at each answer; do not merely state the answers.
        3.33*, 3.35, 3.39, 3.43, 3.45, 3.49

* In exercise 3.33, you should go beyond Moore and McCabe. Notice
that many "employed adult women" are employed in types of
occupations which do not lead to membership in a "business and
professional women's club", and distinguish between the
theoretically relevant population and the population from which
the sample was actually selected.

3. After reading Section 3.4 of Moore and McCabe, do the
following exercises beginning on page 262. Explain how you
arrived at each answer; do not merely state the answers.
        3.51, 3.53, 3.55, 3.73, 3.79

4.  The prevalence of various kinds of sexual behavior was, until
fairly recently, studied only through anecdotes and volunteer
samples, but the GSS 1994 data include some such questions asked
in a properly designed survey sample.

The GSS variable SEXSEX asks respondents to specify whether their
sex partners were all male, all female, or both sexes. The
variable SEX tells whether the respondents themselves were male
or female. Cross-classify the two variables to determine the
proportion of each sex that reports partners of only their own
sex or of both sexes. After taking care of missing data, the
Stata command
        tabulate   sex sexsex  ,  row
produces a table from which the answers can be extracted
separately for males and females.
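
One way to take care of the missing data might look like the
following sketch. The codes 0, 8, and 9 are assumptions made here
for illustration, not values taken from the codebook; check the
output of the first command and the GSS codebook before recoding.

```stata
* List the numeric codes actually present in sexsex
tabulate sexsex, nolabel

* ASSUMPTION: 0, 8, and 9 stand for not applicable, don't know,
* and no answer; substitute the codes your codebook gives
replace sexsex = . if sexsex == 0 | sexsex == 8 | sexsex == 9

* Row percentages: each row (sex of respondent) sums to 100
tabulate sex sexsex, row
```

With the row option, read across the male row and across the
female row to find the percentages asked for.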

5. Skim Appendix A of the GSS codebook to find Table A.3, near
the bottom of which is a row indicating the Response Rate year by
year. (a) What was the 1994 GSS response rate? (b) Review your
lecture notes, and write a sentence describing one of the ways
survey researchers attempt to assess the effects of
nonparticipation on survey results.

6. Read Hamilton, pages 55-58, "Using Random Variables and Random
Sampling".

Start Stata, load the GSS94.DTA file, and change the -1 and 99
codes of the TVHOURS variable to missing data codes. Then compute
the usual univariate statistics for TVHOURS.
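
These steps could be carried out roughly as follows. This is only
a sketch: it assumes the data file is in the current working
directory under the name gss94.dta, so adjust the path or the
spelling to match your own copy.

```stata
* ASSUMPTION: gss94.dta is in the current working directory
use gss94, clear

* -1 and 99 are missing-data codes, not numbers of hours
replace tvhours = . if tvhours == -1 | tvhours == 99

* Obs, mean, standard deviation, min, max, and percentiles
summarize tvhours, detail
```

Check that "Min" and "Max" are plausible numbers of hours after
recoding; a maximum of 99 means the recode did not take effect.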

Ordinarily the GSS is treated as a random sample of US adults,
and examined for what light it may shed on the entire population
of US adults. Here, however, we wish to examine how random
samples differ from each other and from the population from which
they are selected. For this purpose we will temporarily treat the
GSS as a KNOWN POPULATION, draw random samples from it, and
compare them not only with each other but also with the GSS from
which they were selected.

set seed xxxxxx
(replacing xxxxxx with the last 6 digits of your student ID number).

Next look at TVHOURS for a random sample that is 10% of the GSS
cases. One way to do this is to generate a variable x1, which has
the same values as TVHOURS for a random 10% sample of the GSS
cases, and is missing for the rest:

generate p1=uniform()
generate x1=.
replace x1=tvhours if p1<.1

Now that is just one of a very large number of different possible
samples; let's look at a few more.

generate p2=uniform()
generate x2=.
replace x2=tvhours if p2<.1

and similarly for x3, x4, and x5.
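
The remaining three samples can be typed out by hand in the same
way. As an alternative sketch, assuming a Stata release that has
the forvalues command (version 7 or later), a loop repeats the
same three steps. Recent releases document uniform() under the
name runiform().

```stata
* Repeat the three-step recipe for samples 3, 4, and 5
forvalues i = 3/5 {
    generate p`i' = uniform()
    generate x`i' = .
    replace x`i' = tvhours if p`i' < .1
}
```

Each pass draws a fresh set of random numbers, so each x variable
holds a different 10% sample.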

Calculate the summary statistics from those 5 different samples,
particularly noting the number of cases in each ("Obs"), the
mean, the smallest value ("Min") and the largest value ("Max").
Did the person who reports watching TV 24 hours per day happen to
be included in any of your samples?
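
A minimal sketch of these checks, assuming x1 through x5 were
created as described above:

```stata
* Name each variable: the range x1-x5 would also pick up p2
* through p5, which sit between x1 and x5 in the dataset
summarize tvhours x1 x2 x3 x4 x5

* Number of sampled cases reporting 24 hours, one sample at a time
count if x1 == 24
count if x2 == 24
count if x3 == 24
count if x4 == 24
count if x5 == 24
```

Including tvhours in the summarize command puts the "population"
figures in the same table as the five samples, which makes the
comparison easier.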

Write a paragraph about how the five random samples you took
differed from one another, and how they differed from the whole
2992-case "population" from which they were selected.