29oct2000
Soc. 210a, McFarland

Assignment 6. Study Design

1. After reading Sections 3.1 and 3.2 of Moore and McCabe, do the following exercises beginning on page 250. Explain how you arrived at each answer; do not merely state the answers.

   3.9, 3.11, 3.13, 3.18, 3.23, 3.29, 3.31

2. After reading Section 3.3 of Moore and McCabe, do the following exercises beginning on page 262. Explain how you arrived at each answer; do not merely state the answers.

   3.33*, 3.35, 3.39, 3.43, 3.45, 3.49

   * In exercise 3.33, you should go beyond Moore and McCabe. Notice that many "employed adult women" are employed in types of occupations that do not lead to membership in a "business and professional women's club", and distinguish between the theoretically relevant population and the population from which the sample was actually selected.

3. After reading Section 3.4 of Moore and McCabe, do the following exercises beginning on page 262. Explain how you arrived at each answer; do not merely state the answers.

   3.51, 3.53, 3.55, 3.73, 3.79

4. Until fairly recently, the prevalence of various kinds of sexual behavior had been the subject only of anecdotal and volunteer studies, but the GSS 1994 data include some such questions asked in a properly designed survey sample. The GSS variable SEXSEX asks respondents to specify whether their sex partners were all male, all female, or both sexes. The variable SEX tells whether the respondents themselves were male or female. Cross-classify these two variables to determine the proportion of each sex that reports partners of either only their own sex or both sexes. After taking care of missing data, the Stata command

       tabulate sex sexsex, row

   produces a table from which the answers can be extracted separately for males and females.

5. Skim Appendix A of the GSS codebook to find Table A.3, near the bottom of which is a row indicating the Response Rate year by year.

   (a) What was the 1994 GSS response rate?
   (b) Review your lecture notes, and write a sentence describing one of the ways survey researchers attempt to assess the effects of nonparticipation on survey results.

6. Read Hamilton, pages 55-58, "Using Random Variables and Random Sampling". Start Stata, load the GSS94.DTA file, and change the -1 and 99 codes of the TVHOURS variable to missing data codes. Do the usual univariate statistics for the TVHOURS variable.

   Ordinarily the GSS is treated as a random sample of US adults, and examined for what light it may shed on the entire population of US adults. Here, however, we wish to examine how random samples differ from each other and from the population from which they are selected. For this purpose we will temporarily treat the GSS as a KNOWN POPULATION, draw random samples from it, and compare them not only with each other but also with the GSS from which they were selected.

   First set the random-number seed, replacing xxxxxx with the last 6 digits of your student ID number:

       set seed xxxxxx

   Next look at TVHOURS for a random sample that is 10% of the GSS cases. One way to do this is to generate a variable x1, which has the same values as TVHOURS for a random 10% sample of the GSS cases, and is missing for the rest:

       generate p1=uniform()
       generate x1=.
       replace x1=tvhours if p1<.1

   Now that is just one of a very large number of different possible samples; let's look at a few more:

       generate p2=uniform()
       generate x2=.
       replace x2=tvhours if p2<.1

   and similarly for x3, x4, and x5.

   Calculate the summary statistics from those 5 different samples, particularly noting the number of cases in each ("Obs"), the mean, the smallest value ("Min"), and the largest value ("Max"). Did the person who reports watching TV 24 hours per day happen to be included in any of your samples?

   Write a paragraph about how the five random samples you took differed from one another, and how they differed from the whole 2992-case "population" from which they were selected.
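
The sampling logic in item 6 can also be sketched outside Stata. The Python snippet below is only an illustrative analogue, not part of the assignment. Its population is synthetic (made-up TV-hours values, not GSS data), and the helper names `bernoulli_sample` and `describe`, the seed, and the value distribution are all hypothetical. It illustrates why selecting cases with `if p1<.1` gives samples of roughly, but not exactly, 10% of the cases, and why a single extreme case (such as 24 hours) appears in only some samples.

```python
import random
import statistics


def bernoulli_sample(values, p, rng):
    """Keep each value independently with probability p.

    This mirrors the Stata pattern `replace x1=tvhours if p1<.1`,
    where p1 is a uniform random number drawn for each case.
    """
    return [v for v in values if rng.random() < p]


def describe(values):
    """Return the statistics the assignment asks you to compare."""
    return {
        "obs": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical seed; the assignment uses digits from your student ID.
rng = random.Random(123456)

# Synthetic stand-in "population" of 2992 cases. NOT GSS data:
# the values are invented, with one extreme case of 24 hours.
typical_hours = [0, 1, 2, 2, 3, 3, 4, 5, 6, 8]
population = [rng.choice(typical_hours) for _ in range(2991)] + [24]

pop_stats = describe(population)
print("population", pop_stats)

# Five independent 10% samples, analogous to x1 through x5.
samples = [bernoulli_sample(population, 0.1, rng) for _ in range(5)]
for i, sample in enumerate(samples, start=1):
    stats = describe(sample)
    includes_extreme = 24 in sample
    print(f"x{i}", stats, "includes 24-hour case:", includes_extreme)
```

Because each case is kept independently with probability 0.1, the number of observations varies from sample to sample around 10% of the population, which is the same behavior you should see in the "Obs" column of your Stata output.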