A robust statistical procedure is one whose conclusions are still approximately correct even when its assumptions are not quite true. A procedure will be robust if the probability calculations it requires are not sensitive to deviations from the assumptions.
Moore and McCabe discuss robustness or lack thereof for various procedures. For example, they assert, "the t procedures are quite robust against nonnormality of the population except in the case of outliers or strong skewness" (p 516), but "the F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice" (p 570).
When the population distribution is not Normal, the statistician is faced with deciding which other distribution is appropriate, and then working through the equations for a sampling distribution from that alternative population distribution. Often such equations are intractable. (Even mathematical statisticians are frustrated by equations they are unable to solve!)
The Bootstrap is one of the methods that aims to substitute computational power for deductive power, at least in cases where attempts to deduce sampling distributions have failed.
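As a concrete illustration (a minimal sketch, not from Moore and McCabe; the sample values are invented), here is one way to bootstrap the sampling distribution of a sample mean in Python:

    import random

    # Hypothetical observed sample (made-up numbers for illustration).
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]

    def bootstrap_means(data, reps=10000):
        """Resample the data with replacement, recording each resample's mean."""
        n = len(data)
        return sorted(sum(random.choices(data, k=n)) / n for _ in range(reps))

    means = bootstrap_means(sample)
    # A rough 95% percentile interval for the population mean:
    lo, hi = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
    print("95% bootstrap CI for the mean:", (lo, hi))

The percentile interval shown is the crudest version of a bootstrap confidence interval; the point is simply that repeated resampling stands in for a sampling-distribution derivation that may be intractable.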
[Although Moore and McCabe cover this after covering the difference of two means, it is actually simpler: there is no second parameter complicating matters here, as the standard deviation does in inference about means.]
The difference between two sample proportions is a random variable whose mean is the corresponding difference between population proportions (0 under the null hypothesis), and whose variance is the sum of the variances of the two sample proportions. [Note that it is variances, not standard deviations, that are additive; hence the formula for the standard error (Moore and McCabe, page 602) has two familiar-looking p(1-p)/n terms added together, all under the square root symbol.]
Otherwise the calculations proceed much as for a single proportion; in particular, with suitably large samples, one can use the Normal distribution as before to calculate the critical region for a test with a prespecified significance level and, once data are available, confidence limits or the significance level of the data against the null hypothesis.
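To make the arithmetic concrete, here is a minimal sketch in Python (the counts are invented for illustration) of the large-sample test and confidence interval for a difference of two proportions:

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical counts: x successes out of n trials in each sample.
    x1, n1 = 45, 100
    x2, n2 = 30, 100
    p1, p2 = x1 / n1, x2 / n2

    # For the test, estimate p under H0: p1 = p2 by pooling the two samples.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_test = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_test
    p_value = 2 * norm.sf(abs(z))   # two-sided significance level

    # For the confidence interval, the two p(1-p)/n terms add under the root.
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (p1 - p2 - 1.96 * se_ci, p1 - p2 + 1.96 * se_ci)
    print(z, p_value, ci)

Note the design choice: the test uses a pooled estimate of p because the null hypothesis says the two proportions are equal, while the confidence interval estimates each proportion separately.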
Overall, the two-sample case here resembles the one-sample case considered previously, in Topic 8. In particular, both are based on t distributions. But they have some important differences; compare the formulae for t in the boxes on pages 508 and 541:

    one-sample:  t = (xbar - mu0) / (s / sqrt(n))
    two-sample:  t = (xbar1 - xbar2) / sqrt(s1^2/n1 + s2^2/n2)
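For a concrete check (a minimal sketch with invented data), scipy implements both formulas; its two-sample version with equal_var=False uses the unpooled standard error above, with software-style degrees of freedom rather than the conservative option:

    from scipy.stats import ttest_1samp, ttest_ind

    # Hypothetical samples from two populations (made-up numbers).
    group1 = [12.1, 13.4, 11.9, 14.2, 12.8]
    group2 = [10.5, 11.2, 12.0, 10.8, 11.6]

    # One-sample t: tests H0: mu = 12 via t = (xbar - mu0) / (s / sqrt(n)).
    print(ttest_1samp(group1, popmean=12))

    # Two-sample t without assuming equal variances: the standard error has
    # s1^2/n1 + s2^2/n2 under the square root, as in the two-sample formula.
    print(ttest_ind(group1, group2, equal_var=False))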
For example, when one asks why males are overrepresented among highly paid professionals, and asks whether this may be related to gender differences in education, it hardly seems relevant that their means may differ by a fraction of a year, since both means are around 13 years, far below the educational levels of highly paid professionals. The skewnesses of the two education distributions might be more relevant, since doctors and lawyers are in the upper tail of the distributions of education. Similarly, the spreads of the two education distributions might be more relevant than their means, and the spreads look even more promising when one notices that males are overrepresented among the homeless near the bottom of the social hierarchy, as well as among highly paid professionals near the top of the social hierarchy.
The statistical test considered next is one that permits comparison of the entire shapes of two distributions, not just their means.
In Chi-square tests, one calculates the expected frequencies under some theoretical model, and compares them with the corresponding observed frequencies, using the formula:
ChiSquare = Sum[((observed - expected)^2)/expected]
where the summation is over all cells of the table.
The degrees of freedom, which determine which part of the Chi-square table to use to find the significance level, are found as follows: df = (number of categories) - (number of parameters estimated from the dataset being fitted) - (number of constraints on parameters). The latter constraints are such things as requiring expected frequencies to have the same marginal totals as the observed frequencies.
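As a small worked sketch (with invented frequencies), consider fitting a fair-die model to 120 rolls: no parameters are estimated from the data, and the single constraint is that the expected frequencies sum to n, so df = 6 - 0 - 1 = 5:

    from scipy.stats import chi2

    observed = [18, 22, 16, 14, 19, 31]   # hypothetical die rolls, n = 120
    expected = [sum(observed) / 6] * 6    # fair-die model: n/6 per face

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = 6 - 0 - 1   # 6 categories, 0 estimated parameters, 1 constraint
    p_value = chi2.sf(chi_square, df)
    print(chi_square, df, p_value)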
We have already seen an instance of the Chi-square test, earlier in the course when we covered conditional probability and independence; the theoretical model in that case was independence of the row and column variables in a table. This special case, where the model being fitted is one of independence, is treated in Chapter 9. In our previous application to actual data, Stata automatically calculated expected values, the value of Chi-square, the degrees of freedom, and the significance probability.
In comparison of two populations, the null hypothesis is that the score on the measured variable (e.g., education) is independent of the population of which one is a member (e.g., gender). The expected number for any particular cell of the table is the product: (number of cases)(fraction of cases in that population)(fraction of cases in that score category). Note that this can be interpreted as n times a joint probability calculated (under independence) as the product of the two corresponding marginal probabilities. [On page 634 of Moore and McCabe there is a computational formula with n cancelled from both numerator and denominator, thereby making the arithmetic a bit simpler, but that may seem ad hoc compared to the general rule about joint probabilities when the events are independent.]
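A minimal sketch (with an invented 2x3 table) showing both the expected-count rule and the kind of automatic output Stata produced for us, here via scipy:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical table: rows = population (e.g., gender),
    # columns = score category on the measured variable.
    observed = np.array([[30, 40, 30],
                         [45, 35, 20]])
    n = observed.sum()

    # Expected count per cell: n * (row fraction) * (column fraction),
    # which simplifies to (row total)(column total)/n -- the page-634 shortcut.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    chi_square, p_value, df, expected_scipy = chi2_contingency(observed)
    print(expected)   # agrees with expected_scipy
    print(chi_square, df, p_value)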