A robust statistical procedure is one whose conclusions are still approximately correct even when its assumptions are not quite true. A procedure will be robust if the probability calculations it requires are not sensitive to deviations from the assumptions.
Moore and McCabe discuss robustness or lack thereof for various procedures. For example, they assert, "the t procedures are quite robust against nonnormality of the population except in the case of outliers or strong skewness" (p 516), but "the F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice" (p 570).
When the population distribution is not Normal, the statistician is faced with deciding which other distribution is appropriate, and then working through the equations for a sampling distribution from that alternative population distribution. Often such equations are intractable. (Even mathematical statisticians are frustrated by equations they are unable to solve!)
The Bootstrap is one of the methods that aims to substitute computational power for deductive power, at least in cases where attempts to deduce sampling distributions have failed.
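As a concrete illustration (a minimal sketch, not from Moore and McCabe; the sample values are invented), here is one way to bootstrap the sampling distribution of a sample mean in Python:

    import random

    # Hypothetical observed sample (made-up numbers for illustration).
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]

    def bootstrap_means(data, reps=10000):
        """Resample the data with replacement, recording each resample's mean."""
        n = len(data)
        return sorted(sum(random.choices(data, k=n)) / n for _ in range(reps))

    means = bootstrap_means(sample)
    # A rough 95% percentile interval for the population mean:
    lo, hi = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
    print("95% bootstrap CI for the mean:", (lo, hi))

The percentile interval shown is the crudest version of a bootstrap confidence interval; the point is simply that repeated resampling stands in for a sampling-distribution derivation that may be intractable.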
[Although Moore and McCabe cover this after covering the difference of two means, it is actually simpler: there is no second parameter complicating matters here, as the standard deviation does in inference about means.]
The difference between two sample proportions is a random variable whose mean is the corresponding difference between population proportions (0 under the null hypothesis), and whose variance is the sum of the variances of the two sample proportions. [Note that it is variances, not standard deviations, that are additive; hence the formula for the standard error (Moore and McCabe, page 602) has two familiar-looking p(1-p)/n terms added together, all under the square root symbol.]
Otherwise the calculations proceed much as for a single proportion; in particular, with suitably large samples, one can use the Normal distribution as before to calculate the critical region for a test with a prespecified significance level and, once data are available, confidence limits or the significance level of the data against the null hypothesis.
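To make the arithmetic concrete, here is a minimal sketch in Python (the counts are invented for illustration) of the large-sample test and confidence interval for a difference of two proportions:

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical counts: x successes out of n trials in each sample.
    x1, n1 = 45, 100
    x2, n2 = 30, 100
    p1, p2 = x1 / n1, x2 / n2

    # For the test, estimate p under H0: p1 = p2 by pooling the two samples.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_test = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_test
    p_value = 2 * norm.sf(abs(z))   # two-sided significance level

    # For the confidence interval, the two p(1-p)/n terms add under the root.
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (p1 - p2 - 1.96 * se_ci, p1 - p2 + 1.96 * se_ci)
    print(z, p_value, ci)

Note the design choice: the test uses a pooled estimate of p because the null hypothesis says the two proportions are equal, while the confidence interval estimates each proportion separately.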
Overall, the two-sample case here resembles the one-sample case considered previously, in Topic 8. In particular, both are based on t distributions. But they have some important differences; compare the formulae for t in the boxes on pages 508 and 541:

    one-sample:  t = (xbar - mu0) / (s / sqrt(n))
    two-sample:  t = (xbar1 - xbar2) / sqrt(s1^2/n1 + s2^2/n2)
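For a concrete check (a minimal sketch with invented data), scipy implements both formulas; its two-sample version with equal_var=False uses the unpooled standard error above, with software-style degrees of freedom rather than the conservative option:

    from scipy.stats import ttest_1samp, ttest_ind

    # Hypothetical samples from two populations (made-up numbers).
    group1 = [12.1, 13.4, 11.9, 14.2, 12.8]
    group2 = [10.5, 11.2, 12.0, 10.8, 11.6]

    # One-sample t: tests H0: mu = 12 via t = (xbar - mu0) / (s / sqrt(n)).
    print(ttest_1samp(group1, popmean=12))

    # Two-sample t without assuming equal variances: the standard error has
    # s1^2/n1 + s2^2/n2 under the square root, as in the two-sample formula.
    print(ttest_ind(group1, group2, equal_var=False))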
For example, when one asks why males are overrepresented among highly paid professionals, and asks whether this may be related to gender differences in education, it hardly seems relevant that their means may differ by a fraction of a year, since both means are around 13 years, far below the educational levels of highly paid professionals. The skewnesses of the two education distributions might be more relevant, since doctors and lawyers are in the upper tail of the distributions of education. Similarly, the spreads of the two education distributions might be more relevant than their means, and the spreads look even more promising when one notices that males are overrepresented among the homeless near the bottom of the social hierarchy, as well as among highly paid professionals near the top of the social hierarchy.
The statistical test considered next is one that permits comparison of the entire shapes of two distributions, not just their means.
In Chi-square tests, one calculates the expected frequencies under some theoretical model, and compares them with the corresponding observed frequencies, using the formula:
ChiSquare = Sum[((observed - expected)^2)/expected]
where the summation is over all cells of the table.
The degrees of freedom, which determine which part of the Chi-square table to use to find the significance level, are found as follows: df = (number of categories) - (number of parameters estimated from the dataset being fitted) - (number of constraints on parameters). The latter constraints are such things as requiring expected frequencies to have the same marginal totals as the observed frequencies.
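As a small worked sketch (with invented frequencies), consider fitting a fair-die model to 120 rolls: no parameters are estimated from the data, and the single constraint is that the expected frequencies sum to n, so df = 6 - 0 - 1 = 5:

    from scipy.stats import chi2

    observed = [18, 22, 16, 14, 19, 31]   # hypothetical die rolls, n = 120
    expected = [sum(observed) / 6] * 6    # fair-die model: n/6 per face

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = 6 - 0 - 1   # 6 categories, 0 estimated parameters, 1 constraint
    p_value = chi2.sf(chi_square, df)
    print(chi_square, df, p_value)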
We have already seen an instance of the Chi-square test, earlier in the course when we covered conditional probability and independence; the theoretical model in that case was independence of the row and column variables in a table. This special case, where the model being fitted is one of independence, is treated in Chapter 9. In our previous application to actual data, Stata automatically calculated expected values, the value of Chi-square, the degrees of freedom, and the significance probability.
In comparison of two populations, the null hypothesis is that the score on the measured variable (e.g., education) is independent of the population of which one is a member (e.g., gender). The expected number for any particular cell of the table is the product: (number of cases)(fraction of cases in that population)(fraction of cases in that score category). Note that this can be interpreted as n times a joint probability calculated (under independence) as the product of the two corresponding marginal probabilities. [On page 634 of Moore and McCabe there is a computational formula with n cancelled from both numerator and denominator, thereby making the arithmetic a bit simpler, but that may seem ad hoc compared to the general rule about joint probabilities when the events are independent.]
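A minimal sketch (with an invented 2x3 table) showing both the expected-count rule and the kind of automatic output Stata produced for us, here via scipy:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical table: rows = population (e.g., gender),
    # columns = score category on the measured variable.
    observed = np.array([[30, 40, 30],
                         [45, 35, 20]])
    n = observed.sum()

    # Expected count per cell: n * (row fraction) * (column fraction),
    # which simplifies to (row total)(column total)/n -- the page-634 shortcut.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    chi_square, p_value, df, expected_scipy = chi2_contingency(observed)
    print(expected)   # agrees with expected_scipy
    print(chi_square, df, p_value)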