Here we consider just a few concrete examples of hypothesis tests; the Moore and McCabe book contains many more, and omits far more than it contains. The point here is to understand the main logic of hypothesis tests, not to try to learn all the tests for all the possible situations.
One of the most important uses of statistical tests is really beyond the predominantly univariate scope of this quarter's course, but will arise frequently in 210b and 210c, which treat complicated multivariate models. This is a test of an hypothesis to the effect that a specific simpler model will suffice for the data at hand.
For example, in predicting whether a high school graduate goes on to college, based on parental and grandparental education and income and other variables, one might wish to test whether grandparents have any direct effect on grandchildren, beyond their indirect effects through the intervening generation, and, if not, to simplify the model by omitting the grandparental variables.
Note that a researcher may be accustomed to stating substantive hypotheses differently, in two respects: positively, and vaguely. A sociological theory may lead one to expect that some coefficient is important, rather than unimportant, as a value of 0 would imply, but sociological theories are seldom sufficiently precise to specify any particular numerical value.
With very large samples, authors commonly save a tree or two by not bothering to report that every null hypothesis they considered was emphatically rejected, or save some effort by not bothering with formal tests in the first place. For example, some research on the 5-county Los Angeles area, which had a population of 14.5 million in 1990, is based on a 5% sample, the US Census Bureau's 1990 Public Use Microdata Sample (PUMS). But 5% of 14.5 million is around 700,000, and when that value of n is plugged into the denominator of a standard error formula, the result is very close to zero.
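That arithmetic can be checked directly. The sketch below uses the usual formula for the standard error of a sample proportion, with p = 0.5 chosen (as an assumption, not from the source) because it gives the largest possible standard error:

```python
import math

n = 0.05 * 14_500_000              # 5% sample of 14.5 million: 725,000 cases
p = 0.5                            # worst case: the proportion with maximal variance
se = math.sqrt(p * (1 - p) / n)    # standard error of a sample proportion
print(round(n), round(se, 5))      # n is about 725,000; se is about 0.0006
```

Even in this worst case, the standard error is about six hundredths of one percentage point, which is why the formal tests add little.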
The only cautions apply when rare subpopulations, rather than the entire population, are the topic under consideration, and (more relevant in 210B and 210C than here) when one is testing hypotheses about a complicated model that incorporates a large number of variables.
Example: The book, Ethnic Los Angeles, edited by Waldinger and Bozorgmehr, includes many results based on about 700,000 cases in the 5% PUMS sample, rather than the 14.5 million in the entire population. Most of its prose ignores that distinction--as it should, since 700,000 cases is a huge sample, much larger than needed for precise estimates of the kinds of things being discussed therein. The book contains numerous tables, but they are not cluttered with p-values and double asterisks denoting statistical significance beyond the .01 level. [One apparent exception turns out not to be. A table with columns labeled "P*" in Ortiz' chapter on the Mexican-origin population (page 270) is not about either p-values or null hypotheses rejected at the .05 significance level. Rather, the quantity denoted P* there is an index that measures exposure of members of one ethnic group to members of another ethnic group (page 476).]
Several related concepts are as follows:
When the population standard deviation is unknown and must be estimated from the sample, the standardized sample mean, t = (sample mean - hypothesized mean) / (estimated standard error of the mean), follows the t distribution under the null hypothesis; so one calculates t from the sample data, and compares it with values in the table of the t distribution.
Unless one has a directional alternative hypothesis, the alternative is simply that this population is different from the one specified in the null hypothesis, and the appropriate test is two-tailed. Using the conventional .05 significance level, the critical region would be chosen to cut off .025 probability in each tail, and the null hypothesis rejected if the observed t value lies in either half of the critical region.
The t distribution differs from the Gaussian when df is small, such as 10 or 20, but for df as large as 100 or so, the Gaussian (shown in the t table as the bottom row, with df = infinity) is a good approximation.
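The two-tailed t test described above can be sketched with only the standard library. The sample data and the hypothesized mean of 100 below are invented for illustration; the critical value 2.262 is the standard tabled value for df = 9 at the .05 level, two-tailed:

```python
import math
import statistics

# Invented sample data; null hypothesis: population mean = 100.
sample = [102.1, 99.3, 104.8, 101.2, 98.7, 103.5, 100.9, 105.0, 97.8, 102.6]
mu0 = 100.0

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)              # sample standard deviation (n - 1 divisor)
t = (xbar - mu0) / (s / math.sqrt(n))     # t statistic, df = n - 1 = 9

# Two-tailed test at the .05 level: the critical region cuts off .025
# probability in each tail; from the t table, the cutoff for df = 9 is 2.262.
critical = 2.262
reject = abs(t) > critical
print(round(t, 3), reject)
```

Here t is about 2.03, short of 2.262, so this particular invented sample would not lead to rejection at the .05 level.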
In Chi-square tests, one calculates the expected frequencies under some theoretical model, and compares them with the corresponding observed frequencies, using the formula:

    Chi-square = Sum[ (observed - expected)^2 / expected ]

Each discrepancy is squared, and the square divided by the expected frequency; then all such terms are summed.
Under the null hypothesis that observed frequencies are from the same distribution used to calculate the expected frequencies, the value of Chi-square follows a distribution of the same name, which appears in Moore and McCabe's Table F, on page T-20. The Chi-square distribution has one parameter, called "degrees of freedom", or "df" for short.
The degrees of freedom, which tells which part of the Chi-square table to use to find the significance level, is found as follows:

    df = (number of categories) - (number of parameters estimated from the dataset being fitted) - (number of constraints on parameters)

The latter constraints are such things as requiring expected frequencies to have the same marginal totals as the observed frequencies.
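The formula and the df rule can be illustrated with a small hand computation; the die-rolling frequencies below are invented for the purpose:

```python
# Hypothetical example: 60 rolls of a die, testing the model that all six
# faces are equally likely, so each expected frequency is 60/6 = 10.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

# Sum[ (observed - expected)^2 / expected ]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 6 categories - 0 parameters estimated - 1 constraint
# (the expected frequencies are constrained to sum to n = 60).
df = 6 - 0 - 1
print(round(chi_square, 1), df)
```

The resulting Chi-square of 4.2 on 5 df falls well short of the tabled .05 critical value of 11.07, so these invented rolls give no evidence against the equal-probability model.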
We have already seen an instance of the Chi-square test earlier in the course, when we covered conditional probability and independence; the theoretical model in that case was independence of the row and column variables in a table. This special case, where the model being fitted is one of independence, is treated in Moore and McCabe, Section 9.2. In our application to actual data, Stata automatically calculated the expected frequencies, the value of Chi-square, the degrees of freedom, and the significance probability.
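The independence case can be sketched by hand as well, mirroring the computation Stata performs automatically. The 2 x 2 table below is invented; under independence, each expected frequency is (row total)(column total)/n:

```python
# Invented 2 x 2 table of observed frequencies; the model being fitted
# is independence of the row and column variables.
observed = [[30, 20],
            [10, 40]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Under independence, expected frequency = (row total)(column total) / n.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

# df = (rows - 1)(columns - 1): the expected frequencies are constrained
# to reproduce both sets of marginal totals of the observed table.
df = (2 - 1) * (2 - 1)
print(round(chi_square, 2), df)
```

Here Chi-square is about 16.67 on 1 df, far beyond the tabled .05 critical value of 3.84, so independence would be rejected for this invented table.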
A more recent discussion of some of the same issues, but now with a Bayesian slant, is given in the article by Berk, Western, and Weiss (1995). Note, however, that those authors have not settled the matter to the satisfaction of their own critics. Still, this article does provide some progress over earlier authors who merely complained about hypothesis tests being used to justify inferences to theoretical relevant populations from which the data at hand were not random samples; Berk et al. take the further step of proposing alternative procedures for some such situations.
Notice that the controversy is not about statistical hypothesis testing per se, as much as about its use in situations where the data being analyzed are not a probability sample from some larger population of theoretical interest.
Consider a simplified situation involving only two hypotheses, H1 and H2, and only two possible values for the data to be collected, D1 and D2. The likelihood of an hypothesis, given the data, is defined as the conditional probability of the observed data, conditioned on that hypothesis being true. Thus if we observed data outcome D1, we would consider the likelihoods of the two different hypotheses, given the one data outcome actually observed:

    likelihood of H1 = p(D1 | H1)
    likelihood of H2 = p(D1 | H2)
Bayes' Theorem, with subjective prior and posterior probabilities, commonly is used with continuous distributions, but we will consider only a couple of discrete examples, whose mathematics is much more straightforward, while still giving some of the flavor of Bayesian inference.
The Bayesian begins with subjective prior probabilities expressing his or her degree of belief in the hypotheses, namely p(H1) and p(H2), two non-negative numbers which (in the simplified case we consider, which has only two hypotheses) sum to 1.0.
On observing the data, the Bayesian revises those subjective probabilities, replacing his or her prior probabilities with posterior probabilities; in particular, replacing p(H1) with p(H1|data) and replacing p(H2) with p(H2|data). Bayes' Rule tells how to calculate the appropriate revised subjective probabilities.
In case the data happened to have the outcome D1, these revisions would be:

    p(H1 | D1) = p(H1) p(D1 | H1) / [ p(H1) p(D1 | H1) + p(H2) p(D1 | H2) ]
    p(H2 | D1) = p(H2) p(D1 | H2) / [ p(H1) p(D1 | H1) + p(H2) p(D1 | H2) ]
A probability distribution for a set of competing hypotheses is called diffuse if the various hypotheses (two in our example) are given nearly equal values, and is called informative if some hypotheses are given much higher probabilities than others.
Data may also be informative or not, depending on whether some particular outcomes are much more probable under some hypotheses than under other hypotheses. If the data are relatively informative, compared to the priors, the posterior probabilities will depend mainly on the data, and the Bayesian will reach conclusions similar to those of a frequentist.
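The two-hypothesis revision can be worked through numerically. All the probabilities below are invented for illustration: a diffuse prior of .5 on each hypothesis, and likelihoods under which outcome D1 is considerably more probable under H1 than under H2:

```python
# Invented subjective prior probabilities for the two hypotheses (diffuse).
prior = {"H1": 0.5, "H2": 0.5}

# Invented likelihoods: probability of each data outcome under each hypothesis.
likelihood = {"H1": {"D1": 0.8, "D2": 0.2},
              "H2": {"D1": 0.3, "D2": 0.7}}

def posterior(prior, likelihood, outcome):
    """Bayes' Rule: p(H | data) is proportional to p(H) * p(data | H)."""
    joint = {h: prior[h] * likelihood[h][outcome] for h in prior}
    total = sum(joint.values())      # p(data), the normalizing denominator
    return {h: joint[h] / total for h in joint}

# Suppose the data turn out to have outcome D1.
post = posterior(prior, likelihood, "D1")
print(post)
```

With these numbers the posterior probability of H1 rises from .5 to 8/11, about .73: the data were informative, and they shift the diffuse prior substantially, just as the discussion above describes.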
Berk, Richard A., Bruce Western, and Robert E. Weiss. 1995. "Statistical Inference for Apparent Populations." Sociological Methodology 25: 421-458. [With discussions by: Kenneth A. Bollen; Glenn Firebaugh; Donald B. Rubin; and Reply by Berk, Western, and Weiss.]
Ortiz, Vilma. 1996. "The Mexican-Origin Population: Permanent Working Class or Emerging Middle Class?" Chapter 9, pp. 247-277, in Waldinger and Bozorgmehr 1996.
Waldinger, Roger, and Mehdi Bozorgmehr, eds. 1996. Ethnic Los Angeles. New York: Russell Sage Foundation.