11nov2000

UCLA Soc. 210A, Topic 8, Logic of Statistical Inference

Professor: David D. McFarland





Topic 8: Logic of Statistical Inference


This section is devoted to the logic of statistical inference, particularly the rationale for tests of statistical hypotheses.

Here we consider just a few concrete examples of hypothesis tests; the Moore and McCabe book contains many more, and omits far more than it contains. The point here is to understand the main logic of hypothesis tests, not to try to learn all the tests for all the possible situations.

One of the most important uses of statistical tests is really beyond the predominantly univariate scope of this quarter's course, but will arise frequently in 210b and 210c, which treat complicated multivariate models. This is a test of an hypothesis to the effect that a specific simpler model will suffice for the data at hand.

For example, in predicting whether a high school graduate goes on to college, based on parental and grandparental education and income and other variables, one might wish to test whether grandparents have any direct effect on grandchildren, beyond their indirect effects through the intervening generation, and, if not, to simplify the model by omitting the grandparental variables.

Null hypothesis

A "null" hypothesis posits a particular numerical value for some population parameter, and the statistical test determines how compatible the data are with that hypothesis. Typically it is an hypothesis to the effect that some coefficient is 0, or that there is no difference between two coefficients, or some other negatively stated proposition.

Note that a researcher may be accustomed to stating substantive hypotheses differently, in two respects: positively, and vaguely. A sociological theory may lead one to expect that some coefficient is important, rather than unimportant, as a value of 0 would imply, but sociological theories are seldom sufficiently precise to specify any particular numerical value.

Alternative hypotheses

An alternative hypothesis may be stated at any of several different levels of specificity.

Sample size

An important consideration in the design stage (not after the data have already been collected) is the sample size. For example, one might wish to have a sample sufficiently large to detect a departure from the null hypothesis if the population value were actually as large as .52, rather than the .5 specified in the null hypothesis. That type of consideration supplies a specific alternative hypothesis, which we can put into the appropriate formula and solve for the required sample size.
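A minimal sketch, in Python, of that sample-size calculation. The one-sided test, the .05 significance level, and the 80% power are illustrative assumptions, not values specified in the text.

    # Sample size needed to detect p = .52 when the null hypothesis says p = .50,
    # using the usual normal-approximation formula for a one-sample proportion test.
    from math import sqrt, ceil
    from scipy.stats import norm

    p0, p1 = 0.50, 0.52        # null value and specific alternative value
    alpha, power = 0.05, 0.80  # assumed significance level and desired power

    z_alpha = norm.ppf(1 - alpha)  # critical value for a one-sided test
    z_beta = norm.ppf(power)

    n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    print(ceil(n))             # roughly 3,900 cases under these assumptions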

With very large samples, authors commonly save a tree or two by not bothering to report that every null hypothesis they considered was emphatically rejected, or save some effort by not bothering with formal tests in the first place. For example, some research on the 5-county Los Angeles area, which had a 14.5 million population in 1990, is based on a 5% sample, the US Census Bureau's 1990 Public Use Microdata Sample (PUMS). But 5% of 14.5 million is around 700,000, and when that value of n is plugged into a formula for standard error (in the denominator), the result is very close to zero.
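Checking that arithmetic in Python: with n around 700,000, the standard error of a sample proportion is tiny even in the worst case, p = .5.

    from math import sqrt

    n = 700_000
    p = 0.5                        # the value of p that maximizes the standard error
    print(sqrt(p * (1 - p) / n))   # about 0.0006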

The only cautions arise when rare subpopulations, rather than the entire population, are the topic under consideration, and (more relevant in 210B and 210C than here) when one is testing hypotheses about a complicated model that incorporates a large number of variables.

Example: The book, Ethnic Los Angeles, edited by Waldinger and Bozorgmehr, includes many results based on about 700,000 cases in the 5% PUMS sample, rather than the 14.5 million in the entire population. Most of its prose ignores that distinction--as it should, since 700,000 cases is a huge sample, much larger than needed for precise estimates of the kinds of things being discussed therein. The book contains numerous tables, but they are not cluttered with p-values and double asterisks denoting statistical significance beyond the .01 level. [One apparent exception turns out not to be. A table with columns labeled "P*" in Ortiz' chapter on the Mexican-origin population (page 270) is not about either p-values or null hypotheses rejected at the .05 significance level. Rather, the quantity denoted P* there is an index that measures exposure of members of one ethnic group to members of another ethnic group (page 476).]

Power of a Statistical Test

The probability that a statistical test will conclude with rejection of the null hypothesis depends on (1) how far wrong the null hypothesis is, and (2) how large the sample is.
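The short Python sketch below illustrates both points for a one-sided test of H0: p = .5; the particular true values, sample sizes, and .05 level are assumptions chosen only for illustration.

    # Normal-approximation power of a one-sided test of H0: p = .5,
    # shown for several assumed true proportions and sample sizes.
    from math import sqrt
    from scipy.stats import norm

    def power(p_true, n, p0=0.5, alpha=0.05):
        # sample proportions above this cutoff lead to rejection of H0
        cutoff = p0 + norm.ppf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
        # probability of exceeding the cutoff when p_true is the real value
        return 1 - norm.cdf((cutoff - p_true) / sqrt(p_true * (1 - p_true) / n))

    for p_true in (0.51, 0.52, 0.55):
        for n in (500, 2000, 10000):
            print(p_true, n, round(power(p_true, n), 2))

Reading down the printout, power grows with sample size for any given true value, and grows with the distance between the true value and .5 for any given sample size.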

Several related concepts are as follows:

Some Specific Hypothesis Tests


Statistical Tests vs Interval Estimates

Tests and interval estimates can be converted back and forth: a test at the .05 significance level rejects an hypothesis exactly when the hypothesized parameter value lies outside the corresponding 95% confidence interval, and accepts it (more precisely, fails to reject it) when the value lies within the interval.
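A small Python sketch of that correspondence, using made-up numbers (a sample mean of 103, a known population standard deviation of 15, and n = 100) and a two-sided test at the .05 level:

    # The two-sided z-test at the .05 level and the 95% confidence interval
    # always reach the same decision about a hypothesized mean.
    from math import sqrt
    from scipy.stats import norm

    xbar, sigma, n = 103.0, 15.0, 100   # made-up sample mean, known SD, sample size
    mu0 = 100.0                         # hypothesized population mean

    se = sigma / sqrt(n)
    crit = norm.ppf(0.975)              # two-sided critical value for alpha = .05

    ci = (xbar - crit * se, xbar + crit * se)       # 95% confidence interval
    reject_by_test = abs((xbar - mu0) / se) > crit
    reject_by_interval = not (ci[0] <= mu0 <= ci[1])
    print(ci, reject_by_test, reject_by_interval)   # the two decisions agree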

Controversy

Earlier in the quarter we cited a classic article by S. F. Camilleri, for its discussion of the three different ways in which probability considerations arise in sociology. That article also contained one of the early critiques of the use of significance tests in sociology in circumstances where they could not be justified in terms of a random sample selected from the population to which inferences were being made.

A more recent discussion of some of the same issues, but now with a Bayesian slant, is given in the article by Berk, Western, and Weiss (1995). Note, however, that those authors have not settled the matter to the satisfaction of their own critics. Still, this article does represent some progress over earlier authors who merely complained about hypothesis tests being used to justify inferences to theoretically relevant populations from which the data at hand were not random samples; Berk et al. take the further step of proposing alternative procedures for some such situations.

Notice that the controversy is not about statistical hypothesis testing per se, as much as about its use in situations where the data being analyzed are not a probability sample from some larger population of theoretical interest.

Likelihood functions and Bayesian inference

Likelihood functions are useful in inference, especially in some of the more complex models of 210B and 210C. We already encountered likelihoods when we studied conditional probability, but now we use them to revise prior beliefs in light of new data.

Consider a simplified situation involving only two hypotheses, H1 and H2, and only two possible values for the data to be collected, D1 and D2. The likelihood of an hypothesis, given the data, is defined as the conditional probability of the observed data, conditioned on that hypothesis being true. Thus if we observed data outcome D1, we would consider the likelihoods of the two different hypotheses, given the one data outcome actually observed:

    L(H1) = p(D1|H1)     and     L(H2) = p(D1|H2)

Note that while both of those quantities are probabilities, they do not together constitute a probability distribution, and do not sum to 1.0 except by coincidence. They are probabilities of the same data outcome, D1, not probabilities of a mutually exclusive and exhaustive set of different possible data outcomes, as in the case of a probability distribution.
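A tiny numerical illustration of that point, with hypothetical conditional probabilities chosen only for concreteness:

    p_D1_given_H1 = 0.80   # assumed p(D1|H1)
    p_D1_given_H2 = 0.30   # assumed p(D1|H2)
    # Both are probabilities of the same outcome D1, so their sum carries
    # no special meaning; here it is 1.1, not 1.0.
    print(p_D1_given_H1 + p_D1_given_H2)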

Bayes' Theorem, with subjective prior and posterior probabilities, commonly is used with continuous distributions, but we will consider only a couple of discrete examples, whose mathematics is much more straightforward, while still giving some of the flavor of Bayesian inference.

The Bayesian begins with subjective prior probabilities expressing his or her degree of belief in the hypotheses, namely p(H1) and p(H2), two non-negative numbers which (in the simplified case we consider, which has only two hypotheses) sum to 1.0.

On observing the data, the Bayesian revises those subjective probabilities, replacing his or her prior probabilities with posterior probabilities; in particular, replacing p(H1) with p(H1|data) and replacing p(H2) with p(H2|data). Bayes' Rule tells how to calculate the appropriate revised subjective probabilities.

In case the data happened to have the outcome D1, these revisions would be:

    p(H1|D1) = p(D1|H1) p(H1) / [ p(D1|H1) p(H1) + p(D1|H2) p(H2) ]

    p(H2|D1) = p(D1|H2) p(H2) / [ p(D1|H1) p(H1) + p(D1|H2) p(H2) ]
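The sketch below carries out that calculation in Python, reusing the hypothetical likelihoods from the earlier snippet and an assumed 50-50 prior; none of the numbers come from the text.

    # Discrete Bayes' Rule update for two hypotheses after observing D1.
    prior = {"H1": 0.5, "H2": 0.5}              # assumed diffuse prior
    likelihood = {"H1": 0.80, "H2": 0.30}       # assumed p(D1|H) values

    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())          # this is p(D1)
    posterior = {h: unnormalized[h] / total for h in prior}
    print(posterior)                            # roughly {'H1': 0.73, 'H2': 0.27}

With these numbers, observing D1 moves a diffuse 50-50 prior to about .73 versus .27 in favor of H1; a Bayesian who started with a different prior would end up with a different posterior, which is the point of the next paragraphs.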

Bayesians are noncommittal as to where the prior probabilities come from, and different Bayesians may bring different sets of priors to the same problem.

A probability distribution for a set of competing hypotheses is called diffuse if the various hypotheses (two in our example) are given nearly equal values, and is called informative if some hypotheses are given much higher probabilities than others.

Data may also be informative or not, depending on whether some particular outcomes are much more probable under some hypotheses than under other hypotheses. If the data are relatively informative, compared to the priors, the posterior probabilities will depend mainly on the data, and the Bayesian will reach conclusions similar to those of a frequentist.


References:

Berk, Richard A., Bruce Western, and Robert E. Weiss. 1995. "Statistical Inference for Apparent Populations." Sociological Methodology 25: 421-458. [With discussions by Kenneth A. Bollen, Glenn Firebaugh, and Donald B. Rubin, and a reply by Berk, Western, and Weiss.]

Ortiz, Vilma. 1996. "The Mexican-Origin Population: Permanent Working Class or Emerging Middle Class?" Chapter 9, pp. 247-277, in Waldinger and Bozorgmehr 1996.

Waldinger, Roger, and Mehdi Bozorgmehr, eds. 1996. Ethnic Los Angeles. New York: Russell Sage Foundation.