THE UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Policy Studies
Policy Studies 208 - Policy Research and Analysis
January 13, 1998
Cameron
Problem Set # 1: Univariate Statistics (Review)
Outline of Solutions
 
[For review. Will not be graded in detail. Will not count quantitatively towards course grade; but must be submitted for inspection.]
 
INSTRUCTIONS: To get you up to speed, this problem set highlights some of the concepts you should have grasped in your prerequisite coursework. The first three or four lectures will be devoted to a thorough, although quick, review of this material. This problem set covers material in Gujarati, Chapter 2.

1. We will be using summation notation in this course. What do the following stand for? (Simplify to the extent possible.)

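a.) \sum_{i=-2}^{1} x_i = x_{-2} + x_{-1} + x_0 + x_1

b.) \sum_{i=0}^{3} (i+1) = (0+1) + (1+1) + (2+1) + (3+1) = 1+2+3+4 = 10

c.) \sum_{i=2}^{3} 2^i = 2^2 + 2^3 = 4+8 = 12

d.) \sum_{i=1}^{3} \sum_{j=3}^{4} x_i y_j = x_1 y_3 + x_1 y_4 + x_2 y_3 + x_2 y_4 + x_3 y_3 + x_3 y_4

e.) \sum_{i} (5x_j^2 - 2), taken over four values of i, = 4(5x_j^2 - 2), since the summation is over i, not j.

f.) \sum_{i=1}^{n} 7 = 7n, since this is the sum of a constant n times.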
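If you want to check sums like these by machine, a minimal Python sketch follows. The index limits are assumptions read off the expansions in parts (b) and (c), and the helper name sum_of_constant is made up for this illustration.

```python
# Part (b): sum of (i + 1) for i = 0, 1, 2, 3 (limits assumed from the expansion).
sum_b = sum(i + 1 for i in range(0, 4))

# Part (c): sum of 2^i for i = 2, 3 (limits assumed from the expansion).
sum_c = sum(2 ** i for i in range(2, 4))


def sum_of_constant(c, n):
    """Add the constant c to itself n times; the result should equal c * n."""
    return sum(c for _ in range(n))


print(sum_b, sum_c, sum_of_constant(7, 5))
```

Part (f) is the general statement behind sum_of_constant: the summand does not depend on the index, so the sum is just the constant times the number of terms.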
 

2. Correlation is a measure of the degree of linear relatedness of two variables. If Y and X are uncorrelated, then they are statistically independent (i.e. a scatterplot of their values will be an amorphous blob). True, False, Uncertain? Explain.

False. Two variables may display a very rigid relationship that is nonlinear, so that correlation will be zero. Consider, for example, two variables for which a scatterplot looks like a perfect circle. Their correlation will be zero, but they definitely bear a systematic relationship.
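The circle example can be checked numerically. The sketch below places points evenly around a unit circle (the number of points, 360, is an arbitrary choice for illustration) and computes their correlation with a hand-written helper.

```python
import math

# Points spaced evenly around a unit circle; n = 360 is an arbitrary choice.
n = 360
xs = [math.cos(2 * math.pi * k / n) for k in range(n)]
ys = [math.sin(2 * math.pi * k / n) for k in range(n)]


def correlation(a, b):
    """Correlation coefficient of two equal-length lists of numbers."""
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / m
    var_a = sum((p - mean_a) ** 2 for p in a) / m
    var_b = sum((q - mean_b) ** 2 for q in b) / m
    return cov / math.sqrt(var_a * var_b)


r = correlation(xs, ys)
print(r)  # zero up to floating-point rounding
```

Every point satisfies x^2 + y^2 = 1 exactly, so knowing X pins Y down to at most two values, yet the correlation is zero. The variables are uncorrelated but clearly not independent.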

3. For each of the following, is this a complete and valid probability distribution (in the case of a discrete random variable) or a complete and valid probability density function (in the case of a continuous random variable)? Why or why not?

a.)  f(Y) = P(Y = y_i) = .50  This is not a complete probability distribution because for the entire distribution to be described, the sum of the probabilities must be one. We have the probability associated with one possible value of the random variable, not its whole distribution.

b.)  f(X) = .2 when x = 0, 1, 2;  f(X) = .1 when x = 3, 4, 5, 10;  f(X) = 0 otherwise.  There are seven distinct discrete values that this random variable can take on, with probability .2 associated with each of the first three values and .1 associated with the last four. This means that each probability is between zero and one, and the sum of the probabilities is unity. This is a valid probability function for a discrete random variable. Note that there is no need for the admissible x values to be all positive or all equally spaced.

c.) The random variable X can take on four different values: -1, 0, 1, 3, with corresponding probabilities f(X) = -.2, .5, .9, -.2.  It is all right for the values of a discrete random variable to be negative, but the probabilities associated with any value of a discrete random variable cannot be negative, hence this is not a valid probability function, even though the sum of the probabilities is unity.

d.) f(X) =  x-1,   1 < x < 3;   0 otherwise  This is a continuous random variable. To assess the validity of this probability density function, sketch the shape of the function in a graph. In a plot of f(X) against X, this function is a straight line, with height zero at X=1 and height 2 at X=3. The area under this function is a triangle with base 2 and height 2, and thus area (1/2)*b*h = 2. The total area under a density must equal one, so the area is too large and this is not a valid probability density function.

e.) f(X,Z) = 1/3,   0 < x < 1;   2 < z < 5;   0 otherwise  This appears to be a joint continuous distribution. The height of the probability density function is the same everywhere (uniform). The volume under the function must cumulate to one, so this is what we must check. This volume is a box with width 1, length 3, and height 1/3. Volume is then 1*3*(1/3) = 1. Therefore, this appears to be a valid joint probability density function.

f.) f(X,Z) = 1/9,   x = 0, 1, 2;   z = 2, 3, 4;   0 otherwise  This appears to be a joint discrete distribution. The joint probability function can be represented as a "forest of sticks." All of the sticks are the same height, 1/9. There are nine possible (x,z) pairs in the joint distribution (three values each for x and z), so the probabilities sum to 9*(1/9) = 1, and each lies between zero and one. Thus this is a valid probability function.
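The checks in parts (b), (c), (d), and (f) can be sketched in Python as follows. The helper names is_valid_pmf and midpoint_area are made up for this illustration. Exact fractions are used so that "sums to one" can be tested without rounding problems, and the midpoint rule is just one convenient way to approximate the area in part (d).

```python
from fractions import Fraction


def is_valid_pmf(probs):
    """True when every probability lies in [0, 1] and they sum to exactly one."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1


# Part (b): three values with probability .2, four values with probability .1.
pmf_b = [Fraction(2, 10)] * 3 + [Fraction(1, 10)] * 4
# Part (c): sums to one, but contains negative "probabilities".
pmf_c = [Fraction(-2, 10), Fraction(5, 10), Fraction(9, 10), Fraction(-2, 10)]
# Part (f): nine (x, z) pairs, each with probability 1/9.
pmf_f = [Fraction(1, 9)] * 9


def midpoint_area(f, lo, hi, steps=1000):
    """Approximate the area under f between lo and hi with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width


# Part (d): area under f(x) = x - 1 on the interval (1, 3).
area_d = midpoint_area(lambda x: x - 1, 1.0, 3.0)

print(is_valid_pmf(pmf_b), is_valid_pmf(pmf_c), is_valid_pmf(pmf_f), area_d)
```

Note that pmf_c fails only the range check, not the sum check, which is exactly the point of part (c).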

4.  For the following joint (or bivariate) discrete distribution of the variables Y and X:

     3   |    0.1      0.1      0.1
         |
     2   |    0.1      0.2       0
         |
   Y=1   |    0.3      0.1       0
--------------------------------------
   X=    |     0        1        2

Assuming that this joint discrete probability function is the true population distribution, f(X,Y), compute:

a.) the marginal distribution f(Y), its mean and its variance;

For the variable Y, we ignore the different possible values of X and "sweep" all the probability associated with each possible value of Y over to the right margin of the table. For Y = 1, 2, 3, we get probabilities 0.4, 0.3, and 0.3, respectively.

To get the marginal mean of Y, multiply each value of Y by its associated probability and add: 1(.4)+2(.3)+3(.3) = 0.4+0.6+0.9 = 1.9. To get the variance, use the simplest variance formula Var[Y] = E[Y^2] - (E[Y])^2. We already know E[Y]. For E[Y^2], weight each squared value of Y by the probability associated with the value of Y you are squaring: 0.4(1)+0.3(4)+0.3(9) = 0.4 + 1.2 + 2.7 = 4.3. Now, Var[Y] = 4.30 - (1.9)(1.9) = 4.30 - 3.61 = 0.69.
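The "sweep to the margin" calculation can be sketched in Python. The dictionary layout joint[(x, y)] is simply one convenient encoding of the table, chosen for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y), copied from the table in the problem.
joint = {
    (0, 1): F(3, 10), (1, 1): F(1, 10), (2, 1): F(0),
    (0, 2): F(1, 10), (1, 2): F(2, 10), (2, 2): F(0),
    (0, 3): F(1, 10), (1, 3): F(1, 10), (2, 3): F(1, 10),
}

# Sweep the probability in each row over to the margin to get f(Y).
marginal_y = {}
for (x, y), p in joint.items():
    marginal_y[y] = marginal_y.get(y, 0) + p

mean_y = sum(y * p for y, p in marginal_y.items())          # E[Y]
mean_y_sq = sum(y ** 2 * p for y, p in marginal_y.items())  # E[Y^2]
var_y = mean_y_sq - mean_y ** 2                             # Var[Y]

print(marginal_y, float(mean_y), float(var_y))
```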

b.) the conditional distribution of Y and its mean, given that X=0; given that X=2. Does the conditional mean of Y appear to be related to the magnitude of X? How? [Recall: the relative frequencies in a conditional distribution must be scaled so that the probabilities sum to one.]

If X=0, we want to concentrate only on the first column in the table of the joint distribution. At the margin, there is an overall 0.5 probability of X=0, so we divide all joint probabilities in the first column by 0.5 to "blow up" the probability function so that the sum of the probabilities is one, yet the relative amounts of probability associated with each value of Y are the same as in the body of the joint distribution for X=0. The conditional distribution p(Y|X=0) associates probabilities .6, .2, .2 with Y = 1, 2, 3, respectively. To get the expected value of Y given X=0, we take each possible value of Y, weight by its associated probability, and add up the terms. E[Y|X=0]=.6(1)+.2(2)+.2(3) = .6 + .4 + .6 = 1.6.

When X=2, the conditional distribution of Y involves dividing the joint probabilities in the third column of the table by 0.1. This is rather simple, since the only value of Y that can occur when X=2 is 3. Probability 1.0 is thus associated with Y=3. The expected value of Y when X=2 is 3.

Yes, the conditional expectation of Y appears to get larger as X gets larger. This means the variables are positively correlated.
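The conditioning step can be sketched in Python for every value of X, not just the two asked about. The helper names conditional_y_given_x and conditional_mean are made up for this illustration, and the dictionary encoding of the table is an assumption of convenience.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y), copied from the table in the problem.
joint = {
    (0, 1): F(3, 10), (1, 1): F(1, 10), (2, 1): F(0),
    (0, 2): F(1, 10), (1, 2): F(2, 10), (2, 2): F(0),
    (0, 3): F(1, 10), (1, 3): F(1, 10), (2, 3): F(1, 10),
}


def conditional_y_given_x(joint, x0):
    """Rescale the column X = x0 so that its probabilities sum to one."""
    p_x = sum(p for (x, _), p in joint.items() if x == x0)  # marginal P(X = x0)
    return {y: p / p_x for (x, y), p in joint.items() if x == x0}


def conditional_mean(cond):
    """Expected value of Y under a conditional distribution {y: probability}."""
    return sum(y * p for y, p in cond.items())


for x0 in (0, 1, 2):
    cond = conditional_y_given_x(joint, x0)
    print(x0, float(conditional_mean(cond)))
```

The conditional means come out as 1.6, 2, and 3 for X = 0, 1, 2, so the conditional expectation rises with X across all three columns.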

c.) given your answer in (b.), can the random variable Y be statistically independent of X? (I.e., is the test for independence, f(X,Y) = f(X)f(Y) violated for any of these specific (x,y) pairs?)

No. Since the conditional distribution of Y varies with X in these data, it is not possible for Y to be statistically independent of X. You can verify this by finding at least one cell in the joint distribution for which the probability in the cell is not equal to the product of the two marginals. For example, pick the lower-left cell (X=0, Y=1). The marginal probability of X=0 is 0.5; the marginal probability of Y=1 is 0.4. If these two variables were independent, the probability in the corresponding cell in the body of the table would have to be (0.5)*(0.4) = 0.20. However, the entry in this cell is 0.3, so the two variables cannot be statistically independent.
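The cell-by-cell test f(X,Y) = f(X)f(Y) can be sketched in Python. The dictionary encoding and the name violations are choices made for this illustration only.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y), copied from the table in the problem.
joint = {
    (0, 1): F(3, 10), (1, 1): F(1, 10), (2, 1): F(0),
    (0, 2): F(1, 10), (1, 2): F(2, 10), (2, 2): F(0),
    (0, 3): F(1, 10), (1, 3): F(1, 10), (2, 3): F(1, 10),
}

# Marginal distributions of X and Y.
marginal_x, marginal_y = {}, {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p
    marginal_y[y] = marginal_y.get(y, 0) + p

# Collect every cell whose joint probability differs from the product of marginals.
# Each entry maps (x, y) to the pair (joint probability, product of marginals).
violations = {
    (x, y): (p, marginal_x[x] * marginal_y[y])
    for (x, y), p in joint.items()
    if p != marginal_x[x] * marginal_y[y]
}

independent = not violations  # independence requires zero violations
print(independent, violations[(0, 1)])
```

A single violating cell is enough to rule out independence; the lower-left cell used in the text is one of them.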

d.) compute the covariance between X and Y and then the correlation between these variables. Bear in mind that Cov(X,Y) equals E(XY)-E(X)E(Y) and Corr(X,Y) equals Cov(X,Y) divided by the product of the individual marginal standard deviations of the two variables.

We did not get to this in class prior to the due date for this homework, so we will go easy on the grading. A useful strategy is as follows: First, compute a table containing the values of X times Y for each cell:

     3   |     0        3        6 
         |
     2   |     0        2        4
         |
   Y=1   |     0        1        2
--------------------------------------
   X=    |     0        1        2

To get the E[XY], you need to weight each value of XY by its corresponding probability, then add up across all nine cells. It is easiest to do this by creating a third table, and first putting zeros everywhere either table has a zero:

     3   |     0       0.3      0.6
         |
     2   |     0       0.4       0
         |
   Y=1   |     0       0.1       0
--------------------------------------
   X=    |     0        1        2

Now, when you add up across all these cells, you get E[XY]=1.4. But you are not yet done, because you still need the product of the marginal expectations of Y and X separately: E[Y]=1.9 as above. E[X]=0.5(0)+0.4(1)+0.1(2)=0.4+0.2=0.6. Thus Cov[X,Y]=1.4 - (1.9)(0.6) = 0.26.

To get the correlation, we need to divide the covariance by the two standard deviations of the individual variables. We already know the variance of Y, so take its square root to get Std.Dev[Y] = 0.831. Now we need to go back and compute the marginal distribution of X and get its variance. E[X] = 0.5(0) + 0.4(1) + 0.1(2) = 0.4+0.2 = 0.6. To get the variance of X, first calculate E[X^2]. This will be 0.5(0) + 0.4(1) + 0.1(4) = 0.4 + 0.4 = 0.8. Var[X] = 0.8 - (0.6)(0.6) = 0.8 - 0.36 = 0.44. The square root of this is 0.663. Therefore, finally, the correlation is Cov[X,Y] / (Std.Dev[X] Std.Dev[Y]) = 0.26/(0.831*0.663) = 0.472.
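The whole covariance and correlation calculation can be sketched in Python. The helper expect is made up for this illustration; it weights any function of (x, y) by the joint probabilities, which is the same thing the tables above do by hand.

```python
import math
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y), copied from the table in the problem.
joint = {
    (0, 1): F(3, 10), (1, 1): F(1, 10), (2, 1): F(0),
    (0, 2): F(1, 10), (1, 2): F(2, 10), (2, 2): F(0),
    (0, 3): F(1, 10), (1, 3): F(1, 10), (2, 3): F(1, 10),
}


def expect(g):
    """E[g(X, Y)]: weight g(x, y) by each cell's probability and add up."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)

cov_xy = e_xy - e_x * e_y                      # Cov = E[XY] - E[X]E[Y]
var_x = expect(lambda x, y: x * x) - e_x ** 2  # Var = E[X^2] - (E[X])^2
var_y = expect(lambda x, y: y * y) - e_y ** 2

# Correlation = covariance over the product of the standard deviations.
corr_xy = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(float(e_xy), float(cov_xy), round(corr_xy, 3))
```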

5. Suppose you are told that E(X) = 4 and that Var(X) = 16. What are the expected values and variances of the following expressions? [Recall the formula for the mean and variance of a linear function of a single random variable.]

a.) Y = 3X + 2:    E[Y] = 3*E[X] + 2 = 3*4 + 2 = 14; Var[Y] = 9*Var[X] = 9*16 = 144.

b.) Y = .6X - 3:    E[Y] = .6*E[X] - 3 = .6*4 - 3 = 2.4 - 3 = -0.6; Var[Y] = (0.6)(0.6)Var[X] = 0.36*16 = 5.76.

c.) Y = X/5:    E[Y]=E[X]/5 = 4/5 = 0.8; Var[Y]=(1/5)(1/5)Var[X]=(1/25)*16 = 16/25.

d.) Y = aX + b, where a and b are scalar constants:    E[Y] = a E[X] + b; Var[Y] = a^2 Var[X].
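The general rule in part (d) can be applied mechanically to parts (a) through (c). A minimal Python sketch follows; the function name linear_mean_var is made up for this illustration, and exact fractions are used so that decimals such as .6 do not pick up rounding error.

```python
from fractions import Fraction as F


def linear_mean_var(a, b, mean_x, var_x):
    """Mean and variance of Y = a*X + b, given the mean and variance of X."""
    return a * mean_x + b, a ** 2 * var_x


mean_x, var_x = 4, 16

part_a = linear_mean_var(3, 2, mean_x, var_x)         # Y = 3X + 2
part_b = linear_mean_var(F(3, 5), -3, mean_x, var_x)  # Y = .6X - 3
part_c = linear_mean_var(F(1, 5), 0, mean_x, var_x)   # Y = X/5

for mean, var in (part_a, part_b, part_c):
    print(float(mean), float(var))
```

The additive constant b shifts the mean but never shows up in the variance, and the multiplier a enters the variance squared.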

 
6. What is the formula for the variance of the linear combination aX1 + bX2, where X1 and X2 are two random variables? Let X1 stand for the rate of return on one security, and let X2 stand for the rate of return on another security, and let E(X1) = E(X2). A simple "investment portfolio" would consist of a combination of these two securities. Suppose that Var(X1) is 16 and Var(X2) is 9 and that the rates of return have a correlation of 0.6. If you had $10,000 to invest, would you be better off to invest all of it in security 1, in security 2, or half in each? This is the essence of modern portfolio theory.

The relevant formula is Var[aX1 + bX2] = a^2 Var[X1] + b^2 Var[X2] + 2ab Cov[X1,X2]. If both investments have the same expected return, then a portfolio consisting of any proportions of the two has that same expected return as well, so you would want to choose the combination with the smallest variance. You should therefore compare the variances of 1*X1, 0.5*X1 + 0.5*X2, and 1*X2. The variance in the first case is 16; in the third case, it is 9. In the second case, we need to use the formula for a linear combination of random variables. We have the two separate variances, but we need the covariance, which can be recovered from the correlation: since Corr[X1,X2] = Cov[X1,X2]/{Std.Dev[X1]*Std.Dev[X2]}, the covariance is the correlation times the two standard deviations, which gives 0.6*4*3 = 7.2. Therefore, the variance of the portfolio that is made up of half one security and half the other is 0.25*16 + 0.25*9 + 2*0.5*0.5*7.2 = 4 + 2.25 + 3.6 = 9.85. If you are limited to only these three choices, the lowest-variance choice is putting all your money in security 2.

However, if you are ambitious, try letting the coefficients be (a) and (1-a), where a is the proportion of your money you put in the first security, and (1-a) is the proportion in the second security. If you know calculus, try minimizing the variance of the linear combination over all possible choices of a. If I recall correctly from last time I worked this out longhand, the optimal a is not zero, although it is closer to zero than to .5.
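For those who would rather check the minimization numerically, here is a minimal Python sketch. The function names are made up for this illustration. The closed-form share comes from setting the derivative of the variance with respect to a equal to zero, which gives a = (Var[X2] - Cov) / (Var[X1] + Var[X2] - 2 Cov).

```python
import math


def portfolio_variance(a, var1=16.0, var2=9.0, corr=0.6):
    """Variance of a*X1 + (1 - a)*X2 for the two securities in the problem."""
    cov = corr * math.sqrt(var1) * math.sqrt(var2)
    return a ** 2 * var1 + (1 - a) ** 2 * var2 + 2 * a * (1 - a) * cov


def min_variance_share(var1=16.0, var2=9.0, corr=0.6):
    """Share a in security 1 that sets d Var / d a = 0."""
    cov = corr * math.sqrt(var1) * math.sqrt(var2)
    return (var2 - cov) / (var1 + var2 - 2 * cov)


a_star = min_variance_share()

# The three portfolios compared in the text, then the minimizing share.
for a in (1.0, 0.5, 0.0, a_star):
    print(round(a, 4), round(portfolio_variance(a), 4))
```

With these numbers the minimizing share is about 0.17, and the resulting variance is a little below 9, so a small holding of the riskier security does slightly better than holding security 2 alone.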

7. One of my favorite ways to remember the difference between probability theory and statistical inference is to contrast the endeavors of the professor and the student around final exam time. The student (having sat through the course) knows the population of possible questions that could be asked on the final exam and, in the process of studying, tries to ascertain which ones are most likely to be asked on the final. On the other hand, the professor asks only a limited number of questions on the final exam, but must try to ascertain from this sample what proportion of the subject matter each student has actually mastered. Who is thinking about probability theory and who is conducting statistical inference? Explain.

The students are thinking about probability theory, and the professor must conduct statistical inference to assess who knows enough about the overall body of subject matter.

8. Is there any difference between a mean, an average, an expected value, and a first moment around zero? Is there any difference between a standard deviation and a standard error? Is there any difference between a mean squared deviation (MSD), a variance, and a second moment around the mean? Is there any difference between variance in the population and variance in a sample? [If your prerequisite used different terminology, don't panic. We'll sort it out.] 

For our purposes, we can treat the first four as being equivalent. Technically, the concept of "average" is usually used with samples, whereas "expected value" refers to the population distributions, as does the "first moment about zero." We often use standard deviation to refer to the population distribution for a random variable, whereas standard error is typically used to describe the dispersion in an estimated parameter. In a population distribution, the MSD, variance and second moment about the mean are the same thing. However, the population variance, when being inferred from the MSD in a sample, requires a "degrees of freedom" correction, because the sample MSD is a biased estimate of the corresponding population characteristic.



Updated: January 12, 1998
Prepared by: Trudy Ann Cameron