1. We will be using summation notation in this course. What do the following stand for? (Simplify to the extent possible.)
a.)
= $x_{-2} + x_{-1} + x_0 + x_1$
b.)
= (0+1) + (1+1) + (2+1) + (3+1) = 1+2+3+4=10
c.)
= $2^2 + 2^3 = 4 + 8 = 12$
d.)
= $x_1y_3 + x_1y_4 + x_2y_3 + x_2y_4 + x_3y_3 + x_3y_4$
e.)
= $4(5x_j^2 - 2)$, since the summation is over $i$, not $j$: the term does not change with $i$, so it is simply added once for each value of $i$.
f.)
= $7n$, since this is the constant 7 summed $n$ times.
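Several of these expansions can be checked by writing the sums as Python loops. This is only a sketch: the index ranges are inferred from the expanded answers (the original summation expressions are not shown), and the numeric values of x, y, x_j, and n are hypothetical, chosen just to exercise the rules.

```python
# b.) i = 0, 1, 2, 3 with summand (i + 1)
sum_b = sum(i + 1 for i in range(0, 4))

# c.) i = 2, 3 with summand 2**i
sum_c = sum(2**i for i in (2, 3))

# d.) double sum over i = 1, 2, 3 and j = 3, 4 of x_i * y_j
#     (hypothetical values; only the pattern of terms matters)
x = {1: 2.0, 2: -1.0, 3: 0.5}
y = {3: 4.0, 4: 3.0}
terms = [f"x{i}y{j}" for i in (1, 2, 3) for j in (3, 4)]
sum_d = sum(x[i] * y[j] for i in (1, 2, 3) for j in (3, 4))

# e.) summing over i a term that involves only j: the term is added once per i
x_j = 1.5
sum_e = sum(5 * x_j**2 - 2 for i in range(1, 5))

# f.) the constant 7 summed n times
n = 10
sum_f = sum(7 for i in range(1, n + 1))

print(sum_b, sum_c)          # 10 12
print(" + ".join(terms))     # x1y3 + x1y4 + x2y3 + x2y4 + x3y3 + x3y4
# Aside: because x_i does not depend on j, the double sum in d.) also
# factors into (x1 + x2 + x3)(y3 + y4)
print(sum_d, sum_d == sum(x.values()) * sum(y.values()))  # 10.5 True
print(sum_e == 4 * (5 * x_j**2 - 2), sum_f == 7 * n)      # True True
```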
2. Correlation is a measure of the degree of linear relatedness of two variables. If Y and X are uncorrelated, then they are statistically independent (i.e., a scatterplot of their values will be an amorphous blob). True, False, Uncertain? Explain.
False. Correlation measures only linear association, so two variables may display a very rigid relationship that is nonlinear and still have zero correlation. Consider, for example, two variables whose scatterplot looks like a perfect circle. Their correlation is zero, but they definitely bear a systematic relationship, so they are not independent.
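A quick numerical illustration of the circle example, sketched in Python under the assumption that the points are spaced evenly around a unit circle (the count of 360 points is arbitrary). The `correlation` helper is written out by hand from the population moment formulas rather than taken from a library.

```python
import math

# Points evenly spaced around the unit circle: x and y are tied together
# by x**2 + y**2 = 1, so they are clearly not independent.
N = 360
xs = [math.cos(2 * math.pi * k / N) for k in range(N)]
ys = [math.sin(2 * math.pi * k / N) for k in range(N)]

def correlation(a, b):
    """Pearson correlation computed from population (divide-by-n) moments."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n
    var_a = sum((ai - mean_a) ** 2 for ai in a) / n
    var_b = sum((bi - mean_b) ** 2 for bi in b) / n
    return cov / math.sqrt(var_a * var_b)

r = correlation(xs, ys)
print(f"correlation = {r:.6f}")
```

The printed correlation is zero up to floating-point rounding, even though every point satisfies $x^2 + y^2 = 1$.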
3. For each of the following, is this a complete and valid probability distribution (in the case of a discrete random variable) or a complete and valid probability density function (in the case of a continuous random variable)? Why or why not?
a.) $f(Y) = P(Y = y_i) = .50$. This is not a complete probability distribution because, for the entire distribution to be described, the probabilities over all possible values must sum to one. We have the probability associated with only one possible value of the random variable, not its whole distribution.
b.) f(X) = .2 when x = 0, 1, 2; f(X) = .1 when x = 3, 4, 5, 10; f(X) = 0 otherwise. There are seven distinct discrete values that this random variable can take on, with probability .2 associated with each of the first three values and .1 associated with each of the last four. Each probability is between zero and one, and the probabilities sum to unity: 3(.2) + 4(.1) = .6 + .4 = 1. This is a valid probability function for a discrete random variable. Note that there is no need for the admissible x values to be all positive or all equally spaced.
c.) The random variable X can take on four different values: -1, 0, 1, 3, with corresponding probabilities f(X) = -.2, .5, .9, -.2. It is fine for the values of a discrete random variable to be negative, but the probability associated with any value cannot be negative. Hence this is not a valid probability function, even though the probabilities sum to unity.
d.) $f(x) = x - 1$ for $1 < x < 3$; 0 otherwise. This is a continuous random variable. To assess the validity of this probability density function, sketch the function in a graph. In a plot of f(X) against X, this function is a straight line, with height zero at X=1 and height 2 at X=3. The region under this function is a triangle with base 2 and height 2, and thus area (1/2)*b*h = (1/2)(2)(2) = 2. The total area must be exactly one, so this is not a valid probability density function.
e.) $f(X,Z) = 1/3$ for $0 < x < 1$ and $2 < z < 5$; 0 otherwise. This appears to be a joint continuous distribution. The height of the probability density function is the same everywhere (uniform). The volume under the function must equal one, so this is what we must check. This volume is a box with width 1 (from x = 0 to 1), length 3 (from z = 2 to 5), and height 1/3, so the volume is 1*3*(1/3) = 1. Therefore, this appears to be a valid joint probability density function.
f.) $f(X,Z) = 1/9$ for $x = 0, 1, 2$ and $z = 2, 3, 4$; 0 otherwise. This appears to be a joint discrete distribution. The joint probability function can be represented as a "forest of sticks," all of the same height, 1/9. There are nine possible (x, z) pairs (three values of x times three values of z), so the probabilities sum to 9(1/9) = 1. Thus this is a valid probability function.
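The checks in parts (b) through (f) can be mechanized. The Python sketch below is illustrative only: the helper names `is_valid_pmf` and `midpoint_integral` are hypothetical, exact fractions are used so that sums such as 3(.2) + 4(.1) come out exactly one, and the density in (d) is integrated with a midpoint rule (exact, up to rounding, for a straight-line density).

```python
from fractions import Fraction as F
from itertools import product

def is_valid_pmf(probs):
    """Valid discrete probability function: every probability in [0, 1], total exactly 1."""
    probs = list(probs)
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

def midpoint_integral(f, lo, hi, steps=1000):
    """Approximate the area under f on (lo, hi) with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

# b.) .2 at x = 0, 1, 2 and .1 at x = 3, 4, 5, 10
pmf_b = {0: F("0.2"), 1: F("0.2"), 2: F("0.2"),
         3: F("0.1"), 4: F("0.1"), 5: F("0.1"), 10: F("0.1")}
# c.) -.2, .5, .9, -.2 at x = -1, 0, 1, 3 (they sum to 1, but two are negative)
pmf_c = {-1: F("-0.2"), 0: F("0.5"), 1: F("0.9"), 3: F("-0.2")}
# f.) 1/9 at every (x, z) with x in {0, 1, 2} and z in {2, 3, 4}
pmf_f = {(x, z): F(1, 9) for x, z in product((0, 1, 2), (2, 3, 4))}

# d.) area under f(x) = x - 1 on 1 < x < 3 (a valid density needs area 1)
area_d = midpoint_integral(lambda x: x - 1, 1, 3)
# e.) constant density 1/3 over a 1-by-3 rectangle: volume = width * length * height
volume_e = (1 - 0) * (5 - 2) * F(1, 3)

print(is_valid_pmf(pmf_b.values()))   # True
print(is_valid_pmf(pmf_c.values()))   # False
print(is_valid_pmf(pmf_f.values()))   # True
print(round(area_d, 6), volume_e)     # 2.0 1
```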
4. For the following joint (or bivariate) discrete distribution of the variables Y and X:
Y=3 |  0.1   0.1   0.1
Y=2 |  0.1   0.2   0
Y=1 |  0.3   0.1   0
----+------------------
X=  |  0     1     2
Assuming that this joint discrete probability function is the true population distribution, f(X,Y), compute:
a.) the marginal distribution
f(Y), its mean and its variance;
For the variable Y, we ignore the different possible values of X and "sweep" all the probability associated with each possible value of Y over to the right margin of the table. For Y = 1, 2, 3, we get probabilities 0.4, 0.3, and 0.3, respectively. To get the marginal mean of Y, multiply each value of Y by its associated probability and add: 1(.4) + 2(.3) + 3(.3) = 0.4 + 0.6 + 0.9 = 1.9. To get the variance, use the simplest variance formula, $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2$. We already know E[Y]. For $E[Y^2]$, weight each squared value of Y by the probability associated with the value of Y you are squaring: 0.4(1) + 0.3(4) + 0.3(9) = 0.4 + 1.2 + 2.7 = 4.3. Now, Var[Y] = 4.30 - (1.9)(1.9) = 4.30 - 3.61 = 0.69.
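The sweeping and weighting steps in part (a) can be sketched in Python. The dictionary simply transcribes the joint table of problem 4, keyed by (x, y); exact fractions keep the arithmetic free of floating-point rounding. The variable names are illustrative.

```python
from fractions import Fraction as F

# Joint probabilities f(x, y) from the table in problem 4, keyed by (x, y)
joint = {
    (0, 3): F("0.1"), (1, 3): F("0.1"), (2, 3): F("0.1"),
    (0, 2): F("0.1"), (1, 2): F("0.2"), (2, 2): F("0"),
    (0, 1): F("0.3"), (1, 1): F("0.1"), (2, 1): F("0"),
}

# Marginal f(y): sweep each row's probability into the right margin
f_y = {}
for (x, y), p in joint.items():
    f_y[y] = f_y.get(y, 0) + p

mean_y = sum(y * p for y, p in f_y.items())     # E[Y]
e_y2 = sum(y**2 * p for y, p in f_y.items())    # E[Y^2]
var_y = e_y2 - mean_y**2                        # Var[Y] = E[Y^2] - (E[Y])^2

print({y: float(p) for y, p in sorted(f_y.items())})  # {1: 0.4, 2: 0.3, 3: 0.3}
print(float(mean_y), float(var_y))                    # 1.9 0.69
```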
b.) the conditional distribution
of Y and its mean, given that X=0; given that X=2. Does the conditional
mean of Y appear to be related to the magnitude of X? How? [Recall: the
relative frequencies in a conditional distribution must be scaled so that
the probabilities sum to one.]
If X=0, we want to concentrate only on the first column
in the table of the joint distribution. At the margin, there is an overall 0.5
probability of X=0, so we divide all joint probabilities in the first column by
0.5 to "blow up" the probability function so that the sum of the probabilities is
one, yet the relative amounts of probability associated with each value of Y are
the same as in the body of the joint distribution for X=0. The conditional
distribution p(Y|X=0) associates probabilities .6, .2, .2 with Y = 1, 2, 3,
respectively. To get the expected value of Y given X=0, we take each possible
value of Y, weight by its associated probability, and add up the terms.
E[Y|X=0]=.6(1)+.2(2)+.2(3) = .6 + .4 + .6 = 1.6.
When X=2, the conditional distribution of Y involves dividing the joint probabilities in the third column of the table by 0.1. This is rather simple, since the only value of Y that can occur when X=2 is 3. Probability 1.0 is thus associated with Y=3, and the expected value of Y when X=2 is 3. Yes, the conditional expectation of Y appears to get larger as X gets larger (1.6 when X=0, but 3 when X=2). This indicates that the variables are positively correlated.
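The rescaling step can be sketched in Python as follows; the function names are hypothetical. The loop also includes X = 1, which the question does not ask about, to show the full trend in the conditional mean (1.6, 2.0, 3.0).

```python
from fractions import Fraction as F

# Joint probabilities f(x, y) from the table in problem 4, keyed by (x, y)
joint = {
    (0, 3): F("0.1"), (1, 3): F("0.1"), (2, 3): F("0.1"),
    (0, 2): F("0.1"), (1, 2): F("0.2"), (2, 2): F("0"),
    (0, 1): F("0.3"), (1, 1): F("0.1"), (2, 1): F("0"),
}

def conditional_y_given_x(joint, x0):
    """f(y | X = x0): the column for x0, divided by the marginal f(x0) so it sums to one."""
    column = {y: p for (x, y), p in joint.items() if x == x0}
    f_x0 = sum(column.values())
    return {y: p / f_x0 for y, p in column.items()}

def conditional_mean(joint, x0):
    """E[Y | X = x0]: weight each y by its conditional probability and add."""
    return sum(y * p for y, p in conditional_y_given_x(joint, x0).items())

for x0 in (0, 1, 2):
    cond = conditional_y_given_x(joint, x0)
    print(x0, {y: float(p) for y, p in sorted(cond.items())},
          float(conditional_mean(joint, x0)))
# 0 {1: 0.6, 2: 0.2, 3: 0.2} 1.6
# 1 {1: 0.25, 2: 0.5, 3: 0.25} 2.0
# 2 {1: 0.0, 2: 0.0, 3: 1.0} 3.0
```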
c.) given your answer in (b.), can the
random variable Y be statistically independent of X? (I.e., is the
test for independence, f(X,Y) = f(X)f(Y) violated for any of these specific
(x,y) pairs?)
No. Since the conditional distribution of Y varies with X in these data, Y cannot be statistically independent of X. You could verify this by finding at least one cell in the joint distribution for which the probability in the cell is not equal to the product of the two marginals. For example, pick the lower-left cell (X=0, Y=1). The marginal probability of X=0 is 0.5; the marginal probability of Y=1 is 0.4. If these two variables were independent, the probability in that cell would have to be (0.5)*(0.4) = 0.20. However, the entry in this cell is 0.3, so the two variables cannot be statistically independent.
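The cell-by-cell test $f(x,y) = f(x)f(y)$ can be sketched in Python. One failing cell is enough to rule out independence; in this particular table, as it happens, every one of the nine cells fails.

```python
from fractions import Fraction as F

# Joint probabilities f(x, y) from the table in problem 4, keyed by (x, y)
joint = {
    (0, 3): F("0.1"), (1, 3): F("0.1"), (2, 3): F("0.1"),
    (0, 2): F("0.1"), (1, 2): F("0.2"), (2, 2): F("0"),
    (0, 1): F("0.3"), (1, 1): F("0.1"), (2, 1): F("0"),
}

# Marginals: sweep rows to the right margin (f_y) and columns to the bottom (f_x)
f_x, f_y = {}, {}
for (x, y), p in joint.items():
    f_x[x] = f_x.get(x, 0) + p
    f_y[y] = f_y.get(y, 0) + p

# Independence requires f(x, y) == f(x) f(y) in EVERY cell
violations = [(x, y) for (x, y), p in sorted(joint.items()) if p != f_x[x] * f_y[y]]

cell = (0, 1)  # the lower-left cell discussed above
print(float(joint[cell]), float(f_x[cell[0]] * f_y[cell[1]]))  # 0.3 0.2
print(len(violations), "of", len(joint), "cells fail the test")  # 9 of 9 cells fail the test
```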
d.) compute the covariance between X and Y and then the correlation between these variables. Bear in mind that Cov(X,Y) equals E(XY)-E(X)E(Y) and Corr(X,Y) equals Cov(X,Y) divided by the product of the individual marginal standard deviations of the two variables.
We did not get to this in class prior to the due date for this homework, so we will go easy on the grading. A useful strategy is as follows: First, compute a table containing the values of X times Y for each cell:
Y=3 |  0     3     6
Y=2 |  0     2     4
Y=1 |  0     1     2
----+------------------
X=  |  0     1     2
To get E[XY], you need to weight each value of XY by its corresponding probability, then add up across all nine cells. It is easiest to do this by creating a third table, first putting zeros everywhere either table has a zero:
Y=3 |  0     0.3   0.6
Y=2 |  0     0.4   0
Y=1 |  0     0.1   0
----+------------------
X=  |  0     1     2
Now, when you add up across all these cells, you get E[XY] = 0.3 + 0.6 + 0.4 + 0.1 = 1.4. But you are not yet done, because you still need the product of the marginal expectations of Y and X separately. E[Y] = 1.9 as above. The marginal distribution of X (sweeping each column down to the bottom margin) assigns probabilities 0.5, 0.4, and 0.1 to X = 0, 1, 2, so E[X] = 0.5(0) + 0.4(1) + 0.1(2) = 0.4 + 0.2 = 0.6. Thus Cov[X,Y] = 1.4 - (1.9)(0.6) = 1.4 - 1.14 = 0.26.
To get the correlation, we need to divide the covariance by the two standard deviations of the individual variables. We already know the variance of Y, so take its square root to get Std.Dev[Y] = 0.831. For X, we already have E[X] = 0.6 from its marginal distribution. To get the variance of X, first calculate $E[X^2] = 0.5(0) + 0.4(1) + 0.1(4) = 0.4 + 0.4 = 0.8$. Then Var[X] = 0.8 - (0.6)(0.6) = 0.8 - 0.36 = 0.44, whose square root is 0.663.
Therefore, finally, the correlation is Cov[X,Y] / (Std.Dev[X] Std.Dev[Y]) = 0.26/(0.831*0.663) = 0.472.
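All of part (d) can be condensed around a single expectation helper, sketched here in Python. The name `expect` is hypothetical; it computes E[g(X, Y)] by weighting g over all nine cells, which is the same "third table" procedure described above.

```python
import math
from fractions import Fraction as F

# Joint probabilities f(x, y) from the table in problem 4, keyed by (x, y)
joint = {
    (0, 3): F("0.1"), (1, 3): F("0.1"), (2, 3): F("0.1"),
    (0, 2): F("0.1"), (1, 2): F("0.2"), (2, 2): F("0"),
    (0, 1): F("0.3"), (1, 1): F("0.1"), (2, 1): F("0"),
}

def expect(g):
    """E[g(X, Y)]: weight g(x, y) by f(x, y) and add across all cells."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)
var_x = expect(lambda x, y: x**2) - e_x**2
var_y = expect(lambda x, y: y**2) - e_y**2

cov = e_xy - e_x * e_y                         # Cov = E[XY] - E[X]E[Y]
corr = float(cov) / math.sqrt(var_x * var_y)   # divide by Std.Dev[X] * Std.Dev[Y]

print(float(e_xy), float(cov), round(corr, 3))  # 1.4 0.26 0.472
```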
5. Suppose you are told that E(X) = 4 and that Var(X) = 16. What are the expected values and variances of the following expressions? [Recall the formula for the mean and variance of a linear function of a single random variable.]
a.) Y = 3X + 2: E[Y] = 3*E[X] + 2 = 3*4 + 2 = 14; Var[Y] = 9*Var[X] = 9*16 = 144.
b.) Y = .6X - 3: E[Y] = .6*E[X] - 3 = .6*4 - 3 = 2.4 - 3 = -0.6; Var[Y] = (0.6)(0.6)Var[X] = 0.36*16 = 5.76.
c.) Y = X/5: E[Y] = E[X]/5 = 4/5 = 0.8; Var[Y] = (1/5)(1/5)Var[X] = (1/25)*16 = 16/25 = 0.64.
d.) Y = aX + b, where a and b are scalar constants: E[Y] = a E[X] + b; $\mathrm{Var}[Y] = a^2\,\mathrm{Var}[X]$.
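The rules $E[aX + b] = aE[X] + b$ and $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$ can be checked exactly against any distribution with E(X) = 4 and Var(X) = 16. The two-point distribution in this Python sketch (X = 0 or 8, each with probability 1/2) is an arbitrary assumption made only for illustration; the problem does not specify the distribution of X.

```python
from fractions import Fraction as F

# Assumed for illustration only: X = 0 or 8 with probability 1/2 each,
# which gives E[X] = 4 and Var[X] = 16 as the problem requires.
pmf_x = {0: F(1, 2), 8: F(1, 2)}

def mean_var(pmf):
    """Exact mean and variance of a discrete distribution {value: probability}."""
    m = sum(v * p for v, p in pmf.items())
    return m, sum((v - m) ** 2 * p for v, p in pmf.items())

def linear_rule(a, b, mean_x=4, var_x=16):
    """E[aX + b] = a E[X] + b and Var[aX + b] = a^2 Var[X]."""
    return a * mean_x + b, a**2 * var_x

for a, b in [(3, 2), (F("0.6"), -3), (F(1, 5), 0)]:
    # Distribution of Y = aX + b, obtained by transforming each value of X
    pmf_y = {a * v + b: p for v, p in pmf_x.items()}
    assert mean_var(pmf_y) == linear_rule(a, b)
    print(a, b, [float(s) for s in linear_rule(a, b)])
```

The printed means and variances (14 and 144; -0.6 and 5.76; 0.8 and 0.64) match parts (a) through (c), and the `assert` confirms that transforming the assumed distribution directly gives the same answers.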
6. What is the formula for the variance of the linear combination $aX_1 + bX_2$, where $X_1$ and $X_2$ are two random variables? Let $X_1$ stand for the rate of return on one security, let $X_2$ stand for the rate of return on another security, and let $E(X_1) = E(X_2)$. A simple "investment portfolio" would consist of a combination of these two securities. Suppose that $\mathrm{Var}(X_1)$ is 16 and $\mathrm{Var}(X_2)$ is 9 and that the rates of return have a correlation of 0.6. If you had 10,000 dollars to invest, would you be better off to invest all of it in security 1, in security 2, or half in each? This is the essence of modern portfolio theory.
The relevant formula is $\mathrm{Var}[aX_1 + bX_2] = a^2\,\mathrm{Var}[X_1] + b^2\,\mathrm{Var}[X_2] + 2ab\,\mathrm{Cov}[X_1, X_2]$. If both investments have the same expected return, then a portfolio consisting of any proportions of the two has that same expected return as well, so you would want to choose the combination with the lowest variance. You should therefore compare the variances of $1 \cdot X_1$, $0.5X_1 + 0.5X_2$, and $1 \cdot X_2$. The variance in the first case is 16; in the third case, it is 9. In the second case, we need the formula for a linear combination of random variables. We have the two separate variances, but we also need the covariance, which can be recovered from the correlation: since $\mathrm{Corr}[X_1,X_2] = \mathrm{Cov}[X_1,X_2] / (\mathrm{Std.Dev}[X_1] \cdot \mathrm{Std.Dev}[X_2])$, the covariance equals the correlation times the two standard deviations, 0.6*4*3 = 7.2. Therefore, the variance of the portfolio made up of half of one security and half of the other is 0.25*16 + 0.25*9 + 2*0.5*0.5*7.2 = 4 + 2.25 + 3.6 = 9.85. If you are limited to only these three choices, the lowest-variance choice is putting all your money in security 2. However, if you are ambitious, try letting the coefficients be a and (1-a), where a is the proportion of your money you put in the first security and (1-a) is the proportion in the second security. If you know calculus, try minimizing the variance of the linear combination over all possible choices of a. If I recall correctly from the last time I worked this out longhand, the optimal a is not zero, although it is closer to zero than to .5.
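A short Python sketch of the portfolio comparison, including the calculus exercise suggested at the end of the solution. The closed-form weight comes from setting the derivative of $a^2\,\mathrm{Var}[X_1] + (1-a)^2\,\mathrm{Var}[X_2] + 2a(1-a)\,\mathrm{Cov}[X_1,X_2]$ with respect to a equal to zero, which gives $a^* = (\mathrm{Var}[X_2] - \mathrm{Cov}) / (\mathrm{Var}[X_1] + \mathrm{Var}[X_2] - 2\,\mathrm{Cov})$. The constant and function names are illustrative.

```python
import math

VAR1, VAR2, CORR = 16.0, 9.0, 0.6
COV = CORR * math.sqrt(VAR1) * math.sqrt(VAR2)   # 0.6 * 4 * 3 = 7.2

def portfolio_variance(a):
    """Var[a X1 + (1 - a) X2], with a the fraction of the money in security 1."""
    b = 1 - a
    return a**2 * VAR1 + b**2 * VAR2 + 2 * a * b * COV

# The three choices in the problem: all in 1, half and half, all in 2
for a in (1.0, 0.5, 0.0):
    print(f"a = {a:.1f}: variance = {portfolio_variance(a):.2f}")

# Minimum-variance weight from setting d/da Var = 0
a_star = (VAR2 - COV) / (VAR1 + VAR2 - 2 * COV)
print(f"a* = {a_star:.3f}, variance = {portfolio_variance(a_star):.3f}")
```

With these inputs the sketch reports variances of 16.00, 9.85, and 9.00 for the three original choices, and a minimum-variance weight of about 0.170 in security 1 (variance about 8.694), consistent with the recollection that the optimal a is positive but closer to zero than to .5.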
7. One of my favorite ways to remember
the difference between probability theory and statistical inference is
to contrast the endeavors of the professor and the student around final
exam time. The student (having sat through the course) knows the population
of possible questions that could be asked on the final exam and, in the
process of studying, tries to ascertain which ones are most likely to be
asked on the final. On the other hand, the professor asks only a limited
number of questions on the final exam, but must try to ascertain from this
sample what proportion of the subject matter each student has actually
mastered. Who is thinking about probability theory and who is conducting
statistical inference? Explain.
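The students are thinking about probability theory: they know the population of possible questions and reason about which ones are likely to turn up in the small sample that appears on the exam. The professor must conduct statistical inference, reasoning from that sample of answers back to how much of the overall body of subject matter each student knows.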
8. Is there any difference between a
mean, an average, an expected value, and a first
moment around zero? Is there any difference between a standard
deviation
and a standard error? Is there any difference between a mean
squared deviation (MSD), a variance, and a second moment
around the mean? Is there any difference between variance in the
population and variance in a sample? [If your prerequisite used
different terminology, don't panic. We'll sort it out.]
For our purposes, we can treat the first four as being equivalent. Technically, the term "average" is usually used with samples, whereas "expected value" refers to population distributions, as does the "first moment about zero." We often use standard deviation to refer to the dispersion of the population distribution of a random variable, whereas standard error is typically used to describe the dispersion of an estimated parameter. In a population distribution, the MSD, variance, and second moment about the mean are the same thing. However, when the population variance is inferred from the MSD in a sample, a "degrees of freedom" correction is required (dividing the sum of squared deviations by n - 1 rather than n), because the sample MSD is a biased estimate of the corresponding population characteristic.
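The bias of the sample MSD can be seen exactly by enumeration. This Python sketch assumes a hypothetical population of four equally likely values (1, 2, 3, 4) and independent samples of size 2 drawn with replacement; averaging each estimator over all 16 equally likely samples gives its expected value.

```python
from fractions import Fraction as F
from itertools import product

population = [1, 2, 3, 4]   # hypothetical population, equally likely values
N = len(population)
mu = F(sum(population), N)
sigma2 = sum((v - mu) ** 2 for v in population) / N   # population variance

n = 2                        # sample size; draws are independent (with replacement)
samples = list(product(population, repeat=n))          # all 16 equally likely samples

def msd(sample):
    """Sample mean squared deviation: divide by n."""
    m = F(sum(sample), len(sample))
    return sum((v - m) ** 2 for v in sample) / len(sample)

def s2(sample):
    """Degrees-of-freedom corrected sample variance: divide by n - 1."""
    return msd(sample) * len(sample) / (len(sample) - 1)

avg_msd = sum(msd(s) for s in samples) / len(samples)
avg_s2 = sum(s2(s) for s in samples) / len(samples)
print(sigma2, avg_msd, avg_s2)   # 5/4 5/8 5/4
```

The average MSD is 5/8, which is (n - 1)/n = 1/2 of the population variance 5/4, while the n - 1 version averages exactly 5/4.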