### 5.3   Correlation Coefficient

Def:  Given discrete random variables X, Y, their correlation coefficient is defined as

$$r = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}$$
This gives a "normalized" value of the covariance; we always have $-1 \le r \le 1$.
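As a rough illustration of the definition, the sketch below computes r from a joint pmf stored as a dictionary. The function name `correlation` and the pmf `toy_pmf` are hypothetical, invented for this illustration (they are not the plant data); the code uses the standard shortcut formulas cov(X,Y) = E(XY) - E(X)E(Y) and var(X) = E(X^2) - E(X)^2.

```python
from math import sqrt

def correlation(joint_pmf):
    """Correlation coefficient r of (X, Y) from a joint pmf {(x, y): probability}."""
    ex = sum(x * p for (x, y), p in joint_pmf.items())
    ey = sum(y * p for (x, y), p in joint_pmf.items())
    exy = sum(x * y * p for (x, y), p in joint_pmf.items())
    ex2 = sum(x * x * p for (x, y), p in joint_pmf.items())
    ey2 = sum(y * y * p for (x, y), p in joint_pmf.items())
    cov = exy - ex * ey        # cov(X,Y) = E(XY) - E(X)E(Y)
    var_x = ex2 - ex ** 2      # var(X) = E(X^2) - E(X)^2
    var_y = ey2 - ey ** 2
    return cov / sqrt(var_x * var_y)

# Hypothetical joint pmf, made up for illustration only.
toy_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
r = correlation(toy_pmf)
print(round(r, 4))
```

For this toy pmf, cov = 0.15 and both variances are 0.25, so r works out to 0.6, which lies between -1 and 1 as expected.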
r  measures the strength of the linear relationship between X and Y. If the values of X and Y are recorded for a large number of experiments, and the points (X,Y) are plotted (generating a scatter plot), then:
• if  r  is near 1 or -1, points (X,Y) will tend to fall near a line
• the slope of the line will be positive if r positive, negative if r negative
• if  r  is near 0, the points (X,Y) will show no clear linear trend when plotted
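The three cases above can be explored with a small simulation. This is only a sketch under stated assumptions: the data are simulated (X is a fair die roll, and the noise level 0.5 is an arbitrary choice), and `sample_correlation` is a hypothetical helper that computes the sample version of r from recorded pairs, not the exact r of the underlying random variables.

```python
import random
from math import sqrt

def sample_correlation(xs, ys):
    """Sample correlation of paired observations (an estimate of r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so runs are repeatable
n = 10_000
xs = [rng.randint(1, 6) for _ in range(n)]  # simulated die rolls

ys_up = [2 * x + rng.gauss(0, 0.5) for x in xs]     # points near a line, positive slope
ys_down = [-2 * x + rng.gauss(0, 0.5) for x in xs]  # points near a line, negative slope
ys_none = [rng.randint(1, 6) for _ in range(n)]     # generated independently of xs

r_up = sample_correlation(xs, ys_up)
r_down = sample_correlation(xs, ys_down)
r_none = sample_correlation(xs, ys_none)
print(r_up, r_down, r_none)
```

With these settings the first value should come out close to 1, the second close to -1, and the third close to 0.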

In fact:

• r = 1 or -1   if and only if   X and Y are directly linearly related: Y = a + bX for some constants a, b with b ≠ 0.
• If  X, Y are independent, then  r = 0   (although the converse is not true); this follows since cov(X,Y) = 0 for independent X, Y.
Note:  even if r = 0, X and Y may not be independent!!  May be directly related, but by a non-linear relationship!!
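A small check of this point, assuming a toy distribution chosen for illustration (it does not appear in these notes): X takes the values -1, 0, 1 with probability 1/3 each, and Y = X^2. Exact fractions are used so that the covariance comes out as exactly zero rather than as a rounding artifact.

```python
from fractions import Fraction

# Assumed toy distribution: X is -1, 0 or 1 with probability 1/3 each, and Y = X**2.
p = Fraction(1, 3)
joint_pmf = {(x, x * x): p for x in (-1, 0, 1)}

ex = sum(x * q for (x, y), q in joint_pmf.items())
ey = sum(y * q for (x, y), q in joint_pmf.items())
exy = sum(x * y * q for (x, y), q in joint_pmf.items())
cov = exy - ex * ey  # numerator of r

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(q for (x, y), q in joint_pmf.items() if x == 0)
p_y0 = sum(q for (x, y), q in joint_pmf.items() if y == 0)
p_joint00 = joint_pmf.get((0, 0), Fraction(0))

print(cov)                     # covariance, hence r, is zero
print(p_joint00, p_x0 * p_y0)  # 1/3 versus 1/9: not independent
```

Here Y is completely determined by X, yet cov(X,Y) = 0 and so r = 0; the relationship is quadratic rather than linear, which r does not detect.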

ex:

plant example from previous sections:
we found before that   cov(X,Y)  =  .2684,  E(X) = 1.83,  E(Y) = .92.
need

$$
\begin{aligned}
E(X^2) &= (1^2)\, f_X(1) + (2^2)\, f_X(2) + (3^2)\, f_X(3) \\
       &= (1^2)(.34) + (2^2)(.49) + (3^2)(.17) \\
       &= 3.83,
\end{aligned}
$$

$$E(Y^2) = 1.40 \quad \text{(in a similar fashion)}$$

so
$$\mathrm{var}(X) = E(X^2) - E(X)^2 = 3.83 - 1.83^2 = .4811$$

$$\mathrm{var}(Y) = E(Y^2) - E(Y)^2 = 1.40 - .92^2 = .5536$$
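Thus

$$r = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{.2684}{\sqrt{(.4811)(.5536)}} \approx .52$$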
The value of r is positive, about halfway between 0 and 1; thus the number of stems and number of blooms will tend to vary together, both being above or below average, but the trend is not a particularly strong one: it won't be the case that every plant with an above-average number of stems will have an above-average number of blooms.
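The arithmetic in this example can be re-checked with a few lines of Python. This is a sketch that simply plugs in the numbers quoted above; the variable names are arbitrary, and E(Y^2) is taken as the stated 1.40 rather than recomputed, since the pmf of Y is not listed in this section.

```python
from math import sqrt

# Values quoted in the example above.
f_X = {1: 0.34, 2: 0.49, 3: 0.17}  # marginal pmf of X
cov_xy = 0.2684
e_x = 1.83
e_y = 0.92
e_y2 = 1.40                        # E(Y^2), taken as given

e_x2 = sum(x ** 2 * p for x, p in f_X.items())  # E(X^2)
var_x = e_x2 - e_x ** 2
var_y = e_y2 - e_y ** 2
r = cov_xy / sqrt(var_x * var_y)

print(round(e_x2, 2), round(var_x, 4), round(var_y, 4), round(r, 2))
# prints: 3.83 0.4811 0.5536 0.52
```

The result, r of roughly 0.52, matches the description of a positive value about halfway between 0 and 1.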
