Refresher - The Correlation Coefficient

Besides looking at the scatter plot and seeing that a line seems reasonable, how can you tell if the line is a good predictor? Use the correlation coefficient as another indicator of the strength of the linear relationship between x and y. The correlation coefficient, r, is defined as:

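$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$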

where n is the number of data points.
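The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the formula is the usual computational form in which every sum runs over all n data points. The function name `pearson_r` and the sample data are made up for this example and do not come from the text.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sums in the definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # All x values (or all y values) are identical, so r is undefined.
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator

# Made-up illustrative data (not from the text)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # prints 0.7746
```

Feeding the function points that lie exactly on a rising line returns 1, and points on a falling line return -1, which matches the property that r always falls between -1 and 1.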

If you suspect a linear relationship between x and y, then r can measure how strong the linear relationship is.

One property of r is that -1 ≤ r ≤ 1.

If r=1, there is perfect positive correlation. If r=-1, there is perfect negative correlation. In both these cases, the original data points lie on a straight line. Of course, in the real world, this will not generally happen.

The formula for r looks formidable. However, many calculators and any regression and correlation computer program can calculate r. The sign of r is the same as the sign of the slope, b, of the best-fit line.

 

Facts about the Correlation Coefficient for Linear Regression

  • A positive r means that when x increases, y increases and when x decreases, y decreases (positive correlation).
  • A negative r means that when x increases, y decreases and when x decreases, y increases (negative correlation).
  • An r of zero means there is absolutely no linear relationship between x and y (no correlation).
  • High correlation does not suggest that x causes y or y causes x. We say "correlation does not imply causation." For example, every person who learned math in the 17th century is dead. However, learning math does not necessarily cause death!

If r=-1 or r=+1, then all the data points lie exactly on a straight line.

If r is significant, then within the range of the observed x-values, the line can be used to predict a y value.

As an illustration, consider the third exam/final exam example. The line of best fit is:

y'=-173.51+4.83x
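A minimal sketch of using this line for prediction follows. The slope and intercept are taken from the equation above. The function name `predict_final` and the range of 65 to 75 for observed third exam scores are assumptions made for illustration, since this section does not list the data.

```python
def predict_final(x, x_min, x_max):
    """Predict a final exam score from a third exam score x.

    x_min and x_max bound the observed x-values; predicting outside
    that range would be extrapolation, so it is rejected here.
    """
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return -173.51 + 4.83 * x

# ASSUMPTION: the observed third exam scores run from 65 to 75 (hypothetical range).
predicted = predict_final(73, x_min=65, x_max=75)
print(f"{predicted:.2f}")  # prints 179.08
```

Calling `predict_final(90, x_min=65, x_max=75)` raises an error rather than returning a number, which reflects the rule that the line should only be used within the range of the x-values.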

Can the line be used for prediction? Given a third exam score (x value), can we successfully predict the final exam score (predicted y value)? To decide, test r = 0.6631 against its appropriate critical value.
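The comparison can be sketched as follows. Only r = 0.6631 comes from the text. The sample size n = 11 is an assumption, because this section does not state it, and the critical value 0.602 is the entry for n - 2 = 9 degrees of freedom at the two-tailed 5% level in a standard table of critical values for r. With a different n or significance level, both numbers would change.

```python
import math

r = 0.6631          # from the text
n = 11              # ASSUMPTION: sample size is not stated in this section
df = n - 2          # degrees of freedom for testing r
critical_r = 0.602  # ASSUMPTION: two-tailed 5% table value for df = 9

# Decision rule: r is significant if it lies beyond the critical value.
significant = abs(r) > critical_r

# Equivalent test statistic, for readers whose software reports t instead.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(f"df = {df}, |r| = {abs(r)}, critical value = {critical_r}")
print(f"significant: {significant}, t = {t_stat:.3f}")
```

Under these assumptions r exceeds the critical value, so the line could be used for prediction within the range of the observed third exam scores.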