Correlation coefficient

From Wikipedia, the free encyclopedia

Correlation coefficient may refer to:

  • Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, a measure of the strength and direction of the linear relationship between two variables that is defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.
  • Intraclass correlation, a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; describes how strongly units in the same group resemble each other.
  • Rank correlation, the study of relationships between rankings of different variables or different rankings of the same variable
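In R, the built-in cor function computes Pearson's r by default and a rank (Spearman) correlation when method = "spearman" is passed. A minimal sketch, using hypothetical illustrative data x and y, showing that Spearman's coefficient is simply Pearson's r applied to the ranks, and that it reaches 1 for any strictly increasing relationship even when Pearson's r does not:

```r
# Hypothetical illustrative data: y increases strictly, but not linearly, with x
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y)                        # default: method = "pearson"
spearman <- cor(x, y, method = "spearman")   # rank correlation

# Spearman's coefficient equals Pearson's r computed on the ranks
spearman_by_hand <- cor(rank(x), rank(y))

print(c(pearson = pearson, spearman = spearman, by_hand = spearman_by_hand))
```

Because the relationship is monotonic but curved, the rank correlation is exactly 1 while Pearson's r, which only measures linear association, stays below 1.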

Related concepts:

  • Correlation and dependence, a broad class of statistical relationships between two or more random variables or observed data values
  • Goodness of fit, any of several measures of how well a statistical model fits observations, each summarizing the discrepancy between observed values and the values expected under the model in question
  • Coefficient of determination, a measure of the proportion of variability in a data set that is accounted for by a statistical model; often called R²; equal in a single-variable linear regression to the square of Pearson's product-moment correlation coefficient.
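The last relationship can be checked numerically. A minimal sketch, assuming the built-in faithful data set (the same one used in the tutorial below): fit a linear model with a single predictor using lm, read R² from summary(fit)$r.squared, and compare it with the square of cor.

```r
# Built-in Old Faithful data (datasets package, attached by default)
fit <- lm(waiting ~ eruptions, data = faithful)  # one predictor plus intercept
r_squared <- summary(fit)$r.squared              # coefficient of determination
r <- cor(faithful$eruptions, faithful$waiting)   # Pearson's r

# With a single predictor and an intercept, R^2 equals r^2
print(c(r_squared = r_squared, r_sq = r^2))
```

The equality relies on the model having exactly one predictor and an intercept; with several predictors, R² is no longer the square of any single pairwise correlation.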

Correlation Coefficient

The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measure of how strongly the two are linearly related, and its value always lies between -1 and 1.
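Dividing by both standard deviations is what makes the measure normalized: the units of the two variables cancel. A minimal sketch, assuming the built-in faithful data set used later in this section: converting the waiting times from minutes to seconds multiplies the covariance by 60 but leaves the correlation coefficient unchanged. The variable names are illustrative.

```r
duration    <- faithful$eruptions   # eruption durations (minutes)
waiting_min <- faithful$waiting     # waiting times (minutes)
waiting_sec <- 60 * waiting_min     # the same waiting times, in seconds

cov_min <- cov(duration, waiting_min)  # covariance depends on the units...
cov_sec <- cov(duration, waiting_sec)  # ...and is 60 times larger here

cor_min <- cor(duration, waiting_min)  # correlation is unit-free...
cor_sec <- cor(duration, waiting_sec)  # ...so both values agree

print(c(cov_min = cov_min, cov_sec = cov_sec, cor_min = cor_min, cor_sec = cor_sec))
```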

Formally, the sample correlation coefficient is defined by the following formula, where $s_x$ and $s_y$ are the sample standard deviations, and $s_{xy}$ is the sample covariance.

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
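The formula can be verified directly against R's cor function. A minimal sketch on small hypothetical vectors x and y: compute the sample covariance and the two sample standard deviations from their definitions (with the n - 1 divisor, as R's cov and sd use), then take the ratio.

```r
# Hypothetical sample data
x <- c(2, 4, 5, 7, 9)
y <- c(1, 3, 4, 8, 9)
n <- length(x)

s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)  # sample covariance
s_x  <- sqrt(sum((x - mean(x))^2) / (n - 1))          # sample SD of x
s_y  <- sqrt(sum((y - mean(y))^2) / (n - 1))          # sample SD of y

r_xy <- s_xy / (s_x * s_y)                            # r_xy = s_xy / (s_x s_y)

print(c(by_formula = r_xy, built_in = cor(x, y)))
```

The same ratio can be written more compactly as cov(x, y) / (sd(x) * sd(y)).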

Similarly, the population correlation coefficient is defined as follows, where $\sigma_x$ and $\sigma_y$ are the population standard deviations, and $\sigma_{xy}$ is the population covariance.

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
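Because the same normalizing factor appears in the numerator and the denominator, it cancels: computing the moments with a 1/N divisor (population style) gives exactly the same ratio as the 1/(n - 1) sample formula. A minimal sketch, assuming a small hypothetical data set is treated as a complete finite population:

```r
# Hypothetical data, treated here as the entire (finite) population
x <- c(2, 4, 5, 7, 9)
y <- c(1, 3, 4, 8, 9)
N <- length(x)

mu_x <- mean(x)
mu_y <- mean(y)
sigma_xy <- sum((x - mu_x) * (y - mu_y)) / N  # population covariance
sigma_x  <- sqrt(sum((x - mu_x)^2) / N)       # population SD of x
sigma_y  <- sqrt(sum((y - mu_y)^2) / N)       # population SD of y

rho_xy <- sigma_xy / (sigma_x * sigma_y)

# The 1/N factors cancel, so this agrees with cor(), which uses 1/(n - 1)
print(c(population = rho_xy, sample = cor(x, y)))
```

The covariance and standard deviations themselves do differ between the two conventions; only their ratio is unaffected.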

If the correlation coefficient is close to 1, the variables are positively linearly related and the scatter plot falls almost along a straight line with positive slope. If it is close to -1, the variables are negatively linearly related and the scatter plot falls almost along a straight line with negative slope. If it is close to zero, there is at most a weak linear relationship between the variables, although they may still be related nonlinearly.
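The three cases can be illustrated with synthetic data. A minimal sketch, assuming hypothetical exact linear relationships for the extremes, plus a symmetric parabola to show that a coefficient of zero does not mean the variables are unrelated:

```r
# Hypothetical synthetic data
x <- 1:20
r_pos <- cor(x,  2 * x + 3)     # exact line with positive slope: r = 1
r_neg <- cor(x, -0.5 * x + 10)  # exact line with negative slope: r = -1

# A symmetric parabola: y is completely determined by z,
# yet there is no linear trend, so r is 0 (up to rounding)
z <- -5:5
r_zero <- cor(z, z^2)

print(round(c(positive = r_pos, negative = r_neg, parabola = r_zero), 10))
```

Real data rarely hit these extremes exactly; values such as 0.9 or -0.9 correspond to scatter plots that cluster tightly, but not perfectly, around a line.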

Problem

Find the correlation coefficient of the eruption duration and waiting time in the data set faithful. Determine whether there is any linear relationship between the variables.

Solution

We apply the cor function to compute the correlation coefficient of eruptions and waiting.

    > duration = faithful$eruptions    # the eruption durations
    > waiting = faithful$waiting       # the waiting period
    > cor(duration, waiting)           # apply the cor function
    [1] 0.90081

Answer

The correlation coefficient of the eruption duration and waiting time is 0.90081. Since it is close to 1, we can conclude that the variables are positively linearly related.
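The cor function also accepts a whole numeric data frame or matrix and returns the matrix of pairwise coefficients. A short sketch on the same faithful data; the off-diagonal entry is the coefficient computed above, and signif is used here only to display it to five significant digits like the answer:

```r
m <- cor(faithful)   # 2 x 2 correlation matrix of eruptions and waiting
print(m)

# The diagonal is 1 (each variable with itself); the off-diagonal entry
# is the same coefficient as cor(duration, waiting)
r <- m["eruptions", "waiting"]
print(signif(r, 5))
```

For data frames with more columns, the same call gives every pairwise coefficient at once, which is a quick way to scan for strongly linearly related variables.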