
File talk:Correlation examples2.svg

From Wikipedia, the free encyclopedia

Data is not random


These examples are not iid observations, except for the last one. In the first lines of the function "Others", which generates all of the samples, the x variable is a regular, increasing sequence from -1 to 1. It is critical to note that this x is therefore a time series. For instance, the first example generates both x and y as serially dependent data:

  x = seq(-1, 1, length.out = n)
  y = 4 * (x^2 - 1/2)^2 + runif(n, -1, 1)/3

The x data is nonrandom, and the y data has strong autocorrelation:

  > print(acf(y))
  Autocorrelations of series ‘y’, by lag
      0     1     2     3     4     5     6     7     8     9    10    11    12 
  1.000 0.767 0.773 0.773 0.769 0.762 0.759 0.760 0.751 0.749 0.767 0.746 0.745 

Tests of independence or linear dependence are typically designed for random samples from the bivariate distribution of (x, y), and in general are not valid for time series like these examples.

The R code should be modified as follows, at the beginning of the "Others" function:

 replace:   x = seq(-1, 1, length.out = n)
 with:      x = runif(n, -1, 1)
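A minimal R sketch of how the effect of this change could be checked. It assumes n = 800 and an arbitrary seed (neither value is taken from the original "Others" function), and `make_y` is a hypothetical helper that applies the first example's formula for y. The sketch generates y once from the grid x and once from the proposed runif x, then compares the lag-1 sample autocorrelation of y in the order the samples were generated.

```r
# Sketch only: n and the seed are illustrative assumptions,
# not values from the original "Others" function.
set.seed(1)
n <- 800

# Hypothetical helper: the first example's formula for y, given x
make_y <- function(x) 4 * (x^2 - 1/2)^2 + runif(length(x), -1, 1) / 3

x_grid <- seq(-1, 1, length.out = n)  # current code: deterministic, increasing x
x_iid  <- runif(n, -1, 1)             # proposed fix: iid draws from Uniform(-1, 1)

y_grid <- make_y(x_grid)
y_iid  <- make_y(x_iid)

# Lag-1 sample autocorrelation (element 1 of $acf is lag 0, element 2 is lag 1)
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]

r_grid <- lag1(y_grid)
r_iid  <- lag1(y_iid)
cat(sprintf("lag-1 ACF of y, grid x: %.3f\n", r_grid))
cat(sprintf("lag-1 ACF of y, iid x:  %.3f\n", r_iid))
```

With the grid x, the lag-1 value should be large, in line with the acf output shown earlier; with iid x it should be close to zero, since consecutive samples no longer share nearby x values.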

As it is, this example is misleading to statisticians who may use it to investigate the performance of measures of association or dependence, and it can lead them to incorrect conclusions.

Does anyone object if I upload a corrected version of the file and the code? Mathstat (talk) 00:34, 1 March 2012 (UTC)

Beautiful picture


Such a beautiful picture, thank you. It really teaches people what the correlation coefficient does not reveal about a distribution of points. WinterSpw (talk) 07:24, 21 November 2012 (UTC)