Talk:Curse of dimensionality


Example[edit]

The current example is unfortunate, since the volume of the unit hypercube is 1, exactly the same as the length of the unit interval: so the reader could be excused for saying "well, each sample corresponds to 1% of the space either way, that's the same, isn't it?". I'll see if I can rephrase it to make the "vastness" more obvious. -- 80.168.224.235 15:34, 10 November 2006 (UTC)
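
One way to make the vastness concrete (a minimal sketch, assuming NumPy; the budget of 100 samples is an arbitrary choice): with a fixed number of samples, the average distance from each sample to its nearest neighbour grows rapidly with the dimension, so each sample "covers" less and less of the space around it.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (1, 2, 10, 100):
        pts = rng.random((100, d))  # 100 uniform samples in the unit d-cube
        # pairwise Euclidean distances between all samples
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)  # ignore self-distances
        print(d, dists.min(axis=1).mean())  # mean nearest-neighbour distance grows with d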

Problem types[edit]

There are (at least) two types of problem that exhibit the "curse of dimensionality". The article does not distinguish between them, and as a result the explanation is not very clear.

  • In the first type, each variable takes one of several discrete values, or the range of each variable is divided into several discrete intervals. Taking the variables together, a huge number of combinations of values must be considered.
  • In the second type, samples are compared using a measure, such as Euclidean distance, that is defined over all the variables. Because most of the volume of a hypercube is concentrated in its corners, using the measure to select samples may not be as successful as naively expected.

The first type includes "optimization problems" such as those considered by Bellman. The objective function must be computed for each combination of values.
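
A minimal sketch of this combinatorial blow-up (the objective function and the value count here are placeholders, not anything from Bellman): exhaustive search over k discrete values in d variables costs k**d objective evaluations.

    from itertools import product

    def grid_minimum(objective, values, d):
        # Evaluates the objective on every combination of values: len(values)**d calls.
        return min(product(values, repeat=d), key=objective)

    # With 10 values per variable, d = 6 already means 10**6 evaluations:
    print(grid_minimum(sum, range(10), 6))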

The first type also includes "machine learning problems" where each range of values is categorised as a separate feature. An enormous amount of training data is required so that there are several samples for each combination of values. The "Hughes effect" is that, with a fixed amount of training data, the predictive power decreases as the dimensionality increases.
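
A back-of-the-envelope sketch of the Hughes effect (the sample size N and bin count k are made-up numbers): with a fixed training set, the expected number of samples per combination of feature values collapses exponentially in the dimension.

    N, k = 10_000, 10  # assumed: 10,000 training samples, 10 bins per feature
    for d in (1, 2, 3, 6):
        print(d, N / k ** d)  # expected samples per combination: 1000, 100, 10, 0.01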

The second type includes "nearest neighbor search in high dimensional space". It is not possible to quickly reject candidates by using the difference in a single variable as a lower bound for the distance based on all the variables. This is described in R. B. Marimont and M. B. Shapiro, "Nearest Neighbour Searches and the Curse of Dimensionality", Journal of the Institute of Mathematics and its Applications, 24, 1979, 59-70, and in E. Chavez et al., "Searching in Metric Spaces", ACM Computing Surveys, 33, 2001, 273-321.
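
To see why the single-variable bound loses its power, here is a minimal sketch (assuming NumPy): the largest single-coordinate difference |x_i - y_i| is a valid lower bound on the Euclidean distance, but it becomes a vanishing fraction of that distance as the dimension grows, so it rarely rejects anything.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        x, y = rng.random(d), rng.random(d)
        bound = np.abs(x - y).max()   # best single-coordinate lower bound
        dist = np.linalg.norm(x - y)  # true Euclidean distance
        print(d, bound / dist)        # ratio shrinks toward 0 as d grows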

JonH (talk) 11:33, 10 April 2010 (UTC)

I think the sentence "This is an important intuition for understanding the chi-squared distribution." should be removed unless an explanation is given; otherwise it is just confusing. — Preceding unsigned comment added by Anoyzz (talkcontribs) 15:07, 16 May 2012 (UTC)
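
For what it's worth, the intuition presumably intended is that the squared Euclidean norm of a d-dimensional standard normal vector is chi-squared distributed with d degrees of freedom, so in high dimensions the probability mass concentrates near a thin shell. A minimal sketch (assuming NumPy; the sample count is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 100
    r2 = (rng.standard_normal((10_000, d)) ** 2).sum(axis=1)  # ~ chi-squared(d)
    print(r2.mean(), r2.std())  # mean ≈ d, std ≈ sqrt(2d): points cluster near a shell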

Bayesian statistics section[edit]

The claim in the Bayesian statistics section that the curse of dimensionality is overcome by MCMC needs clarification and support. Personally, I think it is flat-out wrong, since the goal in MCMC is to sample, and the sampling section gives a very nice description of why that should be hard in very high dimensions. I cannot hope to read the mind of the author of the Bayesian statistics section, so if it's not improved in a couple of months, half a year, whenever I check this next, I'm just going to remove the section altogether. bradweir (talk) 18:19, 13 March 2013 (UTC)
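
For context, a standard illustration of why naive sampling fails in high dimensions (a hedged sketch, assuming NumPy; this shows rejection sampling of the unit ball from its enclosing cube, not anything specific to the disputed section): the acceptance rate equals the ball-to-cube volume ratio, which vanishes rapidly with the dimension.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    for d in (2, 5, 10, 20):
        pts = rng.uniform(-1.0, 1.0, size=(n, d))
        accept = (np.linalg.norm(pts, axis=1) <= 1.0).mean()
        print(d, accept)  # acceptance rate collapses toward 0 as d grows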

Link to Coursera Online Course[edit]

The curse of dimensionality is explained in more detail in a video lecture on Coursera: https://class.coursera.org/machlearning-001/lecture/215 I don't know exactly where to put it, but I think it is very helpful for learning more about the topic. — Preceding unsigned comment added by 2001:638:902:200B:F21F:AFFF:FE5E:484 (talk) 11:14, 18 June 2014 (UTC)