# Talk:Naive Bayes classifier

WikiProject Robotics (Rated Start-class, High-importance)
Naive Bayes classifier is within the scope of WikiProject Robotics, which aims to build a comprehensive and detailed guide to Robotics on Wikipedia. If you would like to participate, you can choose to edit this article, or visit the project page (Talk), where you can join the project and see a list of open tasks.
Start  This article has been rated as Start-Class on the project's quality scale.
High  This article has been rated as High-importance on the project's importance scale.

## Feature Discretization

From Parameter estimation section, "Non-discrete features need to be discretized first. Discretization can be unsupervised (ad-hoc selection of bins) or supervised (binning guided by information in training data)."

The quoted statement is false. Continuous features (e.g., 1, 1.5, 2) can also be estimated using maximum likelihood without binning. Assuming a Gaussian distribution of the data, the maximum likelihood estimator of the mean is simply calculated by finding the mean of the set (i.e. add up the numbers and divide by the size of the set, which equals 1.5 in the example set). Joseagonzalez (talk) 03:58, 11 August 2010 (UTC)
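A minimal sketch of the point being made, using the three-value example set above and a Gaussian class-conditional model (the function and variable names are my own, for illustration only):

```python
import math

# The example set from the comment above
values = [1.0, 1.5, 2.0]

# Maximum likelihood estimates of the Gaussian parameters -- no binning needed
mu = sum(values) / len(values)                           # sample mean = 1.5
var = sum((x - mu) ** 2 for x in values) / len(values)   # ML variance (divides by n)

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x, usable as p(x | class) for a continuous feature."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(mu)  # 1.5
```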

Now this was changed, which is good. But, I am still a bit puzzled by this whole paragraph:
"Another common technique for handling continuous values is to use binning to discretize the values. In general, the distribution method is a better choice if there is a small amount of training data, or if the precise distribution of the data is known. The discretization method tends to do better if there is a large amount of training data because it will learn to fit the distribution of the data. Since naive Bayes is typically used when a large amount of data is available (as more computationally expensive models can generally achieve better accuracy), the discretization method is generally preferred over the distribution method."
There are not any citations for this, and I find the whole paragraph a bit hand-wavy. The binning method has the additional drawback that a certain number and distribution of bins has to be chosen. Also, if the "precise distribution of the data is known", then surely I don't need a learning algorithm, right? Plus, if the bins are coarse, the approximated distribution will be coarse, and if they are fine, the memory requirements rise, which may collide with the large-data statement above. Also, I'd say that the typical use case of the algorithm is as a low-cost baseline? So, it would be great if the author of this paragraph could provide some more evidence, or if others would come in and discuss whether these statements are really valid and worth having on the page in this generality. — Preceding unsigned comment added by 95.166.251.153 (talk) 16:47, 29 July 2012 (UTC)

## Statistical error in the worked example spam filter

In the "Multi-variate Bernoulli Model" section of [A Comparison of Event Models for Naive Bayes Text Classification], you will note that documents have to be modeled as being drawn from a particular distribution (multivariate Bernoulli or multinomial being the most common). See also Andrew Moore's tutorial slides: the occurrence of an instance must be drawn from the joint distribution of mutually independent random variables.

The worked example however only multiplies over the words present in the document:

$p(D\vert C)=\prod_i p(w_i \vert C)\,$

Add up these "likelihoods" over all possible documents and the sum will be greater than 1 (making it an invalid distribution). Attempt to use a normalising constant, and the feature occurrences will be revealed to be non-independent (the length of the document is fixed, forcing a precise number of features to occur, which cannot be the case if feature occurrences are Bernoulli trials, as implied by p(w_i|C)).

To be formally correct, the likelihood should multiply over the probabilities of failed Bernoulli trials as well (for words that did not occur).

Let F be a multivariate Bernoulli random variable of dimension V (size of vocabulary), with individual Bernoulli random variables F_1, F_2, ... F_V. A particular document D=(f_1, f_2, ... f_V) represents the outcomes of the independent trials, with success (1) for presence of each word, failure (0) for absence. Then the likelihood of D given class C is:

$p(F=D\vert C)=\prod_{i}^{\vert V \vert} p(F_i = f_i \vert C)\,$

This multiplies over all words in the vocabulary, and results in a somewhat lower likelihood (because non-occurrence of features will in real-world situations have probability a bit less than 1.0).
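A small sketch of the corrected likelihood, with a made-up vocabulary and made-up per-class Bernoulli parameters purely for illustration:

```python
# Multivariate Bernoulli likelihood: multiply over ALL vocabulary words,
# using p(F_i = 1 | C) for words present in the document and (1 - p) for
# absent ones. The vocabulary and parameters below are invented.
p_word_given_class = {"viagra": 0.8, "hello": 0.3, "meeting": 0.05}

def bernoulli_likelihood(present_words, p_word):
    """p(F = D | C) computed over the full vocabulary."""
    likelihood = 1.0
    for word, p in p_word.items():
        likelihood *= p if word in present_words else (1.0 - p)
    return likelihood

doc = {"viagra"}  # a document containing only "viagra"
print(bernoulli_likelihood(doc, p_word_given_class))  # 0.8 * 0.7 * 0.95 ≈ 0.532
```

Note how the absent words "hello" and "meeting" still contribute their failure probabilities, which is exactly the difference between the two formulas above.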

Winter Breeze (talk) 13:46, 20 November 2007 (UTC)

## Link to peer reviewed paper

Hi, I recently added an external link to the source code of a Matlab Naive Bayes classifier implementation, but the link has repeatedly been removed. I am currently doing my PhD in pattern recognition and I know the link is of high quality and very relevant.

Is the reference and the external link http://www.pattenrecognition.co.za suitable for this site? If not, what can I do so that this information is not repeatedly removed?

cvdwalt

I don't know, but probably because the link is dead. I think you have misspelled it. Is http://www.patternrecognition.co.za/ the correct one? --222.124.122.192 12:56, 17 September 2007 (UTC)
As of Nov 27, 2008, the link is not working, it goes to a hosting web site instead Martincamino (talk) 22:17, 27 November 2008 (UTC)
I would say "no". The website has no author information on it. It appears to be one anonymous person's collection of others' classification tutorials. Link to the original tutorials, not to the copies on patternrecognition.co.za. As for the paper, all are by "C.M. van der Walt" (cvdwalt); I don't think it's appropriate to rate one's own site as being "of high quality and very relevant". Winter Breeze (talk) 08:04, 1 June 2009 (UTC)

## 2002-2004

Could someone please add an introduction which explains, comprehensibly to someone who is not a mathematician, what this thing is? --Elian 22:01 Sep 24, 2002 (UTC)

Just as soon as someone adds an introduction which explains, comprehensibly to someone who is a mathematician (well, technically physics and computing stuff, only just started school, but anyway...), what those symbols are supposed to mean...

D is an object of type document, C is an object of type class, how can both p(D|C) and p(C|D) be meaningful (with two different values, even)? Can only guess whether "p(D and C)" is supposed to be something like boolean operations, set operations, surgical operations, or CIA operations... Cyp 19:42 Feb 10, 2003 (UTC)

I think the notation is clear to persons familiar with probability theory, but it could probably be explained more clearly for those who are not. Michael Hardy 19:44 Feb 10, 2003 (UTC)

Under the assumption that Probability axiom is right and meaningful, I've added a "(see Probability axiom)" and used that particular "and" symbol. Was an edit conflict; someone else LATEΧed the last two lines before I could submit the new "and" symbol... (Person put the text "and", hope I was right to replace it with $\cap$...) Cyp 20:12 Feb 10, 2003 (UTC)

Aaargh... Now that I know what the notation means... Either I'm going mad, or all the fractions in the entire article are upside-down... Cyp 20:41 Feb 10, 2003 (UTC)

From the article:

Important: Either I'm going mad, or the following formula, along with the rest of the formulas, is upside-down (D/C instead of C/D)... If I wasn't considering the possibility of me going mad, I would correct this article myself. (Triple-checked I didn't accidentally reverse them myself when adding $L_A T^E \chi$.) If I'm mad, just remove this line. If I'm not, let me know, or correct it yourself (and remove this line anyway).

Fixed the upside-down equations. Please review my changes to make sure I've made the right changes.

Seems like what I'd have done... So I guess I wasn't mad, then. Cyp 17:20 Feb 11, 2003 (UTC)

Calling this page Naive Bayesian is extremely misleading. It is more generally known as a Naive Bayes classifier. For something to be Bayesian the parameters are treated as random variables. In the Naive Bayes Classifier this doesn't happen. I strongly suggest that the name is changed. Note that Google has 22,600 hits for "Naive Bayes" and 6,400 hits for "Naive Bayesian". A naive Bayesian is a Bayesian who is naive. Naive Bayes is a simple independence assumption. --Lawrennd 16:49 Sep 20, 2004

For something to be Bayesian the parameters are treated as random variables. This is simply not so. "Bayesian" has a much broader meaning than treating parameters as random variables. I agree that "naive Bayes classifier" is more commonly used (and therefore it's a more appropriate title), but the current name is not "extremely misleading". Wile E. Heresiarch 21:39, 20 Sep 2004 (UTC)
I agree that the title is somewhat inappropriate. "Naive Bayes" is clearly the more common name, which is sufficient motivation for changing the title. However, the very first paragraph of the article in fact points out that NB classification does not require any Bayesian methods. While that discussion could be improved, it is not deficient to the point of being misleading. The term "Bayesian" is often vague and can refer to something as generic as automatically trained methods: cf. "Bayesian spam filtering", which is usually not Bayesian in the sense of treating parameters (but not hyperparameters) as random variables. --MarkSweep 18:49, 21 Sep 2004 (UTC)
I'd prefer "Naïve Bayesian classification" to the current title. Κσυπ Cyp   23:00, 21 Sep 2004 (UTC)

Oh no... can anybody who speaks mathematics translate this into a more common language? Shouldn't it be English :-) No really, I mean: Bayesian networks are used in programming, so it would be useful to talk about them in a common programming language... C++, Java (Perl?). I don't think there are a lot of people who can understand this math notation, and it doesn't help if you are a programmer who wants to work with Bayesian networks. So please translate this, or add programming-language examples of the loops and simple calculations. (Even the word "class" is very misleading, because it is very different from the word as used in the world of programming.)

Why do you think that C++ is a more common language than mathematics? Sympleko (Συμπλεκω) 15:12, 17 April 2008 (UTC)

## "Naïve Bayesian classification" moved to "Naive Bayes classifier"

Hello. I have reverted "naïve" to "naive" in the article text, as "naive" is the usual English spelling, and occurs more often than "naïve" in texts (papers, books, web pages, etc). I have also moved naïve Bayesian classification to naive Bayes classifier. For various combinations of terms I find the following:

• "naive Bayes classifier" yields approx 11,000 Google hits
• "naive Bayesian classifier" yields approx 5000 Google hits
• "naïve Bayes classifier" yields approx 1000 Google hits
• "naïve Bayesian classifier" yields approx 500 Google hits
• "naive bayesian classification" -wikipedia -encyclopedia yields approx 500 Google hits
• "naïve bayesian classification" -wikipedia -encyclopedia yields approx 150 Google hits

As this classifier is very common in computer-related texts, it is reasonable to suppose Google is a reliable indication of the currency of different variations of the name. Regards & happy editing, Wile E. Heresiarch 04:51, 27 Dec 2004 (UTC)

But "naïve" is proper English (with the umlaut), so wouldn't that "overrule" the "most common" phrase? WhisperToMe 05:28, 27 Dec 2004 (UTC)

For the benefit of other readers, I'll copy here some comments I put on user talk:WhisperToMe: (1) Re: standard English. I can't find any dictionaries or other sources which state that the correct spelling is "naïve". Every source I have found shows "naive" as the primary spelling, and shows "naïve" as an acceptable variation of "naive". It is clear that both spellings are acceptable. Naïve/naive isn't mentioned at Wikipedia:Manual of Style or American and British English differences. If you have some other sources I'd like to hear about it. (2) Agreed that the Google test only shows what's more common. However, since both spellings are acceptable and "naive" is more common, and much more common in a mathematics context, a crusade to change "naive" to "naïve" seems pointless at best. Wile E. Heresiarch 06:57, 27 Dec 2004 (UTC)
Our university professor taught us that naïve comes from the French language and insisted that it's the only correct spelling even in English. I had to fix two of my LaTeX handins just because of the diaeresis, albeit personally I prefer the naive spelling and after graduation I always wrote naive in my publications unless required otherwise (it did happen to me to be requested to fix a LaTeX paper just because the editor wanted the naïve spelling!). Apparently we should include both spellings in the article. Sofia Koutsouveli (talk) 22:30, 21 March 2014 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Wouldn't it be best to use the spelling "naïve", since it's easier to read? Otherwise many people will be reading it as "knave Bayes classifier" and getting confused... It's not as bad as trying to use "resume" as a noun, but I think it's better to use an ï, since it becomes easier for some people to read. Does anyone have trouble reading "naïve" but no trouble reading "naive"..? (If so, we can be nice, and make it even easier for them to read, by writing "naıve".) Κσυπ Cyp   15:55, 27 Dec 2004 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). – Could the reason be that Google naively treats "naïve" and "naive" as interchangeable? On what basis do you assert that "naive" is harder to read than "naïve"? How can "naive" be confused with "knave" by someone who's likely to understand the article? Sure, if a child or someone learning English is completely unfamiliar with the word "naive" they may think that it's pronounced like "knave", but then the real problem is that they don't know what "naive" means in the first place. If they decide to look it up in a dictionary, they would also learn about the correct pronunciation. I would say in the case where two forms like "naive" and "naïve" exist and are equally acceptable, it's better to use the form that is easier to type. If we only had "naïve" everywhere with no redirects, someone might try to search for "naive Bayes" (because that's easy to type for lots of people, whereas "naïve" is not, even on many types of European keyboards); they would be unable to find the article, and either give up or start a duplicate article. In the context of this article, I would say that "naive Bayes" is far more frequent than "naïve Bayes", but check for yourself: do a search for "naïve Bayes" on http://scholar.google.com/ and see how many occurrences of "naïve" you actually find. --MarkSweep 01:10, 28 Dec 2004 (UTC)
Ummm, yes, it could be because Google naïvely treats "ï" and "i" as interchangeable. (As searching for either finds the same pages, disregarding whether they use the easy-to-read or easy-to-type version.) The scholar.google.com does the same thing, except it doesn't display the diaeresis until I follow the links. (Clicked on a random link that it found, and it used the "ï", not the "i".) I assert that "naive" is harder to read than "naïve", because "naive" looks rather strange and distracting to me. It's obvious what it meant, after spending an extra second reading it, but why make people spend an extra second reading it to understand it? I would say that in the case where two forms like "naïve" and "naive" exist and are equally acceptable, it's better to use the form that is easier to read. We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. Κσυπ Cyp   04:59, 28 Dec 2004 (UTC)
Let's not make decisions based on a single random link. Furthermore, while I don't have any evidence that "naive" won't cause any additional confusion (except for people who don't know the concept in the first place), you don't seem to have any evidence that "naïve" is easier to read either. I would say the burden of proof is on you here: can you demonstrate empirically that "naive" actually causes confusion? Significant confusion? Utterly hopeless cannot-make-heads-or-tails-of-it confusion? The other issue is with instances of "naïve" that do not occur in an article title. Does the new Mediawiki search facility treat "naive" and "naïve" as equivalent? (I don't know.) I suspect (without proof) that "naive" is a more frequent search query than "naïve", since it's easier to type for just about anyone. Unless both terms are treated as equal, searching for "naive" will miss pages that only have "naïve" in them (on second thought, this is turning into an argument in favor of inconsistent spelling, using all variants of a relevant word in an article).
Empirically, I find "naïve" easier to read than "naive". I do not claim, and have not claimed, that it is significant confusion, just that "naïve" is easier for me to read. If some people find both equally easy to read, and some find "naïve" easier to read, then it seems that "naïve" is easier to read on average. (As far as I can tell, no one has claimed that they find "naive" easier to read than "naïve".) The Mediawiki search seems to be disabled at the moment, although I would guess that it wouldn't treat them the same. I also suspect (also without proof) that "naive" is a more frequent search query than "naïve", since it's easier for many/most people to type. Since (hopefully) no one is going round deleting redirects between "naïve" and "naive", searching should find both spellings. (I certainly think that searching for "naive" should find the articles, as well as searching for "naïve"...) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
I'm sorry but "empirically, I find" just doesn't make sense: you're not stating an empirical observation, you're only stating your own opinion, which you are certainly entitled to. But since you have a stake in the outcome, you cannot count your own preferences in an empirical study. I could claim that I find "naive" easier to read (since it has fewer dots and looks more normal to me), but I would have to discount that as my own biased opinion, which isn't empirical evidence. Regarding the issue of full text search, I was referring to articles that do not have "naive" in the title and which can only be found by a full text search. However, my argument is not particularly good: by the same token, someone might search for "colour" and not find a relevant article that mentions "color" in the body text but not in the title. So all we have now in terms of arguments is (1) your opinion that "naïve" is easier to read; (2) the fact that in a non-random sample of 16 relevant publications (see below) "naïve" occurs in 2, but "naive" in 14; and (3) opinions from several editors that "naive Bayes" is more common. For all I know the conjunction of these three propositions is not a contradiction: it could be the case that "naïve" is in fact easier to read (though I will remain skeptical) and that "naive" is more common (for which I believe there is sufficient empirical evidence). In that case, we still need to make a decision which form we should pick, and there is precedent for choosing the more common form. --MarkSweep 19:52, 28 Dec 2004 (UTC)
After looking up "empirically" in dictionary.com, I'm not sure that I was using the word correctly. I meant that subjectively/personally, I find "naïve" a bit easier to read than "naive". I hadn't thought of full-text searches before. If you do actually find "naive" easier for you to read, not just easier to write, then I'm fine with it being left as "naive". (I got the impression that no one here actually found "naive" easier to read, just thought it should be used because of being easier to type or more common.) (wɛn wɪl piːpl ɑːfɪʃəliː swɪtʃ tuː juzɪŋ ʌ fənɛtɪk əlfəbɛt fɔː ɪŋgɫɪʃ..?) Κσυπ Cyp   02:00, 29 Dec 2004 (UTC)
Some data points: The first two references cited in the present article both use "naive", not "naïve" (I was unable to check the third reference). Russell and Norvig use "naive", not "naïve". Among the first ten results returned by scholar.google.com, 8 use "naive" and 2 use "naïve". Added later: Mitchell's Machine Learning textbook (ISBN 0070428077), Data Mining by Han and Kamber (ISBN 1558604898), and Data Mining by Witten and Frank (ISBN 1558605525) all use "naive" exclusively. Score: "naive" 14, "naïve" 2. --MarkSweep 19:52, 28 Dec 2004 (UTC)
Finally, the insidious slippery slope argument: would you be in favor of writing "coördinate" and "reëlect" as well? How about "reärmed", since that could easily be confused with "rear med"? --MarkSweep 07:06, 28 Dec 2004 (UTC)
I wouldn't support or oppose a diaeresis on those words. The "ö" in "coördinate" seems slightly more appropriate than the "ä" in "reärmed", although I'm not sure why. Possibly because reading the "ä" as an umlaut would make the pronunciation completely wrong. (I think that pronouncing "naïve" without a diaeresis would sound much worse than pronouncing the other three words without a diaeresis.) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
User:Cyp, I can't tell what you're talking about. Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Googling for the exact phrase (with quote marks, and with -wikipedia -encyclopedia) I get 5000 for "naive" [1] and 500 for "naïve" [2] as reported above. Without quote marks (and with -wikipedia -encyclopedia) I get about 21,000 for "naive" [3] and 6000 for "naïve" [4]. So on what basis are you trying to claim "naïve Bayesian classifier" and "naive Bayesian classifier" are equally common? -- In any event, if you want to claim "naïve" is easier for some people to read you're going to have to come up with some evidence for that; "naive" looks rather strange and distracting to me simply doesn't count. -- We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. -- I'm sorry, I simply don't understand what you're getting at here. Wile E. Heresiarch 05:27, 28 Dec 2004 (UTC)
When I search with google, it treats "ï" and "i" as completely identical. I have no idea why it behaves differently, when you search. Last time I checked, I was a person, so I have already come up with evidence that some (at least one) people find "naïve" easier to read than "naive". Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)

## Example PHP script

It seems a bit silly to me that the choice of example script in this page (that PHP script) links to a resource that, whilst apparently free, is compiled into some fantastic Zend optimised format, and is therefore no good to read whatsoever. Any other examples out there? —Preceding unsigned comment added by 82.33.75.53 (talk) 21:03, 28 May 2006

I agree. I just added a Visual Basic implementation with source code, and out of interest checked out the PHP script. First of all, if it belongs anywhere, it belongs in Bayesian spam filtering. Second of all, since the source is missing it doesn't do anybody any good. I removed it. --Stimpy 13:47, 28 June 2006 (UTC)

## proposed merge with Bayesian spam filtering

Not much of a discussion three months after the merge was suggested. I'd rather keep them separate. Rl 15:44, 18 June 2006 (UTC)

They should be kept separate. Bayesian spam filtering is a topic of its own, and would clutter the relatively straightforward article about Naive Bayes. --Stimpy 13:49, 28 June 2006 (UTC)
Now it's been five months and people agree it shouldn't happen. I removed the proposal. ~a (usertalkcontribs) 16:41, 16 August 2006 (UTC)

## naive Bayes conditional independence assumption

P(F_i|C,F_j)=P(F_i|C) only defines pairwise conditional independence, which is not equivalent to the mutual conditional independence that is needed!

regards

You're right, I made a correction, hope it reads well 158.143.77.29 (talk) 11:43, 26 March 2013 (UTC)
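To spell out the distinction for later readers: naive Bayes needs the full chain-rule factorization to collapse, i.e.

$p(F_1,\dots,F_n \vert C) = \prod_{i=1}^{n} p(F_i \vert C, F_1,\dots,F_{i-1}) = \prod_{i=1}^{n} p(F_i \vert C),\,$

which requires the stronger assumption $p(F_i \vert C, F_1,\dots,F_{i-1}) = p(F_i \vert C)$ for every $i$, not merely the pairwise condition $p(F_i \vert C, F_j) = p(F_i \vert C)$.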

## Skipped some steps?

Using Bayes' theorem, we write

$p(C \vert F_1,\dots,F_n) = \frac{p(C) \ p(F_1,\dots,F_n\vert C)}{p(F_1,\dots,F_n)}. \,$

Then the article says:

...The numerator is equivalent to the joint probability model

$p(C, F_1, \dots, F_n)\,$

What happened here? This is non-obvious. --Herdrick 18:49, 16 February 2007 (UTC).

afaik this follows directly from the definition of the _conditional_ probability, which is:
p(A|B)=p(A,B)/p(B)
greets Will
We don't want the joint probability - we want the probability of the features conditional on the class variable, so it's perfectly sensible. 137.158.205.123 (talk) 13:15, 20 November 2007 (UTC)
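For anyone who still wants the step spelled out: as Will says, it follows directly from the definition of conditional probability, $p(A \vert B) = p(A,B)/p(B)$, i.e. $p(A,B) = p(B)\,p(A \vert B)$. Taking $A = (F_1,\dots,F_n)$ and $B = C$ gives

$p(C)\ p(F_1,\dots,F_n \vert C) = p(C, F_1, \dots, F_n),\,$

which is exactly the numerator of the Bayes' theorem expression above.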

## Naive Bayes != Idiot's Bayes

Someone used this term once in a paper, surprisingly. It's not a recognized second name for the algorithm. (D.J. Hand and K. Yu, Idiot's Bayes Not so Stupid after All? Int'l Statistical Rev., vol. 69, no. 3, pp. 385-398, 2001)

- 128.252.5.115 02:07, 23 March 2007 (UTC)

(Probabilityislogic (talk) 11:59, 18 March 2011 (UTC))

This classifier is certainly a "naive" one, but it is "naive" only if one actually has knowledge of connections between the different attributes. The "naivety" is in throwing away potentially important information that probability theory can take into account.

If you do not know whether any relationships or dependencies exist, then it is actually more conservative to assume that they do not (and certainly not naive). This is because the presence of correlations places additional constraints on the data, lowering the number of ways a particular set of data can be produced (see principle of maximum entropy for details).

Well, if you're not utterly starved for training samples, you usually do in fact know that dependencies are likely to exist... unless you are an idiot, of course. — Preceding unsigned comment added by 99.109.17.32 (talk) 05:39, 30 March 2012 (UTC)

I think it would be worthwhile to include this point in the article, as it is "hinted at" a few times, along with the "bewilderment" of why the results are so accurate, given that independence may not necessarily hold in the real world.

(Probabilityislogic (talk) 11:59, 18 March 2011 (UTC))

## Removed Statement

> The Naive Bayes classifier performs better than all other classifiers under very specific conditions.

This sentence is so non-specific that it is useless —The preceding unsigned comment was added by 128.2.16.65 (talk) 15:00, 11 May 2007 (UTC).

True, it's pretty useless like that. I will consult some of my notes from a class I took and maybe I can correct it with specifics. HebrewHammerTime 07:03, 2 August 2007 (UTC)
Worst statement ever -- thanks for removing. Every classifier in existence performs better than all other classifiers under very specific conditions. — Preceding unsigned comment added by 99.109.17.32 (talk) 05:30, 30 March 2012 (UTC)

## Formulas dividing by zero?

I'm confused by some of the formulas where, for example p(wi|S) is used in the denominator of an expression. If a word never appears in a spam message, then that would be zero and the division undefined. Likewise when it appears in a numerator and then the log of it is taken where log(0) is also undefined.

Kevin 15:04, 1 August 2007 (UTC)

I'm not an expert on this topic, but I've done some programming with naive Bayes, and clearly dividing by 0 won't work; in theory, if you analyzed lots and lots of spam, you'd very rarely have 0 in the denominator. But naturally these programs can't analyze that much and will sometimes have 0 there. The solution I used was to make a constant that was larger than other values, so that a divide by 0 would be accurately represented. I actually found these instances very useful for my classification -- giving them a bit of extra weight sometimes increased my classifier's accuracy. HebrewHammerTime 07:01, 2 August 2007 (UTC)
The probability can be zero if you use the maximum likelihood estimate of p(wi|S). If zeros occur, you are better off with a pseudocount or posterior estimate of the frequency - which is non-zero even if the word occurs zero times. 137.158.205.123 (talk)
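A sketch of the pseudocount fix mentioned above (add-one / Laplace smoothing); the counts and vocabulary size below are invented for illustration:

```python
def smoothed_prob(word_count, total_words, vocab_size, alpha=1.0):
    """Estimate p(w | class) with a pseudocount alpha, so it is never zero."""
    return (word_count + alpha) / (total_words + alpha * vocab_size)

# A word that never appears in the training spam still gets a small,
# non-zero probability, so logarithms and divisions stay well defined:
p_unseen = smoothed_prob(word_count=0, total_words=1000, vocab_size=5000)
print(p_unseen)  # 1/6000, roughly 0.000167
```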

## Apparent Plagiarism

The worked example seems to be an exact copy of this work (pdf). --Vince | Talk 08:33, 10 April 2008 (UTC)

At the top of that paper, it says "General Mathematics Vol. 14, No. 4 (2006), 135–138". Yet, when I took a cursory glance into the revision history, I thought I saw the corresponding examples in Wikipedia prior to 2006. Is it possible that it was copied in the other direction? Because it is so tabu in academics to cite Wikipedia, it can be difficult for scholarly papers to properly give attribution. Further, these "examples" consist of rather well-established formulas, and don't really express any creativity, so it might also be argued that Copyright may not be relevant here.--Headlessplatter (talk) 15:48, 18 March 2011 (UTC)

## How to join the different probabilities?

How is the total spam probability calculated? For example, if pr(spam | "viagra") = 0.9 and pr(spam | "hello") = 0.2, how is pr(spam | {"viagra", "hello"}) calculated? —Preceding unsigned comment added by 82.155.78.196 (talk) 17:16, 18 August 2008 (UTC)
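One common answer, sketched below under the naive independence assumption and a uniform prior (the combination used in Paul Graham-style spam filters; other derivations weight the prior differently):

```python
def combine(word_probs):
    """Combine per-word p(spam | word) values into p(spam | all words),
    assuming the words are conditionally independent and the prior is uniform."""
    p_spam, p_ham = 1.0, 1.0
    for p in word_probs:
        p_spam *= p          # evidence for spam
        p_ham *= (1.0 - p)   # evidence for ham
    return p_spam / (p_spam + p_ham)

# The example from the question: pr(spam | "viagra") = 0.9, pr(spam | "hello") = 0.2
print(combine([0.9, 0.2]))  # 0.18 / (0.18 + 0.08) ≈ 0.692
```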

## Sex classification example

Hi, the example added last August about sex classification is puzzling me. Could anyone tell me how to compute P(height | man)? The author gave the value 1.5789 with a note stating that "probability distribution over one is OK. It is the area under the bell curve that is equal to one", which also puzzles me.

Anyway, the only example of naive Bayes classification for real-valued features I have found is available here, and the author uses the probability density function of a normal distribution with estimated parameters to compute the probability P(temperature=66). How correct is that? It is very surprising for me to use a PDF to compute a probability. -- Sam —Preceding unsigned comment added by 77.248.94.92 (talk) 18:55, 24 September 2010 (UTC)

Using probability densities instead of probabilities is correct and necessary because continuous random variables are involved. -- X7q (talk) 22:08, 6 December 2011 (UTC)

The math in this example is incorrect as stated above; this is not the correct method to compute probabilities. Note that the proposed method, (sample - mean)/stdev, gives nonsensical answers, including that it is least probable to measure the sample mean and infinitely probable to measure something infinitely far from the sample mean. The correct method is using the standard normal distribution. — Preceding unsigned comment added by 134.134.139.70 (talk) 21:28, 17 November 2011 (UTC)

"proposed method (sample - mean)/stdev" - it doesn't look to me like the original author of the example proposed that. More likely somebody didn't understand his derivation and inserted this dumb formula, and it somehow survived in the article. I've removed it. Yes, the normal distribution's probability density formula is what is needed there. -- X7q (talk) 22:08, 6 December 2011 (UTC)
Not standard normal distribution, though, but normal with the parameters learned during training. -- X7q (talk)

Can someone post the correct way to compute this? I am still having trouble understanding. Edit: Yes, thanks! — Preceding unsigned comment added by 98.248.214.237 (talk) 21:38, 6 December 2011 (UTC)

I've made a few changes to this section just now. Looks any better to you? -- X7q (talk) 22:08, 6 December 2011 (UTC)
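For later readers asking the same question: the value is a normal density evaluated with the mean and variance learned during training, as X7q says. The parameters below are the ones that reproduce the 1.5789 figure quoted earlier in this thread (assumed to come from the article's worked example); treat them as illustrative:

```python
import math

def normal_density(x, mean, variance):
    """Probability DENSITY of N(mean, variance) at x. Densities may exceed 1;
    only the area under the curve must integrate to 1."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# p(height = 6 ft | male), with mean and variance estimated from training data
d = normal_density(6.0, mean=5.855, variance=3.5033e-02)
print(d)  # roughly 1.5789 -- greater than 1, which is fine for a density
```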

I propose a revamp of the notation in the Testing section. I understand that it is trying to convey a plain-English message to those who are not mathematically inclined, but it looks tacky. I'm going to start editing and hope people are OK with it. — Preceding unsigned comment added by LinuxN877 (talkcontribs) 05:38, 15 December 2012 (UTC)

## Commercial external examples

Is there any reason we shouldn't be able to add an external link to a website that performs Bayesian Classification? I tried to add DiscoverText.com but wikipedia removed it. — Preceding unsigned comment added by 24.9.167.226 (talk) 17:33, 27 June 2011 (UTC)