# Talk:Naive Bayes classifier

WikiProject Computing (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C  This article has been rated as C-Class on the project's quality scale.
Mid  This article has been rated as Mid-importance on the project's importance scale.
 This article has been automatically rated by a bot or other tool because one or more other projects use this class. Please ensure the assessment is correct before removing the |auto= parameter.
WikiProject Robotics (Rated High-importance)
Naive Bayes classifier is within the scope of WikiProject Robotics, which aims to build a comprehensive and detailed guide to Robotics on Wikipedia. If you would like to participate, you can choose to edit this article, or visit the project page (Talk), where you can join the project and see a list of open tasks.
High  This article has been rated as High-importance on the project's importance scale.
WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

C  This article has been rated as C-Class on the quality scale.
Mid  This article has been rated as Mid-importance on the importance scale.

## little p versus capital P

Can somebody explain to me what p(...) means? Honestly, what symbols am I allowed to write inside the brackets of *little* p, and how is it defined? I know P(...). I am, once more, confused about why probabilities are computed using density functions; the difference between p and P is the reason for much of the trouble caused by this article.

(Edit: see this question of mine). — Preceding unsigned comment added by 78.51.30.89 (talk) 21:50, 29 May 2015 (UTC)
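For readers with the same question: by the usual convention, capital P(...) denotes a probability (of an event or of a discrete value), while little p(...) denotes a probability density function of a continuous variable. A density is not itself a probability; probabilities come from integrating the density over an interval. A minimal sketch in Python (the function names here are just illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """Little p: the density of N(mu, sigma^2) at x. Not a probability."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Capital P: P(X <= x), an actual probability in [0, 1]."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Density at a single point -- a plain number, not constrained to [0, 1]
density = normal_pdf(0.0, 0.0, 1.0)

# Probability of an interval -- obtained by integrating the density
prob = normal_cdf(1.0, 0.0, 1.0) - normal_cdf(-1.0, 0.0, 1.0)
```

In the article's formulas, p(x_i | C_k) for a continuous feature is evaluated as such a density.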

## "Naïve Bayesian classification" moved to "Naive Bayes classifier"

Hello. I have reverted "naïve" to "naive" in the article text, as "naive" is the usual English spelling, and occurs more often than "naïve" in texts (papers, books, web pages, etc). I have also moved naïve Bayesian classification to naive Bayes classifier. For various combinations of terms I find the following:

• "naive Bayes classifier" yields approx 11,000 Google hits
• "naive Bayesian classifier" yields approx 5000 Google hits
• "naïve Bayes classifier" yields approx 1000 Google hits
• "naïve Bayesian classifier" yields approx 500 Google hits
• "naive bayesian classification" -wikipedia -encyclopedia yields approx 500 Google hits
• "naïve bayesian classification" -wikipedia -encyclopedia yields approx 150 Google hits

As this classifier is very common in computer-related texts, it is reasonable to suppose Google is a reliable indication of the currency of different variations of the name. Regards & happy editing, Wile E. Heresiarch 04:51, 27 Dec 2004 (UTC)

But "naïve" is proper English (with the umlaut), so wouldn't that "overrule" the "most common" phrase? WhisperToMe 05:28, 27 Dec 2004 (UTC)

For the benefit of other readers, I'll copy here some comments I put on user talk:WhisperToMe: (1) Re: standard English. I can't find any dictionaries or other sources which state that the correct spelling is "naïve". Every source I have found shows "naive" as the primary spelling, and shows "naïve" as an acceptable variation of "naive". It is clear that both spellings are acceptable. Naïve/naive isn't mentioned at Wikipedia:Manual of Style or American and British English differences. If you have some other sources I'd like to hear about it. (2) Agreed that the Google test only shows what's more common. However, since both spellings are acceptable and "naive" is more common, and much more common in a mathematics context, a crusade to change "naive" to "naïve" seems pointless at best. Wile E. Heresiarch 06:57, 27 Dec 2004 (UTC)
Our university professor taught us that naïve comes from French and insisted that it is the only correct spelling even in English. I had to fix two of my LaTeX handins just because of the diaeresis, though personally I prefer the naive spelling, and after graduation I always wrote naive in my publications unless required otherwise (I was once even asked to fix a LaTeX paper because the editor wanted the naïve spelling!). Apparently we should include both spellings in the article. Sofia Koutsouveli (talk) 22:30, 21 March 2014 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Wouldn't it be best to use the spelling "naïve", since it's easier to read? Otherwise many people will be reading it as "knave Bayes classifier" and getting confused... It's not as bad as trying to use "resume" as a noun, but I think it's better to use an ï, since it becomes easier for some people to read. Does anyone have trouble reading "naïve" but no trouble reading "naive"..? (If so, we can be nice, and make it even easier for them to read, by writing "naıve".) Κσυπ Cyp   15:55, 27 Dec 2004 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). – Could the reason be that Google naively treats "naïve" and "naive" as interchangeable? On what basis do you assert that "naive" is harder to read than "naïve"? How can "naive" be confused with "knave" by someone who's likely to understand the article? Sure, if a child or someone learning English is completely unfamiliar with the word "naive" they may think that it's pronounced like "knave", but then the real problem is that they don't know what "naive" means in the first place. If they decide to look it up in a dictionary, they would also learn about the correct pronunciation. I would say in the case where two forms like "naive" and "naïve" exist and are equally acceptable, it's better to use the form that is easier to type. If we only had "naïve" everywhere with no redirects, someone might try to search for "naive Bayes" (because that's easy to type for lots of people, whereas "naïve" is not, even on many types of European keyboards); they would be unable to find the article, and either give up or start a duplicate article. In the context of this article, I would say that "naive Bayes" is far more frequent than "naïve Bayes", but check for yourself: do a search for "naïve Bayes" on http://scholar.google.com/ and see how many occurrences of "naïve" you actually find. --MarkSweep 01:10, 28 Dec 2004 (UTC)
Ummm, yes, it could be because Google naïvely treats "ï" and "i" as interchangeable. (As searching for either finds the same pages, disregarding whether they use the easy-to-read or easy-to-type version.) scholar.google.com does the same thing, except it doesn't display the diaeresis until I follow the links. (Clicked on a random link that it found, and it used the "ï", not the "i".) I assert that "naive" is harder to read than "naïve", because "naive" looks rather strange and distracting to me. It's obvious what it meant, after spending an extra second reading it, but why make people spend an extra second reading it to understand it? I would say that in the case where two forms like "naïve" and "naive" exist and are equally acceptable, it's better to use the form that is easier to read. We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. Κσυπ Cyp   04:59, 28 Dec 2004 (UTC)
Let's not make decisions based on a single random link. Furthermore, while I don't have any evidence that "naive" won't cause any additional confusion (except for people who don't know the concept in the first place), you don't seem to have any evidence that "naïve" is easier to read either. I would say the burden of proof is on you here: can you demonstrate empirically that "naive" actually causes confusion? Significant confusion? Utterly hopeless cannot-make-heads-or-tails-of-it confusion? The other issue is with instances of "naïve" that do not occur in an article title. Does the new Mediawiki search facility treat "naive" and "naïve" as equivalent? (I don't know.) I suspect (without proof) that "naive" is a more frequent search query than "naïve", since it's easier to type for just about anyone. Unless both terms are treated as equal, searching for "naive" will miss pages that only have "naïve" in them (on second thought, this is turning into an argument in favor of inconsistent spelling, using all variants of a relevant word in an article).
Empirically, I find "naïve" easier to read than "naive". I do not claim, and have not claimed, that it is significant confusion, just that "naïve" is easier for me to read. If some people find both equally easy to read, and some find "naïve" easier to read, then it seems that "naïve" is easier to read on average. (As far as I can tell, no one has claimed that they find "naive" easier to read than "naïve".) The Mediawiki search seems to be disabled at the moment, although I would guess that it wouldn't treat them the same. I also suspect (also without proof) that "naive" is a more frequent search query than "naïve", since it's easier for many/most people to type. Since (hopefully) no one is going round deleting redirects between "naïve" and "naive", searching should find both spellings. (I certainly think that searching for "naive" should find the articles, as well as searching for "naïve"...) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
I'm sorry but "empirically, I find" just doesn't make sense: you're not stating an empirical observation, you're only stating your own opinion, which you are certainly entitled to. But since you have a stake in the outcome, you cannot count your own preferences in an empirical study. I could claim that I find "naive" easier to read (since it has fewer dots and looks more normal to me), but I would have to discount that as my own biased opinion, which isn't empirical evidence. Regarding the issue of full text search, I was referring to articles that do not have "naive" in the title and which can only be found by a full text search. However, my argument is not particularly good: by the same token, someone might search for "colour" and not find a relevant article that mentions "color" in the body text but not in the title. So all we have now in terms of arguments is (1) your opinion that "naïve" is easier to read; (2) the fact that in a non-random sample of 16 relevant publications (see below) "naïve" occurs in 2, but "naive" in 14; and (3) opinions from several editors that "naive Bayes" is more common. For all I know the conjunction of these three propositions is not a contradiction: it could be the case that "naïve" is in fact easier to read (though I will remain skeptical) and that "naive" is more common (for which I believe there is sufficient empirical evidence). In that case, we still need to make a decision which form we should pick, and there is precedent for choosing the more common form. --MarkSweep 19:52, 28 Dec 2004 (UTC)
After looking up "empirically" in dictionary.com, I'm not sure that I was using the word correctly. I meant, subjectively/personally, I find "naïve" a bit easier to read than "naive". I hadn't thought of full-text searches before. If you do actually find "naive" easier for you to read, not just easier to write, then I'm fine with it being left as "naive". (I got the impression that no one here actually found "naive" easier to read, just thought it should be used because of being easier to type or more common.) (wɛn wɪl piːpl ɑːfɪʃəliː swɪtʃ tuː juzɪŋ ʌ fənɛtɪk əlfəbɛt fɔː ɪŋgɫɪʃ..?) Κσυπ Cyp   02:00, 29 Dec 2004 (UTC)
Some data points: The first two references cited in the present article both use "naive", not "naïve" (I was unable to check the third reference). Russell and Norvig use "naive", not "naïve". Among the first ten results returned by scholar.google.com, 8 use "naive" and 2 use "naïve". Added later: Mitchell's Machine Learning textbook (ISBN 0070428077), Data Mining by Han and Kamber (ISBN 1558604898), and Data Mining by Witten and Frank (ISBN 1558605525) all use "naive" exclusively. Score: "naive" 14, "naïve" 2. --MarkSweep 19:52, 28 Dec 2004 (UTC)
Finally, the insidious slippery slope argument: would you be in favor of writing "coördinate" and "reëlect" as well? How about "reärmed", since that could easily be confused with "rear med"? --MarkSweep 07:06, 28 Dec 2004 (UTC)
I wouldn't support or oppose a diaeresis on those words. The "ö" in "coördinate" seems slightly more appropriate than the "ä" in "reärmed", although I'm not sure why. Possibly because reading the "ä" as an umlaut would make the pronunciation completely wrong. (I think that pronouncing "naïve" without a diaeresis would sound much worse than pronouncing the other three words without a diaeresis.) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
User:Cyp, I can't tell what you're talking about. Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Googling for the exact phrase (with quote marks, and with -wikipedia -encyclopedia) I get 5000 for "naive" [1] and 500 for "naïve" [2] as reported above. Without quote marks (and with -wikipedia -encyclopedia) I get about 21,000 for "naive" [3] and 6000 for "naïve" [4]. So on what basis are you trying to claim "naïve Bayesian classifier" and "naive Bayesian classifier" are equally common? -- In any event, if you want to claim "naïve" is easier for some people to read you're going to have to come up with some evidence for that; "naive" looks rather strange and distracting to me simply doesn't count. -- We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. -- I'm sorry, I simply don't understand what you're getting at here. Wile E. Heresiarch 05:27, 28 Dec 2004 (UTC)
When I search with google, it treats "ï" and "i" as completely identical. I have no idea why it behaves differently, when you search. Last time I checked, I was a person, so I have already come up with evidence that some (at least one) people find "naïve" easier to read than "naive". Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)

## Sex classification example

Hi, the example added last August about sex classification is puzzling me; could anyone tell me how to compute P(height | man)? The author gave the value 1.5789 with a note stating that "probability distribution over one is OK. It is the area under the bell curve that is equal to one", which also puzzles me.

Anyway, the only example of naive Bayes classification for real-valued features I have found is available here; the author uses the probability density function of a normal distribution with estimated parameters to compute the probability P(temperature=66). How correct is that? It is very surprising to me to use a PDF to compute a probability. -- Sam —Preceding unsigned comment added by 77.248.94.92 (talk) 18:55, 24 September 2010 (UTC)
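Evaluating the fitted density at the observed value is indeed the standard practice in Gaussian naive Bayes: p(temperature=66 | class) is taken to be the value of the class's fitted normal density at 66. Strictly speaking this is a likelihood rather than a probability, but since every class is scored the same way, comparing the resulting products across classes is still valid. A sketch with made-up training temperatures (not the data from the linked example):

```python
import math
import statistics

# hypothetical training temperatures for one class (illustrative data only)
temps = [64, 68, 69, 70, 71, 72, 75]
mu = statistics.mean(temps)
sigma = statistics.stdev(temps)  # sample standard deviation

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x -- used as the class-conditional likelihood."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# "p(temperature=66 | class)" in the naive Bayes product
likelihood = normal_pdf(66, mu, sigma)
```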

Using probability densities instead of probabilities is correct and necessary because continuous random variables are involved. -- X7q (talk) 22:08, 6 December 2011 (UTC)

The example math is incorrect for this example as stated above; this is not the correct method to compute probabilities. Note that the proposed method, (sample - mean)/stdev, gives nonsensical answers, including that it is least probable to measure the sample mean and infinitely probable to measure something infinitely far from the sample mean. The correct method is using the standard normal distribution. — Preceding unsigned comment added by 134.134.139.70 (talk) 21:28, 17 November 2011 (UTC)

"proposed method (sample - mean)/stdev" - it doesn't look to me like the original author of the example proposed that. More likely somebody didn't understand his derivation and inserted this dumb formula, and it somehow survived in the article. I've removed it. Yes, the normal distribution's probability density formula is what is needed there. -- X7q (talk) 22:08, 6 December 2011 (UTC)
Not the standard normal distribution, though, but a normal distribution with the parameters learned during training. -- X7q (talk)

Can someone post the correct way to compute this? I am still having trouble understanding. Edit: Yes, thanks! — Preceding unsigned comment added by 98.248.214.237 (talk) 21:38, 6 December 2011 (UTC)

I've made a few changes to this section just now. Does it look any better to you? -- X7q (talk) 22:08, 6 December 2011 (UTC)

I propose a revamp of the notation in the Testing section. I understand that it is trying to convey a plain-English message to those who are not mathematically inclined, but it looks tacky. I'm going to start editing and hope people are OK with it. — Preceding unsigned comment added by LinuxN877 (talkcontribs) 05:38, 15 December 2012 (UTC)

The example for P(height|man) is wrong: The PDF for value 6 given mean 5.855 and sd 0.035033 is 0.002170381 (and not approx. 1.5789) 85.10.127.15 (talk) 12:15, 15 July 2015 (UTC)
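The discrepancy above comes from reading 0.035033 as the standard deviation; in the article's example it is the *variance*. With variance 0.035033 the density at 6 ft is indeed about 1.5789, while treating 0.035033 as the standard deviation gives the 0.00217 figure quoted above. A quick check in Python:

```python
import math

mu = 5.855        # article's mean male height
var = 0.035033    # article's *variance* (not standard deviation)

def normal_pdf(x, mu, var):
    """Density of a normal distribution parameterised by its variance."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Reading 0.035033 as the variance reproduces the article's value
p_variance = normal_pdf(6.0, mu, var)  # ~1.5789

# Mistakenly reading it as the standard deviation gives the tiny value above
sd = 0.035033
p_stdev = math.exp(-(6.0 - mu) ** 2 / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))  # ~0.00217
```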

## Naive or naïve

At university I was only taught the naïve spelling, and I was told that naive is wrong; in fact, our professor refused to accept our answers if we failed to add the diaeresis over the i. It happened to me twice, and I had to rewrite my LaTeX handins. I personally prefer the naive spelling and hate the diaeresis, as I have to switch to the French keyboard to add it, and after I finished university I always wrote naive in my publications unless specifically required otherwise. Nevertheless, I feel we should surely note in the article, preferably in the lead section, that the term is written both as naive and naïve. Does anyone object to including both spellings in the lead, and what do you think? Sofia Koutsouveli (talk) 22:23, 21 March 2014 (UTC)

## Requested move 17 August 2014

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: not moved. Jenks24 (talk) 14:42, 25 August 2014 (UTC)

Naive Bayes classifier → Naive Bayes – Current page title is a bit of a tautology, since naive Bayes models are always classifiers. – QVVERTYVS (hm?) 14:32, 17 August 2014 (UTC)

This is a contested technical request (permalink). Anthony Appleyard (talk) 16:00, 17 August 2014 (UTC)
Anthony Appleyard, can you explain why this would be controversial? (Note that Naive Bayes redirects here already.) QVVERTYVS (hm?) 16:14, 17 August 2014 (UTC)
• "Naive Bayes" without "classifier" is unclear, it could mean a story about a naive man called Bayes, or various things. Anthony Appleyard (talk) 16:28, 17 August 2014 (UTC)
• That would be a WP:DICDEF unless you have a source to establish such a man's notability ;) QVVERTYVS (hm?) 16:34, 17 August 2014 (UTC)
• Oppose per WP:NOUN. Dicklyon (talk) 05:04, 18 August 2014 (UTC)
• Oppose: A great example of why excessive conciseness is not a goal here. It's almost always better to use noun phrases than shorter adjectival ones here, unless there's something adjectival about the topic. Which is rare, very rare.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  13:53, 24 August 2014 (UTC) PS: Cf. Fast Fourier transform and similar cases; we don't truncate them to things like Fast Fourier except as redirects, despite the propensity for some specialist sources to do so.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:56, 25 August 2014 (UTC)
Dicklyon, SMcCandlish, "naive Bayes" is a noun:
• "Naive Bayes is often used as a baseline in text classification because it is fast and easy to implement" ([5], abstract)
• "we investigate the optimality of naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of naive Bayes" ([6], abstract)
• "We also demonstrate that naive Bayes works well" ([7], abstract)
• "naive Bayes often performs classification very well." ([8], introduction)
... and I could go on. QVVERTYVS (hm?) 08:46, 25 August 2014 (UTC)
That's shorthand "nouning" of an adjectival phrase. I don't see how it's different from advertising and PR people using "creative" as a noun (e.g. "our firm specializes in social-media-based creative and messaging"). This sort of "nouning" is a specialist, WP:JARGON usage, and will confuse people outside the specialty in question, because it's not a standard English usage pattern. A more exact comparison would be using "fast Fourier" instead of "fast Fourier transform" in a bunch of phrases like that (e.g. "A fast Fourier computes the DFT and produces exactly the same result as evaluating the DFT definition directly"). You can definitely find it used that way in specialist literature, but almost never in general-audience sources (like WP itself) because it "does not compute" for anyone but specialists in the fields in which that shorthand truncation is used. The "classifier" in the name of this article is only redundant to people who use naive Bayes classifiers; it's not redundant to the average reader, which is who WP is written for.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:56, 25 August 2014 (UTC)
Ok, point taken. Textbooks actually tend to spell it out on first use, AFAICT. QVVERTYVS (hm?) 11:16, 25 August 2014 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.

## Two steps?

Is the example complete? I mean, some articles speak about two steps:

1) preclassification (get statistics from training data, calculate conditional probability and classify...)

and

2) Bayesian Classification (update a priori probability and classify...).

Where is the 2nd step? Don't you update the probabilities? Don't we need to iterate anything?

regards

— Preceding unsigned comment added by 81.202.7.175 (talkcontribs)

Which articles in particular are you referring to? QVVERTYVS (hm?) 21:48, 26 November 2014 (UTC)
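For reference, standard naive Bayes does have exactly two steps, but neither is iterative: a single training pass that estimates the class priors and class-conditional parameters, and a classification step that applies Bayes' rule with those fixed estimates. There is no subsequent updating of the probabilities unless you retrain on new data. A minimal Gaussian naive Bayes sketch (illustrative, not the article's exact notation):

```python
import math

# Step 1 (training): estimate per-class priors and per-feature Gaussian
# parameters from labelled data -- a single pass, no iteration.
def fit(X, y):
    model = {}
    for label in set(y):
        rows = [x for x, c in zip(X, y) if c == label]
        prior = len(rows) / len(X)
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        vars_ = [sum((v - m) ** 2 for v in col) / len(col)
                 for col, m in zip(cols, means)]
        model[label] = (prior, means, vars_)
    return model

# Step 2 (classification): apply Bayes' rule with the learned parameters.
def predict(model, x):
    def log_pdf(v, m, var):
        return -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    def score(label):
        prior, means, vars_ = model[label]
        return math.log(prior) + sum(log_pdf(v, m, s)
                                     for v, m, s in zip(x, means, vars_))
    return max(model, key=score)

model = fit([[1.0], [1.2], [3.0], [3.2]], ['a', 'a', 'b', 'b'])
label = predict(model, [1.1])  # classified as 'a'
```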

## What does this mean?

In the Gender Classification example it says:

"Note that a value [1.5789] greater than 1 is OK here – it is a probability density rather than a probability, because height is a continuous variable."

This sentence makes no sense... I haven't heard of a pdf value that is greater than 1, and in any case - this is not the real value of this pdf!

If I'm not mistaken, for a continuous distribution you need to choose some interval (a,b) around the actual value, for example (5.855-0.5, 5.855+0.5) - and just make sure to use the same interval width for all the calculations (i.e. in the example, both for male and female) - this way the effect of the different distributions is constant. — Preceding unsigned comment added by 5.22.130.201 (talk) 19:30, 6 June 2016 (UTC)
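The quoted sentence is in fact correct: a probability *density* can exceed 1 at a point, as long as its integral over the whole line is 1. There is also no need to choose an interval around the observed value: because the same (implicit) interval width would multiply every class's score, it cancels in the comparison, so the densities themselves can be compared directly. A small demonstration that a narrow density exceeds 1 while still integrating to 1:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# A narrow normal density (sigma = 0.1) peaks well above 1...
peak = normal_pdf(0.0, 0.0, 0.1)  # ~3.989

# ...yet still integrates to 1 (crude Riemann sum over +/- 10 sigma as a sanity check)
step = 0.001
area = sum(normal_pdf(i * step, 0.0, 0.1) for i in range(-1000, 1001)) * step
```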

## Ad 'Constructing a classifier from the probability model'

Shouldn't be

$\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}}\ p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$

replaced by

$k = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}}\ p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$

? — Preceding unsigned comment added by 195.187.80.9 (talk) 07:30, 8 June 2016 (UTC)
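Both forms denote the same decision rule; writing ŷ on the left avoids reusing the index k that is already bound by the argmax. The rule translates directly into code. A sketch, where the per-feature likelihood functions are a hypothetical interface rather than anything from the article:

```python
import math

def classify(priors, likelihoods, x):
    """Return argmax_k p(C_k) * prod_i p(x_i | C_k), computed in log space.

    priors: dict mapping class -> p(C_k)
    likelihoods: dict mapping class -> list of callables, one per feature,
                 each returning p(x_i | C_k) (hypothetical interface)
    """
    def score(k):
        s = math.log(priors[k])
        for pdf, xi in zip(likelihoods[k], x):
            s += math.log(pdf(xi))
        return s
    return max(priors, key=score)

# toy single-feature example with hand-picked likelihoods
priors = {'spam': 0.4, 'ham': 0.6}
likelihoods = {
    'spam': [lambda v: 0.9 if v > 0 else 0.1],
    'ham':  [lambda v: 0.2 if v > 0 else 0.8],
}
label = classify(priors, likelihoods, [1])  # 'spam' wins despite the smaller prior
```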

## What other name?

In the second sentence of the second paragraph, it might be helpful to provide what the "different name" used in the text retrieval community is: "It was introduced under a different name into the text retrieval community in the early 1960s,[1]:488 and remains a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features." — Preceding unsigned comment added by 73.38.249.160 (talk) 20:52, 3 July 2016 (UTC)