Talk:Statistical classification

Merging/reorganizing Pattern Recognition and Statistical classification

Regarding Statistical classification/temp, I'm puzzled about how to integrate it into existing articles. Check out Classification: there are two types of classification, Taxonomic classification and Statistical classification. I think you may be talking about taxonomic classification, but I'm not sure. We've made a distinction: taxonomic classification is based on human decision-making, while statistical classification is based on algorithmic decision-making.

-- hike395 July 1, 2005 17:55 (UTC)

I checked out the existing pages again. I'm talking about algorithmic decision making in the temp article--the classification of items into groups based on numerical/statistical analysis using some algorithm. My issue is that under the category of algorithmic decision making, the topic is discussed only in terms of pattern recognition/machine learning, and there is no general explanation of what statistical classification is and does. Algorithmic (or computational, or numerical, or statistical--I use them synonymously) classification can be, and is, applied to all kinds of things. So I think an overall summary of what stat. classif. is and does--how it works, its underlying ideas, types of approaches, specific algorithms, etc.--is needed. Particular applications can then be discussed after that. To jump to a pattern recognition application immediately is getting too specific too fast on just one of many, many applications.

Also, the applications listed under taxonomic classification are not necessarily based on human decision making. Some of them can be algorithmic as well, such as phenetics-based classifications of organisms.

I don't disagree with your idea to break the topic into human vs algorithmic-based procedures, but I think some work needs to be done to make everything clearer. What I wrote needs to be expanded on for sure, but is a basic intro which can be built on I hope.

Jeeb 2 July 2005 00:18 (UTC)

What a conundrum. I've thought about it, and I agree with you: Statistical classification should be the main article about classification in statistics. The problem is: what to do about Pattern recognition? Here are three issues: 1) lots of pages link to pattern recognition, so if it turns into a redirect, it would surprise a lot of people; 2) if it is too similar to statistical classification, the two will slowly evolve to have different/conflicting information (given that Wikipedians are not thorough about checking for redundant articles); and 3) ...

Issue 3 is a doozy, and it goes back to the sociology of AI research. AI research goes through boom/bust cycles that seem to last 10-20 years. Each cycle generates a new name. In the 1950s and 1960s, the statistical AI approach was called pattern recognition (especially applied to computer vision tasks). In the 1980s, it was called neural networks (and it was vaguely neuromorphic). In the 1990s, it was called machine learning. In each cycle (except for machine learning?), the researchers overpromised and their area fell into disrepute. The name fell out of favor, except for those die-hard people who stayed with the same techniques. Thus, we still have pattern recognition conferences (ICPR), neural network conferences (IJCNN), and machine learning conferences (ICML) that all co-exist.

So, I think that we should rewrite pattern recognition to be a more historical/sociological article about statistical AI, rather than a listing of techniques.

The problem is, it's an enormous undertaking, and people may not fully agree. I can take a stab at making a stubby start of the article. The risk is that, without a lot of meat in the article, it may drift into replicating statistical classification. Also, we would need to find sources for the history of pattern recognition, which is somewhat tricky.

-- hike395 July 7, 2005 06:05 (UTC)

More data! Check out the FOLDOC definition of Pattern Recognition. They distinguish PR from statistical classification by 1) claiming that PR is a subfield, 2) PR systems solve the whole problem (including pre-processing), and 3) there are non-statistical classification approaches to PR (including syntactic classification, which I had forgotten about). -- hike395 July 7, 2005 15:38 (UTC)

...and I realize, on re-reading pattern recognition (PR), that I had been thinking of it as synonymous with image analysis when I made my initial comments and wrote the temp article, but the article makes it clear that PR is broader than just image analysis, which I agree with. Nevertheless, I think PR and statistical classification (SC) are different because (1) as you comment, PR can involve non-statistical (e.g. syntactical) approaches, and (2) SC (and PR) can be unsupervised. The PR article as written focuses on training sets and on mapping a set of items onto an appropriate classification label using such sets, which means it is talking only of supervised classification procedures. But classification can also be unsupervised, with the labeling of classification groups coming later via some independent, non-mathematical procedure. So in some respects PR seems to me broader than SC, and in other ways narrower, so I'm not so sure that PR is a subfield of SC; I'm prone now to think it's actually broader, but at any rate, I think they're certainly different enough to warrant separate articles.
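
To make the supervised/unsupervised distinction concrete, here is a minimal sketch in Python. It assumes toy one-dimensional data, and all function names are hypothetical, chosen only for illustration; it is not taken from either article. The supervised routine needs a labeled training set, while the unsupervised one only returns anonymous group numbers that someone would have to name afterwards.

```python
# Illustrative sketch only: toy 1-D data and hypothetical function names.

def nearest_centroid_fit(values, labels):
    """Supervised: compute one centroid per class from a labeled training set."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_centroid_predict(centroids, value):
    """Assign the label whose centroid lies closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: split values into two unnamed groups (1-D k-means, k=2)."""
    c0, c1 = min(values), max(values)  # simple deterministic initialization
    groups = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the nearer of the two centroids.
        groups = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        # Update step: move each centroid to the mean of its group.
        g0 = [v for v, g in zip(values, groups) if g == 0]
        g1 = [v for v, g in zip(values, groups) if g == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return groups


if __name__ == "__main__":
    values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
    labels = ["small", "small", "small", "large", "large", "large"]

    centroids = nearest_centroid_fit(values, labels)
    print(nearest_centroid_predict(centroids, 4.0))  # label comes from training data
    print(two_means(values))  # group numbers only; naming them is a separate step
```

The group numbers returned by `two_means` carry no meaning by themselves, which is the point made above: in the unsupervised case, the labeling comes later and from outside the algorithm.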

Including the historical evolution of PR sounds like a good idea, but I think some info on methods and techniques should be included as well, because PR seems to me to have important and distinguishing elements (like the incorporation of syntactic or contextual information that you mention). (It is in that respect especially that I think PR is broader than stat. classif., which never, to my knowledge, deals with syntactical information or the whole concept of topological relationships among items or groups).

How about two separate articles without any redirects, justified by clear distinctions between the two in the articles--simply remove the redirect from SC to PR that now exists, put the existing SC-temp article where SC now is, and then continue to edit the two articles using this (and future) discussion as a basis for it? No links to PR would be affected that way, and any existing links to SC would not redirect a reader to PR. As for the enormous undertaking, I think this minimizes it because we can just slowly continue to revise the two existing articles as we discuss the relationship between the two topics...

Jeeb