Wikipedia talk:Categorization

WikiProject Manual of Style
This page falls within the scope of WikiProject Manual of Style, a drive to identify and address contradictions and redundancies, improve language, and coordinate the pages that form the MoS guidelines.
 
WikiProject Categories
This page is within the scope of WikiProject Categories, a collaborative effort to improve the coverage of categories on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 

Article categories on Draft pages

I currently run a bot task that comments out article categories from user pages per WP:USERNOCAT. I would be happy to do the same for pages in the new Draft namespace. However, I don't see anything in this guideline that states article categories should not be added to pages in the Draft namespace. Should something be added to this guideline for the Draft namespace? Thanks! GoingBatty (talk) 13:37, 31 May 2014 (UTC)

Would adding something like including pages in the draft namespace be sufficient? Vegaswikian (talk) 16:49, 31 May 2014 (UTC)
Is there any reason to "comment out categories" rather than insert the initial colon ([[:Cat...)? Offhand I suppose it's easier for both humans and robots to perceive the latter when a page is moved to article space without category restoration. --P64 (talk) 17:25, 31 May 2014 (UTC)
Unsure. Using the colon leaves the category visible in the article text; using a hidden comment hides it unless you edit the article. Vegaswikian (talk) 18:10, 31 May 2014 (UTC)
I remember reading that articles in draft namespace remain unacknowledged to the category system. Yes, confirmed, no categories. --Ancheta Wis   (talk | contribs) 22:07, 31 May 2014 (UTC)
Yes, only article-space articles should be in article-space categories. I think commenting out is simply easier to do, and undo, than doing the whole colon trick. But if someone has already used the colons, I see no reason to change it to be commented out.--Obi-Wan Kenobi (talk) 13:13, 2 June 2014 (UTC)
@Vegaswikian: I added your suggestion to WP:USERNOCAT, but think it should eventually be expanded into its own section. Thanks! GoingBatty (talk) 13:08, 29 June 2014 (UTC)
On second thought, I just went ahead and created a Draft pages section. If there are any types of categories that are acceptable on draft pages, some examples could be added to this section. Thanks! GoingBatty (talk) 13:14, 29 June 2014 (UTC)
I just submitted Wikipedia:Bots/Requests for approval/BattyBot 33 to remove article categories from pages in the Draft namespace by inserting the initial colon as P64 suggested. GoingBatty (talk) 02:05, 3 July 2014 (UTC)
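The colon trick P64 suggested, as opposed to commenting out, amounts to a simple text substitution on the wikitext. A minimal Python sketch, assuming category tags appear literally as [[Category:...]] in the page source; the name `decategorize` is hypothetical and this is not BattyBot's actual code. Categories added indirectly by templates are not touched by this approach.

```python
import re

# Matches the opening of a category tag, e.g. "[[Category:" or "[[ category :".
# A link already written as "[[:Category:" is not matched, because the colon
# right after "[[" cannot be consumed by \s* and does not spell "Category".
CATEGORY_TAG = re.compile(r"\[\[\s*(Category\s*:)", re.IGNORECASE)


def decategorize(wikitext: str) -> str:
    """Turn category tags into plain links by inserting the initial colon."""
    return CATEGORY_TAG.sub(r"[[:\1", wikitext)


draft = "Some text.\n[[Category:Fungus genera]]\n[[Category:Agaricales genera]]"
print(decategorize(draft))
```

Because the colon form stays visible in the rendered draft, a reviewer moving the page to article space can spot the disabled categories and restore them by deleting one character each.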

Change proposals to WP:COP#N relating to WP:DEFINING

WP:COP#N is that part of the Wikipedia:Categorization of people guideline that talks about categorizing biographies along lines of notability and definingness.

Several changes to this part of the WP:COP guideline have been proposed. Input welcome!

Please discuss at Wikipedia talk:Categorization of people#Proposed language change to WP:COP#N

--Francis Schonken (talk) 06:36, 22 June 2014 (UTC)

Categorization advice

I've recently started adding the Category:Fungus genera to relevant articles. As there are many 1000s of genera, this cat will soon become unwieldy, so I'd like to make it more manageable by making fungus genera categories specific to each fungus order. Should the parent cat of Category:Agaricales genera be Category:Agaricales or Category:Fungus genera? Would like some advice on how best to organize this before I start making 100s of changes. Sasata (talk) 19:11, 23 June 2014 (UTC)

Biology categorization is usually based on biological taxonomy systems - I'd suggest bringing the discussion to a WikiProject about biology or similar, then if there are questions once you've developed an approach, bring them here for broader categorization advice.--Obi-Wan Kenobi (talk) 20:12, 23 June 2014 (UTC)

Exceptions to the rule that members of subcategories should always fit into the supercategories they inherit

@Hyacinth, Obiwankenobi: I'm afraid I don't understand this edit summary. Under what circumstances should a page belonging in category X be allowed not to fit into the supercategories of X? Lacking any further information, I would understand this as a sign that the subcategory has not been properly categorized. Frankly, I'm baffled. Paradoctor (talk) 20:05, 28 June 2014 (UTC)

Categories aren't really like mathematical sets. For example, Bibliothèque municipale de Nancy is in Category:Buildings_and_structures_in_Nancy,_France which is in Category:Buildings_and_structures_in_France_by_city which is in Category:Cities_in_France. But the Bibliothèque is clearly not a city in France. This sort of inconsistency abounds here. Generally we try to keep it clean one level up, but even that isn't always possible - there are sometimes entries in the subcategory that wouldn't really fit perfectly in the parent - this is the case for example all across the Category:Ireland tree, where many items have dual parenting of UK and Ireland (due to the complex nature of the politics there). Thus, it's much better to leave that flexibility in, and leave it to people's judgement to decide when adding a super category that supports 99% of the content is reasonable, or not.--Obi-Wan Kenobi (talk) 20:15, 28 June 2014 (UTC)
Yep.

If we were to treat categorization as an exact tree-like hierarchy (like a phylogenetic tree) instead of a network of relationships, it would be impossible to have a single integrated category system because the whole universe of topics just doesn't relate in that way, and we'd just have a lot of confused readers who couldn't find the articles they were looking for. The purposes of grouping related articles and aiding reader navigation trump any strict classification. I've never encountered anyone honestly confused about whether the Eiffel Tower is a member state of the EU despite its placement deep down in that structure (Eiffel Tower -> Cat:Landmarks in France -> Cat:Visitor attractions in France -> Cat:Economy of France -> Cat:France -> Cat:Member states of the European Union), though there has been the occasional editor who nevertheless complains that this violates some kind of conceptual consistency that ultimately has no relevance or practical value in this context. postdlf (talk) 20:46, 28 June 2014 (UTC)
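What the software actually records is only reachability along parent links; "is a member of" in the set sense is a reading people add on top. A minimal Python sketch over a hand-copied excerpt of the Eiffel Tower chain above; the `PARENTS` dictionary and the `ancestors` function are illustrative assumptions, not a MediaWiki API.

```python
from collections import deque

# Hypothetical excerpt of the category graph from the example above:
# each key is a page or category, each value lists its direct parents.
PARENTS: dict[str, list[str]] = {
    "Eiffel Tower": ["Landmarks in France"],
    "Landmarks in France": ["Visitor attractions in France"],
    "Visitor attractions in France": ["Economy of France"],
    "Economy of France": ["France"],
    "France": ["Member states of the European Union"],
}


def ancestors(node: str, parents: dict[str, list[str]]) -> list[str]:
    """Return every category reachable by following parent links, breadth-first."""
    seen: list[str] = []
    queue = deque(parents.get(node, []))
    while queue:
        cat = queue.popleft()
        if cat in seen:
            continue  # category graphs can contain cycles; skip repeats
        seen.append(cat)
        queue.extend(parents.get(cat, []))
    return seen


print(ancestors("Eiffel Tower", PARENTS))
# ['Landmarks in France', 'Visitor attractions in France', 'Economy of France',
#  'France', 'Member states of the European Union']
```

The last ancestor is reachable, but nothing in the graph says it describes the page; whether each step is a subset or a subtopic relation is exactly what the rest of this thread argues about.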

The examples you two gave are not examples of exceptions to the rule. Starting with Cities in France and Tourism in France, we have topic categories, so in both cases the articles actually belong into all their own supercategories in their own right. Could it be that, until now, you guys have not really understood the category system? ;) Paradoctor (talk) 20:59, 28 June 2014 (UTC)
I'm in full agreement with Postdlf. We should be serving our readers first and foremost – I believe the conceptual relationship model is more intuitive than an exclusive hierarchy one. That arrangement would mean something like Category:Men would need to include all the categories in the "men by X" subcats in order to make the hierarchy work. On top of that everything from men's health, culture, given names etc would be out as the contents of those categories are not instances of men. SFB 21:02, 28 June 2014 (UTC)
Did you read my above reply? What you said has no relation to what I said. If the exceptions alluded to in the guideline are the same kind as presented here, then they are no exception, and the parenthetical remark should be removed. If they are about something else, I'd like to know, because I can't come up with anything that might make sense. Paradoctor (talk) 21:21, 28 June 2014 (UTC)
"Under what circumstances should a page belonging in category X be allowed not to fit into the supercategories of X?"
There are a great many. Mediawiki categories are navigational, not defining. They just don't define exact membership criteria. It's simplistic and wrong to act as if they do.
As one of the most obvious examples, implied membership of a supercategory is associated with both the category (this category is a member of the supercat) and also its members (all members of this category are implicitly transitive members of this supercategory). It's sometimes the case that the first is ordinal and clearly defined while the latter is much more vague. As a trivial example, Brunel is in Civil engineers of England, Bridges of Brunel is a subcat of Brunel, but bridges aren't themselves civil engineers. Andy Dingley (talk) 22:07, 28 June 2014 (UTC)
Seriously, I'm getting the impression I passed through a couple of mirrors back there. Category:Bridges of Isambard Kingdom Brunel is not a subcategory of the non-existent Civil engineers of England. It is a subcategory of category:Engineering, but that is correct, as that is a topic category, and bridges are clearly an engineering subtopic.
Now, to make it unmistakeably clear: I am aware of the distinction between set categories, topic categories, and set-topic categories. All examples given here so far of alleged "inconsistent" categorizations are, in fact, entirely in accord with the guideline, and are not "exceptions" to the rule. Paradoctor (talk) 23:00, 28 June 2014 (UTC)
(edit conflict) Could this be resolved by changing a definite article to an indefinite, that is to say a page in cat X should belong to some parent of X? Bridges of Brunel would also be categorized under Bridges, Man-made structures, & so on, likewise Bibliothèque de Nancy under Libraries as well as Europe. It’s pretty much a given that where ‘orthogonal’ trees intersect, not all the branches will represent hierarchical relations. From another angle, geographical cats can be conceived as having a special kind of relation with the people and things they contain, and likewise personal cats with their works & activities; unfortunately the software gives us only one kind of cat to work with, so semantic & structural differences have to be established through practice.
What would be more interesting to me is an example of a (fairly low-level) cat whose members fit none of its supercategories.—Odysseus1479 23:22, 28 June 2014 (UTC)
I would settle for an example where at least one supercategory does not fit. All examples given so far (library, Eiffel Tower, Brunel) do not constitute a problem. They all fit into all their supercategories. What we need is an example of a page that is correctly categorized into some category, but does not actually belong in one of the inherited categories. The problem at hand has nothing to do with hierarchies or orthogonality, and I said nothing to that effect. Paradoctor (talk) 00:08, 29 June 2014 (UTC)
I’ve been interpreting “supercategories” and “inheritance” to include all the nodes on the way to the root, not just the level immediately above the cat of interest, as ISTM others commenting here have done as well. If that’s not what they mean, I’m quite at a loss as to the purpose of the guideline in question to start with.—Odysseus1479 00:25, 29 June 2014 (UTC)
I interpret it the same, and though I can't speak for the others, they seem to do so, too. And yes, the purpose of the parenthetical remark is the problem. Talking of exceptions when there appear to be none is pointless. Which was my point to begin with. Paradoctor (talk) 00:47, 29 June 2014 (UTC)
I guess I'm confused. There are exceptions - i.e. there are cases where we add a super-category to a given category, yet not ALL direct members of that starting category properly belong in the super-parent. I already gave the example of Ireland, where due to the history, Northern Ireland categories are parented by both "Ireland" and "UK" categories. Thus, Category:Barristers_from_Northern_Ireland is in both Category:Irish barristers and Category:British barristers - however, technically, there are "Irish barristers", who are properly described as being from Northern Ireland, but who absolutely would not identify as British Barristers. I know @BrownHairedGirl: is busy now, but perhaps she'd like to share other examples. In general, however, we try our best to have a rough subset relationship, but as soon as you go one or two levels down, the notions of subset are lost, especially when dealing with topic categories, where things get a lot woolier.--Obi-Wan Kenobi (talk) 03:05, 29 June 2014 (UTC)
This would be the kind of miscategorized category I mentioned right at the start. Just create subcategories "Barristers from Northern Ireland who do not identify as British" and "Barristers from Northern Ireland who do identify as British", put them under "Barristers from Northern Ireland", and move the "British barristers" categorization to "Barristers from Northern Ireland who do identify as British". Problem solved, right? Paradoctor (talk) 11:51, 29 June 2014 (UTC)
You asked for an exception; one has been provided. The suggestion that this exception can be "fixed" by splitting all Northern Ireland categories suggests that you haven't spent much time working in that tree. Sometimes we tolerate slight inaccuracies in parenting in order to save work and keep things consistent and a bit simpler. Eponymous categories also cause similar issues - the eponymous category for Paris may contain things that wouldn't fit in the parent categories in which it finds itself.--Obi-Wan Kenobi (talk) 12:49, 29 June 2014 (UTC)
"you haven't spent much time working in that tree" That's right. Maybe I should, considering how obvious the solution is. Or are there problems with the solution I provided? If so, please let me hear them.
"save work" We're all volunteers and work exactly as much as we want to. Mistakes happen, and nobody says they have to be corrected by you personally this very instant. But that doesn't mean they shouldn't be recognized as such.
"keep things consistent" How do inaccuracies keep things consistent? Please remember that categorizations have to be verifiable. Categorizing a self-identified Northern Irish barrister as British is a clear violation of WP:CAT.
"Eponymous categories" Examples please? So far, all provided examples were either not exceptions at all, or could be easily fixed by proper categorization according to the guideline. Paradoctor (talk) 13:50, 29 June 2014 (UTC)
example: Category:Measurement works nicely with its eponym, which is how {{cat main}} works.
subsets of a Partially ordered set need not include its apparent exceptions. During the process here on this talk page, it would be necessary to identify its outliers and recognize there is a larger definition in play during the discussion of category. i.e. assume good faith, please --Ancheta Wis   (talk | contribs) 18:03, 29 June 2014 (UTC)
"example" Misunderstanding, my bad. I want an example of a specific page in some eponymous category that does not belong in one of the supercategories of that category.
"assume good faith" I beg your pardon? Please tell me where you think I didn't do that. Paradoctor (talk) 22:13, 29 June 2014 (UTC)
"splitting all Northern Ireland categories" I didn't suggest that. From what I understand, this problem only relates to people of Northern Ireland, as they can choose whether to identify as Northern Irish or British. Other than that, what needs doing, needs doing, I don't see a problem with that. Paradoctor (talk) 13:55, 29 June 2014 (UTC)
"notions of subset are lost" There is no reason to believe that. The subset relation is transitive, as is the subtopic relation. The Northern Ireland example rests on a categorization mistake. I still have not seen any exception to the rule. Paradoctor (talk) 12:26, 29 June 2014 (UTC)

Any way of categorization we choose will need to be explainable. Surely other people have categorized objects/subjects before, have ideas/systems/principles about how to do it, and have explained them. Hyacinth (talk) 00:57, 29 June 2014 (UTC)

This discussion is not about changing categorization in any way. I've come here to find out what the parenthetical remark in "ensure that the members of the subcategory really can be expected (with possibly a few exceptions[clarification needed]) to belong to the parent also" is supposed to refer to. I pinged you because you added the {{clarify}} tag. Paradoctor (talk) 11:51, 29 June 2014 (UTC)
The idea of creating "Barristers from Northern Ireland who do identify as British" is preposterous and distracting. Let's get to the point: an example would be that Buddhism in Costa Rica (in Category:Religion in Costa Rica) should not be in Category:Religion by country, as it is an example of one religion in one country, and not all religion in a country, which is the scope of the parent. SFB 15:13, 29 June 2014 (UTC)
Religion by country is not a set category, it is a topic category. Buddhism in Costa Rica is a subtopic of Religion by country, so it does, in fact, belong in that category. Paradoctor (talk) 22:13, 29 June 2014 (UTC)

I agree with Obiwan's example of the barristers categories, and the suggested subdivision based on how they identify would only hinder navigation. That kind of hairsplitting and effectively triple categorization is contrary to WP:OCAT and not something that should be imposed on the category system, but something to be explained in article text, which is ultimately what anyone should be looking to for a full understanding of the subject.

Another example might be U.S. cities that extend into multiple counties. The respective county categories would then be appropriate parents for the city's eponymous category, despite the fact that most articles included in it will only be physically present or otherwise relate to only one of those counties. Depending on the city, subdividing every city subcategory further by county probably won't make much sense, as this would just fragment things at the city level thus making navigation difficult, with really no added benefit except to the (hypothetical) very few who might confusedly think, merely by wrongheaded inference of the category structure, that every high school in the city is somehow present in all three counties it extends into. Again, that kind of genuine category-inspired confusion is not something I've ever actually encountered, and even if someone were confused it would be easily cleared up by a) reading the articles or b) asking a question ("Hey, does Foo High School extend into three counties just because its parent city does?" "No." "Okay, thanks.") postdlf (talk) 16:22, 29 June 2014 (UTC)

WP:OCAT does not apply, none of the criteria is satisfied. And specifically how would navigation be hindered? Most of all, why should WP:V be overridden?
"multiple counties" That is again a case of miscategorization. Any such city is never a proper subtopic of any of the county categories, so it simply doesn't belong under any of them. This is different from mono-county cities, which can safely be put under their (only) county category. Just move the county categories deeper, and the problem is solved. Let me stress that: Putting county categories above a multi-county city's category is a mistake, it violates WP:CAT. Paradoctor (talk) 22:13, 29 June 2014 (UTC)
A city that extends into multiple counties poses the same categorization issue as a country that extends into multiple continents, which is why for example Category:Russia is in both Category:Countries in Europe and Category:Countries in Asia. Are you claiming that it should be in neither, thus leaving those relationships uncategorized as if Russia was not a country in any continent? Please explain. postdlf (talk) 22:41, 29 June 2014 (UTC)
Thanks for pointing this out. Russia is not in any continent, at least not in the sense of "entire territory is a part of". That leads to St. Petersburg, a European city, being in category:Asia. Actually, there is a more direct path to an even clearer version of the same mistake: category:Cities and towns in Russia puts St. Petersburg among the category:Cities in Asia, and Vladivostok (which lies east of China) is suddenly a European city. That's about as wrong as you can get.
It appears that this kind of faulty categorization has been going on for a while. Spiffy. Paradoctor (talk) 01:00, 30 June 2014 (UTC)
Paradoctor, I don't recall your name participating in previous discussions at WP:CFD, nor at discussions around this guideline. That's fine, everyone is welcome, and you are asking good and provocative questions. However, suggesting that we've all been "miscategorizing" all these years and that you have simple solutions is a bit daft. Russia being in Europe and Asia is a classic example; there's no reasonable way to solve this using our current category system unless we accept this inconsistency. I've spent a great deal of time working in categories, and I've come to accept that there are, quite simply, "exceptions" to the general principle of sub cat. I don't think there are great ways around this given the complexity of our universe, so it's better to settle for "pretty much subsets" which is what we have now.--Obi-Wan Kenobi (talk) 01:13, 30 June 2014 (UTC)
"daft" Can we please keep to discussing the subject at hand?
"simple solutions" a) Where did I say I had a simple solution? b) Believe it or not, simple solutions get overlooked all the time. c) If a solution works, why would you complain about its simplicity?
"there's no reasonable way to solve this using our current category system" That would be an issue if WP:CAT was actually correctly implemented. I contend that if it was, the problem should vanish. For more, please see below. Paradoctor (talk) 19:06, 1 July 2014 (UTC)
+1 on what Obiwankenobi said above. I agree that in many cases, we have to simply settle for a system where things work "pretty much", but not always precisely. From my view, accepting this is better than the alternatives. Good Ol’factory (talk) 01:33, 30 June 2014 (UTC)
-1 on your +1 Paradoctor (talk) 19:06, 1 July 2014 (UTC)
Yes, I think anyone who reads this thread understands your position. This was my first comment in the thread, and I'm entitled to make a comment, and I'm not sure it's too helpful to reply to my comment with a statement that amounts to nothing more than a statement that you disagree, since that was already clear in your previous comments. Good Ol’factory (talk) 21:40, 1 July 2014 (UTC)
I think at this point we should just direct Paradoctor to go back and reread our first replies to his original post and call it a day, as our comments from the start were directed at exactly this type of rigid and impractical interpretation of our clumsy though useful category system, an interpretation that is in any event counterfactual in the specifics (Russia is, in fact, a country in both continents) and disregarding of both the plain language and clear consensus-supported intent of category names ("in" ≠ "exclusively in"). The fact that Category:Russia has been categorized as both a European and an Asian country since the category's creation ten years ago should tell you something about what is considered "miscategorization" here. So good day, sir or madam, as the case may be, and I hope you may learn something from what everyone else has said here once you've had time to reflect upon it. postdlf (talk) 01:40, 30 June 2014 (UTC)
"rigid and impractical" Considering the information at Continent#Number of continents, I wonder a) why Eurasia gets so few mentions, and b) why the insistence on categorizing Russia under Europe and Asia instead is not called "rigid and impractical".
"in" I disagree.
"The fact" It's not exactly a revelation that some mistakes take longer than others to die. We have a long, venerable tradition of releasing greenhouse gasses into the atmosphere. Not a reason not to stop.
"sir or madam" You may address me as "Sir" or "Your Lordship", I'm not picky. Paradoctor (talk) 19:06, 1 July 2014 (UTC)

To be continued

As this discussion has branched into issues not directly about the guideline itself, I will see if I can't get some clarity about the facts involved before we continue here. Anyone interested is invited to head over to

I will extend the list as the story unfolds. Paradoctor (talk) 19:06, 1 July 2014 (UTC)

We have clarity; everyone disagreed with you. That you still consider yourself correct and everyone else wrong is really uninteresting at this point. I started a discussion about your Russia recategorization at Wikipedia talk:WikiProject Russia#Recategorization of Category:Cities and towns in Russia, as that's really outside the scope of this page outside of it being a mere example in the discussion here. I strongly suggest you wait to see the outcome of this before you make any other such changes. postdlf (talk) 19:43, 1 July 2014 (UTC)
Having acquired some additional detachment, I see that the situation might benefit from additional information pertaining to postdlf's inquiry about my intentions (not "intent", which has connotations I don't particularly like).
From the discussion so far, a few things became clear to me.
  1. There is a serious discrepancy between WP:CAT and its implementation. This means that at least one of them needs to be changed to reflect the other.
  2. If I'm right, the problem has been here for a long time, and very probably exists not just in the Northern Irish, Russian and American categories.
  3. If I'm right, a lot of people will have to be convinced to change their categorization habits.
  4. I think the potential solutions to the issues at hand would be rather straightforward; judging from the reactions so far, either I'm smarter or wronger than I thought.
  5. It is possible that, even if I'm right, there could be enough resistance to prevent fixing the problem.
Since I am not a masochist, I won't pursue this matter if the latter should turn out to be true. For this reason, I have started with a clear-cut case. I'll decide how to go on depending on the outcome of that one. Paradoctor (talk) 10:39, 2 July 2014 (UTC)

Wikipedia:Categorization#Template categorization

I'd like to challenge that. Those clicking the category link who are not editing would not be inconvenienced. Those clicking the category link who are improving the encyclopedia would find it important for this to be there. Anna Frodesiak (talk) 06:01, 4 July 2014 (UTC)

If you are suggesting that templates (and possibly, by extension, other bits of wp "plumbing") be placed under Category:Articles then I oppose. There are other ways for an editor to find a template - e.g. from an article, from a template category or from a (talk page) wikiproject category. DexDor (talk)
Hmmmmm, "other ways...from an article", that's a good point, and "plumbing", good point again -- why clog the cat with non-article hairballs. Anna Frodesiak (talk) 06:43, 4 July 2014 (UTC)
There are also a couple of other reasons: If templates were allowed under Category:Articles then it'd be less easy to spot templates that aren't categorized (correctly) under Category:Wikipedia templates and we'd get more categories appearing at CFD that contain just one article (usually an eponymous bio) and a corresponding template. DexDor (talk) 19:16, 5 July 2014 (UTC)

Why aren't attributes used instead of categories?

What is the rationale for using the current category system over using an attribute/tag system? For example, if we look at Jim Koch (who brought the beer Sam Adams to market) and Adolph Coors, we might say they are both male, and brewers. Thus we could create a "Male Brewers" category. But why not just tag both articles with "male" and "brewer" and then readers can see all the tags and then create their own category by selecting the desired attributes. Two kinds of pork (talk) 18:02, 5 July 2014 (UTC)

I'd like to also see such a system implemented, as I think it would accomplish a lot that the category system cannot but is often used for anyway (such as intersecting a multitude of different facts). Hopefully a tag system would also handle synonyms properly, so we wouldn't have endlessly picky renaming discussions over prepositions or capitalization (though disambiguation would continue to prove a problem...). But a tag system wouldn't provide the network of relationships that the category structure does, which aid navigation and browsing. So I don't think such a system would properly replace categories as we know them, but instead be complementary to it. postdlf (talk) 18:41, 5 July 2014 (UTC)
I agree that tagging bio articles as male/female (and a few other things like LGBT) would be useful; there are continual discussions at WP:CFD over categories for "Female fooers" etc. However, IMO, it would be better to extend the existing category system (i.e. create "Category:Male people" etc and improve category intersection) than to create a totally separate system. DexDor (talk) 19:24, 5 July 2014 (UTC)
Can you elaborate on this "network of relationships"? I'm not sure I understand how it is being used. Two kinds of pork (talk) 20:45, 5 July 2014 (UTC)
This has been an outstanding feature request since 2006 at Wikipedia:Category intersection. By network of relationships, I presume Postdlf is referring to following a category tree down various levels. There's nothing to say this couldn't be maintained as a feature with category intersection. SFB 09:30, 6 July 2014 (UTC)
By "network of relationships", I'm not only thinking of vertical navigation, but horizontal, like hopping from "American brewers" to "German brewers", or just grouping all related content together, like brewers being closely categorized with breweries, with beer, etc., so you can easily click from one to the other like following a web. I think the OP is proposing something different than category intersection. A "brewer" tag would just show you all the articles tagged with "brewer" (just like looking at the contents of one category), without a way to get to the breweries (as if that "brewer" category were uncategorized and thus had no connections). Unless the tags themselves are somehow integrated into a network...but then we'd just have the same thing as our current category system, wouldn't we? postdlf (talk) 14:50, 6 July 2014 (UTC)
@Postdlf: Not really, as a database call would be populating all related parents, contents and children, rather than a purely static system as we have now (although I appreciate that may be hard to conceptualise). Contents could be populated by locating all articles containing the selected attributes, children could be populated by cross-referencing shared additional attributes within that article set, and parents would be populated by reductions of the given attributes. Parents and children would also use a user-defined semantic tree that links together related/descending ideas (e.g. Breweries and brewers). The last part would be the only part that would function as the system does now.
I think technical issues remain for this to be implemented as the sheer size of database calls on large categories would be prohibitive. SFB 20:06, 6 July 2014 (UTC)
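SFB's description can be read as three queries over an article-to-tags table. A minimal Python sketch under stated assumptions: the tag sets below are illustrative (loosely based on the Jim Koch / Adolph Coors example), the function names are hypothetical, and the user-defined semantic tree linking related ideas such as brewers and breweries is left out.

```python
# Illustrative article -> attribute tags (an assumption, not real wiki data).
TAGS: dict[str, set[str]] = {
    "Jim Koch": {"male", "brewer", "American"},
    "Adolph Coors": {"male", "brewer", "American", "German-born"},
}


def contents(selected: set[str], tags: dict[str, set[str]]) -> list[str]:
    """Articles carrying every selected attribute (the ad hoc 'category')."""
    return sorted(a for a, t in tags.items() if selected <= t)


def children(selected: set[str], tags: dict[str, set[str]]) -> list[str]:
    """Further attributes found within the result set: candidate narrower views."""
    extra: set[str] = set()
    for article in contents(selected, tags):
        extra |= tags[article] - selected
    return sorted(extra)


def parents(selected: set[str]) -> list[list[str]]:
    """Drop one attribute at a time: candidate broader views."""
    return [sorted(selected - {attr}) for attr in sorted(selected)]


query = {"male", "brewer"}
print(contents(query, TAGS))  # ['Adolph Coors', 'Jim Koch']
print(children(query, TAGS))  # ['American', 'German-born']
print(parents(query))         # [['male'], ['brewer']]
```

Each query here scans every article, which is one way to see the cost concern above; a real implementation would at least need an index from each tag to the articles carrying it.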

Proposed change

I'd like to propose a change to align this guideline with actual practice in categorization wrt 'defining'. This would be added to the section on defining after the Italian painter example.

Proposed addition:

A category embodies one or more defining characteristics. In the case of intersection categories (e.g. Category:Indian women journalists or Category:Defunct prisons in Paris), there is no requirement that sources commonly discuss the subject in terms of the full intersection of characteristics - it is sufficient that sources can be found to verify each of the defining characteristics independently (e.g. Indian and woman and journalist or prison and in Paris and defunct).
One exception to this rule is categories that intersect religious or social/political stances with occupations - generally, we should not classify people based on the intersection of views + occupation unless that particular intersection is defining for their work. For example, a writer who was raised as a Roman Catholic would not necessarily be placed in Category:Roman Catholic writers unless sources demonstrate that the intersection itself is defining for that person; similarly, a person who simply self-identifies as a feminist and who also happens to be an artist should not be added to Category:Feminist artists unless that intersection can be shown to be defining.

Thoughts? --Obi-Wan Kenobi (talk) 15:37, 13 July 2014 (UTC)

I suggest that we strongly deprecate "multi-part categories" where the individual categories involved have no obvious direct linkage at all. This would be broader than your religious exclusion by a great deal:
Categories should present sets of articles that are closely and clearly related to each element of the category. In many cases, categories will thus be limited to a single well-defined term, allowing readers to look for articles fitting multiple categories rather than looking for a single category with multiple terms. If the article does not clearly relate the multiple terms to the subject of the article, the category ought not be used.
I think this fairly represents my opinion about the excessive compartmentalization found in too many current categories ("Nineteenth century women English writers specializing in anthropology" would be an excessively detailed category when the separate single term categories can be easily searched for.) Collect (talk) 16:03, 13 July 2014 (UTC)
I agree that excessive intersects should be deleted, but a number of such categories have been kept by broad consensus - American women novelists being a famous example. Given that, do we really want to limit membership in that category to those who are commonly introduced as 'American women novelists' and remove all who are not introduced in that way? Some people have interpreted defining in this way. I think defining should apply to individual characteristics, but if the intersection of LGBT + politician is considered to be a separate subject of study, then all people who are both known politicians and known to be LGBT should be in that category - the same applies for ethnicity and gender. --Obi-Wan Kenobi (talk) 17:07, 13 July 2014 (UTC)
Many of the categories are grandfathered in a sense, but that does not mean we ought not deprecate further such categories, IMO. Anyone wishing to find LGBT politicians should reasonably be expected to be able to search for "+LGBT +politician" IMO. Right now Wikipedia has a nearly astronomical number of categories and subcategories which, AFAICT, are exceedingly rarely searched for, and only clicked on "because they exist for an article", and not as a result of seeking the category out per se. Collect (talk) 18:02, 13 July 2014 (UTC)
I'm a believer in incremental improvement. Category intersection has been discussed for 8 years and I've even done some work on it with Magnus Manske - see my user page for detail. But I think it's off topic here. Pending that, which may still be a long way away, how should such intersections be populated if it is agreed the category should be kept? We have many thousands of these categories, and while some are put up for deletion many survive. For those that do survive, should we fill them up, or only fill them when the intersection itself can be proven defining for each individual bio? I note that if we did that we would remove hundreds or thousands of biographies from cats they're currently in, since we wouldn't be able to demonstrate 'defining' for most individual biographies - for example the LGBT cats would be emptied of ~80% of their contents if we required that LGBT + job be defining for every single person in those categories.--Obi-Wan Kenobi (talk) 18:13, 13 July 2014 (UTC)
The problem with limiting inclusion in intersection categories to only those "known for" the intersection is that it then makes the categorization difficult to predict and further fragments the organization of articles. Every article that fits in a category according to its plain meaning should go in it. Otherwise, we are excluding articles on the basis of subjective judgments as to the intersection's importance to that subject as an intersection. Some LGBT politicians would be in an "LGBT politicians" category, others not, and there would be no clear reason for the separation apparent from the category itself. I had long ago opposed demographic (ethnicity, religion, sex) categories entirely, particularly when those traits are intersected with occupations, in favor of limiting them to lists that can be annotated to show the importance for that individual of being a Korean-American wrestler or whatever. But I lost that argument about a decade ago and consensus has repeatedly supported keeping at least some such categories, at least where the intersection could be shown to be notable or significant as a whole even if it wasn't for literally every qualifying individual. postdlf (talk) 18:20, 13 July 2014 (UTC)
The world has changed in ten years - as has Wikipedia. We no longer proudly import EB articles from 1911 either <g>. Ten years ago, "bilateral relations of country a and country b" would likely have been kept. Now we are more discriminating. Collect (talk) 20:54, 13 July 2014 (UTC)
Then get some changed results at CFD to show it. I don't think you'll succeed. postdlf (talk) 22:53, 13 July 2014 (UTC)

Thanks for broadening the wording. Might

Categories use 'defining characteristics'. Complex categories require that each characteristic be sourced for members of the category. A wooden prison in France is easily in a category of "French prisons made of wood". In biographies, a mix of religious, social or political stances with occupations should be avoided unless sourced as important to that person. A person who is English, a Labour supporter, Methodist and a writer is not in "English Methodist writers who are Labour supporters" without a source for the combination.

be satisfactory? Reading Ease is finally up to 36 (up from 22) which is not great, but this sort of stuff generally is tricky to simplify too much (down almost a hundred words). Collect (talk) 15:21, 14 July 2014 (UTC)

I think you're still confusing the standard for creating and keeping a category with the standard for including an article in a category. We would not keep a conglomerate category like "English Methodist writers who are Labour supporters" unless sources show that this specific intersection is generally relevant or significant, or that it's of a kind of intersection that has been shown to be meaningful for the subject (e.g., if sources support "Scottish Methodist writers who are Labour supporters" and "Welsh Methodist writers who are Labour supporters", then "English Methodist writers who are Labour supporters" makes sense). We would then not second-guess whether a particular article belongs in that category if the individual components of the category are all verifiable for that subject. postdlf (talk) 16:00, 14 July 2014 (UTC)
Yes. I think it's a general confusion - in what ways does DEFINING refer to whether a category should exist "at all", and in what ways does it refer to when we can place contents within? For example, we regularly divide articles by geography, even if such geographical intersections can't be proven to be defining - they are used to split the size of large categories, and are almost always accepted without issue.--Obi-Wan Kenobi (talk) 16:07, 14 July 2014 (UTC)
My understanding is that it refers to whether a category should exist at all. This is how it is almost always used in practice, and it is the only practical way to interpret it. A U.S. President should be in the category for lawyers if they ever practiced law or in the category for soldiers if they served in the army, even though they are "defined" as a U.S. President and likely would not have achieved notability because of that earlier career. And I see that, for example, Barack Obama, a FA, is in the appropriate lawyer categories. postdlf (talk) 16:27, 14 July 2014 (UTC)
But an actor who worked as a waiter would not be in Category:Restaurant staff. So we do tend to apply a quasi-version of the DEFINING test to occupations; in the same way, not every writer who once wrote a poem is a poet. My suggested change above, however, is different - I'm suggesting "If the person is in Category:American poets, and they come out publicly as gay, they can be put in Category:LGBT poets."--Obi-Wan Kenobi (talk) 16:38, 14 July 2014 (UTC)

Anent the hypothetical example given: we do not have 'Category:Presbyterian US Presidents', which is the kind of category dealt with here. Last I checked, "lawyer" is neither a religion nor a social or political stance. As for "but absurd categories already exist", that is pure WP:OTHERSTUFFEXISTS and does not affect this suggested wording. Cheers. Collect (talk) 17:01, 14 July 2014 (UTC)

(mainly a reply to Obiwan) I think there's kind of a general issue of "how much do you need to do Foo before you are a Fooer?" That's far short of reaching "defining", however. For some occupations or positions, you either are or you aren't that thing for however long you're doing it (you either are or aren't admitted to the bar and have clients, you either are or you aren't sworn in as a congressman even if you die in office the next day). For other occupations, their main activity coincides with what are merely hobbies for many people (such as painting, basketball playing, etc.), or there are different levels to the job such that some people do it as a purely part-time gig during college while for others it's a career (such as restaurant work). The latter are trickier. Though I've seen even "lawyer" given as an example for the "he's not defined as that, so don't categorize him as that" way of thinking. postdlf (talk) 17:35, 14 July 2014 (UTC)
Yeah, but there are certainly business people who have law degrees and may have practiced law for a year, but whom we don't call lawyers. So there is a threshold for occupations, and WP:DEFINING is as close as we have for now, but it's a bit stringent and in practice we're a bit more flexible than DEFINING itself.--Obi-Wan Kenobi (talk) 17:45, 14 July 2014 (UTC)
I think we're falling into the defining/notability trap yet again. An actor's part-time job as a waiter has little to no bearing on what defines them (their acting work). Obama's work as a lawyer has clearly defined both his life and career path. The reality on the ground is that intersections of all attributes are fine with the exceptions of those noted at Wikipedia:Categorization/Ethnicity, gender, religion and sexuality. For this proposal I do think we need to divide up the ideas of (a) when a category should exist, and (b) when an article should have a category. Defining the first element clearly helps with defining the second. I don't think the above proposals have captured current practice well. Personally, I would think it a good idea if all EGRS categories were non-diffusing of the parent category without the EGRS attribute, but I know that is not current practice (yet?). SFB 17:49, 14 July 2014 (UTC)
Yes, all EGRS categories are supposed to be non-diffusing of the parent without said attribute. I agree on Obama. It sounds like you're suggesting refining the whole DEFINING guideline? My changes above are just to say, given DEFINING, do we need each intersection to be DEFINING, or is it sufficient to have each attribute be defining. Thus, if Obama is going to be in the lawyer category, can we put him in Category:African-American lawyers, or do we need to do a separate search of sources to confirm that?--Obi-Wan Kenobi (talk) 18:49, 14 July 2014 (UTC)
Obiwan: it's not a law degree that makes you a lawyer, it's being admitted to the bar. And we would require that they actually represented clients at some point, i.e., practiced law (even if that client is the government, or they served as in-house counsel to a corporation). Beyond that, it's not up to us to decide someone wasn't "really" a lawyer because they didn't do it for a long enough time. That's not a hair the categories should be splitting. At least not with professional categories, which because of the education and licensing involved if nothing else, shouldn't be treated like food service categories even if it proves only a temporary career over the course of a lifetime. postdlf (talk) 18:11, 14 July 2014 (UTC)
I see your point. But Jerry_Springer for example practiced law, but isn't categorized as a lawyer. I think it is reasonable to assume there will be people who have law degrees and who practiced law but who would not ever be categorized as "lawyers". Anyway, it's getting a bit off-topic from the original proposal, which was more around the intersections, not the jobs themselves.--Obi-Wan Kenobi (talk) 18:53, 14 July 2014 (UTC)
IMO the most practical (and useful) way to decide whether to categorize someone by jobs other than what they are best known for is to consider whether that job would/could make the person pass WP:NOTABILITY. E.g. Ronald Reagan is notable as an actor (and hence belongs in both actor and politician categories), but Clint Eastwood is not notable as a grocery clerk. DexDor (talk) 19:35, 14 July 2014 (UTC)
(edit conflict) "Is" does not imply "ought". And people who have worn a lot of hats will inevitably get a lot of categories.

On the intersections, we should look to sources covering the intersection (or that type of intersection) as an intersection before we create a category. That's how we guarantee that the category will be meaningful for at least the balance of applicable articles. Then once the category is created, then we simply need to verify each individual component of the intersection for each article subject. postdlf (talk) 19:38, 14 July 2014 (UTC)

DexDor, that's not the practice nor is that a workable standard. People are notable because sources write about them, not because they necessarily accomplish certain things. And splitting categories based on "why" people merit articles would just make categories unpredictable and incomplete. Someone who spent twenty years as a low profile lawyer, never making it into the news, but then got elected to Congress, would have the lawyer category omitted despite that being a significant part of their biography because it wasn't "why" they are notable. And no one is notable for being born in a certain year, for being alumni of a certain school, or being from a certain place; however, like occupation, these are standard categories for articles about people. postdlf (talk) 20:26, 14 July 2014 (UTC)
I don't think lawyers need a special pass here. An actor who worked as a singing waiter for 15 years before his big break would still not be categorized as Category:Restaurant staff in most cases. Occupation is the quintessential case for WP:DEFINING, though it must be admitted the standard is lesser than what WP:DEFINING currently says - but it's not "If you had a job, you get that category".--Obi-Wan Kenobi (talk) 21:18, 14 July 2014 (UTC)
From WP:DEFINING: "Definingness is the test that is used to determine if a category should be created for a particular attribute of a topic." It's a standard hashed out from a series of CFDs, not from category inclusion discussions on article talk pages. "In disputed cases, the categories for discussion process may be used to determine whether a particular characteristic is defining or not." Again, it's for deciding whether to create or delete entire categories. There's simply no basis for using it beyond that, as you are trying to do, to limit category inclusion beyond the plain meaning and definition of the category itself.[1] If someone who went to law school, passed the bar, and worked at a law firm for several years is not a "lawyer", then we have a fundamental disconnect here. There is no "People most known for being a[n]..." implied at the beginning of all occupation categories. It seems like you're instead trying to use a rule for resolving harder cases to decide the truly easier ones, and it's especially perplexing because as my earlier comment should show you have my basic agreement that there are some harder occupations that shouldn't be taken as literally. So we should focus on those rather than waste time by making every occupation category contentious. postdlf (talk) 21:39, 14 July 2014 (UTC)
I have always taken occupation categories to be based on the DEFINING standard, regardless of what is written this is how it is used. This is what DEFINING says: "One of the central goals of the categorization system is to categorize articles by their defining characteristics." - so defining is not just about existence of categories, it also (sometimes) applies to contents. Now, I take issue with the current language, since I think there are exceptions to this, and we certainly go BEYOND only categorizing things by their defining characteristics. However, to limit category clutter, I think we should limit which verifiable occupations/jobs we actually categorize by. If someone did a job for 10 years that is NEVER mentioned in any RS about them, save their own autobiography of what happened before they became famous, and if that job is not listed in the lede of their article, that is the very definition of not-defining in my book.--Obi-Wan Kenobi (talk) 21:53, 14 July 2014 (UTC)
Re "that's not the practice" response to my comment above. For occupation (I'm not talking about date-of-birth/death etc) categorizing by whether the person has achieved notability in that field is the current "rule" (see for example the text at Category:People by occupation which links to WP:Notability) and is generally followed in practice. Thus, Category:Actors is for anyone who is a notable actor (i.e. sources write about them as an actor) whether or not they are also notable for something else. Category:Flight attendants is for articles about people who achieved notability as a flight attendant, not for ex (non-notable) flight attendants who later achieved notability as model/actor/politician etc (of which there are probably hundreds). P.S. I agree that "most known for" is not the way to use these categories - e.g. if someone is notable as a sportsperson they shouldn't have that category removed from them by later becoming better known as a TV presenter. DexDor (talk) 22:06, 14 July 2014 (UTC)
Obiwan: What you take it to be and what it is are not necessarily the same thing, particularly given that the sentence you quote is stated as the purpose behind subjecting category creation and retention to DEFINING, as is clear from what I quoted (not that wikilawyering this is at all constructive). And whether something is listed in the lede at any given time certainly doesn't determine what that article can or should be categorized by. If an article doesn't even mention a fact, then it should simply not be categorized by that fact. If a fact proves unverifiable for a particular subject, then the article also, obviously, should not be categorized by that fact. I think you agree with those simple statements. But an article should be categorized by any applicable and verifiable fact for which a category exists, with applicability determined by the plain meaning of the category name and the stated category description (to the extent that it may attempt to add qualifiers to the category name). You seem to acknowledge that you are pushing for a narrower standard than that, rather than claiming that your preference is the standard and somehow supported by consensus. The additional criteria you are trying to impose on a wide swath of categories are not and never have been supported by the names of the categories, their category description pages, or editor practice for those categories, let alone clear guideline language. Again, you're really shooting yourself in the foot over this if your real concern is the difficult categories, which can be dealt with in part through appropriate category descriptions. Otherwise you're just going to generate across-the-board opposition.

DexDor: Given that the container category Category:People by occupation lists "hobbies" as one of the things to categorize people by in its subcategories, I don't think you want to give too much weight to its description page, whatever the origins of that language. It certainly doesn't somehow dictate inclusion criteria for every subcategory. And as notability is a standard for including articles on Wikipedia, not for including information in articles on Wikipedia, it's not a meaningful modifier there. postdlf (talk) 22:28, 14 July 2014 (UTC)

We are a tad afield - no one is proposing that simple categories be eliminated at all. The issue is whether sexual orientation, political, social or religious attributes should be used in categorizing people where the combination of such attributes is not noted in reliable sources. At WP:BLP the discussion has already resulted in a requirement that ethnic (national), gender identity, and religious orientations must be based on self-identification and the extension here is to political beliefs as that is close in nature to the others with regard to self-identification. We ought not describe a person as a "Gnarphist" (for example) sans strong sourcing, and the best sourcing for beliefs in general is self-identification. Is there a problem in that part? Collect (talk) 22:40, 14 July 2014 (UTC)

Actually, there are several issues at play here.
  1. What is the test to determine whether a given, verifiable occupation category can be added to a bio? There seems to be some debate on this matter. On the one hand, we agree that not all actors who once worked as waiters should be in the waiters cat. On the other hand, we don't want to only categorize by the person's most famous job. The answer lies in between.
  2. Given that a person is properly categorized as occupation X and happens to be African American or gay or female, and African-American X exists as a category, can we stick them in that intersection category without further research or evidence? Note: this would also apply to things like defunct prisons in Paris, so it's not only about bios.
  3. Collect, you are questioning the very existence of such categories; in that case the proper forum is WP:EGRS to tighten criteria for creation, or CFD to establish that many more of these should be deleted.
Would y'all mind if we focused on #2, which is the question I started with? Note, just to be clear: my proposed language, which clearly needs work, refers to categorization of articles in categories that already exist, not whether a given category should/could be created.--Obi-Wan Kenobi (talk) 23:07, 14 July 2014 (UTC)
To focus on #2, the location of a physical thing is always meaningful, so I wouldn't class "X in Fooland" as the same kind of intersection category as the biographical intersections, which basically pose issues unique to that context. postdlf (talk) 23:47, 14 July 2014 (UTC)
There are intersections other than biographical ones which have these complexities. For example, Category:Asian-American literature: should this include all authors who are Asian American, or only those whose literature is DEFINED as Asian-American literature? I'm sure there are other examples in our massive tree beyond biographies and geographical intersections. The question is how much the intersection itself must be discussed re: the subject in order for the subject to be so classified.--Obi-Wan Kenobi (talk) 11:05, 16 July 2014 (UTC)
But not all "intersections" are the same. Most intersected facts retain a literal meaning. Category:Bridges on the National Register of Historic Places in New York City includes articles on bridges that are on the National Register of Historic Places in New York City. Period. The meaning or scope of the individual terms does not change just because they've been joined in a phrase.

It is only sometimes that combined facts represent a term of art that is narrower than the literal meaning. This is then dealt with by an appropriate category description. If "Asian-American literature" is a term of art describing a recognized and notable genre or body of work rather than simply any literature that happens to have been written by Asian-Americans, then the category description should define what is meant by it and link to the corresponding parent article. Just like Category:Modern art is not simply for "art that is modern", but rather refers to a particular period of art history. This is made clear by the category description page, which gives a definition and links to the parent article, Modern art. I don't know that we'd ever have a category based on a term of art that was not notable such that there wasn't a parent article.

Because it's the term of art exceptions that need special, individualized treatment, there is no possible abstract rule that would appropriately determine inclusion in all such "intersection" categories nor should there be. Unless it's just the default rule: "Category names are not presumed to represent terms of art unless specified as such. Otherwise, category names should be taken by their literal and plain meaning and an article should be included so long as the fact or facts the category represents are verifiable, whether together or independently." But that's general practice now, and I'm not aware of any mass confusion on that point such that further clarification is necessary. postdlf (talk) 16:16, 16 July 2014 (UTC)

  • Oppose: Obi's proposition would add unnecessary complexity to the categorization guideline. It would also implicitly change the guidance of WP:DEFINING:
    • Not possible to change one guideline by discussion on the talk page of another guideline without proper notification. See also WT:OVERCAT#RfC on WP:DEFINING categorization guideline which shows a no consensus on such changes thus far.
    • Would make the guidance of WP:DEFINING unnecessarily complex
    • Obi's proposal uses the terminology defining in the first sentence without properly referencing the existing guideline on the topic; the result would be two *contradicting* guidelines on "defining" for categories/categorization, which may confuse editors, or at least result in incoherent editing depending on which of the two guidelines a non-suspecting editor encounters first. --Francis Schonken (talk) 06:51, 16 July 2014 (UTC)
Francis, this addition would be placed here, on this page, on this guideline, which already defines 'defining'. As far as I know defining is covered in two spots - here and in the overcategorization guideline. As this is the main policy page, we should fix it here first.--Obi-Wan Kenobi (talk) 10:41, 16 July 2014 (UTC)
@Obiwankenobi - you really don't get the "proper notification" thing do you? --Francis Schonken (talk) 10:48, 16 July 2014 (UTC)
Re. "this guideline ... already defines 'defining'" — indeed. I created the WP:CATDEF shortcut using an anchor that was already present at the spot, and added a shortcut template there.
That being done, I see even less reason to change the content of the guideline(s) w.r.t. defining. --Francis Schonken (talk) 05:19, 23 July 2014 (UTC)