Wikipedia talk:Categorization: Difference between revisions

From Wikipedia, the free encyclopedia
→‎Categories are not articles: Inserted command into my proposed addition - no intended change to meaning
Line 97:
I propose that [[WP:CAT]] should include the following subsection, under [[WP:CAT#Creating category pages|2. Creating category pages]]:
{{quotation|2.1 '''Content'''
Category pages are not article pages, so in general should not include text describing the ''subject'' of the category, except where required to help define the ''contents'' of the category as described in [[WP:CAT#Creating category pages|Creating category pages]]. Instead, hatnotes such as {{tl|Cat main}} should direct the reader to the relevant article.
}}
A few current examples that I think need cleaning up:

Revision as of 12:11, 24 August 2016

WikiProject Manual of Style
This page falls within the scope of the Wikipedia:Manual of Style, a collaborative effort focused on enhancing clarity, consistency, and cohesiveness across the Manual of Style (MoS) guidelines by addressing inconsistencies, refining language, and integrating guidance effectively.
This page falls under the contentious topics procedure and is given additional attention, as it is closely associated with the English Wikipedia Manual of Style and the article titles policy. Both areas are known to be subjects of debate.
Contributors are urged to review the awareness criteria carefully and exercise caution when editing.
For information on Wikipedia's approach to the establishment of new policies and guidelines, refer to WP:PROPOSAL. Additionally, guidance on how to contribute to the development and revision of Wikipedia's policy and guideline documents is available, offering valuable insights and recommendations.
WikiProject Categories
This page is within the scope of WikiProject Categories, a collaborative effort to improve the coverage of categories on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

OK to switch English Wikipedia's category collation to uca-default?

In the 2015 Community Wishlist Survey, the 5th most popular proposal was numerical sorting in categories (for example, sort 99 before 100). The WMF Community Tech team is ready to implement this, but a prerequisite for the change is that we must switch English Wikipedia's category collation from "uppercase" (a simple collation algorithm that sorts strings based on character values, but considers uppercase and lowercase letters the same) to "uca-default" (which is based on the Unicode Collation Algorithm (UCA), the official standard for how to sort Unicode characters). The most noticeable difference is that UCA groups characters with diacritics with their non-diacritic versions. So, for example, English Wikipedia currently sorts Aztec, Ärsenik, Zoo, Aardvark as "Aardvark, Aztec, Zoo, Ärsenik", but UCA collation would sort them as "Aardvark, Ärsenik, Aztec, Zoo" (with Aardvark, Ärsenik, and Aztec grouped under a single "A" heading, instead of under 2 separate headings). There are numerous other advantages to using UCA collation, but they are a bit technical to discuss, so I'll refer you to the documentation instead: [1][2][3]. If you would like to experiment with UCA collation, go to https://ssl.icu-project.org/icu-bin/collation.html. Set the collation to "und (type=standard)" (the default) and turn on numeric sorting in the settings. If anyone has any concerns or questions about switching to UCA, please reply here or in the Phabricator ticket. Thanks! Ryan Kaldari (WMF) (talk) 00:24, 25 May 2016 (UTC)[reply]

  • Support as proposed above. — xaosflux Talk 00:59, 25 May 2016 (UTC)[reply]
  • Support. Can't wait for numeric sorting to be implemented. — JJMC89(T·C) 03:18, 25 May 2016 (UTC)[reply]
  • Proper collation of diacritics is hardly a disadvantage. —Cryptic 05:18, 25 May 2016 (UTC)[reply]
Sure, by and large, but any coding change can have unexpected consequences: if certain ‘hacks’ have been premised on an observed behaviour of a system despite its being a bug, when it gets fixed those features will break.—Odysseus1479 17:58, 25 May 2016 (UTC)[reply]
  • Support, with seconding Cryptic's comment above Goldenshimmer (talk) 05:41, 25 May 2016 (UTC)[reply]
  • Comment Currently, all articles are being sorted just as the Unicode Collation Algorithm would do via the DEFAULTSORT parameter. So, diacritics/accent marks aren't currently an issue in articles. With UCA, fewer DEFAULTSORTs will be needed in non-biography articles in the future. However, this will alter most current non-biography talk pages, as |listas= is not set in those and the uppercase algorithm therefore currently applies to them. If I remember correctly, UCA handles every variant of dash/hyphen, single quote marks and a few others as separate characters, so DEFAULTSORT will still need to be set for those cases. Depending on what "switch" is set in the UCA algorithm, de Gaulle, De Gaulle, de-Gaulle and De-Gaulle will be sorted in different orders. Other wikis have already changed to UCA. French is one of them. Bgwhite (talk) 06:22, 25 May 2016 (UTC)[reply]
  • Administrative note - Discussion moved from WP:VPT, since this is a better place to discuss category issues. עוד מישהו Od Mishehu 13:25, 25 May 2016 (UTC)[reply]
    I agree with the venue change Od Mishehu. But someone should probably notify WP:BOT/WP:BAG, WP:AWB, WP:TWINKLE, WP:HOTCAT, etc., to make sure that this won't affect/break those tools. - jc37 17:32, 8 June 2016 (UTC)[reply]
  • Support of course!—Odysseus1479 17:58, 25 May 2016 (UTC)[reply]
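
The difference between the two collations described in this thread can be illustrated with a minimal Python sketch. It is only an approximation: `uppercase_key` and `uca_like_key` are hypothetical helper names, and stripping diacritics with the standard `unicodedata` module merely imitates the grouping behaviour of UCA; it is not the Unicode Collation Algorithm itself, nor MediaWiki's implementation. The "Route" titles are made-up examples for the numeric option.

```python
import re
import unicodedata


def uppercase_key(title):
    # Rough model of the "uppercase" collation: compare code points
    # after folding case, so "Ä" (U+00C4) sorts after "Z".
    return title.upper()


def strip_diacritics(text):
    # Decompose characters (NFD) and drop the combining marks,
    # so "Ä" is treated like "A" for primary ordering.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def uca_like_key(title, numeric=False):
    # Approximation only: real UCA uses multi-level weights.
    base = strip_diacritics(title).casefold()
    if not numeric:
        return (base, title)
    # Split into alternating text/digit chunks; digit chunks (odd
    # positions) are compared as integers, so 99 sorts before 100.
    parts = re.split(r"(\d+)", base)
    chunks = tuple(int(p) if i % 2 else p for i, p in enumerate(parts))
    return (chunks, title)


titles = ["Aztec", "Ärsenik", "Zoo", "Aardvark"]
print(sorted(titles, key=uppercase_key))
print(sorted(titles, key=uca_like_key))

numbered = ["Route 100", "Route 99"]
print(sorted(numbered, key=uca_like_key))
print(sorted(numbered, key=lambda t: uca_like_key(t, numeric=True)))
```

The first two printed lists reproduce the "Aardvark, Aztec, Zoo, Ärsenik" and "Aardvark, Ärsenik, Aztec, Zoo" orders quoted in the proposal; the last two show the effect of numeric sorting on the made-up titles.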

Subcategories and geography-related categories

According to WP:Categorization#Subcategorization:

When making one category a subcategory of another, ensure that the members of the subcategory really can be expected (with possibly a few exceptions) to belong to the parent also.

However, geography categories don't seem to follow that rule. For example, Category:Rivers of Austria contains Category:Danube, which contains (indirectly but correctly) Aljmaš, Croatia; however, Aljmaš, Croatia doesn't belong in Category:Rivers of Austria. Another example: Category:Geography of Massachusetts contains (indirectly) Category:Boston, Massachusetts, which includes Category:People from Boston, Massachusetts - even though a person isn't part of the geography. עוד מישהו Od Mishehu 15:11, 8 June 2016 (UTC)[reply]

Category:Danube should not be in Category:Rivers of Austria etc because (as you point out) Category:Danube contains articles that are about (places in) other countries. Category:Danube would still be categorized (in Category:International rivers of Europe and a hidden category). The category would also still be reachable from the Danube article. There is no rule that Category:Foo should have all the same category tags as the Foo article - e.g. the Germany article may belong in Category:Member states of NATO, but Category:Germany (which contains articles about Germany that have nothing to do with NATO) does not.
That would still leave the article about the village in Category:Rivers, but I don't see that as being (such) a problem - Category:Rivers is for anything about the topic of rivers (e.g. things such as river surfing) and (at a bit of a stretch) an article about a riverside village is within the topic of rivers - that's certainly not as bad as putting a Croatian village in Category:Austria.
Regarding geography categories - afaik there's no clear definition of what belongs in a geography category and what belongs in its more general parent category - e.g. why is Category:Hiking trails in Massachusetts‎ categorized as geography, but Category:Landmarks in Massachusetts not? DexDor (talk) 22:29, 9 June 2016 (UTC)[reply]
Categories should generally be placed in fewer categories than the corresponding main article. I'm not sure if there is a good way to formalize this in a guideline, but we're in conceptual agreement. RevelationDirect (talk) 19:01, 12 June 2016 (UTC)[reply]

Minimal number of items in a category

What is the minimal number of items in a category? --Richard Arthur Norton (1958- ) (talk) 17:34, 26 July 2016 (UTC)[reply]

Many maintenance categories aim for zero items. for (;;) (talk) 17:47, 26 July 2016 (UTC)[reply]
If you mean content cats, the practical minimum is 1; this happens where WP:CATDIFFUSE is enforced. For instance, we should have an article for every national President, and these do not get put directly in Category:Presidents by country but in a subcategory, like Category:Presidents of Puerto Rico. --Redrose64 (talk) 17:59, 26 July 2016 (UTC)[reply]

Proposing a change to the WP:CATDEF wording

WP:CATDEF currently states that "The order in which categories are placed on a page is not governed by any single rule (for example, it does not need to be alphabetical, although partially alphabetical ordering can sometimes be helpful). Normally the most essential, significant categories appear first."

I would like to see it changed to something like "Although no single rule governs the order in which categories are placed on a page, alphanumeric order is useful for placing the categories into a coherent order. An exception is when a leading category is equal to or is closely associated with the name or subject of an article."

I am suggesting this from my exposure to cognitive psychology & user interface design (besides the classes for my Library & Information Studies MS degree, I have worked in IT for 25+ years & am also a former university reference librarian). I find that Chunking (psychology) is very useful to organizing information. The lack of any order among categories is chaotic & makes it difficult for a reader to follow. As one of the simplest forms of chunking, alphanumeric order is a basic & effective solution to this problem.

Peaceray (talk) 16:52, 5 August 2016 (UTC)[reply]

  • Support. Yes, I've been doing alphanumeric category ordering for years. Usually when I edit an article to alter the order in which categories appear, my edits are kept, though once in a while someone will revert or leave me a note saying the original (arbitrary) order is more useful. Well, the order may not seem arbitrary to one editor, but may to others. I support encouraging alphanumeric ordering. ---Another Believer (Talk) 17:59, 5 August 2016 (UTC)[reply]
  • Oppose in its current form, but I morally support the attempt at giving us more order (which is why I'm not bolding anything). Specifically, I have to oppose this particular wording because I think we should also allow for like-categories to be placed together. For instance, say I'm tagging a gridiron football defensive lineman who played both American and Canadian football. It makes perfect sense to keep Category:American football defensive linemen and Category:Canadian football defensive linemen next to each other. Similarly, all establishments/disestablishments categories should generally go together, etc. etc. This is all a bit of common sense, I think. How about a wording like "Although no single rule governs the order in which categories are placed on a page, there are a number of considerations worth taking into account. Categories which are closely associated with the name or subject of an article should generally lead the list of categories. Categories that are within the same category tree or are closely associated with each other should generally be placed together. In the absence of other meaningful schemes of ordering categories on a page, defaulting to alphanumeric ordering provides a useful coherent order." Open to copy-editing and other changes, of course. ~ Rob13Talk 20:15, 5 August 2016 (UTC)[reply]
    • Hi, BU Rob13, do you have any suggestions for a different wording? Peaceray (talk) 00:16, 6 August 2016 (UTC)[reply]
      • I provided it already in my comment. ~ Rob13Talk 01:10, 6 August 2016 (UTC)[reply]
  • Oppose This was brought up very recently. --Redrose64 (talk) 23:03, 5 August 2016 (UTC)[reply]
    Hi, Redrose64, I just wanted to note that this previous discussion was about mandating as opposed to suggesting that alphanumeric order is useful. Peaceray (talk) 00:16, 6 August 2016 (UTC)[reply]
    As I noted in Wikipedia talk:WikiProject Categories#Concerning the presentation of categories a page belongs to when the list is humongous at 07:41, 10 May 2016 (UTC), a useful order for cats is basically one of descending order of relevance. Alphabetic (or alphanumeric) order could give rise to some very minor cats appearing early in the list. Consider Garsdale railway station: what is it? it's a railway station. Where is it? Cumbria. So, Category:Railway stations in Cumbria is highly relevant. What trivia is mentioned in the article that might be categorisable? There's a memorial for a dead dog. Even with grouping related cats together, placing those groups alphabetically would still place Category:Dog monuments (trivial) above Category:Railway stations in Cumbria. --Redrose64 (talk) 00:27, 6 August 2016 (UTC)[reply]
    I wish the categories at Garsdale railway station were displayed in alphabetical order. Looks arbitrary otherwise. ---Another Believer (Talk) 18:03, 6 August 2016 (UTC)[reply]
    From Category:Railway stations in Cumbria down to Category:Railway stations served by Northern (train operating company) (inclusive), these are the most important cats for an open railway station in the UK, and the listing order is conventional. From Category:Dog monuments onwards, they are somewhat less important, and the order of those four may as well be alphabetic. In short: no way should Category:Dog monuments go before Category:Former Midland Railway stations. --Redrose64 (talk) 21:09, 6 August 2016 (UTC)[reply]
  • Oppose. Alphabetical might be (slightly) preferable to random order, but it surely is not preferable to any reasonable scheme of logical order, whether that is important-stuff-first or grouping in some more complicated way.
If the main, important thing about a person is that she's a 20th century female Ukrainian Marxist writer, let's put those categories together (probably first or first after birth-death years, or even in the middle or the end, but at any rate together). Because "Ukrainian writer" and "Marxist writer" and "20th century writer" and "Female writer" are not near each other in the alphabet, alphabetizing will lead to sprinkling in where she went to school or what cities she lived in or what awards she won and so forth among these categories. Doing that presents a conceptually random order.
I don't really even believe that alphabetical order is much preferable to random order, to be honest. It might be for certain kinds of searches. It might be useful if the person is searching for X to answer the question "Is this article in category X". My guess is that most readers are more interested in connecting the articles in other ways. (In other words, given some bio article, the question is more likely to be "Where is this person from?" (probably as an entrée to "I want to see articles about other people who are like this person in important ways") rather than "Is this person from Aalborg or not?", although granted some non-zero number of readers will be asking that.)
But I think that alphabetical order gives a false sense of order. It helps the editor feel that the article is more orderly. It doesn't really help the reader. It also helps us because it's easy to do (a bot could alphabetize categories, actually) and it's not subject to dispute (assuming you've accepted alphanumeric order). But so?
If people are alphabetizing article categories, they should maybe consider not doing that.
I guess if it's done with reasonable care, and since it's a lot quicker to alphabetize than to figure out a logical order, and assuming that the categories are more or less random (which I don't assume, but which is possible), it is possible (not certain IMO) that your alphabetizing is an improvement. Herostratus (talk) 22:36, 6 August 2016 (UTC)[reply]

What are the guidelines on how to best structure "GA-class" categories?

Since well over half of all articles contained in Category:GA-Class Animation articles seem to be about The Simpsons, and probably at least a quarter of the remaining articles are about either Family Guy or South Park, I feel that it would be beneficial to create subcategories focused on each of these shows. Would this be appropriate? I'm not familiar with the general guidelines on how to structure "GA-Class" categories. Ideally, I'd like to see all of the articles related to these three shows removed from the parent category and placed solely within their respective subcategories - that way, it would be easier to see which animation articles about other topics have attained GA status. I've skimmed through a handful of Help / Guideline pages about categories, but haven't seen anything written on the topic. Can someone point me to the relevant page, if it happens to exist, or if it doesn't, could someone let me know whether there are steps that I should take before moving stuff around (aside from simply consulting with the relevant WikiProjects)? Should I bring it to WP:Categories for Discussion or is that strictly for "renaming, merging, and deletion"? Also, since it seems that all GA articles within WP:Animation are automatically added to Category:GA-Class Animation articles, would it even be possible to remove the articles on Simpsons, Family Guy, and South Park from the category, without also removing them from the WikiProject? --Jpcase (talk) 19:55, 5 August 2016 (UTC)[reply]

@Jpcase: Talk pages are placed in Category:GA-Class Animation articles because they bear {{WikiProject Animation|class=GA}} perhaps with some other parameters. Similarly, Category:Stub-Class Animation articles contains talk pages which bear {{WikiProject Animation|class=stub}}; and Category:GA-Class The Simpsons articles contains talk pages which bear {{WikiProject The Simpsons|class=GA}}. These categories exist to indicate the intersection between a WikiProject and an article's class. It's all part of the way that WikiProject banner templates work. --Redrose64 (talk) 23:15, 5 August 2016 (UTC)[reply]
@Redrose64: Are there never instances in which these categories are partially diffused? I can't think of any instance in which an article would be added to WP:SIMPSONS, WP:SOUTHPARK, or WP:FAMILYGUY, and not also to WP:ANIMATION. They overlap entirely. So nothing would be lost if all of these articles are diffused into their respective subcategories. And as things stand right now, it's incredibly difficult to pick out those articles that aren't related to one of these three shows. I admittedly overstated things, when I suggested that three-quarters of all the articles contained in Category:GA-Class Animation articles are also contained in one of these three other subcategories (it was a random guess). But I just checked the actual numbers, and out of 711 articles in the Animation category, over three hundred are about The Simpsons and over 100 are about Family Guy (only 38 are about South Park, so that's less of a problem, but 38 still makes for a substantial subcategory). Having to search through all of this, in search of articles about animated feature films, or animated short films, or simply articles about other animated TV shows, makes the current category for GA animation articles almost more of a hassle than it's worth. --Jpcase (talk) 23:57, 5 August 2016 (UTC)[reply]
@Jpcase: You can use the category intersection tool PetScan to get a list of GA animations without Simpsons, Family Guy, or South Park – try this query - Evad37 [talk] 00:23, 6 August 2016 (UTC)[reply]
Some WikiProject banners are set up to have task forces, and these may in turn be set up to place the talk pages into categories specific to that task force. But this does not take the page out of the main category group for the WikiProject. To do that would need the task force to be spun off to a separate WikiProject, with its own banner template.
As an example, {{WikiProject Animation}} puts pages in subcategories of Category:Animation articles by quality; and if |family-guy=yes is set, the pages are also put in subcategories of Category:Family Guy articles by quality, but are not taken out of the subcategories of Category:Animation articles by quality.
By contrast, {{WikiProject The Simpsons}} is the banner for a separate WikiProject, and it puts pages in subcategories of Category:The Simpsons articles by quality but not in subcategories of Category:Animation articles by quality, so any page that is in a subcategory of Category:The Simpsons articles by quality and of Category:Animation articles by quality must have both WikiProject banners.
You might argue that {{WikiProject Animation}} is redundant if {{WikiProject The Simpsons}} is present, but that's a decision for WT:WikiProject Animation and they may well say that The Simpsons does fall within their purview. It's a well established convention that each WikiProject reserves the right to set its own boundaries, even where they overlap significantly with those of another. --Redrose64 (talk) 08:46, 6 August 2016 (UTC)[reply]
@Redrose64:@Evad37: Thanks for the responses. I see what you're saying Redrose, and certainly wouldn't suggest that the entire Simpsons WikiProject should be folded into WP:Animation, but still, it seems to me that the current way of structuring GA-class categories is less than efficient. The PetScan method that you mentioned, Evad, looks like a good workaround, though it would still be nice to see a more straightforward way of distinguishing GA-class articles by topic. --Jpcase (talk) 14:27, 6 August 2016 (UTC)[reply]
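
The workaround discussed in this thread amounts to a set difference: take the pages in the parent GA category and drop any page that is also in one of the show-specific GA categories. A minimal sketch of that idea follows, assuming the category memberships are already available as Python sets. The function name `without_subtopics` and the "Page A" to "Page D" titles are placeholders for illustration, not real category contents, and this is not PetScan's own interface.

```python
def without_subtopics(parent, *subtopics):
    # Union all the subtopic memberships, then remove them from the parent.
    excluded = set().union(*subtopics)
    return sorted(parent - excluded)


# Placeholder memberships; real lists would come from the categories themselves.
ga_animation = {"Page A", "Page B", "Page C", "Page D"}
ga_simpsons = {"Page A"}
ga_family_guy = {"Page B"}
ga_south_park = set()

remaining = without_subtopics(ga_animation, ga_simpsons, ga_family_guy, ga_south_park)
print(remaining)
```

Because the pages stay in the parent category, nothing about the banner templates has to change; the filtering happens only when the list is read.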


Categories are not articles

I propose that WP:CAT should include the following subsection, under 2. Creating category pages:

2.1 Content

Category pages are not article pages, so in general should not include text describing the subject of the category, except where required to help define the contents of the category as described in Creating category pages. Instead, hatnotes such as {{Cat main}} should direct the reader to the relevant article.

A few current examples that I think need cleaning up:

And a previous example, which has now been cleaned up:

What do other editors think? Mitch Ames (talk) 08:12, 20 August 2016 (UTC)[reply]

Note that the Categorization FAQ asks us to "avoid copying large quantities of text ... from an article to a category page". If the above proposal is approved, the FAQ might need updating, eg to something like "avoid copying text from an article to a category page unless it is required to define the scope of the category". Mitch Ames (talk) 08:48, 20 August 2016 (UTC)[reply]

Misunderstandings of categories and category mainspace are so intertwined in the history of wikipedia in the last 10 years, I think this is a fairly limited approach to a more complex issue:

Some readers are likely to be quite perplexed at the subject and contents differentiation, and I believe to complicate as to which is which would create situations where arguments and potential conflicts would arise, where the creation of the distinction is in the end of no particular help in the long run.

For a project or subject area to have an editor keen on clarifying the context or background of the category I believe does not harm the main space of a category. I believe the allowance for editorial comments on the subject or contents at a main space area on a category can in many cases clarify something that is otherwise difficult to place.

Many editors place links to wikiprojects, to portals, and to other subjects, so that if someone does venture to the category mainspace, it is not as a 'blank' clean main space - but a space with clues as to the category (many biota categories have nothing, so that an unacquainted reader is incapable of discerning whether the category is about animal vegetable or mineral).

I believe, before this gets out of hand in time or space, that there should be an effort to allow clarifying text of either subject or content to remain in category main space, in the name of helping anyone who might arrive at the space to know how to get out or go somewhere for clarification. The proposal to clean up the space is, I believe, retrograde, unhelpful and equivalent to a building inspector asking to remove Exit signs in buildings. JarrahTree 08:34, 20 August 2016 (UTC)[reply]

"Some readers are likely to be quite perplexed at the subject and contents differentiation"
I suspect that the readers don't need to make the distinction, so much as the editors. Otherwise you have a reasonable point. Possibly the word "contents" should be "scope" - I chose "contents" for consistency with

the desired contents of the category should be described on the category page, ... The category description should make direct statements about the criteria by which pages should be selected for inclusion

in Creating category pages. But, as always, I'm open to suggestions for improved wording. It may be helpful if we include an example or two in the additional text. Eg:
✓ "This category lists notable Australian-born people, or people who identify themselves as Australian."
– Defines the contents of the category, ie what articles should go into this category.
✗ "The Gordon River is located in the Franklin-Gordon Wild Rivers National Park in South West Tasmania."
– Describes the subject of the category's articles (the river), not the contents/scope of the category.
Mitch Ames (talk) 09:28, 20 August 2016 (UTC)[reply]
"many biota categories have nothing, so that an unacquainted reader is incapable of discerning whether the category is about animal vegetable or mineral"
Given that "... a hierarchy of categories which readers, knowing essential—defining—characteristics of a topic, can browse ...", the reader is presumed to already know that "biota" categories would include both animal and vegetable but not mineral. Do you have a specific example that would illustrate your point? Mitch Ames (talk) 09:16, 20 August 2016 (UTC)[reply]

One specific reason for the proposed "categories are not articles" addition to the guideline is the matter of references. Generally statements about a subject (eg "The Gordon River is ... in South West Tasmania") must be verifiable, and typically this is done by including references "at or near the bottom of the article" – ie on the same page as the statement being made. However "category pages should not contain ... citations"; this implies that category pages should not make statements about the subject. Mitch Ames (talk) 12:08, 20 August 2016 (UTC)[reply]

Although this is true, I think that we needn't add instructions, per Wikipedia:Avoid instruction creep. Specifically, I think that "Instead hatnotes such as {{Cat main}} should direct the reader to the relevant article." is not a good idea. I think that some words of explanation are usually also okay. Perhaps without that line I could agree with the proposed addition. Debresser (talk) 18:19, 20 August 2016 (UTC)[reply]