Wikipedia talk:Category intersection: Difference between revisions

So Wikipedia should be doing two things:
#It should be made absolutely clear in policy that <s>gender- and ethnicity-based categories are applied ''in addition'' to gender- and ethnicity-neutral categories, never as a replacement.</s> <u>no one should ever, because of their gender, ethnicity, sexuality or religion, be removed from a generic, gender-, ethnicity-, sexuality- or religion-neutral category that others not belonging to their group remain in.</u> In other words, these categories are not for diffusing or thinning out categories, but to provide researchers with ''additional'' options they might want.
#Long-term, we should just apply simple categories or tags like "Man/Woman", "American", "African-American", "Poet/Novelist/..." to articles and avoid categories like "African-American women poets" altogether: users who want to see a list of African-American women poets should be enabled to use the CatScan and search for articles that have the tags "Woman" + "African-American" + "Poet" applied to them.
*I really like the clarity that you're presenting here, although I wonder if it's a little too simple. I'm thinking right now about the wicked hard [[Wikipedia talk:Categorization/Ethnicity, gender, religion and sexuality#Correct categorization quiz|quiz]] that Obi-Wan put together. Have you tried it, with the above in mind? I recommend it; it's fun. <p> One problem you run into is that the parent associated with a category like "Jewish-American politicians", just to pick one example, is a totally depopulated container category ("[[:Category:American politicians]]"). In that case, we have to modify your #1 above to allow that she can be in the ethnicity-based category in addition to being in any other appropriate subcategories of the gender-neutral container category, e.g., "Green Party of the United States politicians". However, what if none of the other available subcats were to apply? <p> We could have a situation where the only place we allow ourselves to put someone is in an ethnicity-based category, simply for structural reasons. How do we handle that, or is that too outlandish to be worth considering? -[[User:GTBacchus|GTBacchus]]<sup>([[User talk:GTBacchus|talk]])</sup> 02:40, 29 April 2013 (UTC)
**I've rephrased that passage above. Will look at the quiz. [[User:Jayen466|Andreas]] <small><font color=" #FFBF00">[[User_Talk:Jayen466|JN]]</font>[[Special:Contributions/Jayen466|466]]</small> 14:41, 29 April 2013 (UTC)


:* On some wikis (don't have examples at the moment) the problem of distinguishing container categories from other categories is solved by using a "by name" subcategory. For instance, [[:Category:American politicians by name]], which would include every single American politician, by name. It's intended to be large. [[:Category:American politicians]] is then a container category which contains various subcategories, like [[:Category:American politicians by ethnicity]], [[:Category:American politicians by political party]], [[:Category:American politicians by time]], [[:Category:American politicians by gender]], etc. ... Of course the intersection proposal above would solve the need for such a thing. --[[User:Lquilter|Lquilter]] ([[User talk:Lquilter|talk]]) 13:40, 29 April 2013 (UTC)

Revision as of 14:41, 29 April 2013

About this proposal

This proposal was started by User:Rick Block and User:SamuelWantman. The initial discussions leading to the proposal are in Archive 1. There is also discussion about the different options here.

Please leave comments about the proposal on this page. Thank you for your input.


No interest?

Aren't there any sysadmins interested in implementing this? Why are there no actions to move to Semantic MediaWiki? --Chricho ∀ (talk) 11:05, 10 May 2011 (UTC)

ditto... Meclee (talk) 18:36, 7 July 2012 (UTC)

Level of complexity for our volunteer editors would be a bit of a stumbling block, I would think. - jc37 20:32, 2 September 2012 (UTC)

American women novelists

There is currently a collection of discussions going on, following some negative press in the New York Times and on Salon. A good place on-wiki to find a lot of the discussion is at Wikipedia:Categories for discussion/Log/2013 April 24#Category:American women novelists. Essentially, the problem is this: some editors were populating Category:American women novelists, and simultaneously removing those articles from Category:American novelists, a project which if carried to its conclusion would leave the parent category populated only by men. This was upsetting to a lot of people, so now there's a great big argument about what to do about it. In the course of that discussion, a few people have pointed to category intersection as an approach that would completely obviate many such problems.

However this particular situation is addressed within the current system, it seems inevitable that similar problems will continue to arise. This is especially true since there is a kind of dogma in place that large categories are a problem, and that they must be diffused into smaller subcategories, in order to make categories more useful. However, it's not clear how useful it is for Harper Lee, an important American novelist, not to be in the category "American Novelists" simply because she's in "American Women Novelists", too. Thus, people are asking questions like "Why are large categories a problem, anyway?", and "Why should parent categories be diffused into sub-categories, instead of keeping redundant listings?"

Again, even if we address these issues within the current system, it's not clear that the current system is sustainable. Therefore, I'm very interested in category intersection as a way forward. Where does this idea stand, as far as getting the attention of the developers who would actually have the power to do something about it? -GTBacchus(talk) 04:06, 28 April 2013 (UTC)

  • The summary above is incorrect. It should read "as editors", since multiple editors were involved in these moves. I was a significant contributor to the project, but I neither created the category nor was I the first to make the moves. John Pack Lambert (talk) 04:55, 28 April 2013 (UTC)
    • Corrected, with my apologies. You were the only one I knew about who was doing mass category moves. My mistake. I've never claimed you created the category.

      I believe the rest of what I wrote above is fine, n'est-ce pas? -GTBacchus(talk) 12:13, 28 April 2013 (UTC)

  • Regardless of what's happening at the moment and who is doing what, yes, I think this is a good proposal and the way to address the current system. I've placed this on my watch and would honestly be thrilled to see this implemented. Truthkeeper (talk) 13:59, 28 April 2013 (UTC)
  • To get more specific, a question for any developers out there - what would be the performance implications if we started adding all bios to Category:Men or Category:American men, and then running category intersections against sexuality, job, religion, etc. on a daily basis? These top level cats would be massive, obviously, so the searches would be trolling over large amounts of information - for example Category:Living people has 615,000 people in it - by the time we added other bios, Category:Men and Category:Women could have perhaps millions? (not sure if the dead outnumber the living here) --Obi-Wan Kenobi (talk) 14:30, 28 April 2013 (UTC)
Thanks - useful data. Now what happens if we start multiplying those searches by 2x (assuming 1M men), and those 32-second searches are happening 10,000 times/day. Can the servers handle the load? This is where we need to bring the devs into the discussion - and consider linkage with wikidata (see below). --Obi-Wan Kenobi (talk) 03:26, 29 April 2013 (UTC)
Would love to learn more about how wikidata could help here. Can we get someone from there to put together a demo of how to tag up a bio with wikidata and then somehow use that for categorization purposes?--Obi-Wan Kenobi (talk) 02:29, 29 April 2013 (UTC)
Here is Marissa Mayer's page on Wikidata, and since this discussion developed from questions about categorizing authors, here is J. K. Rowling's. From what I understand, each Wikidata entity has associated claims, which consist of a property and a value for that property, as well as qualifiers that can, for example, limit the scope of the claim to a particular time period. So in theory a person's occupation could be included (with an associated period of time), ethnicity, place of residence, and so forth. I don't know if any development work is planned on providing search capability based on a set of property values, but I assume Wikidata exists precisely to make this type of metadata management and browsing easier. isaacl (talk) 03:12, 29 April 2013 (UTC)
Ok, that's a great start. So how do we pull up all female authors? If we can do that today, we're halfway there. I love the fact that there are claims, but the claims must be sourced - nice stuff...--Obi-Wan Kenobi (talk) 03:24, 29 April 2013 (UTC)
It seems like more advanced queries are being planned for phase 3. There is a proposal for a lists task force to guide phase 3, but I'm not sure of its state. However, perhaps a conversation with the primary author of the proposal would be useful? isaacl (talk) 03:59, 29 April 2013 (UTC)
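To make the claim structure described in this thread concrete (a property, a value, and optional qualifiers), here is a minimal sketch in Python. It is an in-memory toy, not Wikidata's data model or API: the class names `Claim` and `Entity`, the helper `select`, the property strings, and the claim values are all illustrative assumptions, and "Example Author" is a made-up entity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """One statement about an entity: a property, its value, optional qualifiers."""
    prop: str
    value: str
    qualifiers: tuple = ()  # e.g. (("start time", "2001"),); illustrative only


@dataclass
class Entity:
    label: str
    claims: list = field(default_factory=list)

    def has(self, prop, value):
        """True if any claim on this entity matches the property/value pair."""
        return any(c.prop == prop and c.value == value for c in self.claims)


def select(entities, wanted):
    """Return labels of entities that carry every property/value pair in `wanted`."""
    return [e.label for e in entities
            if all(e.has(p, v) for p, v in wanted.items())]


# Illustrative placeholder claims, not copied from Wikidata.
ENTITIES = [
    Entity("J. K. Rowling", [Claim("gender", "female"),
                             Claim("occupation", "author")]),
    Entity("Marissa Mayer", [Claim("gender", "female"),
                             Claim("occupation", "business executive")]),
    Entity("Example Author", [Claim("gender", "male"),
                              Claim("occupation", "author",
                                    qualifiers=(("start time", "2001"),))]),
]

female_authors = select(ENTITIES, {"gender": "female", "occupation": "author"})
```

Under these assumptions, "pull up all female authors" becomes a filter over property/value pairs; whether and when Wikidata itself offers such a query is the open question in the thread.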

Wider view: ethnicity- and gender-based categories are not for diffusion, or "thinning out" categories

It should be noted that we have the same problems with ethnicity as we have with gender. For example, James Baldwin, one of the greatest American novelists of the 20th century, is currently not listed in Category:American novelists because he is in Category:African-American novelists instead. Maya Angelou similarly is not represented in Category:American poets, because she is in Category:African-American women poets and Category:American women poets – admitted to two sub-ghettoes, but not the main banquet hall where people like Walt Whitman sit.

So Wikipedia should be doing two things:

  1. It should be made absolutely clear in policy that <s>gender- and ethnicity-based categories are applied in addition to gender- and ethnicity-neutral categories, never as a replacement.</s> <u>no one should ever, because of their gender, ethnicity, sexuality or religion, be removed from a generic, gender-, ethnicity-, sexuality- or religion-neutral category that others not belonging to their group remain in.</u> In other words, these categories are not for diffusing or thinning out categories, but to provide researchers with additional options they might want.
  2. Long-term, we should just apply simple categories or tags like "Man/Woman", "American", "African-American", "Poet/Novelist/..." to articles and avoid categories like "African-American women poets" altogether: users who want to see a list of African-American women poets should be enabled to use the CatScan and search for articles that have the tags "Woman" + "African-American" + "Poet" applied to them.

Now, if and when we have agreement on what approach to take, we need to think about where to raise that for community discussion. Thoughts? Andreas JN466 01:41, 29 April 2013 (UTC)
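The tag-based lookup in point 2 amounts to a set intersection: an article matches if it carries every requested tag. A minimal sketch in Python follows, assuming a hypothetical in-memory mapping from article titles to tags; `ARTICLE_TAGS` and `articles_with_tags` are illustrative names, and this is not how CatScan is implemented.

```python
# Hypothetical sample data: article title -> simple tags applied to it.
ARTICLE_TAGS = {
    "Maya Angelou": {"Woman", "American", "African-American", "Poet"},
    "Walt Whitman": {"Man", "American", "Poet"},
    "James Baldwin": {"Man", "American", "African-American", "Novelist"},
    "Harper Lee": {"Woman", "American", "Novelist"},
}


def articles_with_tags(article_tags, required):
    """Return the sorted titles of articles that carry every tag in `required`."""
    required = set(required)
    # `required <= tags` is the subset test: all requested tags are present.
    return sorted(title for title, tags in article_tags.items()
                  if required <= tags)


african_american_women_poets = articles_with_tags(
    ARTICLE_TAGS, ["Woman", "African-American", "Poet"])
american_poets = articles_with_tags(ARTICLE_TAGS, ["American", "Poet"])
```

In this sketch Maya Angelou appears both in the narrow intersection and in the general "American" + "Poet" query, because the simple tags are never removed when a more specific combination applies.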

  • I really like the clarity that you're presenting here, although I wonder if it's a little too simple. I'm thinking right now about the wicked hard quiz that Obi-Wan put together. Have you tried it, with the above in mind? I recommend it; it's fun.

    One problem you run into is that the parent associated with a category like "Jewish-American politicians", just to pick one example, is a totally depopulated container category ("Category:American politicians"). In that case, we have to modify your #1 above to allow that she can be in the ethnicity-based category in addition to being in any other appropriate subcategories of the gender-neutral container category, e.g., "Green Party of the United States politicians". However, what if none of the other available subcats were to apply?

    We could have a situation where the only place we allow ourselves to put someone is in an ethnicity-based category, simply for structural reasons. How do we handle that, or is that too outlandish to be worth considering? -GTBacchus(talk) 02:40, 29 April 2013 (UTC)

  • Thanks Andreas for putting this together. How broadly would this extend? For instance, would "nationality" also be considered the sort of category appropriate for CatScan? Or perhaps my question is really, does CatScan apply to all categories, or does CatScan apply only to a selected set of categories? And if the latter, what are the selection criteria? --Lquilter (talk) 13:42, 29 April 2013 (UTC)