
Wikipedia:Arbitration Committee/Requests for comment/Article creation at scale: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 21:19, 3 October 2022

Status as of 13:40 (UTC), Thursday, 11 July 2024


Introduction

This is the first of two RfCs about article creation and deletion at scale. Per the rules below, please feel free to add questions or proposed changes during the first seven days; other suggestions, comments, questions or replies should be made within your own section.

This RfC will be announced at the articles for deletion talk page, the Arbitration Noticeboard, the administrators' noticeboard, the Bot policy talk page, Village pump (policy) and Centralized discussion.

Background

Page-related actions done at scale can overwhelm the community's ability to monitor them adequately and to participate effectively. The issue is exacerbated in the case of article creation at scale because it escapes the normal notification system.

In the past, Wikipedia did not discourage article creation at scale, on the assumption that this was the best way to achieve broad coverage of vast subjects such as sports, plant and animal life, and geography. Existing policy requires that automated or semi-automated creation be approved through a bot request for approval. More recently, concerns have been raised in multiple venues that the continuing creation of such articles has overwhelmed editors’ ability to track and assess them, and that the churn has become a waste of time and a cause of disruption. In an August 2022 decision, the Arbitration Committee (ArbCom) ordered an RfC addressing "how to handle mass nominations at Articles for Deletion" (termed "AfD at scale").

A strong argument was made that article creation at scale (sometimes known as mass, rapid, or large-scale creation) is one of the causes of dysfunction at AfD with regard to article deletions at scale, and that addressing this issue is a necessary precursor to the ArbCom-ordered RfC addressing AfD at scale.

For a list of proposed solutions other than those initially presented here, please see Archive 2 of WT:ACAS.

Statistics for mass creation

  1. Editors who have created more than seven articles in the past week, including lists and disambiguation pages
  2. Editors who have created more than seven articles in the past week, excluding lists and disambiguation pages
  3. Editors who have created more than ten articles in June
  4. Editors who have created more than ten articles in July
  5. Editors who have created more than ten articles in August
  6. Editors who have created more than 100 articles in the past year
  7. Editors who have created more than 100 articles in the past year, by month
  8. Editors who created more than 10 articles in 2021, by month
  9. Editors who created more than 10 articles in 2020, by month
  10. Editors who created more than 10 articles in 2019, by month
  11. Editors by number of articles created in the past five years

Notes:

  1. None of these contain redirects that were converted into articles by the listed editor, but they do contain redirects that were converted into articles by other editors. I'm looking into fixing the latter; the former can be fixed for smaller datasets, but is too intensive for larger ones.
  2. External link counts can be suggestive of an article's quality, but they can also be meaningless: a low number may be because a large number of offline sources were used, while a high number may be because a template that provides links to a large number of database sources was added.
  1. Articles by editor by day over one year (1138 editor-days exceeded 10 articles; 163 exceeded 25)
  2. Articles by editor by week over one year (922 editor-weeks exceeded 20 articles; 150 exceeded 50)
  3. Articles by editor by month over one year (640 editor-months exceeded 40 articles; 123 exceeded 100)
  4. Articles by editor by year since 2020 (1156 editor-years exceeded 80 articles; 407 exceeded 200)

Note that these do attempt to exclude false positives from editors converting redirects created by the original editor, but some still exist, and this attempt does produce some false negatives. This is also why a hard technical limit would be difficult: we would need some way to identify editors converting redirects into articles, and to count those articles towards their count rather than towards the count of the original article creator. (Compiled by BilledMammal)

Purpose of this discussion

This RfC is to find and develop solutions to issues surrounding article creation at scale, partially in preparation for the RfC on article deletions at scale.

Rules

  1. All editors are required to maintain a proper level of decorum. Rudeness, hostility, casting aspersions, and battleground mentality will not be tolerated. Inappropriate conduct will result in a partial block (p-block) from this discussion.
  2. The sole purpose of this RfC is to determine consensus about policy going forward surrounding creation of articles at scale and to form consensus on those solutions. It is not a venue for personal opinions on past creation of such articles, on their creators, on previous tolerance of such creations, or on past mass deletions. Editors posting off-topic may be p-blocked from this discussion.
  3. All comments must be about issues and proposed policy changes surrounding article creation at scale. Comments about any contributor are prohibited and will result in a p-block from this discussion. Any violations will be reverted, removed, or redacted.
  4. Please do not make changes to RfC questions that have already been posted. Anyone is permitted to post additional questions/proposals below the existing ones. Moderators may at their discretion merge, edit, or condense questions at any point in the process. Any user may suggest such changes.
  5. Please make all additional proposals within seven days of the start of this discussion. Subsequent proposals may be brought up in an editor's own section for consideration and inclusion at the discretion of the moderators.
  6. Discussion is unthreaded. Please create your own comments section within the discussion section for each question, placing your username in the section header. Within your own section you may present your !votes, post questions to other editors, or respond to other editors; threaded discussions with other editors can be created on the talk page. Threaded discussion on the RfC will be moved to the talk page by moderators/clerk. Hint: pipe long article titles and just timestamp (5 tildes) posts in your own section.
  7. Within a comment section each editor is limited to 300 words, including questions to and replies to other editors. (word count tool) Short quotes from other editors to provide clarity are excluded from the word count, but quoted material may be trimmed by moderators at their discretion. Moderators may at their discretion grant extensions following a request on the talk page that includes a brief explanation of why it is needed; please ping for such requests. Overlength statements will be collapsed until shortened. Hint: pipe long page titles and don't bother signing posts in your own section.
  8. If you believe someone has violated these rules, please speak to a moderator on their user talk page. If you believe the moderators are behaving inappropriately, please speak to an arbcom member on their user talk page or by email.
  9. This discussion will be open for 30 days and will be closed by a panel of three editors experienced in closing discussions, who will be appointed by the Arbitration Committee prior to the start of the RfC. The closing panel will summarize and evaluate what consensus, if any, exists within the community.
  10. Per their order and this amendment, any appeals of a moderator decision may only be made to the Arbitration Committee at Wikipedia:Arbitration/Requests/Clarification and Amendment. The community retains the ability to amend the outcomes of the RfC through a subsequent community-wide request for comment.

Moderators of this discussion

The Arbitration Committee has appointed two moderators for this RfC:

Additional clerking help: MJL (talk · contribs)

Closers

The Arbitration Committee has appointed a panel of three closers for this RfC:

Proposals

Question 1: Should we develop a noticeboard where mass creations and sources used for them can be discussed?

Proposed: A noticeboard will be created to allow for obtaining consensus for, making reports of, and having other discussions of mass creations and the sources used for such creations. (Details to be developed there.)

Support (Create noticeboard)

  1. Thryduulf (talk) 19:04, 3 October 2022 (UTC)
  2. Per what I wrote in the pre-RfC stage, especially the process described here. — Rhododendrites talk \\ 19:38, 3 October 2022 (UTC)
  3. --Enos733 (talk) 20:13, 3 October 2022 (UTC)
  4. This is very much needed; article creation at scale has a highly disruptive potential. Lurking shadow (talk) 20:56, 3 October 2022 (UTC)
  5. I suspect it's the only proposal that will achieve consensus here, effectively punting all this nonsense to a new location. Nonetheless, there are situations where it will be necessary, and it ought to be a net positive. Vanamonde (Talk) 21:01, 3 October 2022 (UTC)
  6. Worth a try, though I'm uncertain of how much good it will in practice do. Seraphimblade Talk to me 21:11, 3 October 2022 (UTC)
  7. Open to giving this a try. HouseBlastertalk 21:18, 3 October 2022 (UTC)

Oppose (Create noticeboard)

  1. oppose statement

Comments (Create noticeboard)

Comments from Thryduulf (Q1)
Comments from Rhododendrites (Q1)

If you meet the definition of "article creation at scale" (see my comments at Q3), then you must post a notice to this noticeboard with the following information:

  1. The approximate number of articles you will create.
  2. The approximate time frame for creation.
  3. A description of the overall topic/theme.
  4. Which notability criteria you will be using.
  5. What kind of sourcing you will use to demonstrate that each article meets the criteria (subject to the results of Q2).

Upon creation of the noticeboard, a subsequent RfC (or other discussion) will determine how long these discussions stay open, who approves them, if there's an appeals process, etc. — Rhododendrites talk \\ 19:48, 3 October 2022 (UTC)

Comments from Editor X

Please open your own section with username in the heading

Question 2: Should we require (a) source(s) that plausibly contribute(s) to WP:GNG?

Proposed: Modify the General notability guideline (GNG)/Subject-specific notability guidelines (SNG) at WP:Notability (as appropriate) to add: (Please rank your choices by listing, in order of preference from most preferred to least preferred, all seven options.)

A: All articles created under SNGs (other than those which confer notability) must be cited to at least one source which would plausibly contribute to GNG: that is, which constitutes significant coverage in an independent reliable secondary source.

A-2: At least two sources.

B: All articles (except those not required to meet GNG) must be cited to at least one source which would plausibly contribute to GNG: that is, which constitutes significant coverage in an independent reliable secondary source.

B-2: At least two sources.

C: All WP:MASSCREATEd articles (except those not required to meet GNG) must be cited to at least one source which would plausibly contribute to GNG: that is, which constitutes significant coverage in an independent reliable secondary source.

C-2: At least two sources.

D: No change.

Statements (Require GNG-quality source(s))

Please in your statement rank your choices by listing, in order of preference from most preferred to least preferred, all seven options. Sign as usual with 4 tildes.

  1. D, C, C2, B, B2, A, A2 (I don't think change will improve things, especially given the very significant variety in SNGs)
  2. D, C, C2, B, B2, A, A2. --Enos733 (talk) 20:16, 3 October 2022 (UTC)
  3. C2 > C > A2 > A > D > B2 > B. Much of the conflict that led to this RfC is driven fundamentally by a mismatch between criteria used for creation and for deletion. This is primarily the result of SNGs that do not independently confer notability being used to justify mass-creation using databases and lists. Neither such SNGs nor such sources are, at present, admissible as evidence for keeping at AfD, where such articles inevitably end up. Requiring the articles to include sources supporting GNG addresses this mismatch. I would prefer two sources to one, but requiring it of every single article is a bit of an overreach. Vanamonde (Talk) 20:53, 3 October 2022 (UTC)

Comments (Require GNG-quality source(s))

Comments from Thryduulf (Q2)
  • Combine the massive variety in style, format and purpose of SNGs with the very subjective nature of what constitutes a source that "passes" the GNG, and changes would not improve matters. Better imo to discuss things individually at the board proposed in Q1. 19:26, 3 October 2022 (UTC)
Comments from Rhododendrites (Q2)

Whoa. 2B (and to a much lesser extent 2A) extends far beyond the scope of this RfC IMO, applying to all articles. This would be a radical change and should be separated if anyone wants to really propose it.

Weak support for C, but really I think the guidance should go something like this: "Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases. While there are no firm requirements about the level of quality an article must reach when created, many in the community have a strong preference for mass created articles to be more than one- or two-sentence stubs." (This obviously extends to quality, but doesn't mandate anything about length.) — Rhododendrites talk \\ 19:52, 3 October 2022 (UTC)

Comments from Vanamonde
  • I believe a cleaner way to do this would be to simply prohibit mass-creation that is based on criteria that do not independently grant notability. However, this idea has not made it into the RfC. Some of the proposals above do so indirectly, at least per my view of what mass-creation is, and so have my support. Vanamonde (Talk) 20:55, 3 October 2022 (UTC)
Comments from Enos733 (Q2)

I agree with the comments of Thryduulf and Rhododendrites above (and can support Rhododendrites's proposal). We do want high-quality articles, but we must balance that with the idea that this project is "freely editable", and we should be hesitant to adopt procedures that create barriers to the sharing of knowledge (we have procedures for dealing with vandalism and deleting articles that do not fit with this project). --Enos733 (talk) 21:11, 3 October 2022 (UTC)


Comments from Editor X

Please open your own section with username in the heading. Please limit comments within a section to 300 words.

Question 3: Should we create a definition of "article creation at scale"? By rate, source, similarity, other?

Proposed: "Article creation at scale" is the creation of over 25 similar/similarly-structured articles per day or 50 per week or 100 per month or 200 per year using the same source.

This definition, once finalized, would be usable for establishing limits for the need to request consensus to create at scale, for requesting permission to create at scale, or for other discussions surrounding article creation at scale. (This proposal is intended to be refined and may not be finalizable here in this RfC but can be used for input for later proposals.)

Support (Create definition)

  1. Thryduulf (talk) 19:06, 3 October 2022 (UTC)
  2. Supporting on principle, but it's odd to ask if we should have a definition while simultaneously making decisions about the undefined thing. — Rhododendrites talk \\ 19:45, 3 October 2022 (UTC)

Oppose (Create definition)

  1. I supported, but was confused by the difference between the heading, which asks "By rate, source, similarity, other?" and the actual proposal, which answers that question: "by rate, similarity, and source". I moved to oppose mainly because "similar/similarly-structured" needs more clarity and because "using the same source" should be one way in which they can be similar/similarly-structured. See my comments below for an alternative. — Rhododendrites talk \\ 19:58, 3 October 2022 (UTC)
  2. Support creating a definition, oppose this definition. As I see it, mass-creation is simply creating articles without individually checking them for notability. Period. It's sometimes justifiable; it sometimes draws from lists or databases whose entries are inevitably notable; and sometimes it doesn't, but the products may still be good. However, putting numbers on it misses the crux of the matter, and also allows for endless dispute about timing and rates (see how bad the wikilawyering can get just with respect to 1RR restrictions). Vanamonde (Talk) 21:00, 3 October 2022 (UTC)

Comments (Create definition)

Comments from Thryduulf (Q3)
  • The definition will probably have to be slightly fuzzy, but I can only see this being a good thing. 19:08, 3 October 2022 (UTC)
Comments from Rhododendrites (Q3)

[Turned comment into 3A per suggestion on the talk page]. See my comments at Q1 for what I think a request for permission should look like. — Rhododendrites talk \\ 20:56, 3 October 2022 (UTC)

  • @Vanamonde93: mass-creation is simply creating articles without individually checking them for notability. Period - I suspect if that were the working definition, there would be consensus to completely prohibit it, but I've not seen anybody put forth a definition like that before. On the flip side, it would also allow for creating thousands of articles per year as long as you know they're notable, regardless of rate/sourcing, and those are things people are clearly concerned about. — Rhododendrites talk \\ 21:14, 3 October 2022 (UTC)
Comments from Vanamonde
  • @Rhododendrites: I don't think there will. There's plenty of mass-creation that's quite justifiable: described scientific species, politicians that unambiguously meet NPOL, legally recognized towns, etc. These are areas in which we've had community support not only for mass-creation, but for bot creation. I think the community is upset about mass-creation of non-notable pages, which, I believe, is the consequence of the GNG-SNG mismatch I mentioned above. Vanamonde (Talk) 21:18, 3 October 2022 (UTC)
Comments from Editor X

Please open your own section with username in the heading. Please limit comments within a section to 300 words.

Question 3A: Alternative three-part definition

An editor is engaged in "article creation at scale" if these three criteria are all met:

  1. Rate - More than 50 new articles in the span of a month or 500 in the span of a year.
  2. Related articles - The articles are on a similar topic, similar theme, or based on the same set of sources.
  3. Manually created - Rather than the use of a bot/script/tool (which requires going through a different process, bot authorization).

Anyone who answers "yes" to all three of these is engaging in "article creation at scale" of the sort that would require abiding by the rules set forth elsewhere in this RfC (such as posting a request to a noticeboard, if Q1 gets support). Even if an editor does not think they meet the criteria, an uninvolved administrator may determine that someone's editing fits within the spirit of these requirements, and instruct them to seek permission.

Support 3A (alternative three-part definition)

  1. As proposer. — Rhododendrites talk \\ 20:56, 3 October 2022 (UTC)
  2. I think we can change the numbers later, but this is a good start at defining mass-creation. --Enos733 (talk) 21:16, 3 October 2022 (UTC)

Oppose 3A (alternative three-part definition)

  1. Support some guidelines in principle, but 500 a year before some community oversight kicks in is way too many. Seraphimblade Talk to me 21:19, 3 October 2022 (UTC)

Comments (alternative three-part definition)

Question 4: Should we prohibit the creation of articles at scale?

This proposal would prohibit the creation of articles at scale based upon a rate definition to be separately decided.

Support (prohibit)

  1. The creation of encyclopedia articles must be understood as a matter of quality, not quantity, and the rapid creation of articles almost certainly threatens our extant processes for article triage and improvement. Chris Troutman (talk) 20:37, 3 October 2022 (UTC)

Oppose (prohibit)

  1. Much too blunt of an instrument. At the most extreme, we're saying we don't want someone to create 51 GAs in a month on various topics? Or we're assuming they're all stubs? If the latter, a more precise question might be to ask whether we want a minimum level of quality for articles created at scale. Update: I've added Q5 accordingly. — Rhododendrites talk \\ 20:52, 3 October 2022 (UTC)

Comments (prohibit)

Comments from Editor X

Please open your own section with username in the heading. Please limit comments within a section to 300 words.

Question 5: Minimum article quality when created at scale

Articles created at scale should be required to meet a certain level of quality in addition to minimum sourcing requirements (see Q2).

For example: minimum number of sentences, article size, assessment, ORES score, etc.

If you support this, you may suggest qualitative or quantitative standards, but a separate question will be required to find consensus for specific requirements.

Support (minimum quality)

  1. Support. Articles that:
     1) were disapproved by the community at the noticeboard for article creation at scale or the Bot Approvals Group, or
     2) contain no sources, or
     3) contain only:
        - deprecated sources
        - sources that are easily editable by unqualified people

     should be speedily deletable. Criterion 1 is equivalent to a deletion discussion; criteria 2 and 3 suggest that the author didn't have any standards whatsoever and that their creations cannot be trusted. Lurking shadow (talk) 21:19, 3 October 2022 (UTC)

Oppose (minimum quality)

  1. oppose statement

Comments (minimum quality)

Comments from Rhododendrites (minimum quality)

Adding this question because it comes up frequently and would be useful to resolve one way or the other. Not supporting or opposing at this time. — Rhododendrites talk \\ 21:07, 3 October 2022 (UTC)

Comments from Editor X

Please open your own section with username in the heading. Please limit comments within a section to 300 words.


Question 6:

Neutral description of proposal.

Support (proposal name)

  1. support statement

Oppose (proposal name)

  1. oppose statement

Comments (proposal name)

Comments from Editor X

Please open your own section with username in the heading. Please limit comments within a section to 300 words.

Discussion

Comments from Editor X

Open additional comments sections below. Please limit comments within a section to 300 words.