User:BrownHairedGirl/No-reflinks websites

From Wikipedia, the free encyclopedia

This page is for my work on bare URL references to websites where WP:REFLINKS never finds an article title. Where testing here and on sample URLs confirms that WP:REFLINKS consistently fails, I have been using WP:AWB to add {{Bare URL inline}} to all WP:Bare URLs refs to that website.

As of December 2021, the lists included over 1,400 websites.

After that, I changed my workflow, and just added new websites as I found them instead of making batches. As of September 2022, the list probably includes over 2,000 websites, maybe over 3,000.

Purpose

This tagging applies only to the URL which is tagged. In most cases WP:REFLINKS will be able to fill other bare URLs on an article which has been tagged in this way.
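
As a rough illustration of that per-URL scope, the sketch below tags only bare refs whose host is on a list and leaves every other ref alone. It is not the actual AWB configuration used for this job: the host set, the regex, the function name `tag_bare_refs`, and the placement of the tag inside the ref after the URL are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical sample of listed hosts; the real lists live on the sub-pages.
NO_REFLINKS_HOSTS = {"www.wsj.com", "twitter.com"}

# Assumption: a "bare URL ref" is <ref ...>URL</ref> with nothing else inside.
# A ref that already carries a tag after the URL does not match, so the
# function is safe to run twice.
BARE_REF = re.compile(
    r"<ref\b(?P<attrs>[^>/]*)>\s*(?P<url>https?://[^\s<>\[\]|{}]+)\s*</ref>",
    re.IGNORECASE,
)


def tag_bare_refs(wikitext, hosts=NO_REFLINKS_HOSTS, date="September 2022"):
    """Append {{Bare URL inline}} to bare refs pointing at a listed host."""

    def repl(match):
        url = match.group("url")
        host = (urlsplit(url).hostname or "").lower()
        if host not in hosts:
            return match.group(0)  # other bare URLs are left for the tools
        tag = "{{Bare URL inline|date=" + date + "}}"
        return "<ref" + match.group("attrs") + ">" + url + " " + tag + "</ref>"

    return BARE_REF.sub(repl, wikitext)
```

Refs to hosts that are not on the list come back unchanged, so a reference-filling tool can still be run on them afterwards.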

In all cases, other tools are available.

Reference-filling tools

  • WP:REFLINKS: will handle only bare links, which means that it doesn't spend time checking refs which have already been filled. If there are only a few bare URLs in an article with many refs, that can be a big timesaver.
    Unfortunately, Reflinks has been unmaintained for several years, so there is no prospect of a fix to its flaws, which include:
    • Reflinks cannot connect to many thousands of live websites, so it cannot get any info about those webpages. That is why I have taken to tagging the pages where it fails.
    • It doesn't recognise the {{Bare URL inline}} tag, and skips any bare URLs which have that tag. That's why I apply the inline tag only to links to websites which Reflinks can never handle anyway.
    • Reflinks uses {{cite web}}'s |publisher= parameter when it should use |website=
    • Reflinks often puts junk in the |author= parameter
  • ReferenceExpander: excellent on bare links, but can go bonkers in some cases; see a disastrous result at [1].
    If you use ReferenceExpander, please note that it saves changes without preview. So check the diff of its edit very carefully, and be ready to revert its edit.
    Note that ReferenceExpander tries to rewrite every ref which doesn't use a cite template. In simple cases (e.g. <ref>[https://example.com/hello Someone says hello]</ref>) it usually works fine, but since it strips all existing info, it can lose a lot of detail on a more complex ref.
    For example, in this (reverted) edit[2] it mangled a lot of refs, such as changing <ref>Covington, Phil. [http://www.triplepundit.com/2012/09/rideshare-company-commuter-shuttle-bay-areas-ridepal/ No Company Commuter Shuttle? Bay Area’s RidePal Has The Answer]. ''Triple Pundit'' September 21st, 2012</ref> to <ref>{{Cite web|title=No Company Commuter Shuttle? Bay Area's RidePal Has The Answer|url=https://www.triplepundit.com/story/2012/no-company-commuter-shuttle-bay-areas-ridepal-has-answer/62201|access-date=2021-12-21|website=www.triplepundit.com}}</ref> ... losing the author and date.
    If a ref has been archived using {{webarchive}}, it will strip that too.
  • WP:REFILL: powerful but buggy. Use with care!
    It fixes lots of refs which Reflinks cannot fix, but it can also go spectacularly wrong on some refs. Preview its output very carefully, by checking the diff before saving. It has a particularly nasty habit of trimming the URL where the website doesn't issue a proper 404 error, but redirects to the homepage. For example if there is a bare ref to http://example2.com/somethingaboutnothing (or a formatted ref such as <ref>[http://example2.com/somethingaboutnothing Wordsoup]</ref> or <ref>{{cite web |url=http://example2.com/somethingaboutnothing |title=Wordsoup}}</ref>), and the page http://example2.com/somethingaboutnothing is redirected to http://example2.com, then REFILL will change the URL to http://example2.com, and change the page title. Nasty; please beware.
  • Unlike the tools above, Citation bot works only on completely bare refs, or on refs which already have a cite template; it will not apply a cite template to a ref of the form <ref>[http://example.com/foo Foobar]</ref>. However it is very accurate, and actively maintained by a very conscientious bot owner, who rapidly fixes any bugs.
    The major weakness of Citation bot is that it uses an external tool (the "zotero") to get the title of a webpage. There are many sites for which the zotero never returns a title; additionally, the zotero is frequently overloaded and requests to it time out, so a failure to get a title cannot be counted as definitive.
  • Of course, references may always be filled manually.
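
When checking a tool's diff, the homepage-redirect problem described above can be spotted mechanically. The following is only a sketch of such a check, not part of any of the tools listed; the function name `looks_like_homepage_trim` and the rule of treating a leading "www." as insignificant are assumptions made for the example.

```python
from urllib.parse import urlsplit


def _host(parts):
    """Lower-cased hostname with a leading 'www.' dropped (an assumption)."""
    host = (parts.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def looks_like_homepage_trim(original_url, filled_url):
    """True if a specific page URL was replaced by the same site's homepage."""
    old, new = urlsplit(original_url), urlsplit(filled_url)
    old_is_specific = old.path.strip("/") != "" or old.query != ""
    new_is_root = new.path.strip("/") == "" and new.query == ""
    return _host(old) == _host(new) and old_is_specific and new_is_root


# The example from the list above: the page was redirected to the homepage.
print(looks_like_homepage_trim(
    "http://example2.com/somethingaboutnothing", "http://example2.com"
))
```

A result of True means the filled ref now points at the homepage instead of the cited page, so the edit should not be saved as it stands.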

Secondary tasks

If a page is edited to add one or more {{Bare URL inline}} tags, the following secondary tasks may be performed:

General fixes

General fixes (see WP:GENFIXES) are a set of semi-automated edits that are enabled by default in AutoWikiBrowser. They are intended to be uncontroversial and require minimal human oversight; many are cosmetic and improve wikitext readability but do not affect display to readers.

Fixing <br>

Per H:BR, a line break formatted as an unclosed tag <br> breaks some syntax highlighters. This AWB job will convert such uses to <br />.

This is a cosmetic change which makes no difference to how the page is displayed, but it assists those editors who use some syntax highlighters.
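
A minimal sketch of this kind of replacement is shown here. The AWB find-and-replace rule actually used is not recorded on this page, so the regex and the function name `fix_br` are assumptions; the sketch also makes no attempt to skip <nowiki> or <pre> sections.

```python
import re

# Matches <br>, <BR>, and <br > but not the already-closed <br /> or <br/>.
BR_UNCLOSED = re.compile(r"<br\s*>", re.IGNORECASE)


def fix_br(wikitext):
    """Convert unclosed <br> tags to the self-closed form <br />."""
    return BR_UNCLOSED.sub("<br />", wikitext)
```

Tags that are already written as <br /> or <br/> are left as they are.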

Unhiding bare URLs

Note that since 21 November 2011, these edits have also been making the text of bare URL and dead link refs visible. A ref text "[1]" or "[3]" is useless to the reader.

For example, a ref of the form <ref>[https://example.com/foo]</ref> will be displayed in the reference list simply as a number: 5

That bare number tells the reader nothing about what the link is, so this task strips the square brackets, making the ref render as https://example.com/foo

See for example this edit[3] to Ruben Katoatau.
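
The bracket-stripping step can be sketched as follows. This is an illustration only, with an assumed regex and an assumed function name `unhide_bare_urls`; it touches a ref only when the brackets contain a URL and nothing else, so labelled links are left alone.

```python
import re

# <ref ...>[URL]</ref> where the brackets hold a URL and no label text.
BRACKETED_BARE = re.compile(
    r"(<ref\b[^>/]*>)\s*\[\s*(https?://[^\s\]]+)\s*\]\s*(</ref>)",
    re.IGNORECASE,
)


def unhide_bare_urls(wikitext):
    """Strip the square brackets so the URL itself is displayed."""
    return BRACKETED_BARE.sub(r"\1\2\3", wikitext)
```

A ref such as <ref>[http://example.com/foo Foobar]</ref> does not match, because the brackets contain a label as well as the URL.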

phab:T291704

Function disabled as of 15 December 2021. phab:T291704 has been fully fixed in v2.0.8.5 of InternetArchiveBot.

Adds dashes to undashed uses of the cite parameters |access-date=, |archive-date=, and |archive-url=. For example, |accessdate= → |access-date=.

This is a work-around to the bug phab:T291704 in InternetArchiveBot. That bot currently doesn't recognise the undashed form, and may create a duplicate parameter, which is an error tracked in Category:CS1 errors: redundant parameter. The bot owners are working on a fix, but have not yet got a solution.[4] So until a fix to the bot is implemented, adding the dashes helps reduce errors.
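
For illustration, the renaming could be expressed as below. The regex and the function name `add_dashes` are assumptions, not the rule the AWB job used, and the sketch matches the parameter names after any pipe character without checking that they sit inside a cite template.

```python
import re

# Undashed parameter name -> dashed form, as listed above.
RENAMES = {
    "accessdate": "access-date",
    "archivedate": "archive-date",
    "archiveurl": "archive-url",
}

# A pipe, optional spaces, one of the undashed names, optional spaces, "=".
UNDASHED = re.compile(r"(\|\s*)(accessdate|archivedate|archiveurl)(\s*=)")


def add_dashes(wikitext):
    """Rewrite undashed cite parameter names to their dashed forms."""
    return UNDASHED.sub(
        lambda m: m.group(1) + RENAMES[m.group(2)] + m.group(3), wikitext
    )
```

Parameters that already use the dashed form are not matched, so running the function a second time changes nothing.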

Failure types

In most of these cases, WP:REFLINKS consistently fails at the initial stage of setting up an HTTPS connection. For example, Reflinks always fails to get a title for links to the Wall Street Journal website https://www.wsj.com/, giving errors such as Can't retrieve page https://www.wsj.com/ : <urlopen error [Errno 1] _ssl.c:510: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol>

In some cases, such as most major Australian newspapers, Reflinks successfully connects, but consistently returns a useless generic title such as "No Cookies" or "Loading 3rd party ad content". One of the most common bare URL refs is to Twitter, where Reflinks returns the title "JavaScript is not available".
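
The two failure types can be told apart with a simple check like the one sketched here. This is a hypothetical helper for sorting test results, not something Reflinks provides: the function name `classify_result`, its arguments, and the idea of ignoring case and a trailing full stop are assumptions, and the title set holds only the three examples quoted above.

```python
# The generic titles quoted above, lower-cased for comparison.
GENERIC_TITLES = {
    "no cookies",
    "loading 3rd party ad content",
    "javascript is not available",
}


def classify_result(title, error=None):
    """Sort one lookup result into a failure type, or 'ok'."""
    if error:
        return "connection failure"
    if title is None or not title.strip():
        return "no title"
    if title.strip().rstrip(".").lower() in GENERIC_TITLES:
        return "generic title"
    return "ok"
```

A website would only count as a consistent failure if repeated tests of several of its URLs all land in one of the failure types.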

Lists of websites to be tagged

The list of websites has grown very large, so it has been split into sub-pages:

  1. User:BrownHairedGirl/No-reflinks websites/Set 1
  2. User:BrownHairedGirl/No-reflinks websites/Set 2
  3. User:BrownHairedGirl/No-reflinks websites/Set 3
  4. User:BrownHairedGirl/No-reflinks websites/Set 4
  5. User:BrownHairedGirl/No-reflinks websites/Set 5: timeouts
  6. User:BrownHairedGirl/No-reflinks websites/Set 6

Websites which are still being tested, and are not yet being processed by AWB, are listed at User:BrownHairedGirl/No-reflinks websites/sandbox.

Testing the list of websites

All of the refs tagged by this AWB job have been repeatedly tested with WP:REFLINKS.

Feel free to run the tests yourself, but ... please note that the test sets are very large. They take many minutes to process, and impose a heavy load on the server which hosts Reflinks.

  1. test Set 1
  2. test Set 2
  3. test Set 3
  4. test Set 4
  5. test Set 5: timeouts
  6. test Set 6
  7. test the sandbox

False positives

The following websites consistently failed to give titles when tested, but after the refs had been tagged, they started giving titles:

References