User talk:Dispenser/Checklinks
Readability page
What do the highlighted red words signify? (e.g. here) Harryboyles 06:17, 6 January 2008 (UTC)
- They're for debugging the sentence counter which just looks for periods. —Dispenser (talk) 08:17, 6 January 2008 (UTC)
suggestion for the click on link checker
Add a separate column and/or a button labelled "details" which, when pressed, expands the link as clicking anywhere does now. —Preceding unsigned comment added by Nergaal (talk • contribs) 04:10, 16 January 2008 (UTC)
- Thanks for the direction. I've been trying to keep the interface as clutter-free as possible, and I think the icon is the necessary hint. But we'll have to see in the logs (it's been 13 hits out of 100 utility hits with the message). —Dispenser (talk) 03:45, 18 January 2008 (UTC)
monobook script
Hi. The monobook script doesn't seem to be working for me. It doesn't appear in either Firefox or IE7. I am looking in the right place, right? At the toolbox on the left? Matthew | talk | Contribs 01:02, 12 February 2008 (UTC)
Done. I had been loading the function dynamically before. It should show up below the search box, labeled "Check external links". — Dispenser 03:09, 12 February 2008 (UTC)
Ignore list
Sorry if this is mentioned somewhere, but I don't see it. What URLs would be on the ignore list or more appropriately, why are CNN links on the URL Ignore list? Phydend (talk) 19:46, 18 February 2008 (UTC)
ignorelist = [
    re.compile(r'.*[\./@]example.(com|net|org)(/.*)?'),   # reserved for documentation
    re.compile(r'.*[\./@]tools.wikimedia.(org|de)/.*'),   # So we don't end up calling ourself
    re.compile(r'.*[\./@]wikimedia.org/.*'),              # Wikipedia media repository
    re.compile(r'.*[\./@]archive.org(/.*)?'),             # Prevent downloading of media
    re.compile(r'.*[\./@]cnn.com(/.*)'),                  # CNN has firewalled us
]
- Basically, CNN had put a rule in their firewall config to drop all packets from the Toolserver. This caused requests that queried CNN to time out, which takes about 5 minutes. — Dispenser 23:19, 18 February 2008 (UTC)
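For illustration, here is a minimal sketch of how an ignore list like the one quoted above might be consulted before a link is fetched. The `should_skip` helper is hypothetical and not taken from the tool, and the two patterns are tightened variants of the quoted ones (dots escaped), which is an assumption about the intent rather than the tool's actual code.

```python
import re

# Tightened variants of two quoted patterns (dots escaped); illustrative only.
IGNORE_LIST = [
    re.compile(r'.*[\./@]example\.(com|net|org)(/.*)?'),  # reserved for documentation
    re.compile(r'.*[\./@]cnn\.com(/.*)'),                 # host that drops our packets
]

def should_skip(url):
    """Return True if the URL matches any ignore pattern (hypothetical helper)."""
    return any(pattern.match(url) for pattern in IGNORE_LIST)

print(should_skip("http://www.cnn.com/2008/story.html"))
print(should_skip("http://www.nytimes.com/2008/story.html"))
```

Skipping a matching URL up front avoids waiting for a request that is known to time out.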
Readability tool
Sometimes the tool comes back with statistics fairly quickly (for example, Introduction to evolution, Bees and toxic chemicals and Dog), but other times it seems to be slow, so slow that it might be broken (Evolution, for example). What is going on? Are those articles just too complicated? Is something else wrong?--Filll (talk) 01:23, 20 February 2008 (UTC)
Fixed. I made an optimization in template removal that I shouldn't have. I do not put much credence in the tool and have ceased any serious development. The problems begin with the syllable counter, which doesn't use a dictionary or known algorithms. The readability algorithms were based on their respective Wikipedia articles, which have errors, are simplified, and/or were incorrect. Additionally, the readability algorithms have a standard deviation of roughly 1½ for 1 interval, i.e. accurate to within ±1.5 for 68% of people. — Dispenser 05:58, 20 February 2008 (UTC)
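To show why such scores are rough, here is a small sketch of a readability calculation built on the same kind of shortcuts described above: a sentence counter that only looks for periods and a syllable counter with no dictionary. It uses the standard Flesch Reading Ease formula; the thread does not say which formulas the tool implements, so the choice of formula and all function names are assumptions.

```python
import re

def count_sentences(text):
    """Naive counter: every period ends a sentence, so abbreviations are miscounted."""
    return max(1, text.count('.'))

def count_syllables(word):
    """Naive counter: one syllable per run of vowels, with no dictionary lookup."""
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = count_sentences(text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# "Dr." is counted as a sentence end, and "cake" is counted as two syllables.
print(count_sentences("Dr. Smith went home. He slept."))
print(count_syllables("cake"))
```

Both miscounts feed directly into the score, which is one reason the results should be read as a loose indication only.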
Featured Article Candidates
Dispenser,
First, congratulations on a great tool; very useful.
Now the bad bit ;) Where does the tool get the FAC list from? I ask because it doesn't seem to be up-to-date for the list of all current candidates. Is it a manual job to update it? Cheers. Carré (talk) 11:47, 6 March 2008 (UTC)
- It runs automatically starting at 5:00 UTC, using the category list created from the /config template. It uses the HTML output from the page and then runs a regex on it to get the pages from the linked headers. That part has been working for some time now. However, it seems as though there is a caching issue somewhere, as it continues to get a 1½-month-old version of the page. I've changed the address to the purge page in hopes that will resolve the issue. It'll solve it in the short term; we will see if it fixes the problem 6 months from now. — Dispenser 04:01, 7 March 2008 (UTC)
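The approach described above (regex over rendered HTML, fetched through the purge address) can be sketched roughly as follows. The heading markup, the regex, the base URL and both function names are assumptions made for illustration; only `action=purge` is a standard MediaWiki action.

```python
import re
from urllib.parse import quote, unquote

# Assumed markup: each nomination is an <h3> heading whose text links to the article.
# The tempered dot keeps the match inside a single heading.
HEADER_LINK = re.compile(
    r'<h3[^>]*>(?:(?!</h3>).)*?<a href="/wiki/(?P<page>[^"#]+)"', re.S
)

def pages_from_headers(html):
    """Return the article titles linked from headings in the rendered page."""
    return [unquote(m.group('page')).replace('_', ' ')
            for m in HEADER_LINK.finditer(html)]

def purge_url(title, base='https://en.wikipedia.org/w/index.php'):
    """Build a URL that asks MediaWiki to purge its cached copy of the page."""
    return f"{base}?title={quote(title.replace(' ', '_'))}&action=purge"

sample = ('<h3><span class="mw-headline"><a href="/wiki/Water_fluoridation" '
          'title="Water fluoridation">Water fluoridation</a></span></h3>'
          '<h3>Plain heading</h3><p><a href="/wiki/Unrelated">x</a></p>')
print(pages_from_headers(sample))
print(purge_url('Dog'))
```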
Suggestions
I don't really think a sortable table would help that much. A big pink notice that a site is one of the host sites like members.aol.com/geocities/etc. might be nice, but it's so easy now to get a domain name that it's easy to hide if you know what you're doing. Being generally clueless on the sort of programming that you can do with Wikipedia, would it be possible to highlight if the word "blog" was on the page? Or other words that should throw up a red flag? If this isn't possible or would be too hard, I totally understand. In the totally dreaming realm I'd also love something that would see the list of refs and see if they are using cite web and check that they have publisher and last access date used so that I can easily pull up a list of citations missing those two parameters. That's easily the one thing that gets dropped the most. Ealdgyth - Talk 15:45, 14 April 2008 (UTC)
- I've changed how templates are handled so they're more flexible. It will display the {{cite web}} information in single {} and italics. Is this alright? The blog thing seems hard to do, since it isn't easy to equate a single link with a word that appears outside that link (i.e. the intro talks about somebody's blog, and it's in reference to link number 10). — Dispenser 03:48, 20 April 2008 (UTC)
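As a rough sketch of the citation check requested above, the following lists {{cite web}} uses that lack a publisher or access date. It is not the tool's code: the parameter names `publisher` and `accessdate`, the helper name and the naive parsing (no nested templates, no piped wikilinks inside values) are all assumptions.

```python
import re

# Naive match: a {{cite web}} call with no nested templates inside it.
CITE_WEB = re.compile(r'\{\{\s*cite web\s*\|(?P<body>[^{}]*)\}\}', re.I)
REQUIRED = ('publisher', 'accessdate')  # assumed parameter names

def missing_params(wikitext):
    """Return (url, missing parameter names) for each incomplete {{cite web}}."""
    report = []
    for m in CITE_WEB.finditer(wikitext):
        params = {}
        for part in m.group('body').split('|'):
            name, sep, value = part.partition('=')  # split on the first '=' only
            if sep:
                params[name.strip().lower()] = value.strip()
        missing = [p for p in REQUIRED if not params.get(p)]
        if missing:
            report.append((params.get('url', ''), missing))
    return report

text = ('{{cite web|url=http://a.example/|title=A|publisher=P|accessdate=2008-04-14}} '
        '{{cite web |url=http://b.example/?x=1 |title=B}}')
print(missing_params(text))
```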
KeyError
When trying to do a check on a page with an unusual character (such as é), the python script gets confused and throws up a KeyError. — Wackymacs (talk) 16:52, 14 April 2008 (UTC)
Fixed Thanks! — Dispenser 03:48, 20 April 2008 (UTC)
Error checking Degrassi: The Next Generation
I've just noticed this error that was not occurring in the last few days.
It repeatedly brings back both GLAAD links as not working, however, when I clicked on them to check myself, they do work. Cheers! -- ṃ•α•Ł•ṭ•ʰ•Ə•Щ• @ 03:18, 29 April 2008 (UTC)
- I am unable to duplicate your results. I have found other bugs, but both GLAAD links continue to pop up with rank 0. Perhaps it was a server issue, or it happened during the weekend development of the tool. — Dispenser 02:59, 30 April 2008 (UTC)
- Yup. Just ran it 3 times, and it's all fine. Thanks for checking though. -- ṃ•α•Ł•ṭ•ʰ•Ə•Щ• @ 03:08, 30 April 2008 (UTC)
False deadlinks?
Someone recently ran the tool against an article and it reported a dead link [1]. I manually checked the link in question, and it is good. I also tried to run the tool, and got the same false reading of a deadlink. The link is on the New York Times website, I'm wondering if they might be filtering traffic of this nature? Yngvarr (c) 12:53, 18 May 2008 (UTC)
- I checked the page earlier today and the link in question did not show up with a red row. I suspect that you misinterpreted the addition of «dead link» to the title, so I have changed it to the more traditional {{dead link}} format. I also suspect that the user who made the edit in question had merely played around with the options. An alternative possibility is that the NYT site was down. I will need to add a history mechanism in the future. — Dispenser 05:29, 19 May 2008 (UTC)
How to run it?
There's no simple, one-stop explanation of how to run it. Philcha (talk) 16:09, 17 June 2008 (UTC)
- I've added a {{nutshell}} to the top of the page, but basically you're supposed to figure it out from the examples box, which fills in various pages. — Dispenser 04:05, 30 July 2008 (UTC)
Unwatching
Articles in my watchlist are unwatched when I run this, which is a bit annoying. Other than that, very good tool. --Closedmouth (talk) 05:07, 26 June 2008 (UTC)
- I've added an option in the new Preferences page, but it hasn't been implemented in the backend yet. The options are only Remove all or Add all, since I don't have access to user information. — Dispenser 04:05, 30 July 2008 (UTC)
Done It should work on all the tools now. — Dispenser 21:11, 21 September 2008 (UTC)
great service! correction bottom graph caption, please
"Distribution", not "distrAbution". Thanks. TONY (talk) 05:26, 18 July 2008 (UTC)
Fixed. Thanks, I fixed a few others, then removed the section as I realized it was mostly irrelevant. I need to place things contextually next time. — Dispenser 21:11, 21 September 2008 (UTC)
Readability - Simple English
I was in the process of checking an article for Simple English and noticed the option of checking that wiki was removed. I was wondering if this was a permanent change or if I was just imagining that it even happened. The tool is extremely handy for checking the level of our VGA (simple's version of FAs) candidates and I hope we can continue to use it. Creol (talk) 09:41, 6 September 2008 (UTC)
- I had forgotten to update it when I was migrating all my tools to the interwiki link syntax. Since your posting I've ensured that the tools are consistent (links on simple don't go to English). You can check pages by using the syntax simple:music or by pasting the URL in, and it will convert for you. — Dispenser 18:48, 11 September 2008 (UTC)
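A minimal sketch of how input such as simple:music or a pasted article URL could be turned into a wiki prefix and a page title. The prefix set, the URL pattern and the `parse_page` name are assumptions for illustration, not the tool's implementation.

```python
import re
from urllib.parse import unquote

KNOWN_PREFIXES = {'en', 'simple', 'pt', 'de'}  # illustrative subset

def parse_page(spec, default='en'):
    """Return (wiki prefix, page title) from 'simple:music' or a pasted article URL."""
    m = re.match(r'https?://([a-z-]+)\.wikipedia\.org/wiki/(.+)', spec)
    if m:
        return m.group(1), unquote(m.group(2)).replace('_', ' ')
    prefix, sep, title = spec.partition(':')
    if sep and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower(), title
    # Anything else (including namespaces such as "Talk:") stays on the default wiki.
    return default, spec

print(parse_page('simple:music'))
print(parse_page('http://simple.wikipedia.org/wiki/Music_theory'))
```

Checking the prefix against a known set keeps titles that merely contain a colon from being mistaken for interwiki links.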
FYI Observation - templates and external links
First, I will add my voice to the chorus of kudos. Nicely done.
I notice that the Checklinks resource does not recognize links generated as part of (at least some) transcluded templates. For instance, {{AMQ}} as used in WDEL; there are several such templates in broad use among radio station articles. This might be something to include in the documentation as a limitation. --User:Ceyockey (talk to me) 02:10, 14 September 2008 (UTC)
Done I've added a section on the Internal workings. — Dispenser 21:11, 21 September 2008 (UTC)
Possible bug with duplicate ref content
[2] Two references named "GQ" appear in one paragraph. Someone using your tool converted the second name to autogenerated1. Does this look right? (As a separate issue, I've noticed editors putting quotes around ref names, so should autogenerated1 be "autogenerated1" when a name is created?) Gimmetrow 19:13, 18 October 2008 (UTC)
- On the same note, it properly combined the ref named "bbc1". Wildhartlivie (talk) 19:34, 18 October 2008 (UTC)
- This should be addressed to User:NicDumZ, since he's the author of that particular code. The two refs only differ by a space before "December". — Dispenser 01:06, 20 October 2008 (UTC)
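One way to make references that differ only in spacing compare as equal is to normalize whitespace before comparing. This is a sketch under the assumption that the difference is an extra space; it is not NicDumZ's code, and both helper names are hypothetical.

```python
import re

def normalize_ref(content):
    """Collapse runs of whitespace so refs differing only in spacing compare equal."""
    return re.sub(r'\s+', ' ', content).strip()

def same_ref(a, b):
    """Compare two reference bodies after whitespace normalization."""
    return normalize_ref(a) == normalize_ref(b)

print(same_ref('GQ, 1  December 2007', 'GQ, 1 December 2007'))
```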
Blacklisted URLs?
Why are certain URLs blacklisted? I was attempting to fix Schiller Institute when reflinks [sic] tripped over one of the URLs. --Adoniscik(t, c) 17:15, 23 October 2008 (UTC)
Fixed. Only www.jstor.org is blacklisted. The link in question resulted from a mistake I made in globalbadtitles when converting to the new format. Read more about title blacklisting at the DumZiBoT approval request. — Dispenser 18:17, 27 October 2008 (UTC)
Cite Web
Regarding edits today to the Tennis page: it tore out a number of "Cite Web" references in favor of an "old style" flat reference...? Is this standard? -- IrishDragon 05:35, 23 November 2008 (UTC) —Preceding unsigned comment added by IrishDragon (talk • contribs)
- URLs are not proper references, so it converted them to "flat URLs" and ran the reflinks bot script to add titles. If you wish to convert them to use cite web, you may wish to use the webreflinks script. — Dispenser 06:57, 27 November 2008 (UTC)
geosearch.py
I've been looking at error detection in coordinates lately, and rediscovered your pretty tool. I had been thinking of some new regexps, but the tool can't handle them yet:
- Possible degree, minute and second characters [°′'`´‘’″"“”] seem to match with article names: regexes should be applied to the coordinates only.
- Negation of all the allowed characters [^0-9NSEW._-] doesn't seem to show only coordinates with other characters.
- Negation queries NOT REGEXP might be nice so that a regex of the correct format could be given and the tool would then show all erroneous ones, as kind of a catchall, but that might require a way to give multiple patterns. Or maybe the tool could have that query built in?
--Para (talk) 23:51, 2 December 2008 (UTC)
- A few things seem to be going on here. MediaWiki stores the URLs in UTF-8 if provided in UTF-8, but percent-encodes them when rendering. Firefox barfs on Unicode quotes, sad really. MySQL seems to have only byte-wise support for regexing UTF.
- params=[^&:]*[°′'`´‘’″"“”]? [°′'`´‘’″"“”] seems to be interpreted as [\xC2\xB0\xE2\x80\xB2'`\xC2\xB4\xE2\x80\x98\xE2\x80\x99\xE2\x80\xB3]
- params=[^&:a-z]*[^0-9NSEW._&:a-z-]
- It's a good idea, but I don't have the database tools design fully fleshed out yet.
- Originally I wrote a program that parsed the external link table, with a good amount of error correction. Docu sent me a patch to add more verbose messages. It currently runs daily and dumps its logs into tools:~dispenser/resources/logs/coord-enwiki.log. — Dispenser 05:11, 3 December 2008 (UTC)
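The byte-wise behavior described above can be reproduced in Python by comparing a text regex with a bytes regex built from the same characters. This is only an illustration of the effect, not a reproduction of the MySQL query; the sample title is made up.

```python
import re

MARKS = "°′"  # degree sign and prime, two of the characters from the query

unicode_class = re.compile("[" + MARKS + "]")
# Encoding the class to UTF-8 turns it into a set of individual bytes:
# C2 B0 (degree sign) and E2 80 B2 (prime).
byte_class = re.compile(b"[" + MARKS.encode("utf-8") + b"]")

title = "Dallas–Fort Worth"  # contains an en dash (E2 80 93), no coordinate marks

print(bool(unicode_class.search(title)))                # False
print(bool(byte_class.search(title.encode("utf-8"))))   # True
```

The en dash shares the bytes E2 and 80 with the prime character, so the byte-wise class matches a title that contains none of the intended characters.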
- Ok, the results from the logging tool look good, much better than a simple catchall query that doesn't classify anything. Would it be possible to have a table interface to prettify the log and sort by error type? --Para (talk) 14:23, 4 December 2008 (UTC)
- It's formatted as tab-separated values, which can be imported into Excel and then "filter"ed into a table interface. — Dispenser 16:42, 4 December 2008 (UTC)
- Right, I have no trouble reading it, but I was more thinking of casual users who stumble upon a link of things to do. The various errors are listed on Wikipedia:WikiProject Geographical coordinates#Coordinates search tool, with a link to a tool that's easy to read and allows easy access to the problem article. I wouldn't say that the raw log achieves that... and so prettifying without people needing to copy and paste things would be nice, especially if someone happens to have a framework set out already. ;) --Para (talk) 00:33, 5 December 2008 (UTC)
- We need a better way to index tools on the Toolserver, as someone has probably written a viewer already. File viewer is my rendition of a simple universal log viewer. It doesn't have drop-down lists or sorting, but that's what the table tools extension is for. — Dispenser 21:25, 20 December 2008 (UTC)
- Thanks, that looks just fine. Can you make the Javascript read a url parameter, so that a certain log file in the viewer would open when linked from here? --Para (talk) 17:35, 5 January 2009 (UTC)
- Back/forward doesn't work, but it can be wikilinked - tools:~dispenser/view/File_viewer#log:coord-enwiki.log. — Dispenser 06:21, 26 January 2009 (UTC)
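For readers who want to sort the tab-separated log by error type without a spreadsheet, a short sketch follows. The column layout (page, error type, value) and the sample rows are invented for illustration; the actual log format is not documented in this thread.

```python
import csv
import io
from collections import defaultdict

def group_by_error(tsv_text, error_column=1):
    """Group log rows by the value in one column (the column layout is assumed)."""
    groups = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) > error_column:
            groups[row[error_column]].append(row)
    return dict(groups)

# Invented sample rows: page, error type, offending value.
sample = "Page A\tbad minutes\t61\nPage B\tbad degrees\t200\nPage C\tbad minutes\t75\n"
for error, rows in sorted(group_by_error(sample).items()):
    print(error, [r[0] for r in rows])
```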
Tool link is not working
The tool link is currently not working. Dr.K. (logos) 05:31, 3 December 2008 (UTC)
- It's likely the result of the move to the new server. It's probably a disk caching issue, since it worked again when I reloaded it. — Dispenser 05:35, 3 December 2008 (UTC)
- I'm still getting "404 not found" after both F5 and CTRL-F5. --Philcha (talk) 17:38, 8 December 2008 (UTC)
- It seems to be a server configuration problem: the new server passes the escaped URL directly to the rewrite script. I suspect it is related to a configuration setting, as it was working at the beginning, but River has filed a bug and has since enabled a workaround. — Dispenser 02:21, 21 December 2008 (UTC)
Table formatting
Hi. I've noticed a problem with some of the changes checklinks has been making in articles that use a table formatting different than a basic wikitable. It's converting them back to the basic wikitable. WP:ACTOR has endorsed and incorporated a more stylized table than this basic one and there is no option allowed to avoid this when running the tool. You ran the tool on Mark Wahlberg which left the filmography table reverted to this, although it had been updated to the new format [3]. When I originally began using checklinks, it didn't do this, and I used it routinely. Could this be removed or converted to allow the exclusion of this table changing? Thanks. Wildhartlivie (talk) 18:32, 26 December 2008 (UTC)
- They shouldn't be doing that: it screws with custom skins, increases article size, doesn't automatically update, and is inconsistent with the rest of Wikipedia. The code they used was from the 21 June 2005 revision of {{prettytable95}}; the prettytable templates were deprecated in late 2005 or early 2006. Some pages are still subst'd with the old code, which is why the conversion code exists. Since the idea was neither well thought out nor well implemented, I've posted to the WikiProject about changing this. — Dispenser 22:12, 26 December 2008 (UTC)
- I didn't report what was happening as a bug, but just as something the tool has started changing more recently. The change that was implemented by WP:ACTOR only changed the font size and the color of the table top - the filmography table itself has been in use since the beginning of the project, so it isn't like a drastic reworking has occurred from the original table used. A lot of projects use variations on tables in the course of their projects and in the past, when I ran the tool on those pages, wikitable changes weren't made. If projects aren't free to adapt changes in tables used by the projects, there is a problem, because something outside of those projects is dictating style. Meanwhile, as one of the handful of people who are consistently active in the project, I have to say, your recommendation for templates is a bit over my head. I don't really know what you are suggesting regarding them. I don't see a lot of difference between the markup that is being used and what is included in Help:Table. What I do know is that mandatory table changes by the tool will cause me to not use it like I have in the past. Wildhartlivie (talk) 23:57, 26 December 2008 (UTC)
- Before you changed it [4], it was a standard prettytable; it was reverted [5] 2 months later, reverted back, and now changed to standard wikitable markup. This is the only project I've seen on this wiki that uses a non-standard table just for styling. The reasons given above are sufficient for any of the regulars at common.css to begin running a bot to clean up the mess. Additionally, the documentation in Help:Table is outdated and includes hacks that we really don't want to pollute the data set with. — Dispenser 04:42, 26 January 2009 (UTC)
<references />
The tool arbitrarily replaces <references /> with {{reflist|colwidth=30em}}. That is really bad, since there are valid reasons for using both, and thus it should not be automatically changed. (Generally <references /> is actually preferable if there is no particular reason to use {{reflist}}). So if someone could please stop it from making this change as soon as possible that would be great.
—Apis (talk) 18:30, 12 January 2009 (UTC)
- I totally agree. Please cease this practice. That kind of thing requires consensus to change. --Adoniscik(t, c) 20:17, 12 January 2009 (UTC)
- when is <references/> better? i've always thought {{reflist|colwidth=30em}} was better, to be honest. shirulashem (talk) 02:09, 21 January 2009 (UTC)
- Text size gets unnecessarily small, and multiple columns usually make no sense unless the page uses Harvard references or similar. The rest of the page is in one column. It is bad typography that makes the reference section less accessible, which is kind of counterproductive on a site that wants to make information more accessible.
—Apis (talk) 13:58, 21 January 2009 (UTC)
The controversy with this template will never be settled. The font size was made consistent (smaller) in IE, and users fired back. Many users keep upping the number of columns and others want hacks implemented in MediaWiki to add columns to every reflist. While I agree with the points raised by Apis, I believe consistency across pages is far more important.
Reflinks will only convert when told to use templates. Commonfixes (used in Reflinks, Checklinks, and PDFbot) will convert if the surrounding divs make the references smaller (i.e. no visual change). Commonfixes also applies a simple rule: if there are more than 30 references, it changes {{reflist}} and {{reflist|3}} into {{reflist|colwidth=30em}}; if there are fewer than 8, it removes any columns. This is based on edits I have seen, so if somebody could bring me edge cases (even theoretical) I will try to improve this behavior. — Dispenser 05:39, 26 January 2009 (UTC)
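The column rule described above could look roughly like the sketch below. It is a reconstruction from the description, not the Commonfixes source: the way references are counted, the regexes and the `adjust_reflist` name are all assumptions.

```python
import re

REF_TAG = re.compile(r'<ref\b', re.I)  # matches <ref> and <ref name=...>, not <references />
REFLIST = re.compile(r'\{\{\s*reflist\s*(\|[^{}]*)?\}\}', re.I)

def adjust_reflist(wikitext):
    """Apply the thresholds from the description: more than 30 refs, fewer than 8 refs."""
    count = len(REF_TAG.findall(wikitext))

    def replace(match):
        arg = (match.group(1) or '').lstrip('|').strip()
        if count > 30 and arg in ('', '3'):
            return '{{reflist|colwidth=30em}}'
        if count < 8 and (arg.isdigit() or arg.startswith('colwidth')):
            return '{{reflist}}'
        return match.group(0)  # anything else is left untouched

    return REFLIST.sub(replace, wikitext)

print(adjust_reflist('<ref>x</ref>' * 31 + '{{reflist}}')[-25:])
print(adjust_reflist('<ref>x</ref>' * 5 + '{{reflist|2}}')[-11:])
```

Pages with between 8 and 30 references are left as they are in this sketch, since the description gives no rule for them.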
- The tool changes <references /> even if the result is smaller text?
- There is no reason to reduce the fontsize for reflists when using a single column. I agree that colwidth is a much better option than a fixed number of columns and as far as I am concerned it is great if the tool changes {{reflist|3}} into {{reflist|colwidth=30em}}. However, it shouldn't change {{reflist}} and <references /> into {{reflist|colwidth=30em}}. Also 30em is kind of arbitrary and might not be a good number for most of the cases where multiple columns are actually desirable. If you could somehow have the tool check if the page is using harvard style or shortened footnotes that would be an indication that the page would benefit from multiple columns. If not <references /> would probably be better. Anyway, I doubt you would get community consensus for automatically changing reference type like this.
—Apis (talk) 08:38, 3 February 2009 (UTC)
- Firefox-using Wikipedians (only Firefox users can see the columns) almost compulsively change every reference section to a multicolumn reflist. The 30em number was chosen from how it worked on different screen sizes, closely matching what people were hard-coding. 20em might be good for the short footnote style, but coding something will be hard, and some people improperly mix styles (long + short footnotes). Judging from the edits and discussions of hard-coded multicolumn for IE users, I think that if we took a poll most would support standardizing on reflist, but I digress.
- I’ve reviewed the original request for Reflinks, realized it was only asking to use reflist when adding the references section, and have changed the tool accordingly. By the way, Checklinks never changed a regular <references />. — Dispenser 05:58, 4 March 2009 (UTC)
talk pages
Great tool. After you check a page, if you click on "talk page" in the tools drop-down, it appends "-talk" to the end of the article name instead of prepending "Talk:". shirulashem (talk) 02:02, 21 January 2009 (UTC)
Fixed, but since I dropped pseudo-namespace support, it means things like Talk:Wikipedia:article will happen, sigh. — Dispenser 05:43, 26 January 2009 (UTC)
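For comparison, a namespace-aware way to derive a talk page title is sketched below. It is not how the tool works (the reply above notes that namespace handling was dropped); the namespace set and the `talk_page` name are assumptions for illustration.

```python
SUBJECT_NAMESPACES = {'Wikipedia', 'User', 'Template', 'Category', 'Help', 'Portal'}

def talk_page(title):
    """Return the talk page title for a page, respecting a few known namespaces."""
    prefix, sep, rest = title.partition(':')
    if sep and (prefix == 'Talk' or prefix.endswith(' talk')):
        return title  # already a talk page
    if sep and prefix in SUBJECT_NAMESPACES:
        return f'{prefix} talk:{rest}'
    # Main namespace, including titles that merely contain a colon.
    return 'Talk:' + title

print(talk_page('Dog'))
print(talk_page('Wikipedia:Article'))
print(talk_page('Star Trek: Voyager'))
```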
Not working?
Checklinks Says "No changes will be maded [sic]" and then does nothing when I click ok. --Closedmouth (talk) 07:53, 9 February 2009 (UTC)
Fixed, and now that area of the code's been cleaned up. — Dispenser 20:25, 13 February 2009 (UTC)
- Thanks! --Closedmouth (talk) 05:54, 14 February 2009 (UTC)
Deadlink error
Hi, in this edit to D. B. Cooper, you erroneously flagged http://www.msnbc.msn.com/id/23801264/ as a dead link. I've fixed it. TJRC (talk) 02:45, 15 February 2009 (UTC)
- Sorry about that. MSNBC seems to have flaky server software. — Dispenser 05:06, 14 March 2009 (UTC)
Can't "fix" redirect links anymore?
URLs that redirect to another page cannot be "fixed" anymore by the tool, even if we want to? Gary King (talk) 21:31, 17 February 2009 (UTC)
- As I’ve explained on the reflinks discussion there is no benefit to replacing redirects. Despite the warning, the button was a temptation (like advisor's fix button), so there was misuse of replacing with 404 pages. The copy and paste method, however, still works. — Dispenser 16:09, 18 February 2009 (UTC)
Unable to expand links
After checking a page's links, I can't click on the links to expand and fix them. Has this functionality been removed to only provide reporting or is there an error? —Ost (talk) 16:11, 26 February 2009 (UTC)
Fixed. It was a syntax error, and nobody noticed for a week? Should I even be improving it if nobody needs to use it? — Dispenser 05:04, 4 March 2009 (UTC)
- Thanks for the fix. I don't know about anyone else, but I appreciate it. I hadn't noticed it sooner as I was working off a page created weekly where the code was still working.
- I was also wondering, what is the expected behavior of checklinks on Category pages? The weekly pages and associated log do not appear to visit all of the pages in the category (e.g., Category:Top-importance Louisville articles). —Ost (talk) 14:22, 9 March 2009 (UTC)
Fixed, thank you. It was actually two bugs: category detection was not working and the fall back "list link" generator was skipping adjacent links. — Dispenser 15:26, 9 March 2009 (UTC)
- Thanks again for the fix. When you reran the tool for the pages last week it worked, but now the reports for the category pages are empty. Looking at the log, the tool seems to have removed the first letter after the category namespace: Getting [[Category:Igh-importance Louisville articles]].... Thanks, Ost (talk) 18:35, 16 March 2009 (UTC)
- I decided to release the source and wanted to clean it up before I did. Bugs happen. — Dispenser 19:33, 28 March 2009 (UTC)
Localization in pt
Is it possible to use this tool in other languages? I would like to give it a try in pt:wiki to speed up review of Featured article candidates. GoEThe (talk) 16:07, 16 March 2009 (UTC)
- Interface messages are not stable for translation. But to get pages from pt:wiki you just need to prefix pt: before the page name, just like with interwiki links. — Dispenser 19:07, 16 March 2009 (UTC)
Problem with Link Checker error
I have no idea where this kind of thing should be posted, so please excuse me if this is the wrong place, and feel free to remove my comment.
The toolserver linkchecker tool (a wonderful tool btw) shows the following for a certain webpage:
Media type text/html; charset=UTF-8 is wrong for .xml files
It doesn't seem to make any sense to throw an error for this - firstly as far as I'm aware there's no rule or standard precluding the text/html mime type for any extension whatsoever, xml or other - extensions can/should be assessed completely separately from mime-types. Secondly, even if there were such a rule or standard, it would be the responsibility of webmasters to adhere to it, not Wikipedia editors referencing such webmasters pages. Many sites use various file-extensions for text/html pages completely legitimately, including their own custom file extensions or no extension whatsoever. Also, many apps use xslt, xsl to legitimately serve xml as html - mod-xslt is a good example. ɹəəpıɔnı 06:11, 23 March 2009 (UTC)
- This is the scenario in which the error is thrown: the tool opens a PDF file expecting an application/pdf media-type, but instead it receives a “200 OK” with a media-type of text/html, and the page is actually a soft 404 error. The tool is designed to give warnings at the level where it cannot determine a link to be either dead or good, so it flags many things for human review. — Dispenser 16:43, 7 April 2009 (UTC)
- What about this scenario: the tool opens a .html file expecting a text/html media-type, it receives a “200 OK” with a media-type of text/html but it is actually a soft 404 error. The tool throws no error, because mime-type is really no indication of the likelihood of such, in fact soft 404's are almost certainly more likely for .html files than any other. Aren't soft 404's out of scope of this tool really? I should imagine the amount of false positives thrown are far greater than the amount of genuine errors of this kind. At the least XML could possibly be bundled in with html as a valid extension for text/html... Anyway, sorry if I came across as over-critical, honestly not intended. Thanks for responding all the same. ɹəəpıɔnı 17:36, 7 April 2009 (UTC)
- If the tool GETs the page instead of obtaining only the headers (HEAD), it does perform some basic content analysis. The ranking system has been in need of an overhaul, but I have not found a good grouping system for the various errors. — Dispenser 15:26, 13 April 2009 (UTC)
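The extension check discussed in this thread can be sketched as a comparison between the file extension in the URL and the Content-Type the server returns. The table of expected types and the `media_type_warning` name are hypothetical; the wording of the warning follows the message quoted above.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical table: extensions that should not come back as an HTML page.
EXPECTED_TYPES = {'.pdf': 'application/pdf', '.doc': 'application/msword'}

def media_type_warning(url, content_type):
    """Return a warning when a document URL is answered with text/html, else None."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    served = content_type.split(';', 1)[0].strip().lower()
    if ext in EXPECTED_TYPES and served == 'text/html':
        return f'Media type {content_type} is wrong for {ext} files'
    return None

print(media_type_warning('http://a.example/paper.pdf', 'text/html; charset=UTF-8'))
print(media_type_warning('http://a.example/page.html', 'text/html'))
```

Leaving .xml out of the table, as suggested in the thread, would avoid flagging sites that legitimately serve transformed XML as HTML.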
[edit] Re:Checklinks
You reverted me adding your tool to WP:PW saying it was a duplicate to the pages you posted on WP:WikiProject Professional wrestling/Broken external links, however, you did forget that the WP:PW page includes FACs and GACs which are very important to right broken links. How can we make a page directly for them? Raaggio 03:33, 13 April 2009 (UTC)
- If you look at the broken external linkspage, you will see that they link to automatic scan of GA and FA article categories. It appears that the list at WP:PW is not kept up to date as there are 66 GA missing. — Dispenser 15:26, 13 April 2009 (UTC)
- FACs and GACs = Featured Article Candidates and Good Article Candidates. I want an automatic update for the articles that are nominated, before they get promoted. Raaggio 02:02, 16 April 2009 (UTC)
- That's a pretty bad misread on my part. Using the htmlregex option you can selectively choose only the links preceded by particular icons. So you have a few options for their inclusion: select only GAN/FAC pages, select GA/GAN/FA/FAC pages, or select all main space links. If you opt for either of the latter two, you should delete or redirect (or soft-redirect to the Checklinks sub page) the broken links sub page.
- htmlregex for selecting pages with the FAC or GAN image preceding the link:
<a [^<>]+? title="(Featured article nominee|Good article candidate)"><img [^<>]+?/></a> *<a href="/wiki/(?P<page>[^"]*)" title="[^"]*">
- htmlregex for selecting pages with an image preceding the link:
<img [^<>]+?/></a> *<a href="/wiki/(?P<page>[^"]*)" title="[^"]*">
- — Dispenser 03:12, 16 April 2009 (UTC)
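To see what the first htmlregex above selects, here is a small sketch that applies it with Python's `re` module. The sample HTML, the file names in it, and the helper `select_pages` are made up for illustration, and how Checklinks itself applies the htmlregex option is an assumption here; only the pattern is taken from the comment above.

```python
import re

# The FAC/GAN pattern quoted above, split over two lines for readability.
HTMLREGEX = re.compile(
    r'<a [^<>]+? title="(Featured article nominee|Good article candidate)">'
    r'<img [^<>]+?/></a> *<a href="/wiki/(?P<page>[^"]*)" title="[^"]*">'
)

# Hypothetical list markup: one FAC icon, one plain GA icon, one GAN icon.
SAMPLE_HTML = (
    '<li><a href="/wiki/File:FAC.svg" class="image" title="Featured article nominee">'
    '<img alt="" src="fac.png" /></a> '
    '<a href="/wiki/Article_One" title="Article One">Article One</a></li>'
    '<li><a href="/wiki/File:GA.svg" class="image" title="Good article">'
    '<img alt="" src="ga.png" /></a> '
    '<a href="/wiki/Article_Two" title="Article Two">Article Two</a></li>'
    '<li><a href="/wiki/File:GAN.svg" class="image" title="Good article candidate">'
    '<img alt="" src="gan.png" /></a> '
    '<a href="/wiki/Article_Three" title="Article Three">Article Three</a></li>'
)

def select_pages(html):
    """Return the page names captured by the named group 'page'."""
    return [match.group("page") for match in HTMLREGEX.finditer(html)]

print(select_pages(SAMPLE_HTML))
```

Only the entries whose icon title is "Featured article nominee" or "Good article candidate" are captured; the entry with the plain "Good article" icon is skipped.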
[edit] Checklinks incorrectly claims journal subscription required
I used webchecklinks to verify Water fluoridation, and it complained about this citation:
- Sheiham A (2001). "Dietary effects on dental diseases" (PDF). Public Health Nutr 4 (2B): 569–91. doi:. PMID 11683551. http://journals.cambridge.org/action/displayFulltext?type=1&fid=1357436&aid=1357428.
saying "302 Journal subscription required". It's true that the URL in question causes the web server to respond with a "302 Moved Temporarily" HTTP result, but the download does then succeed, without requiring a subscription or registration. Just thought you'd like to know. Eubulides (talk) 06:22, 28 May 2009 (UTC)
- The journal subscription check is domain based, so I'll have to see if it's possible to improve it whenever I get around to refactoring that script. — Dispenser 14:38, 16 June 2009 (UTC)
[edit] Option to tag sites requiring registration
It would be convenient to have an option to call up {{registration required}} in addition to the dead- or spam-link options. LarryGilbert (talk) 08:00, 26 April 2009 (UTC)
- That's funny, I was just about to say the same thing. That would be great if you made it automatically put the {{registration required}} template next to any links that require registration. Logan | Talk 19:30, 21 May 2009 (UTC)
Declined The last time I checked, the policy regarding registration sites concerned itself only with the external links section. The policy there is simply to remove such links, which is part of the reason detection is included in the tool. It looks to me as though the template is rather superficial: companies may change their registration policies (New York Times), or access may depend on the IP address the user is connecting from (as is the case with many universities). I may in the future add support for custom templates. — Dispenser 10:23, 21 June 2009 (UTC)
[edit] Deadlink is alive
Hi, The reported deadlink at timesonline.co.uk is alive when you click on its link and on the link on the page. Drop me a line on my talk in case you can't reproduce this and it seems to be a domain/browser/OS related issue. Cheers (Cool tool btw.) Enki H. (talk) 04:27, 18 June 2009 (UTC)
- Not a bug The server is flaky; the first time I opened that link I was greeted with a 404 Not Found message. I waited for a minute before trying again and finally got the article. This sort of behavior is not uncommon (see #False deadlinks? above) and is likely related to some timeout issue on the hosting server. When I get around to rewriting the core, I will be adding something to indicate this flaky behavior. — Dispenser 10:23, 21 June 2009 (UTC)
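One way such flaky behavior could be indicated is to retry after a pause and report when the answers disagree. The sketch below is hypothetical and not the planned implementation: the function name `probe`, the labels, the three attempts, and the 60-second delay (loosely modeled on the one-minute wait described above) are all assumptions.

```python
import time

def probe(fetch_status, url, attempts=3, delay=60.0, sleep=time.sleep):
    """Call fetch_status(url) up to `attempts` times and summarise the outcome.

    fetch_status is a caller-supplied function returning an HTTP status code.
    Returns a (label, statuses) pair where label is 'ok', 'flaky' or 'dead'.
    """
    statuses = []
    for attempt in range(attempts):
        if attempt > 0:
            sleep(delay)  # wait before retrying
        status = fetch_status(url)
        statuses.append(status)
        if status < 400:
            break  # stop at the first successful response
    if statuses[-1] >= 400:
        return "dead", statuses
    if len(statuses) > 1:
        return "flaky", statuses  # failed at first, then succeeded
    return "ok", statuses
```

Passing the fetch and sleep functions in as parameters keeps the sketch independent of any particular HTTP library and makes it easy to try with canned responses.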
[edit] URLs with user+password
I recently ran checklinks on Oxygen toxicity and it flagged this citation:
- <code>{{cite web |url=ftp://downloadfiles:decompression1@ftp.decompression.org/Baker/Oxygen%20Toxicity%20Calculations.pdf |format=PDF|title=Oxygen toxicity calculations |author=Baker, Erik C. |year=2000 |accessdate=2009-06-29 }}</code>
- Baker, Erik C. (2000). "Oxygen toxicity calculations" (PDF). ftp://downloadfiles:decompression1@ftp.decompression.org/Baker/Oxygen%20Toxicity%20Calculations.pdf. Retrieved 2009-06-29.
I guess checklinks can't handle URLs of the form "ftp://username:password@domain/..."? Thought I'd mention it in case you have time to fix this. Eubulides (talk) 16:37, 1 July 2009 (UTC)
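For what it is worth, URLs of this form can be taken apart with Python's standard `urllib.parse`. The sketch below only shows the parsing step; the helper name `split_credentials` is made up, and whether Checklinks parses URLs this way is not known from this thread.

```python
from urllib.parse import urlsplit, unquote

def split_credentials(url):
    """Split a URL into its parts, including any user:password prefix."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,    # None when no user is given
        "password": parts.password,    # None when no password is given
        "host": parts.hostname,
        "path": unquote(parts.path),   # decode %20 and similar escapes
    }

info = split_credentials(
    "ftp://downloadfiles:decompression1@ftp.decompression.org"
    "/Baker/Oxygen%20Toxicity%20Calculations.pdf"
)
print(info)
```

The username, password, and host could then be handed separately to an FTP client rather than treating everything before the first slash as the host name.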
[edit] File-->Image
Checklinks is 'correcting' links to the file namespace by changing them to the 'image' namespace. While totally innocuous, it is unnecessary and may cause confusion down the road. Thanks for the tool, love it otherwise. Protonk (talk) 22:11, 13 August 2009 (UTC)
[edit] Minor spelling changes
Here are some suggested spelling changes for the English language messages.
1. Would you like to run reflinks.py bot script to attempt add missing title on external links and combine idenitical refernces
I suggest changing it to:
Would you like to run reflinks.py bot script to attempt to add missing titles for external links and to combine identical references?
2. Change Excessed to Exceeded in "Excessed redirect limit".
It appears when the tool tries to follow the link:
http://darwin-online.org.uk/content/frameset?itemID=F373&viewtype=text&pageseq=506
from Thomas Henry Huxley.
The links all work, but darwin-online.org.uk does take several seconds to display the section of a large page, in case that is the reason. -84user (talk) 09:16, 16 September 2009 (UTC)
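For context, a redirect limit is commonly just a hop counter around the request loop, roughly as sketched below. This is a generic illustration with made-up names (`resolve`, `get_location`, `RedirectLimitExceeded`) and an assumed limit of five; it does not establish whether the darwin-online.org.uk link trips such a counter or fails for another reason, such as the slow response mentioned above.

```python
from urllib.parse import urljoin

class RedirectLimitExceeded(Exception):
    """Raised when a chain of redirects is longer than the allowed limit."""

def resolve(url, get_location, max_redirects=5):
    """Follow redirects until a final URL is reached or the limit is hit.

    get_location(url) is a caller-supplied function returning the Location
    header value for a redirect response, or None for a final response.
    Returns (final_url, number_of_hops).
    """
    hops = 0
    while True:
        location = get_location(url)
        if location is None:
            return url, hops
        if hops >= max_redirects:
            raise RedirectLimitExceeded(f"Exceeded redirect limit ({max_redirects})")
        url = urljoin(url, location)  # Location may be relative to the current URL
        hops += 1
```

A site that redirects back to itself, or bounces through more hops than the limit allows, ends in the exception even though a browser following the same chain with cookies may eventually display the page.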
[edit] cite.php update
The cite software has been updated to allow definition of references within the reference list. See Wikipedia talk:Footnotes#cite.php update. Both Checklinks and refTools fail when this style is used. See Arthur Rudolph for a sample and Help:Cite messages for the new error messages. ---— Gadget850 (Ed) talk 18:59, 17 September 2009 (UTC)
[edit] Manual changes do no longer work
For example: on the check for Maurice Garin, there is currently a 302 error on "Trans-Alpine du Livre, Vallée d'Aoste, Article on Maurice Garin [transalplivre.eu]". This is a redirect to a wrong page, so I tried to report the link as dead. I did that the usual way (clicking the plus, and changing the operation to "{{dead link}}"). Then I click "Save changes". I get a message saying that no changes will be made. This happens with every page and every option I try. It seems like the script only uses the default actions, and not the ones that are changed by users. I tried it on three different browsers (Firefox3.0, IE7.0 and Chrome) and all have the same effect. --EdgeNavidad (talk) 09:48, 2 October 2009 (UTC)
- I can confirm that the "Save changes" no longer has the effect of making the manual changes. I just tried it on Lulu (company) and none of my manual changes were recognised. -84user (talk) 14:16, 4 October 2009 (UTC)
Now it seems to work again. --EdgeNavidad (talk) 17:14, 5 October 2009 (UTC)
False hope, it still does not work. --EdgeNavidad (talk) 17:16, 5 October 2009 (UTC)
I have picked up development again on Checklinks; this unfortunately means lots of stuff will break as I attempt to redesign and rewrite the JavaScript interface, and it likely won't work in IE for a while after. The goal is to increase automation and usability.
Automation will need some backend changes, such as knowing how long a link has been around. The tool will also know whether WebCite or the Wayback Machine has archived copies, and will automatically substitute one instead of just tagging the link with {{dead link}} if there is a copy close enough to the access date. Possibly there will be direct saving without previewing.
Since most users encounter this tool through article review processes, it is often overlooked that it can be used to modify articles. The basic ideas are to change icons to text and to reduce the clicks needed to get things done by enlarging or removing containers and adding quicklinks. Another addition will be more contextual help, to make it clearer why some tools are provided.
So while I'm not done, feedback about the design and any other ideas is welcomed. — Dispenser 04:33, 6 October 2009 (UTC)
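The "copy close enough to the access date" idea mentioned above could look something like the following sketch. It is only an illustration of the selection step: the function name `closest_snapshot`, the 180-day tolerance, and the snapshot list are assumptions, and no archive service is actually queried here.

```python
from datetime import date, timedelta

def closest_snapshot(snapshots, access_date, tolerance=timedelta(days=180)):
    """Pick the archived copy nearest to the access date.

    snapshots is a list of (snapshot_date, archive_url) pairs.
    Returns the archive URL, or None if nothing falls within the tolerance.
    """
    if not snapshots:
        return None
    snap_date, url = min(snapshots, key=lambda snap: abs(snap[0] - access_date))
    if abs(snap_date - access_date) <= tolerance:
        return url
    return None  # too far from the access date: tag as dead instead

# Hypothetical snapshot data for a made-up page.
snapshots = [
    (date(2008, 1, 1), "https://web.archive.org/web/20080101000000/http://example.org/"),
    (date(2009, 6, 1), "https://web.archive.org/web/20090601000000/http://example.org/"),
]
print(closest_snapshot(snapshots, date(2009, 6, 29)))
```

When no copy is close enough, the function returns None, which would correspond to falling back to plain {{dead link}} tagging.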
- This is a great tool, and if you are working to improve it, great! --EdgeNavidad (talk) 06:38, 6 October 2009 (UTC)
- I am glad that improvements are to be made, however at present using Firefox 3.0.14 it doesn't work, i.e. tagging dead links, links to Wayback Machine, etc. Is it possible to reinstate the previous version, with a link to the under development version? Jezhotwells (talk) 20:17, 21 November 2009 (UTC)
[edit] Bug: Removes parentheses in article name
When clicking on the link on a WP:FLC page of an article which has parentheses in the name (e.g. Wikipedia:Featured list candidates/List of National Treasures of Japan (paintings)/archive1), the parentheses get removed. Checklinks and similar tools then search for the name without parentheses, for which generally no article exists. bamse (talk) 17:22, 11 November 2009 (UTC)
Works for me on three different browsers using both direct links and copy & pasting. — Dispenser 06:00, 15 November 2009 (UTC)
[edit] Make the + (expand) button switch to a - (reduce) button when it opens
First of all, YAY, what a great tool! I'd been wanting something like this for a while...
Now, for the enhancement request: It'd be nice to be able to close the info windows from the same place as is used to open them (i.e. the + button and/or the (info) link). I (now) figured out that I can close the info window with the x in the right corner, but it'd be better to not have to reach all the way over there. I may look into the code and see if I can hack up a patch. JesseW, the juggling janitor 04:35, 10 December 2009 (UTC)