User talk:Dudemanfellabra

A couple of things

  1. Despite your edit, lines with multiple refnums are still encountering gibberish. See National Register of Historic Places listings in Butler County, Kentucky for an example; I just now edited the page, so it's not a caching thing.
  2. Is there any chance that we could have a tracking category for lines (other than Image and Summary) that have parameters with no information whatsoever? While adding the date_delisted parameter to all Pennsylvania counties, I noticed that many entries didn't have refnums. I'd find it quite a headache to look through tons of entries on tons of lists to catch the occasional entry with no refnum, so a tracking category would surely be simpler. You've already set it up to display as <Ref. # missing>, so I imagine that getting it to transclude a tracking category in the same situation would work the same way.

Please don't take this as a complaint, either "you're slow" or "you're making a mess" or anything else — I'm just bringing this up in case you have the chance to work on it at some point. Nyttend (talk) 21:07, 2 March 2014 (UTC)

No offense taken.
  1. I just edited the Butler County list to fix the problem. Multiple refnums should be separated by commas or comments per the documentation.
  2. The bot is supposed to output a list of all articles with missing/misformatted refnums to User:NationalRegisterBot/NRISOnly like it does everything else, but it's not finding any. Can you show me an example the next time you find one of a site that doesn't have a refnum and isn't caught by the bot? As you expected, adding a category was easy (in fact, there was already a Category:NRHP list missing refnum from old WLM stuff that I just hijacked), but if functionality is not working in the bot, I'd like to fix it as well.--Dudemanfellabra (talk) 21:34, 2 March 2014 (UTC)
Thanks! I had no idea that the separator was significant. Lancaster County, Pennsylvania has a missing refnum (Jackson's Mill Covered Bridge, the first former listing), but it doesn't appear at User:NationalRegisterBot/NRISOnly, and its only appearance in the bot's userspace is on User:NationalRegisterBot/AllNRHPPages/Duplications regarding three border-straddling sites. Nyttend (talk) 21:46, 2 March 2014 (UTC)
Ah, yes, my bot doesn't look at former listings at all, so any of those that are missing refnums won't be picked up by it. Looking through the category, it seems most lists are there because of a former listing. If you find one that is a current listing that is missing its refnum, let me know about that. I just ran the bot and updated everything and got no missing refnums, so to my knowledge you shouldn't find any. As for the Jackson's Mill bridge showing up in the bot's userspace, where do you see it exactly? I can't find it anywhere on the linked page, and really it shouldn't be there at all since, as I said, my bot ignores former listings altogether.--Dudemanfellabra (talk) 05:38, 3 March 2014 (UTC)
Unclear antecedent — I meant that the Lancaster County list doesn't appear in the bot's userspace except in the context of border-straddling sites. Nyttend (talk) 06:05, 8 March 2014 (UTC)

Wise County and Norton, Virginia

Another Virginia anomaly for you. The (two) listings for the independent city of Norton, Virginia were combined with those of surrounding Wise County, Virginia. I have separated the two in National Register of Historic Places listings in Wise County, Virginia (and redirected National Register of Historic Places listings in Norton, Virginia there, away from National Register of Historic Places listings in Virginia), but your files or the script may need updating. (N.B. the other zero listing in Virginia, Poquoson, Virginia, is surrounded by York County, but seems to actually not have any listings at this time.) Magic♪piano 17:50, 3 March 2014 (UTC)

Thanks for letting me know again. As the list was set up, it would have counted the Norton properties correctly, but it would have incorrectly recounted the Norton properties as the Wise County properties. The reason this happens is that National Register of Historic Places listings in Norton, Virginia is a redirect, and thus my code looks for a section on the page titled "Norton" and pulls the data from there. National Register of Historic Places listings in Wise County, Virginia is not a redirect, so the script just takes the first table on the page, which in this case happens not to be the correct one. A similar situation is that of National Register of Historic Places listings in Pierce County, Washington and National Register of Historic Places listings in Tacoma, Washington, the latter of which is a redirect to a section on the former page. Tacoma is listed below the rest of the county, so everything works with the script. I've just edited the Wise County page to drop Norton below the county listings, and if it stays like that, the script will work on its next run. If that is not satisfactory, the other two options would be to 1) move the page to something like National Register of Historic Places listings in Wise County and Norton, Virginia and have the Wise County link redirect to that as well or 2) move the Norton listings to a separate page, avoiding redirects altogether. In the first case, since Wise County would be a redirect, the script would look for a section titled "Wise County" and successfully find it. In the second case, there would only be one table on each page and no redirects, so everything would work as well. I think my solution is a bit less of a hassle, but either will work. Thanks again!--Dudemanfellabra (talk) 18:47, 3 March 2014 (UTC)
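The table-selection rule described in the comment above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the script's actual code: `pick_table`, the list-of-pairs shape of the page, and the sample data are all hypothetical.

```python
def pick_table(page, link_is_redirect, section_name):
    """Return the table that a list link should be counted from.

    `page` is a hypothetical stand-in for a parsed list page: a list of
    (section_title, table) pairs in the order they appear on the page.
    """
    if link_is_redirect:
        # A redirected link (such as the Norton list) is resolved by
        # looking for a section with the matching title.
        for title, table in page:
            if title == section_name:
                return table
        return None
    # A non-redirect link (such as the Wise County list) simply takes
    # the first table on the page.
    return page[0][1] if page else None


# Norton placed below the county listings, as in the fix described above.
wise_page = [("Wise County", ["county rows"]), ("Norton", ["norton rows"])]
print(pick_table(wise_page, False, "Wise County"))
print(pick_table(wise_page, True, "Norton"))
```

With Norton listed first instead, the non-redirect lookup would return the Norton table for Wise County, which is the recount described above.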
Thanks for dealing with this, both of you; I noticed this a while ago but forgot to do anything about it. By the way, I don't think the Country Cabin is actually in Norton; it appears to be in rural Wise County just outside of Norton, so it should be in the other list. As to whether they should be listed on the same page or not, I remember there was a long discussion over the name of the Prince William County list, but that was back when a certain editor turned every discussion into a major dispute, so it may be time for another discussion about it at WT:NRHP. TheCatalyst31 ReactionCreation 22:13, 3 March 2014 (UTC)

NRHP in Washington

It's simply that these images skew the statistics, making it seem as if we've gotten photos at sites for which we have nothing. Vaguely comparable to putting together a stub on an MPS and then using it in place of links to nonexistent articles, e.g. how List of the 1733 Spanish Plate Fleet Shipwrecks is linked at National Register of Historic Places listings in Monroe County, Florida. Someone could even pad the stats by writing a slightly nonstub MPS article just to save effort on sites with documentation and link it, e.g. hitting "undo" on this edit; for practical purposes, it's no different from adding these image links. Nyttend (talk) 03:45, 8 March 2014 (UTC)

@Nyttend: I realize that; I was just pointing out that big decisions like that tend to make people unreasonably mad haha. I also notice you just split out Norton, VA from Wise County.. regarding that, you should look at the section above this one.--Dudemanfellabra (talk) 05:31, 8 March 2014 (UTC)
Actually, I split it because I'd looked at the section above this one. Basically, I can't remember any other pages like this; in my memory, every list is either dedicated to a specific county/countyequivalent (or a piece of one), or it's on the statewide list. Combining multiple county/countyequivalent lists in a single list that's not the entire state list is something I can't ever remember seeing, aside from a few states that were once letter-split, e.g. National Register of Historic Places listings in Missouri, Counties L-N. I'm not fond of tiny list pages, so I might be more in favor of putting tiny lists like Norton back into the statewide list, but I disagree with putting it with the county because it's no more a part of Wise County than a part of Accomack County or the city of Lexington. Nyttend (talk) 06:01, 8 March 2014 (UTC)
But geographically it is entirely surrounded by the county, so in my eyes it's kind of a special case. Having them on one page makes something like the map of all coordinates look better, i.e. without any holes. Personally I don't care either way because my script can handle both, but leaving cases like this one and the aforementioned Prince William County together can at least be somewhat justified from this viewpoint.--Dudemanfellabra (talk) 06:59, 8 March 2014 (UTC)

Jesse Whitesell House and Farm

I just saw your WT:NRHP discussion with Orlady regarding the Jesse Whitesell House and Farm. As the photographer for the images currently in the article, I can tell you that it's rather confusing on the ground, too; I wasn't quite clear what I should photograph in order to get elements of both the original and the increase. If I correctly understand your words, I agree with what you've said: although it was originally located just in Kentucky, it needs to be listed as a duplicate because the increase causes the listing to include resources on both sides of the border. Nyttend (talk) 04:00, 18 March 2014 (UTC)

If that is the case, then we need to make the county lists have the same reference numbers so that my code will pick it up. If one has the original refnum and the other has the increase refnum, the code won't pick it up as a duplicate.--Dudemanfellabra (talk) 08:39, 18 March 2014 (UTC)
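Why the reference numbers have to match can be seen in a small sketch. It assumes, as the comment above implies, that duplicates are detected by finding the same refnum on more than one county list. The function name, the list names, and the refnums are made up for illustration.

```python
from collections import defaultdict


def find_duplicates(county_lists):
    """Return {refnum: [lists...]} for refnums that appear on 2+ lists."""
    seen = defaultdict(set)
    for list_name, refnums in county_lists.items():
        for refnum in refnums:
            seen[refnum].add(list_name)
    return {r: sorted(names) for r, names in seen.items() if len(names) > 1}


# Both lists use the same (placeholder) refnum: detected as a duplicate.
matching = {"List A": ["00000001"], "List B": ["00000001"]}
# One list uses the original refnum, the other the increase refnum: missed.
mismatched = {"List A": ["00000001"], "List B": ["00000009"]}

print(find_duplicates(matching))
print(find_duplicates(mismatched))
```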

Update to the Progress Script

I'm posting this here because I didn't want to write it twice on each of your talk pages haha, and I didn't want to bother the entire project just to talk to you two, User:Nyttend and User:TheCatalyst31 (I'm hoping that ping will alert you to this?). The reason I only want to talk to you two is because you're the only other people I've seen occasionally use the progress script to update the Progress page. I've been working for the past day or two on an update to the script which uses a different method to scrape the data from the county lists than before and in turn dramatically speeds up the process. Instead of taking (on my slow connection) roughly 2-2.5 hours, I now consistently get runs of about 45-50 minutes, and I expect them to be even faster when I go back next week to my faster connection.

I'm still not convinced that I have worked all the bugs out, though, so I haven't actually edited the progress script with these changes. The current test code is at User:Dudemanfellabra/Sandbox.js and the output is at User:Dudemanfellabra/Sandbox. That output roughly matches what's currently on the Progress page (the comparison is made convenient by this diff), although there are some small differences. Some of those differences arise because the updates are 3 days apart, and one would expect changes due to new article creation, etc., but others, judging by their magnitude (e.g. the total number of listed sites in the entire country dropped by ~100), are I believe due to the different approaches in the code. I am beginning to compare the data to what the NRHPstats script outputs on the individual county lists as well as to what I can manually tabulate, but I figured three sets of eyes are better than one. Would either/both of you care to help me look over this?

To be honest, I'm actually more inclined to trust this newer data because of the new way I handle in-county duplications, but the new code uses some complicated regular expressions to extract the data from the wikitext, whereas before I was just using the processed HTML (the processing of which was what led to the long wait time), so maybe those regexes miss some listings that the old code doesn't? One possible reason would be hard-coded table rows, which my new code wouldn't catch (it only looks for transclusions of {{NRHP row}}). Those shouldn't exist, though, because if they did, my bot would (in theory) catch them and report them as having no refnum on the county list, since the refnum was only introduced recently via the row template. The thing I don't like is that the number of total sites reported by the new code is lower than what's given on United States National Register of Historic Places listings, which I trust to be the most accurate of the three numbers. Then again, this may be due to the newly generated duplicates differing in many states from what was on the Progress page before automation, which was usually just a copy-and-paste extension of what was on the relevant state list.
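To make the hard-coded-row concern concrete, here is a minimal Python sketch of pulling {{NRHP row}} transclusions out of wikitext with regular expressions. It is not the script's actual pattern. It assumes rows contain no nested templates or piped links, and the `name` parameter and the sample wikitext are illustrative assumptions.

```python
import re

# Lazily capture everything between "{{NRHP row" and the next "}}".
ROW_RE = re.compile(r"\{\{\s*NRHP row\b(.*?)\}\}", re.DOTALL | re.IGNORECASE)
# Split the captured body into |key=value pairs.
PARAM_RE = re.compile(r"\|\s*(\w+)\s*=\s*([^|]*)")


def extract_rows(wikitext):
    """Return one {param: value} dict per {{NRHP row}} transclusion."""
    rows = []
    for match in ROW_RE.finditer(wikitext):
        rows.append({k: v.strip() for k, v in PARAM_RE.findall(match.group(1))})
    return rows


sample = """{{NRHP row
|name=Example House
|refnum=00000001
}}
{| class="wikitable"
|-
| Hard-coded row
|}
{{NRHP row
|name=Example Mill
|refnum=
}}"""

rows = extract_rows(sample)
print(len(rows))                                    # the hard-coded row is not counted
print([r["name"] for r in rows if not r["refnum"]])  # rows with an empty refnum
```

Only the two template rows are returned. The hand-written table row is invisible to this approach, which is the kind of undercount raised above.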

If I can get this working in an acceptable manner, I'll hopefully apply the same technique to the bot code itself. Currently that code takes anywhere from 5-7 hours depending on my connection to run, so I would expect to at least shave an hour or so off of that. Most of that time, though, is spent querying individual pages to see if they need to be tagged with NRIS-only, so that won't be sped up at all. Either way, some improvement is better than none, so I'll take it! Thanks for you guys' continuous help!--Dudemanfellabra (talk) 21:44, 26 March 2014 (UTC)

I think I see what's causing Washington to lose 100 listings. For whatever reason, the Tacoma and Spokane sublists are on the same page as the rest of the county listings, and the new script is counting the same list twice instead of counting both lists. I'm not sure why it's doing this when it's not doing that for any other page with multiple tables, but it is. (As an aside, why are those two lists set up like that in the first place? The whole point of splitting out sublists is to cut down on load time and page size, and leaving the list on the same page does neither.)
I also noticed that the number of untagged pages jumped by 30, which strikes me as odd. In Illinois, one of the untagged listings is Chicago, Burlington & Quincy Railroad Depot (Wyoming, Illinois), which is conveniently the only listing in its county, and it definitely has a project tag. I think this is more likely a coding error than a bunch of new untagged articles. TheCatalyst31 ReactionCreation 22:32, 26 March 2014 (UTC)
I can always count on you to find these things haha.. and quickly. Thanks for that. The problem with Spokane and Tacoma is rooted in how I find the section for sublists. I use a regex that looks for any section that ends with the county/city name rather than consists entirely of it. The reason I do this is because on many state lists, the title of each county's section links to the county article itself, i.e. there is something like "==[[Pierce County, Washington|Pierce County]]==". If I just used a regex that looked for the county name like "==Pierce County==", it wouldn't match correctly. To make it work, I ignore the first half of the section title and only check if it ends with the correct name. Because the section titles for Pierce County outside of Tacoma and Spokane County outside of Spokane ended with "Tacoma" and "Spokane" respectively, my code matched them incorrectly and didn't make it down the page to the correct section. To remedy this, I've added parentheses to the incorrect section titles so that my regex won't match them.
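The "ends with" matching and the parentheses workaround can be illustrated with a short Python sketch. The pattern is a guess at the general idea, not the script's real regex, and the section titles in the sample pages are made up to reproduce the behaviour described.

```python
import re


def section_regex(name):
    # Match a level-2 heading whose title ENDS WITH `name`, optionally
    # followed by "]]" so that linked headings such as
    # ==[[Pierce County, Washington|Pierce County]]== also match.
    return re.compile(
        r"^==.*" + re.escape(name) + r"(\]\])?[ \t]*==[ \t]*$", re.MULTILINE
    )


def first_heading(name, wikitext):
    match = section_regex(name).search(wikitext)
    return match.group(0) if match else None


# Hypothetical headings: the county section also ends with "Tacoma".
before = "==Pierce County outside Tacoma==\ncounty table\n==Tacoma==\ncity table\n"
# With parentheses, the county heading no longer ends with the city name.
after = "==Pierce County (outside Tacoma)==\ncounty table\n==Tacoma==\ncity table\n"

print(first_heading("Tacoma", before))  # wrongly stops at the county section
print(first_heading("Tacoma", after))   # reaches the Tacoma section
```

A stricter pattern that requires the name to be the whole title (or the whole link label) would avoid the need to rename sections, at the cost of a more involved regex.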
That aside, I agree that Tacoma and Spokane are a little weird in that they are not on separate pages. I think I brought this up once before, but I don't want to go digging. I wouldn't complain if someone moved them out haha, but I feel like whoever the editor was that did that did it for a reason.
As for the Illinois untagged article, the wikitext had as an article name "Chicago, Burlington %26 Quincy Railroad Depot (Wyoming, Illinois)". This is the URL encoded version (i.e. "&"="%26") of the article name, and is unnecessary and uncommon to have in wikitext. I changed the text there, but just to be safe for any other articles for which this might be the case, I added some code to automatically decode any article titles. The good news is that this is a problem with that specific article and not with the code as a whole. I'll rerun it now to see if that is corrected. Thanks again!--Dudemanfellabra (talk) 00:09, 27 March 2014 (UTC)
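The decoding step mentioned above amounts to percent-decoding each article title before it is looked up. A minimal Python sketch, with `normalize_title` as a hypothetical helper name; the script itself is JavaScript and presumably uses a different call:

```python
from urllib.parse import unquote


def normalize_title(title):
    """Percent-decode a title; titles without escapes pass through unchanged."""
    return unquote(title)


encoded = "Chicago, Burlington %26 Quincy Railroad Depot (Wyoming, Illinois)"
print(normalize_title(encoded))
```

One caveat with decoding everything blindly: a title that legitimately contains a literal percent sequence would be altered, so this assumes such titles do not occur in the lists.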
And sure enough on the next run, the Washington and Illinois issues were taken care of. The numbers in that diff are a little more believable than before. I'm even willing to explain the differences between the stub/start/untagged numbers from old to new by the application of my new method of counting duplicates, which I believe to be more accurate (though I have no experimental proof.. only theoretical justification). I'd love to find some actual justification of that statement haha.--Dudemanfellabra (talk) 04:21, 27 March 2014 (UTC)
Illinois still has an untagged false positive in Boone County, though I couldn't tell you which one. Though I suspect that part of the increase in untagged articles is due to your method of counting (Montezuma County, Colorado alone is responsible for nearly half of it, and those all look legit). TheCatalyst31 ReactionCreation 04:31, 27 March 2014 (UTC)
Update: I just fixed some redirect weirdness with the talk page for United States Post Office (Belvidere, Illinois), so that might have been the problem. TheCatalyst31 ReactionCreation 04:34, 27 March 2014 (UTC)
Ah yea that was the problem. The code would have looked at Talk:United States Post Office (Belvidere, Illinois) (the talk page of where the link in the list resolves to), which up until you just changed it was untagged. Now that you fixed that, it should fall into line. I also found another county in Washington that had the weird title thing going on (Thurston/Olympia), so I added parentheses to that one as well and will rerun the code. Not sure why I didn't catch that earlier.--Dudemanfellabra (talk) 05:00, 27 March 2014 (UTC)
Ok, while I was sleeping I updated the actual progress page to get better data to compare to. The data in this diff (Progress page on the left, new update on the right) is separated by only about 5 hours, the Progress page being the newer of the two, so a large majority of the differences between the two will be due to the different methods of counting used in each (though there are still probably a small number of differences due to editing in those 5 hours). The first difference I see in any list is that of Birmingham, Alabama, where everything matches except the number of Stub/Start+ articles. The old code (i.e. the data on the Progress page, which remember is 5 hours younger than what's in my sandbox) shows 29 stubs/25 Start+, and the new code shows 30/24. My visit to the page just now shows an NRHPstats output of 30/24, which matches with the new code (i.e. the older data, so it's not likely that something was downgraded from Start+ to stub in those 5 hours then upgraded back before my check just now). This is strange to me since the method for NRHPstats is the same as the old code, but maybe there's some weirdness going on here due to my old method, which again I believe to be inferior to my new method. My manual tabulation of the bluelinks on that page is below:
As you can see, these numbers match NRHPstats as well as my new code, but not the Progress page output. Not sure why the Progress page and NRHPstats don't line up, but this is at least some evidence in support of my theory that the new code is more accurate than the Progress page. I'll see if I can find any more case-by-case examples, preferably from smaller lists haha.--Dudemanfellabra (talk) 18:53, 27 March 2014 (UTC)

@TheCatalyst31: Sorry for the long silence, but last week at school was pretty gruelling (getting down to the end of the semester now), and I haven't had time to do anything until now. I'm still trying to figure out any differences between my new code and the old code, and the next major difference I find is Coconino County, Arizona, which shows on the Progress page as 25 stubs, 39 Start+ (which matches the NRHPstats script for me) and in my sandbox as 35 stubs, 29 Start+. I made a manual table like the one above for this county below:

As you can see, this matches the new code instead of the old one, leading me to believe the new code is better. I think what happened is the old code took the 11-time duplicated Lookout trees article and mistakenly counted it as Start+. The new code does a better job of keeping article titles attached to their ratings by looping through an array each time a rating is queried instead of assuming elements are going to line up. With this bit of evidence, I'm now willing to say the new code is superior to the old code, both in speed and in accuracy. I'm going to copy it in and run it now.--Dudemanfellabra (talk) 15:19, 5 April 2014 (UTC)
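The difference between "assuming elements line up" and "looping through an array for each query" can be shown with a toy Python example. It illustrates the general failure mode only and is not a diagnosis of the old code. It assumes the rating results come back once per unique title, and the titles and ratings are invented.

```python
def ratings_by_position(row_titles, results):
    # Fragile: pairs the i-th row with the i-th result, whatever its title.
    return [rating for _row, (_title, rating) in zip(row_titles, results)]


def ratings_by_title(row_titles, results):
    # Robust: for every row, scan the results for the entry with that title.
    ratings = []
    for row_title in row_titles:
        for result_title, rating in results:
            if result_title == row_title:
                ratings.append(rating)
                break
    return ratings


# A duplicated article appears on several rows but only once in the results.
rows = ["Lookout trees", "Lookout trees", "Example Lodge"]
results = [("Lookout trees", "Stub"), ("Example Lodge", "Start+")]

print(ratings_by_position(rows, results))  # second duplicate picks up "Start+"
print(ratings_by_title(rows, results))     # every row gets its own article's rating
```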

Infobox NRHP

I'm not absolutely sure that your edit is responsible, but either this edit or something else is causing embedded NRHP infoboxes to be misaligned—they're being treated as entries in the wrapping infoboxes, left-aligning with the text of the other entries rather than with the labels (or headings, if you will) of the entries. See, for instance, the example under "Embedding" at the template page. Could you look into this? Deor (talk) 11:57, 10 April 2014 (UTC)

Thanks for pointing that out. I had made the change to fix one problem another user had brought up, but it seems I broke more than I fixed. I reverted the edit.--Dudemanfellabra (talk) 13:14, 10 April 2014 (UTC)

The Late Show

Is it really a good thing that we're going to get Tripling Elephants on the Late Show? Be ready for a repeat of Wikipedia:Articles for deletion/Elephant (wikipedia article) and related Colberrorism :-) Nyttend backup (talk) 21:33, 11 April 2014 (UTC)

Haha I forgot you span my "Wikipedia life" and my Facebook one. For a second there, I was like OMG STALKER AAAAAAAHH!! But then I figured it out. Haha to be honest I had never heard of the Colbert-related stuff on Wikipedia. I generally stick to the NRHP and Meridian-related stuff.--Dudemanfellabra (talk) 13:58, 12 April 2014 (UTC)
Ah, sorry to give you the fright; I responded here because it would be easier to give you the Wikipedia links :-) Tripling elephants were a big deal right around when I registered, two years before you did, youngin'. Plus, it's been a minor administrative thing — see the protection log for Elephant. Just last year, I tried removing protection, but vandalism was instantly back, and I had to restore protection just two days later. Meanwhile, see my note to Orlady; I'm hoping to get a few photos for a few Alcorn County and Tishomingo County sites next week. Nyttend (talk) 04:14, 13 April 2014 (UTC)
Ah, I see. You'll be pretty far away from me, as I spend most of my time in Forrest County, Mississippi, and Tuscaloosa County, Alabama. If you ever take a trip down the I-59 corridor, be sure to notify me!--Dudemanfellabra (talk) 19:27, 13 April 2014 (UTC)
Will do, but it's hard enough justifying a "detour" through Mississippi on my way to Ohio; going halfway to the Gulf would be even harder to justify :-) I'm not going to create the |commonscat= tracking category without input, so it would help if you'd offer comments (both on the name and on the idea itself) at WT:NRHP. I brought it up just below your last comment on the subject. Nyttend (talk) 19:34, 13 April 2014 (UTC)

A possible oversight with NationalRegisterBot

While tagging some untagged articles, I came across this article, which should have been tagged as NRIS-only but wasn't. Do you have any idea how the bot skipped that one? TheCatalyst31 ReactionCreation 09:55, 13 April 2014 (UTC)

Thank you for finding that and pointing it out. There was a flaw in my code that didn't pick up named references which had a space between "name" and "=".. I guess I didn't see that as a valid input for whatever reason. On the Harper-Chesser House article, one reference was included as <ref name="nris">...</ref> and the other as <ref name = nris/>. Since one has a space and the other one doesn't (I had already corrected for the quotes and the different formats), they were treated as two separate references, the first named and the second unnamed. I've just modified my code to pick up that second reference as being named, so I'll re-run it and probably get quite a few articles of this type.
On a side note, I've updated the bot's code to use the new method I employed on the Progress page, thereby shrinking the run time from 6-8 hours to only about 3.5-4! Still working on ways to get it down even faster, but that's progress! Thanks again!--Dudemanfellabra (talk) 14:39, 13 April 2014 (UTC)
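The whitespace problem with named references described above is easy to reproduce. In the Python sketch below, both patterns are hypothetical: `STRICT_RE` stands in for a matcher that allows no space around "=", and `TOLERANT_RE` shows one way to accept optional spaces and optional quotes. Neither is the bot's real expression.

```python
import re

# Hypothetical "before" pattern: no whitespace allowed around "=".
STRICT_RE = re.compile(r'<ref name="?([^"/>]+)"?\s*/?>', re.IGNORECASE)

# Hypothetical "after" pattern: optional spaces around "=" and optional quotes.
TOLERANT_RE = re.compile(
    r"""<ref\s+name\s*=\s*["']?([^"'/>]+?)["']?\s*/?>""", re.IGNORECASE
)


def ref_names(pattern, wikitext):
    """Return the names of all named <ref> tags found by `pattern`."""
    return [name.strip() for name in pattern.findall(wikitext)]


text = 'Built in 1900.<ref name="nris">...</ref> Listed later.<ref name = nris/>'

print(ref_names(STRICT_RE, text))    # misses the reuse written with spaces
print(ref_names(TOLERANT_RE, text))  # sees both uses of the same name
```

With the tolerant pattern, both tags resolve to the same name, so an article whose only source is that one reference can be recognised as such.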
And just as I predicted, this patch found 11 previously untagged articles, not including the one you manually tagged. They've all been tagged now! Thanks!--Dudemanfellabra (talk) 19:24, 13 April 2014 (UTC)

List numbering bot notice

Thanks. For a number of reasons, most recently my recent trip to China, I haven't been able to work on NRHP articles in a few weeks, but I eventually do plan to get back to them. Daniel Case (talk) 16:59, 19 April 2014 (UTC)