Wikipedia:Reference desk/Archives/Computing/2014 October 18



October 18

UEFI reprogramming

I recently managed to install TianoCore's DUET on a USB drive, and I've come to wonder about the new Windows computers that feature Secure Boot. How hard would it be to disable the Secure Boot function by simply reprogramming the chip that holds the UEFI binaries? Or is that somehow digitally signed as well? -- Melab±1 00:50, 18 October 2014 (UTC)

You do not have to modify the UEFI; you may simply disable Secure Boot, which is a choice that is available to you as a user. The procedure to disable Secure Boot depends on your hardware vendor.
If you were to actually modify the secure firmware (i.e. to "reprogram the chip"), you would effectively be executing a man-in-the-middle attack on the security system, which is generally believed to be "prohibitively difficult" for a well-architected security system. To date, there are no widely known published security flaws in the Secure Boot / UEFI trust chain, so you're on your own to implement such an attack. Nimur (talk) 01:05, 18 October 2014 (UTC)
Presuming the OP means desktops or laptops when they say computers, you're correct. However, if the OP is including tablets, bear in mind that Windows RT tablets can't have Secure Boot disabled, AFAIK. Nil Einne (talk) 18:52, 18 October 2014 (UTC)

archive.org and nytimes.com

Using Firefox 33 on Windows 7, I have had an ongoing problem with the use of archive.org, aka Wayback Machine, with pages from nytimes.com, aka The New York Times. The Times has a paywall which, IIRC, allows 20 free articles per month. Tech support at archive.org is nonexistent.

When I try to access an archive of, for example, http://www.nytimes.com/2014/10/18/us/ferguson-case-officer-is-said-to-cite-struggle.html, I get the following:

Loading...
http://www.nytimes.com/2014/10/18/us/ferguson-case-officer-is-said-to-cite-struggle.html | 1:22:42 Oct 18, 2014
Got an HTTP 302 response at crawl time
Redirecting to...
http://www.nytimes.com/glogin?URI=http%3A%2F%2Fwww.nytimes.com%2F2014%2F10%2F18%2Fus%2Fferguson-case-officer-is-said-to-cite-struggle.html%3F_r%3D0

After a few seconds, the message swaps the two URLs. This goes back and forth ten times or so, and then it redirects to the archive of a nytimes.com login page. That URL is: https://web.archive.org/web/20141018033248/https://myaccount.nytimes.com/auth/login?URI=http%3A%2F%2Fwww.nytimes.com%2F2014%2F10%2F18%2Fus%2Fferguson-case-officer-is-said-to-cite-struggle.html%3F_r%3D5&REFUSE_COOKIE_ERROR=SHOW_ERROR
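As a rough illustration of what that final URL carries, the sketch below pulls apart its query string using only Python's standard library. The function name `describe_login_redirect` is made up for this example, and reading `_r` as a redirect counter is an assumption based on the value changing from 0 to 5 in the URLs quoted above, not something the Times documents here.

```python
from urllib.parse import urlsplit, parse_qs

# The archived login URL quoted above, split across lines for readability.
login_capture = (
    "https://web.archive.org/web/20141018033248/"
    "https://myaccount.nytimes.com/auth/login"
    "?URI=http%3A%2F%2Fwww.nytimes.com%2F2014%2F10%2F18%2Fus%2F"
    "ferguson-case-officer-is-said-to-cite-struggle.html%3F_r%3D5"
    "&REFUSE_COOKIE_ERROR=SHOW_ERROR"
)


def describe_login_redirect(url):
    """Return the wrapped article URL, its _r value, and the cookie flag.

    Hypothetical helper: the meaning of _r (a redirect counter) is assumed.
    """
    outer = parse_qs(urlsplit(url).query)
    target = outer["URI"][0]              # percent-decoded by parse_qs
    inner = urlsplit(target)
    r_values = parse_qs(inner.query).get("_r", [])
    return {
        "article": inner._replace(query="").geturl(),
        "r": int(r_values[0]) if r_values else None,
        "cookie_flag": outer.get("REFUSE_COOKIE_ERROR", [None])[0],
    }


info = describe_login_redirect(login_capture)
print(info["article"])
print(info["r"], info["cookie_flag"])
```

The `REFUSE_COOKIE_ERROR` flag belongs to the client that made the original request, which was archive.org's crawler at crawl time, not the browser viewing the archive.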

This might be somehow related to the fact that I have a registered account at nytimes.com, which gives me unlimited access to articles, but it happens whether I'm logged in or not.

I have seen hints that it might have something to do with Firefox's rejection of cookies (including the stuff at the end of the URL above), but I think I have that set as liberally as possible:

Accept cookies from sites: [checked]
Accept third-party cookies: Always
Keep until: they expire

It's possible that the design of the Times paywall makes it incompatible with archive.org. That would be easy enough to determine, if a few other people could test the above scenario. But I would love to get this working, since I do a lot of work with nytimes.com references, and I like to archive whatever I can. ‑‑Mandruss (talk) 04:30, 18 October 2014 (UTC)

If you are a paying customer of nytimes.com, have you considered asking their technical support people for help? -- Preceding unsigned comment added by 174.88.135.88 (talk) 05:32, 18 October 2014 (UTC)
Well, I've been watching this page for some time, and it seems that people often come here first, even when some kind of tech support is available elsewhere. The reason appears to be that (1) it's easier, and (2) the answers are often better. My guess is that I would sit on hold for 15 minutes, only to hear, "Wayback who??" If they can't find your symptom on their flip chart, go fish. ‑‑Mandruss (talk) 05:52, 18 October 2014 (UTC)
The same thing happens to me. It does look like the NYT paywall is incompatible with archive.org's web crawling.
Your nytimes.com login and cookie policy are irrelevant because you're only fetching pages from archive.org, not nytimes.com. Archive.org's spider's cookie policy is relevant, but I doubt there's anything they could do to fix this except maybe buying an institutional subscription from the NYT. The NYT could fix it, if they were so inclined. They don't seem opposed to archive.org crawling their site, or they would have just forbidden it outright, so maybe they will fix it if you report it. -- BenRG (talk) 07:47, 18 October 2014 (UTC)
Ok, but if I'm only fetching pages from archive.org, why are both URLs in the message nytimes.com URLs? ‑‑Mandruss (talk) 08:08, 18 October 2014 (UTC)
Because those are the URLs archive.org is trying to access. If archive.org wants to archive the NYTimes, it needs to access nytimes.com URLs, not archive.org URLs. The informational message you refer to is a standard archive.org message shown whenever it received an HTTP 302 response, and it shows up even for a simple 302 that works at the redirected URL. You can tell this by (amongst other things) the footer of the message, which says:

The Wayback Machine is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts in digital form.

Other projects include Open Library & archive-it.org.

Your use of the Wayback Machine is subject to the Internet Archive's Terms of Use.

Anyway, I can't speak for other examples, but in this case 3 of the 11 attempts did actually work, in particular 02:39:35 [1], 04:04:09 [2] and 05:09:06 [3]. I can't say for sure that none of the others worked; it's possible that some of them did after part of the 302 redirect chain.
Nil Einne (talk) 14:07, 18 October 2014 (UTC)
  • @Nil Einne: "Because those are the URLs archive.org is trying to access." - That would appear to contradict what BenRG said, unless "access" and "fetch" have different meanings.
  • Thanks for finding the 3 that worked. At least now I have archive parameters for that ref, which is an improvement.
  • How did you find those 3? By simply trying them one at a time?
  • So we see that it can work some of the time with nytimes.com. The question becomes, then, what makes the difference between success and failure? ‑‑Mandruss (talk) 19:45, 18 October 2014 (UTC)
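One way to avoid trying captures one at a time might be the Wayback Machine's CDX server, which can list captures of a URL together with the HTTP status recorded for each. The sketch below only builds the query URL and does not fetch anything. The endpoint and the `url`, `from`, `to`, `output` and `filter` parameters are given here as I understand the CDX server's documentation, so treat them as assumptions to check against the current docs; `cdx_query_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wayback CDX server; verify against current docs.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url, day, status="200"):
    """Build a CDX query listing captures of page_url on one day (YYYYMMDD)
    whose recorded HTTP status equals `status`. Hypothetical helper."""
    params = {
        "url": page_url,
        "from": day,
        "to": day,
        "output": "json",
        "filter": f"statuscode:{status}",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


article = (
    "http://www.nytimes.com/2014/10/18/us/"
    "ferguson-case-officer-is-said-to-cite-struggle.html"
)
print(cdx_query_url(article, "20141018"))
```

Opening the printed URL in a browser should, if those assumptions hold, list only the captures that were not redirects.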
What I said concurs with and reinforces what BenRG said; it doesn't contradict it. You are only accessing/fetching pages from archive.org; that's what both of us said. But archive.org obviously needs to have fetched/accessed (or tried to) the page at nytimes.com, otherwise it would have nothing to show you (it would just say the page wasn't in its archive). Archive.org can't fetch/access a nytimes.com page from archive.org; it fetches the page from nytimes.com in order to archive it. (Well, technically it could archive an archive.org archive, but there's no reason to do that.)

As BenRG said below, they tried 11 times in the day (well, 14 now). Archive.org got a 302 response most of those times, and it informs you of this with the page you see. After a few seconds, archive.org shows you its archive of the page the 302 redirected to (I don't think it's always from the same time, but it appears to be here), which in this case is itself a 302 redirect.

3 of the 11 times (I'm not sure about the 14), they apparently didn't get a 302 response at all and archived the page successfully. The page you see in those cases is archive.org's copy of the page, which it fetched/accessed from the nytimes.com URL. To be clear, the page you are seeing, as with the 302 redirects, is very likely coming entirely from archive.org, but it was fetched from the nytimes.com URL by archive.org.

(In the case of the 302 redirects, they got a 302 response when trying to access the page, and the page they're showing you is an information page they've constructed to tell you what happened. Also, to avoid confusion: the nature of HTML means it's possible some content came from outside nytimes.com, as the HTML file and the CSS and other files it references may have told them to fetch content from elsewhere. Further, archive.org isn't intended to be an anonymising or security proxy and may have bugs, so it's possible that in certain scenarios your browser may fetch content from somewhere other than archive.org. However, this isn't significant here.)

As for the 3 working copies, well, I got lucky with one of them, which was what made me realise some worked. In particular, after the first time I got into the redirect chain, I wanted to try again but I couldn't get back to the original archive.org page with the 11 copies. So instead of starting from archive.org's main page, I tried accessing the page using the URL bar visible on an archive.org archive. This redirected me to a working copy. I believe this was probably just an accident, although it's possible the URL bar at the top sends you to a version that wasn't a 302 redirect if one exists. Either way, after finding the first working copy, I opened all 11 pages in separate tabs and then looked for any working copies. It's easy to find the time for these versions from the archive.org URL.

Nil Einne (talk) 12:46, 19 October 2014 (UTC)
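To illustrate the point about reading the capture time from the archive.org URL: the 14-digit number after `/web/` is a `YYYYMMDDhhmmss` timestamp. Here is a minimal sketch using only the standard library; `parse_wayback_url` is a made-up name, the regular expression covers only the common URL shape, and the timestamp is assumed to be in UTC.

```python
import re
from datetime import datetime

# 14-digit timestamp, an optional modifier such as "id_", then the original URL.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def parse_wayback_url(url):
    """Split a Wayback Machine URL into (capture time, original URL)."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError("not a recognised Wayback Machine capture URL")
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)


when, original = parse_wayback_url(
    "https://web.archive.org/web/20141018033248/"
    "https://myaccount.nytimes.com/auth/login"
)
print(when.isoformat(), original)
```

Applied to each of the tabs mentioned above, this gives the capture times without reading them off the toolbar by hand.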

The nytimes.com URLs were fetched earlier in the day by archive.org (before you requested them). It got a 302 response at that time, and archived that response. When you asked for the URL, it used the archived response. The same goes for the next 10 URLs in the redirect chain.
If you want to archive web pages on demand, try WebCite. See Wikipedia:Using WebCite. -- BenRG (talk) 03:19, 19 October 2014 (UTC)
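The replay behaviour described above (archived 302 responses being followed one after another, with no live request to nytimes.com) can be sketched as a toy model. Everything in it is hypothetical: the short labels stand in for real URLs, the number of bounces is arbitrary, and the final capture being a 200 login page is an assumption drawn from the symptom reported in the question.

```python
def build_example_archive(bounces):
    """Create made-up captures: the article and login URLs redirect to each
    other `bounces` times, then the chain ends at a stored login page."""
    archive = {}
    url = "article"
    for i in range(bounces):
        login = f"glogin?r={i}"
        next_article = f"article?_r={i}"
        archive[url] = (302, login)            # article -> login redirect
        archive[login] = (302, next_article)   # login -> article, counter bumped
        url = next_article
    archive[url] = (302, "login-page")
    archive["login-page"] = (200, None)        # assumed final capture
    return archive


def replay(archive, start_url, max_hops=20):
    """Follow archived redirects only. Returns (visited URLs, final status),
    with a final status of None if max_hops is reached first."""
    visited = []
    url = start_url
    for _ in range(max_hops):
        visited.append(url)
        status, location = archive[url]
        if status != 302:
            return visited, status
        url = location
    return visited, None


visited, status = replay(build_example_archive(5), "article")
print(len(visited), visited[-1], status)
```

In this model the viewer's own cookies never enter into it: every hop is answered from the stored captures, which is why browser cookie settings make no difference.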