Wikipedia:Village pump (policy)/RfC: Wikimedia referrer policy
RfC: Wikimedia referrer policy
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
What referrer information should Wikipedia send to an external website when a reader clicks on a link? (Originally initiated by Guy Macon) --Relisting. George Ho (talk) 01:37, 30 June 2017 (UTC) --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
Limitations: The Wikimedia Foundation is not bound by Wikipedia RfCs, so this RfC is purely advisory. If there is a strong consensus for a particular referrer policy, a request will be made to the Wikimedia Foundation to implement that policy. The request will contain the words "if possible" and "as far as is practical" to make it clear that we are saying what we want, not how to accomplish it.
Background:
Previous discussions:
Overview: When someone who is reading Wikipedia clicks on an external link, their web browser may be given referrer information to be passed on to the external website. Depending on various factors, this information can range from telling the external site exactly what Wikipedia page the reader was on when they clicked on the link, through telling them that the link was from Wikipedia but not where on Wikipedia, to telling them nothing at all about the site that the link was on. For example, consider the case of someone reading the Wikipedia page at Bomb-making instructions on the internet#References who clicks on the link to The Low Cost Cruise Missile: A looming threat?. (For clarity, this example ignores our recent switch to HTTPS.) Depending on how we configure Wikipedia, the aardvark.co.nz website (and anyone monitoring the connection between the user's browser and aardvark.co.nz) could be told that the site/page the link was on (the referrer) was:
Furthermore, Wikipedia might send additional information that some sites may (or may not) choose to honor in various ways. These are discussed in the technical comments section below. Many web sites have a legitimate desire to know who links to them. Alas, spammers also desire to know who links to them so that they can refine their spamming. Most users, including those who fear surveillance by governments, corporations or criminals, desire to send the minimum amount of information that is consistent with them being able to use the website.
History: (needs to be double checked; please post corrections in the comments section)
Before 2011, Wikipedia was an HTTP site with full URLs sent in the referrer.
In 2011, Wikipedia added support for Hyper Text Transfer Protocol Secure (HTTPS). Users who accessed Wikipedia with HTTP still sent full URLs in the referrer. Users who accessed Wikipedia through HTTPS and clicked on an HTTPS external link also sent full URLs in the referrer. Users who accessed Wikipedia through HTTPS and clicked on an HTTP external link sent no referrer information.
In 2015, Wikipedia stopped offering HTTP and only offered access to the site with HTTPS.
In February of 2016 Wikipedia added the following to the HTML code on every Wikipedia page:
<meta name="referrer" content="origin-when-cross-origin"/>
Various websites talk about this leaking information to an eavesdropper when a user clicks on an HTTP link, and some recommend...
<meta name="referrer" content="strict-origin-when-cross-origin"/>
For example, [1] says "Warning: Navigating from HTTPS to HTTP will disclose the secure URL or origin in the HTTP request." and "Likewise if you're thinking of using origin or origin-when-cross-origin then I'd recommend looking at strict-origin and strict-origin-when-cross-origin instead. This will at least plug the little hole of leaking referrer data over an insecure connection."
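The difference between the two meta tag values quoted in the history above can be sketched in a few lines of Python. This is only an illustration of the behaviour described there, not Wikipedia's or any browser's implementation. The helper names `origin_of` and `referrer_for` are made up for this sketch, "secure" is approximated as "uses the https scheme", and the page URL is a placeholder.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Return the scheme://host/ form that an origin-only referrer carries."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def referrer_for(policy, source, target):
    """Return the Referer value for a click from source to target, or None.

    Simplifications (assumptions of this sketch): "secure" means the https
    scheme, and source is expected to carry no fragment or userinfo.
    """
    src, dst = urlsplit(source), urlsplit(target)
    cross_origin = (src.scheme, src.netloc) != (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme != "https"

    if policy == "origin-when-cross-origin":
        # Origin only for other sites, even when the target is plain HTTP.
        return origin_of(source) if cross_origin else source
    if policy == "strict-origin-when-cross-origin":
        if downgrade:
            return None  # nothing is sent from HTTPS to HTTP
        return origin_of(source) if cross_origin else source
    raise ValueError(f"policy not covered by this sketch: {policy}")


page = "https://en.wikipedia.org/wiki/Example"  # placeholder page
print(referrer_for("origin-when-cross-origin", page, "http://aardvark.co.nz/"))
print(referrer_for("strict-origin-when-cross-origin", page, "http://aardvark.co.nz/"))
```

Under these assumptions the first call reports the Wikipedia origin to the HTTP site, while the second sends nothing, which is the "little hole" the quoted recommendation refers to.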
What this RfC is not
- This RfC is not binding on anyone. It is purely advisory.
- This RfC is not a discussion of the technical details regarding what is and is not possible using current technology, which may change. It is an RfC about policy, not implementation. There is a section on the bottom for technical discussions such as what HTTPS does and does not allow, etc. All comments that focus on "how we should do it" rather than "what should we do" may be moved to the technical discussion section by any user.
- This RfC is not about links to other Wikipedia pages or to other projects that are under the control of the Wikimedia foundation. It is assumed that, as far as possible/practical, other WMF sites will receive as much information as they want to receive.
Questions
Question 1: As far as possible/practical, should referrer information contain full URLs? (en.wikipedia.org/page#section)
(This was the status quo before we switched to HTTPS in 2015)
- Support And nobody was hurt, right? It is useful in cases such as an outreach museum/library/GLAM project finding out that its contributions are generating links back. However, for the security conscious, we could consider adding some opt-out, or we could add a mechanism for making some pages have more silent referrers. Both seem like a waste of our developers time, IMHO. --Piotr Konieczny aka Prokonsul Piotrus| reply here 05:41, 18 June 2017 (UTC)
- Support. Wikipedia is one of the most-visited websites on the internet. We shouldn't set a bad example in breaking the referer convention, which is that when linking from a HTTPS page to another HTTPS page the full URI (excluding userinfo and fragment, but Wikipedia doesn't use any in the URI anyway) is passed on as referer by default. Deryck C. 11:50, 19 June 2017 (UTC)
- If we included the domain and page, do not include in-page anchors (the #section part), absent an overwhelmingly compelling reason to do so. Compare this to subpoena'able library records; what book you check out may be discoverable, but no one is reading over your shoulder to examine what chapters and pages and paragraphs you read. If we continued including domain (and page?), I support the idea of user-configurable opt-out, and do not believe it to be a waste of developer time (especially since it's something a volunteer can do as a Gadget). With that, we don't need page-specific suppression; if you're in a jurisdiction that spies on you and would look askance if you were reading an article on transsexuals, or Jesus, or cocaine, or whatever, just turn on the opt-out. 06:35, 20 June 2017 (UTC) -- Preceding unsigned comment added by SMcCandlish (talk • contribs)
- Support This is a long-standing convention of the WWW and we shouldn't break it without a good reason to do so. I'm not persuaded by the reasons given below. If someone wants to create a single-use trackable URL, doing so is easy, especially given the proliferation of 'www.site.com/site.php?id=123456' type URL schemes on the web today. Conversely, if someone wants to not send referrer information, as part of a wider scheme of maintaining their privacy, the major browsers provide good ways to do that. In short: hiding this information doesn't help much unless it's part of a much broader effort to stay private, most of which we can't control. I also think this option is presented misleadingly; selecting this type of referrer information will only send the anchor part if the user has previously clicked on a link to that anchor (and so the anchor is part of the URL displayed in the browser bar). Nine times out of ten, when a user opens a page and just scrolls down, the anchor part will not be included. GoldenRing (talk) 12:11, 23 June 2017 (UTC)
- Comment This would seem to depend on page construction and the type of citations used. Presumably, if short citations are used and the links appear in the References section, then References will be shown as the section name, but if long citations are used within reftag pairs then a different section will be shown. It seems unnecessary to me to treat the two differently, based on an editorial decision on which style of citations to use. -- PBS (talk) 17:42, 23 June 2017 (UTC)
- Support basically per GoldenRing; we want to be a good citizen of the web, and users who worry about their privacy have well known ways to do so. --GRuban (talk) 16:19, 26 June 2017 (UTC)
- Support — The information is often accessible by performing a search with the following query anyway:
insource:www.randomurl.com/random
- Very seldom will this give rise to more than a few hits, so we're not doing anyone any favors by pretending this isn't already public. Secrecy through obfuscation is neither real secrecy nor privacy in any meaningful sense. Carl Fredrik talk 14:46, 27 June 2017 (UTC)
- The above claim appears to be factually incorrect. Your insource search on randomurl.com simply tells the owner of randomurl.com that we link to it. Sending referrer information tells the owner of randomurl.com (and any police officer with a court order, and anyone who hacks his server logs) how many times it was clicked on and in many cases the exact Wikipedia page the visitor was on when he clicked on that link. As explained in detail elsewhere on this page, this really helps spammers to fine-tune their spam campaigns, and it allows law enforcement to identify, question, arrest, or even (in some countries) torture and kill Wikipedia users based upon what Wikipedia pages they read. Without referrer information, the owner of randomurl.com, the cop, and the hacker have nothing that tells them what Wikipedia page the visitor to randomurl.com was on, or even if they accessed Wikipedia at all. --Guy Macon (talk) 20:29, 27 June 2017 (UTC)
- You're skipping a couple of important points when you call this out as false:
- Since very few pages send no referrer it will often be safe to assume Wikipedia links make up a large portion of those with no referrer.
- If law enforcement is able to see that you visited a specific site, it will also be able to see that you visited Wikipedia. From there the step to looking up which article you visited isn't a big one – despite lack of referrer.
- Carl Fredrik talk 20:28, 19 July 2017 (UTC)
- Please show evidence backing up your claim that "Since very few pages send no referrer it will often be safe to assume Wikipedia links make up a large portion of those with no referrer." The last time I checked, the number of users who access a typical site through a bookmark, by typing in a URL, or by clicking a link to an HTTP site from an HTTPS site (none of which send referrer information) is far larger than the number who access the site from someplace that sets meta name="referrer". Your claim appears to be factually incorrect.
- Your claim that "If law enforcement is able to see that you visited a specific site, it will also be able to see that you visited Wikipedia" is flat-out wrong. It only holds true if law enforcement is specifically targeting you at the ISP level. In the vast majority of cases, law enforcement is monitoring the piracy/bombmaking/terrorist/porn site and logging who visits it. In that case law enforcement has no way of telling that you visited Wikipedia -- unless Wikipedia sends referrer information. --Guy Macon (talk) 07:24, 20 July 2017 (UTC)
- No, once again you're making false assumptions. When it comes to links for specific pages, such as a newspaper story, a book, or a document – it is without doubt that most visitors will come from some other page. For these links, of which the most numerous case being references, the majority will not be bookmarked. Hence, it is your claim that is factually incorrect. Most of these links will normally send a referrer, and only a small subset will be navigated to directly. Just as an example, no one is going to type out
https://www.nytimes.com/2017/07/25/business/media/lyft-taco-bell.html
- Your claim is once again incorrect, as without ISP-level surveillance law enforcement cannot detect what pages you've viewed – making the argument moot. That is unless you assume they control specific sites, but that is farfetched, and it is extremely unlikely that they will react upon a referrer from Wikipedia from such a site. And not extremely unlikely as in "there could be a couple of cases", but as in "if I jump from this building there is an infinitesimally small chance I will fly" or "experimental physics accepts 5 Sigma, and could all be wrong". Carl Fredrik talk 17:22, 25 July 2017 (UTC)
- Your opinion/speculation is noted, and I freely acknowledge that some of what I have written is also speculation. Please let me know if you have any actual evidence regarding what percentage of incoming links to your average website contain referrer information. Also, please let me know if you have any actual evidence regarding your claim that law enforcement getting a warrant for a website's logs "is farfetched". --Guy Macon (talk) 16:10, 26 July 2017 (UTC)
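The point raised earlier in this section, that a full-URL referrer excludes userinfo and the fragment, can be illustrated with a short Python sketch. It assumes the stripping step described in the W3C Referrer Policy specification. The function name `strip_for_referrer` and the sample URL are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_for_referrer(url):
    """Drop userinfo and the #fragment, keeping scheme, host, path and query."""
    parts = urlsplit(url)
    # Everything before the last "@" in the network location is userinfo.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))


# Hypothetical URL, used only to show which parts survive.
print(strip_for_referrer("https://user:pw@en.wikipedia.org/wiki/Page?x=1#section"))
```

If browsers follow that stripping step, the `#section` part would not reach the external site even under a full-URL policy, so options 1 and 2 would differ less in practice than their wording suggests.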
Question 2: As far as possible/practical, should referrer information contain full domains and pages? (en.wikipedia.org/page)
- If we included the domain, do not include page; it's intrusive. -- SMcCandlish ☺ ☏ ¢ ≽ʌⱷ҅ᴥⱷʌ≼ 06:35, 20 June 2017 (UTC)
- Support as a subset of my support for Q1. Deryck C. 15:49, 28 June 2017 (UTC)
- Deryck, may you please convert # to * at the beginning of your statement? The whole question hasn't gotten larger yet. Thanks. --George Ho (talk) 16:24, 28 June 2017 (UTC)
- Thanks. You would've been welcome to do that for me as well. Deryck C. 09:48, 29 June 2017 (UTC)
Question 3: As far as possible/practical, should referrer information contain full domains? (en.wikipedia.org)
(This is the status quo as of February of 2016; default HTTPS behavior was the status quo before that)
- Support since this is already the same level of information disclosed by usage HTTPS/SNI (slightly reworded after initial post). I think referrer information is a useful information about site-2-site relations, and i think it doesn't disclose so much reader behavior that we should be too concerned about it. Considering how many people were railing against the introduction of HTTPS as a default a couple of years ago, some of the comments in this RFC leave me absolutely stunned. It seems like everybody has jumped towards the other extreme now.. --TheDJ (talk • contribs) 12:20, 11 June 2017 (UTC)
- I have stricken my vote (for personal reasons). --TheDJ (talk • contribs) 09:40, 19 June 2017 (UTC)
- I am pretty sure that this is not the default amount of information disclosed by HTTPS (but please double check and post any corrections; this is exactly the type of thing that it is easy to get wrong). If there is no meta referrer tag, the default is "no-referrer-when-downgrade". This means that if a link from an HTTPS website such as Wikipedia goes to an HTTP site, the site gets no referrer information at all, but if the link goes to another HTTPS site, that site gets the full URL including the page (not just the full domain) in the header.
- From [2]:
- "3.2. 'no-referrer-when-downgrade'
- The 'no-referrer-when-downgrade' policy sends a full URL along with requests from a TLS-protected environment settings object to a potentially trustworthy URL, and requests from clients which are not TLS-protected to any origin.
- Requests from TLS-protected clients to non- potentially trustworthy URLs, on the other hand, will contain no referrer information. A Referer HTTP header will not be sent.
- If a document at https://example.com/page.html sets a policy of 'no-referrer-when-downgrade', then navigations to https://not.example.com/ would send a Referer HTTP header with a value of https://example.com/page.html, as neither resource’s origin is a non-potentially trustworthy URL.
- Navigations from that same page to http://not.example.com/ would send no Referer header.
- This is a user agent’s default behavior, if no policy is otherwise specified.
- (Emphasis added). --Guy Macon (talk) 16:43, 11 June 2017 (UTC)
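The default described in the quoted passage can be sketched as follows, using the example.com URLs from the passage itself. This is a simplified illustration, not the specification's algorithm: it treats "TLS-protected" and "potentially trustworthy" as simply "uses the https scheme", and the function name `no_referrer_when_downgrade` is chosen for this sketch.

```python
from urllib.parse import urlsplit


def no_referrer_when_downgrade(source, target):
    """Return the Referer value sent under the default policy, or None.

    Assumption of this sketch: a URL counts as secure when its scheme is https.
    """
    source_secure = urlsplit(source).scheme == "https"
    target_secure = urlsplit(target).scheme == "https"
    if source_secure and not target_secure:
        return None  # HTTPS page to HTTP target: no Referer header
    return source  # otherwise the full URL is sent


page = "https://example.com/page.html"
print(no_referrer_when_downgrade(page, "https://not.example.com/"))
print(no_referrer_when_downgrade(page, "http://not.example.com/"))
```

The first call returns the full page URL and the second returns nothing, matching the two navigations described in the quoted text.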
- That's the referrer policy, which has nothing to do with HTTPS (other than that it is carried inside of it). I've already discussed below how SNI discloses similar information. --TheDJ (talk • contribs) 22:32, 12 June 2017 (UTC)
- Simple answer. You are wrong. Read the passage from w3.org above. Besides explicitly mentioning HTTPS and comparing it to HTTP in multiple places, "TLS-protected" means HTTPS. Referrer policy is an inherent part of HTTP and HTTPS, and is carried in the header. The meta referrer tag in the body simply allows finer control instead of accepting the default HTTP/HTTPS referrer behavior. --Guy Macon (talk) 02:53, 13 June 2017 (UTC)
- Support per my comment below about the benefits to the Wikimedia community of sharing this information, which I believe to be a net positive. Sam Walton (talk) 12:08, 2 June 2017 (UTC)
- Support—This is valuable information that enhances the visibility of Wikipedia to other sites, while providing little additional information about the involved user.--Carwil (talk) 18:59, 15 June 2017 (UTC)
- Oppose. It's not about providing direct additional information about the user, but enabling user-tracking tricks. Guy Macon covers this in detail below. — SMcCandlish ☺ ☏ ¢ ≽ʌⱷ҅ᴥⱷʌ≼ 06:35, 20 June 2017 (UTC)
- Support, first choice; this hides any information that could be of reasonable use to shady government agencies or spammers (outside of the realm of abject paranoia), yet still provides useful information for good-faith other third parties. Lankiveil (speak to me) 02:00, 23 June 2017 (UTC).
- Except that it DOESN'T hide the information! Consider the case where a Wikipedia user reads our page on Bomb-making instructions on the internet and then clicks on the link to Feinstein Amendment SP419 at Cornell university. A government agency then gets a court order giving them access to Cornell university's server logs or simply hacks the server to get the logs. If we are a silent referrer, the logs at cornell.edu will simply show that a particular person read [3], a perfectly innocent act. If we send domain-only referrer information, the logs at cornell.edu will say that that person clicked on a link to the Feinstein Amendment SP419 page while reading Wikipedia, and a link search using the tool we provide will show that the only link to www.law.cornell.edu/uscode/text/18/842 on Wikipedia is from our Bomb-making instructions on the internet page. So, by sending referrer information, we just turned the government knowing that a particular user accessed the text of the Feinstein Amendment SP419 -- a perfectly innocent act in itself – to the government knowing that a particular user accessed the text of the Feinstein Amendment SP419 while reading the Wikipedia Bomb-making instructions on the internet page. Don't forget the couple who were questioned by the police after he did a google search on "backpacks" while she did a Google search on "pressure cookers" in another room... --Guy Macon (talk) 23:10, 23 June 2017 (UTC)
- Or that the user clicked through on the link from the page on the act, or perhaps even from a link from this page. Sure, an incompetent law enforcement agency might jump to conclusions unsupported by actual evidence, but that's not our problem. Lankiveil (speak to me) 02:45, 24 June 2017 (UTC).
- Factually incorrect. First of all, "the page on the act" does not exist. There is no Wikipedia page on the Feinstein Amendment SP419. The only article with a link to the Feinstein Amendment SP419 page at Cornell is our Bomb-making instructions on the internet page. Second, from 10 February 2010[4] to when it was first mentioned in the RfC, the only Wikipedia page of any kind with a link to the Feinstein Amendment SP419 at Cornell is our Bomb-making instructions on the internet page. Any accusation that someone read our bomb-making page backed up with a server log from before this month showing that they clicked on the Feinstein Amendment SP419 page at Cornell from Wikipedia would be a slam-dunk for the prosecution. There simply is no other plausible explanation. BTW, I found three other examples of innocent pages that are only linked to from Wikipedia pages that you could be arrested for reading, but I can't link to them because that would make another place on Wikipedia where they are linked. --Guy Macon (talk) 07:44, 24 June 2017 (UTC)
- Explain please how a server log saying that the visitor came from "en.wikipedia.org" would demonstrate with "slam-dunk" certainty that the reader came from Bomb-making instructions on the internet, and not from the link to the same page that I've already posted on this page? Lankiveil (speak to me) 02:37, 25 June 2017 (UTC).
- This has already been explained to you, twice. You just aren't willing to listen. Again, up until we started talking about this (which only happened in the last 30 days), Bomb-making instructions on the internet was the only page on Wikipedia that had a link to the Feinstein Amendment SP419 page at Cornell. Someone accessing the Feinstein Amendment SP419 page at Cornell from any other link on the Internet -- including the one you posted -- would not place referrer information in the logs saying that they were reading Wikipedia when they clicked on the link. That's three times now that I have answered the same question from you. If you are not willing to accept the answer, I cannot help you any further, and will ignore you if you ask a fourth time. --Guy Macon (talk) 05:46, 25 June 2017 (UTC)
- Au contraire, I have listened and given you every opportunity to explain that this isn't a scenario that doesn't need a series of extremely improbable events to arise, and isn't easily foiled by the simple act of posting a link on a discussion page. You have not been able to do so, resorting instead to increasingly vitriolic responses as the holes in this poorly thought out idea are exposed. As you are clearly not interested in further discussions, I will not engage with you further on this matter. Lankiveil (speak to me) 10:42, 26 June 2017 (UTC).
- It still doesn't hide the information, as you have repeatedly claimed. --Guy Macon (talk) 20:34, 27 June 2017 (UTC)
- Lankiveil, unless you're willing to pepper Wikipedia with potentially hundreds of links from such sensitive articles and somehow also induce a lot of uninterested people to regularly click on them just to muddy the waters, you haven't mitigated this problem. Daß Wölf 01:44, 1 July 2017 (UTC)
- @Daß Wölf: The link doesn't have to be clicked on, it only needs to exist in order to introduce reasonable doubt. Lankiveil (speak to me) 02:57, 4 July 2017 (UTC).
- @Lankiveil: I wouldn't call the doubt it introduces reasonable, as all the mentions of government surveillance on this page have induced me (and likely everyone else) to do anything else but click that link. Moreover, even in countries with presumption of innocence, the idea of reasonable doubt doesn't enter the picture until far too late in the process, if at all. The operating principle of many surveillance agencies throughout the history has been to spy now and sort out the innocents later, and I see no reason to believe that this has changed for the better with the advent of big data and anti-terrorism. Daß Wölf 00:28, 6 July 2017 (UTC)
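The inference argued over in this thread, combining a domain-only referrer with a search for pages that link to the target, can be sketched as a lookup. Everything in the sketch is hypothetical: the `link_index` data is invented for illustration (only the first entry mirrors the example used in the discussion), and `pages_linking_to` is not a real Wikipedia tool or API.

```python
def pages_linking_to(target_url, link_index):
    """Return the pages in link_index that contain a link to target_url."""
    return sorted(page for page, links in link_index.items() if target_url in links)


# Hypothetical link index: page title -> external links found on that page.
link_index = {
    "Bomb-making instructions on the internet": {
        "https://www.law.cornell.edu/uscode/text/18/842",
    },
    "Example article A": {"https://example.org/a"},
    "Example article B": {"https://example.org/a", "https://example.org/b"},
}

# A log entry with a domain-only referrer says "came from Wikipedia".
# The lookup shows how many pages could have been the source of the click.
print(pages_linking_to("https://www.law.cornell.edu/uscode/text/18/842", link_index))
print(pages_linking_to("https://example.org/a", link_index))
```

With a single linking page the candidate set has one member. Each additional page that links to the same target widens it, which is the effect both sides of the thread are arguing about.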
- Support, this is a perfectly reasonable balance between privacy and useful functionality. Murph9000 (talk) 15:12, 23 June 2017 (UTC)
- Support. This is a reasonable compromise that protects the privacy of individual users as much as possible. Wikipedia should not expose its readers and editors to risk, but those demanding a restrictive solution based on a worst case scenario are demanding a level of privacy and protection for those users that does not exist anywhere else in the world. This is very much like insisting children cannot go on a particular field trip because they might be involved in a car accident, when they are vulnerable to the exact same risk of a car accident any time they travel in a motor vehicle. You are doing nothing to help those users because they are still vulnerable to this worst case scenario on every other webpage they visit. If your concerns about these users are genuine, instead work with the WMF or the EFF or other organizations to raise awareness of these risks and provide these users with the tools to avoid these risks on all pages, instead of quietly eliminating this tiny risk with Wikipedia while leaving them ignorant and vulnerable in regards to every other webpage they visit. Additionally, eliminating this tiny risk will do significant damage to this project by removing a significant metric which is used in Wikipedia outreach, partnerships with cultural institutions, and research. The health and continued viability of this project depends on efforts like these and damaging these efforts will put this project at much more of a risk than it and its users face from a hypothetical worst case scenario. Gamaliel (talk) 00:04, 24 June 2017 (UTC)
- "a level of privacy and protection for those users that does not exist anywhere else in the world"? It exists for every HTTPS page that links to an HTTP page, unless they override it the way the WMF did, and it existed for Wikipedia prior to February of 2016 when the WMF decided to override it without asking our permission. --Guy Macon (talk) 07:54, 24 June 2017 (UTC)
- Thereby returning us to the previous status quo. --Izno (talk) 12:33, 4 July 2017 (UTC)
- Support as a subset of my support for Q1. Deryck C. 15:49, 28 June 2017 (UTC)
- Support – I remain unconvinced by those who seek change. --Izno (talk) 12:33, 4 July 2017 (UTC)
- Support: Giving more information than this raises substantial user privacy issues. Okay, there are privacy issues with this option, but in a more contrived situation: we're not actively sharing potentially embarrassing data about individuals that could be harvested by corporate or government entities. Giving less information is unhelpful to the sites that we link to, who are potentially our greatest allies. The change in attitude towards Wikipedia from academic publishers, and consequently from academics, that we've seen over the last several years, has come from those publishers looking at server logs and seeing where their traffic comes from. Don't underestimate the importance of this for our mission. MartinPoulter (talk) 17:16, 8 July 2017 (UTC)
- Support – Seems like a good compromise. Lets other sites know that they are actually getting traffic from us (which can be important for things like GLAM partnerships), but still protects user privacy. Kaldari (talk) 18:18, 18 July 2017 (UTC)
Question 4: As far as possible/practical, should referrer information contain partial domains? (wikipedia.org)
- Second choice. I'd prefer the symbolism of option 5—"Wikipedia doesn't give away any of your information" is an important tool in building trust—but I could live with this if the WMF really feel it's important that other sites see how much of their traffic is coming from Wikipedia. ‑ Iridescent 22:21, 10 June 2017 (UTC)
- Support it provides a service to webmasters to inform them that Wikipedia is the source of traffic to their website. Providing additional information on a per-user level is unnecessary and has risks to privacy. If it is technically easier to implement, I might support option 3. Power~enwiki (talk) 22:26, 12 June 2017 (UTC)
- comment: It should be noted, that this is not technically feasible. Not including the subdomain is not one of the options at: https://www.w3.org/TR/referrer-policy/#referrer-policies . BWolff (WMF) (talk) 23:35, 13 June 2017 (UTC)
- Given what BWolff said, this would appear to give us a choice between silent referrer and full domain name (with or without page info). — SMcCandlish ☺ ☏ ¢ ≽ʌⱷ҅ᴥⱷʌ≼ 06:35, 20 June 2017 (UTC)
- Please remember the "as far as possible/practical" part of the question. I purposely did not limit our possible !votes to what the specs allow. If it turns out that there is a strong consensus for this option (not likely at this point), we will have to come as close as we can using current technology and fully implement it as soon as the technology allows. --Guy Macon (talk) 14:33, 20 June 2017 (UTC)
- Support this is one or 5. I support number 5 combined with number 6. This is one of the ways we show that Wikipedia matters to the wider world. This is how we get organizations like the NIH and WHO interested in working with us so losing this benefit would be unfortunate. Providing the number of people we send without associated IPs will do the trick though. Doc James (talk · contribs · email) 01:38, 23 June 2017 (UTC)
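To make the technical point in this thread concrete, the following is a minimal Python sketch of how the policy names listed in the W3C Referrer Policy draft map to the value a conforming browser would send. It is an illustration only: the function names (`origin`, `referrer_for`) are hypothetical, the "downgrade" test is simplified to HTTPS-to-non-HTTPS, and the example URLs are placeholders. Under these assumptions, no policy produces a bare "wikipedia.org" without the subdomain; the closest results are the full origin or nothing at all.

```python
from urllib.parse import urlsplit


def origin(url):
    """Serialize the origin of a URL, e.g. 'https://en.wikipedia.org/'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def referrer_for(policy, src, dst):
    """Return the referrer a browser would send for src -> dst, or None.

    Simplification (assumption): a 'downgrade' is treated as any
    navigation from an https page to a non-https destination.
    """
    same_origin = origin(src) == origin(dst)
    downgrade = (urlsplit(src).scheme == "https"
                 and urlsplit(dst).scheme != "https")
    full = src.split("#", 1)[0]  # the fragment is never sent

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    if policy == "same-origin":
        return full if same_origin else None
    if policy == "origin":
        return origin(src)
    if policy == "strict-origin":
        return None if downgrade else origin(src)
    if policy == "origin-when-cross-origin":
        return full if same_origin else origin(src)
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin(src)
    if policy == "unsafe-url":
        return full
    raise ValueError(f"unknown policy: {policy}")


POLICIES = [
    "no-referrer", "no-referrer-when-downgrade", "same-origin", "origin",
    "strict-origin", "origin-when-cross-origin",
    "strict-origin-when-cross-origin", "unsafe-url",
]

if __name__ == "__main__":
    page = "https://en.wikipedia.org/wiki/Age_of_the_universe#References"
    for policy in POLICIES:
        print(policy, "->", referrer_for(policy, page, "http://example.org/"))
```

In this sketch, "silent referrer" corresponds to `no-referrer`, and the February 2016 setting described in the background corresponds to `origin-when-cross-origin`.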
Question 5: As far as possible/practical, should referrer information contain no information? (silent referrer)
Support Question 5 ("Silent referrer")
- Support Here is the reason why, whenever possible, we should reveal as little as possible about what pages our users read or what links they follow:
- Locals Questioned by Suffolk County Police Department after Googling "Backpack" and "Pressure Cooker"
- Google Pressure Cookers and Backpacks, Get a Visit from the Feds
- Update: Now We Know Why Googling 'Pressure Cookers' Gets a Visit from Cops --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
- @Guy Macon: Were those cases a result of referrer information or just a result of his employer having access to his work computer? 142.160.131.202 (talk) 06:58, 26 June 2017 (UTC)
- It looks like it was his employer, but the basic principle stands: if the police find -- using whatever method -- what pages you have been reading, they may question you. We can't stop employers from turning over information about pages visited to the cops, but we can make it so that a court order to show them the server logs of the external site no longer reveals what Wikipedia page they were reading when they clicked on the link. --Guy Macon (talk) 07:49, 26 June 2017 (UTC)
- Support The HTTP-Referer header is one of the basic security bugs of the web. While there are ways to suggest to browsers how to act, that is also a type of bogus opt-out feature software may or may not properly observe. It may be useful for users to use specialized software to forge those (pretending to originate from the destination site itself). However, I support doing what is possible to limit this issue for users with common software and default configurations. Another reason to support this is that link spammers are interested to know where clicks originate from. We already attempt to limit unnecessary tracking information from URLs themselves, and we use nofollow for search engine indexing (and noindex where appropriate), this is another step forward. Thank you for posting this RfC. —PaleoNeonate – 06:36, 3 June 2017 (UTC)
- Support Protecting our users' privacy is of paramount importance. We should share as little as possible with external entities, both as a matter of principle, as well as because of the meatspace consequences described by Guy above. James (talk/contribs) 16:45, 9 June 2017 (UTC)
- Support revealing as little as practical in most cases. I lack technical expertise, but I see this as being within the concept of Wikipedia as a free resource, and just as we protect the private information of editors, we owe it to our readers worldwide to feel that they can use us as a resource without undue fear of repercussions. I'd make some exceptions for complying with law enforcement in the countries where servers are located. --Tryptofish (talk) 23:55, 9 June 2017 (UTC)
- Support. We don't need referers. In fact, none of the Wikimedia sites need referers. They may be fancy HTML5 Web 2.0 stuff, but not all fancy HTML5 Web 2.0 stuff is useful. KMF (talk) 00:40, 10 June 2017 (UTC)
- @KATMAKROFAN: it does not have much to do with HTML5 or Web 2.0, it is part of the HTTP protocol since version 1.0 (1996) [5]. To disable it, users needed to use special proxies, modify their web clients or configure them to not send the HTTP-Referer header along with queries. More recently, it begins to be possible for sites to suggest referrer behavior to browsers which support this. This RfC is about making those suggestions to browsers to limit the security and privacy implications of referrers for users using common browser software with common (generally too permissive) default configurations. —PaleoNeonate – 03:58, 11 June 2017 (UTC)
- KATMAKROFAN, It sounds as if you are supporting Question 5: ("As far as possible/practical, should referrer information contain no information? (silent referrer)") In other words, no referer. Should your comment be moved to that section? --Guy Macon (talk) 20:34, 10 June 2017 (UTC)
- I went ahead and WP:BOLDLY moved KATMAKROFAN's comment and all responses to the proper section. --Guy Macon (talk) 14:17, 12 June 2017 (UTC)
- @PaleoNeonate: Is there not a Firefox/Chrome extension available for disabling it? 142.160.131.202 (talk) 07:00, 26 June 2017 (UTC)
- @142.160.131.202: that is possible, although it shouldn't be necessary since the configuration in about:config should also allow this (some strings vary/change with version, but some to look for: network.http.sendSecureXSiteReferrer, network.http.sendRefererHeader, network.http.referer.*). Proxies like Privoxy can also be used to forge them for a whole network. —PaleoNeonate – 07:08, 26 June 2017 (UTC)
- Support – Protecting the privacy of those visiting our encyclopedia is much more important than reaping any benefit that comes from sharing referrer information. — Godsy (TALKCONT) 08:05, 10 June 2017 (UTC)
- Support – anyone can add a reference – which could be a front for surveillance, by not providing referrer information we help protect the privacy of our readers. — xaosflux Talk 21:12, 10 June 2017 (UTC)
- Support While I sympathise with the benefits of partners seeing the incoming traffic, and this encouraging them to contribute content, there are other ways of achieving this. And even if we made exceptions, the principle of being a silent referrer is important as outlined above. All the best: Rich Farmbrough, 21:37, 10 June 2017 (UTC).
- Support: especially considering latest developments towards surveillance and censorship in countries like, as non-exhaustive examples, Turkey or the UK, it would seem outright irresponsible for Wikipedia to provide information that could help identify users and what they have connected to, and in a very non-obvious way at that, since users typically expect privacy-conscious HTTPS sites not to leak details like the page or subdomain being visited. LjL (talk) 22:00, 10 June 2017 (UTC)
- Support From what I can make out with the technical details, the Turkish government could potentially see anyone coming in from *.wikipedia.org to a Turkish website and then block that IP's access to Wikipedia (if they somehow see that it is, in fact, a Turkish user). The WMF's goal was to have Turkey unblock access to Wikipedia, not reinforce their ability to do so. (Whether we as a community will support the WMF's statement remains to be seen, but I personally do support it regardless of whatever political problems we may face because of it.) — Gestrid (talk) 22:08, 10 June 2017 (UTC)
- I thought of another reason, which actually has to do with our current English Wikipedia policies. Webmaster X sees, hey, I'm getting a lot of traffic from Article A (or some variant of the web domain, as illustrated in the other options). That webmaster then tries to spam Wikipedia articles with links to his article because more clicks means more money, customers, etc. Now imagine a bunch of webmasters doing that. Maybe even some companies. We already have some problems with undeclared WP:PAID violations by companies. (Specifically, I recall one SPI maybe the middle of or late last year where I believe it was brought to ANI, and I actually ended up emailing Wikipedia Legal about it to report the problem because it was a TOS violation. I got permission from them to post the email from them, if anyone wants to look it up, but I don't believe it's pertinent.) We don't need more violations of it because companies now know how much traffic is coming from us. — Gestrid (talk) 01:29, 11 June 2017 (UTC)
- Went ahead and found the ANI archive myself. Thanks to the WMF for upgrading Special:Contributions so we can search specific date ranges, by the way. The link to the ANI archive is Wikipedia:Administrators' noticeboard/IncidentArchive929#Undisclosed Paid Editing Farm. I was still pretty new to Wikipedia, so I apologize in advance for my older contribs. — Gestrid (talk) 01:53, 11 June 2017 (UTC)
- It's ingenuous to think that paid editing would be affected. If somebody is editing Wikipedia specifically to get traffic, they can just measure general traffic fluctuations on their websites without any need for referrer. As for the Turkish government, I would think that government surveillance has more powerful tools nowadays than something as "dumb" as referrers, but who knows. Nemo 16:22, 25 June 2017 (UTC)
- Support. I've never had much sympathy for the "privacy is our top priority" view—I opposed the switch to https, for instance—but I find the arguments in favour of referrer information to be utterly unconvincing. If the best argument that can be made is "some commercial organisations might not want to donate in kind if we're not giving them this information", that really doesn't add up to much. No disrespect to The Wikipedia Library, which has the best of intentions, but the number of people who would even notice if it disappeared could probably be counted on two hands at most; besides, I really don't buy the argument that other organisations won't support Wikipedia unless they're allowed to track incoming traffic, let alone that their potential loss outweighs the potential problems this could cause (and the potential negative press if the WMF lines up alongside Facebook, Google et al in the "our readers are the product" camp). ‑ Iridescent 22:15, 10 June 2017 (UTC)
- Support Jclemens (talk) 23:39, 10 June 2017 (UTC)
- Support. Wikipedia should protect its users -- and the consequences of a negative news story in which this feature is used against readers could be severe. That said, I think it makes sense to have computer experts look over the possibility of other ways of notifying sites that they are linked to. We have overall statistics for how many times articles are accessed. It would be possible to take all the articles that are accessed >1000 times monthly, look at all the references in them, and notify all the webmasters of all those sites that they are linked by Wikipedia but referrer (or "referer"...) information is not being provided, giving a report of all the pages that link to them and how much traffic each gets. That is, if there's a polite way of doing it that is not considered spamming, which is why I want computer experts to think of something. If need be, I could put up (umm, wait, no -- see below) with a general citation to Wikipedia in the referrer (i.e. the previous option), which then could link directly to the report. Wnt (talk) 00:17, 11 June 2017 (UTC)
- I believe that this information is already available, but of course we could make it more convenient to access it. For example, imagine that you own the American Institute of Physics website. Our linksearch tool tells you that exactly one Wikipedia page (Age of the universe) links to your The Expanding Universe: Theories of a Static Universe page, and our Pageviews Analysis tool tells you that Age of the universe got 145,317 pageviews in the last six months. Putting a partial domain (wikipedia.org) in the referrer would also allow you, the imaginary owner of the American Institute of Physics website, to figure out the IP address of every Wikipedia user who clicked on the link to your page and to know exactly what Wikipedia page they were reading when they clicked on that link. If the AIP website happened to be one of those pages where users log on and cookies are stored about that logon, this would allow the AIP to link the real identity of certain Wikipedia users with which Wikipedia page they were reading when they clicked on that link. Being a silent referrer would prevent this. --Guy Macon (talk) 11:41, 11 June 2017 (UTC)
- OK, you've convinced me -- let's not even have the domain there. You're right that it would only take a moment for a unique site to search itself on Wikipedia -- and a "sting" site, like the Russian government (or Dianne Feinstein!) looking for people reading about drugs, would obviously arrange to get itself added in one unique place for this specific purpose. Wnt (talk) 20:49, 11 June 2017 (UTC)
- Support Because referrer information only makes it easier for third party sites to correlate IP numbers with Wikipedia readers and editors; and because it potentially creates temptation to spammers by linking Wikipedia with page traffic. Neither benefits the project. Geogene (talk) 00:58, 11 June 2017 (UTC)
- Support. MER-C 02:47, 11 June 2017 (UTC)
- Support TonyBallioni (talk) 03:17, 11 June 2017 (UTC)
- Support This is an obvious security vulnerability. I can't believe there needs to be an RfC for something like this. A Quest For Knowledge (talk) 05:45, 11 June 2017 (UTC)
- Support any benefit, real or imagined, enjoyed by the WMF or the projects themselves, is outweighed by the chilling effect of knowing that encyclopedia browsing habits are being exposed to unknown actors. –xenotalk 10:41, 11 June 2017 (UTC)
- Support The entire referrer model is, at bottom, a technical option that for most of its existence was not actually optional, thus leading most users to accept it as a fait accompli. Now that it is optional, we should recognize that its two main uses are commercial and monitoring-related – neither of which Wikipedia should want any truck with. Potential negatives, outlined by various editors above, are very likely to outweigh any slight reciprocal/funding benefits. --Elmidae (talk · contribs) 11:44, 11 June 2017 (UTC)
- Support – we stop spamming of enforced referral links as much as possible (i.e., links with '&referrer=' type of parameters). There are enough of those examples where spammers add those parameters just to know the efficacy of their spamming. This is basically a hidden version of that same information. To protect the privacy of Wikipedia editors, to protect readers of Wikipedia, and to hide the efficacy of spamming attempts, it does seem best to hide where people are coming from. --Dirk Beetstra T C 12:39, 11 June 2017 (UTC)
- Support. In the current day and age of global espionage this is a glaring security hole. I'm astounded that this hasn't been fixed already. Daß Wölf 03:03, 12 June 2017 (UTC)
- Support The whole concept of privacy (which WMF claims to support, though they keep trying to weasel against it) is to withhold info from third parties unless users themselves decide otherwise on a case by case, individual basis. That Wikipedia should avoid revealing any referer info on privacy grounds to the extent possible should be a no-brainer. Getting referers also is very useful for SEO's, another source of disruption and bias in Wikipedia that we should avoid trying to help. (I think we shouldn't even disclose article view counts, partly for that same reason).
That said, on a technical level, I didn't even know there was a way to suppress referers from the server side, so it surprised me considerably to hear that we had previously been doing it. I remember trying pretty hard to figure out how to let people click links on some of my personal pages, without revealing the referring page.
I see the whole concept of referers as invasive, a leftover from the more communal and mutually trusting days of the academic WWW built at CERN. HTTP headers in those days also contained the user's email address: as you can guess, when the web went commercial, they had to get rid of that pretty fast. Referers should have gone at the same time. If I read an online article about how to fix my shoes using duct tape, and then go to the store to buy some duct tape for that purpose, the store is not entitled to know what article I read or what I'm going to do with the tape. That's precisely what referers do.
FYI note: in Firefox, you can turn off referers on the browser side by viewing about:config and setting network.http.sendRefererHeader to 0. It stops a very few pages from working properly but usually causes no trouble, so I do this and recommend it. 173.228.123.121 (talk) 06:58, 12 June 2017 (UTC)
- Support. Well, it is not really a Wikipedia thing: the referer field was an abomination from the start and should be killed with fire. Maybe a rogue browser will send the referer against our wishes, or tech-savvy users have already deactivated it, but I see zero positive side to encouraging it to be sent. The only argument to send referers is the "Wikipedia Library" argument (see below), but sorry, that is just a hidden form of monetizing users' history, which I oppose on principle (but again, not a Wikipedia thing). TigraanClick here to contact me 16:22, 12 June 2017 (UTC)
- Support Yup. Only in death does duty end (talk) 16:27, 12 June 2017 (UTC)
- Support I initially leaned towards Q4 above, as I don't believe there are privacy issues there. But after considering the potential for increased link-spamming (as already discussed above), it would be best not to pass any information at all. -- Tom N talk/contrib 01:37, 13 June 2017 (UTC)
- Support. In the absence of any persuasive argument to the contrary, this seems smarter and safer. RivertorchFIREWATER 05:23, 13 June 2017 (UTC)
- Support Tell them nothing! The privacy of our users and editors is not negotiable – only a legally enforcible warrant or court order should allow the release of any user information. It is common knowledge that people in some countries are persecuted (and prosecuted) for accessing "prohibited" information, facilitating such persecution is morally reprehensible and arguably a human rights violation. We must also WP:DENY linkspammers and SEOers the "benefit" of their bad faith acts. Roger (Dodger67) (talk) 21:47, 13 June 2017 (UTC)
- Support Anonymity on all fronts. →Σσς. (Sigma) 06:54, 16 June 2017 (UTC)
- Support – Doing otherwise would just make it so that the corporations can refine their algorithms and make more money. That would also open Wikipedia to link spam attacks. So, I oppose anything else (except for option 8). RileyBugz会話投稿記録 18:29, 16 June 2017 (UTC)
- Support Privacy is paramount. Doug Weller talk 19:37, 17 June 2017 (UTC)
- Support Cas Liber (talk · contribs) 01:37, 19 June 2017 (UTC)
- Support: I was initially inclined to support including the domain name, but the case Guy Macon makes is compelling. I'm not swayed at all by the "visibility of WP" and "benefit to the community" claims. Any such good would be outweighed by the potential for abuse of several kinds (privacy invasion, but also link spamming, and COI editing). I'm also not swayed by Astinson_(WMF)'s note below that the domain info could be got by other means like man-in-the-middle attacks; the point is to not make it unnecessarily easy. I should still lock my door when I leave the house, even though someone could smash my windows to get inside, or bring a battering ram to knock the door down, by way of analogy. — SMcCandlish ☺ ☏ ¢ ≽ʌⱷ҅ᴥⱷʌ≼ 06:35, 20 June 2017 (UTC)
- Support. Reader privacy trumps link-target-server expedience. (We should also take care to avoid linking to nonce urls that give away where the link is coming from without need of a referrer, but that's beyond the scope of this rfc.) —David Eppstein (talk) 07:30, 22 June 2017 (UTC)
- Support per almost every support argument in this section.- MrX 12:15, 22 June 2017 (UTC)
- Support Privacy first. – Train2104 (t • c) 13:59, 22 June 2017 (UTC)
- Support As a side note: The arguments against this proposal seem to be "it doesn't give away important information that is given away anyway" and "it blocks many organizations from being able to use the irreplaceable information they'd otherwise get." Those positions seem, at first blush, to contradict one another. --joe deckertalk 17:54, 22 June 2017 (UTC)
- Support Privacy first. The advantages of sending any referrer data skew overwhelmingly in favour of predatory government regimes and shady SEO types. No reason to help them out. Snuge purveyor (talk) 21:16, 22 June 2017 (UTC)
- Support Agree privacy for our readers is critically important. Could we provide to major sources the total number of page views from WP each month in aggregate? Doc James (talk · contribs · email) 01:44, 23 June 2017 (UTC)
- Whether we should provide aggregate totals is covered in the newly-added question #6. Whether we can provide aggregate totals should be discussed in the technical section. --Guy Macon (talk) 02:48, 23 June 2017 (UTC)
- Support. Privacy is paramount. Sławomir Biały (talk) 12:36, 23 June 2017 (UTC)
- Support. I can see no compelling reason for sending referrer info, but can think of a number of reasons why it might be unwise, esp. for people viewing Wikipedia living under illiberal regimes. Alexbrn (talk) 16:17, 23 June 2017 (UTC)
- Support I edit Wikipedia articles on behalf an organization as its Wikipedian in Residence. Having referral information would be valuable for me and other people doing work like mine, so when I support "no referral information" I do so with awareness of how institutions expect to get this data based on their other experience with commercial websites. Lots of organizations hire people in new media roles, like for Facebook and Twitter, and in return those websites provide all sorts of information back to the organizations which engage with them. In the same way, institutions expect information back from anyone whom they appoint to share their expertise in Wikimedia projects. However, as a Wikimedia community member, I am highly conscious of our community value to encourage people to feel safe to read and learn whatever they want in our website without anyone spying on them. As a society it was only recently that corporations started demanding data about what everyone is doing every moment of their lives. If someone is reading a work then gets an idea to check a reference and start reading another work, then until just recently it would have been taken for granted that the individual doing the reading should have an expectation of privacy and that no one needs to know what they were reading. I would like for the Wikipedia community to be able to proudly and unambiguously say, "Wikipedia is a free project. No one is either trying to sell you anything or commodify your time here. Read what you like. We are doing whatever we can to preserve your privacy. This is how a nonprofit community project is supposed to be." If Wikipedia preserved user privacy, that does not mean the end of data collection or providing information to referrers. Just because Wikipedia does not emulate Facebook does not mean that it needs to be devoid of user research. 
We can still have opt-in data sharing, better content reporting tools which analyze the available public information, and we can have public community conversations about our values to decide what other things we can do. Blue Rasberry (talk) 17:13, 23 June 2017 (UTC)
- Absolutely Support, per Guy Macon, SMcCandlish, &c. Happy days, LindsayHello 20:42, 23 June 2017 (UTC)
- Support… there's no need for us to tell external websites why or how I got there. That I was reading Wikipedia and what I was reading is my private business. Wikipedia needs to stay as private as possible. In fact, rather than just protecting privacy at the HTTP level, we should go even further and try to scrub external links that contain unnecessary url parameters with information about who created the link and when. Jason Quinn (talk) 05:12, 26 June 2017 (UTC)
- I plan on doing exactly that, using a combination of organizing volunteers to work on this and having bots do the work wherever they can. But first we need to stop sending referrer information. --Guy Macon (talk) 19:31, 26 June 2017 (UTC)
- Support. Adding links to commercial sites to Wikipedia has become a standard part of the SEO toolbox, with obviously negative consequences to the project. There is a whole industry built around it. Removing the referrer information from the header would hamper their efforts. I do, however, support a blanket exception for government, accredited educational organizations and a straightforward opt-in process for other legitimate entities. On a side note, I caution against the attitude hinted at by a WMF representative below, which questions the ability of editors participating in this RfC to make an informed decision. It is a spurious argument that goes against the idea of Wikipedia's self-governance. Rentier (talk) 15:46, 27 June 2017 (UTC)
- Support. Wikipedia shouldn't encourage browsers' handing spammers these gifts of shiny, delicious, and most likely profitable analytics data on a silver platter. Doing so encourages and facilitates the abuse of Wikipedia for driving external traffic. —{{u|Goldenshimmer}}|✝️|ze/zer|😹|T/C|☮️|John15:12|🍂 07:27, 30 June 2017 (UTC)
- Support Whose business is it what I was looking at or even that I was on Wikipedia? Referrer information is great for analytics, horrible for actual readers/editors. We should support the readers and editors. ~ Rob13Talk 17:48, 5 July 2017 (UTC)
- Support. As one of the most visited websites on the planet, Wikipedia should do everything it can do to support/protect its users' privacy. Yintan 14:03, 18 July 2017 (UTC)
- Support the option that offers readers and editors most privacy. If an editor is writing an article about a contentious organization, and if that organization can see the times that a certain IP address kept entering its site from Wikipedia (e.g. while checking references), then combining that data with the article's history allows that organization to link the editor's user name and IP address, and if the editor is using a workplace IP, then the workplace too. SarahSV (talk) 19:41, 18 July 2017 (UTC)
- Support. Per what Guy Macon said. -- MarcoAurelio (talk) 13:51, 19 July 2017 (UTC)
- Support as someone with 2 decades of experience in online marketing. (((The Quixotic Potato))) (talk) 20:49, 19 July 2017 (UTC)
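Several comments in this section mention scrubbing tracking parameters (such as '&referrer=') from external links. As a rough illustration of what such a cleanup might look like, here is a minimal Python sketch using only the standard library. The function name `scrub_url` and the lists of parameter names to drop are assumptions made for this example, not an agreed list or an existing bot; any real cleanup would need a reviewed list, since a parameter that looks like tracking on one site can be required on another.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed, illustrative lists; not a vetted set of tracking parameters.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"referrer", "referer"}


def scrub_url(url):
    """Return url with query parameters that look like tracking removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_NAMES
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Note: urlencode re-encodes the remaining parameters, so their
    # escaping may differ slightly from the original string.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(scrub_url(
        "https://example.org/article?id=42&utm_source=wikipedia&referrer=enwiki"
    ))
```

The path, fragment, and all other parameters are left in place, so the link should still reach the same page as long as the dropped parameters were not required by the target site.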
Oppose Question 5 ("Silent referrer")
- Oppose, this is excessive and not necessary. Users have the easy ability to fully restrict referrer info if they feel the need to go to that extreme. This is something which should be strictly user choice, and not forced upon everyone by default. The more moderate options are perfectly reasonable and balanced when it comes to privacy. It also feels generally against the open and cooperative founding principles of the web, in addition to defying long established convention for highly questionable benefits (for the vast majority of users). Murph9000 (talk) 15:18, 23 June 2017 (UTC)
- Moved this vote from original position. --George Ho (talk) 01:37, 28 June 2017 (UTC)
- Oppose — We're likely to drop in search rankings if we do this. We're already losing pageviews and readers, if we don't even have this up our sleeve we risk falling into obscurity. We're not winning any favors by pandering to the lowest common denominator and claiming that all policing is by definition harmful in full techno-libertarian spirit. If anything we can add a "Warning you are leaving Wikipedia" page in-between us and the external link. That is less of a burden than this proposal, which is frankly insane and goes against the fundamental way the internet is constructed. Carl Fredrik talk 14:53, 27 June 2017 (UTC)
- @CFCF: "We're likely to drop in search rankings if we do this." How so? Rentier (talk) 16:13, 27 June 2017 (UTC)
- Google tracks referrers through Google Analytics. It will seem like we have fewer referrers. Carl Fredrik talk 17:19, 27 June 2017 (UTC)
- It is highly unlikely that Google uses any Google Analytics data in their rankings. They have repeatedly denied doing so (e.g. [6]). Even if they did, it is unclear if and how the outgoing traffic would affect the ranking. On the other hand, there is some evidence that the quality of external links affects the referring site's ranking. In this case, the change could actually help Wikipedia's rankings by deterring spammers who mostly link to low-quality sites. Rentier (talk) 17:52, 27 June 2017 (UTC)
- Carl and Murph9000, I want to move the "oppose" arguments and responses into separate subsections. Therefore, I can create subheadings for both supports and oppose. Would that be fine? George Ho (talk) 18:43, 27 June 2017 (UTC)
- I think that is a good idea. I just had to go through the list for the third time, fixing places where the oppose comments reset the count to 1. (We have a "show preview" button for a reason, folks...) --Guy Macon (talk) 20:42, 27 June 2017 (UTC)
- @George Ho: That's fine by me. The later addition of numbering actually violates WP:TPO in the case of my comment, as it changed the context of my comment without my consent. My comment was not originally in reply to the comment above it, as the current indentation implies. I'm happy with your proposed change, and my comment's context is directly replying to this section's heading and not any individual comment. Within a new sub-section (or not) is fine by me, as long as it's clear that it's a direct reply to the section's question. You also have my consent to remove/hat/move this new comment, if that helps maintain the section. Murph9000 (talk) 01:16, 28 June 2017 (UTC)
- Moved this vote and all responses from original positions. --George Ho (talk) 01:37, 28 June 2017 (UTC)
- I oppose the use of silent referer by default per my comment in Q1 above. The convention is to provide referer info when linking from one HTTPS page to another HTTPS page. Wikipedia shouldn't set a bad example by breaking conventions. If a user wants privacy we can add silent referer to MediaWiki as an opt-in functionality. Or the user can copy a link and paste it in an incognito tab if they want privacy. Deryck C. 15:46, 28 June 2017 (UTC)
- Per my support for Option 3 above. User:Murph9000 also makes good points. Lankiveil (speak to me) 02:50, 4 July 2017 (UTC).
- Oppose – As discussed in question 3, sending only the domain name is adequate for protecting users' privacy. The argument that someone will then do a link search to find the specific page they came from is unconvincing. If someone needs to completely hide all of their tracks there are plenty of browser extensions to easily do that. Also, I think it would actually hurt Wikipedia (and the sites we link to) to use a silent referrer (especially regarding things like GLAM partnerships). If we have no way to show that Wikipedia is sending traffic to a site, it will be harder for our partners to see the benefits of collaborating with Wikipedia. Kaldari (talk) 18:27, 18 July 2017 (UTC)
Question 6: As far as possible/practical, should we provide total monthly referral numbers in aggregate to major sources?
This would send the information in some sort of a report, not in the referrer.
Procedural note 1: This question was added on day 23 of an RfC that is scheduled to close after 30 days. (later extended to 60 days) --Guy Macon (talk) 02:42, 23 June 2017 (UTC)
Procedural note 2: Providing total monthly referral numbers can be combined with any of the previous questions, all of which ask about what Referrer information we should send. --Guy Macon (talk) 02:42, 23 June 2017 (UTC)
Procedural note 3: Reworded question and formatted it to be the same as the other questions. As written it was missing the all-important "As far as possible/practical". --Guy Macon (talk) 02:42, 23 June 2017 (UTC)
- Support As first choice. This gives privacy to individuals IPs but provides data on how significant we are to potential partners. Doc James (talk · contribs · email) 01:46, 23 June 2017 (UTC)
- This cannot be a first choice, because support for this question says nothing about the question this RfC was created to address; which is "What, if anything, should we send in the referrer?". In other words, supporting this does not exclude any of the referrer questions. Might I suggest simply supporting this and marking one of the referrer questions as your first choice? --Guy Macon (talk) 02:42, 23 June 2017 (UTC)
- Thanks, User:Guy Macon, good point. I am happy for us to provide no details but then provide this to other websites that share our goals. Doc James (talk · contribs · email) 04:11, 23 June 2017 (UTC)
- Support because I am sympathetic towards the needs of the Foundation to promote us as a vital part of contemporary society and the easiest way to do this is to show people how we connect them to other sources. TonyBallioni (talk) 01:51, 23 June 2017 (UTC)
- Support this pending answer from User:Astinson (WMF) -- Alex, as far as you know, is this technically feasible, and would this meet partnering needs? Jytdog (talk) 04:57, 23 June 2017 (UTC)
- (Discussion moved from policy section to technical section) --Guy Macon (talk) 22:45, 23 June 2017 (UTC)
- Just to add to my note above, this is offtopic for this RfC on the face of it, but this is relevant as a way to address the key argument of those who want to include referral information -- as a way to meet the need while protecting readers' privacy. However folks want to think about it, it would be useful to get !votes of support or opposition so that WMF knows if this would be acceptable to the community. We always want to find ways to meet everybody's core desires. Jytdog (talk) 04:08, 7 July 2017 (UTC)
- Sort of oppose – As noted above, this would be challenging to implement, and thus the work required would outweigh the benefit. I would, however, be supportive of a policy that allows the sending of referral information to trusted sources, like having a list of websites that we are ok with sending information to. RileyBugz会話投稿記録 16:28, 23 June 2017 (UTC)
- Comment – This is outside of the scope of Wikipedia's concern, so it should really be up to WMF. If they want to report periodic referral counts to their info-eco-partners, they can, and I'm sure that the WMF braintrust can figure out how to do it. The resource cost needed to implement it can be weighed against the benefit of informing partners of these stats.- MrX 22:49, 23 June 2017 (UTC)
- Support – This is not really challenging to implement; it only needs some focus from a tech team at the foundation. Nor do I see it as antithetical to proposals 1–4; it is simply a good idea overall. Even if we choose to do one thing with our referrers, it would be a good idea to have it for all languages. Carl Fredrik talk 15:01, 27 June 2017 (UTC)
- Rethink: Only if we actually go ahead with any other horrible proposals to remove referrers. Carl Fredrik talk 15:04, 27 June 2017 (UTC)
- This is off topic. We're talking about HTTP referer headers in this discussion. Deryck C. 15:47, 28 June 2017 (UTC)
- Allowable. I agree this is off topic, and I don't know if this is a useful application of WMF resources or will be welcomed by the spammees. However, I have no objection to it as an alternative to referrer if that sweetens the pill. That is, assuming the numbers are collected in a way that doesn't create a big database of which editor read what or clicked what! Even if not intended to be divulged, such a thing would be eventually. Wnt (talk) 00:58, 2 July 2017 (UTC)
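To make the aggregate-report idea and Wnt's caveat concrete, here is a minimal sketch, assuming outbound clicks could be recorded as bare (date, target URL) pairs with no reader or IP attached. The function name `monthly_referral_totals`, the `min_count` cutoff standing in for "major sources", and the sample data are all hypothetical; nothing here describes how WMF actually collects or would report such numbers.

```python
from collections import Counter
from urllib.parse import urlsplit


def monthly_referral_totals(click_events, min_count=1):
    """Aggregate outbound clicks into per-month, per-domain totals.

    click_events: iterable of (iso_date, target_url) pairs. By assumption
    the events carry no reader identifier, so the result cannot say who
    clicked what, only how many clicks each domain received.
    """
    totals = Counter()
    for iso_date, target_url in click_events:
        month = iso_date[:7]                      # "2017-06-15" -> "2017-06"
        domain = urlsplit(target_url).hostname    # drop path and query
        totals[(month, domain)] += 1
    # Report only domains at or above the cutoff (hypothetical "major sources").
    return {key: n for key, n in totals.items() if n >= min_count}


events = [
    ("2017-06-01", "https://example.org/a"),
    ("2017-06-15", "https://example.org/b"),
    ("2017-07-01", "https://example.net/"),
]
report = monthly_referral_totals(events, min_count=2)
```

Only the month and the destination domain survive aggregation, which is the property the "no big database of who read what" objection asks for.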
Question 7: As far as possible/practical, should Wikipedia generally send referrer information, but act as a silent referrer only on specific things below?
Wikipedia would send no referrer information to external links: (a) on Wikipedia pages whose presence in browsing history could present a risk of persecution to readers, and (b) to domains (identified by nation or organization) known to conduct active surveillance leading to persecution of readers. [finalized June 29--Carwil (talk) 13:11, 29 June 2017 (UTC)]
Procedural note: This question was added on day 28 of the RfC based on the understanding that it will be relisted for 30 more days soon.--Carwil (talk) 17:26, 28 June 2017 (UTC)
- Support as proposer – This proposal is meant to address the privacy concerns raised about sending referrer information: that a court-empowered state or web-hosting state actor could use the referrer information as part of a chain of evidence demonstrating that someone had visited a particular Wikipedia page (e.g., Falun Gong, Clandestine chemistry) or had visited Wikipedia in a country actively attempting to block access to it (e.g., Turkey). Having Wikipedia go dark as a source of referrals is costly. The proposal for the current Wikimedia referral policy argues that "Wikimedia sites and Wikipedia in particular are also one of the Web's top sources of authority[1] and arguably one of the largest referrals of internet traffic," and that awareness of this fact is critical for those receiving that traffic to improve their interactions with Wikimedia sites. Various additional benefits of sharing referrer information have been raised in this RfC.
- Rather than cut off all our noses to prevent warts, this proposal suggests a moderate effort task: designating certain pages as persecution risks (this could be editor-initiated or a task for WMF) and responding to domain-level censorship of Wikipedia by silencing referrer information to the state in question.--Carwil (talk) 17:26, 28 June 2017 (UTC)
- Comment: This proposal does not address link spam. I don't believe that the problem is severe enough to warrant messing with referrer data. Agreeing with this proposal doesn't mean you have to take the same view.--Carwil (talk) 17:26, 28 June 2017 (UTC)
- Carwil and Guy Macon, I will rearrange Questions 7 and 8: I'll change Question 8 to Question 7 (rewritten), and I'll change original Question 7 to Question # and add the parent header "Copy, paste, and modify the below to the above". Sounds fine? --George Ho (talk) 01:20, 29 June 2017 (UTC)
- Sounds good to me. Perhaps also trim down the section heading? It is too big to fit in an edit summary (see the edit summary of this comment), and most of it could be moved to a paragraph right below the heading. --Guy Macon (talk) 05:55, 29 June 2017 (UTC)
- I'm comfortable with "…act as a silent referrer selectively to defend user privacy," or "…act as a silent referrer selectively to protect users from persecution" if that seems neutral. Depending on space, this text could follow the "but" in the current text or follow "Wikipedia."--Carwil (talk) 06:33, 29 June 2017 (UTC)
- How about, Carwil, "(…) generally send referrer information, but act as a silent referrer only on specific things below?", "(…) generally send referrer information, but act as a silent referrer only on specific circumstances (see below)?" or "(…) generally send referrer information, but act as a silent referrer only to (a) pages whose history could indicate a persecutory risk to readers and (b) domains notably conducting active surveillance to persecute readers?" Meanwhile, allow me to rearrange the Question sections please? --George Ho (talk) 06:48, 29 June 2017 (UTC)
- I trust George Ho's judgement, and he has my permission to change anything as he sees fit without asking my permission. --Guy Macon (talk) 08:31, 29 June 2017 (UTC)
- Done: Converted to "Question 7" and moved one into "Copy, paste, and modify the below template (not to be confused with wiki-template) to the above". --George Ho (talk) 08:41, 29 June 2017 (UTC)
- I changed the subheader to "Feel free to create more questions above this heading" after realizing the one below the header is actually asking a question. I changed "#" to "??" also. --George Ho (talk) 08:51, 29 June 2017 (UTC)
- Oppose – It shouldn't be Wikipedia's job to decide what external sites are harmful surveillance. Deryck C. 09:53, 29 June 2017 (UTC)
- Comment: I have deep reservations about this proposed solution. First of all, who is going to spend the hundreds of hours trying to keep the blacklist up to date in the face of paid government agents actively creating new sites that are not on our current list? Second, how do we determine which sites to blacklist? Do we blacklist Cornell University? Once again, consider the case where a Wikipedia user reads our page on Bomb-making instructions on the internet and then clicks on the link to the Feinstein Amendment SP419 at Cornell University. Any policeman with a court order giving them access to Cornell University's server logs will see the referrer information we send when our user clicks on that link -- and the only link on Wikipedia that links to the Feinstein Amendment SP419 at Cornell University is on our Bomb-making instructions on the internet page. Don't forget the couple who were questioned by the police after he did a Google search on "backpacks" while she did a Google search on "pressure cookers" in another room... So, does Cornell University go on our blacklist? --Guy Macon (talk) 22:32, 29 June 2017 (UTC)
- The answer in this case (subject to the community effort that decides what constitutes "a risk of persecution to readers") is that we would silence referrals on Bomb-making instructions on the Internet, and leave Cornell alone. I view this proposal as akin to "protecting" a page, but for reader privacy rather than for stability from editwars. While it would of course be up to the community, I think the notion of privacy to be defended here is that reading Wikipedia itself should not be cause for arrest/jail/persecution. Curiosity shouldn't kill the cat, as it were.
- I would add that this bit of WMF's statement is relevant here: "We acknowledge that a subset of users (both editors and readers) have specific concerns about having control of what referrer information is sent to external websites. … We think the most effective solution to these concerns is the use of widely available browser extensions. For privacy-concerned users, these extensions provide more robust control of their privacy, not just on Wikimedia projects, but beyond. … With input from the Foundation’s Technology teams, we will provide on-wiki documentation and best practices about this solution, and link them to the Privacy Policy FAQ."
- In my view, each referral-protected page would have a prominent "how to preserve your privacy" notice included in the page, just below "references" and "external links" and (if practical) made inside of pop-up references containing an external link.
- To be blacklisted as an external site, on the other hand, would mean that the receiving site itself (not a potential court order upon them) is tracking and persecuting readers for using Wikipedia per se.--Carwil (talk) 00:39, 30 June 2017 (UTC)
- Comment: Blacklisting Wikipedia pages whose presence in browsing history could present a risk of persecution to readers is also problematical. Consider the case of Robert Petrick, who was convicted of killing his wife, Janine Sutphen in 2003. He was caught partly because he did extensive web research on the depth, water currents, boat ramps, etc. of North Carolina's Falls Lake, which led the police to find his wife's body in the lake, wrapped up in sleeping bags with chains added for weight. Or the case of Vincent Tabak, convicted of killing Joanna Yeates in 2010; part of the evidence against him was his looking at online maps and images of Longwood Lane, the road three miles from her Bristol flat where her body was discovered. Of course if the police investigate someone based upon what pages they read on the Internet and don't find evidence of murder, it doesn't hit the news... So, do we add random lakes, bays and roads to our blacklist? -Guy Macon (talk) 22:32, 29 June 2017 (UTC)
- In my view, persecution ≠ prosecution. In theory, every work of art could be stolen or counterfeited and visiting the relevant page could be construed as part of a larger case demonstrating criminal intent. (Of course such intent is not especially convincing in the absence of non-Wikipedia and non-Internet evidence.) Blacklisting them would imply cutting off all referrer data to GLAM institutions. The more speculative the possibility of prosecution (which is less of a concern than persecution), the less weight we should give relative to the benefits of sharing referrer data for an open Internet and the increase in available free content generated by people who receive said referrals (e.g., the Met Museum).--Carwil (talk) 00:39, 30 June 2017 (UTC)
- Oppose. A blacklist would be unmaintainable. If we go down this route, a whitelist would make more sense. Daß Wölf 02:41, 30 June 2017 (UTC)
- Oppose. Trying to keep a blacklist up to date will be extremely difficult. And who is to decide which sites should be blacklisted in the first place? What are the criteria? No, this would be virtually, if not totally, unworkable. Yintan 13:59, 18 July 2017 (UTC)
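For readers weighing the maintenance objections above, the mechanics of Question 7 can be sketched as a per-link policy lookup. This is a minimal sketch under stated assumptions: the two sets, the function name `policy_for_link`, and the placeholder entries are hypothetical, the default of `origin-when-cross-origin` is simply the policy in use at the time of this RfC, and the hard part (who curates the lists and by what criteria) is exactly what the code leaves out.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-maintained lists; the RfC does not define their contents.
SENSITIVE_PAGES = {"Example sensitive page"}      # part (a): pages
SILENCED_DOMAINS = {"surveillance.example"}       # part (b): domains

DEFAULT_POLICY = "origin-when-cross-origin"       # assumed default (2016 policy)


def policy_for_link(page_title, target_url):
    """Pick the referrer policy for one external link on one page."""
    host = urlsplit(target_url).hostname or ""
    # Match the listed domain itself and any of its subdomains.
    domain_silenced = any(
        host == d or host.endswith("." + d) for d in SILENCED_DOMAINS
    )
    if page_title in SENSITIVE_PAGES or domain_silenced:
        return "no-referrer"          # act as a silent referrer
    return DEFAULT_POLICY
```

The lookup itself is trivial; every oppose above concerns keeping the two lists accurate, which no amount of code addresses.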
Question 8: As far as possible/practical, should Wikipedia act as a silent referrer except to websites associated with GLAM and those others that we decide to be safe?
As the title says. More could be added, but I think that having GLAM things be whitelisted is a good start. RileyBugz会話投稿記録 19:47, 6 July 2017 (UTC)
- Support as proposer. RileyBugz会話投稿記録 19:47, 6 July 2017 (UTC)
- Hi, RileyBugz. What will you do with your argument from Question 5: "So, I oppose anything else"? --George Ho (talk) 19:51, 6 July 2017 (UTC)
- Added note. Thanks! RileyBugz会話投稿記録 19:52, 6 July 2017 (UTC)
- I'm in support of this if this is implemented in such a way that organisations with ill intentions cannot game the system through donations. This is a good idea and I've thought of suggesting it myself earlier, but we have to be careful not to end up being financed through refback sales. Also, we would need a procedure to exclude links from sensitive pages, such as the bomb making one that everyone's talking about. Daß Wölf 21:39, 6 July 2017 (UTC)
- Absolutely oppose due to the principles behind net neutrality. We should not handle referrals differently based on whether we support the organizations we're referring to. Wikipedia should not be political, and we should not prioritize handing non-public information to certain entities we support. ~ Rob13Talk 21:44, 6 July 2017 (UTC)
- Comment: This idea has been suggested by several editors, and so far has been rejected by every identifiable member of GLAM or The Wikipedia Library, most of whom are WMF employees. Given this opposition and the strong possibility that nobody associated with GLAM or TWL will be willing to identify websites to whitelist, who exactly, will spend that time to keep the "websites associated with GLAM and those others that we decide to be safe" updated, especially given the fact that spammers will be highly motivated to get on the whitelist by becoming members of whatever group ends up updating the whitelist? I might be inclined to support this as a second choice if someone could provide a good answer to this question.
- Comment: As written, it is hard to see how whoever evaluates and closes this RfC can do other than add any !votes for this proposal to the !votes for Question 5 (silent referrer). --Guy Macon (talk) 02:24, 7 July 2017 (UTC)
Feel free to create more questions above this heading
Question ??: As far as possible/practical, should referrer information contain something else not listed above (please be specific about what you think it should contain)
There are a couple of possible referrer policies that we have had in the past that were not mentioned in the above list.
From 2001 to 2011, we had the following referrer policy:
- Access Wikipedia through HTTP, click on an HTTP link: Full URL sent in referrer.
- Access Wikipedia through HTTPS, click on an HTTP link: Not possible. Wikipedia did not support HTTPS.
- Access Wikipedia through HTTP, click on an HTTPS link: Full URL sent in referrer.
- Access Wikipedia through HTTPS, click on an HTTPS link: Not possible. Wikipedia did not support HTTPS.
This is the same as Question #1.
In 2011, Wikipedia added optional HTTPS support, so we then had the following referrer policy:
- Access Wikipedia through HTTP, click on an HTTP link: Full URL sent in referrer.
- Access Wikipedia through HTTPS, click on an HTTP link: No Referrer information sent.
- Access Wikipedia through HTTP, click on an HTTPS link: Full URL sent in referrer.
- Access Wikipedia through HTTPS, click on an HTTPS link: Full URL sent in referrer.
This possible referrer policy is not mentioned in the above list.
In 2015, Wikipedia stopped supporting HTTP, so we then had the following referrer policy:
- Access Wikipedia through HTTP, click on an HTTP link: Not possible. Wikipedia does not support HTTP.
- Access Wikipedia through HTTPS, click on an HTTP link: No Referrer information sent.
- Access Wikipedia through HTTP, click on an HTTPS link: Not possible. Wikipedia does not support HTTP.
- Access Wikipedia through HTTPS, click on an HTTPS link: Full URL sent in referrer.
This possible referrer policy is not mentioned in the above list.
In February of 2016 Wikipedia started using origin-when-cross-origin in the meta referrer, so we now have the following referrer policy:
- Access Wikipedia through HTTP, click on an HTTP link: Not possible. Wikipedia does not support HTTP.
- Access Wikipedia through HTTPS, click on an HTTP link: Full domain sent in referrer.
- Access Wikipedia through HTTP, click on an HTTPS link: Not possible. Wikipedia does not support HTTP.
- Access Wikipedia through HTTPS, click on an HTTPS link: Full domain sent in referrer.
This is the same as Question #3.
--Guy Macon (talk) 01:38, 13 June 2017 (UTC)
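The four tables above can be reproduced with a small model of how a browser chooses the Referer value. This is a simplified sketch of a few policies from the W3C Referrer Policy specification, not a full implementation: it ignores default ports, embedded credentials, and the spec's "potentially trustworthy" test, and treats "downgrade" as plain HTTPS to non-HTTPS. The function name `referrer_for` and the example URLs are illustrative assumptions. The 2011 and 2015 behavior described above corresponds to `no-referrer-when-downgrade`, the long-standing browser default at the time.

```python
from urllib.parse import urlsplit


def referrer_for(policy, source_url, target_url):
    """Return the Referer value sent for a click, or None if none is sent."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme != "https"
    full_url = source_url.split("#", 1)[0]        # fragments are never sent
    origin = f"{src.scheme}://{src.netloc}/"      # domain only, no page path

    if policy == "no-referrer":                   # silent referrer
        return None
    if policy == "no-referrer-when-downgrade":    # 2011 and 2015 tables
        return None if downgrade else full_url
    if policy == "origin-when-cross-origin":      # February 2016 table
        return full_url if same_origin else origin
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full_url
        return None if downgrade else origin
    raise ValueError(f"policy not modelled in this sketch: {policy}")


page = "https://en.wikipedia.org/wiki/Example#References"
for policy in ("no-referrer-when-downgrade", "origin-when-cross-origin",
               "strict-origin-when-cross-origin", "no-referrer"):
    print(policy,
          referrer_for(policy, page, "http://example.org/"),
          referrer_for(policy, page, "https://example.org/"))
```

The only difference between `origin-when-cross-origin` and its `strict-` variant in this model is the HTTPS-to-HTTP case, which is the leak described in the Background section.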
Discussion
WMF Statement on this RFC
Thank you Guy for starting this conversation and others for weighing in so far: it’s always important to examine our processes closely, especially in how Wikimedia projects and their software affect end users.
This RFC is in response to a change in the referrer policy made in early 2016, which corrected an unintended consequence of the HTTPS change during 2015: sending entirely silent referrals to other websites. As we understand the comments so far, the members of the community who have participated in the RFC are principally concerned about the exposure of the domain of users, as that domain information could allow malicious actors to a) identify which particular pages on Wikimedia projects readers were visiting or b) learn that they have been using Wikimedia projects, which in some regions may be risky. Another concern is that this information could be used to advance commercial or spamming interests.
In reexamining the concerns shared so far (some of which are similar to those shared by Guy and others in a 2016 series of threads on Village Pump and on Meta), we have several considerations that run counter to the assumption/premises behind the RFC:
- The current referrer policy, which allows transmission of the domain name only, exposes no more information than may be gained by a well-positioned and skilled actor via other means (DNS cache analysis, SSL handshake examination, IP traffic observation, etc.) This has previously been discussed: see in April 2016. The Security team at the Foundation reviewed that analysis and reiterated that, for the majority of wiki users, the privacy and security benefit of fully withholding referrer information is marginal.
- We acknowledge that a subset of users (both editors and readers) have specific concerns about having control of what referrer information is sent to external websites. Protecting the privacy and security of Wikimedia readers and contributors is a core commitment of the Foundation. We think the most effective solution to these concerns is the use of widely available browser extensions. For privacy-concerned users, these extensions provide more robust control of their privacy, not just on Wikimedia projects, but beyond. Browser-based solutions allow users to control referral behavior across their entire browsing experience, both when reaching and leaving Wikimedia projects. With input from the Foundation’s Technology teams, we will provide on-wiki documentation and best practices about this solution, and link them to the Privacy Policy FAQ.
- More generally, the Wikimedia movement benefits greatly from the ecosystem of knowledge actors--whether publishers, cultural organizations, non-profit or commercial publishers or news sources--being aware that we influence public access to the resources in this ecosystem, which in turn advances our end goals of free and open knowledge for all.
It’s this last point that many of the arguments currently in the RFC discussion do not fully account for: the importance of this referrer traffic. Here is our understanding:
- The vast majority of the URLs that are on Wikimedia projects are curated by our communities and point to good-faith sources of knowledge. Through this community process of review, Wikimedians contribute to an ecosystem of quality knowledge sources on the internet by signalling the provenance of information and which sources of information can be considered trustworthy. This is particularly important when trust in content is low due to fear of misinformation, especially through poorly verifiable and unattributable information. Providing reliable parts of the internet a signal that we facilitate public access to their content ensures the long-term sustainability of Wikimedia’s principal effort (collecting and disseminating knowledge) and our good reputation in the reliable and trustworthy web. This includes, but is not limited to:
- Libraries that actively contribute to Wikimedia projects because the traffic greatly improves the impact of their special collections. See for example, the 68 citations to a 2007 work by university special collections to add links and references to enwiki, inspired by the traffic. There are multiple other similar studies, for example, or example, or example.
- Scholarly platforms such as Crossref (one of the agencies handling unique identifiers (DOIs) for scientific papers) which are interested in our projects facilitating access to their research materials. We know from Crossref that Wikipedia is among the top sources of traffic to the scientific literature, thanks to referral information. A growing number of altmetrics services are producing insights into the societal impact of scholarship by measuring mentions of the literature in Wikipedia. An increasing amount of this published knowledge is Open Access, but those Open Access organizations are competing with commercial sources of scholarship-- without the demonstration of impact from our projects through referrals, we are shortchanging some of our biggest allies in our mission.
- Other sources of knowledge, such as GLAMs, non-profits and other knowledge organizations need some sign of impact, besides pageviews on Wikipedia (which doesn’t correspond well to actual impact of their resources): though the workers who make the choice to contribute to our project typically share our values, motivations and understanding of our projects, they need to justify their collaboration to supervisors with a signal of impact (referrers attributable to the activity). This kind of impact has been witnessed in partners we have worked with since as early as 2011: for instance, the Deutsches Bundesarchiv had an overwhelming increase in requests for using their images, after donating to Commons (see outreach:GLAM/Case studies/German Federal Archives#2011 statistics the documentation on outreach).
- Second, transparency and openness about our process and impact are important values of the Wikimedia movement. Because our projects draw heavily on external sources of information for verification and other activities, webmasters around the world are contributing to our projects and mission too. When our projects impact their services, they should have at least basic tools to understand what is going on and why (to answer the basic and in-this-case benign question “from which domain?”). Without that information provided in an open, web-standardized way, we are actively harming the ability for our allies (reliable knowledge providers) to respond to our impact (in much the same way that a sudden spike in traffic caused by Reddit, or Social Media might be something that webmasters need to investigate and respond to).
We also want to note that there are indications that this RFC does not involve many of the people that would be directly affected by this, including long-time editors to our projects (see for example the comment below by Mike Christie) and communities that engage in outreach.
All in all, we at WMF see the question of what is of best interest for the movement falling in favor of the current policy. Additionally, we want to support individuals concerned about their privacy: as I mention above we will be compiling a list of the browser tools that should empower both privacy-concerned editors and readers to control their referrer information.
Thank you everyone for continuing to weigh in, and I hope that helps clarify our own internal analysis, Astinson (WMF) (talk) 16:24, 19 June 2017 (UTC)
- The "there are indications that this RFC does not involve many of the people that would be directly affected by this" argument has been tried many, many times and has always been rejected by the community. The English Wikipedia has 48,255,987 registered users and 120,786 active editors. No RfC has ever attracted comments from 1% of them. That does not make all RfCs invalid. They are how we settle questions of consensus, and all claims of some invisible consensus among those who did not participate are routinely ignored. --Guy Macon (talk) 14:42, 20 June 2017 (UTC)
- As my post re the RfC is mentioned above, I'll comment here too. I agree with Guy that lack of participation in an RfC is not an indication of an invisible consensus. I would add that a small number of participants for an issue that is regarded as a major change has historically been regarded as a problem, though the definitions of "small" and "major" vary, of course. More to the point, my own comment was not saying that those not participating did not agree; I was saying (and I still think) that the RfC could have been formulated in a way that articulated more clearly the point of view Guy disagrees with. (And as I said before, I think Guy made a good faith effort to be fair to both sides in his wording, but failed.) Without that I felt I could not fairly !vote, because I couldn't be sure I supported either position -- I have some sympathy with both sides. I doubt this will happen, but I'd like to see the RfC withdrawn and redrafted by proponents of both sides. As an aside, I think this does meet the definition of "major", given the effect on our partnerships. Mike Christie (talk – contribs – library) 15:29, 20 June 2017 (UTC)
- Mike Christie, I'm not sure whether to take the withdrawal request seriously. The overwhelming majority picked one option: "silent referrer". Looking at your user page, you have access to three reference websites via The Wikipedia Library. I don't know whether advising "silent referrer" is... undesirable to you. However, most of the participants here are very concerned with privacy. Do you know ways to preserve and protect privacy of Wikipedians besides "silent referrer"? What about ways to combat spamming? George Ho (talk) 16:28, 20 June 2017 (UTC)
- It wasn't really a request; it was more a comment that the RfC could have been written better, not that I have any particular expectation that the outcome would be different if the RfC had been written differently. To answer your questions: My own preference, briefly stated, would be to preserve privacy as much as possible wherever possible, unless there are strong arguments to the contrary. The RfC appears to be about whether there are indeed strong arguments, and to what extent privacy would be compromised if we accede to those arguments. I haven't !voted because I'm not clear about the interaction between those two points; I didn't want to wade through a very long discussion to dig out something that I think should have been stated more clearly in the initial statement of the RfC. Mike Christie (talk – contribs – library) 16:36, 20 June 2017 (UTC)
- @Guy Macon and Mike Christie: We included that statement, because I have gotten both on-wiki and off wiki concerns similar to Mike's about this RFC not really being one which accounts for, and is fully understandable by a general audience of editors. Astinson (WMF) (talk)
- The standard method of dealing with such a concern is through another RfC. You can post an RfC asking the same question in your own words, explaining that you are making an exception to the informal "don't ask again too soon after the RfC settled the issue" rule based upon your belief that the respondents did not understand the RfC I wrote. I will warn you, though, that the usual result of doing that is more support for the original RfC. But you are free to try.
- Let's face reality here. You are extremely unlikely to get what you want. You simply do not have the support, and all evidence suggests that those who responded to this RfC by choosing "Silent Referrer" knew exactly what they were voting for and (after some good explanations from both sides) what the technical issues are.
- Given these facts, I highly encourage you to start supporting the alternative that gives you most of what you want, and I have not yet seen a compelling reason why you are rejecting the compromise in favor of fighting a battle that you cannot win.
- For those reading along who don't want to re-read the entire RfC, here is the compromise that I am urging Astinson (WMF) to support:
- First, Wikipedia puts the following in the head of the HTML...
- <meta name="referrer" content="same-origin">
- ...thus sending no referrer information when a user clicks on a link to a non-Wikipedia page
- Second, those who are involved in the Wikipedia Library and GLAM select certain websites that they think would benefit from receiving referrer information. These might even include "every website that ends with .edu, .gov, .mil, or .museum" (what to put on the list will be discussed, giving TWL and GLAM members wide leeway as to what they want to include). These websites will be listed in a central location.
- Third, Wikipedia automatically adds
- <a href="http://example.com" referrerpolicy="always">
- ...to all the links on the list, thus overriding the meta tag in the head and sending full referrer information to those links.
- Astinson (WMF), given that what I describe above is the alternative that is receiving overwhelming consensus, and given that it appears to give TWL and GLAM members pretty much everything they say they want, I really don't understand why you are rejecting it in favor of a fight that you cannot possibly win. --Guy Macon (talk) 19:26, 20 June 2017 (UTC)
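The three steps of the compromise described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how MediaWiki renders links: the allowlist entries and function names are hypothetical, and the sketch uses `unsafe-url`, the value defined in the W3C Referrer Policy specification for sending the full URL, where the comment above writes `always`.

```python
from html import escape
from urllib.parse import urlsplit

# Step 1: site-wide default quoted in the proposal (no referrer off-site).
META_TAG = '<meta name="referrer" content="same-origin">'

# Step 2: hypothetical central list that TWL/GLAM members would maintain.
REFERRER_ALLOWLIST = {"example.edu", "library.example.org"}


def host_is_listed(url: str) -> bool:
    """True if the link's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in REFERRER_ALLOWLIST)


def render_external_link(url: str, text: str) -> str:
    """Step 3: add a per-link override only for listed sites."""
    attrs = f'href="{escape(url, quote=True)}"'
    if host_is_listed(url):
        # Assumption: "unsafe-url" stands in for the "always" of the proposal.
        attrs += ' referrerpolicy="unsafe-url"'
    return f"<a {attrs}>{escape(text)}</a>"
```

Links to unlisted hosts get no attribute at all, so they fall back to the page-level `same-origin` policy.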
- @Guy Macon: I am not reading a clear consensus on that particular alternative solution you describe there. Moreover, you are describing a very high-overhead strategy (it requires developers and extensive community overhead, actually creating more work for the volunteer community, etc.) which only supports people who already know how to work with our community. That is very different from the concern we express above: that our impact is broad across the whole reliable web because the community already does a good job curating URLs to reliable sources of information. Astinson (WMF) (talk)
- I will defer to your conclusions regarding consensus for allowing GLAM and TWL to override the silent referrer policy on particular links, and unless someone involved with GLAM and/or TWL says that they support that capability, I will not put that in the recommendation I am preparing to submit to the WMF. (everyone will be able to see and comment on the draft before I send it, and of course anyone is free to create their own recommendation and send that to the WMF). There is no point in proposing such a thing when it is clear that nobody at GLAM or TWL will be willing to participate by identifying those sites which they believe will benefit from referrer information. I still do not understand why you are rejecting the offer of the ability to override the no-referrer on selected links, but what you support is your decision to make. Is it because you hope that the WMF will ignore the clear wishes of the Wikipedia community? That is extremely unlikely to happen, given their experience in trying to do exactly that with superprotect and failing spectacularly. The WMF does not dictate what content is and is not allowed on Wikipedia, with the narrow exception of office actions – an exception created to make sure that we at Wikipedia can never decide to host illegal content and thus put the WMF at risk of lawsuits or prosecution. --Guy Macon (talk) 18:38, 21 June 2017 (UTC)
- @Guy Macon: Bad drives out good, and if we so much as suggest a single link could give more URL data than the bare minimum acknowledgement of Wikipedia's involvement, that will become the universal practice. I don't think that any GLAM group *really* needs any kind of URL data. And it would utterly disembowel any privacy-oriented policy if we were to let American .mil sites have special data about what people are clicking on the link! I think you should walk this idea back to some rustic spot behind a military base where no one will notice when you put it out of its misery. Wnt (talk) 20:18, 20 June 2017 (UTC)
- Point well taken. I have stricken the .mil and .gov suggestions. I was only thinking "unlikely to be owned by a spammer" without considering "likely to be gathering information without a warrant". Thanks for pointing out my error.
- That being said, if GLAM says that sending referrer info to a particular library or museum is needed, I see little harm in doing that. If the above compromise motivates one or two of them to support a silent referrer policy, I will put it in the detailed proposal that will follow the end of this RfC. If they are just going to fight the consensus of the community, I won't bother. And if they try to get the WMF to not follow the consensus of the community, that's when we bring out the artillery and go to war. --Guy Macon (talk) 21:13, 20 June 2017 (UTC)
- @Astinson (WMF): I found the 2007 paper that you point out was referenced 60+ times: [7] What it says is that a library made a strong effort to add links to Wikipedia to drive traffic to their site and even starting articles on their own. Note that they didn't give us a bunch of copyright licenses -- they gave us a bunch of links. The result of all this COI (but good COI, trust them) editing was that 1.03% of their traffic was referred by Wikipedia, or 11,206 sessions per month.
- Now I don't know about you, but I don't think that's really going to make a big difference with some folks, and when it does, I would guess that for every librarian clucking with joy over another thousand sessions a month he's earned with his new article, there will be a hundred and seventy spammers, give or take, who are looking for the same satisfaction. And the effect of your blanket policy is to give them this joy, not to dole it out to friends because it would be too much trouble to make a tag or template that controls when the referred-from-wikipedia data is sent, but absolutely anyone who adds links to Wikipedia gets to watch the ka-ching come in. Does anything about that strike you as not so useful? Wnt (talk) 20:11, 20 June 2017 (UTC)
- @Wnt: That was in 2007; since then, the practices and understanding of Wikimedia communities have changed significantly among institutions. If you read more recent studies (post-2011), they talk about other tactics that more actively contribute content to our communities, including editathons, article writing, and other strategies for contributing content. Astinson (WMF) (talk) 14:25, 21 June 2017 (UTC)
- @Astinson (WMF): This RfC isn't about the security of editors, it seems that the principal concern here is of spammers, which is a concern that you have not answered. And, your concerns could be solved by having you, the WMF, curate websites (after this RfC) that are good to send info to. Of course, that would be determined by a later RfC that would determine what level of info we are comfortable with sending to them. RileyBugz会話投稿記録 21:44, 20 June 2017 (UTC)
- @RileyBugz: To be clear, if you count the "vote" comments in the section above: they overwhelmingly mention concerns of privacy and security, whereas the SPAM concern is brought up as a secondary argument. Additionally, most of the disagreement in the discussions below has to do with privacy and security concerns, only some of which are linked to commercial exploitation of those issues. 14:25, 21 June 2017 (UTC)
- Actually, 14 support !votes mentioned privacy and security, and 11 support !votes mentioned spam. (Some mentioned both, many mentioned neither, simply saying that it is a good idea.) Clearly the Wikipedia community is concerned with both. --Guy Macon (talk) 18:56, 21 June 2017 (UTC)
- @Astinson (WMF): That's good to hear; still, are these other strategies really something we have to care that much about that we install a blanket referrer policy everywhere based on what a few GLAMs might do?
- So my preference would be as follows...
- a) Become a silent referrer right away, as per the vote. If you face *real* needs for referral information, you can tell them you're *working* on something else. Temporary means permanent in security matters, so we can't have a temporary delay on the overall decision editors want here. I don't believe that not seeing that 1% figure for the next six months is going to cause any seismic changes in how the GLAM world treats Wikipedia.
- b) If truly needed, go ahead and have developers code up a special tag that allows a "wikipedia.org" referrer to be sent on specific links only. No specific page information, not even to your best buddies. So if you put, say <referrer-data> and </referrer-data> around a block of text, all the links have the attribute stuck onto them. But you need to be able to check by edit filter (see below) so maybe you have to use something like <referrer-link my_target my_text> to replace [my_target my_text] style links at a low level. (And something to prevent that from being transcluded into position, maybe a bundled noinclude effect or something.)
- c) Configure an edit filter that notes any new referrer-data attributes. This has to work in such a way that a clever person can't just put a template into a page to introduce the attribute without being flagged, which may require checking parser output or actual HTML for the first syntax I suggest in (b).
- d) GLAM volunteers should be whitelisted in the edit filter, and can add referrer links pretty much at will. The complete whitelist must be public, and subject to review and amendment by a consensus of ordinary Wikipedians. The others get flagged and are made available for review, and if abuse becomes significant, might be disallowed for unregistered edits etc. Wnt (talk) 18:49, 21 June 2017 (UTC)
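Points (c) and (d) above could look something like the following sketch. Everything in it is hypothetical: the `<referrer-link ...>` syntax is the one floated in (b) and does not exist in MediaWiki, the whitelist entry is a made-up user name, and a real edit filter would not be written this way.

```python
import re

# Hypothetical tag syntax from point (b): <referrer-link my_target my_text>
REFERRER_LINK = re.compile(r"<referrer-link\s+([^\s>]+)")

# Hypothetical public whitelist of GLAM volunteers from point (d).
WHITELISTED_USERS = {"ExampleGlamVolunteer"}


def new_referrer_targets(old_text, new_text):
    """Targets of referrer links present after the edit but not before it."""
    before = set(REFERRER_LINK.findall(old_text))
    after = set(REFERRER_LINK.findall(new_text))
    return sorted(after - before)


def review_edit(user, old_text, new_text):
    """Return ("allow" | "flag", added_targets) for a single edit."""
    added = new_referrer_targets(old_text, new_text)
    if not added or user in WHITELISTED_USERS:
        return "allow", added
    # Non-whitelisted users adding referrer links are queued for review.
    return "flag", added
```

This only compares raw wikitext, so it would miss a referrer link introduced through a template, which is the weakness point (c) raises when it suggests checking parser output or the actual HTML instead.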
- @Wnt: I, personally, have major concerns about this suggestion: it requires yet another high-volunteer-labor overhead process with very limited transparency, which has a lot of opportunity for security/privacy issues in the long term and the same kinds of human-error issues that we currently have with patrolling URLs added to the projects. I would argue that our significant and effective medium/low-overhead processes prevent SPAM in a number of venues and already protect our contributors at a strong level (edit filters, semi-automated and automated tools identifying and removing bad URLs, CopyVio checks, and existing strategies for screening for other poor/spammy content). These may not be 100% effective at catching everything, but with improvements like WP:ORES and the increased amount of work being done by Community Tech with tools like meta:CopyPatrol, the increase in efficiency and quality of existing community processes aligns our links with the kinds of organizations that folks describe as reasonable here. There are millions of links that would have to be reasonably included in those tags. Astinson (WMF) (talk) 16:30, 23 June 2017 (UTC)
- Comment: "This includes, but is not limited to […] Libraries that actively contribute to Wikimedia projects because the traffic greatly improves the impact of their special collections. See for example, the 68 citations to a 2007 work by university special collections to add links and references to enwiki, inspired by the traffic." How is this not spam? Sure it's promoting a library rather than fake viagra or whatever, but it's still a COI, and I'd argue that anything like this should be required to go through edit requests, and that COI link addition by the users with the COI should be disallowed. —{{u|Goldenshimmer}}|✝️|ze/zer|😹|T/C|☮️|John15:12|🍂 16:32, 30 June 2017 (UTC)
General policy comments
Please comment about what we should do, not how we should do it. Misplaced comments may be moved to the proper section by any user.
- User:Guy Macon... can you please state what problem you are trying to solve in this RFC, before trying to ask a wide variety of very technical questions that most people will have trouble answering with full understanding? Why not ask less technical questions like "Do you think it is important to indicate to other websites and partner organisations where their traffic is from", "Do you think this should be limited to partners that make use of https", etc.? —TheDJ (talk • contribs) 15:40, 1 June 2017 (UTC)
- Messed up my ping, User:Guy Macon. —TheDJ (talk • contribs) 15:40, 1 June 2017 (UTC)
- No need to ping me. When I make a comment I check for replies.
- I thought that the following was clear:
- "Overview: When someone who is reading Wikipedia clicks on an external link their web browser may be given referrer information to be passed on to the external website. Depending on various factors, this information can range from telling the external site exactly what Wikipedia page the reader was on when they clicked on the link, telling them that the link was from Wikipedia but not telling them where on Wikipedia, to telling them nothing at all about the site that the link was on."
- The problem I am trying to solve in this RFC is that the Wikipedia community was never consulted before the Wikimedia foundation made decisions about what information about what pages/websites our users visit is given to external sites that the WMF does not control.
- As for "partners". I believe that they are covered by the following:
- "Many web sites have a legitimate desire to know who links to them. Alas, spammers also desire to know who links to them so that they can refine their spamming. Most users, including those who fear surveillance by governments, corporations or criminals, desire to send the minimum amount of information that is consistent with them being able to use the website."
- A "partner" would seem to be an example of "a web site that has a legitimate desire to know who links to them".
- What would be helpful would be a list of these "partner" websites, and why they might want this information about what websites and/or pages a user visited before visiting their site. --Guy Macon (talk) 16:01, 1 June 2017 (UTC)
- It could have helped to state that you wanted to validate the current position of the foundation, this is unclear from the current consultation and now it feels like a random Q/A without effects. The simpler RFC seems to be (in pseudo): "This is the current status quo, and this is what it means. For the following scenarios please indicate which information (none, hostname, url etc) you would be comfortable with disclosing" maybe a sibling question of "Do you think the Foundation's current referral disclosure should change from what it is now ? "
- Also, I didn't really ask about partners; it was an example. But I see that you have a keen interest in that particular area. —TheDJ (talk • contribs) 16:33, 1 June 2017 (UTC)
- I don't want to validate the current position of the foundation. I want the users of Wikipedia to have a say about whether the sites that they visit when they click on an external link are sent information that allows the site to figure out what Wikipedia pages the users read. --Guy Macon (talk) 20:37, 1 June 2017 (UTC)
- For what is worth, as a lay Wikipedia editor, I found the question posed in this RfC clear enough to answer it. Still from the point of view of a lay Wikipedia editor, I don't really care if some of the options technically end up validating the Foundation's choices and others invalidating it; the important thing is that it be made clear the results are not binding, and it was. Knowing these facts, I will subsequently be able to assess for myself whether the Foundation heeds the community's wishes, or ignores them. LjL (talk) 23:42, 10 June 2017 (UTC)
The problem is, assuming the Foundation still believes their decision is the correct one, they will probably prefer to ignore this RfC. But they'd probably also prefer to be able to do so without saying "sorry but we don't care what you think". It seems to me this RfC unfortunately is likely to give them that. For starters, they're likely to argue that it wasn't sufficiently neutral since there seems to be a major focus in the opening on the evils of referers with only a brief mention of how "Many web sites have a legitimate desire to know who links to them" which is then immediately followed with talk of spammers and something which if it were in an article would be called up for both WP:Weasel and lack of citation about "Most users, including those who fear surveillance by governments, corporations or criminals" (not even especially). Yes there is this mention here of one advantage of referrers, but realistically most participants aren't going to read it.
In fact, it seems clear that a number of people either didn't read or didn't understand the opening part. At least one seemed to think a referer was something to do with HTML5 rather than, as me and someone else pointed out, something that had existed since 1996 or whenever. Others seem to think the info is being sent by wikimedia rather than the pages served by wikimedia telling the browsers, "this is what we suggest you do" and the browsers following it.
The fact it isn't clear how many people even understand the history of referrers is something I'd imagine they could really use to their advantage. While nominally there's nothing wrong with people deciding this is what we should do without understanding the history, it's also easy to say "people would have made a different decision if they did understand the history". Some comments indicate a clear understanding of the history of referrers, but for the others, it's easy to ask "do you understand that this is what was happening when people were using HTTP, and that the default even with HTTPS is that the info will be sent provided the other page is secure; it's only with non-secure environments that it isn't sent, to avoid leaking excessive info".
One more thing that concerns me is there seems to be a strong focus on how this will allow people to figure out what pages are being read even if only the domain is being sent. Yet there's only very limited (yes it was mentioned below where the issue of unique URLs came up) mention of the fact that it's unclear how much this is genuinely a concern since anyone likely to have nefarious aims for this info is probably just going to find some way to ensure their views for that URL only come from wikipedia or mirrors. (It seems unlikely anyone with nefarious aims is going to be particularly concerned whether the info is being read on wikipedia itself, or on some other page which copied from wikipedia.)
In reality I'm not sure how many people would have !voted differently even if they knew all this, but the fact we can't rule it out gives the WMF a good excuse for not following this. Although it's also worth noting that people may not !vote simply because it seems clear where this RFC is headed. When I first saw this I considered a !vote, probably for option 3 or 4, but it seemed clear at the time, and even more clear now, that there was no point. It would be lost in a sea of option 5s. While as said, I suspect that things wouldn't have been any different if the RfC was different, it's also impossible to say, so there's all those who were put off !voting either by the RfC or by the current results too. Further, even if we were to organise another RFC when they reject this one for these reasons and more, they could easily say the whole situation is tainted now because of any stink so they feel they will only consider an RFC in a year or two when it's died down.
In case it's not clear, I have a strong suspicion this is precisely what's going to happen, the WMF will reject this because of the way the RFC was done and say the whole thing is too messy now. A bunch of people will get angry and talk about how the WMF is once again ignoring the community. We may or may not have a second RFC, and if we do people will probably get angrier. The WMF will make some noises about reaching out to the community etc and nothing much will happen. Maybe if we really get angry enough they will relent, but even if that does happen it could probably have happened with a lot less fuss. If we want to properly engage with the WMF and get our voices heard, we need to IMO seriously consider how we approach stuff like this. That means for example that any RFC telling the WMF something needs to be very carefully drafted so that they can't pick holes 10 kilometres long in it.
Note that even if we put aside the WMF, I'll be blunt if this was solely a community decision, I'd likely strongly oppose its implementation based on this RFC.
Nil Einne (talk) 16:46, 12 June 2017 (UTC)
P.S. Of course some may say my comment has tainted the RFC although I think the chance many will read it is slim. But anyway as may be obvious, I've felt this way since I first saw the RFC when I posted below, but waited to post for various reasons including to give more time to confirm my suspicions which unfortunately have been and also to reduce any possible effect. I wish this RFC well since it's obvious some people feel strongly about this and I don't really care that much myself. But I just don't think it's actually going to achieve anything and if it does, it will only be with a lot of unnecessary angst. I couldn't hold off for longer since I've been spending way too much time on wikipedia recently and need to cut it out so didn't want to leave it hanging but also felt this had to be said. There just seems to be too much unnecessary drama between the WMF and the community and while yes, they've definitely been a major contributing factor I feel so have we and people don't seem to appreciate that part.
Nil Einne (talk) 16:58, 12 June 2017 (UTC)
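The referrer behaviours contrasted in this thread (the browser default, the current `origin-when-cross-origin` tag, and the proposed "silent" `same-origin`) can be compared with a small model. This is a simplified sketch of the policies named in the W3C Referrer Policy specification, not a browser implementation: it treats "downgrade" as simply HTTPS to non-HTTPS, does not strip credentials from URLs, and the function name and example URLs are assumptions.

```python
from urllib.parse import urlsplit


def referrer_sent(policy, page_url, link_url):
    """Return the Referer value sent for a click, or None if nothing is sent."""
    page, link = urlsplit(page_url), urlsplit(link_url)
    same_origin = (page.scheme, page.netloc) == (link.scheme, link.netloc)
    downgrade = page.scheme == "https" and link.scheme != "https"
    full = page_url.split("#", 1)[0]  # the fragment is never sent
    origin_only = f"{page.scheme}://{page.netloc}/"

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":  # long-standing default
        return None if downgrade else full
    if policy == "same-origin":  # the "silent referrer" option
        return full if same_origin else None
    if policy == "origin":
        return origin_only
    if policy == "origin-when-cross-origin":  # tag added in February 2016
        return full if same_origin else origin_only
    if policy == "strict-origin":
        return None if downgrade else origin_only
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin_only
    if policy == "unsafe-url":
        return full
    raise ValueError(f"unknown policy: {policy}")
```

For a reader on a hypothetical `https://en.wikipedia.org/wiki/Example` page clicking an HTTP link, `origin-when-cross-origin` yields `https://en.wikipedia.org/`, while `strict-origin-when-cross-origin` and `same-origin` both yield nothing.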
- Interestingly, I had the same thought while writing the RfC. The problem is, I literally could not think of any benefit to Wikipedia from sending referrer information. I could think of a benefit to the WMF; they are likely to think (rightly or wrongly) that sending referrer information somehow increases large donations. But I didn't want to list that because I thought that a lot of editors would see it as a reason not to send referrer information.
- If Wikipedia-WMF relations weren't broken, an RfC like this would lead to a productive discussion with someone at the WMF where they explained in detail why they are rejecting this proposal. Alas, Wikipedia-WMF relations are broken, and the all-too-predictable response from the WMF will be stonewalling: not saying yes, not saying no, and not saying why. --Guy Macon (talk) 21:12, 12 June 2017 (UTC)
- First, we at WMF are assuming that Wikimedia is central to a much larger knowledge ecosystem, rather than only serving Wikipedia community/reader needs, per an understanding of impact and mission closer to what is being discussed around themes #4 and #5 in the movement strategy process. To participate in that ecosystem, and benefit aligned knowledge organizations: we have to acknowledge that their motivations are different than ours. WMF and Wikimedia might be largely self-funding and open/free/privacy purists, but those institutions that we work with are not: and need some way to justify to either funders or sponsors (rarely commercial) the impact of their work, and most orgs operate with two metrics: citations and referrals/pageviews on their own websites. As good citizens in that ecosystem, we need to provide at least some signal that helps folks evaluate their impact through us -- if we signal total silence (Dark traffic), then they assume the impact is from Google or some commercial or not-aligned-with-Wikimedia's-values source and invest money and time in SEO rather than free-knowledge infrastructure or they never get found by the public because they don't have financial resources/savvy -- either way, this is bad for the free and open internet's reliable content.
- Second, I personally, in my volunteer capacity, have experience working with a GLAM, that only wanted to support my Wikimedia contributions, because they increased the visibility of their resources (not my main goal by far, but it opened the door to working with them), see: WP:Blake. I hear that story from a lot of long-time Wikimedians who are working with GLAMs, and we also meet a lot of librarians who choose to join the community because of realizing referral stats, but end up staying as valuable allies and contributors in the community well beyond that initial metric (a few published case studies: here, or for example, or example, or example). Our theory of change around this in my team (Community Programs) and Dario's team (Wikimedia Research), is that without first realizing the value of Wikimedia projects through the internal metrics, they never have a motivation to join us, or at least recognize we are allies. Without that first layer of discovering our impact, the rest of our arguments for being allies die (including ones dependent on the values of free, open, secure and private).
When WMF Security and Legal describe full silence as a referrer as having marginal impact on end-user security or privacy compared to our current situation: to me it doesn't make sense to sacrifice the above, high-impact and scale opportunities (movement orgs allying with aligned digital-knowledge-sectors, and recruiting expert communities to participate and invest in Wikimedia projects). I will have a bit more tomorrow, hopefully, but if there are any questions or clarification, let me know. Astinson (WMF) (talk) 22:25, 12 June 2017 (UTC)
- I would encourage everyone to give serious thought to the view expressed above. Yes, it disagrees with my view in many ways, but this is a good thing, and I am glad to see that the RfC is becoming less one-sided. We all want what is best for Wikipedia, and to my way of thinking a vigorous debate about what, exactly, is best for Wikipedia is much more helpful than a bunch of editors who agree with me. Too many times in my life I have had that happen and later realized that groupthink kept us from making the best decision. So let's avoid knee-jerk reactions and give Astinson (WMF)'s arguments careful consideration.
- Minor housekeeping detail: Astinson (WMF), you wrote "...WMF Security and Legal describe full silence as a referrer as having marginal impact on end-user security or privacy compared to our current situation...". Do you have a link to the discussion where they said that? --Guy Macon (talk) 00:59, 13 June 2017 (UTC)
- Found it. See Wikipedia:Village pump (policy)/Archive 126#Review of the change in terms of privacy concerns. --Guy Macon (talk) 12:25, 13 June 2017 (UTC)
- Yep, that is the one I am referring to. We are re-reviewing this right now, but the initial indications are much the same conclusion. We may have some other additional thoughts as well, Astinson (WMF) (talk) 15:00, 13 June 2017 (UTC)
- Also, wanted to ping a few folks that asked for a better argument or might have thoughts based on how they are commenting above: @WNT, Godsy, Iridescent, Elmidae, Tcncv, and Rivertorch:. I will add a bit more later too, and will ping other folks, Astinson (WMF) (talk) 15:00, 13 June 2017 (UTC)
- In a purely personal capacity, I didn't sign up to be and don't recognise myself as being "central to a much larger knowledge ecosystem", nor do I want to be part of a project that amends its policies to make it easier for Facebook et al just because the WMF have arbitrarily decided they are "strategic partners". Of the organisations on the list you link above, at a rough guess I'd say I'm actively hostile to working with at least 25% of them, and at best neutral about the remainder. ‑ Iridescent 15:16, 13 June 2017 (UTC)
- @Iridescent: As regards the graphic linked above. They aren't all partners or allies in that map, but rather folks who influence our movement in one way or another (or act on us whether or not we work with them-- for example Facebook and Google). There are some groups, that we are much more closely aligned with: and those groups are the ones that we are most interested in supporting. Astinson (WMF) (talk) 17:44, 13 June 2017 (UTC)
As individual editors it may be tempting to think that our service to readers ends with what we write and what they read, but that is putting blinders on our increasingly central role in the entire knowledge ecosystem. Our massive amounts of traffic to our site tell us that we are doing a good job, but it's the traffic that we send to other sites that tells the rest of the world Wikipedia matters--that it is important, authoritative, un-ignorable--and this leads to a myriad of individuals and organizations seeking to engage with us.
Everything from university libraries to museums to newspapers to human rights groups to scholars... they do care more about Wikipedia because it is visible to them in the traffic we send. I'm terrifically opposed to the privacy-shredding infrastructure of the modern web and take ample steps myself to avoid being tracked, targeted, and advertised to. If we were sending information about the specific pages that users were coming from it would be an egregious risk to them, a loss of privacy, a violation of our ethos as a research service, and a contributor to the beast of internet identity tracking. That, however, in all but a very small number of marginal vulnerabilities (that we still need to look at), is not what happens now.
Sending only the *domain* of a website like ours tells external sites essentially nothing about the individual readers of Wikipedia. All it tells them is that Wikipedia is big, it is critical, it is significant, and it matters. We don't need to erase that last piece of data and go silent, dark, because we're already dark at the individual level. At the site level, this minimal amount of information shines a spotlight only on our projects as a whole, and we as a movement benefit greatly from that in many ways, some of which are also invisible to individual users. Keep users safe in the dark, put Wikipedia in the spotlight--that's what benefits us most. Ocaasi (WMF) (talk) 18:19, 13 June 2017 (UTC)
@Astinson (WMF): Some of the technical specifics of this issue are way over my head, but since you asked, I'll say this much more. Wikipedia is the only top website that is totally noncommercial and doesn’t attempt to monetize its visitors. That’s nearly miraculous, considering what the World Wide Web has become, but it won’t stay that way if it chooses to forge relationships with other entities whose motives are less pure or whose financial position is more precarious. Wikipedia’s users and visitors have an expectation of…if not benevolence exactly, then at least detachment, when coming and going from the site.
In your reply to Iridescent above, you say that "[t]here are some groups, that we are much more closely aligned with: and those groups are the ones that we are most interested in supporting". But how can we be sure who those groups are aligned with now, or who they’ll be aligned with in the future, and how can we be sure what data might ultimately be passed along? I realize that slippery-slope arguments are sometimes alarmist and illogical, but when it comes to Internet user data, I think these are questions worth asking.
On a more practical level, referrals are a concern for at least two reasons that I won’t go into here, except to say that the more important of the two conceivably could affect the physical safety of vulnerable people in certain places. I can even imagine instances in which the Wikipedia domain alone could pose a problem. It’s not likely but it could happen, and if we accept that, then we should accept that it is incumbent upon the Foundation to take that into account when deciding how much information its servers pass along to other sites. Eliminating risk is impossible, of course, but minimizing it should be the goal, and that means providing no information whatsoever, whenever that is possible. RivertorchFIREWATER 19:32, 13 June 2017 (UTC)
- In the comment above, Ocaasi (WMF) claims that "Sending only the *domain* of a website like ours tells external sites essentially nothing about the individual readers of Wikipedia." That isn't true. Consider the case of someone reading the Wikipedia page at Bomb-making instructions on the internet#References who clicks on the link to The Low Cost Cruise Missile: A looming threat?. A quick check with the link search tool we helpfully provide will tell the owner of aardvark.co.nz (and anyone who hacks that site or who has a court order) that the only two articles on Wikipedia with that link on them are Bomb-making instructions on the internet#References and Bruce Simpson (blogger)#DIY Cruise Missile. Thus it is not true that the owner of aardvark.co.nz knows "essentially nothing". Instead he knows that the Wikipedia user was reading one of two Wikipedia pages. In cases where there is only one page with the link, he not only knows exactly which page the Wikipedia user was reading when they clicked the link, he knows exactly which sentence they were reading. And we already know of at least one person who got a visit from the police after Googling pressure cookers and backpacks. (See Locals Questioned by Suffolk County Police Department after Googling "Backpack" and "Pressure Cooker", and Google Pressure Cookers and Backpacks, Get a Visit from the Feds.) There are places where you can be arrested and tortured for reading the Wikipedia page Bomb-making instructions on the internet. We should not take revealing who reads that page lightly. --Guy Macon (talk) 20:57, 13 June 2017 (UTC)
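The narrowing step described above can be sketched in a few lines. This is only an illustration under stated assumptions: `LINK_INDEX` is a hypothetical in-memory stand-in for what the link search tool returns, `candidate_pages` is a made-up helper name, and the path on aardvark.co.nz is a placeholder because the real URL is not given in this discussion.

```python
# Hypothetical stand-in for link search results: external URL -> pages carrying that link.
# The path "/placeholder-article" is an assumption, not the real URL.
LINK_INDEX = {
    "http://aardvark.co.nz/placeholder-article": [
        "Bomb-making instructions on the internet",
        "Bruce Simpson (blogger)",
    ],
}


def candidate_pages(referrer_origin, requested_url, index=LINK_INDEX):
    """Pages a visitor could have been reading, given only the referrer origin."""
    if referrer_origin is None:
        # Silent referrer: the site is not told that the visitor came from Wikipedia.
        return []
    if not referrer_origin.startswith("https://en.wikipedia.org"):
        return []
    # Origin-only referrer: every page carrying this exact link is a candidate.
    return list(index.get(requested_url, []))


url = "http://aardvark.co.nz/placeholder-article"
print(candidate_pages("https://en.wikipedia.org/", url))
print(candidate_pages(None, url))
```

With an origin-only referrer the candidate set is the two pages named above; with no referrer the site learns nothing from the header itself, although other signals are outside the scope of this sketch.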
- Guy, I'm out of my depth here technically and so would rely on WMF privacy, security, and legal advice for risk specifics. In other words, take this with a grain of salt. I think that you're not wrong about risk, but not right enough to make a meaningful distinction in the risk-benefit analysis. The scenario you presented, of a clicked-on link to a site about low-cost missiles traced back to the Wikipedia article on bombmaking is a great example. It's a scary scenario, but misses the bigger point and actual privacy risk readers face. If the reader is visiting the site on low-cost homemade missiles and their ip address or browser information is exposed, that alone and in itself is problematic--and no moreso than adding "and I came from Wikipedia's page on the subject to get here". If a user needs protection from that level of ip surveillance off of Wikipedia, then they would need to be blocking/hiding/masking their ip to the websites regardless, and Wikipedia's referrer info wouldn't make a significant difference. It even gives a false sense of security that having a silent referrer protects you from tracking once you leave Wikipedia; indeed, far more cautious and rigorous (but available) methods are needed for that. User privacy once someone leaves Wikipedia is very important, important enough that we need to inform users to equip themselves with tools sufficient to manage their actual risks. The referrer policy being minimal vs. absolutely silent tips the scale only slightly, but the larger issue of exposed/unmasked ip addresses is the real concern: that is the weight which tips the whole scale over on its head--and it happens outside of our control. So would we choose to lose massive network benefit and visibility for a negligible sliver of anonymity that isn't sufficient to protect privacy in the first place? Ocaasi (WMF) (talk) 22:20, 13 June 2017 (UTC)
- I think there are two adversaries here that are kind of being conflated in this discussion:
- First adversary – A passive eavesdropper in the middle of the network path. (e.g. Someone in a coffee shop monitoring the wifi, some governments, etc)
- Changing our referrer policy has little effect on this adversary, as the adversary can sniff the SNI TLS extension (which contains the domain name) and the IP destination address (it's easy to find out who owns the destination IP address given the IP). Depending on where in the network the adversary is located, they may also be able to sniff DNS requests. Thus the referrer policy has basically no effect on this adversary.
- Second adversary (The target of the link in question)
- The entire point of the current policy is to allow the targets of links to be able to generate bulk statistics about where their traffic is coming from without revealing which specific page. To that end, it does leak enough information for such third parties to be able to say x% of our requests come from en.wikipedia.org. However, the referrer is limited to the origin (origin is fancy webspeak for the part of the URL before the "path" portion, so https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy) for example, has the origin https://en.wikipedia.org). I believe most people consider it acceptable that third parties can make bulk statistics about how many of their visitors come from Wikipedia vs other places on the internet (Am I correct in this)?
- From what I understand – the argument here is that if there is a unique-ish url (perhaps with a unique identifier in a query parameter) that's only used a few places on Wikipedia, the adversary can then use Special:LinkSearch to narrow down which specific page on Wikipedia the user is coming from. My counter to this would be:
- Is the full url someone is coming from actually "sensitive"? Ideally that wouldn't be exposed as it kind of sounds creepy, but is it actually harmful to expose it? Prior to the introduction of our current referrer policy, this information was regularly exposed (The default referrer policy in browsers exposes this information in many circumstances) with no complaints afaik. Are there realistic scenarios where users would be hurt by exposing this information, and what are they?
- Even with the most restrictive no-referrer policy [8], an adversary could still figure out what page a user is coming from by putting the url only on Wikipedia. This type of attack relies on the url being relatively unique, thus it makes sense that someone doing this attack would only use the link in one place, meaning that they don't even need the referrer. It should be noted that google webmaster tools allows site owners to see all the people who link to them (Like Special:Linksearch but for the entire internet).
- Thus, I think the privacy concerns for the current referrer policy are overstated, and even if they are an issue, changing the referrer header will not "fix" the problem, while it would seriously inconvenience people who want to run bulk traffic statistics on their website. BWolff (WMF) (talk) 23:28, 13 June 2017 (UTC)
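As a small illustration of the "origin" notion used above, the following sketch derives the origin of the example URL from this comment with Python's standard `urllib.parse`. The helper name `origin_of` is made up for this example, and the sketch ignores edge cases such as default ports and embedded credentials.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return the scheme and host of a URL, dropping the path, query and fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


page = "https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)"
print(origin_of(page))  # https://en.wikipedia.org
```

Everything after the host, including the page title, is absent from the result, which is why an origin-only referrer supports bulk statistics without naming the page.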
- @BWolff (WMF): I said the same thing above. However, I don't think that is the last word. The thing is, if you simply put a "sting" link on the Internet, you don't really know how a user came to it. Unless you're told. It's still at least plausible that there is some other way your link ended up sitting clickable somewhere. And the other thing is, well … I'm not really feeling accommodating toward Internet commerce. I mean, there are so many companies and governments trying to figure out everything you do on the internet and have one God-AI Who Simply Knows Everything About You, and if Wikipedia were to not go along with them there would be a certain non serviam satisfaction to it all. Let them write articles and reports and estimates how much Wikipedia traffic there is. What does it hurt? Even if they misestimate how much there is, and therefore underestimate the value of advertising their company on Wikipedia … is that a bad thing? Really? Wnt (talk) 23:57, 14 June 2017 (UTC)
- @Wnt: [Switching to my personal account to emphasize this is just my personal/volunteer opinion]. The flip side of not really knowing the sting link came from Wikipedia even if it is very likely it did, is that even with referrers, there's no guarantee. People can hack their web browser to make the referrer be anything, including making it be Wikipedia even when it shouldn't be (Or use command line tools like curl or wget). However that's pretty unlikely.
- From what I understand, one of the use cases for referrers is GLAMs who freely license photographs (or other media) so that the media can then be used on Commons and then Wikipedia, and who want to track the "impact" of their media "donation". GLAMs who open up their media often do get a traffic bump due to people clicking on the source link, or perhaps because upon seeing the image in question users become interested in the object in question, and want to do follow-up research around the internet. Having the referrer data helps these groups quantify the effect that free licensing their collection has on their web properties. This in turn helps their staff make the case to their bosses internally that investing effort to re-license media is worth it. GLAMs are after all like any other organization – staff have to prove that their pet projects further the organization's mission. The result of all this is we are able to get access to multimedia we wouldn't otherwise have access to. Now if I believed the referrer data actually compromised the privacy of our users, I would say that the trade off is not worth it. But I do not believe that it does, and as a result of GLAM collaboration we have a lot of cool photos that we would be unable to otherwise have (Can't exactly send a volunteer back in time to photograph an event that happened 50 years ago). Additionally, I don't consider this type of third party to fall into the evil-world-dominating-corporation category. Bawolff (talk) 07:25, 15 June 2017 (UTC)
A solution that satisfies privacy and GLAM requirements?
@Bawolff: I went back to the original specification and found something interesting: it is actually possible to set referrer policy on an individual link using <a referrerpolicy=... >. That means that if GLAM volunteers feel a dire need to get this information out for some specific links, WMF could accommodate them. (There are also supposedly some funny Javascript methods to set all the referrers everywhere to a deliberate fake address e.g. www.wikimedia.org ([9]) though, as often the case with Javascript, it's a crapshoot what happens.) But in general, privacy should be the policy, and we should look for very compelling reasons before tolerating even very limited exceptions. When push comes to shove we don't have to get this stuff today; but we do have to resist a flawed culture. Treading water won't seem so bad once the ship sinks. Wnt (talk) 14:33, 15 June 2017 (UTC)
- Good catch! This really looks like a good solution. Right now we are heading down the road to a situation where the consensus of the Wikipedia community is to be a silent referrer, with the inevitable result that those who are more concerned about GLAM will lobby the WMF to ignore the consensus of the Wikipedia community. This is a recipe for conflict. If instead the GLAM people get on board with being a silent referrer and take the initiative to send referrer information to those GLAM partners that need it, then we will have reached a compromise that makes everyone happy. In addition, we can send full URLs to the GLAM partners not just the domain, thus giving the GLAM partners more information than they get now. --Guy Macon (talk) 15:12, 15 June 2017 (UTC)
- This has a lot less to do with specific partnerships, and more to do with all of the different, diverse, and often-not-yet-contacted organizations that ought to be partnering with our community. There are thousands of libraries with published content cited on Wikimedia projects, and similarly there are dozens of new academic journals being added to our citations each month. Expecting Wikimedia communities in many languages to conform to a rather complex format for the citations is a problem: even if we automated this in one way or another, you still are actively damaging the impact of these knowledge organizations, who share the same fundamental end-goal and mission as us. Astinson (WMF) (talk) 16:28, 15 June 2017 (UTC)
- Astinson, I appreciate that everyone at the WMF pushing this is acting in good faith, but when you have every non-WMF-affiliated person to express an opinion disagreeing with you, this is the recipe for a repeat of Flow or Superprotect. While this is my personal opinion, I'm fairly confident that I'm expressing the general will of the community when I say that if a particular self-appointed "partner" is making demands that are totally at odds with Wikipedia's basic ethical and cultural standards, that's an argument for breaking ties with that organisation, not for redesigning Wikipedia to accommodate the whims of whichever publisher the WMF is currently trying to schmooze. If a particular institution refuses to cooperate with Wikipedia because they don't like our policies, that's their loss not ours. It's to the WMF's credit that they don't generally try to throw their weight around, but that doesn't mean we should lose sight that we are the cultural phenomenon leading the revolution in how information is disseminated, not the William Blake Institute, the Metropolitan Museum of Art or even Elsevier, Gale and Google, and we shouldn't be afraid to dictate to other organisations the terms under which we are willing to allow them to work with us, rather than go cap-in-hand asking them what compromises they'd like us to make in order for them to give us the crumbs from their table. Put bluntly, there is no partner whose loss would cause significant damage to Wikipedia; it seems important to the handful of editors who make use of it, but if the whole of WP:TWL shut down tomorrow the impact would not even be noticed by 99% of Wikipedia's active editors. (I've managed to get by thus far without once having interacted with either TWL or a GLAM partnership, and I'm writing primarily on the visual arts where one would expect that kind of thing to be more common. The whole TWL/GLAM/WIR alphabet soup is really not as important as those who are connected to it think it is.) 
‑ Iridescent 17:00, 15 June 2017 (UTC)
- Astinson (WMF), the problem is that you are asking the WMF to ignore the wishes of the Wikipedia community, which quite frankly values user privacy over GLAM. Are you "sure" that you want to fight that fight? It is likely to end up with you losing the fight, but only after a huge shitstorm. --Guy Macon (talk) 17:30, 15 June 2017 (UTC)
- (edit conflict) @Iridescent: I would agree with you: if this were about satisfying a handful of organizations or momentary partnerships with the change, it wouldn't be worth it, but we are talking about a web-wide utilized standard for metrics among informational websites. Our hard-line on other concerns throughout the movement (including free and open copyright, community based editorial decisions, etc), actually are leading in a practice shift as you mention.
- Yet these external communities are a significant component of our impact. If you haven't worked with these other knowledge organizations, or haven't found them useful yet, that doesn't mean that your experience is one shared by all facets of the movement; there is ample indication that partnership and collaboration with organizations and communities outside of our projects is a top-level priority for many parts of the Wikimedia movement: see the various themes emerging in the strategy process. Moreover, in this conversation, there are several indications that folks who might have opinions can't meaningfully engage here (take User:Mike Christie's comment). I, and other WMF staff, am engaging in good faith, in part because we recognize that the privacy concerns expressed here affect a certain subsection of the community, but the project-wide way of addressing those concerns which the RFC proposes doesn't fully account for the various other stakeholders in the conversation. As I mentioned a couple days ago, we are consulting internally to see how we can support the privacy concern, while not discounting the impact of this whole other part of the movement. Astinson (WMF) (talk) 17:58, 15 June 2017 (UTC)
- @Astinson (WMF): I have to wonder how far you're planning to go with that. I mean, if Facebook tells you they'll pay all your bandwidth but in exchange you have to put their tracking scripts and like buttons and web bugs on every page, what do you say? It seems like it would be better to stick to a hard line now when the bribes are small and resisting them is easy, rather than going down the garden path and having to say no when the money is huge but the consequences are a loss of fundamental values. Wnt (talk) 20:12, 15 June 2017 (UTC)
- Again, FUD. what are you even talking about here. As alex says at the end of his comment, if anything the Wikimedia movement has been leading the privacy debate.. You ask the security staff for information, and you blame them for engaging ? You ask people who have a good use for the current referrer policy to explain this, and you are surprised they answer ?? —TheDJ (talk • contribs) 08:01, 16 June 2017 (UTC)
- I did not mind that he answered – but I do want to take that opportunity to point out to him what direction such an answer could eventually lead toward. Wnt (talk) 10:50, 16 June 2017 (UTC)
- I have been reading this thread quietly, and want to respond here, as Astinson has articulated a key thing. The editing community is what creates and maintains the key asset here – content. Without content generated and maintained by the community, WMF would have nothing to do. It is content that people come to find.
- It is absolutely true that WMF enables content creation and maintenance by providing the servers and software, and that is a great thing. And I appreciate WMF's efforts to grow the movement via things like alliances. But what folks in WMF seem to forget all the time, is that this peripheral stuff – like creating alliances – is not content creation and maintenance. It might enable content creation and maintenance, but it is not itself content creation.
- So – don't put the cart before the horse. The WMF does not lead -- it serves. When the WMF (and WMF employees) forget that what they are doing is supportive or peripheral to the core asset-generating miracle here (volunteer content creation and maintenance) and act as though WMF and its goals are central, it has lost its way.
- What is the generator of the miracle? People have spent lots of time studying that, but at the bottom it is probably values. We do this for free, because it is made available freely. We do this anonymously, and we fiercely protect our privacy and want to protect the privacy of readers.
- What you are failing to hear (quite completely) is that what you are advocating runs hard against these two core values of no-money and privacy. You want to "monetize" the link-trail. And why? To further the WMF's goals. You should be very wary of violating core community values to pursue WMF goals. That is ass-backwards. Jytdog (talk) 04:43, 17 June 2017 (UTC)
Just to make sure that we are all talking about the same thing, if Wikipedia puts the following in the head of the HTML...
<meta name="referrer" content="same-origin">
...we will send no referrer information when a user clicks on a link to a non-Wikipedia page and full referrer information when a user clicks on a link to another Wikipedia page.
If we then add the following to selected links...
<a href="http://example.com" referrerpolicy="always">
...this will override the meta tag in the head for that particular link.
Thus we can use the meta tag to make Wikipedia a silent referrer for all outgoing links and then override that policy with referrerpolicy on the link for any sites that are of particular interest to GLAM or The Wikipedia Library. This can all be done with a bot; all we humans would need to do is to make a list of what sites we want to send referrer information to.
Several sources say that all major browsers support setting a referrer policy for the page and then overriding it with a referrer policy for the link (which is what the standard at [10] says they should do) and an extensive web search has turned up zero evidence of any major browser not supporting both. A few seldom-used browsers (Internet Explorer under Windows 95, Opera Mini (not to be confused with Opera)) don't support either. --Guy Macon (talk) 07:05, 18 June 2017 (UTC)
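Illustration of the page-policy-plus-link-override idea described above, as a minimal Python sketch rather than a description of how any browser is implemented. The function name `referrer_sent` and the example URLs are hypothetical. The sketch models only a handful of policy keywords as I understand them from the Referrer Policy specification, uses the keyword `unsafe-url` for "send the full URL", and ignores details such as HTTPS-to-HTTP downgrades.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return the scheme://host[:port] part of a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def referrer_sent(page_url, link_url, page_policy, link_policy=None):
    """Return the referrer value under this simplified model, or None if silent.

    A policy set on the individual link (link_policy) takes precedence over
    the policy set for the whole page (page_policy).
    """
    policy = link_policy or page_policy
    # Fragments (#section) are never part of the referrer.
    full_url = urlsplit(page_url)._replace(fragment="").geturl()
    same_origin = origin(page_url) == origin(link_url)

    if policy == "no-referrer":
        return None
    if policy == "same-origin":
        return full_url if same_origin else None
    if policy == "origin":
        return origin(page_url) + "/"
    if policy == "origin-when-cross-origin":
        return full_url if same_origin else origin(page_url) + "/"
    if policy == "unsafe-url":
        return full_url
    raise ValueError(f"policy not modelled in this sketch: {policy}")
```

Under these assumptions, a page-wide `same-origin` policy makes every external link silent, while a link that carries its own `origin` policy would still tell the destination that the click came from the linking site, without naming the page.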
Arbitrary break 02
Unsourced claim that has been proven to be false |
---|
The following discussion has been closed. Please do not modify it. |
Only Chrome supports this attribute so far, and none of the other vendors have indicated they are going to be adding this any time soon. —TheDJ (talk • contribs) 15:32, 15 June 2017 (UTC)
Where is the evidence that only Chrome supports use of <a href="http://example.com" referrerpolicy="always"> (which is what we need to override a "same-origin" (silent referrer) referrer meta tag on selected links)? According to [11] Referrer Policy is supported by:
It is not supported by Internet Explorer, but because Internet Explorer also doesn't support the referrer meta tag or referrerpolicy on the link, IE will always have the default HTTP behavior no matter what we do with our meta tags and links. According to [12], There are many ways you can deliver the referrer policy:
--Guy Macon (talk) 03:44, 16 June 2017 (UTC)
|
Examples where sharing referrer information is a benefit to Wikipedia
It's easy to point to examples where sharing this kind of information is bad, but I've seen sharing some information be a real benefit to Wikipedia, and wanted to make sure we talk about that. I have an obvious conflict of interest for this particular example, but it's the one I'm most qualified to talk about: The Wikipedia Library. As much as we would love for all the publishers and databases who give free access to Wikipedians to be doing so purely out of the goodness of their hearts, some (not all) are doing so at least partly from a business perspective. More Wikipedians citing their content on Wikipedia means more people clicking through to their content – that's unavoidable. As such, one of the metrics by which some organisations judge whether they want to continue giving Wikipedians free access to resources is traffic from Wikipedia, which they can only track if we're at least sending them the current amount of information (wikipedia.org). Removing that information would have a demonstrable impact on TWL's ability to expand, as well as to continue existing access donations. My main point being, the ability to monitor traffic levels from Wikipedia is a really important way to convince organisations to get involved with the Wikimedia community; if they can see obvious benefits to doing so they're much more likely to work with us to make the Wikimedia projects a better place. I know there are other examples where sharing this data is ultimately beneficial to Wikipedians, and would encourage anyone who has a similar case study to share it. Sam Walton (talk) 12:06, 2 June 2017 (UTC)
- I am looking at Wikipedia:WikiProject Resource Exchange/Resource Request, and I am not seeing any examples where a publisher or database gave free access to the world, which they would have to do in order to get traffic from Wikipedia or anyone else. Of course, me not being able to find it does not equal it not existing. Do you have any specific examples that lead to links on Wikipedia? --Guy Macon (talk) 22:27, 3 June 2017 (UTC)
- @Guy Macon: My example wasn't about the resource exchange, but rather the access donations. See File:Wikipedia Incoming Traffic Graph.png for some example traffic data, showing the increase for one TWL partner, which led them to be enthusiastic about continuing to give free access to Wikipedians. That plot is also interesting to see the HTTPS dip, before the referrer tag was put in place :) Sam Walton (talk) 23:11, 5 June 2017 (UTC)
- The HTTP referer article gives an in depth explanation of referrers for anyone else who was largely unfamiliar with the concept before this like me. — Godsy (TALKCONT) 07:50, 10 June 2017 (UTC)
- (EC) Yes, perhaps one significant point in light of the comment above is that it's something that's existed and been used since 1996 and has nothing to do with HTML5. As I understand it, the particular issue of concern here is that HTML5 has a "meta referer" tag that can be used to override default browser referer policy. Default policy for most browsers is to not send any referer header when HTTPS was used to connect to the originating website. The WMF did not use to implement that tag, meaning that once Wikimedia sites were switched to HTTPS only, referers would not normally be sent for any user. (Prior to that, any user using HTTPS would generally not have sent a referer but anyone using HTTP would generally have done so.) They've now added the tag, meaning that a referer will be sent again, similar to what was done before with HTTP but now for everyone using HTTPS (unless they get their browser to override default behaviour), except that they specify origin, meaning only the hostname will be sent, unlike before where it would have been the full URL. See Meta:Research:Wikimedia referrer policy. So while this specific question has to do with HTML5, referrers are not some sort of fancy HTML5 web stuff. Nil Einne (talk) 04:11, 11 June 2017 (UTC)
- That chart raises a question in my mind, and perhaps the suspicion that there is something I don't understand about referrers in the HTML handshake vs referrers in the meta tag. The file description says "Trend line does not include this dip or the recovery period shortly afterwards". Why would there be a recovery period? The trend line instantly dropped to zero when we stopped sending referrer information. Why didn't it instantly recover when we resumed sending referrer information? --Guy Macon (talk) 11:57, 11 June 2017 (UTC)
- Yeah, good question, I wondered that too, and don't know enough about the technical details to say. Sam Walton (talk) 07:16, 12 June 2017 (UTC)
- @Samwalton9: To begin with, if there's a quid pro quo, it's not a donation. And if there's a quid pro quo, what is that? If all we're paying is eyeballs, then any method to estimate those eyeballs should be sufficient. Now that they know how many people were referred, we don't need to prove it again. But if what we're paying is user privacy, then this devil's deal needs to be stamped out posthaste. If the companies are that eager to see referer data that they would give real value and then take it away if it's not given, that's exactly the reason not to give it! But I think the real reason is just that the access donations don't really cost them anything at all; they know that they lose only a handful of sales, and get them straight back again in free advertising, no matter how you count the eyeballs. Wnt (talk) 23:19, 11 June 2017 (UTC)
- @Wnt: Problem is, there is no real way for the "sponsor" (not "donator", I agree on that point) to quantify the "eyeballs" without the referer info. In the example given above, Wikipedia-originating pageviews are a few hundred per day. If the website receives 1000+ hits per day with a pattern of increase (as is usually the case when you are taking SEO measures), it would be extremely hard to determine the impact of Wikipedia exposure if you just have to guess. This particular sponsor may now know how many WP-originating pageviews there are, but for another new sponsor, or that sponsor after a change in the IT department, without the referer there is no easy way to quantify the page views.
- If the only way to count the eyeballs is to allow the sponsor to know the browser history of whoever's brain is at the other end of the optic nerve, I would support not counting eyeballs at all ("devil's deal", as you say). I would expect the WMF to take a similar position (unless/until finances get tight), but it is not a noncontroversial position, as much as I would like it to be. TigraanClick here to contact me 16:34, 12 June 2017 (UTC)
- @Tigraan: So? They guess. You make it sound as if otherwise it were an exact science -- like they have any real idea how much the added traffic really translates to sales, given that someone who finds out they exist via Wikipedia will probably come back and look at them on some other occasion before putting down any money. Yeah, they might not like not being in control and they might offer us fewer shiny baubles in exchange … but how much difference do those really make anyway? Wikipedia is swimming in cash, could buy all the subscriptions it wanted if it felt like it. The fact is, old fashioned advertising didn't come with links – you put your ad in the paper and maybe you got a chance to ask a few people what brought them to your shop. We should strive to be a precedent for turning the entire web back to old fashioned advertising – no more tracking, no more scripts, no more spies, and, sadly for many of us, but happily for the companies, no more ad blockers. Because you can't adblock a straight image or a link on a page. That's the leadership we should be showing. The corporates who invaded our educational internet a decade and a half ago have made a trash heap of it, only good for getting your computer ransomed, and we ought to stand up for a better way. Wnt (talk) 11:48, 13 June 2017 (UTC)
- How many views translate to a sale is not an exact science. But how many views originate from which referer is. Maybe the sponsors are idiots who don't realize this, and they are wrong in wanting referer data; maybe they are evil megacorporations or useful idiots of those and they are morally bankrupt for wanting the data. But the fact remains that they want the data, and will stop sponsoring if we don't give it to them. Should the WMF become 100% convinced of your point, which I doubt, it would still be unlikely to have the inclination to argue with each and every sponsor to convince it. (I agree with you 100% about the adverspying in today's internet, but SEO and referers were already a part of the previous model, so I don't see how it is relevant here.)
- As for buying all possible subscriptions, well, see Elsevier#Pricing. The WMF is probably not thrilled at the idea of coughing up north of $1m per year and giving it to one of the worst offenders in terms of restricting access to knowledge. TigraanClick here to contact me 14:02, 13 June 2017 (UTC)
- Surely there is a middle way to be found, e.g. enabling origin-when-cross-origin only for .edu sites. I would expect an organisation with $80 million in revenue to do a better job negotiating instead of giving away all of this data just to solicit more donations. This is technically speaking unidentifiable/anonymised data, but in practice some corporations out there can pretty much reconstruct large parts of a person's browsing history based on such data, and I'm quite annoyed that many people refuse to acknowledge this. Daß Wölf 00:26, 14 June 2017 (UTC)
- @Tigraan: If they don't want to cough up a million in subscriptions, they should be leery of coughing up a million in privacy. Do they really need all those subscriptions? Are they really worth a million? Elsevier knows full well that many of the Wikipedians do not work at a well-heeled private scientific company that can afford to pay per article or by subscription, and therefore, they lose no money or value by letting them subscribe. And they also know that many of us prefer to use Sci-Hub even above applying for free Wikipedia-related access. I mean, it costs them nothing to give this access, it's worth very little, we don't need it, but it ends up leading to them getting valuable exposure even if they can't quantitate it reliably. And if all that fails … we can go back to the old status quo, no subscriptions, we get the articles whichever way we can, whether it's Sci-Hub or Twitter tags or just plain emailing the author for an eprint like in antediluvian times, or even, God forbid, going down to the library for a formal interlibrary loan. I mean, the thing about bait is you don't actually have to eat it, not if you're not clever enough to figure out what it's sitting on. Wnt (talk) 23:44, 14 June 2017 (UTC)
- Actually, it seems I misunderstood something I read. I now believe the default with HTTPS is as outlined in the comment above, namely that the referer will be sent only if the other connection is also secure. See e.g. [14] [15]. BTW, the comment above that I referred to in my comment was KMF's comment now under option 5, which was here before it was moved. Nil Einne (talk) 15:45, 12 June 2017 (UTC)
- I posted a list of the various behaviors and when Wikipedia exhibited them under question #6, in case someone wants to !vote for a previous behavior that I didn't list in my list of questions. It may also help everyone to understand what the default HTTP and HTTPS behaviors are. --Guy Macon (talk) 01:45, 13 June 2017 (UTC)
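Illustration of the "counting eyeballs" point in the thread above: a minimal Python sketch of how a destination site could tally visits by referring host from its own access log. It assumes the log is in the common "combined" format (request, status, size, referrer, user agent); the function name `referrals_by_host` and the sample log lines are made up for the example and are not data from any TWL partner.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches: "REQUEST" STATUS SIZE "REFERRER" in a combined-format log line.
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"')

# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/Jun/2017:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "https://en.wikipedia.org/" "ExampleBrowser/1.0"',
    '203.0.113.6 - - [12/Jun/2017:10:01:00 +0000] "GET /a HTTP/1.1" 200 512 "https://en.wikipedia.org/" "ExampleBrowser/1.0"',
    '203.0.113.7 - - [12/Jun/2017:10:02:00 +0000] "GET /b HTTP/1.1" 200 640 "-" "ExampleBrowser/1.0"',
]


def referrals_by_host(lines):
    """Count requests per referring host; silent referrals are grouped together."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a combined-format line; skip it
        referrer = match.group("referrer")
        host = urlsplit(referrer).hostname if referrer not in ("", "-") else None
        counts[host or "(no referrer)"] += 1
    return counts
```

With origin-only referrers the site can attribute visits to en.wikipedia.org in aggregate; with a silent referrer the same visits would all land in the "(no referrer)" bucket, which is the trade-off being argued over here.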
I think the points in favor of wikipedia.org or en.wikipedia.org referrer information are compelling (@Ocaasi (WMF):), and I would certainly act differently as a content provider analyzing bulk statistics if I saw many such referrals vs. zero, thereby perhaps contributing to the Wikimedia movement. Further, I would say that such benevolent users don't need referring page data, since they are perfectly capable of searching Wikipedia for their links and running pageviews tools on the pages that link to them.
Conversely, I think the hostile actor problem is a real one, if a site is seeking to either track individual users invasively or censor Wikipedia (Turkey). However (contrary to opinions expressed above), I think the solution is to blacklist hostile webhosts and act as a silent referrer to them, rather than whitelist sites we want to offer clear statistics to. In the hostile nation-state case, this could mean becoming a silent referrer to an entire top-level domain. I leave it to others with more technical skills to do that work, but I don't think silencing all our external referrals is a smart first step to address that problem.--Carwil (talk) 19:11, 15 June 2017 (UTC)
Comment – How would one of the options receiving unanimous support affect external websites covering Wikipedia, like Wikipediocracy, especially if a user clicks a link to that website from the article about it? --George Ho (talk) 02:01, 13 June 2017 (UTC)
- It would have zero effect unless Wikipediocracy is secretly trying to gather information on which pages Wikipedia readers have read or on how effective some spam campaign is, in which case being a silent referrer would Foil Their Evil Plans. :)
- We know that being a silent referrer will not hurt Wikipediocracy, because we were a silent referrer to HTTP sites such as wikipediocracy.com for five years, from 2011 to February of 2016. Wikipediocracy is probably the #1 site most likely to complain if Wikipedia does something to hurt them. --Guy Macon (talk) 16:09, 16 June 2017 (UTC)
Thanks to @Guy Macon: for pointing out a use case where this referrer information could be materially harmful to someone. There are clearly other cases where having this referrer data (typically in aggregate form) would be beneficial, notably:
- Partner institutions and others who have chosen to make their content freely available discover that doing so is bringing traffic to them.
- External institutions discover that Wikipedia is a major driver of readership to their sites and choose to reorient their content in a way that is more freely available.
- Paywalled sites discover that large numbers of readers come from a free-content site, shifting conversations about the expectations of their readers to access content.
- Data like that presented here shows academic content providers that the free-content Wikipedia community is a central audience for them, prompting shifts towards open access at various levels.
I come to this conversation as both a Wikipedian and an academic aware of numerous ongoing conversations about the future of academic research and whether or not it will be open access. Individual scholars, academic associations (like the American Anthropological Association), and funding agencies (the National Institutes of Health and the National Science Foundation) are all discussing whether and how research might be made more publicly accessible. One compelling argument in that process is the fact that open access content is read more, and read more widely; and this referrer data makes that argument more compelling. We're really talking about whether other arenas of content, including academic knowledge production, orient themselves around free and open sharing sites like Wikipedia or around various fee-for-service models.
So, can we address the negative cases, while continuing to reap the aggregate benefits? Some mechanisms might be:
- Silence referrals on particular categories of Wikipedia pages.
- Create a silent referrals user preference.
- Blacklist surveillance-oriented destination sites to receive only silent referrals.
Fundamentally, there are real tradeoffs here, and I believe the Wikipedia community should recognize that.--Carwil (talk) 17:57, 17 June 2017 (UTC)
- I agree that there are real tradeoffs here. I simply disagree that any of the cases where sending referrer data could be beneficial are even 1/1000 as important as protecting our users from governments that will imprison, torture, or kill based upon what Wikipedia pages they read. I don't think they are 1/100 as important as protecting our readers from marketers who would love to add "favorite Wikipedia pages" to the information they gather and sell about us. I don't think they are 1/10 as important as making the job of Wikipedia spammers more difficult. And, it appears, the vast majority of those who have responded to this RfC agree.
- That being said, even though the cases where sending referrer data could be beneficial are not as important as user privacy or inconveniencing spammers, they are important, and we should provide that information to them if at all possible without violating user privacy.
- So, does your suggested solution above meet both needs? No. It does not.
- We know that only a tiny percentage of Wikipedia users will opt out of sending referrer information, and besides, every major browser already has that capability.
- We know that we lack the manpower to maintain a blacklist of all surveillance-oriented destination sites, and we know that the vast majority of high-traffic sites are surveillance-oriented. Facebook doesn't spend millions of dollars giving you a website for free. No, they sell information about you. You aren't the customer, you are the product being sold.
- A blacklist does nothing to address the one scenario that gets people imprisoned, tortured, or killed, which is a government accessing the logs of a website through a court order.
- The first time some drug dealer ends up in a US federal court and it is revealed that part of the prosecution's case against them includes the fact that they accessed our pages on Clandestine chemistry and Rolling meth lab -- information that the police obtained only because we send referrer information -- the shit will hit the fan, and this RfC will be extensively quoted in the press. All the more so if the Government is China and the Wikipedia page is Falun Gong.
- No, blacklisting is not the answer. Whitelisting is the answer. Unlike spammers and governments looking to find dissidents, the use cases you list above are few in number, easy to identify, and mostly eager to cooperate with us. And, if it turns out that we cannot give them the information they want without compromising our user's privacy, then too bad. They will simply have to do without. --Guy Macon (talk) 19:16, 17 June 2017 (UTC)
- If our goal is to protect Wikipedia users from state monitoring and surveillance at the external links they click on, then silencing referrals for all outbound links is a very blunt tool to achieve that goal. We're talking about two presently hypothetical cases where either (1) "thoughtcrime"-level content on Wikipedia (e.g., Falun Gong page for an aggressive PRC prosecutor) links to innocuous external content, but someone gets traced to reading it; or (2) Wikipedia searches are used to provide evidence of criminal intent (Bomb-making, clandestine chemistry, rolling meth lab). Now, it's far more likely that an interested user in any of these cases clicks through to also objectionable content, and far more damning for them criminally. If we're worried about thought-crime and criminal-intent-through-reading for the user who is on these pages, we should be worried about the same thing when they click on non-innocuous content. They would be even more endangered for those clicks.
- Also, even when we silently refer people to pages, those servers get to have the IP address, location, and system fingerprint of the user that connects to them. (Unless browser or Tor-like countermeasures are taken). So the protection silent referral offers is limited at best. If there's a state actor capable of demanding server logs, they could easily request both the destination website's server logs and the corresponding IP address's ISP logs. Again, silent referral isn't so protective.
- This suggests two possible responses. One, which I mentioned before: silence referrals on pages associated with repressible activities, including the pages we've been discussing. Create a by-page parameter to silence those referrals. Two, and this would take rather more work, use Wikipedia to advertise the privacy risks associated with external links. Every "external links" subhead could include a [your privacy on external links] link in the banner, offering users the chance to understand how Wikipedia treats browser and user data differently than other websites, with click-through links to countermeasures that users can take (including a user preference to silence all referrals, but more importantly on how to anonymize their traffic). That might actually keep people from being jailed or tortured in a way that simply setting a default silent referral never could.--Carwil (talk) 21:42, 18 June 2017 (UTC)
- So, since we can't fully protect our users, we should give up? I agree that a by-page parameter, especially a cookie-based (and sufficiently anonymised!) one for unregistered users would raise awareness of ad surveillance methods, but it's a lot of extra work, and it requires users to take a poorly informed stance on a fairly open-and-shut question (how could forwarding referrer to non-sponsor sites be of any use to the reader?). I can't imagine this could ever be as effective as even an unadvertised low-key switch to silent referrer. As for disclaimers, any smoker will tell you how much they're worth. Daß Wölf 23:39, 18 June 2017 (UTC)
Comment – I have two more questions. How will this RfC affect search engines, like Bing and Google? Also, how will it affect traffic statistics of Wikipedia articles, like Pageviews Analysis tools? --George Ho (talk) 03:50, 18 June 2017 (UTC)
- Zero effect. The WMF (without asking us what we think) only started sending referrer information to all sites we link to in February of 2016, and Pageviews Analysis, Bing and Google worked just the way they work now before the WMF made that change. They will still work just fine after we become a silent referrer again. --Guy Macon (talk) 04:11, 19 June 2017 (UTC)
Response to "Review of the change in terms of privacy concerns"
This is a response to Wikipedia:Village pump (policy)/Archive 126#Review of the change in terms of privacy concerns.
Expanding on the responses to User:Astinson (WMF) by User:Rich Farmbrough and myself on that page, I would like to add the following:
(I will sign each point to make it easy to respond to it inline.) --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- The claim was made "We did a review of the concerns you raise with both the WMF Security and Legal teams in light of the implemented change". Why have we not heard from WMF Security or WMF Legal directly? Instead we have a second-hand paraphrase by The Wikipedia Library Projects Manager. Astinson has a bias in the direction of putting the needs of the Wikipedia Library above the privacy of Wikipedia users. In particular, it is extremely implausible that WMF legal said anything other than "sending referrer information or being a silent referrer are both perfectly legal". If one of the options we are discussing had legal issues, we would have heard from WMF legal directly by now. --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- "Why have we not heard from WMF Security or WMF Legal directly" - I work on the security team. I have directly commented on this RFC. How much more direct do you want? BWolff (WMF) (talk) 10:53, 19 June 2017 (UTC)
- My apologies. You gave no indication in your comment that you work on the security team and I haven't looked at the user pages of everyone who has commented. --Guy Macon (talk) 14:05, 19 June 2017 (UTC)
- No worries. I work for the foundation and I can't keep track of who's in what role half the time. BWolff (WMF) (talk) 15:56, 19 June 2017 (UTC)
- The claim was made "This change does not change HTTPS in any way for users while within our sites". This is true, but not relevant. At the top of this RfC is the clear statement "What this RfC is not: This RfC is not about links to other Wikipedia pages or to other projects that are under the control of the Wikimedia foundation. It is assumed that, as far as possible/practical, other WMF sites will receive as much information as they want to receive". --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- The claim was made "We're not in any way changing the nofollow policy, so there is no net benefit for search engine optimization or other traffic generation strategies". The first part is mostly true, but the second part is demonstrably false. Nofollow only helps with traffic generation strategies that depend on search engines. Inserting a spam link on a Wikipedia page is itself a traffic generation strategy. The only issue is whether we want to give the spammer feedback on how his Wikipedia spam strategy is working. This is extremely valuable information to the spammer. Is it better to spam a high-traffic Wikipedia page and risk your spam link being quickly removed, or is it better to spam multiple low-traffic Wikipedia pages in the hope that nobody will notice? Being a silent referrer denies the spammer this information. --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- The claim was made "From a privacy evaluation perspective, organizations or individuals with enough sophistication to maliciously track referral data from URLs on Wikipedia, could also track the SSL interactions for the same exchange".
- This confuses several distinctly different eavesdropping scenarios.
- Consider the case where I, a Wikipedia user, read our page on Bomb-making instructions on the internet and then click on the link to Feinstein Amendment SP419 at Cornell university.
- Scenario 1: An eavesdropper who I will call "Eve" is monitoring my internet usage (this would require a court order to my ISP, but is certainly doable). Eve sees that I accessed Wikipedia and then accessed www.law.cornell.edu, but does not know that I accessed the SP419 page on cornell.edu and does not know that I accessed our bombmaking page. Wikipedia has 8,665 links to www.law.cornell.edu. All of this remains true no matter what we set our referrer policy to. --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- Scenario 2: "Eve" is monitoring Cornell university's internet usage. Eve only sees that I accessed www.law.cornell.edu, but does not know what page I accessed and knows nothing about me accessing Wikipedia. All of this remains true no matter what we set our referrer policy to. --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- Scenario 3: "Eve" gets a court order giving them access to Cornell university's server logs or simply hacks the server to get the logs. Note that this has the advantage for Eve of revealing my previous visits to cornell.edu, not just the ones that happened after Eve started listening in. Eve now knows what page on cornell.edu I accessed (www.law.cornell.edu/uscode/text/18/842).
- Now it gets interesting. If we are a silent referrer, the logs at cornell.edu will not contain any information about me accessing Wikipedia. If we send domain-only referrer information, the logs at cornell.edu will say that I came to www.law.cornell.edu/uscode/text/18/842 from wikipedia.org, and it turns out that the only link to www.law.cornell.edu/uscode/text/18/842 on Wikipedia is from our Bomb-making instructions on the internet page. It is, however, linked to by many other websites. So, by sending referrer information, we just turned Eve knowing that I accessed the text of the Feinstein Amendment SP419 -- a perfectly innocent act in itself -- into Eve knowing that I accessed it while reading the Wikipedia Bomb-making instructions on the internet page. --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
- Reading through the example articles you've highlighted, I find
- Falun Gong: 126 external links about the topic, 9 external links to "innocuous" other material
- Bomb-making instructions on the internet: Roughly 15 external links about violence or bombmaking; 16 external links to other material
- Clandestine chemistry: 4 external links about clandestine chemistry; 19 external links to other material.
- It's useful to silence referrals on all pages like these (connected to thought crimes or useful for establishing criminal intent). But silencing referrals doesn't help the people who clicked on the relevant external links, just the irrelevant ones. They may still be jailed for clicks that, but for Wikipedia, they never would have made.--Carwil (talk) 14:24, 20 June 2017 (UTC)
- The claim was made "Malicious link spammers could discover which pages editors are coming from anyway by creating unique URLs that signal their source." This is a classic case of the Nirvana fallacy. Burglars could get into my house and steal my possessions by picking a lock or smashing a window, but that doesn't imply that I should store my valuables on my front lawn with a big "please steal me" sign on them. After we stop giving the spammers the referrer information they need to get better at spamming us, I fully intend to spearhead a multi-faceted effort to get rid of any such unique URLs on Wikipedia. Plus, it is a demonstrable fact that most spammers are stupid, and if we only interfere with the 99% of spammers who lack the sophistication to create a unique URL, we will have done a good thing. --Guy Macon (talk) 07:58, 17 June 2017 (UTC)
Another way that spammers benefit from referrer information
Key quote: "It appears that the compromised sites are examining the referrer and redirecting visitors..."
While the exact method that worked with Google would not work here, a trivial modification of it would:
- Create a legitimate-looking site with legitimate content. Make it appear to be exactly the sort of reliable source that Wikipedia is looking for.
- Wait until some Wikipedia editor uses it as a citation, using the handy referrer information we provide to know when that happens.
- Stay legit-looking until the initial traffic (as seen by looking at the referrer information) dies down. This is so that the original editor and any new page patrollers will see a legit-looking page when they check the link.
- Switch it over to the spam site.
- Profit!
--Guy Macon (talk) 05:21, 19 June 2017 (UTC)
- Why would you need referrer for that attack? It makes more sense for the attacker to switch it for all users. BWolff (WMF) (talk) 11:11, 19 June 2017 (UTC)
- The referrer tells the attacker when someone used his real-looking fake source in a citation on Wikipedia. He will have already attempted to get as many other sites to link to it as possible -- that would be an essential part of making it easy for the Wikipedia editor to find -- but the referrer tells him when he starts getting hits from Wikipedia instead of those other sites.
- Also, the referrer makes it a lot easier for him to determine when Wikipedia editors and new page patrollers have finished with the page in question and that new hits are from Wikipedia readers.
- Nothing is perfect, and a clever spammer can find other ways to properly time his legit-to-spammy switchover to avoid detection, but sending the referrer makes it a lot easier on him. --Guy Macon (talk) 14:15, 19 June 2017 (UTC)
General technical comments
Please comment on the technical aspects of this proposal, not on what our policy should be. Misplaced comments may be moved to the proper section by any user.
Comment: Allowing the domain in the referrer (en.wikipedia.org) does not protect the Wikipedia reader from the target site figuring out what Wikipedia page they were on when they clicked the link. Most sites only have a few links, and we provide a linksearch tool that tells anyone who asks exactly where those links are. Only being a silent referrer protects against this. --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
- You convinced me with that. Indeed, even just the wikipedia.org is too much -- because if a malicious bureaucratic entity wants to punish people for reading a Wikipedia page, they'll put their honeypot on that one page only. Indeed, they may go a step further and have no other link to it on the web, in which case omitting referer is no protection -- except, they don't absolutely know that nobody took the text and copied it out of context to a forum, or copied just the link, and I would hope some quick post facto muddying of the waters by supporters of the political prisoners (i.e. by putting those links where they would appeal to government supporters, say, in a context where it is hard to get a firm date for when they first appeared) could leave the prosecutors very uncertain whether they have nabbed the people they want. Wnt (talk) 23:42, 11 June 2017 (UTC)
- Excellent analysis. If a silent referrer policy is implemented, we should work together to create a Wikipedia help page explaining this and giving everyone detailed instructions on how to obfuscate any such honeypot efforts as efficiently as possible. And of course we at Wikipedia should do whatever we can to protect our readers, even if we know that our countermeasures are not perfect. Being a silent referrer is part of that, but we should be thinking hard about what more we can do. I just put that on my calendar to revisit after this RfC closes. The comment by PaleoNeonate about removing UTMs that they posted at the bottom of the "query strings and similar tactics" section is a good example of something else we can do. --Guy Macon (talk) 00:08, 12 June 2017 (UTC)
Comment: Twitter has figured out a way to hack the default HTTPS behavior; see Hacking HTTPS -> HTTP referrers --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
Comment Given that it's the browser that decides to send the referrer information, I think it would be more fruitful to lobby your favourite browser vendor to include a mechanism to disable sending this info. For example, Firefox has the network.http.sendRefererHeader configuration setting and various other settings starting with network.http.referer that allow you to control if and how the referrer information is sent (see [19] for more information). Firefox and Chrome both seem to have addins that allow you to control when the referrer data is sent. Perhaps the browser you use has equivalent functionality available. isaacl (talk) 03:48, 2 June 2017 (UTC)
- Already done. As far as I can tell, every modern browser obeys the Meta Referrer tag, so Wikipedia can decide not to send referrer information for all users without them having to make a special configuration change. The key point here is that the Wikimedia foundation decided to send information to every website that we link to containing information that allows the site to figure out what Wikipedia page the user was reading, and they did it without asking us if that is what we want done. Shouldn't that be our decision as a community to make?
- My post was regarding the need for a policy, so not sure why it was moved to this section. I think the user should control the decision on what the browser should send, as this can be applied generally to all web sites. Thus I'm not convinced of a need in a change in policy. Of course, any editor or group of editors are free to put forth a proposal for changing the meta tags sent, as is being done here. isaacl (talk) 14:02, 2 June 2017 (UTC)
- I think it is obvious that not every user (alas!) is going to download Firefox, nor change these settings. Not all users are ever going to know this. Our question is what are our responsibilities. I mean, it's like making nitroglycerin. Every kid in an intro college chem class should know not to make nitroglycerin during the lab exercise. But if one of the eager students happens to do it anyway? Then people go to the college and say why didn't you stop this? It's our responsibility to try to prevent dangers to our readers -- though this is subordinate to our defining purpose to collect and disseminate knowledge, even when it is knowledge proscribed by some regime. Wnt (talk) 23:49, 11 June 2017 (UTC)
- Let me put it another way: I don't see a need for the level of consensus required for a policy to be agreed upon, since users can ultimately decide for themselves what gets sent, so a change to the server configuration does not inhibit their ability to send the information they prefer. But given that the primary issue for the WMF appears to be the desire to provide referrer data for partners, then certainly the more support collected to support a change to the meta headers, the better. isaacl (talk) 01:18, 12 June 2017 (UTC)
- Maybe there is no need for a policy-level RfC, but "users can change it" is certainly not an argument to that effect. If the WMF decided that all logged-in editors who did not tick a box in the preferences get some javascript that makes their computer mine Bitcoin for Jimbo's wallet, I would certainly expect a lot of opposition even if it is opt-out. TigraanClick here to contact me 16:15, 12 June 2017 (UTC)
- To put it a third way: it's unusual for a website to enact a policy to enforce what the user's browser sends, since it can't control what happens on the user's side. But the distinction doesn't really matter for the purposes of this discussion. isaacl (talk) 00:36, 13 June 2017 (UTC)
- It's not unusual (see Twitter example above) and control is possible in this case via <meta> tags and other tricks. Of course, the user's browser can choose to ignore all that stuff, but in most cases it doesn't, and most users are too ignorant to change it or even care about it. We can't expect everyone to be an expert on everything, and especially so in an area dominated by advertising empires (who are responsible for some of the top browsers of today: Chrome, Edge, IE...) Daß Wölf 01:26, 13 June 2017 (UTC)
- What Wölf said. We don't want to "control" the browser behavior (and we have no way to do that), but we can encourage it to behave in a certain way. If the meta tag is going to be ignored by most browsers, or if it should in general not be used (for a reason yet to be specified), one wonders why it was introduced, really. TigraanClick here to contact me 09:35, 13 June 2017 (UTC)
- I'm not saying it is unusual for a website to send meta headers or to forward links through a redirection mechanism; I'm saying it is unusual to establish a policy (in the Wikipedia sense of the policies that can be enacted by community consensus) for its users which cannot be enforced. We aren't going to sanction users, for example, for what their browsers send. However I am in agreement with all of the discussion above that the community can certainly develop a view on what the web servers should be sending, regardless of what this view is labelled. isaacl (talk) 12:47, 13 June 2017 (UTC)
Comment: Is a phabricator task needed to suppress an attempt to send information to an external website? Option 5 seems to have unanimous support right now. --George Ho (talk) 03:27, 12 June 2017 (UTC)
- A phabricator task would be a much later step. One can't just post a phabricator task about a major policy change and expect it to be implemented. If this RfC ends up the way it is going so far, we are going to have to get the WMF to agree to a no-referrer policy. After that the phabricator task is just paperwork.
- Please don't jump the gun and attempt to make this happen. That will just muddy the water. After the RfC closes I will create a page where we can coordinate our efforts, choose who to make the request and where to make it, carefully edit exactly what we will be asking for, etc. --Guy Macon (talk) 06:07, 12 June 2017 (UTC)
- Okay. I'll await the upcoming project page then. :) --George Ho (talk) 06:17, 12 June 2017 (UTC)
Comment about Question 6: If I understand this question, this suggests that click statistics would be aggregated and processed by the foundation servers, then public reports issued, as an alternative to allowing HTTP-Referer. If so, this implies that at the implementation-level, every reference or external link would have to go through a redirector URL. The foundation servers currently have no way to know this information (as no redirector is currently being used). They only know which foundation resources and pages/articles we visit. —PaleoNeonate – 05:18, 23 June 2017 (UTC)
HTTPS
Comment: The fact that Wikipedia uses HTTPS and whenever possible uses HTTPS links makes some of the listed options impossible using current technology (the technology may change and this restriction may change in the future). In particular, when you click on a link from Wikipedia (which is HTTPS) to an HTTP site, the default "no referrer when downgrading from HTTPS to HTTP" policy protects you against someone monitoring your connection (someone intercepting your WiFi, your ISP, the police...).
The default HTTPS behavior means that the eavesdropper only knows that you visited en.wikipedia.org, not what pages you went to. Without the default HTTPS behavior, when you click on a HTTP link and leave Wikipedia, the Wikipedia page you were on would be sent in an unencrypted HTTP packet to the new site, and the eavesdropper would see it. (This appears to be incorrect per [20]. See section 3.2 for details)
The default HTTPS behavior means that an eavesdropper monitoring you when you click on an HTTP link sees nothing in the header that shows what page you were on when you clicked the link. If you click on an HTTPS link the eavesdropper sees nothing – it is all encrypted -- but the website owner sees the full URL of the Wikipedia page you were on when you clicked the link. --Guy Macon (talk) 15:00, 1 June 2017 (UTC), Modified 18:22, 11 June 2017 (UTC)
- But also remember: it's pretty easy to work out which IP addresses you connect to in the first place. So even when you have HTTPS with all the referrer protections you could think of, computer models can be built to follow the path of connections you make (and how often). Unless many websites share IP addresses between each other or you are using TOR, you can be tracked / data mined by ISPs or the government. (Actually even with TOR, I seem to remember that this tracking was applied in the past by governments using a bug in TOR. It was speculated that this type of profile building was even used to identify TOR users by matching two similar profiles.) I'm mentioning this because, while some of the referrer information can be used to identify you, very similar OTHER information will remain even if you hide some more of this referral information. And that is an important element of this consideration. —TheDJ (talk • contribs) 16:01, 1 June 2017 (UTC)
- We are talking about different threats. Consider the example I gave in the overview: someone reading the Wikipedia page at Bomb-making instructions on the internet#References clicks on the link to The Low Cost Cruise Missile: A looming threat?, but this time assuming that both sites are HTTPS and wikipedia does its best to be a silent referrer. What does an eavesdropper know? For convenience, I will call the user "Alice", the person who controls the external site "Bob", and the eavesdropper "Eve".
- If Eve is monitoring Alice specifically, you are correct in that Eve knows which IP addresses Alice connects with, and thus knows that she accessed Wikipedia.
- If Eve is monitoring Bob, She knows that Alice visited Bob's site, but she knows nothing about Alice visiting Wikipedia, and neither does Bob.
- Now assume the same scenario, still using HTTPS, but with Wikipedia sending domain-only referrer information. Now Bob (but not Eve) knows that Alice came to his site from the English Wikipedia, and a quick check with the link search tool we provide will show that the only two articles on Wikipedia with that link on them are Bomb-making instructions on the internet#References and Bruce Simpson (blogger)#DIY Cruise Missile. If Bob gets a court order or his servers get hacked, other people know what Wikipedia page Alice visited.
- It is my contention that the users of Wikipedia should have a say about whether some site in New Zealand is sent information about what Wikipedia pages they read. --Guy Macon (talk) 19:14, 1 June 2017 (UTC)
- So your concern is the leaking of referral information from, for instance, the access logs of the hacked server (Bob's) and then reverse engineering that information back towards the behavior of Alice... I guess that's a possibility... Although again, I would argue that domain information is already leaked and that, as you point out, page-level information can be reverse engineered from that with the link search tool. I would like to point people to these extensions btw: Referrer Control for Chrome or Firefox. — Preceding unsigned comment added by TheDJ (talk • contribs) 20:56, 1 June 2017 (UTC)
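The inference discussed in this thread (a domain-only referrer plus a link search narrows the visit down to a handful of articles) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `link_index` is a hypothetical stand-in for the results of the link search tool, and the URLs and article names are placeholders, not real data.

```python
from urllib.parse import urlsplit

def candidate_pages(link_index, landing_url, referrer):
    """Return the Wikipedia pages the site operator could suspect, or None.

    link_index  -- hypothetical dict: external URL -> set of article titles
                   that link to it (what a link search would reveal)
    landing_url -- the page on the operator's own site that was requested
    referrer    -- the Referer value received, or None for a silent referrer
    """
    if referrer is None:
        # Silent referrer: nothing in the request ties the visit to Wikipedia.
        return None
    host = urlsplit(referrer).hostname or ""
    if not host.endswith("wikipedia.org"):
        return None
    # Domain-only referrer: the operator knows "from Wikipedia" and can look
    # up which articles carry a link to the page that was requested.
    return sorted(link_index.get(landing_url, set()))

# Hypothetical data: one external page linked from two articles.
index = {"https://example.org/report": {"Article A", "Article B"}}
print(candidate_pages(index, "https://example.org/report", "https://en.wikipedia.org/"))
print(candidate_pages(index, "https://example.org/report", None))
```

With the domain-only referrer the sketch returns the two candidate articles; with no referrer it returns nothing, which is the distinction the comments above are arguing about.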
Meta Referrer Tag
Comment: Including a Meta Referrer Tag in the <head> section of a Wikipedia page allows us to specify what information is sent when people follow links to any other site, whether it uses HTTPS or HTTP. This means that we can specify referrer behavior that HTTPS normally would not allow. --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
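To make the effect of the different tag values concrete, here is a minimal Python sketch of what a browser that honors the tag would send. It is a simplified model, not a full implementation of the specification: the function name `referrer_for` is made up for illustration, origins are compared by scheme and host:port only (default ports are not normalized), and "downgrade" is taken to mean simply HTTPS to non-HTTPS.

```python
from urllib.parse import urlsplit

def referrer_for(policy, source_url, target_url):
    """Return the Referer value sent for a click, or None if none is sent."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme != "https"
    full = src._replace(fragment="").geturl()   # full URL, fragment removed
    origin = f"{src.scheme}://{src.netloc}/"    # scheme and host only

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    if policy == "same-origin":
        return full if same_origin else None
    if policy == "origin":
        return origin
    if policy == "strict-origin":
        return None if downgrade else origin
    if policy == "origin-when-cross-origin":
        return full if same_origin else origin
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin
    if policy == "unsafe-url":
        return full
    raise ValueError(f"unknown referrer policy: {policy}")

page = "https://en.wikipedia.org/wiki/Example#References"
for policy in ("origin-when-cross-origin", "strict-origin-when-cross-origin", "no-referrer"):
    print(policy, "->", referrer_for(policy, page, "http://example.org/x"))
```

In this model, `origin-when-cross-origin` still sends the Wikipedia origin to a plain HTTP site, `strict-origin-when-cross-origin` sends nothing on that downgrade, and `no-referrer` corresponds to the silent-referrer option.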
Rel=nofollow on links
Comment: Adding rel="nofollow" to external links instructs search engines to ignore the link when ranking the link's target in the search engine's index. It makes adding spam links to Wikipedia less effective, but otherwise does not have any effect on the referrer information. --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
- No: using "nofollow" also conflicts with other standards, for example rel="me". And it punishes non-spam sites as much as spammy ones. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:40, 2 June 2017 (UTC)
- Rel=nofollow doesn't "punish" anyone, it just refrains from assisting them. We don't owe such assistance to anybody and we should not provide it. There were two RFC's back in the day that both concluded in favor of nofollow, reportedly because the discussions were manipulated by SEO's. Eventually nofollow got activated when there was some kind of SEO contest that worsened our linkspam problem to ridiculous levels (even before that contest, it was much worse than it is now). Activating nofollow was of tremendous benefit and while I acknowledge being avant-garde about this topic, I think we should activate noindex as well (that says: block the entire encyclopedia from external search engines forever) and enjoy the screams of SEO's as they tear their own hair out all over the world. 173.228.123.121 (talk) 07:16, 12 June 2017 (UTC)
- "we should not provide it": that's a bold statement -- or should I say "opinion". If we find a source useful enough to cite it, or add it to an 'External links' section, we should allow it the crumb of benefit that accrues from us doing so; that's simple courtesy. As for past RfCs, we're not beyond seeing the error of our ways, and WP:CONSENSUSCANCHANGE applies. The "tremendous benefit" you claim sadly threw that baby out with the spam bathwater. To suggest that we "block the entire encyclopedia from external search engines forever" beggars belief. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:12, 12 June 2017 (UTC)
Query strings and similar tactics
Comment: It is possible for a spammer who inserts a link to create a unique URL. This would let them identify the exact page that the incoming link was on no matter what we do with the referrer. There are multiple ways that this can be done, including:
- www.example.org?ID=myevilpointer
- myevilpointer.example.org/
- www.example.org:3717/
- www.example.org/reasonable_page/my_evil_pointer/
Some of these could be fixed by a robot looking for such links. A simple robot could mass-remove all query strings in external links, but some links do not function without them. A more sophisticated robot could remove them only in cases where the page is identical with and without the query string, but so far Wikipedia has not shown itself to be technically advanced enough to create such a robot. --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
- This does not seem simple at all to me really... The false positives would likely be gigantic. —TheDJ (talk • contribs) 16:09, 1 June 2017 (UTC)
- Sorry for being unclear. What I was trying to convey is that the problem is complex and that a simple robot most likely would not be able to solve it. It would be a fair amount of work, but I believe that I could create a program that solves the problem with a minimum of false positives (the key to doing that is doing a byte-by-byte comparison and only removing the suspected evil pointer in cases where the page is byte-for-byte identical after the change to the link), but I wouldn't want to try using anything other than C/C++. Some of the languages listed at #Programming languages and libraries seem to be poorly suited for such a task, but to be fair some of the others have a good reputation -- I am simply not familiar enough with them to know whether they could handle such a task. --Guy Macon (talk) 20:30, 1 June 2017 (UTC)
- Even if a robot could solve it, do you really trust that the pages in question wouldn't be able to detect it, and serve the robot different text? —Cryptic 21:40, 1 June 2017 (UTC)
- Anything having to do with spammers is an arms race. Every measure generates a countermeasure, which in turn generates a counter-countermeasure. Along the way you lose the dumber and less skilled spammers, but nobody has found the magic bullet that stops all spam forever. --Guy Macon (talk) 01:23, 2 June 2017 (UTC)
- I'm not sure if there's already some robots checking for this, but by routine among citation cleanup tasks if I see unnecessary tracking information in urls I remove it (other than what is needed to reference to a page or relevant query of course). I remember reading that it was discouraged, but I now fail to find it quickly via Wikipedia:Citing sources#Citation style. It may be a good idea to make this more obvious if that is considered good practice to avoid. —PaleoNeonate – 06:50, 3 June 2017 (UTC)
- I made a note on my calendar to get back to this after this RfC closes. We should definitely have an easy-to-understand help page explaining how spammers create unique URLs and how to tell them from legitimate uses of things like query strings. Thanks for suggesting it! --Guy Macon (talk) 23:33, 10 June 2017 (UTC)
- Also a side note: we do remove UTMs which are related to the above, Primefac has also successfully obtained permission for a PrimeBot task to remove those, for instance (and thanks for working on this, Primefac). —PaleoNeonate – 02:04, 11 June 2017 (UTC)
- I clean such links up by hand when I come across them. I'd support some narrowly targeted and vetted bot operations to do it on a wider basis. 173.228.123.121 (talk) 07:18, 12 June 2017 (UTC)
- Generally this is a productive area to look at. I don't expect a silver bullet but I do expect a few shots to connect. I encountered the UTM parameters thing in action, which I'd never heard of previously, and was pleased to see that someone had taken a potshot back at the dismaler science. Though this is definitely not the most difficult instance.
- Where unique domain names are concerned, I think it might be worth doing some screening for "hapax legomenon" links. Most Wikipedia references come in multiples, so if we have something go through and find links that occur from only one page, odds are they are typos and they don't work anyway and a bot should check to see if they're 404 and tell the person who posted them. And if they do work, they may be spam or unreliable sources, and maybe they could go in a queue for volunteers to look through, though obviously that last part is easier said than done. And among all those links, any "sting" or spam-testing links might be spotted by editors looking through. (Possibly, having editors look through on a volunteer basis also provides an alternate theory to why someone looked at the link other than that they were reading a proscribed page, but I doubt that would help them, since censorship and presumption of innocence rarely turn up in the same proceeding) Wnt (talk) 00:48, 19 June 2017 (UTC)
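The parameter-stripping idea discussed in this section (remove the suspected tracking parameter, but only when the page is byte-for-byte identical without it) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of PrimeBot or any existing bot: the function names are made up, only `utm_`-prefixed parameters are treated as suspect, and `fetch` is a hypothetical callable returning the page body as bytes, injected so the sketch runs without network access.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_utm(url):
    """Drop utm_* query parameters and keep everything else in order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if not key.lower().startswith("utm_")]
    # Note: urlencode may re-encode the remaining parameters (e.g. spaces).
    return urlunsplit(parts._replace(query=urlencode(kept)))

def safe_cleanup(url, fetch):
    """Return the cleaned URL only if the page body is unchanged by it.

    fetch -- hypothetical callable: URL -> bytes of the page body.
    """
    cleaned = strip_utm(url)
    if cleaned == url:
        return url
    # Byte-for-byte comparison; any difference keeps the original link.
    return cleaned if fetch(cleaned) == fetch(url) else url

# Offline demonstration with canned page bodies instead of real requests.
bodies = {
    "https://example.org/a?utm_source=w": b"same",
    "https://example.org/a": b"same",
    "https://example.org/b?utm_source=w": b"tracked",
    "https://example.org/b": b"plain",
}
print(safe_cleanup("https://example.org/a?utm_source=w", bodies.__getitem__))
print(safe_cleanup("https://example.org/b?utm_source=w", bodies.__getitem__))
```

As noted earlier in the thread, a site could detect the robot and serve it different text, and pages with dynamic content would never compare as identical, so a check like this reduces false positives rather than eliminating the problem.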
Server Name Indication
Comment: Server Name Indication may be able to leak some limited information about a Wikipedia session to an eavesdropper who is monitoring the SSL handshakes. I believe that an effective countermeasure to this would be becoming our own certifying authority. Because wildcarding is allowed, "*.wikipedia.org" will work with all Wikipedias, "*.wiktionary.org" will work with Wiktionaries, etc. --Guy Macon (talk) 15:00, 1 June 2017 (UTC)
- This seems to have little to do with the referrer discussion, but also, I think you are mistaken. As far as I remember, SNI is a client-side driven part of the negotiation and unless you modify your browser and it's TLS libraries, you cannot plug this leak of information. If this has you concerned, you should probably be using Tor. —TheDJ (talk • contribs) 15:48, 1 June 2017 (UTC)
- "SSL handshakes (part of the HTTPS encryption) *already* include our domain and the IP address of our servers (through Server Name Indication), so any passive observer of the HTTPS traffic could uncover this data with minimal effort." Source: Astinson (WMF),[21] Posted 23:09, 4 April 2016 (UTC) at Wikipedia:Village pump (policy), archived at Wikipedia:Village pump (policy)/Archive 126
- "SNI could be simply disabled by becoming our own certifying authority, and re-issuing certs as a standard part of rolling out new projects. This would be an improvement in security, but is not part of the issue I raised. (Note: Since wild-carding is allowed "*.wikipedia.org" covers all present and future wikipedias, "*.wiktionary.org" all Wiktionaries and so forth, so we would only need a new cert once in a blue moon anyway.)" Source: Rich Farmbrough[22] Posted 00:33, 5 April 2016 (UTC) at Wikipedia:Village pump (policy), archived at Wikipedia:Village pump (policy)/Archive 126
- BTW, I already use TOR. See my user rights list[23] and you will see that I am IP block exempt, allowing me to edit through TOR. --Guy Macon (talk) 16:21, 1 June 2017 (UTC)
- I think Rich misinterpreted that. While yes, that strategy avoids using SNI on the server side, a browser supporting SNI will send you the SNI extension with the hostname information regardless. At least that is my understanding from the protocol. —TheDJ (talk • contribs) 20:35, 1 June 2017 (UTC)
- Yes, SSL/TLS is designed to be proxy-able (with a proxy-generated alternate certificate signed by an authority trusted by the browser, often added when installing antivirus or other software). Because of this, some information is visible in the handshake; this is still a different issue from passing information to the destination site when following a link from another (about the origin of that link). —PaleoNeonate – 06:58, 3 June 2017 (UTC)
So, bottom line, does becoming our own certifying authority help user privacy or not? Is it desirable/undesirable for other reasons? --Guy Macon (talk) 00:15, 12 June 2017 (UTC)
- I'd imagine it's not desirable for a pretty big reason: our root certificate wouldn't be in any of the browsers, meaning that nobody could access our sites without the big glaring "THIS SITE IS INSECURE" warnings. Yes yes, people could import our certificates but that's too much of a burden for our readers (nor can they all do this--think people at work, embedded browsers like smart TVs, etc...). FACE WITH TEARS OF JOY [u+1F602] 04:10, 12 June 2017 (UTC)
- Basically you don't just "become a CA" - you can just "self-sign" - but then you have all the problems 1F602 mentioned. — xaosflux Talk 04:34, 12 June 2017 (UTC)
- No, we are not talking about self-signing. We are talking about becoming a certificate authority.
- Becoming a X.509 Certificate Authority
- Normally when you configure a server to use TLS or SSL you have two choices; Either you pay someone like Verisign or Thawte to sign a certificate or you generate a self-signed certificate. However there is an alternative, which is to generate your own certificate authority or CA. … Which route you choose depends on your circumstances and why you need a certificate. For a large public service like an e-commerce website, you’ll want a certificate signed by an established trusted root CA, who, like Verisign, have their root keys bundled with web browsers and operating systems. This allows anyone to trust your server is the server it claims to be and traffic is encrypted, without having to install any additional certificate. The downside to this is the cost of getting a certificate. At the time of writing, Verisign were charging $2,480.00 USD for a 3 year 128bit certificate. Source:[24]
- We can cover *.wikipedia.org with one CA, and *.wikimedia.org with another. We may choose to not do this for wiktionary.org, wikisource.org, wikidata.org, etc. --Guy Macon (talk) 05:55, 12 June 2017 (UTC)
- Yeah real handy. Or if you get it signed, then you are just another intermediate CA, just with less expenses on issuing new certificates. And again, it doesn't change anything regarding the leak of the hostname information, as certificates are exchanged and checked AFTER the client has initiated the SSL/TLS handshake which contains the SNI host information.[25] —TheDJ (talk • contribs) 22:49, 12 June 2017 (UTC)
- Motion to close this subsection: Is there any reason to discuss this here? AFAICT this has zero relevance to the referer policy. Beware of WP:TRAINWRECK. Tigraan (Click here to contact me) 09:42, 13 June 2017 (UTC)
- Are you sure that Rich Farmbrough is wrong? See Wikipedia:Village pump (policy)/Archive 126#Review of the change in terms of privacy concerns. --Guy Macon (talk) 12:24, 13 June 2017 (UTC)
- I am inclined to agree that this is a little off the track of silent referer. As far as the technology is concerned, it is true that, if we provide a solution that renders SNI un-needed, that does not mean the client software won't use it. The solutions we could provide include subjectAltName if we were self-certifying (or prepared to spend enough) – either by chain-of-trust, which I haven't kept up on, or by becoming a certifying authority, or by other means – or by having a mapping of domains to IP numbers. In any event this would allow readers to choose a stack that deprecates SNI, and regain some privacy (depending on the solution) while still accessing our projects: doing nothing denies them this possibility. All the best: Rich Farmbrough, 13:01, 13 June 2017 (UTC).
- Just found this description which seems to be rather accessible explanation. —TheDJ (talk • contribs) 14:35, 13 June 2017 (UTC)
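Illustration (not part of the original comments): the point made above, that the client sends the hostname in its first handshake message before any certificate is exchanged, can be observed without touching the network. This is a minimal sketch using Python's standard ssl module with in-memory buffers; the helper name client_hello_bytes is hypothetical, and the behaviour shown assumes a TLS stack that does not encrypt the ClientHello.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the first bytes a TLS client would put on the wire for hostname.

    No connection is made: the handshake is started against in-memory buffers
    and stops as soon as the client is waiting for the server's reply.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()   # bytes from the (absent) server
    outgoing = ssl.MemoryBIO()   # bytes the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and now needs
        # server data that will never arrive in this sketch.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("en.wikipedia.org")
    # The server name travels as plain ASCII inside the SNI extension.
    print(b"en.wikipedia.org" in hello)
```

The server's choice of certificate or certificate authority plays no part in this step, which is why changing who signs the certificate does not by itself remove the hostname from the handshake.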
This section has several misunderstandings about how TLS/web-pki works. Keeping in mind that this has really very little to do with referrer policy, I would like to point out we already use wildcard certs. If you click the little lock icon in your browser, and go to advanced information about the certificate, you will see under "Certificate Subject Alt Name"
Not Critical
DNS Name: *.wikipedia.org
DNS Name: *.m.mediawiki.org
DNS Name: *.m.wikibooks.org
DNS Name: *.m.wikidata.org
DNS Name: *.m.wikimedia.org
DNS Name: *.m.wikimediafoundation.org
DNS Name: *.m.wikinews.org
DNS Name: *.m.wikipedia.org
DNS Name: *.m.wikiquote.org
DNS Name: *.m.wikisource.org
DNS Name: *.m.wikiversity.org
DNS Name: *.m.wikivoyage.org
DNS Name: *.m.wiktionary.org
DNS Name: *.mediawiki.org
DNS Name: *.planet.wikimedia.org
DNS Name: *.wikibooks.org
DNS Name: *.wikidata.org
DNS Name: *.wikimedia.org
DNS Name: *.wikimediafoundation.org
DNS Name: *.wikinews.org
DNS Name: *.wikiquote.org
DNS Name: *.wikisource.org
DNS Name: *.wikiversity.org
DNS Name: *.wikivoyage.org
DNS Name: *.wiktionary.org
DNS Name: *.wmfusercontent.org
DNS Name: *.zero.wikipedia.org
DNS Name: mediawiki.org
DNS Name: w.wiki
DNS Name: wikibooks.org
DNS Name: wikidata.org
DNS Name: wikimedia.org
DNS Name: wikimediafoundation.org
DNS Name: wikinews.org
DNS Name: wikiquote.org
DNS Name: wikisource.org
DNS Name: wikiversity.org
DNS Name: wikivoyage.org
DNS Name: wiktionary.org
DNS Name: wmfusercontent.org
DNS Name: wikipedia.org
Thus we already use wildcard certs, and SNI is not technically needed for most Wikimedia sites (at the moment; this could of course change in the future). It should be noted that some subdomains do not use that main cert (e.g. lists.wikimedia.org is on a different server and uses a different cert). However, this is a mostly moot point since you cannot disable SNI in your browser. Also, disabling SNI really does not gain you any privacy because DNS queries go in the clear, and the destination IP address goes in the clear. If you're sending a request to one of 208.80.154.224, 91.198.174.192, 198.35.26.96, 208.80.153.224, 2620:0:861:ed1a::1, 2620:0:862:ed1a::1, 2620:0:863:ed1a::1, 2620:0:860:ed1a::1 then it's obvious you are visiting a Wikimedia website. BWolff (WMF) (talk) 22:16, 13 June 2017 (UTC)
- That is good news. Does *.wiktionary.org not cover *.m.wiktionary.org though? All the best: Rich Farmbrough, 11:07, 17 June 2017 (UTC).
- @Rich Farmbrough: You might be interested in this summary to a similar question on stackexchange. —TheDJ (talk • contribs) 09:20, 19 June 2017 (UTC)
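Illustration (not part of the original comments): on the question of whether *.wiktionary.org covers *.m.wiktionary.org, the usual certificate-matching rule is that a wildcard stands for exactly one leftmost label, which is why the mobile names are listed separately in the certificate. The sketch below is a simplified model of that rule; the function name wildcard_matches is hypothetical, and it deliberately ignores partial-label wildcards and internationalised names.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified check of a certificate DNS name against a hostname.

    Assumption: a '*' may only appear as the whole leftmost label and
    matches exactly one label, never zero and never several.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


if __name__ == "__main__":
    for name in ("en.wiktionary.org", "en.m.wiktionary.org", "wiktionary.org"):
        print(name, wildcard_matches("*.wiktionary.org", name))
```

Under this model en.m.wiktionary.org needs the separate *.m.wiktionary.org entry, and the bare wiktionary.org needs its own entry as well.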
The Alert Reader will have noticed that at the very top of this RfC I wrote "This RfC is not a discussion of the technical details regarding what is and is not possible using current technology, which may change. It is an RfC about policy, not implementation" and that every RfC question contains the words "As far as possible/practical".
Server Name Indication is a good example of why I wrote those qualifiers. The above discussion is all about how SNI behaves now, but how SNI behaves is likely to change in the near future. See [26].
As I write this, the overwhelming consensus of the Wikipedia community is to be a silent referrer with exceptions made for certain selected sites. The most common counter-argument has been "but they can still get information about our users using another method". This is a classic case of the Nirvana fallacy; burglars could get into my house and steal my possessions by picking a lock or smashing a window, but that doesn't imply that I should store my valuables on my front lawn with a big "please steal me" sign on them.
As we can see from the above IETF document, there are some very smart people working very hard to make it so that these other methods of finding out what pages our readers access no longer work. We just need to do our part, controlling what we can. In other words, once we make the policy decision "As far as possible/practical, referrer information should contain no information (silent referrer)", we should expect the Wikipedia developers to keep up with things like changes in how SNI behaves and respond appropriately to implement the policy we have chosen. --Guy Macon (talk) 07:37, 18 June 2017 (UTC)
- Really? You're citing a random ietf presentation from three years ago that contains a vague hand-wavy proposal to encrypt SNI as evidence ietf is planning to encrypt SNI? That's not how the world works. I don't follow the TLS working group but rumours I've heard are that there's no agreement on how best to make SNI secret, it's certainly not going to happen in TLS 1.3, and it's up in the air if it will ever happen. Now I say this as unsubstantiated rumours, so you shouldn't believe me – but at the same time, you should also not believe that a random hand-wavy presentation from three years ago has any bearing on what the future may hold. BWolff (WMF) (talk) 10:46, 19 June 2017 (UTC)
Provide total monthly referral numbers in aggregate to major sources
(Comments moved from policy discussion section to technical discussion section) --Guy Macon (talk) 22:40, 23 June 2017 (UTC)
@JYtdog, Doc James, and TonyBallioni: I need to check, but I actually don't think we collect this data on our end at all, and it would require significant engineering overhead to do so -- and might create even more privacy concerns than the existing policy (we would have to collect and monitor even more specific data about these interactions). As our security team has explained to me, and I describe in our comment below: the referrer policy regulates an action taken at the individual's browser which an individual can control if they have privacy concerns. Moreover I am concerned because: this kind of metric is a non-standard way of providing this information, and would require significant work on the part of both volunteer communities and institutions to maintain -- and because it's non-standard, would require institutions to understand our already opaque community processes. Astinson (WMF) (talk) 15:19, 23 June 2017 (UTC)
- User:Astinson (WMF) why would it be hard to, every time someone clicks on a who.int link, to add +1 to a who.int counter? No one would be collecting IP information so can you explain what privacy concern this would create? Doc James (talk · contribs · email) 22:28, 23 June 2017 (UTC)
- @Doc James: I confirmed with one of our engineers: we don't currently collect any information about whether or not a user in a session clicks on an external link. This means that our servers have no evidence of the person going offsite at all (apparently researchers at the Foundation only infer leaving the site when their session goes silent). We would have to develop a strategy for collecting that information which includes: a) implementing a script on a click to an external URL that logs the information on one of our servers, b) anonymizing that data, c) developing a new data-retention strategy that deletes those logs after a reasonable amount of time. At all of those points, you retain more information than we currently have on our servers that could be vulnerable to attacks or external information requests -- and do so with a relatively large and non-standard way of tracking. Moreover, this additional risk on our end doesn't significantly prevent the other kinds of vulnerabilities that I describe in our statement above (nor would a change in policy to complete referrer silence). Again I am not an engineer, so am interpreting how it was explained to me. Astinson (WMF) (talk) 22:55, 23 June 2017 (UTC)
- Okay thanks User:Astinson (WMF). This discussion is about sending nothing to other websites. This proposal may take a bit of work but IMO would provide better protection for our readers. The excess data could be deleted seconds later. Doc James (talk · contribs · email) 23:00, 23 June 2017 (UTC)
- So, such a proposal would require collecting the data in some form or another. The most obvious way of doing so would be to either have an interstitial page that redirects to the link target, or use some sort of javascript implementation that pings the server on click. I believe google does something along these lines. Either way, that means actively collecting information on what pages our readers are viewing. In many ways this seems more of a privacy violation as such an implementation is unlikely to respect user choices in the same way as referrer does, since users can disable the referrer in their browser. This is in contrast with this proposal, which requires us to actively obtain the data, instead of passively relying on what browsers do by default. It also requires more trust in us, as the user now relies on us properly anonymizing things (We are hopefully a trustworthy bunch, but at the end of the day, the best privacy system is any system which minimizes the trust required in other parties). This also requires that our users trust us a lot more than in the referral case, since such data, if not properly anonymized, contains much more information about user habits than the plain referrer data does if it was matched up with other request data to our site. [This is just a rough comment about the potential downsides. If we become serious about this idea, it would need a more careful analysis than this one-off comment. Furthermore, this comment is just my opinion and should not be taken as anything official] BWolff (WMF) (talk) 10:23, 27 June 2017 (UTC)
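Illustration (not part of the original comments): the "+1 to a who.int counter" idea from this thread could look roughly like the sketch below, assuming some mechanism (an interstitial page or a click ping, as discussed above) already delivers the target URL to the server. The class name DomainClickCounter and its methods are hypothetical; this is not a description of anything the WMF runs. It keeps only a per-host tally and discards the URL, and it does not address the concern that the click must first be reported to the server at all.

```python
from collections import Counter
from urllib.parse import urlsplit


class DomainClickCounter:
    """Aggregate tally of outbound clicks per target host (hypothetical sketch).

    Only the lower-cased hostname of the target is kept; the path, the
    source page, the time, and the reader's address are never stored here.
    """

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, target_url: str) -> None:
        host = urlsplit(target_url).hostname
        if host:
            self._counts[host] += 1

    def total(self, domain: str) -> int:
        """Clicks to domain itself plus any of its subdomains."""
        domain = domain.lower()
        return sum(
            count
            for host, count in self._counts.items()
            if host == domain or host.endswith("." + domain)
        )


if __name__ == "__main__":
    counter = DomainClickCounter()
    counter.record("http://www.who.int/topics/en/")
    counter.record("https://apps.who.int/iris/")
    counter.record("https://example.org/")
    print(counter.total("who.int"))
```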
Not our culture
Comment: I think that Wikimedia has to be on guard against the syndrome of falling into line with other computer-intensive sites. (For example, comparing itself to Google and Facebook) Remember: they all exist to spy on their customers and make money by underhanded means. The WMF projects are something noble, something far removed from the use cases that characterize the rest of the web. For example, the WMF often cannot use their software, of course, because of licensing/copyright issues. It should also follow that the WMF sometimes has to transgress their spoken or unspoken cultural norms -- such as that everyone is joined together in a project to spy on and exploit the user. Because they are not always up front about their purpose, this may require some vigilance. Wnt (talk) 13:38, 11 June 2017 (UTC)
- Comment This whole discussion just makes me utterly sad. So much FUD, such an isolationist view. When did we get this self-obsessed? —TheDJ (talk • contribs) 22:35, 12 June 2017 (UTC)
- It makes me sad to see that you are sad about our good-faith effort to balance user privacy against giving external websites information that they want to see. Might I humbly suggest not reading things that make you sad? If, by some chance, you are strapped to a chair with your eyelids tied open in front of a monitor showing a Wikipedia:Village pump (policy) feed with The Wikipedia Song blasting in the background, then let me address this message to your captors: First of all, keep up the good work. Secondly, please take away his keyboard. :) --Guy Macon (talk) 01:56, 13 June 2017 (UTC)
- The cure to Fear, Uncertainty and Doubt is to not have things to worry about, have a simple certainty to your life, and know that there is agreement on this. In other words, don't give worrisome pseudo-private information, don't wonder how it is (not) being used, and have an RFC to say so. Wnt (talk) 00:00, 15 June 2017 (UTC)
Comment. I've made a good faith effort to read the discussion, but it's too long to go through in detail, so I won't be !voting either way as I can't assert in good faith that I am doing so in full understanding of the issues. I am concerned by the comments about the value to Wikipedia Library relationships of the referrals, and suspect I could be persuaded to oppose if those concerns are real. As it stands I'm afraid it feels like this RfC was drafted by someone with a hoped-for outcome in mind -- sorry, Guy, I'm sure you wrote it in good faith and didn't intend to have your thumb on the scales, but it does feel that way. I would prefer to see an RfC in which the benefits (if any) of the current scheme are laid out by someone who is a proponent, and the RfC puts more time into making the case for both sides. Mike Christie (talk – contribs – library) 22:32, 13 June 2017 (UTC)
Comment: If some people have concerns about the outcome of the RfC, why not "Question 7: Do nothing"? --George Ho (talk) 00:03, 14 June 2017 (UTC); partially struck (see below). 11:10, 16 June 2017 (UTC)
Define "do nothing". If you mean "do what we are doing now", that is the same as Question #3. Or do you mean "go back to the default HTTPS behavior"? --Guy Macon (talk) 03:56, 16 June 2017 (UTC)
- Oh.... I overlooked the "status quo" notes mentioned in Questions #1 and #3. Oh well, I guess I should strike out the Question #7 suggestion then. --George Ho (talk) 11:10, 16 June 2017 (UTC)
Comment: This has previously been discussed on Meta at [27]. English Wikipedia users @Nemo bis: @Pundit: @Piotrus: @Denny: and @Halfak (WMF): may want to comment here.--Carwil (talk) 19:29, 16 June 2017 (UTC)
Comment. I am of several minds about this.
- The first, is that the goal of getting other organizations (including for-profit publishers) to partner with us, is a good one. I understand the argument that people in those organizations need evidence to convince their bosses that working with the movement is of value to the organizations.
- The second, refspam is a problem. I remove several instances of refspam every day. (by refspam I mean addition of a valid citation to a scientific paper, that appears to have been added for the sole purpose of adding the ref – to promote it) See for example every contribution by this person who is apparently here only to SELFCITE: Special:Contributions/Orthomd. We also get "academic spam" on the level of academic labs spamming their websites all over, departments and colleges, and also journal publishers, book publishers, and of course companies trying to sell products. We get the whole range of spam, from subtle refspam to crass.
- The third, is that like all good goals, the execution matters a lot. GLAM projects and partnership with publishers for access are right at the edge of COI; a GLAM project can contribute valuable references and content, or it can fill pages with promotional content and spam links all over WP. When that happens, more harm is done than good. So much depends on how well the individuals who are doing the work (and their bosses) understand the mission of WP and how GLAM fits into the mission. (Small examples – one of the papers cited above is the Pitt GLAM experience (paper). The GLAM person there got access to a bunch of papers via Pitt's library, and was including "Access provided by the University of Pittsburgh" in the citation, every time they used one -- which only applied to their access to the document, of course; it wasn't freely available. This is not OK; just a misguided individual and only for a while, but this is the kind of COI thing that can happen.)
- With all that in mind, the "currency" of eyeballs is very loaded, right? On the one hand, incoming hits from Wikipedia is a useful metric that partners can use to sell collaboration internally. On the other, it is a direct incentive for GLAM participants to just spam ELs and add refspam to drive that metric higher. I don't know if the WMF has made any effort to measure the latter or is aware of any such efforts. Alex you cited a bunch of papers on the positives in your OP; are you aware of any paper that looked at negative consequences of GLAM and publisher partnerships, like spamming etc?
- I guess I would favor a clear no-referral policy, but with exceptions for organizations that are actively partnering with us and only while they are partnering with us, if this is technically feasible, and only at the domain level. I think. Jytdog (talk) 18:23, 22 June 2017 (UTC)
Does sending referrer information *really* help WMF?
Throughout this discussion, we've assumed that letting everyone know that they get traffic from Wikipedia is going to be to Wikipedia's or WMF's benefit. But is it? For example, articles like Google News and neighboring rights make it clear that publishers are not overjoyed to find out that people are reading a news aggregator and clicking through to their articles. To the contrary, they've tried and sometimes succeeded in making oppressive laws against it, as in Spain. What Wikipedia does in its front page WP:ITN column is essentially news aggregation, though because everything is rewritten it might avoid some of the anti news aggregation laws. And there are a lot of stupid AFDs for blindingly-obvious-Keep news items, and obstructionism at WP:ITN, which at least could be accounted for if publishers had someone in their corner trying to do the good work of sabotaging our site in order to keep us from being a competitor. Under these circumstances I would tend to wonder if parallel situations with museums and such are really so clear-cut as people here seem to assume. Maybe not all of them will respond to wikipedia.org referral links with pleasure – maybe some will respond by trying to obfuscate and lock up their content.
Therefore, I'd ask the experts who talk about the "knowledge ecosystem" just how sure they are that they're really benefitting at all, even if privacy issues were completely ignored. Wnt (talk) 14:43, 8 July 2017 (UTC)
Non-Wikipedia wikis
Other people use the same software that we do to create their own Wikis. We should give them the option of sending referrer information or of being a silent referrer and let them decide for themselves. --Guy Macon (talk) 01:38, 21 July 2017 (UTC)
Other
Alternative information sources
- This RfC is about keeping referrer information available to encourage partner institutions to invest time, money, and other resources into Wikimedia projects. There are many other options for providing comparable information that is not controversial in the way that referrer information is. It was only in 2016 as a result of the Community Wishlist survey that the WMF funded the development of tools to provide pageview traffic reports of Wikipedia articles. This is a basic count comparable to any other audience metric tool which any other site would provide, and it has only recently become available to the Wikipedia community. This is a conversation just started and it has not reached its maturity either in the WMF or the Wikimedia community. Non-invasive reports like this could be developed in dozens of directions which would impress partners much more than referral information. An example of something which we do not currently have is any tool for reporting traffic in a field, like for example, "Show me the audience traffic to all Wikipedia articles about pharmaceutical drugs." We also are unable to generate reports of editor engagement or article development, like "Count how many times all the Wikipedia articles on drugs cite academic journals" or "How often are articles in this category updated with content changes of adding or removing at least 20 words." Editors, readers, and researchers are starved for information as it is. If we are seeking to provide more data to partners then let's go in directions which are noncontroversial instead of beginning the discussion with a compromise of values. After we do what is easy and noncontroversial then let's have the more complicated discussions and decide which of our values to bend or abandon. Blue Rasberry (talk) 19:16, 23 June 2017 (UTC)
When and how to best close the whole discussion?
Joint closure was requested at "Wikipedia:Administrators' noticeboard#Closers needed for a very sensitive RfC." I wonder whether this discussion should be relisted. I saw Question 6 (previously Question 7) added two days ago. Newer votes went for either "silent referrer" option or other options. The whole majority at this time is favoring "silent referrer". Relisting is suggested, yet one said it is unnecessary. I want to relist this discussion, but I would like opinions first about when to close it. We have two editors volunteering the joint closure and might await one more closer. --George Ho (talk) 01:38, 25 June 2017 (UTC)
- I am fine with closing it at the usual 30 days but have a slight preference for extending it to 60 days to make sure that nobody will be able to say "I opposed being a silent referrer, but you didn't give me enough time". I would note that over half of the signatures at meta: Letter to Wikimedia Foundation: Superprotect and Media Viewer came in after 30 days had passed -- and it took nearly three years to get a reply from the WMF[28]. There is no deadline, and if we get the traditional Wikimedia Foundation response to suggestions from the Wikipedia community ("don't say yes, don't say no, don't say why, hope they will go away if you ignore them") there will be no action on this for many months or even years. I could be wrong, of course. There have been some personnel changes at the WMF, and perhaps they are now more open to suggestions from the Wikipedia community, but I am skeptical. --Guy Macon (talk) 12:50, 25 June 2017 (UTC)
- Godric, I plan to relist this discussion on either the 29th or the 30th of June, so I will give the discussion a 30-day extension. Just to let you know as you told me to. --George Ho (talk) 18:45, 27 June 2017 (UTC)
Relisting comment: As promised, I have relisted the discussion for 30 more days. Recently, I have seen a few more questions created, including one that was created a few days before the end of the initial 30-day span. Also, some raised concerns about what the majority supports, the "silent referrer", and recently, more votes on other options were added. As of now, the majority still supports "silent referrer", but the discussion is given another 30 days for more participants and more discussion. I would add another 30 days if it's desirable, i.e. totaling to 90 days, but it is not yet considered or desired. Therefore, let's see what happens then. --George Ho (talk) 01:37, 30 June 2017 (UTC); modified, 01:38, 30 June 2017 (UTC)
- Good call. Although I have been a strong advocate for the silent referrer option, I am a lot more interested in having every position be represented by the best arguments possible than I am interested in "winning". I am especially pleased that the late additions will now get more than a couple of days' discussion. --Guy Macon (talk) 04:44, 30 June 2017 (UTC)
To rephrase or be clearer, if the WMF decides to disregard the consensus of this discussion, then the consensus of another discussion involving the same topic should decide whether the WMF's decision would violate the WMF policies, not rules of English Wikipedia, which are somewhat separate from other domains (point #4 of CONEXCEPT). I predict that holding another discussion to have one option implemented may mean it takes a while for the WMF to abide by the consensus of this RfC discussion, but I could be wrong. There are possible outcomes:
- If the decision results in a "consensus" but not a "strong consensus" for one option, then what will happen to requesting the WMF to implement the policy? If that doesn't happen, then another discussion about referrer info can be made. I don't know how long it will be before the issue can be raised again, but I see strong desire to have that option done soon. However, a less organized discussion would make the issue be discussed for a while. I would recommend discussing (a) sending referrer info while using an IP address and (b) sending referrer info while using a registered account. Can both subtopics be discussed in one or two separate discussions?
- If the decision results in a "strong consensus" for one option, but the WMF decides to disregard the consensus...
- The issue of this discussion can be ignored. Unfortunately, due to the majority of the discussion, ignoring the issue would be impossible. (intentionally struck out something already ruled out)
- A discussion regarding whether the WMF would be violating its own policies can be made.
- If the consensus does not agree that WMF's disregard would violate WMF policies, then the issue about referrer info would be raised again and again until success comes. Again, this would take a while.
- If the consensus does agree in writing that such disregard would violate WMF policies, how would this affect the relationship between the community and the WMF? How would this affect WMF staff?
- If a discussion regarding WMF staff violating WMF policies is ruled out, another discussion about referrer info can be made. As said in outcome #1.0, I would recommend discussing how (a) an anonymous user using IP address and (b) a user with a registered account, can send referrer info.
- I still feel that requesting a "joint closure" by 4 admins is completely unnecessary, and that the claim that the WMF will ignore the consensus is an accusation of bad faith. Whatever the consensus is, is a property of the statements of the many editors here. It's generally sufficient to have one admin assess the consensus unless there is a specific disagreement regarding the consensus (not simply a disagreement regarding the topic as a whole). Adding more admins shouldn't change the consensus in any way, it merely sounds impressive. Power~enwiki (talk) 16:43, 26 July 2017 (UTC)
Sending referrer info while using IP address
Guy Macon, can referrer info be sent from an IP address without using an account? I think this is one of the important issues about referrer info. --George Ho (talk) 15:47, 19 July 2017 (UTC)
- @George Ho: This is unrelated to accounts, it's an HTTP-level bug/feature. Other sites don't get information on your account, but they get your IP address (they know this because you connect to them, not because of the HTTP header), what they get via the HTTP header is information like the site and page you are coming from, if you clicked on a link there (as opposed to directly typing/pasting the destination URL in your address bar). —PaleoNeonate – 16:09, 19 July 2017 (UTC)
- Adding: If you click on a link from your own user page (or someone else's), then they could know which user page this was, however. —PaleoNeonate – 16:11, 19 July 2017 (UTC)
- Recently, PaleoNeonate, I studied the basics of HTTP referrer and the list of HTTP header fields. Thanks for informing me about revealing an IP address to an external website. Still, I'm uncertain whether it's okay for a registered user to have referrer info undisclosed to an external website, while it's not for an anonymous user using IP address (exposed to external website already) to have referrer info undisclosed. --George Ho (talk) 16:44, 19 July 2017 (UTC)
- I am not quite sure what "is it ok" means in this context (desirable? possible?), but referrer information (silent or revealing) is exactly the same for registered users who click on an external link and IP users who click on an external link. There is no way for us to treat those two situations differently without somehow detecting that a user has clicked on an external link, which would require a difficult and complex effort (basically making it so that when you click on an external link you go to another Wikipedia page that logs who you are and then clicks on the desired link for you). Right now we don't send your username or IP address to the external site (the World Wide Web was never designed to do that) but of course the external site knows your IP address (your browser, not Wikipedia, sends the IP address -- otherwise the external site wouldn't know where to send the page). The question is, does the user also get a special code instructing their browser to tell the external site that they were on Wikipedia when they clicked the link? If we are a silent referrer, they get no such information. The fact that the WMF unilaterally decided to send out that information about us without consulting the community is the real problem. --Guy Macon (talk) 06:55, 20 July 2017 (UTC)
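To make the point above concrete, here is a minimal Python sketch of which `Referer` value a browser would send under a few of the policies mentioned in this RfC. It is a simplified model of the standard referrer policy values, not Wikipedia's or any browser's actual implementation; the function name `referer_for` and the example URLs are assumptions made for illustration. Note that the result depends only on the linking page, the destination and the policy; nothing about the reader's account or IP address enters into it.

```python
from urllib.parse import urlsplit, urlunsplit


def origin_of(url):
    """Return only the scheme and host of a URL, e.g. 'https://en.wikipedia.org/'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def strip_fragment(url):
    """Return the full URL without its '#fragment' part."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def referer_for(policy, source_url, dest_url):
    """Hypothetical helper: the Referer value sent for a click, or None if silent."""
    src, dst = urlsplit(source_url), urlsplit(dest_url)
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme == "http"

    if policy == "no-referrer":  # the "silent referrer" option
        return None
    if policy == "unsafe-url":  # always the full page URL
        return strip_fragment(source_url)
    if policy == "origin":  # always just the site
        return origin_of(source_url)
    if policy == "origin-when-cross-origin":
        return strip_fragment(source_url) if same_origin else origin_of(source_url)
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return strip_fragment(source_url)
        return None if downgrade else origin_of(source_url)
    raise ValueError(f"policy not modelled in this sketch: {policy}")


# Placeholder URLs, chosen only for illustration.
PAGE = "https://en.wikipedia.org/wiki/Example#References"
HTTP_LINK = "http://example.org/article"
HTTPS_LINK = "https://example.org/article"

for policy in ("no-referrer", "origin-when-cross-origin",
               "strict-origin-when-cross-origin", "unsafe-url"):
    print(policy, "->", referer_for(policy, PAGE, HTTP_LINK),
          "|", referer_for(policy, PAGE, HTTPS_LINK))
```

Under these assumptions, `origin-when-cross-origin` tells both the HTTP and the HTTPS site that the click came from en.wikipedia.org, while `strict-origin-when-cross-origin` stays silent for the plain HTTP link and `no-referrer` stays silent for both.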
Meta RfC
As a procedural comment, given that this RfC is of general interest to all Wikimedians and not just to the English Wikipedia, since what is proposed here is to modify how the WMF manages its traffic for us all independently of the language version, I feel that this should have been posted at m:RfC and announcements made at least on the larger projects for general awareness. Moving this entire discussion over there or asking people to re-comment in a new RfC would be overkill, but I still feel that other editors who do not happen to be active on this Wikipedia should have had the opportunity to know about this and share their thoughts. When/If this RfC closes, maybe we should hold a similar one at Meta. Thanks, -- MarcoAurelio (talk) 14:00, 19 July 2017 (UTC)
- You make a good point. I considered doing that, but this RfC is for referrer policy on the English Wikipedia only, and that is what my request (see below) will contain. It is my intention, once the RfC closes, to seek out dual-language editors to ask the same question on the other language Wikipedias. The English Wikipedia deciding this for them without asking them would be just as bad as the WMF deciding for us without asking our permission was. I will certainly consider posting an RfC on meta as part of that effort; please repeat your suggestion at User talk:Guy Macon/Reforms/Referrer so we can discuss it further and decide what to do. --Guy Macon (talk) 07:00, 20 July 2017 (UTC)
The Next Step
As per the statement at the top of this RfC ("If there is a strong consensus for a particular referrer policy, a request will be made to the Wikimedia foundation to implement that policy.") I am preparing that request at User:Guy Macon/Wikimedia referrer policy.
Anyone else is free to make any request to the WMF that they choose, but the above link is the place to discuss my request. --Guy Macon (talk) 13:08, 1 August 2017 (UTC)