Talk:Usage share of operating systems/Archive 3

From Wikipedia, the free encyclopedia

Marketshare Servers based on websites is totally misleading

The current server share language is complete nonsense. It somehow asserts that web servers are a good indicator of server share. That is utter nonsense. The vast majority of servers are not web servers.

There have been edits made to temper the uncited claims in this section, but they have been reverted.

It's clear that this section is merely a +POV apology for certain products. It's clear that wikipedia is being used as a promotional tool for certain products. —Preceding unsigned comment added by 173.206.8.177 (talk) 18:59, 11 February 2011 (UTC)

There are two issues here:
1. The explanations in the Server section are indeed unreferenced, and this invites people to add even more unreferenced material. I am OK with completely removing the text, but the past experience shows that someone will add it again anyway. The better approach is to find reliable and verifiable references for that portion (something I wasn't able to easily find myself).
2. Methodology of measuring market share. We as encyclopedia should not pass judgement on whether some methodologies are "complete and utter nonsense" or not. Some people claim that measuring revenue is nonsense, some claim that measuring web servers is nonsense. Everybody is free to make their opinions - but we as encyclopedia just report on what our sources say - Gartner, IDC, Netcraft etc.
Wikiolap (talk) 20:12, 11 February 2011 (UTC)
Re: #2; "We as encyclopedia should not pass judgement on whether some methodologies are "complete and utter nonsense" or not." I agree. But in this context, presenting only publicly available webservers -- in a conversation about Server OS Marketshare **is**.
The discussion re: web server marketshare is irrelevant here, in this context. All the uncited material from the above paragraph and the data in the "method units (web)" table should be removed. It is intentionally misleading, and utter nonsense, to compare oranges to bananas in this way.
173.206.8.177 (talk) 20:29, 11 February 2011 (UTC)
Could you tell us exactly why you want to remove the web units data? 1exec1 (talk) 22:56, 11 February 2011 (UTC)
We have had, at various times, three different methods of counting servers in this section: unit sales/revenue/web servers. All three have strengths and weaknesses - there is no clear-cut right and wrong here. We should just report all three, perhaps in three separate tables, pointing out the strengths and weaknesses of the methods too.--Harumphy (talk) 00:00, 12 February 2011 (UTC)
Reporting all three in separate tables sounds like a good first step. Based on the source, however, the Gartner figures are units, not revenue, so that's wrong to start with (I corrected the mistake, but someone reverted it with no explanation). Reporting server market share in both units and revenue would be the best idea.
Website share should be split out into a separate category, since it's a completely different issue from server OS market share. The text is also horrible, and should probably be deleted. I made some minor improvements to make it less POV, but those were reverted too.
If you oppose splitting the website figures into a separate category, can you point to an authoritative source that claims Netcraft's website survey has anything at all to do with server market share?
Overall, it's obvious someone is abusing the article to promote a particular POV. I'm not really interested enough to bother with it, but maybe someone with more time on their hands can correct this. If not, I suppose it'll be another case where the reputation of WikiPedia is damaged by a zealot pushing a particular POV and reverting any corrections or attempts to make the text NPOV. Shalineth (talk) 11:32, 12 February 2011 (UTC)
Web servers are a subset of servers-in-general, so I think the best thing to do would be to do units and revenue in two tables, then add a sub-heading "Web servers" with the third table in that new sub-section.--Harumphy (talk) 13:45, 12 February 2011 (UTC)
Yes, certainly, but web sites are not the same as web servers. I imagine it takes an enormous number of servers to run www.facebook.com, for example, whereas even a small server could run hundreds of very simple sites. The fact that web servers are a subset of total servers is a minor problem. The bigger problem is that there is no one to one correspondence between web sites and web servers, much less between web sites and either servers generally or server OS installations. This means that, barring authoritative evidence to the contrary, web site numbers cannot be considered valid estimators of even web server OS market share, much less overall server OS market share. Shalineth (talk) 14:50, 12 February 2011 (UTC)
3. The definition of "server" is broad. Web server market share might be estimated via the web while giving numbers for File server market share (down to NAS) via web is a challenge. --95.117.233.197 (talk) 13:59, 12 February 2011 (UTC)
4. The conjecture that IDC or Gartner figures substantially underestimate Linux or open source servers is logically unsound. As documented here, IDC unit figures for server shipments include Windows, Linux, Unix and other. Servers sold with no operating system would thus fall into the other category. However such servers make up only about 0.3% of the total (for Q1 2010). This implies two things:
1. The Windows and Unix market shares, 75.3% and 3.6% respectively in Q1 2010, are minimum market share levels, and do not overstate market shares for shipped servers.
2. The Linux market share is not substantially understated. Even if Linux is installed on every single server that didn't ship with either Windows or Unix, its Q1 2010 market share would only increase from 20.8% to 21.1%.
In light of the above, I suggest that the unsupported conjecture that IDC numbers understate open source server operating systems be deleted from the article, unless authoritative evidence to the contrary is provided. Shalineth (talk) 14:50, 12 February 2011 (UTC)
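The arithmetic behind point 4 can be checked with a minimal sketch. It uses only the Q1 2010 IDC unit shares quoted above; the dictionary layout and the `upper_bound` helper are illustrative names, not anything taken from IDC.

```python
# IDC Q1 2010 server unit shares as quoted in the comment above (percent).
shares = {"Windows": 75.3, "Linux": 20.8, "Unix": 3.6, "Other": 0.3}


def upper_bound(shares, name, unattributed="Other"):
    """Largest possible share for `name` if every unattributed unit ran it."""
    return round(shares[name] + shares[unattributed], 1)


# The four categories should account for all shipped units.
total = round(sum(shares.values()), 1)

# Worst case for the "understated Linux" conjecture: all "Other" units run Linux.
linux_max = upper_bound(shares, "Linux")

print(f"total = {total}%, Linux upper bound = {linux_max}%")
```

Under these figures the Linux share can move by at most the size of the "Other" category, which is the basis of the argument above.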

Suggestions for correcting server market share section

I propose the following corrections to the server market share section:

  1. Remove unsupported text claiming that IDC/Gartner figures understate open source OS share.
  2. Remove irrelevant web site share figures for possible inclusion in a separate section on website OS shares.
  3. Correct labelling of Gartner unit figures, which are currently mislabelled as revenue figures.
  4. Replace methodologically incorrect IDC server hardware revenue figures with methodologically correct server unit figures.

I probably shan't have time to check the page before next weekend. Comments appreciated. Shalineth (talk) 16:35, 12 February 2011 (UTC)

1 - I support removing all unreferenced claims.
2 - Measuring market share of web sites is a valid method that at least three different sources use (Netcraft, securityspace, w3tech) - we should not remove legitimate, reliable and verifiable sources. We already have some text which tries to clarify the difference between methodologies. Maybe this text could be improved, but it should not have unreferenced claims either (see #1).
3 - Gartner reports revenue. The source is reliable and verifiable, but not public - the report itself costs money. I had access to it a couple of years ago; I will try to get access again and verify that it is indeed revenue.
4 - IDC reported market share by revenue, and that is a perfectly valid methodology (IDC is a reliable and verifiable source). I used to have an additional line in the table for IDC numbers by unit, but it was removed by other editors. I will be happy to add it back.
Wikiolap (talk) 00:46, 13 February 2011 (UTC)
Remove unreferenced claims. Report web site share in a new section, separate from the server section. Report both units and revenue in separate tables with correct labelling, even if there's only one cited source.--Harumphy (talk) 11:08, 13 February 2011 (UTC)
If we separate the website and server-share reports, we had better not include the website share at all. I suggest reordering the current table so that sources reporting website share are grouped together. We can also introduce one more column that says which method was used to acquire the statistics. Also see my answer below. 1exec1 (talk) 15:20, 13 February 2011 (UTC)
I disagree. The article already has a separate section for 'Web clients', which is distinct from the sections for 'Desktop and laptop computers', 'Netbooks' and 'Mobile devices'. The consistent approach for servers would be to have a section for 'Web sites', which is distinct from 'Servers'. Shalineth (talk) 21:09, 21 February 2011 (UTC)
2. Measuring market share of web sites is a valid way of measuring web site share. This is an article about server OS usage. Is there an authoritative source claiming that measuring web site share is a valid way of measuring either web server share or web server OS share? If not, I suggest it belongs in its own section (or perhaps own article) -- an article about web site market share, as opposed to (web) server OS market share. Again, I must stress, these are not synonymous. It is a severe methodological error to assume they are. Shalineth (talk) 12:19, 13 February 2011 (UTC)
3,4. Gartner and IDC report both revenue and units, although not all reports contain both measures. Revenue is a valid measure for market share, which can be defined in terms of either revenue or units. This article is about usage share, which implies units. Second, the revenue figure is for servers, not server OSes. That would be fine in an article about server hardware market share, but this is an article about server OS usage share. Again, the figures are absolutely valid, but they're being used incorrectly in this article. Shalineth (talk) 12:19, 13 February 2011 (UTC)
@Shalineth: Website share is a proxy for the actual server OS usage share in the same way as inspecting user agent strings is a proxy for desktop OS market share. If you consider it inappropriate, then sources reporting server market/unit share are not appropriate reference points either, as they report current sales, not the share of already deployed servers.
In conclusion, all sources used in the article are biased in one way or another. Since we are only presenting and commenting on the data, not interpreting it, all sources must have the same credibility, unless there is a strong reason not to do so. 1exec1 (talk) 15:20, 13 February 2011 (UTC)
This is true. Sales by hardware units and sales by hardware revenue are also proxies for OS usage share. None of the three methods correlates directly with OS usage share, but all three are of interest nevertheless. We should just report what the sources say, accompanied by a concise summary of the strengths and weaknesses of each method. It is for the reader to decide how much credence to give to each method, not us. --Harumphy (talk) 09:57, 14 February 2011 (UTC)
@ 1exec1
It isn't quite the same thing, since there's usually a 1:1 mapping of web clients to client OSes. For web servers, a single server can run a huge number of websites, and at the other extreme, some websites require large server farms. All this means that the approximation is much closer on the client side. In any case, I think it's perfectly reasonable to include web site OS share, as long as it's properly labelled as 'Web site OS usage' and not conflated by original research with 'server OS usage'.
The same applies to 'server OS unit shipments' and 'server hardware revenue'. It's fine to include them both, as long as it's made very clear what they are, and 'server hardware revenue' isn't mislabelled as 'server OS revenue' or 'server OS usage'. What actually brought this article to my attention in the first place was confused comments by Linux advocates who thought 'revenue' in this article referred to software vendor revenue, not to server hardware revenue, and were going on about how most users don't pay for Linux so revenue figures are invalid, etc. The section on server OSes is very unclear about these things, and looks like a clear case of misrepresentation of data (not necessarily intentional -- though the unreferenced comments suggest it is). The data are valid, but are being misused. Shalineth (talk) 21:09, 21 February 2011 (UTC)
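The point made in this thread, that one server can host many websites while one website can need many servers, can be illustrated with a toy calculation. All numbers and the names "OS A" and "OS B" below are invented for illustration; they are not taken from Netcraft, IDC or any other source.

```python
def share(counts, name):
    """Percentage share of `name` within a dictionary of counts."""
    return 100.0 * counts[name] / sum(counts.values())


# Hypothetical scenario (invented numbers):
# OS A runs one very large site on a farm of 1000 servers,
# OS B runs 500 small sites on a single shared server.
sites = {"OS A": 1, "OS B": 500}
servers = {"OS A": 1000, "OS B": 1}

for os_name in sites:
    print(
        f"{os_name}: {share(sites, os_name):.1f}% of sites, "
        f"{share(servers, os_name):.1f}% of servers"
    )
```

In this contrived case the site-based share and the server-based share point in opposite directions, which is why the two measures need separate labels.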
This sounds like a consensus to me - we keep the valid data in the article, but relabel it to disambiguate what it actually means. I would support this effort. Wikiolap (talk) 23:51, 21 February 2011 (UTC)
It sounds like consensus to me too.--Harumphy (talk) 13:34, 27 February 2011 (UTC)

Time limit for out-of-date sources

There was some discussion earlier in Talk:Usage_share_of_operating_systems#Should_we_remove_AT_Internet_Institute_from_web_client_stats.3F. I think it's fair to say there's a consensus that we should apply the same time limit, whatever that limit is, to all the sources. At the moment it's 12 months. Someone suggested we should reduce it to 6 months. (If we did that then ATII would get removed on 1st April if they haven't updated by then, because they last updated on 31/9/2010.) So, should we cut the time limit to 6 months?--Harumphy (talk) 13:34, 27 February 2011 (UTC)
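The cut-off rule under discussion can be sketched as a small date check. This is only an illustration: the helper names are made up, and because "31/9/2010" is not a valid calendar date the sketch assumes the last ATII update was 30 September 2010.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`, clamping the day."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_stale(last_update, today, limit_months):
    """True once more than `limit_months` months have passed since the update."""
    return today > add_months(last_update, limit_months)


# Assumption: "31/9/2010" in the comment above is read as 30 September 2010.
atii_last_update = date(2010, 9, 30)
check_day = date(2011, 4, 1)

print("6-month limit:", is_stale(atii_last_update, check_day, 6))
print("12-month limit:", is_stale(atii_last_update, check_day, 12))
```

With the assumed date, the source is past a 6-month limit on 1 April 2011 but still within the current 12-month limit.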

I think yes. The previous discussion was stopped by the fact that Wikipedia doesn't update either. As that problem has since been solved, I know of no reason to keep a single old source that skews the data. 1exec1 (talk) 17:11, 27 February 2011 (UTC)
6 months seems reasonable to me. Jdm64 (talk) 02:21, 28 February 2011 (UTC)
FYI ATII has just updated. They must have heard us!--Harumphy (talk) 16:02, 1 March 2011 (UTC)
Yes, and I'm extremely disappointed with them. As you can see in the more detailed PDF, they treat Android as the "Google Operating System", as if it were not Linux, which gives inaccurate data for this table... 89.181.106.123 (talk) 00:29, 2 March 2011 (UTC)

Mobile Devices Citation

Caption on image currently reads "Share of 2010 Q4 smartphone sales to end users by operating system, according to Gartner", followed by a citation.
The numbers in the pie chart are not contained within the cited article. The cited article was written on 19 May 2010, and reports on 2010 Q1 numbers.
Caption should be revised to cite an article containing the numbers used on the pie chart, or the pie chart should be changed to reflect the numbers in the cited article. Mismatches are bad, mmmkay?
64.113.8.130 (talk) 22:55, 4 April 2011 (UTC)

Linux table headings

For clarity and consistency between sections, I suggest we change the top-level heading in both the web client and mobile device tables from "Linux" and "Linux based" respectively to "Linux kernel based", and change the second-level heading in the web client table from "mainstream" to "Linux". --Harumphy (talk) 19:34, 11 May 2011 (UTC)

I don't think that's the best solution. For one, "Linux kernel based" is a long title. Second, I think it would be confusing. What's the difference between "Linux" and "Linux kernel based"? I understand what you're trying to say, but would others? I think it's fine how it is, or possibly, "Linux" as the top heading (or "Linux based") and then sub-headings of "GNU/Linux" and "Android/Linux". Jdm64 (talk) 22:14, 11 May 2011 (UTC)
Linux has two meanings: (1) the Linux kernel, and (2) the family of operating systems based around it, which are largely binary compatible with each other and traditionally known as Linux distributions. Then there is Android, which uses a forked Linux kernel, is binary incompatible with Linux distributions and has a stack sitting on the kernel which is very different from anything else. The only thing that Android has in common with Linux distributions is the kernel, and that is a heavily modified, incompatible derivative. I am aiming to better reflect the two meanings, and to deal with the fact that within a couple of months or so it looks as though Android will be more mainstream than the stuff we currently call "mainstream". As far as length goes, "Linux kernel based" will fit without expanding column width. (I've tried it.) I don't think we should use GNU/Linux or Android/Linux as they really are too long, don't reflect what the sources say and do not aid understanding at all. --Harumphy (talk) 08:05, 12 May 2011 (UTC)
I am more confused by "Linux kernel based" vs "Linux based" as they may be understood as synonyms and anything Linux based is certainly Linux kernel based. We must of course use terminology that reflects what the sources are talking about, but isn't most Linux except Android indeed GNU/Linux (which is not longer than "Linux based")? If there is significant use of other Linuces (affecting the decimal points we are writing out) simply "Android" and "Other Linux" should do. --LPfi (talk) 11:51, 12 May 2011 (UTC)

[section break]
Just to be clear, I'm suggesting this:

Linux kernel based
Linux Android

The top line is an umbrella heading that accurately reflects the only thing that Linux distributions and Android have in common: some sort of Linux kernel. In the second line, Linux means what it is most commonly understood to mean - a Linux distribution. In this I'm taking the view that Android is *not* a Linux distribution in the conventional sense because it has so little in common with Debian, Ubuntu, Fedora, RHEL, SuSE etc. All of the stats sources except Wikimedia separate Linux and Android in this way. --Harumphy (talk) 12:40, 12 May 2011 (UTC)

Like LPfi said, anything Linux based is surely Linux kernel based; this is like how Linux is a Unix-like OS. Your headings look redundant, especially to somebody that doesn't know about Linux; and it doesn't make somebody want to learn what the distinction is. I think the layout below clearly shows the distinction between normal Linux and Android. "Linux based" is a link to "Linux kernel". "GNU/Linux" could be 2 separate links to GNU and Linux or one link to Linux Distribution. How is that not simple and clear? Jdm64 (talk) 20:22, 12 May 2011 (UTC)
Linux Based
GNU/Linux Android
The phrase "Linux based" is no more informative than just "Linux", because it doesn't make clear which of the two things called Linux forms the base. It could mean either just the kernel or the kernel plus the stuff that makes a Linux distribution. So, to answer your question, it's not simple and clear because it's ambiguous. Sure, the kernel's always there, even in Android, but the other stuff isn't. By excluding the word kernel, it doesn't make it clear that Android is based on only the kernel and not the other stuff. The 'umbrella' heading should reflect what the things under it have in common. They have only one thing in common: the kernel. That is why the k-word is the key to comprehension here. --Harumphy (talk) 23:38, 12 May 2011 (UTC)
Ok, fine, include kernel. But that still doesn't remove the confusion about "Linux kernel based" and "Linux". It should be "GNU/Linux" to show how Linux kernel based is different than Linux. Jdm64 (talk) 01:24, 13 May 2011 (UTC)
Fair enough. Thanks. I'll settle for that. --Harumphy (talk) 07:13, 13 May 2011 (UTC)

I believe Android should be reclassified as a mobile device. See my comments there. hhhobbit (talk) 14:24, 5 June 2011 (UTC)

Count Amazon Kindle?

Amazon Kindle was reported to likely break 8 million units sold last year. http://www.slashgear.com/amazon-likely-to-break-8-million-kindle-units-sold-this-year-21120580/

With quite a few media being sold: http://news.cnet.com/amazon-kindle-books-outselling-all-print-books/8301-17938_105-20064302-1.html Better data is likely available. Seems these are significant numbers. --89.12.7.116 (talk) 20:57, 26 May 2011 (UTC)

This page is about usage share of operating systems, not devices. The OS that the Kindle uses is Linux, so if we were to add it, it would only be a small side note that the Kindle runs Linux. I think it's more appropriate that the information be added to Linux-based devices. Jdm64 (talk) 00:16, 27 May 2011 (UTC)

I have written this about thirty times and each time started over. I would like to do that again right now. Saying Kindle is Linux is like saying Mac iOS is OS-X, or OS-X is FreeBSD. Mac OS-X uses launchd to start everything. Except for a few things that init starts, init is basically something that all other processes have as their parent if they lose their immediate parent. launchd does not work the same way. Is OS-X's launchd the same thing as init in Unix / Linux? No. The same thing is occurring with these mobile OS. One mobile OS has the distinction of being derived from nothing but being its own little entity from the start - Blackberry. All the other mobile OS are diverging so far away from what they were derived from that the code base is becoming meaningless. iOS really is that different from OS-X. But each OS is really not just the kernel. It is all of the things that go together, including the hardware, that make up that system. Unless you want to have a separate category for each of these mobile OS, I suggest you lump them all together in the category mobile OS. They have more similarities with each other than they do with what they were derived from. Apple has joined Windows in having malware that now self-installs with no password required on Macintosh OS-X, as long as the user account you are using has administrator privileges. It has the promise of continuing that way unless Apple finally wises up and begins requiring a password for software installs for all OS-X users. May I humbly suggest these malware problems are making a lot of people mobile OS only users? But you have been caught napping. Apple sold more iPhone and iPad systems in the last two quarters than they did OS-X. The malware problems with the predominant desktop systems, combined with Twitter and other things, are making many current desktop OS systems dinosaurs. So I suggest you have a separate mobile OS category with maybe a breakdown showing what each was derived from.
But the malware problems of the predominant desktop systems are rapidly making mobile OS the tour de force of the future. Would I have predicted that two short years ago? No. I was also caught napping. It is rapidly progressing toward a future where many people will be mobile OS only users, storing their data in the cloud (data storage repositories) and printing to new printers that use Bluetooth. Any general mobile OS that doesn't make provisions to share the data that was created on it with a different general mobile OS from another vendor will rapidly become a relic of the past. IMHO, your current classification scheme reflects what was there in the past, and what we have now is becoming increasingly incongruent with it. You are missing what has been happening with these mobile devices. Mobile OS are rapidly becoming the OS of the future. The fact that 8 million Kindle units have been sold indicates that things are changing. Did we have eight million new installs of desktop Linux systems last year? No. Your percentages are woefully out of date, but mostly because your categorization is wrong. Kindle is not Linux. iOS is not Macintosh / OS-X. They are now separate entities with very little similarity to what they were derived from. hhhobbit (talk) 02:57, 6 June 2011 (UTC)

The problem is I still don't know where the data would fit on this page given the current sections. I'm not saying the information is unimportant, just not suited for this page. This page is still about OSs, and the OS of the Kindle is Linux kernel based. It's just not a traditional desktop distribution. Similarly iOS is based on the Darwin OS, just like MacOSX. Jdm64 (talk) 20:41, 6 June 2011 (UTC)
To me, the more important issue (and the answer is not clear to me re Kindle), is under what circumstances should a device's OS fit into this article. In some sense, every automobile with a computer chip has an OS in it (a real-time kernel of some kind), but I doubt that fits the intent of this article. Kindle has a Linux kernel. How much else of what we think of as "an OS" does Kindle have? If it weren't for Kindle's ability to browse the web, it would be a single-purpose dedicated device, not really different (IMHO) than the smarts in an automobile - or a microwave oven for that matter. My point is, where to draw the line? Again, I don't know the answer to that. Perhaps a section for devices that can't download apps. If Kindle is included, then so should the 300 M NON-smart phones sold last quarter be included. (a variety of proprietary "OS"s) ToolmakerSteve (talk) 04:37, 21 August 2011 (UTC)

Long Term Suggestions

I was looking at the discussion, and at the article. What struck me is that the current active discussion topics seem to be discussing the different facets of the same issue, and I think that we should look at combining them. Problem is that since you sometimes link to me, I can't work on the page :) I can however make suggestions. My apologies if the formatting is a bit rough. Formatting on discussion pages drives me to distraction sometimes.

  • Technology Types

I think we can limit things to three technology types:

Personal Computers - Desktops, Notebooks, Netbooks, Laptops, Nettops, in other words any stand alone computing device which is designed to be used by a single user and which has a full-sized keyboard.
Mobile Devices - Tablets, EReaders, MP3 Players, Phones, in other words any stand alone computing device which is designed to be used by a single user and which, while it may have a keyboard, will not have a full-sized one, or will have an on-screen keyboard. Optional Bluetooth or USB keyboards do not count as they are not part of the basic device and it is designed to function without a keyboard.
Servers - in other words any computing device which is designed for multiple user use, either over a network, or through direct connection as was once common. Servers include all computing devices which are not stand alone such as Desktop Client units. Mainframes and Supercomputers are effectively specialized Servers.
  • Numbers

This gets fun. No matter what is done no one will be happy. I'd rather be too expansive here though. While it's difficult to be certain about reliability of any numbers below 5%, the fact that something shows up is of interest. Part of the problem is that everyone wants the numbers to favor them. This puts us in opposition to them, because we want the numbers to be accurate and favor no one.

Unless there is solid evidence that the numbers from a supplier are inaccurate we need to show them. If an analyst or investigator is able to come up with evidence that there is a problem we need to provide a link to it with a note that this supplier's numbers are questionable.

When we are displaying numbers, we need to make sure that the numbers are from the same time period. In the Server Usage Share we have dates of 2007, Jan. 2009, July 2009, September 2010, and Q1 2011. Dates this far apart are impossible to make a valid comparison with. We need to set a rule on age range allowed. My personal suggestion is that the widest range should be eighteen months. It might make the charts a lot smaller, but it will make them a lot more sensible.

Longest term we should probably consider splitting this into three articles, i.e. usage share for each technology type so that each type can be handled in far more detail. UrbanTerrorist (talk) 19:59, 12 August 2011 (UTC)

Web clients - remove sources

Both AT Internet and StatOwl ignore mobile clients in their reports (well, in fact AT Internet notices the existence of iOS but doesn't consider Android worth counting, nor as a Linux "variant", StatOwl just ignores them). That makes the rest of the values inflated, so comparing the numbers from these two sources with the rest isn't a fair comparison. Thus, I propose for us to just stop taking into account both these sources, until they start reporting (or taking into account in their reports) the existence of mobile web clients. 195.23.131.230 (talk) 15:58, 12 April 2011 (UTC)

AFAICS AT Internet includes Android and a number of things under 'other'. This is perfectly OK for our purposes. StatOwl is more of a problem because they just take desktop OSes with above 0.1% share and expand the numbers so they add up to 100%. This is inconsistent with the rest of our table and there's no easy way of fixing it. So I think we should keep AT but I've no objection to removing StatOwl if that's where the consensus is.--Harumphy (talk) 07:17, 13 April 2011 (UTC)
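The effect described here, taking only the listed desktop OSes and expanding them to add up to 100%, can be sketched as follows. The shares are invented for illustration and are not StatOwl's figures; `renormalise` is a hypothetical helper name.

```python
def renormalise(shares):
    """Rescale the listed shares so that they sum to 100%."""
    total = sum(shares.values())
    return {name: 100.0 * value / total for name, value in shares.items()}


# Hypothetical "true" shares in percent (invented numbers).
true_shares = {
    "Windows": 80.0,
    "Mac OS X": 8.0,
    "Linux": 2.0,
    "Mobile and other": 10.0,
}

# Drop everything that is not a listed desktop OS, then expand to 100%.
listed = {k: v for k, v in true_shares.items() if k != "Mobile and other"}
reported = renormalise(listed)

for name, value in reported.items():
    print(f"{name}: true {true_shares[name]:.1f}%, reported {value:.1f}%")
```

Every listed OS ends up with a higher figure than its true share, and without knowing the size of the dropped category the original percentages cannot be recovered from the reported ones.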
AT Internet: AT Internet puts things under "other" that we don't, which makes our "other" figures, and the figures for whatever AT Internet files there and we don't, erroneous. The only ways to be correct about the data we're dealing with are either removing AT Internet as a source, or also putting things like Android under other, as they do. So we actually have three choices: 1) being wrong (as we are now), 2) removing one source (and thus reducing the accuracy of the data we're presenting), or 3) putting Android under other, which I honestly don't like, since Android is technically Linux, so the "Linux" numbers would be "some Linux", which would cause confusion... 195.23.92.1 (talk) 16:07, 8 August 2011 (UTC)
StatOwl - I vote on removing StatOwl, since the fact that they don't have an "other" makes their data meaningful only in comparison between those OSs they have stats on. It might be interesting data, but it simply doesn't fit on what we're trying to represent in this table. 195.23.92.1 (talk) 16:07, 8 August 2011 (UTC)
I oppose removing StatOwl - they are a valid, reliable and verifiable source. We could add a note explaining their methodology if more explanation is needed, but we should not remove this source. Wikiolap (talk) 17:34, 13 April 2011 (UTC)
They are reliable and verifiable, yes, but they're not measuring the same thing we're representing in that table. They represent the share between a list of OSes, while we're representing the share between all OSes (thus the "other" column). They don't give us enough data (an other column, for instance) to even find out what's the real percentage of those OSes they're representing, so their numbers, while interesting, simply don't have enough info to fit in our table. Putting them there, as they are nowadays, just adds known-yet-unmeasurable error into the table... 195.23.92.1 (talk) 16:07, 8 August 2011 (UTC)
I agree with your comment higher up this section about the ambiguity of our 'other' column. It isn't immediately obvious that what we count under 'other' varies from source to source. Rather than eliminate a source because it doesn't fit our idea of 'other', it would be better to eliminate the 'other' column from the table. There's nothing wrong with AT as a source. StatOwl, on the other hand, is more problematic because it only covers desktop OSes with >0.1% share and then expands them to fill 100%. So I think we should keep AT, dump StatOwl and dump the 'other' column. --Harumphy (talk) 22:10, 9 August 2011 (UTC)
I concur with the idea of removing StatOwl. However, I don't think that dropping the 'other' column is a good idea unless all the rows add up to 100%. Since that column is defined as 'whatever doesn't fit in the current columns', or simply '100% - sum of the columns', it will be implied even if we dump it. So I don't see the point in doing that. The abovementioned issue of AT not using the same 'other' definition as ours can be solved by merging the problematic cells into one for now. 1exec1 (talk) 00:18, 11 August 2011 (UTC)
Sorry, I don't get that last bit. What do you mean by the "problematic cell" and what are you suggesting we merge it into? --Harumphy (talk) 10:28, 12 August 2011 (UTC)
I meant doing something like this:
Source | Date | Microsoft Windows: 7 | Vista | XP | All versions | Apple: Mac OS X | iOS | Linux kernel based: GNU/Linux | Android | Symbian / BlackBerry OS / Other (merged cell)
AT Internet [1] | Apr. 2011 | 28.8% | 16.4% | 42.1% | 88.4% | 6.9% | 2.8% | 0.9% | 0.5% | 0.5%

1exec1 (talk) 09:30, 14 August 2011 (UTC)
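
As a quick arithmetic check on the proposed row, here is a minimal Python sketch. It assumes the final 0.5% in the AT Internet row is the merged Symbian / BlackBerry OS / Other cell and that "All versions" already contains the three Windows versions listed; both are readings of the row, not something the source states separately.

```python
from decimal import Decimal

# Figures copied from the AT Internet row (Apr. 2011) above.
# Assumption: the last 0.5% is the merged Symbian / BlackBerry OS / Other cell.
row = {
    "Windows (all versions)": Decimal("88.4"),
    "Mac OS X": Decimal("6.9"),
    "iOS": Decimal("2.8"),
    "GNU/Linux": Decimal("0.9"),
    "Android": Decimal("0.5"),
    "Symbian / BlackBerry OS / Other": Decimal("0.5"),
}

# Decimal avoids binary floating-point noise when adding percentages.
total = sum(row.values())
print(total)
```

Under that reading the top-level columns add up to 100%, which is the property the merged cell is meant to preserve.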

Doing so is fine by me, as long as we do the same for the median... 195.23.92.1 (talk) 19:28, 17 August 2011 (UTC)
It seems a bit messy to me, especially if we do the same for the median. Overall I don't think it's an improvement.--Harumphy (talk) 10:01, 18 August 2011 (UTC)
Since dumping the "Other" column without adjusting the percentages would be odd (lines wouldn't add up to 100%) and this solution is messy, would you accept a solution where the "Other" column would be simplified (and we would add the Symbian and Blackberry values to the "other" column)? Feel free to add your comment and also your vote in the "Vote Count" section for this alternative 195.23.92.1 (talk) 14:21, 18 August 2011 (UTC)
If we're keeping the 'Other' column, I'm in favour of leaving it as it is. The fact that for one source it includes Symbian and Blackberry is a very minor irritation which I can live with more easily than any of the proposed remedies.--Harumphy (talk) 15:23, 18 August 2011 (UTC)
Shouldn't we at least put some kind of notice in the footnotes, then? Keeping it "as it is" results in wrong data for both "Other" (more than it should be) and "Symbian" and "Blackberry" (less than they should be). Another thing we could do (messy, but more agreeable to me) would be to do what is done for Clicky and proposed for StatOwl, and "calculate" which percentage of that "Other" is "Symbian" and "Blackberry", taking the other sources as a reference. 195.23.92.1 (talk) 14:43, 19 August 2011 (UTC)
How about this: we replace '---' with 'n/a', change heading 'Other' to 'Other inc. n/a' and add a footnote saying n/a = data not available from source --Harumphy (talk) 21:37, 19 August 2011 (UTC)

StatOwl - further thoughts

Looking at the above discussion, I can see that I've blown hot and cold on StatOwl over the months. The only real problem I have with StatOwl is that it ignores mobiles and inflates desktop share to 100%. We have a related problem with Clicky Web Analytics, which produces separate stats for desktops and mobiles, which we multiply by roughly 0.94 and 0.06 respectively to get the figures for our combined table. (I work out the exact figure each month from the mean of two sources as explained in the footnote.) If we used the same correction for StatOwl - i.e. multiply its figures by the desktop factor of 0.94-ish, that would eliminate the inconsistency. Naturally this would have to be explained in the footnote. I've suggested this in the past, but not got support for it, so for many months the status quo has been that applying this kind of correction based on figures from two other sources is somehow OK for Clicky but not OK for StatOwl. It seems to me that we should be consistent here, so please comment here and see the further vote option below.--Harumphy (talk) 08:10, 19 August 2011 (UTC)

If you do so, you won't have data to fill in the blanks: if you put the rest of the percentage to reach 100% in "Other", then that data is wrong (because you have Android, iOS, Blackberry and Symbian's percentages there); if you also don't fill "Other", then the table will be... strange looking, even if more accurate. From these two choices I still prefer the third (ditch StatOwl), but if that's not what will happen... which of the two solutions up there do you propose? 195.23.92.1 (talk) 14:35, 19 August 2011 (UTC)
I think the other 6% or so should go in the 'other' column. The footnote explains how the 'other' column is calculated so I don't have a problem with it.--Harumphy (talk) 21:24, 19 August 2011 (UTC)
Three in favour, none against, so I've implemented this. --Harumphy (talk) 14:40, 24 August 2011 (UTC)
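
For readers following the arithmetic, the correction discussed above can be sketched roughly as follows. This is only an illustration: the function name and the desktop-only figures are hypothetical, and 0.94 stands in for the monthly factor that the footnote derives from the mean of Net Applications and StatCounter Global Stats.

```python
def rescale_desktop_only(shares, desktop_factor):
    """Scale shares that sum to 100% of desktops to shares of all web clients."""
    scaled = {name: round(value * desktop_factor, 2) for name, value in shares.items()}
    # Whatever is left over (mobile and anything unlisted) goes in 'Other'.
    scaled["Other"] = round(100 - sum(scaled.values()), 2)
    return scaled


# Hypothetical desktop-only figures, NOT taken from StatOwl.
desktop_only = {"Windows": 90.0, "Mac OS X": 9.0, "Linux": 1.0}

# 0.94 is the rough desktop factor mentioned in the discussion.
combined = rescale_desktop_only(desktop_only, desktop_factor=0.94)
print(combined)
```

The objection raised in the thread is visible here: the 'Other' value is a remainder that lumps all mobile platforms together, not a figure reported by the source.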

Vote Count

We're discussing two sources in particular, and they have different suggestions... Here's a summary of the votes we can see by reading the discussion (please update this if you add something to the discussion):

StatOwl - Let's remove it!

AT Internet - let's merge the "Others" cells!

AT Internet - let's count "Symbian" and "Blackberry" as "other"!

Apply desktop/mobile split (mean of Net Applications and StatCounter Global Stats figures) to both Clicky and StatOwl

  • Yes, apply to both - 3 votes - Harumphy, Jdm64, 195.23.92.1
  • No, apply to neither - 0 votes
  • Apply to Clicky but not StatOwl (status quo) - 0 votes

LOL that graphic

that graphic on the right is not a good member of this page. it is created from impossible to identify data, and its citation is the page it is on. come on people. someone can do better than this. Forcep caliper (talk) 03:05, 30 September 2011 (UTC)

I agree. Referencing itself seems awkward. And giving usage shares for "web client operating systems" without saying what those are is even more confusing. — Preceding unsigned comment added by NotDifficult (talkcontribs) 07:45, 28 October 2011 (UTC)
In what way is the data impossible to identify? It's the median of eight sources, all of which are cited. So the figures can be verified precisely. What's the problem?--Harumphy (talk) 13:15, 28 October 2011 (UTC)

Median constitutes improper synthesis and original research

I first marked it as original research - a marking which was promptly deleted. I deleted the section and it was promptly reverted. The argument still stands: median is not an acceptable calculation:

  • While "well defined" it constitutes improper synthesis of the numbers it calculates over. It reaches a conclusion not supported by any of the sources. read WP:OR. It does not correctly reflect the sources.
  • Wikipedia policy requires consensus even for routine calculations like totals and counts. I marked it as WP:OR - a marking which should not be summarily deleted as was done by Harumphy.
  • Median is by no means a routine calculation; it is a statistical method which is not applicable in this setting: The result will be highly dependent on which sources are selected, ie the numbers are a result of article editing (specifically source selection) and not attributable to a source.

User Harumphy has threatened to treat it as edit warring if I remove the line again. However, my position stands: This is original research and it does not belong here. There is not consensus, so Harumphy, please remove that line yourself. Useerup (talk) 11:16, 30 October 2011 (UTC)

If the median was being used to infer something then it might constitute improper synthesis. But it isn't being used for that. It's just being stated as a median without any conclusion being drawn from it. The most relevant part of WP:OR is surely WP:OR#Routine_calculations, and there has to date been a consensus among editors here that it's OK as far as that policy is concerned.--Harumphy (talk) 11:29, 30 October 2011 (UTC)
Sure, if you delete the WP:OR markings you can claim consensus. Median is certainly not a routine calculation. Median, mean etc are original research and improper synthesis because the result is not supported by any of the sources. You are creating a synthesis over a number of sources. This is wrong on many levels, not least that the result will depend heavily on the sources selected. To use the median you need a source which calculated that median and which supports why a median is proper. There is no such source referenced, hence OR. Useerup (talk) 11:46, 30 October 2011 (UTC)
Reading archived discussions I don't see a discussion with a consensus at all. I see someone touched upon the subject by discussing the mean value - but no discussion and "consensus" on the applicability of median at all. But that really doesn't matter, as there is no consensus at this point. Useerup (talk) 12:23, 30 October 2011 (UTC)
To make matters worse, the median is supposed to "remove outliers" (per archived discussion). But the numbers do not at all express the same distributions. Some are demographically biased, others are openly geographically biased. Calculating a mean or a median (or any other statistical function) is not just OR - it is totally improper as it lumps together apples and oranges. Useerup (talk) 12:23, 30 October 2011 (UTC)
A few points in reply:
  • I agree that any past consensus becomes moot if there isn't consensus now.
  • I ask that given that the table's format has been stable for some time, it shouldn't be altered until a new consensus has been reached here first.
  • I disagree with your assertion that median is OR. In what way is a median less of a routine calculation than, say, the simple addition that is specifically endorsed by WP:OR#Routine_calculations? After all, they are both just forms of y=f(x1 ... xn). (If you know the values of x then there's only one possible value of y whether it's a median, mean or simple addition). You keep asserting that it's OR but ISTM that (a) you have not yet justified that assertion, and (b) even if it is, it's an allowable form of it. AFAICS what we're doing is entirely consistent with WP:OR#Routine_calculations. If you disagree, please explain why, don't just baldly assert your opinion as fact. --Harumphy (talk) 12:53, 30 October 2011 (UTC)
Median is not a routine calculation like a simple conversion between units of measure (inches to meters, birth date to age etc). It is a statistical function which is applicable in certain situations and not in others. For simple/routine calculations this is uncontroversial: you cannot argue that the conversion from feet to meters introduces new knowledge or is open to interpretation. For statistical functions you are making assumptions and creating synthesis. I have explained why above: You are using median across a data set with very, very different numbers: Some numbers have an express geographical bias, others have an open demographic bias. Median is as wrong as mean in those situations. If I introduce yet another stat counter (or remove one) it will immediately change the median number. Thus, the selection of sources becomes a basis for the calculated median. That selection is performed by wikipedia editors and has no basis in any of the sources. The policy is pretty clear, you cannot combine multiple sources to reach a conclusion not expressly supported by any one of the sources. Useerup (talk) 13:19, 30 October 2011 (UTC)
The median isn't a 'conclusion'. It's just a summary. A summary inevitably compromises precision in the pursuit of brevity. That doesn't render it invalid, or OR. What does anyone else think?--Harumphy (talk) 14:45, 30 October 2011 (UTC)
I think that saying that median is not routine calculation is itself OR, thus that assertion itself needs proper discussion before we can discuss its applicability here. Seriously, it has been discussed here already, and since the current table doesn't clearly violate any of the Wikipedia's policies, consensus has higher authority than anything else. 1exec1 (talk) 15:23, 30 October 2011 (UTC)
Going directly by WP:NOR:
  • Do not combine material from multiple sources to reach or imply a conclusion not explicitly stated by any of the sources. If one reliable source says A, and another reliable source says B, do not join A and B together to imply a conclusion C that is not mentioned by either of the sources. This would be a synthesis of published material to advance a new position, which is original research.
    • Here we have a table of 8 sources which say A B C D E F G H. This article joins all of those to imply conclusion M which is not mentioned by any of the sources. Please explain how that is not OR?
    • The use of median (even if OR was allowed) in this case seems highly doubtful. A median is computed over a homogeneous set of numbers expressing the same property for a number of observations, i.e. ages of students in a class. The problems here:
      • the median here is not computed over the same property: One number is the web usage for mostly German sites, another "mostly" in U.S., a third "mostly" web designers and other self-selected communities. So what does the median represent? U.S.? Global? Germans? Coffee-drinkers? It makes no sense. It is the median of seeds in boxes of fruits. Some boxes with apples, some with oranges some with rotten bananas.
      • a median is only valid when computed over a complete set of observations. The number of students in a class is well-defined, countable, verifiable and finite. The number of web client usage share counters is uncountable and the selection here has been selected by editors.
  • This policy allows routine mathematical calculations, such as adding numbers, converting units, or calculating a person's age
    • These are simple mathematical arithmetic calculations and conversions; a far cry from statistical calculations. The closest you can get to mean or median (and that would be a stretch) is "adding numbers". However, this article does not just add numbers, it calculates a median over numbers from multiple sources, thus the calculation is sensitive to the sources chosen by wikipedia editors, and thus is not supported by the sources. The sources may individually be reliable (with the caveats for each one) but the list has been compiled by WP editors and the median calculated over that list. These are clearly new conclusions entered by WP editors.
    • It is even worse: At least 2 of the sources have been "corrected" by WP editors (in good faith, but still) further creating OR.
Claiming that "saying that median is not routine calculation is itself OR" is... strange. This is the talk page and not the Bizarro universe where everything is opposite. As everywhere on wikipedia the burden falls on the editor who wants to enter (or keep) a claim to demonstrate that it is not original research, see WP:VERIFY. To demand that anyone challenging a claim must first demonstrate that such a challenge itself is not OR is a novel take. Useerup (talk) 16:54, 30 October 2011 (UTC)
You keep asserting that a conclusion is reached / implied by calculating median. It is not. This is explicitly stated in the article, along with the caveats regarding the accuracy and data skewing. Median is just that - a median of that data, no conclusion is implied anywhere that refers to the calculated median for support. Thus the SYNTH point is weak. 1exec1 (talk) 18:33, 30 October 2011 (UTC)
The bolded line with median is a conclusion. It states "this is the usage share". In the archived discussion the median is even pushed as a way to do away with "outliers". Useerup (talk) 19:27, 30 October 2011 (UTC)
The bolded line doesn't state it. It's just you inferring it. Thus the only error is in your own perception.--Harumphy (talk) 23:03, 30 October 2011 (UTC)
BTW your 'improper synthesis' tag on two of the table's footnotes is wrong so I'm removing it. This is a routine calculation that was unanimously approved by the three editors who discussed and voted on this very issue, and thus compliant with WP:OR (See the last vote in Talk:Usage_share_of_operating_systems#Vote_Count above.)--Harumphy (talk) 23:29, 30 October 2011 (UTC)
WP:NOTYOURS and WP:NOTDEMOCRACY. Issue stands - "correcting" numbers is improper synthesis. I apologize for being so blunt, but please don't remove tags before issue has been resolved. Useerup (talk) 01:09, 31 October 2011 (UTC)
So what exactly does the bolded median line state? Does it compute over the other rows (multiple sources)? Is it supported by any one of the sources? Are the numbers in that row attributable to reliable secondary sources or are they the result of wikipedia editing, i.e. selection of sources? Do any of the sources coalesce different demographics or different geographical regions and explain how median is a safe method? Useerup (talk) 01:20, 31 October 2011 (UTC)
It states what it says it states: the median. It states nothing else. That median is a routine calculation which has been approved by editors and is thus fully compliant with WP:OR#Routine_calculations. This policy is a specific exemption to the requirement for an external source: as long as the input figures are cited (and they are, right there in the column) the output of the calculation requires no external source. As far as selection of sources goes, we've included every remotely-credible source we know of that tracks multiple web sites. The sources do undoubtedly have various demographic, regional, linguistic and other biases, but that doesn't matter in the context of this dispute. (IMHO, the median is interesting because it probably helps to mitigate against such biases, but any such mitigation is not claimed either in the article or as a justification here.) Median is a 'safe method' for calculating a median. For that purpose, the only one claimed, it's 100% safe. I know of none safer.--Harumphy (talk) 08:36, 31 October 2011 (UTC)
Why is the median relevant here at all then? Why not an average? The WP:OR#Routine_calculations policy is for routine calculations based on a number from a single source, like calculating a person's age from a birthday or adding Windows version numbers to create a total for Windows. The WP:OR policy specifically prohibits creating a synthesis from multiple sources. This median is exactly that, a number calculated from multiple sources. And you even have opaque selection criteria for those sources. And it is not clear at all how you calculate that median; the numbers in the median row end up not even being comparable to each other (the row does not come to 100%). And the numbers used for the median calculation have even been "corrected" by editors as well. This is wrong on so many levels, but I will try to summarize them below. Useerup (talk) 10:10, 31 October 2011 (UTC)
In reference to Median is a 'safe method' for calculating a median. For that purpose, the only one claimed, it's 100% safe. I know of none safer: I know a 100% way to calculate an average. Let's go add that to the table, shall we?. It doesn't claim to be anything but an average, it's not a conclusion or anything; just an average. I also know a 100% safe way to calculate the product of all usage shares. Since it doesn't claim to be anything but a product of usage shares, we can add (multiply, rather) that as well. Let's throw in the sum as well; it doesn't claim to be anything but a sum. Useerup (talk) 10:29, 31 October 2011 (UTC)
For what it's worth - I originally objected to the median in the summary table on the same grounds as User:Useerup raises now. At the time, there was indeed a majority of editors who thought it was not OR. So User:Harumphy is right - there was consensus. And while I still think it is OR, I also agree with User:Harumphy that any change to this table or calculation would require a new vote. Wikiolap (talk) 02:42, 31 October 2011 (UTC)
WP:NOTDEMOCRACY Useerup (talk) 10:10, 31 October 2011 (UTC)

Whether the median is OR or not is beside the point. It's just unimportant. Jasper Deng (talk) 21:50, 31 October 2011 (UTC)
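
Two of the technical claims made in this thread (that the median row changes when a source is added or removed, and that per-column medians need not add up to 100%) can be illustrated with a small Python sketch. The four sources and all figures below are invented for illustration; they are not the article's data, and the sketch takes no side on the OR question.

```python
from statistics import median

# Hypothetical shares (%) from four imaginary sources; each row sums to 100.
sources = {
    "A": {"Windows": 92.0, "Mac OS X": 5.0, "Other": 3.0},
    "B": {"Windows": 88.0, "Mac OS X": 8.0, "Other": 4.0},
    "C": {"Windows": 85.0, "Mac OS X": 6.0, "Other": 9.0},
    "D": {"Windows": 80.0, "Mac OS X": 14.0, "Other": 6.0},
}


def column_medians(rows):
    """Median of each column, taken independently across all sources."""
    columns = next(iter(rows.values())).keys()
    return {c: median(r[c] for r in rows.values()) for c in columns}


all_four = column_medians(sources)
# Drop one source to see how the median row depends on source selection.
without_d = column_medians({k: v for k, v in sources.items() if k != "D"})

print(all_four, sum(all_four.values()))
print(without_d, sum(without_d.values()))
```

With these made-up numbers the median row shifts when source D is dropped, and in neither case does the row total 100%, because each column's median may come from a different source.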

sales are not equivalent to use

Re "Moreover sales are not equivalent to use, as Windows comes pre-installed on many computers that will be used with other operating systems."

AFAIK, the actual PERCENTAGE of Windows computers that are wiped and replaced with Linux is small. I have NEVER heard any evidence, anecdotal or substantive, to counter that. Unless you have a SOURCE for that statement it should be re-worded. ToolmakerSteve (talk) 20:03, 22 August 2011 (UTC)

To balance the (IMHO overstated) emphasis on Windows sales not equating with usage, I've added a SOURCED reference mentioning PIRACY, which is a factor that increases usage above sales. (My interest is in making the best possible estimates comparing various desktop OS to smartphone OS unit usage. E.g. I want to know when Android passes Windows to become the #1 OS in units.) ToolmakerSteve (talk) 20:33, 22 August 2011 (UTC)

Apologies for pushing this point further, but I just noticed that the 1% Median web browsing statistic for Gnu/Linux is also consistent with the hypothesis "the % of PCs on which Windows is replaced by Linux is statistically small." IMHO, a fraction of a percent - less than the statistical error of the available sources - having no significant impact on the total Linux percentage. Thus, the vague adjective "MANY" in ".. pre-installed on many computers that will be used with other .." is inappropriate. However, since I have not found a source, I will leave it to the author who added that sentence, to reword it to be less misleading. I DO like the basic concept of pointing out to readers that there is a difference between sales and usage, so I DO favor keeping that sentence in some fashion; on the other hand, it is important to not overload/confuse/mislead the average reader with information that may be statistically minor. ToolmakerSteve (talk) 22:40, 22 August 2011 (UTC)

One possible proxy for Linux usage is downloads. It would be best to combine that with a survey that samples downloaders, to find what they are doing with their copy - if there is such a survey. E.g., I have a hard drive with an older version of Red Hat Linux on it. Not currently installed in a machine. Some fraction of Linux downloads are in dual boot setups with Windows - would be interesting to have users estimate how much time they spend in each OS. To distinguish between hobbyists experimenting with it occasionally versus substantive use. ToolmakerSteve (talk) 00:51, 23 August 2011 (UTC)

I added a source having an alternate analysis of ~ 6% for Linux in 2009, and showed that such analysis would yield an alternate Q2 2011 Linux figure of ~ 5 million. However, making that extrapolation might qualify as "original research", hence dubious. I've e-mailed the author requesting any updated figures/links. ToolmakerSteve (talk) 03:00, 23 August 2011 (UTC)

Please see the discussion at the top of this page about the C. Martin blog piece. In the light of this I've reverted this one edit.--Harumphy (talk) 10:52, 23 August 2011 (UTC)
Thanks, I had missed that discussion. I also have since learned that even if 6% had been credible momentarily in 2009, due to Netbook sales, the extrapolation to today would not be valid. In 2009, there may have been a period where Linux was more strongly selling on Netbooks, e.g. by Dell. Microsoft responded with low price for Windows 7 Starter on limited hardware (e.g. 1 GB RAM), and has successfully turned vendors such as Dell back into near-100% sellers of Windows. I find no significant support for the notion that anyone other than the rare highly technical user would choose Linux (for a PC), given Windows available at negligible cost. (Quite the contrary, there is anecdotal evidence that a more common action, when a Windows license adds significantly to a computer's cost, but is optional, is to purchase the computer with a free OS, and then replace that with a pirated copy of Windows.) ToolmakerSteve (talk) 11:00, 23 August 2011 (UTC)
After searching to see what reports are available for different OS segments, it occurs to me there might be a simple explanation as to why there aren't more available numbers for Linux sales on PCs, from the various research companies: maybe the numbers aren't worth reporting on. Why do I say this? Because it is notable that numbers for Linux server sales are readily available. (Granted, server sales are lower volume and higher dollar, so easier to track.) If Linux were making significant inroads in general PC use, more companies would deem that worth researching. I consider this additional indirect evidence that Linux sales volumes on general PCs continue to be < 5%. ToolmakerSteve (talk) 12:19, 23 August 2011 (UTC)

After further thought about reasons that might cause significant numbers of people to go to the effort of replacing an OS, and the POV tone of the sentence under discussion, I have replaced it with the following attempt at a neutral statement: "Also, sales may overstate usage. Most computers are sold with a pre-installed OS; some users replace that OS with a different one, perhaps for security reasons, or to install an OS for which more applications are available.[citation needed]" ToolmakerSteve (talk) 20:45, 23 August 2011 (UTC)

This is anecdotal, but I have personally replaced Windows with Linux on about forty or fifty computers. I know a lot of people who have done that on more computers than I have. So the number of computers using Linux in use could be far different than what the analysts are estimating. The issue is getting reliable numbers, and to the best of my knowledge, no one has proposed a method that seems likely to be reliable. UrbanTerrorist (talk) 20:07, 16 October 2011 (UTC)

That is interesting information. Would love to see some survey that indicates how widespread that is. Both in business use, and in home use. Is this being done by people with IT background? Primary reasons for doing so? ToolmakerSteve (talk) 02:36, 11 November 2011 (UTC)

Windows 7 is now the Widely Used Operating System

Many websites and other media report that Windows 7 is now the most widely used operating system and has overtaken Windows XP. I wanted to request other editors to kindly check and refresh the usage share to keep the page updated. Even the Windows XP page says that it is the 'second most popular version of windows' which logically shows that Windows 7 is now on the top. Meanwhile Windows Vista share has also changed indicating more usage for Windows 7. Changing both text and graph will be better. Thanks TheGeneralUser (talk) 16:42, 30 October 2011 (UTC)

Six of the eight sources that we track in this article still have XP as greater than 7, but on recent trends that's likely to change in about a couple of months or so. The Windows XP page cites w3schools as its source, a source which monitors web usage by web developers only. They are a small and highly atypical subset of the total user population so for our purpose it's not a credible source.--Harumphy (talk) 23:13, 30 October 2011 (UTC)

Looks like Win 7 passed XP in US, but has not yet done so globally. ToolmakerSteve (talk) 03:45, 11 November 2011 (UTC)

Ubuntu

"Clicky Web Analytics, StatOwl and Wikimedia indicate that Ubuntu has an order of magnitude more usage than any other identified desktop Linux distribution."

So does this mean that Ubuntu should get its own column in the chart?--Harizotoh9 (talk) 07:36, 25 October 2011 (UTC)

Somewhere in this talk page or its archives there is some discussion of what the threshold should be for including an OS in the web client table. For some time now the consensus has been that an OS only gets its own column if it's identified by more than half the sources. Ubuntu is identified by three of the eight, but the consensus requires five out of eight. About four other Linux distros (IIRR Debian, RedHat, Fedora, SuSE) are also identified by three sources FWIW. --Harumphy (talk) 13:52, 25 October 2011 (UTC)
We should consider practical side of inclusion also. The horizontal space of the page is not infinite. 1exec1 (talk) 18:20, 29 October 2011 (UTC)
I deleted this claim some time ago and have only now found this discussion. Wikimedia indicates that Ubuntu is less than 4:1 more popular than Fedora; Clicky was found on some Ubuntu-specific sites and thus is biased. StatOwl requires Flash, which I happily avoid using, so I can't check it. But the overall statement (given that many browsers don't actually disclose the distribution) seems too unreliable. — Dmitrij D. Czarkoff (talk) 02:01, 14 November 2011 (UTC)

Monthly update

Usually I do a major update to the web client stats on the first of the month, as that's when four of the eight sources update. In view of what's happened here recently I can no longer be bothered.--Harumphy (talk) 08:44, 1 November 2011 (UTC)

While I understand your point, and of course you're free to contribute or not, I am sorry that you took that decision. Yes, there's a debate going on about the "Median" line in the article, but nothing to stop the update of the page (not with new stuff but with update of what's already there) until that debate is over... 89.180.146.171 (talk) 19:59, 1 November 2011 (UTC)
Harumphy, I personally hope that you will continue to update those stats. IMHO if something is so controversial that it is leading to the level of upset and tit-for-tat that we are seeing .. we should err on the side of "quiet", and omit the median information. Median is not nearly as important as having up-to-date sources! And having multiple contributors! ToolmakerSteve (talk) 03:50, 11 November 2011 (UTC)
If you want it updated, you can always update it yourself. As far as I'm concerned, WP:Too many cooks, at least for the time being. And I think the table is pointless without the median, so it would be a waste of effort. --Harumphy (talk) 16:10, 14 November 2011 (UTC)
Now that a professor of mathematical statistics has endorsed the use of the medians, it looks as though reason and sanity are going to prevail over aggressive wikilawyering after all, so I'll resume work on this table.--Harumphy (talk) 15:02, 16 November 2011 (UTC)

Undue weight

In its introductory sentence the article "Usage share of operating systems" states "Different categories of computers use a wide variety of operating systems, and the usage share varies enormously from one category to another." but the picture prominently presented in the top right corner only refers to "Usage share of web client operating systems", which is just a marginal share of computer systems, and thus gives undue weight to the Windows operating system. Even when looking at 32-bit CPUs only, desktop computers account for only 2% of CPUs sold (see microprocessor section "Market statistics"). As the total number of CPUs sold is estimated at 1 billion, this accounts for about 20 million CPUs, but the Top 500 list of supercomputers of June 2011 already counts 7 million (mostly 32-bit desktop) CPUs, of which almost 6.5 million (91%) are driven by Linux while Windows counts only 63140 CPUs or 1%, and in the performance statistics Windows even falls below the margin. In 2004 there were an estimated 548,380 PCs in use worldwide ("Number of PCs by country", ITU, 2004); thus, with respect to CPU count, Windows market share can be estimated below 50% when excluding embedded systems and around 15% when including embedded systems (Embedded systems survey). --BerlinSight (talk) 14:50, 13 November 2011 (UTC)

  1. Your statistical analysis is dubious: while the desktop CPUs are 2% of total microprocessor sales, the rest of microprocessors sold include DSPs (You can typically find several of those in a single desktop computer), microprocessors for dumb cell phones, vehicle electronic equipment, home appliances (washing machines, watches, etc.), audio/video players, TV sets and other sorts of equipment that don't typically run operating systems.
  2. The embedded systems typically run custom operating systems using Linux kernel, which don't actually belong here, as they are mere firmwares, just like the firmwares in wireless adaptors, RAID controllers, and other computer parts or PC BIOSes;
  3. The desktop systems constitute the point of interest for main part of readers and editors (as You can note examining this talk page and its archives);
  4. The image in question is related to a Median line of the table, which is currently disputed in Dispute resolution noticeboard, so right now it may be the wrong time for this discussion.
Dmitrij D. Czarkoff (talk) 15:18, 13 November 2011 (UTC)
at 1) your argument is not valid: "Let's review. Processors are only 2% of all semiconductors, and PC processors are only 2% of all processors." (The Two Percent Solution -- eetimes) that means desktop CPUs are 2% of 2% of total semiconductor sales. Thus the DSPs and the like that you mention are already discounted in my statistics.
at 2) this article is not only named "Usage share of operating systems" rather than "desktop operating systems" (though that title redirects here, but that's a different point); its first sentence also reads "Different categories of computers use a wide variety of operating systems, and the usage share varies enormously from one category to another.". Thus I can expect this article to cover operating system usage on all computers, not only the 2% desktop PC market.
at 3) The desktop PC is a dwindling share of computer usage, and you even discount the 7 million CPUs in just the top 500 supercomputers. If you think this article should be about desktop computer usage, then name it that way.
at 4) I think a picture with that many issues should not be presented at the top of a major article, I suggest moving it down to the web client section where it belongs. --BerlinSight (talk) 10:14, 14 November 2011 (UTC)
  1. DSP stands for digital signal processor; DSPs are processors and are included in that 98%.
  2. To be clear, can You please explicitly state what those 98% that are not desktops are? What should be covered here most, per WP:DUE?
  3. The top 500 supercomputers account for 500 computers; do You really think that 500 > 7 000 000?
  4. As You've already been told, this picture may be removed soon due to the dispute occupying 90% of this page and some other pages as well. This discussion should happen once that one has an outcome.
Dmitrij D. Czarkoff (talk) 10:31, 14 November 2011 (UTC)
I believe the article reflects the state of research by external sources. There is an abundance of sources analyzing the usage share of operating systems used as web clients, and also the market share of smartphone operating systems - both are well represented in the article. The graph showing web clients is appropriate, as most sources focus on these statistics, and the article reflects that.
It would be appreciated if you would add a new section about embedded operating systems, as it is currently missing from the article. Wikiolap (talk) 17:59, 13 November 2011 (UTC)
The article opens with a picture showing Windows with a 90% market share and a sentence giving the impression that it is about computers in general. Thus a reader opening the article must get the impression that Windows has a 90% market share of all computer usage, and this definitely is not true. In most cases this first impression is what stays with the reader, regardless of what conclusion the rest of the article comes to. I will definitely not waste my time writing a section for such a biased article, thank you for your invitation. --BerlinSight (talk) 10:14, 14 November 2011 (UTC)
The whole Wikipedia project is about collaboration. If You don't want to collaborate, please consider using Your blog instead. — Dmitrij D. Czarkoff (talk) 12:33, 14 November 2011 (UTC)
I agree that having the graphic at the top of the article gives undue attention to web client share. At one time we had that graphic in the web client section but the page layout looked awful so we moved it to the top. I think we should remove the graphic altogether.--Harumphy (talk) 16:05, 14 November 2011 (UTC)
I've removed the graphic for now because the "source" (the median line in the table) is currently missing and that issue needs to be resolved first. The Web Client section could benefit from a graphic, but it belongs there and not at the top of a page which is a broader topic than just client operating systems being used for web browsing. strcat (talk) 04:08, 25 November 2011 (UTC)

Web client OS table: desktop/mobile splits: improper synthesis?

While the question of whether or not to include the median row in the table awaits resolution by WP:MEDCAB at Wikipedia_talk:Mediation_Cabal/Cases/13_November_2011/Usage_share_of_operating_systems, we can continue to attempt to achieve consensus here on some of the other issues that have come up at the same time.

One question is whether or not it is wise to use data from other, cited sources to weight the desktop and mobile figures from StatOwl and Clicky Web Analytics. Perhaps a little background would be helpful.

  • Six of the nine cited sources do not discriminate between desktop and mobile OSes. They report the lot in a single set of figures.
  • StatCounter publishes separate desktop and mobile data, but also publishes a third table, presumably sourced from the same data at the same time by the same methods, showing the "desktop v. mobile OS" shares. Our usage of this data weights the figures in their first two tables using the figures in the third. To date nobody has questioned this approach.
  • Clicky web analytics and StatOwl publish data in a form that cannot be directly included in our table without creating a real and obvious 'apples and oranges' comparison. We attempted to convert the oranges into apples by weighting the figures using the mean of mobile percentage figures from two other sources. We always knew this was a borderline practice in terms of WP policy, but on balance we put up with it in order to be able to include these two sources in the table, and always explained what we'd done in the footnotes.

I think this is the nub of the issue. There are precious few credible sources of openly published stats, and the article can only report what has been published. If we consider 'improper synthesis' to be the clinching argument, then ISTM we would have to remove these two sources. Maybe this is a case where WP:IGNORE could be applied, in order to keep these two sources in the table.

Somebody is already gunning for Webmasterpro for an entirely different reason. If we're not careful we could end up discarding a large number of the sources we currently include. Would that be doing the readers a service or a disservice?--Harumphy (talk) 14:46, 18 November 2011 (UTC)
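
For anyone unsure what the weighting described in the third bullet above amounts to, here is a minimal sketch of the arithmetic. All names and numbers are hypothetical placeholders chosen for illustration, not figures from Clicky, StatOwl or any other cited source, and this is not necessarily the exact procedure used for the table.

```python
def combine_shares(desktop, mobile, mobile_fraction):
    """Merge separate desktop and mobile share tables into one table.

    desktop, mobile: dicts mapping OS name -> percentage within that segment
                     (each segment sums to 100).
    mobile_fraction: assumed fraction of all traffic that is mobile (0..1),
                     here taken from outside the source being converted.
    """
    if not 0.0 <= mobile_fraction <= 1.0:
        raise ValueError("mobile_fraction must be between 0 and 1")
    combined = {}
    # Scale desktop figures down by the non-mobile share of traffic.
    for name, pct in desktop.items():
        combined[name] = combined.get(name, 0.0) + pct * (1.0 - mobile_fraction)
    # Scale mobile figures down by the mobile share of traffic.
    for name, pct in mobile.items():
        combined[name] = combined.get(name, 0.0) + pct * mobile_fraction
    return combined


# Hypothetical mobile fractions reported by two *other* sources; their mean
# is the weight applied to the source that lacks an overall split.
mobile_fraction = (0.06 + 0.08) / 2

# Hypothetical per-segment figures from the source being converted.
desktop = {"Windows": 90.0, "Mac OS X": 8.0, "Linux": 2.0}
mobile = {"iOS": 50.0, "Android": 30.0, "Other mobile": 20.0}

combined = combine_shares(desktop, mobile, mobile_fraction)
for name, pct in sorted(combined.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.2f}%")
```

The point of the sketch is that the combined figures depend directly on a weight borrowed from different sources, which is why the practice is open to the synthesis objection.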

Do we need to discard the sources? Why not 2 separate tables, to underline the distinction. If the sources are considered reliable I find it a bit odd that they should be discarded just because someone cannot create a graph without synthesizing information. In fact, that would further question the editors' criteria for selection of sources Useerup (talk) 16:08, 18 November 2011 (UTC)
Six of the sources report all OSes and don't split them into desktops and mobiles. If we had two separate tables then we'd have to decide which OSes are desktops and which are mobiles, which isn't always clear, and then expand some of the figures, which would be equally improper synthesis as we don't have the data to do that. At least with a single table we're only tweaking figures for two sources, not six.--Harumphy (talk) 22:29, 18 November 2011 (UTC)
That's me gunning for Webmaster Pro. It's heavily biased (as are StatOwl and AT Internet, AFAIR). It might be a good idea to keep only sources known for worldwide results, as others are really improper. Knowing the total number of sites and hits monitored by biased sources, one could combine them in a statistical model and get more or less complete and statistically correct results, but as Useerup would suggest, this would be way too much to fit WP:CALC. At the same time, we have several sources claiming global stats, so I believe only those should be kept.
In fact, I've started from the wrong point: the main problem with biased sources is that the scope of this page is not limited geographically, so biased sources are not sources at all here.
Dmitrij D. Czarkoff (talk) 23:34, 18 November 2011 (UTC)
All the sources have biases of one sort or another, usually unintentional. There isn't a single unbiased source AFAICS.--Harumphy (talk) 14:55, 19 November 2011 (UTC)
All possible statistical sources have some bias, though the three I've specified do have statistical biases they don't address. BTW, there are sources that specifically claim to give worldwide figures (StatCounter, Clicky), and there is Wikimedia, which can be seen as the best source of all, as it's the #5 most popular site worldwide, available in all languages with a notable speaker base (including several artificial and dead languages). Leaving only such sources would greatly improve the table's overall independence from bias. — Dmitrij D. Czarkoff (talk) 16:38, 19 November 2011 (UTC)
If we know that certain sources are altering their results, we should remove them. The mobile information is useful, and some of the references do provide usable data - omission of certain operating systems is fine as long as they don't alter the rest of their numbers. I don't think we should lower the quality of the section in order to maximize the number of references we can use. strcat (talk) 04:10, 25 November 2011 (UTC)

It's easier to remove mobile results:

  1. Clicky splits the overall results (100% in other sources) into two parts, mobile and desktop, and gives no instructions for combining them back.
  2. StatOwl seems to drop mobile results altogether.

Effectively, there is a separate section for that. — Dmitrij D. Czarkoff (talk) 23:13, 18 November 2011 (UTC)