
Talk:Usage share of operating systems: Difference between revisions

:::::::::[[WP:NOTYOURS]] and [[WP:NOTDEMOCRACY]]. Issue stands - "correcting" numbers is improper synthesis. I apologize for being so blunt, but please don't remove tags before issue has been resolved. [[User:Useerup|Useerup]] ([[User talk:Useerup|talk]]) 01:09, 31 October 2011 (UTC)
:::::::::So what exactly does the bolded '''median''' line state? Does it compute over the other rows (multiple sources)? Is it supported by any one of the sources? Are the numbers in that row attributable to reliable secondary sources ''or are they the result of wikipedia editing'', i.e. selection of sources? Do any of the sources coalesce different demographics or different geographical regions and explain how ''median'' is a safe method? [[User:Useerup|Useerup]] ([[User talk:Useerup|talk]]) 01:20, 31 October 2011 (UTC)
:For what it's worth - I originally objected to the median in the summary table on the same grounds that [[User:Useerup]] raises now. At the time, there was indeed a majority of editors who thought it was not OR. So [[User:Harumphy]] is right - there was consensus. And while I still think it is OR, I also agree with [[User:Harumphy]] that any change to these tables or the calculation would require a new vote. [[User:Wikiolap|Wikiolap]] ([[User talk:Wikiolap|talk]]) 02:42, 31 October 2011 (UTC)


== Windows 7 is now the Widely Used Operating System ==

Revision as of 02:42, 31 October 2011


Linux share: Caitlyn Martin's blog piece

Is this [1] a credible secondary source? It seems to me to be an exercise in wishful thinking. It seems to be clutching at straws. As a Linux enthusiast myself I've tried to follow her argument but it doesn't stack up IMHO. She says "The best estimate for present sales is around 8%", but she doesn't cite a source for this estimate, and in any case, present sales is a very different thing from the total installed base, bought over several years, that makes up usage share.

I can quite accept that the web client stats under-measure Linux a bit, mainly because Linux users are relatively security and privacy conscious and thus more likely to disable javascript, install adblock etc., all things which reduce counting on the third-party stats sites. It's interesting that Wikimedia's figures, based on server log files and thus immune to this hazard, show a somewhat higher figure (1.57%) than most of the others.--Harumphy (talk) 14:00, 4 December 2010 (UTC)[reply]

Agreed, but still not 8%. I tried to work that 8% figure in somewhere too, but the jump from 'installed user-base' to 'current sales' seemed too sharp for a short addition to existing text. The only way would be to devote a whole couple of sentences to it somewhere, and I'm not sure if she is notable enough for that. O'Reilly is a good source, but I'm not sure of her status to be speaking for them. --Nigelj (talk) 15:10, 4 December 2010 (UTC)[reply]
The only cited source there is a quote from Steve Ballmer, where he says that internal Microsoft research showed Linux and MacOS shares to be comparable. We already have this source covered. The blog doesn't seem notable enough to include in the article. Wikiolap (talk) 20:03, 4 December 2010 (UTC)[reply]

Should we remove the Wikimedia web client statistics?

The article currently states: "All of these sources monitor a substantial number of web sites. Statistics that relate to a single web site are excluded." To a large extent, this is not true for Wikimedia, of which Wikipedia alone is by far their most trafficked web site (although one that most English-language Web users have visited).

Also note that the Wikimedia report is based on the total number of HTTP requests rather than the number of unique clients (as determined using cookies). We need to consider the merits of the two approaches and which is more accurate. The Wikimedia report could easily be biased toward those operating systems used by those who access Wikipedia more often (although the others could be influenced by how much of each browser's user base regularly clears cookies). On these two principles, should we exclude the Wikimedia statistics? PleaseStand (talk) 00:48, 14 December 2010 (UTC)[reply]

Wikimedia's stats cover 60-odd sites within the Wikimedia family.[2] While this is much less 'substantial' than many of the other sources, it's much greater than 'one', the avoidance of which (specifically w3schools) was the original purpose of that sentence. (From time to time we get people trying to add w3schools' stats to the table, or suggesting that we should on this discussion page. Often they seem to be unaware that that site's stats are for its own site only, and that that site is aimed at web developers - a highly atypical readership with a much more diverse set of web clients than the general web-using population.) Also, the Wikipedias are very high-traffic sites. The English one disproportionately so, granted, but there are similar regional/linguistic skews in many of the other stats sources too. So I wouldn't exclude Wikimedia stats on the grounds that they monitor an insubstantial number of sites.
AFAIK there's no evidence to suggest that certain operating systems are used by those who access WP more often. Just as there's no evidence that certain OS's are used more by those who clear cookies, block scripts, use adblock etc. I imagine many of us have our suspicions in this regard, but no actual evidence. And if we had such evidence, the magnitude of the biases they introduce may be no larger than many of the other biases we already know about and to which all the sources are prone. So I don't think there's a case for excluding Wikimedia stats here, either. Harumphy (talk) 11:05, 14 December 2010 (UTC)[reply]

 Northern Ontario Jacob12190 (talk) 11:06, 14 December 2010 (UTC)[reply]

Web client table tweaks - January 2011

I propose to make a couple of minor tweaks to the table when the December figures come out in the new year, unless there are objections here first:

  1. For the Clicky desktop/mobile 'in lieu' split, take the mean of the Net Market Share and Statcounter figures instead of just using Statcounter. (This will probably have the effect of reducing Clicky's mobile share from around 4.1% to around 3.6%.) The footnote will explain what has been done.
  2. Android is rising rapidly and within a few months may overtake what we currently call 'mainstream' Linux. I propose to change the "mainstream" sub-heading to "desktop distros.". Harumphy (talk) 14:56, 29 December 2010 (UTC)[reply]
2) Oppose. There are several other mobile Linux distributions such as Maemo currently included within Mainstream Linux. 1exec1 (talk) 00:24, 31 December 2010 (UTC)[reply]
Fair enough. I've done #1 but not #2 in today's update. Harumphy (talk) 11:09, 1 January 2011 (UTC)[reply]

Should we remove AT Internet Institute from web client stats?

We're seeing constant changes in the data, month by month (summary of each month here). With lack of frequent updates by ATII, we're ending up with less accurate results... 195.23.92.1 (talk) 18:18, 7 January 2011 (UTC)[reply]

Support 1exec1 (talk) 00:55, 8 January 2011 (UTC)[reply]
Be consistent - if we are to remove sources which don't update every month, then remove all of them, including the Wikipedia one. If Wikipedia stays then ATII should also stay. Wikiolap (talk) 17:12, 8 January 2011 (UTC)[reply]
Remove - ATII is consistently slow at updating their stats. Remove for January's stats, unless they update. As for Wikimedia, we have a little more control over it. I've emailed the person updating the stats in the past, I think we just need to have him set up something more automated, because I think he has to manually run his scripts. Or talk to someone else that has access to the logs and can give them to us. Jdm64 (talk) 18:57, 8 January 2011 (UTC)[reply]
Keep. Last time we discussed this (see archive) we decided to keep stats for up to 12 months. It used to say as much in the web client section. (Somebody, unaware of the discussion and seeing that all the stats at the time were more recent, changed the "12 months" to "few months". I've just changed that back.) If 12 months is too long a period, then we should reduce that period. Whatever we do, we should apply the same time limit to all the sources. Harumphy (talk) 19:21, 8 January 2011 (UTC)[reply]
I, the guy who raised the issue in the first place, agree with this item, then. I didn't know, nor could I find out anywhere, that the discussion was held in the past and that "12 months" was decided. Since it was, let's just abide by the decision. 195.23.92.1 (talk) 14:09, 10 January 2011 (UTC)[reply]
Hmm, I haven't found any discussion in the archives. The problem is that the software market evolves very rapidly. Most of the time, initial adoption of a product increases exponentially, so by allowing a 12-month delay we face data errors of more than 10% [3]. For example, AT Institute data now differs from the median/mean data in the Windows columns by huge margins (W7 - ~7%, Vista - ~4%, XP - ~11%). If we reduce the allowed delay to, say, 6 months or less, we can more than halve the possible data errors. 1exec1 (talk) 12:10, 21 January 2011 (UTC)[reply]
The 12 month precedent came up in Talk:Usage_share_of_operating_systems/Archive_1#web_clients_summary_table, in a brief discussion about what to do with OneStat data from Dec 8, 2008 that was getting on for one year old at the time. Jdm64 suggested "less than one year old" and I agreed - nobody else took part in the discussion. We removed OneStat on Dec 9, 2009. For a long time afterwards the article mentioned 12 months, which I recently reinstated.
As far as the time limit goes, I don't think it matters about AT being 'out of date' because (a) it's an encyclopedia, not a news site, so up-to-the-minute topicality is not essential, and (b) our choice of median rather than mean does a good job of excluding outlier figures. Harumphy (talk) 15:04, 21 January 2011 (UTC)[reply]

Median Windows Numbers

I've been playing with the Median numbers, which I had intended to quote in an article I was writing. I'm not quoting them. The numbers do NOT add up. No matter what I've tried, I cannot get those numbers to make any sense, and since there is no explanation of the calculations used to determine the Median, the only conclusion I can draw is that the numbers were either invented, or are in error. So instead of reporting your numbers, I'm reporting them, and my conclusion that this article is in error. If you want to look at the article, which is my prediction for where OS usage shares will be in 2012, it will be at: http://madhatter.ca

One other point - Netbooks should be included as Notebooks, and Tablets should be in a separate category, due to the form factor. Tablets have more in common with phones than they do with notebooks.

UrbanTerrorist (talk) 20:01, 8 January 2011 (UTC)[reply]

The medians are calculated just like any other median, surely? In each column, the median is the middle value of the group. Thus the median of 1, 2 and 999 is 2. (Where there is an even number, it's the mean of the middle two, so the median of 1, 2, 3 and 999 would be 2.5.) Are you saying that our table doesn't do this? If not, what precisely do you think does not add up? Harumphy (talk) 20:08, 8 January 2011 (UTC)[reply]
Are they calculated like any other median? Who knows? There is no explanation as to the method(s) used, and no reference to an explanation. In effect we are given numbers, and told to believe them, which is against the policies of Wikipedia. Either provide an explanation, or remove the median figures.
UrbanTerrorist (talk) 14:49, 11 January 2011 (UTC)[reply]
The Median label links to the Median article, which explains how medians are calculated. Do you consider this not enough? Do you propose adding a footnote with more details on it? Wikiolap (talk) 18:51, 11 January 2011 (UTC)[reply]
The term 'median' has a precise meaning and there's only one way of calculating it. Anyone who wants to can calculate the median herself and get the same figure. The problem here is not the article, but your inability to click on the link to the median page to find out what it means. Harumphy (talk) 13:58, 17 January 2011 (UTC)[reply]
A link to the explanation would make things clearer. UrbanTerrorist (talk) 07:51, 27 January 2011 (UTC)[reply]
And there is the very common problem of confusing median with mean, or believing they share some properties that they do not, like adding up to 100% (under certain circumstances). The different properties are not necessarily easy to understand, and the article is quite heavy for somebody not used to mathematical theory. --LPfi (talk) 10:49, 18 January 2011 (UTC)[reply]
Agreed. Which is why an explanation, or a link to an explanation is needed. UrbanTerrorist (talk) 07:51, 27 January 2011 (UTC)[reply]
I should mention that I'm using the numbers for some articles that I'm writing, and I want to make sure that they are as accurate as possible, thus the questions. And yes, I link back to the source.
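
To illustrate the two points raised in this thread (how a median is calculated, and why per-column medians need not add up to 100%), here is a minimal Python sketch. The first two calls use the worked examples quoted above; the three "sources" and the OS names A, B and C are hypothetical figures invented for illustration and are not taken from the article's table.

```python
from statistics import median

# Worked examples quoted in the discussion above.
odd_case = median([1, 2, 999])       # odd count: the middle value
even_case = median([1, 2, 3, 999])   # even count: mean of the two middle values

# Hypothetical shares (percent) reported by three sources.
# Each source's own figures sum to 100.
sources = [
    {"A": 50.0, "B": 30.0, "C": 20.0},
    {"A": 60.0, "B": 10.0, "C": 30.0},
    {"A": 40.0, "B": 35.0, "C": 25.0},
]

# Median taken column by column, as a summary row would do.
column_medians = {
    name: median(row[name] for row in sources) for name in sources[0]
}
medians_total = sum(column_medians.values())

print(odd_case, even_case)   # 2 2.5
print(column_medians)        # {'A': 50.0, 'B': 30.0, 'C': 25.0}
print(medians_total)         # 105.0
```

In this made-up case every source sums to 100, yet the row of medians sums to 105, which is the property LPfi describes: a median row is not itself a set of shares.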

What gives with server market share BY REVENUE.

Revenue doesn't measure server market share, it measures how much money the supplier of one type of server rakes in. It also measures how much purchasers have had to pay for servers of that type ... both of these are the same number. Because some types of server cost far more than others, the metric skews the perception of market share towards whichever servers cost the most. These are perceived as having a greater market share, even though they may have relatively small numbers.

Furthermore, there are far more purchasers of servers than there are suppliers. Ergo, from the overwhelmingly predominant perspective, it is better to label this number as COST rather than revenue. Far more people would see it as a cost as opposed to those who see it as revenue.

So, I will keep trying to change the word "Revenue" within the server market share section to read "Cost" instead, until it sticks, because this wording gives the proper perspective on it from the vast majority viewpoint.

Alternatively, one could remove the "Revenue" metrics entirely, because they simply do not show server market share as they purport to.—Preceding unsigned comment added by 118.210.63.179 (talk) 10:46, 9 January 2011

Please sign your comments - otherwise we won't know who said what. Please see Wikipedia:Signatures.
Before you 'keep trying to change the word "Revenue" etc.' please read Wikipedia:Edit_warring.
As you've said, revenue and cost refer to the same number. It's the same sum of money seen from the two sides of the deal. The sources we're citing measure sales, not purchases. So revenue is the more accurate term in this context. Harumphy (talk) 13:59, 9 January 2011 (UTC)[reply]
Why include market share by revenue figures at all? They're useful for stock investors (i.e. which OS is generating the most revenue from a given market), but they will deceive casual readers expecting to learn about the "Usage share of operating systems on servers". Wallers (talk) 14:16, 9 January 2011 (UTC)[reply]
IDC and Gartner are well-known sources in the server industry. They report in revenue probably because business people are more interested in the money than in the number of units -- it's what they're used to. Also it's easier to measure, because you can't just check what OS a server is running like you can with desktop web browsers. A server could be hosting several virtual servers (up to 15) and each virtual server can have its own IP address. So without detailed investigation, you'd see 15 servers when in reality there's only one real server. Hence, the real number of servers (as opposed to the number of installs of an OS) is more closely correlated to revenue. It gets complicated, though, because the licensing of Linux servers can be free if the company goes with a distro without 24/7 support (like Debian or CentOS), or it could be costly if paying for a full subscription of RHEL. So, even though Linux is low on revenue, it's still really high in actual usage. Jdm64 (talk) 18:20, 9 January 2011 (UTC)[reply]
Jdm64 wrote it exactly right. There are different metrics to measure market share, and they indeed measure different things. Market share by units is interesting for knowing which server OS is most popular. Market share by revenue is interesting for seeing which server OS vendor is making the most money. So there is no contradiction - both are interesting, both are useful, just for different purposes. Wikiolap (talk) 18:11, 10 January 2011 (UTC)[reply]
"Making the most money" also means "getting the most money out of people for less cost to yourself". "Revenue" is a word with positive connotations in people's minds, whereas "Cost" has negative connotations. "Revenue" and "cost" are the same number ... what is revenue to sellers of servers is cost to buyers of servers. Since there are far more buyers than sellers, it is in the best interests of more people to show market share from the perspective of buyers rather than sellers (that is, to show it as cost rather than revenue). A casual reader might see the OS with the highest revenue and, without thinking about it, associate that positive term with "the best choice", when in fact it is the most costly to him/her. From this perspective, citing statistics for the sale value (price) of servers, labelling it with the positive-sounding term revenue, and claiming that this shows "market share" is doing a severe disservice to most people. In fact, it comes perilously close to free advertising for one company, which I would have thought goes against Wikipedia policy.—Preceding unsigned comment added by 118.210.63.179 (talk) 14:08, 11 January 2011
I think you are stretching the definition of advertising a bit :) Your thinking about positive vs. negative association is interesting, but it is your opinion only. In an encyclopedia we cite verifiable and reliable sources - and both Gartner and IDC qualify as such. They label the metric Revenue, and we must respect that. Wikiolap (talk) 18:54, 11 January 2011 (UTC)[reply]

IDC also report units, so there's no reason to use revenue, and I've changed the numbers to the unit rather than revenue figures. For the Gartner figures, I checked the source, and they appear to be unit figures as well, not revenue figures. Shalineth (talk) 06:36, 10 February 2011 (UTC)[reply]

Can you show where in the source the reported percentages are referred to as units? In the table that we cite, the column headers say Revenue. Wikiolap (talk) 20:53, 10 February 2011 (UTC)[reply]
The Gartner source is a three-year-old Reuters article. The text in the article reads:
According to research firm Gartner, the Windows share of global server shipments gained a percentage point to 66.8 percent in 2007 from a year earlier. Open-source Linux's share fell by a percentage point to 23.2 percent last year and Unix dropped to 6.8 percent in 2007 from 8.1 percent in 2006.
Note that this refers to the share of global server shipments, i.e. units, not to the share of global server revenue. The figures are also similar to IDC unit figures from about the same time, but very different from IDC revenue figures. It was only in 2005/6 or so that Windows servers overtook Unix servers in terms of revenue, but Windows has been ahead of Unix in unit shipments since the 1990s.
The IDC figures in the table are indeed revenue for server hardware (not revenue for operating systems), but it is a methodological error to use this as an indicator of server OS 'usage share'. What possible sense is there in saying that one server costing €20 000 and running one instance of AIX contributes 40 times as much to the AIX 'usage share' as one server costing €500 and running one instance of Linux or Windows?
I had provided a source for IDC unit shipments and corrected the table to include them, but this was reverted.
In a comparison of server revenue or server profitability, prices matter. If HP, for example, are selling a lot of €50 000 servers and Dell are selling a lot of €1 000 servers, that has a huge impact on their respective results. If each server runs only one copy of an OS, however, the 'usage share' is 1 for each server. The idea that multiplying server operating system units by the cost of the hardware the OS runs on somehow represents 'usage share' is completely nonsensical. Shalineth (talk) 16:28, 12 February 2011 (UTC)[reply]
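
The 40-to-1 weighting described above can be checked with a short Python sketch. It uses only the two example servers from the comment (one €20 000 AIX machine and one €500 Linux machine); the two-server "fleet" and the function names are assumptions made for illustration and do not represent IDC's or Gartner's methodology.

```python
# Illustrative fleet from the example above: (operating system, hardware price in euros).
servers = [("AIX", 20_000), ("Linux", 500)]


def unit_share(fleet):
    """Percentage share per OS, counting each server once."""
    counts = {}
    for os_name, _price in fleet:
        counts[os_name] = counts.get(os_name, 0) + 1
    total = sum(counts.values())
    return {os_name: 100 * n / total for os_name, n in counts.items()}


def revenue_share(fleet):
    """Percentage share per OS, weighting each server by its hardware price."""
    revenue = {}
    for os_name, price in fleet:
        revenue[os_name] = revenue.get(os_name, 0) + price
    total = sum(revenue.values())
    return {os_name: 100 * r / total for os_name, r in revenue.items()}


by_units = unit_share(servers)
by_revenue = revenue_share(servers)
weight_ratio = by_revenue["AIX"] / by_revenue["Linux"]

print(by_units)                 # each OS has one of the two servers: 50% apiece
print(round(weight_ratio, 6))   # 40.0
```

Counted by units the two systems are equal; weighted by hardware price the AIX machine counts 40 times as much, which is the discrepancy the comment objects to.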

Overview section

Recently somebody added the overview table, and various editors including me attempted to knock it into shape. I don't think we've been very successful. I can't see where many of the figures come from. The web client medians don't (and shouldn't be expected to) add up to 100% and so don't constitute 'share' anyway. Should we delete this section, or can it be improved? Harumphy (talk) 08:55, 17 January 2011 (UTC)[reply]

I'd lean towards deleting it. Many of the fields are blank because some of the selected OSs come from disjoint usage (i.e. mainframe and smartphones). Although, the one thing going for the table is the quick summary. I'd propose that key stats from the table be included in the opening paragraph. Something to elaborate on the current opening, but with some actual numbers. Jdm64 (talk) 10:32, 17 January 2011 (UTC)[reply]
Does anyone want to speak in the overview section's defence? If not I'll delete it in a couple of days from now. As far as putting figures in the opening paragraph goes, ideally that would be done in a way that doesn't need updating every month. Harumphy (talk) 09:12, 18 January 2011 (UTC)[reply]
I vote to delete it, the idea of this overview table never appealed to me, and it doesn't look like it is helping the article. Wikiolap (talk) 17:26, 18 January 2011 (UTC)[reply]
Now deleted as agreed. Harumphy (talk) 10:14, 21 January 2011 (UTC)[reply]

Tablets

Tablets need to be moved to their own section. They have nothing in common with Netbooks, though they may replace them in some sales. Netbooks should be combined with Notebooks, the difference between them is artificial.

UrbanTerrorist (talk) 07:55, 27 January 2011 (UTC)[reply]

Yes. A netbook is just a small laptop/notebook. I've changed the sub-heading "Netbooks and Tablets" to just "Netbooks". There's very little info on tablets as a category so far. The main one is the iPad, which runs iOS, and this gets covered anyway. Do all tablets speak mobile, or are some of them WiFi only?--Harumphy (talk) 09:20, 27 January 2011 (UTC)[reply]
Sorry Harumphy, I've been busy. Please see the note at the bottom under Long Term Suggestions. UrbanTerrorist (talk) 03:35, 11 August 2011 (UTC)[reply]

Font sizes

I've reverted 1exec1's changes to a couple of tables, in which the font size was fixed at 85% of normal.

Please, if font sizes look too big on your computer, it doesn't mean they look too big on everyone else's. The web is not a wysiwyg medium. You can adjust your browser's normal font size to suit your preferences. I have adjusted mine, and I don't want you reducing the font size on my computer (and everyone else's) just because it looks better on yours!

Besides, from a graphic design point of view it looked awful. --Harumphy (talk) 09:37, 9 February 2011 (UTC)[reply]

Possibly dubious server share claims based on websites

Are there any authoritative sources suggesting that scanning public websites is a reasonable way to estimate server market share? It seems rather dubious to me. For one thing, Linux is well known as a good OS for running web servers (the LAMP stack), so a sample of web servers may not be representative of servers in general, but rather biased towards Linux.

Another problem is that a single server OS can host a large number of small websites, whereas a large website may require several servers, especially if it makes heavy use of SSL. If website characteristics differ across sites, then an estimate of server market share based on the number of websites would be biased towards the system most favoured by smaller, less complex sites. Netcraft report a 50 per cent Windows share for SSL websites (http://news.netcraft.com/ssl-survey/), compared with a 25 per cent share for non-SSL websites, which suggests that estimates based on websites may indeed be biased towards Linux.

Unless an authoritative source suggesting that counting the number of public websites using a particular server OS is a valid way of estimating server OS market share, this looks like original research, and I suggest it be deleted. A separate section on web server OS share might be reasonable. Shalineth (talk) 06:55, 10 February 2011 (UTC)[reply]

Netcraft is a reliable and verifiable source, and we properly reference it. They choose to analyze OS share of web servers, and we accurately mention it. Hence this is not original research. Some readers may or may not agree with their methodology, but it is really not up to us to make judgments on whether or not we like it. As an encyclopedia we report by citing reliable and verifiable sources. Wikiolap (talk) 20:51, 10 February 2011 (UTC)[reply]
Netcraft present website statistics, not server OS market share. You're confusing two different things, and that's where the original research lies. It's like looking at valid statistics for flights out of a particular airport and claiming that represents airliner market share. Shalineth (talk) 11:13, 12 February 2011 (UTC)[reply]
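
The concern raised in this section, that counting websites is not the same as counting servers, can be made concrete with a small Python sketch. Everything in it is hypothetical: the two operating systems "OS-A" and "OS-B", the site counts and the server counts are invented to show the direction of the bias and are not drawn from Netcraft or any other source.

```python
# Hypothetical deployments: (operating system, websites hosted, physical servers used).
deployments = [
    ("OS-A", 1, 1000),   # one large site spread across many servers
    ("OS-B", 200, 1),    # many small sites sharing a single server
]


def percentage_share(rows, column):
    """Share per OS of the chosen column (1 = websites, 2 = servers)."""
    totals = {}
    for row in rows:
        totals[row[0]] = totals.get(row[0], 0) + row[column]
    grand_total = sum(totals.values())
    return {name: 100 * value / grand_total for name, value in totals.items()}


site_share = percentage_share(deployments, 1)
server_share = percentage_share(deployments, 2)

for name in ("OS-A", "OS-B"):
    print(name, round(site_share[name], 1), round(server_share[name], 1))
```

With these invented numbers OS-B has over 99% of the websites while OS-A has over 99% of the servers, so a website count on its own says little about how many machines run each system.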

Marketshare Servers based on websites is totally misleading

The current server share language is complete nonsense. It somehow asserts that web servers are a good indicator of server share. That is utter nonsense. The vast majority of servers are not web servers.

There have been edits made to temper the uncited claims in this section, but they have been reverted.

It's clear that this section is merely a +POV apology for certain products. It's clear that wikipedia is being used as a promotional tool for certain products. —Preceding unsigned comment added by 173.206.8.177 (talk) 18:59, 11 February 2011 (UTC)[reply]

There are two issues here:
1. The explanations in the Server section are indeed unreferenced, and this invites people to add even more unreferenced material. I am OK with completely removing the text, but past experience shows that someone will add it again anyway. The better approach is to find reliable and verifiable references for that portion (something I wasn't able to easily find myself).
2. Methodology of measuring market share. We as an encyclopedia should not pass judgement on whether some methodologies are "complete and utter nonsense" or not. Some people claim that measuring revenue is nonsense, some claim that measuring web servers is nonsense. Everybody is free to form their own opinions - but we as an encyclopedia just report on what our sources say - Gartner, IDC, Netcraft etc.
Wikiolap (talk) 20:12, 11 February 2011 (UTC)[reply]
Re: #2; "We as encyclopedia should not pass judgement on whether some methodologies are "complete and utter nonsense" or not." I agree. But in this context, presenting only publicly available webservers -- in a conversation about Server OS Marketshare **is**.
The discussion re: web server marketshare is irrelevant here, in this context. All the uncited material from the above paragraph and the data in the "method units (web)" table should be removed. It is intentionally misleading and utter nonsense to compare oranges to bananas in this way.
173.206.8.177 (talk) 20:29, 11 February 2011 (UTC)[reply]
Could you tell us why exactly you want to remove the web units data? 1exec1 (talk) 22:56, 11 February 2011 (UTC)[reply]
We have had, at various times, three different methods of counting servers in this section: unit sales/revenue/web servers. All three have strengths and weaknesses - there is no clear-cut right and wrong here. We should just report all three, perhaps in three separate tables, pointing out the strengths and weaknesses of the methods too.--Harumphy (talk) 00:00, 12 February 2011 (UTC)[reply]
Reporting all three in separate tables sounds like a good first step. Based on the source, however, the Gartner figures are units, not revenue, so that's wrong to start with (I corrected the mistake, but someone reverted it with no explanation). Reporting server market share in both units and revenue would be the best idea.
Website share should be split out into a separate category, since it's a completely different issue from server OS market share. The text is also horrible, and should probably be deleted. I made some minor improvements to make it less POV, but those were reverted too.
If you oppose splitting the website figures into a separate category, can you point to an authoritative source that claims Netcraft's website survey has anything at all to do with server market share?
Overall, it's obvious someone is abusing the article to promote a particular POV. I'm not really interested enough to bother with it, but maybe someone with more time on their hands can correct this. If not, I suppose it'll be another case where the reputation of Wikipedia is damaged by a zealot pushing a particular POV and reverting any corrections or attempts to make the text NPOV. Shalineth (talk) 11:32, 12 February 2011 (UTC)[reply]
Web servers are a subset of servers-in-general, so I think the best thing to do would be to do units and revenue in two tables, then add a sub-heading "Web servers" with the third table in that new sub-section.--Harumphy (talk) 13:45, 12 February 2011 (UTC)[reply]
Yes, certainly, but web sites are not the same as web servers. I imagine it takes an enormous number of servers to run www.facebook.com, for example, whereas even a small server could run hundreds of very simple sites. The fact that web servers are a subset of total servers is a minor problem. The bigger problem is that there is no one to one correspondence between web sites and web servers, much less between web sites and either servers generally or server OS installations. This means that, barring authoritative evidence to the contrary, web site numbers cannot be considered valid estimators of even web server OS market share, much less overall server OS market share. Shalineth (talk) 14:50, 12 February 2011 (UTC)[reply]
3. The definition of "server" is broad. Web server market share might be estimated via the web, but estimating file server market share (down to NAS) that way is a challenge. --95.117.233.197 (talk) 13:59, 12 February 2011 (UTC)
4. The conjecture that IDC or Gartner figures substantially underestimate Linux or open source servers is logically unsound. As documented here, IDC unit figures for server shipments include Windows, Linux, Unix and other. Servers sold with no operating system would thus fall into the other category. However such servers make up only about 0.3% of the total (for Q1 2010). This implies two things:
1. The Windows and Unix market shares, 75.3% and 3.6% respectively in Q1 2010, are minimum market share levels, and do not overstate market shares for shipped servers.
2. The Linux market share is not substantially understated. Even if Linux is installed on every single server that didn't ship with either Windows or Unix, its Q1 2010 market share would only increase from 20.8% to 21.1%.
In light of the above, I suggest that the unsupported conjecture that IDC numbers understate open source server operating systems be deleted from the article, unless authoritative evidence to the contrary is provided. Shalineth (talk) 14:50, 12 February 2011 (UTC)[reply]
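The arithmetic behind point 4 can be checked with a short sketch. It uses only the Q1 2010 percentages quoted in the comment above; the variable names are illustrative, and treating the whole remainder as possible Linux installs is the worst-case assumption the comment itself makes.

```python
# Q1 2010 IDC server unit shares quoted in the comment above (percent).
shares = {"Windows": 75.3, "Unix": 3.6, "Linux": 20.8}

# Whatever is left over is the "other" category, which is where servers
# shipped without an operating system would be counted.
other = round(100.0 - sum(shares.values()), 1)

# Worst case for the "understated" argument: assume every server in
# "other" ends up running Linux.
linux_upper_bound = round(shares["Linux"] + other, 1)

print(f"other = {other}%, Linux upper bound = {linux_upper_bound}%")
```

Under that worst-case assumption the Linux unit share moves by only the size of the "other" category, which is the point the comment is making.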

Suggestions for correcting server market share section

I propose the following corrections to the server market share section:

  1. Remove unsupported text claiming that IDC/Gartner figures understate open source OS share.
  2. Remove irrelevant web site share figures for possible inclusion in a separate section on website OS shares.
  3. Correct labelling of Gartner unit figures, which are currently mislabelled as revenue figures.
  4. Replace methodologically incorrect IDC server hardware revenue figures with methodologically correct server unit figures.

I probably shan't have time to check the page before next weekend. Comments appreciated. Shalineth (talk) 16:35, 12 February 2011 (UTC)[reply]

1 - I support removing all unreferenced claims.
2 - Measuring market share of web sites is a valid method that at least 3 different sources use (Netcraft, securityspace, w3tech) - we should not remove legitimate, reliable and verifiable sources. We already have some text which tries to clarify the difference between methodologies. Maybe this text could be improved, but it should not have unreferenced claims either (see #1).
3 - Gartner reports revenue. The source is reliable and verifiable, but not public - the report itself costs money. I had access to it a couple of years ago; I will try to get access again and verify that it is indeed revenue.
4 - IDC reported market share by revenue, and it is a perfectly valid methodology (IDC is a reliable and verifiable source). I used to have an additional line in the table for IDC numbers by unit, but it was removed by other editors. I will be happy to add it back.
Wikiolap (talk) 00:46, 13 February 2011 (UTC)[reply]
Remove unreferenced claims. Report web site share in a new section, separate from the server section. Report both units and revenue in separate tables with correct labelling, even if there's only one cited source.--Harumphy (talk) 11:08, 13 February 2011 (UTC)[reply]
If we separate website and server-share reports, we had better not include the website share at all. I suggest reordering the current table so that sources reporting website share are grouped together. We can also introduce one more column that says which method was used to acquire the statistics. Also see my answer below. 1exec1 (talk) 15:20, 13 February 2011 (UTC)
I disagree. The article already has a separate section for 'Web clients', which is distinct from the sections for 'Desktop and laptop computers', 'Netbooks' and 'Mobile devices'. The consistent approach for servers would be to have a section for 'Web sites', which is distinct from 'Servers'. Shalineth (talk) 21:09, 21 February 2011 (UTC)[reply]
2. Measuring market share of web sites is a valid way of measuring web site share. This is an article about server OS usage. Is there an authoritative source claiming that measuring web site share is a valid way of measuring either web server share or web server OS share? If not, I suggest it belongs in its own section (or perhaps own article) -- an article about web site market share, as opposed to (web) server OS market share. Again, I must stress, these are not synonymous. It is a severe methodological error to assume they are. Shalineth (talk) 12:19, 13 February 2011 (UTC)[reply]
3,4. Gartner and IDC report both revenue and units, although not all reports contain both measures. Revenue is a valid measure for market share, which can be defined in terms of either revenue or units. This article is about usage share, which implies units. Second, the revenue figure is for servers, not server OSes. That would be fine in an article about server hardware market share, but this is an article about server OS usage share. Again, the figures are absolutely valid, but they're being used incorrectly in this article. Shalineth (talk) 12:19, 13 February 2011 (UTC)[reply]
@Shalineth: Website share is a proxy to the actual server OS usage share in the same way as inspecting user agent strings is a proxy to desktop OS market share. If you consider them not appropriate, then sources reporting server market/unit share are not appropriate reference points either, as they report the current sales, not the share of already deployed servers.
In conclusion all sources used in the article are biased in one or another way. Since we are only presenting and commenting the data, not interpreting it, all sources must have the same credibility, unless there is a strong reason not to do so.1exec1 (talk) 15:20, 13 February 2011 (UTC)[reply]
This is true. Sales by hardware units and sales by hardware revenue are also proxies for OS usage share. None of the three methods correlates directly with OS usage share, but all three are of interest nevertheless. We should just report what the sources say, accompanied by a concise summary of the strengths and weaknesses of each method. It is for the reader to decide how much credence to give to each method, not us. --Harumphy (talk) 09:57, 14 February 2011 (UTC)
@ 1exec1
It isn't quite the same thing, since there's usually a 1:1 mapping of web clients to client OSes. For web servers, a single server can run a huge number of websites, and at the other extreme, some websites require large server farms. All this means that the approximation is much closer on the client side. In any case, I think it's perfectly reasonable to include web site OS share, as long as it's properly labelled as 'Web site OS usage' and not conflated by original research with 'server OS usage'.
The same applies to 'server OS unit shipments' and 'server hardware revenue'. It's fine to include them both, as long as it's made very clear what they are, and 'server hardware revenue' isn't mislabelled as 'server OS revenue' or 'server OS usage'. What actually brought this article to my attention in the first place was confused comments by Linux advocates who thought 'revenue' in this article referred to software vendor revenue, not to server hardware revenue, and were going on about how most users don't pay for Linux so revenue figures are invalid, etc. The section on server OSes is very unclear about these things, and looks like a clear case of misrepresentation of data (not necessarily intentional -- though the unreferenced comments suggest it is). The data are valid, but are being misused. Shalineth (talk) 21:09, 21 February 2011 (UTC)[reply]
This sounds like a consensus to me - we keep the valid data in the article, but relabel it to disambiguate what it actually means. I would support this effort.Wikiolap (talk) 23:51, 21 February 2011 (UTC)[reply]
It sounds like consensus to me too.--Harumphy (talk) 13:34, 27 February 2011 (UTC)[reply]

Time limit for out-of-date sources

There was some discussion earlier in Talk:Usage_share_of_operating_systems#Should_we_remove_AT_Internet_Institute_from_web_client_stats.3F. I think it's fair to say there's a consensus that we should apply the same time limit, whatever that limit is, to all the sources. At the moment it's 12 months. Someone suggested we should reduce it to 6 months. (If we did that then ATII would get removed on 1st April if they haven't updated by then, because they last updated on 31/9/2010.) So, should we cut the time limit to 6 months?--Harumphy (talk) 13:34, 27 February 2011 (UTC)

I think yes. The previous discussion was stopped by the fact that Wikipedia doesn't update either. As the problem has since been solved, I know of no reason to keep a single old source that skews the data. 1exec1 (talk) 17:11, 27 February 2011 (UTC)
6 months seems reasonable to me. Jdm64 (talk) 02:21, 28 February 2011 (UTC)[reply]
FYI ATII has just updated. They must have heard us!--Harumphy (talk) 16:02, 1 March 2011 (UTC)[reply]
Yes, and I'm extremely disappointed with them. As you can see in the more detailed PDF, they treat Android as the "Google Operating System", as if it were not Linux, which gives inaccurate data for this table... 89.181.106.123 (talk) 00:29, 2 March 2011 (UTC)

Mobile Devices Citation

Caption on image currently reads "Share of 2010 Q4 smartphone sales to end users by operating system, according to Gartner", followed by a citation.
The numbers in the pie chart are not contained within the cited article. The cited article was written on 19 May 2010, and reports on 2010 Q1 numbers.
Caption should be revised to cite an article containing the numbers used on the pie chart, or the pie chart should be changed to reflect the numbers in the cited article. Mismatches are bad, mmmkay?
64.113.8.130 (talk) 22:55, 4 April 2011 (UTC)[reply]

Web clients - remove sources

Both AT Internet and StatOwl ignore mobile clients in their reports (well, in fact AT Internet notices the existence of iOS but doesn't consider Android worth counting, nor as a Linux "variant", StatOwl just ignores them). That makes the rest of the values inflated, so comparing the numbers from these two sources with the rest isn't a fair comparison. Thus, I propose for us to just stop taking into account both these sources, until they start reporting (or taking into account in their reports) the existence of mobile web clients. 195.23.131.230 (talk) 15:58, 12 April 2011 (UTC)[reply]

AFAICS AT Internet includes Android and a number of things under 'other'. This is perfectly OK for our purposes. StatOwl is more of a problem because they just take desktop OSes with above 0.1% share and expand the numbers so they add up to 100%. This is inconsistent with the rest of our table and there's no easy way of fixing it. So I think we should keep AT but I've no objection to removing StatOwl if that's where the consensus is.--Harumphy (talk) 07:17, 13 April 2011 (UTC)[reply]
AT Internet: The fact that AT Internet puts things under "other" that we don't makes our "other" data erroneous, because what fits in there for AT Internet does not fit in there for us. The only ways we can be correct about the data we're dealing with are either removing AT Internet as a source, or also putting things like Android under other, like they do. So we actually have three different choices: 1) being wrong (as we are now), 2) removing one source (and thus reducing the accuracy of the data we're presenting), or 3) putting Android under other, which I honestly don't like, since Android is technically Linux, so the numbers of "Linux" would be "some Linux", which would cause confusion... 195.23.92.1 (talk) 16:07, 8 August 2011 (UTC)
StatOwl - I vote on removing StatOwl, since the fact that they don't have an "other" makes their data meaningful only in comparison between those OSs they have stats on. It might be interesting data, but it simply doesn't fit on what we're trying to represent in this table. 195.23.92.1 (talk) 16:07, 8 August 2011 (UTC)[reply]
I oppose removing StatOwl - they are valid reliable and verifiable source. We could add note explaining their methodology if more explanations is needed, but not to remove this source.Wikiolap (talk) 17:34, 13 April 2011 (UTC)[reply]
They are reliable and verifiable, yes, but they're not measuring the same thing we're representing in that table. They represent the share between a list of OSes, while we're representing the share between all OSes (thus the "other" column). They don't give us enough data (an other column, for instance) to even find out the real percentage of those OSes they're representing, so their numbers, while interesting, simply don't have enough info to fit in our table. Putting them there, as they are nowadays, just adds known-yet-unmeasurable error into the table... 195.23.92.1 (talk) 16:07, 8 August 2011 (UTC)
I agree with your comment higher up this section about the ambiguity of our 'other' column. It isn't immediately obvious that what we count under 'other' varies from source to source. Rather than eliminate a source because it doesn't fit our idea of 'other', it would be better to eliminate the 'other' column from the table. There's nothing wrong with AT as a source. StatOwl, on the other hand, is more problematic because it only covers desktop OSes with >0.1% share and then expands them to fill 100%. So I think we should keep AT, dump StatOwl and dump the 'other' column. --Harumphy (talk) 22:10, 9 August 2011 (UTC)[reply]
I concur with the idea of removing StatOwl. However, I don't think that dropping the 'other' column is a good idea unless all the rows add up to 100%. Since that column is defined as 'whatever doesn't fit into the current columns', or simply '100% - sum of the columns', it will be implied even if we dump it. So I don't see the point in doing that. The above-mentioned issue of AT not using the same 'other' definition as ours can be solved by merging the problematic cells into one for now. 1exec1 (talk) 00:18, 11 August 2011 (UTC)
Sorry, I don't get that last bit. What do you mean by the "problematic cell" and what are you suggesting we merge it into? --Harumphy (talk) 10:28, 12 August 2011 (UTC)[reply]
I meant doing something like this:
Source | Date | Microsoft Windows: 7 | Vista | XP | All versions | Apple: Mac OS X | iOS | Linux kernel based: GNU/Linux | Android | Symbian, BlackBerry OS, Other (merged cell)
AT Internet [4] | Apr. 2011 | 28.8% | 16.4% | 42.1% | 88.4% | 6.9% | 2.8% | 0.9% | 0.5% | 0.5%

1exec1 (talk) 09:30, 14 August 2011 (UTC)[reply]

Doing so is fine by me, as long as we do the same for the median... 195.23.92.1 (talk) 19:28, 17 August 2011 (UTC)[reply]
It seems a bit messy to me, especially if we do the same for the median. Overall I don't think it's an improvement.--Harumphy (talk) 10:01, 18 August 2011 (UTC)[reply]
Since dumping the "Other" column without adjusting the percentages would be odd (lines wouldn't add up to 100%) and this solution is messy, would you accept a solution where the "Other" column would be simplified (and we would add the Symbian and Blackberry values to the "other" column)? Feel free to add your comment and also your vote in the "Vote Count" section for this alternative 195.23.92.1 (talk) 14:21, 18 August 2011 (UTC)[reply]
If we're keeping the 'Other' column, I'm in favour of leaving it as it is. The fact that for one source it includes Symbian and Blackberry is a very minor irritation which I can live with more easily than any of the proposed remedies.--Harumphy (talk) 15:23, 18 August 2011 (UTC)[reply]
Shouldn't we at least put some kind of notice in the footnotes, then? Keeping it "as it is" results in wrong data for "Other" (more than it should be) and for "Symbian" and "Blackberry" (less than they should be). Another thing we could do (messy, but more agreeable to me) would be to do what is done for Clicky and proposed for StatOwl, and "calculate" what percentage of that "Other" is "Symbian" and "Blackberry", taking the other sources as a reference. 195.23.92.1 (talk) 14:43, 19 August 2011 (UTC)
How about this: we replace '---' with 'n/a', change heading 'Other' to 'Other inc. n/a' and add a footnote saying n/a = data not available from source --Harumphy (talk) 21:37, 19 August 2011 (UTC)[reply]

StatOwl - further thoughts

Looking at the above discussion, I can see that I've blown hot and cold on StatOwl over the months. The only real problem I have with StatOwl is that it ignores mobiles and inflates desktop share to 100%. We have a related problem with Clicky Web Analytics, which produces separate stats for desktops and mobiles, which we multiply by roughly 0.94 and 0.06 respectively to get the figures for our combined table. (I work out the exact figure each month from the mean of two sources as explained in the footnote.) If we used the same correction for StatOwl - i.e. multiply its figures by the desktop factor of 0.94-ish, that would eliminate the inconsistency. Naturally this would have to be explained in the footnote. I've suggested this in the past, but not got support for it, so for many months the status quo has been that applying this kind of correction based on figures from two other sources is somehow OK for Clicky but not OK for StatOwl. It seems to me that we should be consistent here, so please comment here and see the further vote option below.--Harumphy (talk) 08:10, 19 August 2011 (UTC)[reply]
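For anyone trying to follow the proposed correction, here is a minimal sketch of it. All the numbers are made up for illustration (they are not taken from StatOwl, Clicky, Net Applications or StatCounter), and the function names are hypothetical. It assumes the desktop factor is the mean of the desktop shares reported by two reference sources, as described above, and that the remainder after scaling is what would land in the 'Other' column.

```python
def desktop_factor(reference_desktop_shares):
    """Mean desktop share (as a fraction) across the reference sources."""
    return sum(reference_desktop_shares) / len(reference_desktop_shares)


def rescale_desktop_only(desktop_only, factor):
    """Scale a desktop-only breakdown (which sums to 100%) by the desktop
    factor, and put whatever is left over into 'Other'."""
    scaled = {name: round(pct * factor, 2) for name, pct in desktop_only.items()}
    scaled["Other"] = round(100.0 - sum(scaled.values()), 2)
    return scaled


# Hypothetical inputs: two reference sources put desktop traffic at 95% and 93%.
factor = desktop_factor([0.95, 0.93])

# Hypothetical desktop-only breakdown in the style StatOwl publishes.
statowl_like = {"Windows": 88.0, "Mac OS X": 10.0, "Linux": 2.0}

combined = rescale_desktop_only(statowl_like, factor)
print(factor, combined)
```

With these invented inputs the factor comes out at about 0.94, so every desktop figure shrinks by roughly 6% of its value and the leftover 6 points go to 'Other'. The footnote would still need to say that the 'Other' figure for such a source is derived rather than reported.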

If you do so, you won't have data to fill in the blanks: if you put the rest of the percentage to reach 100% in "Other", then that data is wrong (because you have Android, iOS, Blackberry and Symbian's percentages there); if you also don't fill "Other", then the table will be... strange looking, even if more accurate. Of these two choices I still prefer the third (ditch StatOwl), but if that's not what will happen... which of the two solutions up there do you propose? 195.23.92.1 (talk) 14:35, 19 August 2011 (UTC)
I think the other 6% or so should go in the 'other' column. The footnote explains how the 'other' column is calculated so I don't have a problem with it.--Harumphy (talk) 21:24, 19 August 2011 (UTC)[reply]
Three in favour, none against, so I've implemented this. --Harumphy (talk) 14:40, 24 August 2011 (UTC)[reply]

Vote Count

We're discussing two sources in particular, and they have different suggestions... Here's a summary of the votes we can see by reading the discussion (please update this if you add something to the discussion):

StatOwl - Let's remove it!

AT Internet - let's merge the "Others" cells!

AT Internet - let's count "Symbian" and "Blackberry" as "other"!

Apply desktop/mobile split (mean of Net Applications and StatCounter Global Stats figures) to both Clicky and StatOwl

  • Yes, apply to both - 3 votes - Harumphy, Jdm64, 195.23.92.1
  • No, apply to neither - 0 votes
  • Apply to Clicky but not StatOwl (status quo) - 0 votes

Linux table headings

For clarity and consistency between sections, I suggest we change the top-level heading in both the web client and mobile device tables from "Linux" and "Linux based" respectively to "Linux kernel based", and change the second-level heading in the web client table from "mainstream" to "Linux". --Harumphy (talk) 19:34, 11 May 2011 (UTC)[reply]

I don't think that's the best solution. For one "Linux kernel based" is a long title. Second, I think it would be confusing. What's the difference between "Linux" and "Linux kernel base"? I understand what you're trying to say, but would others? I think it's fine how it is, or possibly, "Linux" as the top heading (or "Linux based") and then sub-headings of "GNU/Linux" and "Android/Linux". Jdm64 (talk) 22:14, 11 May 2011 (UTC)[reply]
Linux has two meanings: (1) the Linux kernel, and (2) the family of operating systems based around it, which are largely binary compatible with each other and traditionally known as Linux distributions. Then there is Android, which uses a forked Linux kernel, is binary incompatible with Linux distributions and has a stack sitting on the kernel which is very different from anything else. The only thing that Android has in common with Linux distributions is the kernel, and that is a heavily modified, incompatible derivative. I am aiming to better reflect the two meanings, and to deal with the fact that within a couple of months or so it looks as though Android will be more mainstream than the stuff we currently call "mainstream". As far as length goes, "Linux kernel based" will fit without expanding column width. (I've tried it.) I don't think we should use GNU/Linux or Android/Linux as they really are too long, don't reflect what the sources say and do not aid understanding at all. --Harumphy (talk) 08:05, 12 May 2011 (UTC)
I am more confused by "Linux kernel based" vs "Linux based" as they may be understood as synonyms and anything Linux based is certainly Linux kernel based. We must of course use terminology that reflects what the sources are talking about, but isn't most Linux except Android indeed GNU/Linux (which is not longer than "Linux based")? If there is significant use of other Linuces (affecting the decimal points we are writing out) simply "Android" and "Other Linux" should do. --LPfi (talk) 11:51, 12 May 2011 (UTC)[reply]

[section break]
Just to be clear, I'm suggesting this:

Linux kernel based
Linux Android

The top line is an umbrella heading that accurately reflects the only thing that Linux distributions and Android have in common: some sort of Linux kernel. In the second line, Linux means what it is most commonly understood to mean - a Linux distribution. In this I'm taking the view that Android is *not* a Linux distribution in the conventional sense because it has so little in common with Debian, Ubuntu, Fedora, RHEL, SuSE etc. All of the stats sources except Wikimedia separate Linux and Android in this way. --Harumphy (talk) 12:40, 12 May 2011 (UTC)[reply]

Like LPfi said, anything Linux based is surely Linux kernel based; this is like how Linux is a Unix-like OS. Your headings look redundant, especially to somebody that doesn't know about Linux, and they don't make somebody want to learn what the distinction is. I think the layout below clearly shows the distinction between normal Linux and Android. "Linux based" is a link to "Linux kernel". "GNU/Linux" could be 2 separate links to GNU and Linux or one link to Linux Distribution. How is that not simple and clear? Jdm64 (talk) 20:22, 12 May 2011 (UTC)
Linux Based
GNU/Linux Android
The phrase "Linux based" is no more informative than just "Linux", because it doesn't make clear which of the two things called Linux forms the base. It could mean either just the kernel or the kernel plus the stuff that makes a Linux distribution. So, to answer your question, it's not simple and clear because it's ambiguous. Sure, the kernel's always there, even in Android, but the other stuff isn't. By excluding the word kernel, it doesn't make it clear that Android is based on only the kernel and not the other stuff. The 'umbrella' heading should reflect what the things under it have in common. They have only one thing in common: the kernel. That is why the k-word is the key to comprehension here. --Harumphy (talk) 23:38, 12 May 2011 (UTC)
Ok, fine, include kernel. But that still doesn't remove the confusion about "Linux kernel based" and "Linux". It should be "GNU/Linux" to show how Linux kernel based is different than Linux. Jdm64 (talk) 01:24, 13 May 2011 (UTC)[reply]
Fair enough. Thanks. I'll settle for that. --Harumphy (talk) 07:13, 13 May 2011 (UTC)[reply]

I believe Android should be reclassified as a mobile device. See my comments there. hhhobbit (talk) 14:24, 5 June 2011 (UTC)[reply]

Count Amazon Kindle?

Amazon Kindle was reported to likely break 8 million units sold last year. http://www.slashgear.com/amazon-likely-to-break-8-million-kindle-units-sold-this-year-21120580/

With quite a few media being sold: http://news.cnet.com/amazon-kindle-books-outselling-all-print-books/8301-17938_105-20064302-1.html Better data is likely available. Seems these are significant numbers. --89.12.7.116 (talk) 20:57, 26 May 2011 (UTC)[reply]

This page is about usage share of operating systems, not devices. The OS that the Kindle uses is Linux, so if we were to add it, it would only be a small side note that the Kindle runs Linux. I think it's more appropriate that the information be added to Linux-based devices. Jdm64 (talk) 00:16, 27 May 2011 (UTC)[reply]

I have written this about thirty times and each time started over. I would like to do that again right now. Saying Kindle is Linux is like saying Mac iOS is OS-X, or OS-X is FreeBSD. Mac OS-X uses launchd to start everything. Except for a few things that init starts, init is basically something that all other processes have as their parent if they lose their immediate parent. launchd does not work the same way. Is OS-X's launchd the same thing as init in Unix / Linux? No. The same thing is occurring with these mobile OS. One mobile OS has the distinction of being derived from nothing but being its own little entity from the start - Blackberry. All the other mobile OS are diverging so far away from what they were derived from that the code base is becoming meaningless. iOS really is that different from OS-X. But each OS is really not just the kernel. It is all of the things that go together, including the hardware, that make up that system. Unless you want to have a separate category for each of these mobile OS I suggest you lump them all together with the category mobile OS. They have more similarities with each other than they do with what they were derived from. Apple has joined Windows in having malware that self installs now with no password required on Macintosh OS-X as long as the user account you are using has administrator privileges. It has the promise of continuing that way unless Apple finally wises up and begins requiring a password for software installs for all OS-X users. May I humbly suggest these malware problems are making a lot of people mobile OS only users? But you have been caught napping. Apple sold more iPhone and iPad systems in the last two quarters than they did OS-X. The malware problems with the predominant desktop systems combined with Twitter and other things are making many current desktop OS systems dinosaurs. So I suggest you have a separate mobile OS category with maybe a breakdown showing what each was derived from.
But the malware problems of the predominant desktop systems are rapidly making mobile OS the tour de force of the future. Would I have predicted that two short years ago? No. I was also caught napping. It is rapidly progressing toward a future where many people will be mobile OS only users, storing their data in the cloud (data storage repositories) and printing to new printers that use Bluetooth. Any general mobile OS that doesn't make provisions to share the data that was created on it with a different general mobile OS from another vendor will rapidly become a relic of the past. IMHO, your current classification scheme was what was there in the past and what we have now is becoming increasingly incongruent with what you have. You are missing what has been happening with these mobile devices. Mobile OS are rapidly becoming the OS of the future. The fact that 8 million Kindle units have been sold indicates that things are changing. Did we have eight million new installs of desktop Linux systems last year? No. Your percentages are woefully out of date, but mostly because your categorization is wrong. Kindle is not Linux. iOS is not Macintosh / OS-X. They are now separate entities with very little similarity to what they were derived from. hhhobbit (talk) 02:57, 6 June 2011 (UTC)

The problem is I still don't know where the data would fit on this page given the current sections. I'm not saying the information is unimportant, just not suited for this page. This page is still about OSs, and the OS of the Kindle is Linux kernel based. It's just not a traditional desktop distribution. Similarly iOS is based on the Darwin OS, just like MacOSX. Jdm64 (talk) 20:41, 6 June 2011 (UTC)[reply]
To me, the more important issue (and the answer is not clear to me re Kindle), is under what circumstances should a device's OS fit into this article. In some sense, every automobile with a computer chip has an OS in it (a real-time kernel of some kind), but I doubt that fits the intent of this article. Kindle has a Linux kernel. How much else of what we think of as "an OS" does Kindle have? If it weren't for Kindle's ability to browse the web, it would be a single-purpose dedicated device, not really different (IMHO) than the smarts in an automobile - or a microwave oven for that matter. My point is, where to draw the line? Again, I don't know the answer to that. Perhaps a section for devices that can't download apps. If Kindle is included, then so should the 300 M NON-smart phones sold last quarter be included. (a variety of proprietary "OS"s) ToolmakerSteve (talk) 04:37, 21 August 2011 (UTC)[reply]

Long Term Suggestions

I was looking at the discussion, and at the article. What struck me is that the current active discussion topics seem to be discussing the different facets of the same issue, and I think that we should look at combining them. Problem is that since you sometimes link to me, I can't work on the page :) I can however make suggestions. My apologies if the formatting is a bit rough. Formatting on discussion pages drives me to distraction sometimes.

  • Technology Types

I think we can limit things to three technology types:

Personal Computers - Desktops, Notebooks, Netbooks, Laptops, Nettops, in other words any stand alone computing device which is designed to be used by a single user and which has a full-sized keyboard.
Mobile Devices - Tablets, EReaders, MP3 Players, Phones, in other words any stand alone computing device which is designed to be used by a single user and which, while it may have a keyboard, will not have a full-sized one, or will have an on-screen keyboard. Optional Bluetooth or USB keyboards do not count as they are not part of the basic device and it is designed to function without a keyboard.
Servers - in other words any computing device which is designed for multiple user use, either over a network, or through direct connection as was once common. Servers include all computing devices which are not stand alone such as Desktop Client units. Mainframes and Supercomputers are effectively specialized Servers.
  • Numbers

This gets fun. No matter what is done no one will be happy. I'd rather be too expansive here though. While it's difficult to be certain about reliability of any numbers below 5%, the fact that something shows up is of interest. Part of the problem is that everyone wants the numbers to favor them. This puts us in opposition to them, because we want the numbers to be accurate and favor no one.

Unless there is solid evidence that the numbers from a supplier are inaccurate we need to show them. If an analyst or investigator is able to come up with evidence that there is a problem we need to provide a link to it with a note that this supplier's numbers are questionable.

When we are displaying numbers, we need to make sure that the numbers are from the same time period. In the Server Usage Share we have dates of 2007, Jan. 2009, July 2009, September 2010, and Q1 2011. Dates this far apart are impossible to make a valid comparison with. We need to set a rule on age range allowed. My personal suggestion is that the widest range should be eighteen months. It might make the charts a lot smaller, but it will make them a lot more sensible.

Longest term we should probably consider splitting this into three articles, i.e. usage share for each technology type so that each type can be handled in far more detail. UrbanTerrorist (talk) 19:59, 12 August 2011 (UTC)[reply]

sales are not equivalent to use

Re "Moreover sales are not equivalent to use, as Windows comes pre-installed on many computers that will be used with other operating systems."

AFAIK, the actual PERCENTAGE of Windows computers that are wiped and replaced with Linux is small. I have NEVER heard any evidence, anecdotal or substantive, to counter that. Unless you have a SOURCE for that statement it should be re-worded. ToolmakerSteve (talk) 20:03, 22 August 2011 (UTC)[reply]

To balance the (IMHO overstated) emphasis on Windows sales not equating with usage, I've added a SOURCED reference mentioning PIRACY, which is a factor that increases usage above sales. (My interest is in making the best possible estimates comparing various desktop OS to smartphone OS unit usage. E.g. I want to know when Android passes Windows to become the #1 OS in units.) ToolmakerSteve (talk) 20:33, 22 August 2011 (UTC)[reply]

Apologies for pushing this point further, but I just noticed that the 1% Median web browsing statistic for Gnu/Linux is also consistent with the hypothesis "the % of PCs on which Windows is replaced by Linux is statistically small." IMHO, it is a fraction of a percent - less than the statistical error of the available sources - having no significant impact on the total Linux percentage. Thus, the vague adjective "MANY" in ".. pre-installed on many computers that will be used with other .." is inappropriate. However, since I have not found a source, I will leave it to the author who added that sentence, to reword it to be less misleading. I DO like the basic concept of pointing out to readers that there is a difference between sales and usage, so I DO favor keeping that sentence in some fashion; on the other hand, it is important to not overload/confuse/mislead the average reader with information that may be statistically minor. ToolmakerSteve (talk) 22:40, 22 August 2011 (UTC)[reply]

One possible proxy for Linux usage is downloads. It would be best to combine that with a survey that samples downloaders, to find what they are doing with their copy - if there is such a survey. E.g., I have a hard drive with an older version of Red Hat Linux on it. Not currently installed in a machine. Some fraction of Linux downloads are in dual boot setups with Windows - would be interesting to have users estimate how much time they spend in each OS. To distinguish between hobbyists experimenting with it occasionally versus substantive use. ToolmakerSteve (talk) 00:51, 23 August 2011 (UTC)[reply]

I added a source having an alternate analysis of ~ 6% for Linux in 2009, and showed that such analysis would yield an alternate Q2 2011 Linux figure of ~ 5 million. However, making that extrapolation might qualify as "original research", hence dubious. I've e-mailed the author requesting any updated figures/links. ToolmakerSteve (talk) 03:00, 23 August 2011 (UTC)[reply]

Please see the discussion at the top of this page about the C. Martin blog piece. In the light of this I've reverted this one edit.--Harumphy (talk) 10:52, 23 August 2011 (UTC)[reply]
Thanks, I had missed that discussion. I also have since learned that even if 6% had been credible momentarily in 2009, due to Netbook sales, the extrapolation to today would not be valid. In 2009, there may have been a period where Linux was more strongly selling on Netbooks, e.g. by Dell. Microsoft responded with low price for Windows 7 Starter on limited hardware (e.g. 1 GB RAM), and has successfully turned vendors such as Dell back into near-100% sellers of Windows. I find no significant support for the notion that anyone other than the rare highly technical user would choose Linux (for a PC), given Windows available at negligible cost. (Quite the contrary, there is anecdotal evidence that a more common action, when a Windows license adds significantly to a computer's cost, but is optional, is to purchase the computer with a free OS, and then replace that with a pirated copy of Windows.) ToolmakerSteve (talk) 11:00, 23 August 2011 (UTC)[reply]
After searching to see what reports are available for different OS segments, it occurs to me there might be a simple explanation as to why there aren't more available numbers for Linux sales on PCs, from the various research companies: maybe the numbers aren't worth reporting on. Why do I say this? Because it is notable that numbers for Linux server sales are readily available. (Granted, server sales are lower volume and higher dollar, so easier to track.) If Linux were making significant inroads in general PC use, more companies would deem that worth researching. I consider this additional indirect evidence that Linux sales volumes on general PCs continue to be < 5%. ToolmakerSteve (talk) 12:19, 23 August 2011 (UTC)[reply]

After further thought about reasons that might cause significant numbers of people to go to the effort of replacing an OS, and the POV tone of the sentence under discussion, I have replaced it with the following attempt at a neutral statement: "Also, sales may overstate usage. Most computers are sold with a pre-installed OS; some users replace that OS with a different one, perhaps for security reasons, or to install an OS for which more applications are available.[citation needed]" ToolmakerSteve (talk) 20:45, 23 August 2011 (UTC)[reply]

This is anecdotal, but I have personally replaced Windows with Linux on about forty or fifty computers. I know a lot of people who have done that on more computers than I have. So the number of computers running Linux could be far different from what the analysts are estimating. The issue is getting reliable numbers, and to the best of my knowledge, no one has proposed a method that seems likely to be reliable. UrbanTerrorist (talk) 20:07, 16 October 2011 (UTC)[reply]

LOL that graphic

that graphic on the right is not a good member of this page. it is created from impossible to identify data, and its citation is the page it is on. come on people. someone can do better than this. Forcep caliper (talk) 03:05, 30 September 2011 (UTC)[reply]

I agree. Referencing itself seems awkward. And giving usage shares for "web client operating systems" without saying what those are is even more confusing. — Preceding unsigned comment added by NotDifficult (talk • contribs) 07:45, 28 October 2011 (UTC)[reply]
In what way is the data impossible to identify? It's the median of eight sources, all of which are cited. So the figures can be verified precisely. What's the problem?--Harumphy (talk) 13:15, 28 October 2011 (UTC)[reply]

Ubuntu

"Clicky Web Analytics, StatOwl and Wikimedia indicate that Ubuntu has an order of magnitude more usage than any other identified desktop Linux distribution."

So does this mean that Ubuntu should get its own column in the chart?--Harizotoh9 (talk) 07:36, 25 October 2011 (UTC)[reply]

Somewhere in this talk page or its archives there is some discussion of what the threshold should be for including an OS in the web client table. For some time now the consensus has been that an OS only gets its own column if it's identified by more than half the sources. Ubuntu is identified by three of the eight, but the consensus requires five out of eight. About four other Linux distros (IIRR Debian, RedHat, Fedora, SuSE) are also identified by three sources FWIW. --Harumphy (talk) 13:52, 25 October 2011 (UTC)[reply]
We should also consider the practical side of inclusion. The horizontal space of the page is not infinite. 1exec1 (talk) 18:20, 29 October 2011 (UTC)[reply]
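The inclusion threshold described above ("more than half the sources") can be sketched as a simple check. The function names are hypothetical and chosen only for illustration; the counts are the ones quoted in this thread (three of eight sources for Ubuntu, five of eight required).

```python
def qualifies_for_column(identified_by, total_sources):
    """True if an OS is identified by strictly more than half of the sources."""
    return identified_by * 2 > total_sources


def minimum_sources_needed(total_sources):
    """Smallest number of sources that is strictly more than half."""
    return total_sources // 2 + 1


TOTAL_SOURCES = 8        # sources tracked in the web client table
UBUNTU_IDENTIFIED_BY = 3  # sources that identify Ubuntu separately

print("Sources needed:", minimum_sources_needed(TOTAL_SOURCES))
print("Ubuntu qualifies:", qualifies_for_column(UBUNTU_IDENTIFIED_BY, TOTAL_SOURCES))
```

Note that exactly half (four of eight) does not meet a "more than half" rule, which is why the requirement works out to five.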

Median constitutes improper synthesis and original research

I first marked it as original research - a marking which was promptly deleted. I deleted the section and it was promptly reverted. The argument still stands: median is not an acceptable calculation:

  • While "well defined" it constitutes improper synthesis of the numbers it calculates over. It reaches a conclusion not supported by any of the sources. Read WP:OR. It does not correctly reflect the sources.
  • Wikipedia policy requires consensus even for routine calculations like totals and counts. I marked it as WP:OR - a marking which should not be summarily deleted as was done by Harumphy.
  • Median is by no means a routine calculation; it is a statistical method which is not applicable in this setting: The result will be highly dependent on which sources are selected, i.e. the numbers are a result of article editing (specifically source selection) and not attributable to a source.

User Harumphy has threatened to treat it as edit warring if I remove the line again. However, my position stands: This is original research and it does not belong here. There is no consensus, so Harumphy, please remove that line yourself. Useerup (talk) 11:16, 30 October 2011 (UTC)[reply]

If the median was being used to infer something then it might constitute improper synthesis. But it isn't being used for that. It's just being stated as a median without any conclusion being drawn from it. The most relevant part of WP:OR is surely WP:OR#Routine_calculations, and there has to date been a consensus among editors here that it's OK as far as that policy is concerned.--Harumphy (talk) 11:29, 30 October 2011 (UTC)[reply]
Sure, if you delete the WP:OR markings you can claim consensus. Median is certainly not a routine calculation. Median, mean etc are original research and improper synthesis because the result is not supported by any of the sources. You are creating a synthesis over a number of sources. This is wrong on many levels, not least that the result will depend heavily on the sources selected. To use the median you need a source which calculated that median and which supports why a median is proper. There is no such source referenced, hence OR. Useerup (talk) 11:46, 30 October 2011 (UTC)[reply]
Reading archived discussions I don't see a discussion with a consensus at all. I see someone touched upon the subject by discussing the mean value - but no discussion and "consensus" on the applicability of median at all. But that really doesn't matter, as there is no consensus at this point. Useerup (talk) 12:23, 30 October 2011 (UTC)[reply]
To make matters worse, the median is supposed to "remove outliers" (per archived discussion). But the numbers do not at all express the same distributions. Some are demographically biased, others are openly geographically biased. Calculating a mean or a median (or any other statistical function) is not just OR - it is totally improper as it lumps together apples and oranges. Useerup (talk) 12:23, 30 October 2011 (UTC)[reply]
A few points in reply:
  • I agree that any past consensus becomes moot if there isn't consensus now.
  • I ask that given that the table's format has been stable for some time, it shouldn't be altered until a new consensus has been reached here first.
  • I disagree with your assertion that median is OR. In what way is a median less of a routine calculation than, say, the simple addition that is specifically endorsed by WP:OR#Routine_calculations? After all, they are both just forms of y=f(x1 ... xn). (If you know the values of x then there's only one possible value of y whether it's a median, mean or simple addition). You keep asserting that it's OR but ISTM that (a) you have not yet justified that assertion, and (b) even if it is, it's an allowable form of it. AFAICS what we're doing is entirely consistent with WP:OR#Routine_calculations. If you disagree, please explain why, don't just baldly assert your opinion as fact. --Harumphy (talk) 12:53, 30 October 2011 (UTC)[reply]
Median is not a routine calculation like a simple conversion between units of measure (inches to meters, birth date to age etc). It is a statistical function which is applicable in certain situations and not in others. For simple/routine calculations this is uncontroversial, you cannot argue that the conversion feet to meters introduces new knowledge or is open for interpretation. For statistical functions you are making assumptions and creating synthesis. I have explained why above: You are using median across a data set with very, very different numbers: Some numbers have an explicit geographic bias, others have an open demographic bias. Median is as wrong as mean in those situations. If I introduce yet another stat counter (or remove one) it will immediately change the median number. Thus, the selection of sources becomes a basis for the calculated median. That selection is performed by wikipedia editors and has no basis in any of the sources. The policy is pretty clear, you cannot combine multiple sources to reach a conclusion not expressly supported by any one of the sources. Useerup (talk) 13:19, 30 October 2011 (UTC)[reply]
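The point about source selection can be illustrated with a small Python sketch. All figures and source names below are invented for illustration and are NOT taken from the article's table; the sketch only shows how a median over a list of sources can move when one source is dropped or added, and does not settle whether the calculation counts as routine.

```python
from statistics import median

# HYPOTHETICAL usage-share figures (percent) for one OS from eight sources.
# These numbers are made up for illustration only.
hypothetical_shares = {
    "source_a": 0.8,
    "source_b": 0.9,
    "source_c": 1.0,
    "source_d": 1.1,
    "source_e": 1.3,
    "source_f": 1.5,
    "source_g": 2.0,
    "source_h": 4.5,
}


def median_share(shares):
    """Median of the figures reported by the selected sources."""
    return median(shares.values())


def without(shares, name):
    """Copy of the selection with one source removed."""
    return {k: v for k, v in shares.items() if k != name}


baseline = median_share(hypothetical_shares)
dropped_one = median_share(without(hypothetical_shares, "source_a"))
added_one = median_share({**hypothetical_shares, "source_i": 3.0})

print("All eight sources:   ", baseline)
print("One source removed:  ", dropped_one)
print("One source added:    ", added_one)
```

For a fixed list of inputs the result is fully determined, which is the other side of the argument in this thread; what the sketch shows is only that the choice of inputs affects the output.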
The median isn't a 'conclusion'. It's just a summary. A summary inevitably compromises precision in the pursuit of brevity. That doesn't render it invalid, or OR. What does anyone else think?--Harumphy (talk) 14:45, 30 October 2011 (UTC)[reply]
I think that saying that median is not routine calculation is itself OR, thus that assertion itself needs proper discussion before we can discuss its applicability here. Seriously, it has been discussed here already, and since the current table doesn't clearly violate any of Wikipedia's policies, consensus has higher authority than anything else. 1exec1 (talk) 15:23, 30 October 2011 (UTC)[reply]
Going directly by WP:NOR:
  • Do not combine material from multiple sources to reach or imply a conclusion not explicitly stated by any of the sources. If one reliable source says A, and another reliable source says B, do not join A and B together to imply a conclusion C that is not mentioned by either of the sources. This would be a synthesis of published material to advance a new position, which is original research.
    • Here we have a table of 8 sources which say A B C D E F G H. This article joins all of those to imply conclusion M which is not mentioned by any of the sources. Please explain how that is not OR?
    • The use of median (even if OR was allowed) in this case seems highly doubtful. A median is computed over a homogeneous set of numbers expressing the same property for a number of observations, e.g. ages of students in a class. The problems here:
      • the median here is not computed over the same property: One number is the web usage for mostly German sites, another "mostly" in U.S., a third "mostly" web designers and other self-selected communities. So what does the median represent? U.S.? Global? Germans? Coffee-drinkers? It makes no sense. It is the median of seeds in boxes of fruits. Some boxes with apples, some with oranges, some with rotten bananas.
      • a median is only valid when computed over a complete set of observations. The number of students in a class is well-defined, countable, verifiable and finite. The number of web client usage share counters is uncountable and the selection here has been made by editors.
  • This policy allows routine mathematical calculations, such as adding numbers, converting units, or calculating a person's age
    • These are simple mathematical arithmetic calculations and conversions; a far cry from statistical calculations. The closest you can get to mean or median (and that would be a stretch) is "adding numbers". However, this article does not just add numbers, it calculates a median over numbers from multiple sources, thus the calculation is sensitive to the sources chosen by wikipedia editors, and thus is not supported by the sources. The sources may individually be reliable (with the caveats for each one) but the list has been compiled by WP editors and the median calculated over that list. These are clearly new conclusions entered by WP editors.
    • It is even worse: At least 2 of the sources have been "corrected" by WP editors (in good faith, but still) further creating OR.
Claiming that "saying that median is not routine calculation is itself OR" is... strange. This is the talk page and not the Bizarro universe where everything is opposite. As everywhere on wikipedia the burden falls on the editor who wants to enter (or keep) a claim to demonstrate that it is not original research, see WP:VERIFY. To demand that anyone challenging a claim must first demonstrate that such a challenge itself is not OR is a novel take. Useerup (talk) 16:54, 30 October 2011 (UTC)[reply]
You keep asserting that a conclusion is reached / implied by calculating median. It is not. This is explicitly stated in the article, along with the caveats regarding the accuracy and data skewing. Median is just that - a median of that data, no conclusion is implied anywhere that refers to the calculated median for support. Thus the SYNTH point is weak. 1exec1 (talk) 18:33, 30 October 2011 (UTC)[reply]
The bolded line with median is a conclusion. It states "this is the usage share". In the archived discussion the median is even pushed as a way to do away with "outliers". Useerup (talk) 19:27, 30 October 2011 (UTC)[reply]
The bolded line doesn't state it. It's just you inferring it. Thus the only error is in your own perception.--Harumphy (talk) 23:03, 30 October 2011 (UTC)[reply]
BTW your 'improper synthesis' tag on two of the table's footnotes is wrong so I'm removing it. This is a routine calculation that was unanimously approved by the three editors who discussed and voted on this very issue, and thus compliant with WP:OR (See the last vote in Talk:Usage_share_of_operating_systems#Vote_Count above.)--Harumphy (talk) 23:29, 30 October 2011 (UTC)[reply]
WP:NOTYOURS and WP:NOTDEMOCRACY. Issue stands - "correcting" numbers is improper synthesis. I apologize for being so blunt, but please don't remove tags before issue has been resolved. Useerup (talk) 01:09, 31 October 2011 (UTC)[reply]
So what exactly does the bolded median line state? Does it compute over the other rows (multiple sources)? Is it supported by any one of the sources? Are the numbers in that row attributable to reliable secondary sources or are they the result of wikipedia editing, i.e. selection of sources? Do any of the sources coalesce different demographics or different geographical regions and explain how median is a safe method? Useerup (talk) 01:20, 31 October 2011 (UTC)[reply]
For what it's worth - I originally objected to the median in the summary table on the same grounds as User:Useerup raises now. At the time, there was indeed a majority of editors who thought it was not OR. So User:Harumphy is right - there was consensus. And while I still think it is OR, I also agree with User:Harumphy that any change to these tables or the calculation would require a new vote. Wikiolap (talk) 02:42, 31 October 2011 (UTC)[reply]

Windows 7 is now the Most Widely Used Operating System

Many websites and other media report that Windows 7 is now the most widely used operating system and has overtaken Windows XP. I wanted to request other editors to kindly check and refresh the usage share to keep the page updated. Even the Windows XP page says that it is the 'second most popular version of windows', which logically shows that Windows 7 is now on top. Meanwhile the Windows Vista share has also changed, indicating more usage for Windows 7. Changing both text and graph would be better. Thanks TheGeneralUser (talk) 16:42, 30 October 2011 (UTC)[reply]

Six of the eight sources that we track in this article still have XP as greater than 7, but on recent trends that's likely to change in about a couple of months or so. The Windows XP page cites w3schools as its source, a source which monitors web usage by web developers only. They are a small and highly atypical subset of the total user population so for our purpose it's not a credible source.--Harumphy (talk) 23:13, 30 October 2011 (UTC)[reply]