Talk:Hard disk drive

Hard disk drive is a former featured article candidate. Please view the links under Article milestones below to see why the nomination failed. For older candidates, please check the archive.
September 9, 2007 Featured article candidate Not promoted
WikiProject Computing / Hardware (Rated B-class, Top-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as Top-importance on the project's importance scale.
This article is supported by Computer hardware task force (marked as Top-importance).

HDD Areal Density vs Moore's Law

Split from talk section "2016 desktop capacity revised forecast" to here; see the original comments in 2016 desktop capacity revised forecast. 71.128.35.13 (talk) 23:20, 23 May 2014 (UTC)

Regarding the comparison to Moore's Law (ML): the long term, that is, from the invocation of Moore's law circa 1965, shows magnetic areal density (AD) growing at a slightly higher rate than ML, and during the 90s and into the 00s at a much higher rate. Both AD and ML appear to be slowing down in this decade due to fundamental physical limits, so the original statement is accurate while the reverted change is perhaps misleading in equating ML and AD into this century. Therefore I reverted the edit. It would be possible to fix the edit by removing the reference to ML, but since most people know of Moore's law I think the comparison is useful and should remain as is, without getting into in which decade AD did or did not exceed ML. Tom94022 (talk) 17:56, 24 May 2014 (UTC)

The Moore's law reference does not document HDD AD long term; however the Coughlin reference does document 1990-2010 specifically. Could you provide a reference for longer term HDD? (It's certainly in the 25-100% ballpark for 1962-2014, I would imagine). The point is that the recent HDD AD trend has slowed, as Coughlin(2012) indicated. 71.128.35.13 (talk) 18:16, 24 May 2014 (UTC)

You really shouldn't revert without a basis other than u just don't like it, but since u asked: a Google search of "HDD Areal Density Trend" images turns up a number of reliable sources, one of which I picked. You should also note that the Moore's Law page states ML is also slowing down to about 30%/year, again not too far off from HDD AD. Tom94022 (talk) 00:13, 25 May 2014 (UTC)
This is in response to the entry you titled expansively,
 "Fair and accurate comparison to Moore's law wikipedia entry". 
First, I'll discuss the accuracy deficit in detail (and in a sense "authoritatively"), then the fairness issue, and finally propose an improvement to the article.

Thanks for pointing to Whyte (2009) areal density data, and to the Moore's law wikipedia entry. I had already come across that wiki entry, and this did not happen by chance.

I wrote that entry. You may be surprised to learn that, although I'm just a new Wikipedia user with no tie to the storage industry, I recently authored that particular new section. I'm glad to see that my work has found an audience. Wikipedia articles can have unexpected consequences.

As I wrote, MPU prices improved about 30% per year (halving every two years) before and after the thoroughly-researched late-1990s surge of technical advancement. Therefore, I can certainly explain for “u” Sir/Madam what that “30%” means, and how you appear to have misunderstood it.

You should be careful to distinguish between growth (a performance increase, with a certain doubling time) and decline (price improvement). The article says MPU prices halved every two years. The -30% annual decline rate is equivalent to a performance increase of

 (1 / (100% – 30% / year) - 1) = +43% CAGR. 

The halving time offers a way to double-check the calculation,

 exp( ln(0.5) / 2 years) – 100% = -29% per year; 

so halving time would really be closer to 1.94 years, or

 exp( ln(0.5) / 1.94 years) – 100% = -30% per year.
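These conversions are mechanical and easy to check. Here is a minimal Python sketch of the three calculations above (the variable names are mine, for illustration only):

 import math

 annual_decline = 0.30                      # -30% per year price decline
 perf_cagr = 1 / (1 - annual_decline) - 1   # equivalent performance growth
 print(f"performance CAGR: {perf_cagr:+.0%}")             # +43%

 # a strict two-year halving time corresponds to
 decline_2yr = math.exp(math.log(0.5) / 2) - 1
 print(f"two-year halving: {decline_2yr:+.1%} per year")  # -29.3%

 # while a -30% per year decline corresponds to a halving time of
 halving_time = math.log(0.5) / math.log(1 - annual_decline)
 print(f"halving time at -30%/year: {halving_time:.2f} years")  # ~1.94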
So, you “really should” use the +43% MPU performance CAGR, not the -30% / year price decline, to compare against areal density growth. The disk areal density slope is +40% per year in the reference you kindly provided (Whyte, 2009 of IBM):

https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/resource/BLOGS_UPLOADED_IMAGES/areal_2.jpg

Whyte's silicon density slope of +34% CAGR almost parallels the aforementioned disk areal density slope. As Whyte(2009) explains explicitly, these two have tracked together: “Another interesting side note can be seen when you add the areal density of silicon, which to this day has tracked almost scarily to Moores Law.”

Given the similarity between these trend slopes, judging which slope is “somewhat higher” would require one to perform a statistical test against some confidence value (p-value). You haven't provided any statistical reference that would support a conclusion as to which is higher. I suggest you return to the original “not substantially/substantively different” phrasing, instead of minting the new and unsupported claim of “somewhat higher.”

The slowing of Moore's law that you pointed to is neither here nor there. The 30% figure is way, way off base because it confuses performance growth with price decline, as I have shown, and because it refers to MPUs from some time ago, not to SSDs recently.

What really matters for HDDs is the recent SSD (relative) price trend. These data are available. From 2010 to 2013 SSD prices trended at the usual Moore's law rate and halved every two years, equivalent to a capacity-per-dollar increase of around 40-45% per year. For reference, the following sources give recent SSD price trends:
http://techreport.com/review/23149/ssd-prices-in-steady-substantial-decline
http://www.storagenewsletter.com/rubriques/market-reportsresearch/when-will-ssd-have-same-price-as-hdd-priceg2/

In conclusion, during the 2010-2013 time period, the gap in terms of price per unit storage of information between SSD (40-45% growth per year of density) and HDD (10-25% growth) has narrowed by a factor of two or three. This is the headwind HDDs face.

I admonish you to look at the rules at the top of this page. “Be polite, and welcoming to new users” (like me!). “Assume good faith.” Accusing me of reverting “without a basis other than u just don't like it” does not assume good faith. “Avoid personal attacks.” Now as for me here, I'm just writing this in self defense. I see now that veteran industry marketing professionals employ a bare-knuckles, frank and direct style to get a commercial message out on heavily-trafficked and respected internet sites, to suppress inconvenient information and to stand guard over the community's content zealously as if it were owned privately. Even so, I would counsel you to renounce the commercialism that, at least from my own point of view, has been allowed to pervade the content of this HDD article.

I direct your attention to an example of commercial bias found in this article's “future development” section. Current version: “New magnetic storage technologies are being developed to support higher areal density growth, address the superparamagnetic limit, and maintain the competitiveness of HDDs with potentially competitive products such as flash memory-based solid-state drives (SSDs).” Obviously SSDs do compete, and are mischaracterized here as merely "potentially competitive." A less slanted approach would indicate that HDDs and flash memory are often complementary rather than exclusive. Proposed revision: “HDDs store most of the information in the world, and this is expected to continue because of their low cost and long retention times. HDDs face competition for some information storage applications from flash memory. Hierarchical storage management combines several storage types to improve overall cost and performance: faster but costly technologies such as SSDs and/or DRAM are joined with slower but less costly HDDs or tape. For example, the Fusion Drive is one of many commercial products that combine a small SSD and large HDD. New magnetic storage technologies are being developed to support higher areal density growth and address the superparamagnetic limit, as follows:”

71.128.35.13 (talk) 23:23, 25 May 2014 (UTC)

Sorry for assuming your lack of justification of a revert was because u "just didn't like it". I should have just said you shouldn't revert without explanation and have so amended my edit.
May I suggest u indent your responses so that a flow can be followed (perhaps as a newbie u were not aware of this practice).
May I also suggest u “Assume good faith” and "Avoid personal attacks” - yr "self defense" is both and for the most part wrong.
Moore's Law relates to transistor density (doubles every two years) and is apposite to magnetic areal density. The section of the Moore's Law article I referenced predates your contribution and clearly relates to density improvements, neither price decline nor performance improvement as u assert in your statement above. If you examine carefully the Whyte (2009) areal density graph u will see that HDD AD has a somewhat higher slope than the therein depicted 40% Moore's law slope (presumably 43%) - about 1 order of magnitude better over 54 years, albeit tracking "almost scarily." So I think the current statement accurately and fairly describes Whyte. If u dispute this I will be happy to Photoshop the graph to prove the point. On the other hand I can live with "not substantively different", particularly if one uses 43% per year instead of 40% per year.
Competition is already covered in the lede so all that is necessary is removing the word "potentially", which I have done. Tom94022 (talk)
Made some corrections above after more carefully examining Whyte. Tom94022 (talk) 17:06, 27 May 2014 (UTC)
Are you comparing Moore's law against the Whyte(2009) disk areal density 1956-2010 CAGR? For "precision," Moore's law should be defined as “doubles every two years,” as you have indicated, not 43%: sqrt(2) − 1 = +41% CAGR = "doubles every two years". Separately, I see that I erred in claiming mainstream adoption has been reached for CPP-GMR.
71.128.35.13 (talk) 20:51, 27 May 2014 (UTC)
Actually Whyte compares Moore's law to HDD AD growth in his second graph, which BTW, I have asked for his permission to reproduce in Wikipedia. I agree that ML is a CAGR of 41% and have so changed the article (not sure how I got 43%). FWIW, if, for example, u connect the [[History_of_IBM_magnetic_disk_drives#IBM.27s_first_HDD_versus_its_last_HDDs|end points of IBM's magnetic disk drive products]] you will get a CAGR of 47% (if I did the math right), somewhat more than Moore's Law, and note that in 2002 IBM did not have the highest AD. Tom94022 (talk) 22:15, 27 May 2014 (UTC)
You directed me to Whyte(2009) as follows: “If you examine carefully the Whyte (2009) areal density graph..." The slope of Whyte(2009) is 42% CAGR calculated as follows:
Starting date for all comparisons = 1956; Interval (years) = 0; Density = 0.002
Date = 2010; Interval (years) = 54; Density = 400000; CAGR = 42% = exp(ln(400000 / 0.002) / 54) − 1

IBM (2002) offers a variety of numbers from which to cherry-pick.
Date = 2002; Interval (years) = 46; Density = 26263; CAGR = 43% = exp(ln(26263 / 0.002) / 46) − 1
Date = 2002; Interval (years) = 46; Density = 46300; CAGR = 45% = exp(ln(46300 / 0.002) / 46) − 1
Date = 2002; Interval (years) = 46; Density = 70000; CAGR = 46% = exp(ln(70000 / 0.002) / 46) − 1

If you Photoshop the Whyte(2009) chart you will confirm 42% CAGR. Going back to IBM(2002) still doesn't move the CAGR very far from Whyte(2009). As originally defined over 1956-2010 (without moving the goalposts and truncating to 1956-2002), a fair and accurate comparison is as follows: Whyte(2009) areal density 42% CAGR is not substantively different from Moore's law 41% (doubling every two years).
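The endpoint arithmetic above is easy to rerun. A minimal Python sketch (the helper name is mine; densities follow the 0.002 / 400000 convention used above, apparently Mbit/in2, though only the ratio matters):

 def endpoint_cagr(d_start, d_end, years):
     """Two-point compound annual growth rate."""
     return (d_end / d_start) ** (1.0 / years) - 1

 print(endpoint_cagr(0.002, 400000, 54))  # Whyte(2009), 1956-2010: ~0.42
 print(endpoint_cagr(0.002, 26263, 46))   # IBM(2002): ~0.43
 print(endpoint_cagr(0.002, 46300, 46))   # IBM(2002): ~0.45
 print(endpoint_cagr(0.002, 70000, 46))   # IBM(2002): ~0.46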
IBM(2002) is found at History_of_IBM_magnetic_disk_drives#IBM.27s_first_HDD_versus_its_last_HDDs
Whyte(2009) is found at https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/resource/BLOGS_UPLOADED_IMAGES/areal_2.jpg 71.128.35.13 (talk) 20:19, 28 May 2014 (UTC)
My eyeball estimate from Whyte was that after 54 years HDD AD was about one order of magnitude higher than a ML growth. Yr estimate of 42% turns out to be 46% higher than ML growth - 1% better over 54 years does matter. Also I expect u estimated the AD in 1956 and 2010; in fact the maximum AD in 2010 was 635 Gb/in2, yielding a 44% CAGR, which over 54 years would be 3 times what would have been achieved with a ML CAGR. Furthermore connecting the endpoints tends to understate the CAGR that would be achieved by a best fit straight line. Thus, I think the evidence supports characterizing long term HDD CAGR as somewhat or slightly higher than the Moore's Law CAGR. Tom94022 (talk) 01:36, 29 May 2014 (UTC)
Without careful accounting, the goalposts on the Santa Teresa hills could become “unmoored.”
Moore's law as defined originally: “doubling every two years” or 41% CAGR
Goalpost moved: Whyte(2009) Moore's law line has 34% CAGR
This won't work: Moore's law was and is "doubling every two years” by definition.

Comparison as defined originally: CAGR (% per year)
Goalpost moved: Ratio of the areal densities, compounded over 54 years: (1 + 42%)^54 / (1 + 41%)^54 − 1 = 46% greater than the base case (a unit-less ratio), where 42% is 1% more than Moore's law.
It would be unfair and inaccurate to move to density ratio instead of CAGR, like looking at distance rather than speed. The two cannot be compared numerically, because they do not share the same units: CAGR has units of inverse time (% per year), but the ratio is unit-less (no units at all). It's gibberish, and magnifies a small or non-existent difference in slope into a huge difference in areal density over "just" half a century. Could one compare the slope in angular degrees/radians of El Capitan to the height in meters of Mauna Kea, or Al Shugart's bar tab in dollars to Larry Ellison's yacht length in feet? (Still, I've heard Shugart's was bigger.)

Comparison as defined originally: Whyte(2009), 400 Gb/insq
Goalpost moved: Unearthed new 2010 density data from that same IBM Almaden research facility: 2009 areal density of 520 Gbit/in2 and 2010 density of 635 Gbit/in². This would boost CAGR slightly, (635 / 400)^(1 / 54) – 1 = 0.9% per year. Less than one percent, but every point counts if the customer buys into compounding over half a century.
No, foraging for new data is not acceptable. There may be tasty data exceeding 42% CAGR to be picked on the Almaden foothills overlooking the cherry orchards on Cottle Road that gave birth to the hard disk drive. Jim Porter likely was present at the delivery. Regardless, the fair comparison as defined originally is Whyte(2009) 400 Gbit/in2.

Comparison as defined originally: Use two endpoints, or fit to all the data points by least-squares instead? Actually, this wasn't really nailed down in the first place. But, the two 42% CAGR endpoints aren't tasty; these cherries might not be ripe.
Goalpost moved: Fit with least squares instead, because just looking at the two endpoints could understate CAGR. Extracting the Whyte(2009) data would surely be tedious, so this fallback strategy should stay hidden among the Blossom Valley trees, never put to a test.
Regardless, I'd rise to the challenge and look at a least-squares fit. Could you Photoshop-extract the Whyte(2009) data points which lie between the two fixed endpoints (year 1956, density 0.002; and year 2010, density 400000)? I'd extract by hand, fit CAGR with least squares, and compare fairly and accurately Whyte(2009) versus Moore's law “doubles every two years.”

The long-standing (prior to 25 May) phrase was "not substantively different"; (over-)extended on 00:15, 25 May 2014 to "somewhat higher". It would save me work and you Photoshopping if we met in the middle with “not substantially/substantively different” or just “similar to”. 71.128.35.13 (talk) 18:54, 29 May 2014 (UTC)

We agree that Whyte's graph depicts an HDD long term AD CAGR of 42%, which is 2.4% greater than 41%.
We agree that if one uses actual endpoint data for the same period the HDD LT AD CAGR is 44% which is 7.3% greater than 41%
We agree that using actual IBM data for a slightly shorter period the HDD LT AD CAGR is 46% which is 12% greater than 41%
I have seen no numbers that suggest the HDD LT AD CAGR is less than or equal to 41%. Before all this analysis I was willing to return to the original “not substantially/substantively different” but you reverted that language insisting upon evidence. After this analysis the evidence does suggest to me that "somewhat/slightly higher" is more accurate so I am reluctant to return to the less accurate original language. Frankly the new language does place the technical achievement of the HDD industry in an interesting light - exceeding Moore's Law for a longer period of time than the semiconductor industry has tracked Moore's Law is notable. Tom94022 (talk) 05:17, 30 May 2014 (UTC)
One more small point: Whyte's article is dated 2009, so his last data point is 2009, not 2010, giving a CAGR of 43.4%, again somewhat/slightly higher than 41%. I suppose I can change the date range in the article to 1956-2009. Tom94022 (talk) 05:58, 30 May 2014 (UTC)
Here, you will see numbers which do indeed show that the HDD long term areal density CAGR is, in fact, less than or equal to Moore's law doubling every two years. You sought to fit longer term data by least squares, and you cited 1956-2010 IBM(2010) Almaden data from Fontana Jr., Decad and Hetzler. The very same researchers, Decad, Fontana and Hetzler (IBM(2013)), have released data through year end 2012. IBM(2013) is found here: http://www.digitalpreservation.gov/meetings/documents/storage13/GaryDecad_Technology.pdf
Year end 2012 = 750 Gbit/in2
Year end 2011 = 750 Gbit/in2
Year end 2010 = 635 Gbit/in2
Year end 2010 is identical to IBM(2010), which also reports 635 Gbit/in2
Year end 2009 = 530 Gbit/in2
Year end 2008 = 380 Gbit/in2

Longer term 1956-2012, looking at just the two endpoints,
Date = 1956; Interval (years) = 0; Density = 0.002
Date = 2012; Interval (years) = 56; Density = 750000
Just the two endpoints gives CAGR = 42.3%

All the IBM data (including the earlier IBM data from Whyte(2009) and http://media.bestofmicro.com/,7-V-303547-3.jpg) are shown below. Year code 1956.68 is September 4, 1956 when RAMAC was introduced. Year code 2013 is the end of year 2012.

1956.68 2.0E-06
1957 2.3E-06 1977 3.9E-03 1997 1.9E+00
1958 4.7E-06 1978 5.1E-03 1998 3.6E+00
1959 9.8E-06 1979 6.3E-03 1999 6.9E+00
1960 2.0E-05 1980 7.7E-03 2000 1.5E+01
1961 3.5E-05 1981 9.5E-03 2001 3.2E+01

1962 4.7E-05 1982 1.2E-02 2002 6.8E+01
1963 6.3E-05 1983 1.4E-02 2003 8.1E+01
1964 8.4E-05 1984 1.8E-02 2004 9.8E+01
1965 1.1E-04 1985 2.2E-02 2005 1.3E+02
1966 1.5E-04 1986 2.6E-02 2006 1.8E+02

1967 2.0E-04 1987 3.2E-02 2007 2.3E+02
1968 2.7E-04 1988 4.0E-02 2008 3.1E+02
1969 3.7E-04 1989 4.9E-02 2009 3.8E+02
1970 4.9E-04 1990 6.0E-02 2010 5.3E+02
1971 6.6E-04 1991 7.7E-02 2011 6.35E+02

1972 8.9E-04 1992 1.3E-01 2012 7.5E+02
1973 1.2E-03 1993 2.1E-01 2013 7.5E+02
1974 1.6E-03 1994 3.5E-01
1975 2.1E-03 1995 5.9E-01
1976 2.9E-03 1996 1.0E+00

Now, fitting to all these data points by least squares would be more fair and more accurate than just fitting to the endpoints, and this has been done:
Areal density least squares fit using all 58 data points, CAGR = 40.66%
Recall that Moore's law is defined as (sqrt(2) – 1) = 41.42% per year CAGR

In conclusion, HDD slope is 40.66% per year and Moore's is defined as 41.42% per year. Nonetheless, I'd say they are similar and statistically indistinguishable.
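For anyone wishing to reproduce the fit: it appears to be an ordinary least-squares line through log(density) versus year. A Python sketch using the tabulated points above (numpy's polyfit performs the regression; it should land near the 40.66% figure quoted above):

 import numpy as np

 # (year code, areal density in Gbit/in2) pairs from the table above
 data = [
     (1956.68, 2.0e-6), (1957, 2.3e-6), (1958, 4.7e-6), (1959, 9.8e-6),
     (1960, 2.0e-5), (1961, 3.5e-5), (1962, 4.7e-5), (1963, 6.3e-5),
     (1964, 8.4e-5), (1965, 1.1e-4), (1966, 1.5e-4), (1967, 2.0e-4),
     (1968, 2.7e-4), (1969, 3.7e-4), (1970, 4.9e-4), (1971, 6.6e-4),
     (1972, 8.9e-4), (1973, 1.2e-3), (1974, 1.6e-3), (1975, 2.1e-3),
     (1976, 2.9e-3), (1977, 3.9e-3), (1978, 5.1e-3), (1979, 6.3e-3),
     (1980, 7.7e-3), (1981, 9.5e-3), (1982, 1.2e-2), (1983, 1.4e-2),
     (1984, 1.8e-2), (1985, 2.2e-2), (1986, 2.6e-2), (1987, 3.2e-2),
     (1988, 4.0e-2), (1989, 4.9e-2), (1990, 6.0e-2), (1991, 7.7e-2),
     (1992, 1.3e-1), (1993, 2.1e-1), (1994, 3.5e-1), (1995, 5.9e-1),
     (1996, 1.0), (1997, 1.9), (1998, 3.6), (1999, 6.9), (2000, 15.0),
     (2001, 32.0), (2002, 68.0), (2003, 81.0), (2004, 98.0), (2005, 130.0),
     (2006, 180.0), (2007, 230.0), (2008, 310.0), (2009, 380.0),
     (2010, 530.0), (2011, 635.0), (2012, 750.0), (2013, 750.0),
 ]

 years, density = zip(*data)
 slope, _ = np.polyfit(years, np.log(density), 1)       # log-linear fit
 print(f"least-squares CAGR: {np.exp(slope) - 1:.2%}")  # ~40.7% per year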

By the way, at least two of the recent IBM data from 2010 and 2012 are just laboratory demonstrations, not shipping products. IBM RAMAC in 1956 was a product. It would not be fair to compare a product like RAMAC with a lab demo. This lab_demo versus real_product confusion is widespread. According to http://www.storagenewsletter.com/rubriques/market-reportsresearch/ihs-isuppli-storage-space/
“in 2010, the highest areal density that could be achieved for a platter amounted to 550Gb per square inch.” This compares to 635 Gbit/in2 from IBM(2013). The lab_demo/product ratio is 635/550 = 1.15
According to Seagate's press release in 2012 http://www.seagate.com/about/newsroom/press-releases/terabit-milestone-storage-seagate-master-pr/
early 2012 saw 620 Gbit/in2 products. This compares to 750 Gbit/in2 given by IBM(2013). The lab_demo/product ratio is 750/635 = 1.18
A demo/product correction factor of 1.16 will be used. This reduces the slope a bit (less than one-half of one percent), just looking at the two endpoints as follows:
Date = 1956; Interval (years) = 0; Density = 0.002
Date = 2012; Interval (years) = 56; Density = 750000; CAGR = 42.3%
Date = 2012; Interval (years) = 56; Density = 750000/1.16; CAGR = 41.9%
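A short sketch of the corrected and uncorrected endpoint arithmetic (the variable names are mine):

 factor = 1.16  # quoted demo/product correction factor
 cagr_raw = (750000 / 0.002) ** (1 / 56) - 1           # unadjusted, ~0.423
 cagr_adj = (750000 / factor / 0.002) ** (1 / 56) - 1  # adjusted, ~0.419
 print(f"{cagr_raw:.1%} unadjusted, {cagr_adj:.1%} adjusted")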

Restatement of conclusion: Areal density CAGR during 1956-2012, looking at real products and excluding laboratory demonstrations, is slightly less than 40.66% per year, not substantially/substantively different than Moore's law 41.42% per year (doubling every two years). No statistical evidence has been presented to support the claim that the HDD and Moore's law slopes differ by a "statistically significant" margin. 71.128.35.13 (talk) 19:46, 30 May 2014 (UTC)

May I suggest you are not only unmooring the goalpost when you extend the interval beyond 2010, you are also venturing into original research when you concatenate different data sources, extract data points from graphs, adjust data points and then attempt to draw conclusions therefrom. First point is that we know that since about 2005 HDD AD CAGR has been less than 41%/year, so each year you extend beyond Whyte will reduce the LT CAGR - why not go all the way to May 30, 2014, to make a point? More importantly, your ability to extract accurate data from these low resolution graphs is limited, so one can only conclude that your analysis fails to prove your hypothesis that the LT HDD AD CAGR is less than Moore's law. The endpoint data on the other hand are accurate. If you want to get the actual data and run it thru 2010 (or 2009) then you might have a point. Whyte data show a LT AD CAGR exceeding ML from 1956 to 2010 and the endpoint data confirm it to 2009 and 2010. Other accurate endpoint tests also confirm LT AD CAGR exceeding ML from 1956 to 2003. We have a confirmed reliable source that the LT HDD AD CAGR (1956-2010) exceeds ML; if u can find a reliable source that says otherwise then you might have a point. Tom94022 (talk) 02:40, 31 May 2014 (UTC)
One last point about yr data - apparently u have added points from other trend lines, e.g., 1957 to 1961 - after RAMAC in 1956 I think the next data point might be 1962 or so. This is statistically invalid as only the actual data points should be used to fit a straight line. Tom94022 (talk) 02:52, 31 May 2014 (UTC)
You point to an “hypothesis that the LT HDD AD CAGR is less than Moore's law.” But, this never had to be proven, and the evidence for this is no better than the evidence for the reverse. Both ways, it's too fuzzy to call.

Only the following is disputed: firstly, the original HDD article before 25 May had “not substantively different from Moore's Law.” Secondly, the new version after 25 May claims “somewhat higher”. The issue isn't the value, positive or negative, of the difference. It's simply that there is no statistical justification to distinguish between the two, one way or the other. Though I'm still open to looking at your data, as you wrote, “If u dispute this I will be happy to Photoshop the graph to prove the point.”

Your deconstruction of the data shows that the slopes have a lot of slop. Moore's law is exact by definition only, not in reality. As Gordon Moore (1995) wrote of Moore's law, “I did not expect much precision in this estimate.” I'd put Kryder's law (1990-2010 or even 1956-2013) at 25-100% per year. Not 40.66% per year. Around 40% per year historically would even be fine by me. The author of the original version of the article set the bar, long before 25 May, pretty low. Where it should be. It's easy to demonstrate insufficient support for a claim, but harder to prove it. Given the weak support for the aggressive claim of distinguishability, it would be prudent to return to the long-standing uncontroversial formulation.71.128.35.13 (talk) 18:19, 31 May 2014 (UTC)

Actually it is your hypothesis that the LT HDD AD CAGR is less than or equal to Moore's law that has to be proven since the evidence from a reliable source, Whyte, is that over the time period 1956 - 2009 it exceeded ML. So far you have only confirmed Whyte. I have a longer response in mind but I am pretty busy this week so I probably won't post it until next weekend. Tom94022 (talk) 16:14, 2 June 2014 (UTC)
BTW, I take it you agree that your analysis of "All the IBM data ..." is statistically flawed; if so may I suggest u delete it or at least strike it? Tom94022 (talk) 16:20, 2 June 2014 (UTC)
I'd delete those (few) early years with missing data, and fit by least squares again. The results wouldn't change drastically, I expect. I'll try later, when I've time.
Your claim is that the two slopes are significantly, in the statistical sense, different. (“somewhat higher”) Mine is that there is insufficient evidence to show that long term Kryder's law and Moore's law (both fuzzy in the real world) slopes are different. (“not substantively different”) That's the text before 25 May.

I oppose the change of 25 May, because I've no good evidence to support a “significant” difference or “somewhat higher” slope. The difference in slope (pos or neg) could go either way.71.128.35.13 (talk) 03:49, 3 June 2014 (UTC)

Whyte has sparse data points from 1956 thru 1992; everything else should be eliminated. Then from 1992 thru 2004 many of his data points overlap, which will make determining their values difficult. Good luck - I expect when u are done the best fit straight line will have a slope greater than ML - all u will do is confirm what is visually presented. Tom94022 (talk) 05:43, 3 June 2014 (UTC)
You point to sparseness of Whyte(2009) data for the early decades, and by extension the sparseness of IBM, Grochowski and most of the available density data. Grochowski (2003)
http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/grochowski.pdf
shows rapid density acceleration in the 1990s and very steep acceleration before the mid-1960s when the industry was just starting. This is the same density story as Whyte(2009).

I do take areal density with a grain of salt, and not just because of the lab_demo versus shipping_product confusion. The data sources are few, because IBM/HGST produced much of the storage and have disseminated their version of the story widely for decades. Therefore density is difficult to corroborate independently. Density is a technical parameter, a step removed from real prices. Importantly, prices have more economic relevance and reality to users, buyers, the producer price index, labor productivity and the national GDP. Density is less generalizable than price: it's harder to measure the areal density of a ferrite core or a flipflop or a DRAM. Density could be another rabbit hole (or more commonly a squirrel hole) atop the Santa Teresa hills near the green Almaden Research oasis. Prices, by the same token, would be more realistic if they were adjusted for quality. But this is what we've got.

No worries. Since there are quite enough data, I won't need a lot of luck. McCallum(2014) has independent retail-level (or list price in the early years) prices that go back decades.
http://www.jcmit.com/disk2014.htm
http://www.jcmit.com/diskprice.htm

The post-1980 magnetic storage price slope parallels the semiconductor Moore's law price slope (the semiconductor comparator is flash since around 2007, preceded by DRAM, preceded by older rapid-access main memory devices like flip-flops and ferrite core). During the 1980s and again now, HDD prices were one order of magnitude better than the closest DRAM/flash semiconductor alternative. These two slopes are parallel. The non-magnetic price slope holds all the way back to the 1950s, around -38%/year (1957-2014). HDD prices improved very rapidly by -55% per year (not quite -60% to -100% per year) from the early-1990s to early-2000s. Prices continued to improve at the very strong pace of -47% per year during 2000-2009. HDD prices from January 2010 (two years before the floods) to now (two and a half years after the floods) have improved by only 10% per year.

Conclusion: during the last three decades, the HDD price (not density) slope has not been substantively different from the semiconductor price slope. Prior to the mid-1970s, magnetic storage price improvement lagged. Particularly since 2010, the HDD price slope around -10% per year has been very much slower than flash memory (Moore's law) slope around -36% per year. In the last year or so HDD prices improved (dropped) by slightly better than 20%. Long term over five decades, I've still no good evidence to support a “significant” difference or “somewhat higher” slope for HDD price progress over Moore's price progress. The difference (pos or neg) could go either way.

I dispute the new "somewhat higher" slope claim of 25 May.71.128.35.13 (talk) 20:41, 3 June 2014 (UTC)

WP:3 has been implemented for dispute resolution.71.128.35.13 (talk) 19:33, 6 June 2014 (UTC)

(un-indented) Even Plumer supports an AD CAGR greater than Moore's law from 1955 thru 2005 - let me spell it out for you:

  • "Compound Annual Growth Rates (CAGR)of about 40% in the first 35 years."
  • "The growth in CAGR from 40% to 60% to 100% which began in the mid 1990s and spanned the following several years (Fig 1)Fig 1 depicts 100%/year from 1998-2002

Whyte shows the 60-100% CAGR extending from early 1992 to 2004. If u combine 38 years of 40% with 12 years of 60-100% you will get about 49% per year, which well exceeds Moore's Law. My quick and dirty calculation says that it takes only two years of 80% per year to raise 40%/year to 41.4% over a 50 year span. So given the periods of very high growth rates it seems obvious that HDD AD CAGR had to exceed ML (at least until very recently). You seem to prefer Plumer over Kryder - in 2005 Plumer was an engineer at Seagate while Kryder was Seagate's Chief Technical Officer and Senior Vice President, Research, and University Professor of Electrical and Computer Engineering, Carnegie Mellon University. I think Kryder quoted is at least as reliable a source as a Plumer paper. Finally Plumer's approximately 40% only covers the first 35 years and not 1955-2005, so it doesn't contradict any statement about such a longer period, particularly since he agrees that for many years thereafter the rate substantially exceeded a Moore's Law rate. Tom94022 (talk) 09:15, 25 July 2014 (UTC)
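A sketch of this quick and dirty blend (taking 80% per year as a representative rate for the 60-100% years is an assumption):

 import math

 def blended_cagr(segments):
     """segments: (years, annual growth rate) pairs; returns the overall CAGR."""
     total_years = sum(y for y, _ in segments)
     log_growth = sum(y * math.log(1 + r) for y, r in segments)
     return math.exp(log_growth / total_years) - 1

 print(blended_cagr([(38, 0.40), (12, 0.80)]))  # ~0.49: 38 years of 40% + 12 of 80%
 print(blended_cagr([(48, 0.40), (2, 0.80)]))   # ~0.414: 2 years of 80% lifts 40% to 41.4%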

You have cited Walter (2005), but this remains unverified. You say Kryder is an expert. While correct, this is irrelevant. Kryder did not write the headline nor the Walter (2005) article. The 5 decade interval is 1956–2010; no reason to switch to 2005. You cry Kryder, Kryder, Kryder, but what does Kryder say in the Walter (2005) citation with respect to the five decades? Nothing, specifically. Where is the “Kryder quoted” to which you refer? Nowhere, exactly. So the question remains: which part of Walter (2005) supports the five decade claim?

Your parsing of Plumer (2011) represents original research WP:OR. I need not spell it out for you, instead Plumer will: “In order to achieve the approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years, several key technology innovations have been employed." Here, Plumer says 40% and 50 years. You now characterize Plumer as a low-level engineer, and move from an appeal to Kryder's authority to an ad hominem attack. Regardless of organizational rank, Plumer, a magnetic storage engineer in 2011, is a more reliable source for 1956–2010 technology assessment than Walter, a writer in 2005.

Plumer did not say 49%; you do, using WP:OR methods. But in view of the high degree of statistical uncertainty, even your WP:OR fabricated 49% is similar to the Moore's law rate. Not “higher than;” not “well exceeds.” Plumer's “approximate 40%” has one significant digit, and does not disagree with 49%. It's not precisely 40.00%. Moore's law is so fuzzy that some quote a 24-month doubling time and others 18 months.

“Once more unto the breach, dear friends, once more,” to quote some writer. Moore's law progress was similar to areal density growth during 1956–2010. It stands once more, twice more, a dozen times more, at least for a few years until data for 1956–2016 are available. The Wikipedia article was correct prior to your 25 May edit: “not substantially different than the 40% per year Moore's Law growth.”

“Or close the wall up” (the very next line from that same passage) with a reference that can stand up to verification, not with job titles from a Rolodex.71.128.35.13 (talk) 20:14, 26 July 2014 (UTC)

  1. An interview by Walter of Mark Kryder published in Scientific American is a reliable source and does not require verification. Nor is it necessary to have specific quotes; Walter's paraphrasing of Kryder is sufficient. Unless u can find something by Kryder disavowing Walter there is no reason to exclude it nor to withhold attribution to Kryder.
  2. Routine calculations do not count as original research, provided there is consensus among editors that the result of the calculation is obvious, correct, and a meaningful reflection of the sources. You will probably not agree that my calculations are routine, etc., even though you have used such calculations in your several failed attempts to show a CAGR <41%. Perhaps some other editors will join the discussion.
  3. Apparently you agree that in some contexts 49% is approximately Moore's law, so u must agree that approximately 40% can NOT disprove an hypothesis that the CAGR of areal density was greater than Moore's law thru at least 2006. In which case u should stop reverting or changing on the basis of Plumer.
  4. Moore's law is not fuzzy - doubling every 24 months = 41.4% CAGR. The observed data trend is either greater than or less than that number. Whyte has about 60 data points - unfortunately we don't have the underlying data but we must presume good faith. AD is known with great precision and accuracy, shipping date is less accurate, but for you to say there is no statistical difference between the Whyte trend and ML is sophistry.
  5. The reason to cut back the observation from Whyte to 2005 or 2006 is that that is where the latest clear kink in the curve occurs; but note that even extended to 2010, Whyte AD is above ML. The point is that since the mid 2000s HDD AD has not been progressing as fast as it had done in the past. I think this is the message the reader needs to hear.
Actually this whole thing started when u, without explanation, removed all reference to Moore's Law and then refused to accept it until confronted with Whyte among others. I think I actually wrote the original phrase, "not substantially different", but only after u forced a reference did I find the evidence that HDD AD has actually out-performed ML over a long period of time, at least until recently. Perhaps the now-posted graphic from Whyte will help other editors see this also.

Leading edge hard disk drive areal densities from 1956 thru 2009 compared to Moore's Law.

BTW, please don't argue that my annotation of Whyte is original research - it's graphically what u did in one failed attempt to rebut the hypothesis that HDD AD CAGR > Moore's Law thru at least 2006. Tom94022 (talk) 22:38, 26 July 2014 (UTC)


On the other hand, the ability of the magnetic disk people to continue to increase the density is flabbergasting--that has moved at least as fast as the semiconductor complexity.

Gordon Moore, PC Magazine March 25, 1997
Given that the HDD AD continued to double annually for 5 or more years after this quote, it does lend qualitative support to an HDD AD CAGR > Moore's Law thru the mid-2000s. Tom94022 (talk) 00:54, 27 July 2014 (UTC)
BTW, the policy of no original research does not apply to talk pages; I offered my off-the-cuff calculations in the hope they might lead to a consensus, not for inclusion in the article. Tom94022 (talk) 04:53, 27 July 2014 (UTC)
In conclusion, putting together Moore, Kryder and Whyte gives reliable sources for:

During its first fifty years (1956 – 2006) HDD areal density increased at a flabbergastingly rapid rate, likely exceeding the 41% compound annual growth rate (CAGR) of Moore’s Law, but the growth rate decreased substantially thereafter and most recently the CAGR has been in the range of 8-12%.

Proposed lede for Future development section
Tom94022 (talk) 17:06, 27 July 2014 (UTC)
The proposed lede is wrong, very wrong. The areal density on the Whyte graphic is “similar to” not “likely exceeding” Moores law (41.4% per year). There is no “flabbergastingly rapid” rate, if one considers areal density in the context of Moore's law. This breathless prose, this hype, violates WP:NPOV. The areal density slowdown began in the early 2000s, not the late 2000s. Let's look under the covers of a few pieces of the sophistry below.

Ilkka Tuomi, who was the Chief Scientist at a large company, has shown that Moore's law has very large error bars. http://firstmonday.org/ojs/index.php/fm/article/view/1000/921

The graphic contains 42 observable data points that begin in 1956 and end in 2009, and has a green line for the 41% per year Moore's law slope. The caption states this interval is 53 years. You are leading us to believe that the blue areal density trend is “somewhat higher than” the slope of the green line. But is it really higher?

Let's see what this graphic actually says. An image processing routine has found the value of each data point, and they are listed below.

Year, Areal density (Gb/in.sq)
1956.8 2.06E-6 1992.9 2.75E-1 2001.4 2.54E+1
1962.6 5.12E-5 1993.5 3.82E-1 2001.8 2.71E+1
1964.5 9.86E-5 1994.5 5.30E-1 2001.5 3.53E+1
1965.6 2.16E-4 1995.0 6.46E-1 2002.9 4.59E+1
1970.6 8.04E-4 1995.9 8.40E-1 2003.0 5.96E+1

1973.6 1.55E-3 1996.5 1.42E+0 2003.1 5.23E+1
1975.5 3.19E-3 1997.2 1.51E+0 2003.9 8.83E+1
1979.9 7.98E-3 1997.4 2.56E+0 2005.1 1.15E+2
1982.3 1.26E-2 1997.8 3.12E+0 2006.3 1.31E+2
1985.2 2.28E-2 1998.1 3.79E+0 2007.9 1.59E+2

1987.9 3.85E-2 1998.3 4.33E+0 2008.5 2.44E+2
1990.0 6.51E-2 1998.8 5.27E+0 2009.8 3.07E+2
1991.5 1.03E-1 1999.0 6.00E+0
1991.8 1.43E-1 1999.8 1.08E+1
1992.7 1.63E-1 2000.5 1.83E+1

While both the graphic and simple arithmetic support “at times greatly exceeded Moore's law growth”, this claim is vague and has no upper limit. Fitting routinely by least squares indicates 86% per year for 1995–2000, and the graphic confirms that the rate of progress reached 60–100% then.

The areal density slope fitted routinely by least squares is 40.9% per year, for all 42 observed data points, as charted here: http://postimg.org/image/8micd1wyl/ The slope is 40.5% per year for the 1956–2006 subset.

We are led to conclude that density growth during 1956–2009 (and 1956–2006) was similar to the Moore's law rate of 41.4% per year. The data show the slowdown began in the early 2000s, not the late 2000s.71.128.35.13 (talk) 21:04, 27 July 2014 (UTC)
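Assuming the same log-linear least-squares method as before, the quoted fits can be rerun on the 42 extracted points (a sketch; the exact subset values depend on where the windows are cut):

 import numpy as np

 # (year, Gbit/in2) points extracted from the Whyte(2009) graphic, per the table above
 whyte = [
     (1956.8, 2.06e-6), (1962.6, 5.12e-5), (1964.5, 9.86e-5), (1965.6, 2.16e-4),
     (1970.6, 8.04e-4), (1973.6, 1.55e-3), (1975.5, 3.19e-3), (1979.9, 7.98e-3),
     (1982.3, 1.26e-2), (1985.2, 2.28e-2), (1987.9, 3.85e-2), (1990.0, 6.51e-2),
     (1991.5, 1.03e-1), (1991.8, 1.43e-1), (1992.7, 1.63e-1), (1992.9, 2.75e-1),
     (1993.5, 3.82e-1), (1994.5, 5.30e-1), (1995.0, 6.46e-1), (1995.9, 8.40e-1),
     (1996.5, 1.42), (1997.2, 1.51), (1997.4, 2.56), (1997.8, 3.12),
     (1998.1, 3.79), (1998.3, 4.33), (1998.8, 5.27), (1999.0, 6.00),
     (1999.8, 10.8), (2000.5, 18.3), (2001.4, 25.4), (2001.5, 35.3),
     (2001.8, 27.1), (2002.9, 45.9), (2003.0, 59.6), (2003.1, 52.3),
     (2003.9, 88.3), (2005.1, 115.0), (2006.3, 131.0), (2007.9, 159.0),
     (2008.5, 244.0), (2009.8, 307.0),
 ]

 def fit_cagr(points):
     yrs, dens = zip(*points)
     slope, _ = np.polyfit(yrs, np.log(dens), 1)
     return np.exp(slope) - 1

 print(fit_cagr(whyte))                                       # quoted above as 40.9%/yr
 print(fit_cagr([p for p in whyte if p[0] <= 2006]))          # 1956-2006 subset, quoted as 40.5%/yr
 print(fit_cagr([p for p in whyte if 1995 <= p[0] <= 2000]))  # 1995-2000 window, quoted as 86%/yr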

You continue to confuse a trend line with the actual data points; any given data point may be above or below a trend line. It is indisputable that the actual growth rate for AD, when measured from the shipment of the prototype RAMAC to the state-of-the-art AD, exceeds an annual compound rate of 41.4% for each of the years 2005-2009 as shown in Whyte. It remains true thereafter using the data points from other sources. That is shown by the graphic and is confirmed by your data.
It is not reasonable to draw your precise conclusions from your trend line analysis of Whyte since the data points are likely to be imprecise due to, if nothing else, error introduced by translation of these points into an image and compressing it into a jpg. For example, the RAMAC point:
Date Areal Density
Ramac Actual 1956.70 2.00E-06
Whyte 2 by me 1956.82 2.02E-06
Whyte 1 by me 1956.30 1.99E-06
Whyte by IP 1956.80 2.06E-06
Each of your derived data points suffers from such errors, so any trend line is at best approximate. It is also original research in that there is no consensus that this is a reasonable transformation. Regardless, any trend line is not data and so is irrelevant to this question.
Wikipedia is not a copying machine nor do we editors necessarily have to precisely describe a reference. 2005 or 2006 is clearly a breaking point and therefore it is acceptable, even desirable, to reflect that date in the section lede. The only reason I can see to break at 2009 or 2010 per the end of Whyte is to extend time, giving a slightly lower CAGR from the beginning (slightly lower but still >ML CAGR), which while true is not particularly helpful to the reader. Dividing the times discussed into 1956-2006 and then 2006 to present makes good sense in the context of Whyte and many other reliable sources.
Although there is a reliable source for "flabbergastingly" it is not necessary, so I will move it to a footnote; but you have yet again failed to provide any evidence beyond your point of view and your original research, so I will again change the lede along these lines. Tom94022 (talk) 19:21, 5 August 2014 (UTC)
Ignorance of least squares regression is not an option if one is to make a convincing quantitative assessment of dissimilarity. The technique is well described in Wikipedia. Rhetoric and endpoints won't do. Because each point has measurement error, least squares regression includes all the data points, not just the endpoints. Even if one were to (incorrectly, sloppily, perhaps even deceptively) cherry-pick the endpoints (1956-2009), this would still be 43% per year: similar to Moore's law. It is not reasonable to exclude those 40 data points between the endpoints. The slope of all the data points (40.9% per year) is similar to Moore's law.

At the same time, Tuomi showed that Moore's law has very large error bars. His article is almost novella-length at 14,000 words, but education isn't free: often it takes a substantial investment of time. Tuomi published in a peer-reviewed journal, not a blog or company-sponsored marketing slides.
Tuomi, Ilkka. "The Lives and Death of Moore's Law." First Monday, Nov. 2002. ISSN 1396-0466. Available at: <http://firstmonday.org/ojs/index.php/fm/article/view/1000/921>. Accessed 5 Aug. 2014. doi:10.5210/fm.v7i11.1000.

Moore's law is fuzzy indeed, and areal density has grown at a rate similar to Moore's law over the span of more than five decades as noted explicitly by Plumer (2011).
71.128.35.13 (talk) 22:40, 5 August 2014 (UTC)
I'm not sure of the relevance of your lecture on least squares regression and your citation to Tuomi. You continue to misuse trend lines instead of the actual data points. BTW I can't find any error bars in Tuomi; he shows that the data points have very large deviations (errors) from the trend line, not the other way around. A compound growth rate on semi log paper is a straight line. A trend line might have a bound if one entered the data points with their error bars and ran some sort of Monte Carlo analysis to establish confidence intervals. But this would still be irrelevant to the simple mathematical statement that 43 > 41.4.
You agree that the AD CAGR from RAMAC to 2009 is 43%. I am sure u will agree that it is greater than 43% for each of the years 2005 thru 2007. Surely you cannot deny that 43 > 41.4. There is very little error in the two points, mainly in the dates, so the difference between 41.4 and >43 is significant. Since we appear to be in agreement, that should end this discussion. Tom94022 (talk) 03:26, 6 August 2014 (UTC)

No agreement could be apparent to any rational observer. Clearly, this is disputed material. While there is a desire to terminate the discussion unilaterally, the desire is unrealistic and termination would be shortsighted. To paraphrase wikipedia guidelines, no agreement is required; consensus does not mean unanimity which is not always achievable; nor is it the result of a vote. Further discussion, however unwelcome on the part of certain editor(s), may provide future editors with insight into the two alternatives: density growth over five decades that was “similar to” or “somewhat exceeding” Moore's law.
I can and do deny that "43 < 41.4", because mathematically this is wrong. In fact, given the uncertainties that limit our confidence in this comparison, 43 is approximately equal to 41.4, and the figure of 41.4 should be seen as 40-ish, not as precisely 41.421356237. Furthermore, 43 just relies on two endpoints: linear regression using all the data points gives 40.9, which is 41-ish or 40-ish.
In the first place, the slope of Moore's law itself is fuzzy according to Tuomi. It's not 41.4214% per year. (The Lives and Death of Moore's Law, by Ilkka Tuomi)
Secondly, Plumer (2011) found that areal density grew about 40% per year over the past 50 years: “In order to achieve the approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years, several key technology innovations have been employed." [1]
Thirdly, the linear regression slope of areal density for all 42 data points from the graphic (Whyte) is 40.9% per year, similar to Moore's law.
Fourth and finally, Marchon (2013) indicates that areal density grew at “the historical, Moore's law equivalent of ~40%/annum.”[2] To repeat: “the historical, Moore's law equivalent of ~40%/annum.” With emphasis added to make this even more apparent: “Moore's law EQUIVALENT” according to Marchon of HGST and co-authors Pitchford of Seagate and Hsia of Western Digital.
Therefore according to multiple credible sources, namely Tuomi, Plumer(2011), the Whyte graphic, and very particularly and specifically Marchon(2013), areal density grew at a rate similar to (“equivalent” according to Marchon et al.) Moore's law over the interval of more than five decades. 71.128.35.14 (talk) 18:08, 6 August 2014 (UTC)

I think 43 > 41.4 is obvious from the data underlying 43, but since you assert there are uncertainties, please state what u think are the uncertainties in your 43% and I will do a worst case analysis to show the lowest possible bound. Addressing your points by number:
  1. You misstate the implications of Tuomi. The slope of Moore's Law is not fuzzy; the performance of the semiconductor industry has deviated from a "Moore's Law" and Moore's Law has not always been doubling in 24 months. He says nothing relevant about the CAGR of HDD AD from RAMAC to any specific date.
  2. Plumer's approximately 40% could be an actual number as high as 44.5% which again has no relevance to the actual number.
  3. You continue to confuse a trend line with the actual data. Furthermore, your trend line derived from Whyte has inherent noise you introduced that does not allow any meaningful comparison to 41.4....
  4. Marchon's “the historical, Moore's law equivalent of ~40%/annum” is not qualified as to time, just an ambiguous "historical". The only graphic goes only to 1990. Since the period he is referencing is unknown, it is your unsupported assertion that applies it back to RAMAC.
Accordingly, none of your multiple credible sources, namely Tuomi, Plumer(2011), the Whyte graphic, and Marchon(2013), deny that the AD CAGR from RAMAC to 2006 was, according to you, 43%. While it is likely true that a trend line thru the leading edge AD data points to 2006 has a CAGR of about 41.4%, it is also possible that such a trend line may exceed 41.4%. You have not identified such a trend line, your trend line derived from Whyte is fatally flawed for this analysis by your methodology, and most importantly it is not clear that a trend line is any more relevant than a two point analysis, particularly since any two points of Whyte can be determined with a high degree of precision and none of the points are outliers. Tom94022 (talk) 02:42, 7 August 2014 (UTC)

A factor of 100 error in price improvement (30 billion versus 300 million) was today passed off as fact, and inserted into the article. Note that these errors are never random: they invariably lead in the direction of overoptimism and industry booster-ism.
Logic and mathematics are debased across the board here in the service of sophistry. Reality plays no part in this circus of denial. But numbers really do mean something. The initial claim of “Surely you cannot deny that 43 < 41.4” was refuted plainly “I can and do deny that '43 < 41.4'; because, mathematically this is wrong.” Sticking to your guns while keeping your head buried in the sand, you now claim anew that “I think 43 < 41.4 is obvious from the data ...”
No error, no matter how glaring, is ever admitted or corrected. Are we to believe that 4300000 < 41? How about 2 < 1?
71.128.35.13 (talk) 19:39, 7 August 2014 (UTC)
Thanks for correcting my math error. I seem to recall you reminded me of Wikipedia's policy of WP:CIV civility; please act as you preach. When I make a mistake I correct it, as I would have done on the price issue and have done on the "<". Do you deny that 43.0 is greater than 41.4? You really don't respond to anything, just continue to produce more evidence that never supports your position, which btw has always been to minimize the areal density growth. But the good news is your edit to the price section seems to be an admission that a two point analysis is meaningful. In the continuing absence of your response I shall shortly post a worst case two point analysis showing that the CAGR between RAMAC and 2006 is significantly greater than 41.4. Tom94022 (talk) 20:55, 7 August 2014 (UTC)
You may also recall my several earlier responses to this issue. Linear regression must calculate slope from all available valid data points, not just the endpoints while excluding all the data points in-between. This means all 42 points should be included for Whyte, and both of the points for the 300-million-fold calculation. By the way, two point slopes are subsumed by and fully incorporated in the mathematical technique of linear regression.
Two-point linear regressions are only meaningful if only two points are present. Otherwise, the linear regression should include all available valid data points. Once again, you appear to have overlooked repeatedly my several earlier responses in a continuing misunderstanding of the mathematical requirements of linear regression. As you may recall this calculation has already been performed in a meaningful fashion (not the less-meaningful two-point version), and the slope of all 42 data points in the Whyte graphic is 40.9% per year.71.128.35.13 (talk) 21:29, 7 August 2014 (UTC)
May I repeat your admonishment "to look at the rules at the top of this page ... assume good faith ..." I fully understand linear regression, having used it many times in my career, usually to predict the future. I have repeatedly responded that I question first linear regression's relevance at all to this discussion (see e.g., the assumptions of linear regression) and secondly the usability of your data points derived from Whyte (also an issue in the Duke paper). Your data from Whyte shows that the actual growth from RAMAC to each of the years 2005-2009 exceeded a 41.4% CAGR, as does the graphic now posted. You have yet to answer why u think the uncertainties of a two point analysis are such that one cannot draw any conclusion as to its relationship to 41.4. Let's leave Moore out of this; why isn't your 43% statistically a number larger than 41.4? I will shortly post a proof that in the worst case the CAGR of AD from RAMAC to 2006 exceeds 41.4 by a significant amount. Tom94022 (talk) 22:20, 7 August 2014 (UTC)
You wrote that the growth rate for the 300-million-fold improvement “btw should have been 40% from 1956.42 to 2014.00”. Two decimal places might be overstating the precision of dating here. On 5 August you wrote that Ramac Actual was actually 1956.70. The 1956.42 date does not show up anywhere else, and no source ever dates RAMAC in the first half of 1956. It is always reported in the second half of the calendar year. So based on Ramac Actual in the second half of 1956, the growth rate should be 41% not 40% per year.
One cannot leave Moore out of this for the sake of expediency. Moore's law is very fuzzy, as Tuomi has shown in 14,000 words of detailed explanation.

Rather than answering the rather negative and inconvenient question of why not calculate slope from just two endpoints, I prefer to look at the positive side and will present instead “A Modest Proposal” for why one would use just two endpoints. Two-point forecasts are a well known tool of storage industry marketing professionals, consultants, and sales people. The technique has enormous advantages when applied to the Whyte graphic, because linear regression based on all data points not just the two endpoints restricts one's freedom to manipulate data and rig results.
Just by selecting the “right” endpoints and excluding all of the data points from the middle, the two-point method can deliver almost any slope the user wants. Let's start by considering the Whyte endpoints, 1956–2009. In order to maximize slope over many decades, it is best to select the 1956 start point because density was really quite dismal in the beginning. Renaming the starting point for marketing purposes, to enshrine it in the mind of the reader and set it in stone: RAMAC_Actual–2009 gives slope of 43% per year. Switching to various different endpoints in the 2006–2009 time frame shows that slope holds steady near 43% per year, so one can demonstrate an apparent and false pseudo-stability. Now let's see what would happen if the interval were reduced by six years: with 1956–2003.9 the slope jumps to 45%. Reducing the interval by lopping off six years from the start and looking at 1962.6 (the RAMAC 2 plus?)–2009 would reduce the slope to around 39%.
Linear regression with all the data points gives a single statistically best slope estimate (meaning smallest margin of error, but not best for marketing) of 40.9% per year, a far cry from the flexibility and multiplicity of different slopes offered by two-point forecasts. Experienced, wily and cunning practitioners of the art consider many alternative endpoint scenarios to maximize the slope, looking for unusual upward spikes near the end or dips at the beginning like RAMAC Actual 1956 in order to cherry-pick a maximum slope.
Two-point forecasting certainly is an essential tool of any storage industry marketing and sales professional who seeks to tailor the data to fit the message, not the reverse. It works just as well on the rapidly growing flash memory market as it does on the stagnant magnetic storage market.
Obviously these mathematical techniques are not as important as setting strategic goals, targeting the right market, crafting a message that resonates with the audience and deploying that message across appropriate media to the customer audience. From the very first days of wikipedia, editors have frequently discussed the issue of being one of the media that can be employed, very cost effectively one might add, to deliver marketing messages. Social media is the hottest trend in advertising. Wikipedia remains relevant today in this context because it should properly be seen as a popular, respected and trusted form of social media.
On a literary note, as indicated above, the satirical hyperbole found in “A Modest Proposal” by Jonathan Swift (1729) relates directly to my intent here.
— Preceding unsigned comment added by 71.128.35.13 (talk) 19:35, 8 August 2014 (UTC)
If you bothered to [re]search you might discover that there are four possible dates for RAMAC: June 1956, when a prototype shipped to Zellerbach; Sept 4, 1956, when the RAMAC was announced within IBM; Sept 14, 1956, when the RAMAC was officially announced (prototypes previously installed); and finally "mid-1957", when production units were to be available. I prefer to use the June 14, 1956 date although there is an argument that the production date is July 1, 1957 +/- 45 or so days. Two decimal digits are appropriate when we have a specific date.
Correct me if I am wrong, but didn't u use a two-point measurement in calculating a 40% CAGR of bytes/$ here, in spite of having well over 300 data points at the reference you cited, Cost of Hard Drive Storage Space? I looked without success for a 40% CAGR at the reference. Tom94022 (talk) 08:57, 11 August 2014 (UTC)


(un-indented) Don't look for any reference regarding 40% versus 41%. This observation is just two points, in 2H1956 and YE2013, with a ratio of 300 million, not 30 billion, to one. The math is routine and verifiable. It is both foolish and astoundingly tendentious to pursue 40% instead of 41%, because this difference is not significant, unlike that factor-of-one-hundred pricing error, 300 million versus 30 billion.

Furthermore, we agree that dates in the second half of 1956 are supported solidly, though one date in the first half of 1956 may exist. Even if we were to take into consideration all four (4) date candidates that you now propose, the median date would still be in the second half of 1956. You are one of the select few IBM-specialist magnetic-storage historians, quite distinct from most readers of this article, who would be interested to hear that a conflicting date of 13 September instead of 14 September appears even today in wikipedia: “The IBM 350 disk storage unit, the first disk drive, was announced by IBM as a component of the IBM 305 RAMAC computer system on September 13, 1956.” All these citations are found in the second half of 1956.[3][4][5][6]

I wonder: does the one day difference in date, September 13 versus 14, result from time zone or international date line difficulties? Should this be dated in New York at the corporate headquarters and news media center, or in San Jose CA where they did the work before most readers were even born? Because these minutiae have no bearing on the quality of this article and serve only to distract and bother the editors, it would be counterproductive to over-[re]search the dates to two decimal places as you would have us do.

Would u mind explaining why your original two point calculation of 40% was an acceptable alternative to a trend line derived from the 300+ data points in your reference when you made this edit?
Since we are calculating various CAGRs to one decimal place for purposes of comparison, it is a good idea to use at least two decimal places on dates for such calculations. Excel doesn't care and the rounding off to one digit is free. Such precision has nothing to do with what dates appear in Wikipedia articles.
One day doesn't matter in calculations using 1/100th of a year (~+/- 2 days), but [re]search on your part might have discovered that the 305/650 announcement was officially released on Sept 14, 1956. It had likely been distributed to the press at a demonstration earlier than Sept 14, resulting in at least one newspaper printing an announcement on Sept 13. The press conference was on Sept 14. So it is not surprising that there is some confusion in the various sources that carries through to Wikipedia articles. Personally I think the press conference on Sept 14, 1956 is the most reliable date for the RAMAC announcement in Wikipedia articles and the nominal date for CAGR calculations.
All you had to do to find a June reference was click on the link provided. There are many such references; all it takes is a little [re]searching on your part, e.g. try searching on "RAMAC Zellerbach". Tom94022 (talk) 18:01, 16 August 2014 (UTC)
Why the original calculation used two points and 40%: Sure, this is done to keep the paragraph consistent. If this article showed a graphic of 300 data points and its least-squares slope, then this paragraph should state that slope and show that graphic. But the paragraph lists only two endpoints for each of the following parameters between 2H1956 and 2014: capacity per HDD, physical volume of HDD, weight, price, and average access time. The two-point slope is obtained from those endpoints. Price decreased from about US$15,000 per megabyte to less than $0.00006 per megabyte, a greater than 250-million-to-1 decrease. 250E6 ^ (1 / 57.3) – 1 = +40% CAGR — Preceding unsigned comment added by 71.128.35.13 (talk) 21:55, 16 August 2014 (UTC)
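
The quoted arithmetic is easy to check; a minimal sketch in Python, using only the two endpoint prices and the 57.3-year span stated above:

    ratio = 15000 / 0.00006      # $/MB, 2H1956 vs. 2014: ~250 million to 1
    years = 57.3                 # mid-1956 to early 2014
    print(f"{ratio ** (1 / years) - 1:.1%}")   # prints ~40% per year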
I hope the IP will agree that areal density points, and trends thereof, are plotted with and calculated from production units, not test units or laboratory demonstrations. It seems all of us then have been using the wrong date for the RAMAC 350 AD, since based upon reliable sources it did not ship until January 1, 1958 at the earliest, even later than the announced date of mid-1957, see edit here. This further shows that the IP's trend line based upon Whyte is not reliable for comparing to a ML rate. All two-point analyses based upon very hard data will show that, in the worst case, for any year from at least 2004 to 2014 the HDD AD annualized growth significantly exceeded 41.4%. Since the IP in other cases lacking reliable data was willing to use a two-point analysis, I hope this hard data will put an end to this endless debate. Tom94022 (talk) 18:56, 18 August 2014 (UTC)
On the contrary, the first shipment to a customer was in 1956, not 1958. "The first delivery to a customer site occurred in June 1956, to the Zellerbach Paper Company, in San Francisco, CA." [7] 71.128.35.13 (talk) 23:08, 27 August 2014 (UTC)
No, I do not agree. This table has two points (today and RAMAC), while the Whyte graphic has 42 data points. However, the table does not discuss growth rate because you removed the footnote after endless debate. I do agree that the footnote should be restored to this table.
The Whyte graphic has 42 data points with slope of 40.9% per year which is similar to Moore's law. I do agree that Barry Whyte of IBM put those data points on his blog graphic, so you should ask Barry Whyte to correct the RAMAC data point on his graphic. Wikipedia is not permitted to revise a reference or photoshop corrections. I oppose fiddling with the graphic until Whyte himself corrects the data point, and publishes the corrected graphic in an openly accessible location. If he were to “correct” the RAMAC data point all the way out to early 1959 (not early 1958), then the “corrected” slope of the 42 points would match Moore's law (doubling every two years or 41.42% per year). 71.128.35.13 (talk) 21:13, 18 August 2014 (UTC)
Please do not move the goalposts again; this talk section is about AD vs ML in the Future development section, wherein you insist that the AD CAGR has been "similar to" ML without any evidence other than your interpretation of Whyte. There are many reasons why your interpretation of Whyte is useless for calculating the slope to one decimal, including but not limited to the errors introduced by converting a jpg to data points, but we now know the first point is wrong and BTW the 2006 point is wrong. So it is not a reliable source for your calculation of a trend line to one-tenth of a percent. FWIW, I don't think its errors are sufficient to preclude its use as a graphic. On the other hand, hard reliable data establish that from any RAMAC date, but particularly from the production date, to almost any date this century the AD CAGR significantly exceeded 41.4%. You have no reliable source that says otherwise. Tom94022 (talk) 22:25, 18 August 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Before anything else, I apologize for jumping into your discussion without providing any actual contributions. In a few words, while (to me, FWIW) it's really delightful to see that there are still people who have the energy required for going into such details, and for doing that over an extended period of time, it's also sad to see all that energy (please pardon my choice of words) wasted. Why wasted? Well, please keep in mind that very few of the article's readers care about such fine details and involved calculation methods. Just wanted to point that out; please don't get me wrong, I'm not suggesting that either of you should give up and walk away from this discussion. :) — Dsimic (talk | contribs) 21:30, 18 August 2014 (UTC)

It's not so much a discussion as a dialog of the deaf; you can help by taking a position - the IP seems to be willing to stop when two or more editors disagree with his position.
Using hard reliable data it can be shown that over its first fifty or so years, from the 1956 RAMAC[a] to the 2006 Toshiba MK2035GSS[b], the areal density of HDDs increased at an annualized rate of at least 44.0%; the nominal rate was 44.2% per year, with an uncertainty of only a few tenths of a percent.[c] This is significant in that had areal density increased at a Moore's law rate of 41.4%, the Toshiba drive would have been introduced in 2008 or later rather than 2006. If the earliest production RAMAC date is January 1, 1958, the worst-case (lowest) rate is 45.7%. Isn't this proof enough that over the long term the HDD AD CAGR somewhat exceeded a ML growth rate?
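A sketch reproducing these rates from the footnoted values below (RAMAC AD 2.000E-06, Toshiba MK2035GSS AD 1.788E+02, with the stated earliest, nominal and production dates); the decimal years are approximations, so the printed rates can differ from those quoted by about 0.1%:

    ramac_ad, toshiba_ad = 2.000e-6, 1.788e2

    def cagr(years):
        return (toshiba_ad / ramac_ad) ** (1.0 / years) - 1.0

    print(f"nominal, Sep 1956 to Aug 2006:    {cagr(2006.62 - 1956.71):.1%}")
    print(f"worst case, Jun 1956 to Aug 2006: {cagr(2006.66 - 1956.42):.1%}")
    print(f"from Jan 1958 production date:    {cagr(2006.62 - 1958.00):.1%}")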
No, these are just two data points and the dates are fudged. What happened to the original dates of 2009 and 1956? One cherry-picked Toshiba data point, whose date has been moved/adjusted for no good reason from 2009 to 2006, and one heavily "date corrected" and massaged/adjusted RAMAC point (conveniently moved again, from 2H1956 to 1H1958) don't substantiate the claim that density grew faster than Moore's law. Actually, their growth was similar over five decades (2H1956–2009). It seems one editor will never stop the debate about whether growth was similar to or greater than, regardless of the facts and references that contradict his or her position. Some of those citations are as follows:
1. For this comparison, the slope of Moore's law (whether it doubles every two years, 18 months, 27 months, etc.; whether it measures transistors, linewidths, or components, etc.) is fuzzy according to Tuomi. [The Lives and Death of Moore's Law – By Ilkka Tuomi] This is reference number one. It is 14 thousand words long.
2. Secondly, Plumer (2011) found that areal density grew about 40% per year over the past 50 years: “In order to achieve the approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years, several key technology innovations have been employed." [8] This is reference number two.
3. Thirdly, the linear-regression slope of areal density for all 42 data points from the graphic (Whyte) is 40.9% per year, similar to Moore's law. Playing/fudging/shamelessly manipulating the RAMAC date (1956 to 1958) won't jack this slope up much higher than 41%. This is reference number three, and this graphic was kindly added here by the Tom94022 editor.
4. Fourth and finally, Marchon (2013) indicates that areal density grew at “the historical, Moore's law equivalent of ~40%/annum.”[2] To repeat: “the historical, Moore's law equivalent of ~40%/annum.” With emphasis added to make this even more apparent: “Moore's law EQUIVALENT” according to Marchon of HGST and co-authors Pitchford of Seagate and Hsia of Western Digital. This is reference number four.
Therefore according to multiple credible sources, namely Tuomi, Plumer(2011), the Whyte graphic, and very particularly and specifically Marchon(2013), areal density grew at a rate similar to (“equivalent” according to Marchon et al.) Moore's law over more than five decades. WP:NOR demands this kind of solid support from real references, not any analysis or synthesis of published material that serves to reach or imply a conclusion not stated by the sources. To demonstrate that you, Tom94022, are not adding OR, you must be able to cite reliable, published sources that are directly related to the topic of the article, and directly support the material being presented. This is the WP:NOR requirement, and Tom94022 violates this rule egregiously. 71.128.35.13 (talk) 00:07, 19 August 2014 (UTC)
Your four points are old arguments. I won't waste anyone's time again stating why they are not relevant to the question, they are fully answered above. Please stop repeating yourself.
The Toshiba citation above is a 2006 hard data point from a reliable source. I started with, preferred and argued for a measurement to 2006, which is about the current breaking point in the curve. But whatever the data points chosen, from RAMAC to 2004–2010 and beyond, the HDD AD CAGR is always significantly greater than 41.4%.
Calculating a CAGR is a routine calculation which may be used in an article, as you have done in the past, so I don't think you can seriously dispute all mathematically correct calculations. Original research is allowed on talk pages to help achieve consensus, so your original-research objection has no merit.
Rather than, as u say, "cherry picking" dates, I have found reliable published sources for each of my data points, including all uncertainties in the dates, and then used the uncertainties to calculate worst-case (i.e. lowest) CAGRs. Since ALL the worst-case CAGRs exceed 41.4%, it is reasonable to say that the long-term growth rate has slightly exceeded a ML rate.
BTW, you have raised the strawman argument of "uncertainties", so it is a bit disingenuous for you to object when I go as far as finding reliable sources to quantify the uncertainties.
If you have nothing new to say, perhaps it is time to let some other editors in? Tom94022 (talk) 01:33, 19 August 2014 (UTC)
The argument won't change because the facts haven't changed, and neither have the WP:NOR rules. Repetition of the WP:NOR mantra is no vice, and an editor's analysis or synthesis of the data is no virtue (phrasing inspired by Barry Goldwater). Ask not what you can say about the data; ask what the sources themselves say to directly support the material being presented (WP:NOR and JFK).
No reliable published source has said that Moore's law lagged areal density growth historically over five decades. However, Tuomi says that Moore's law is fuzzy (whether it doubles every two years, 18 months, 27 months, etc.; whether it measures transistors, linewidths, or components, etc.) from the start;[The Lives and Death of Moore's Law] Plumer says areal density grew about 40% per year over the past 50 years;[9] and Marchon describes the CAGR of current storage areal density on a disk surface as “the historical, Moore's law equivalent of ~40%/annum.”[2] 71.128.35.13 (talk) 22:52, 19 August 2014 (UTC)
You apparently fail to understand the WP:NOR rules. On this talk page you raised a strawman argument of uncertainty in dates and areal densities, for which I have found reliable sources as to the RAMAC areal density and dates and as to the 2006 state-of-the-art areal density and dates. I then performed a routine calculation for both the nominal CAGR and the worst-case (lowest) CAGR, which is always allowed on talk pages and can be used in an article with consent. You have routinely performed such calculations and posted them to articles without obtaining consent, so I don't think you can now object to the calculation per se, either here or in the article. These calculations show that in the worst case the 1956–2006 AD CAGR exceeds 41.4%, one of the several and most common expressions of Moore's Law. I could do similar calculations for any year from 2004 to at least 2009, starting in 1956 or 1958. Nothing you have said rebuts or in any way contradicts this.
If you have nothing new to say, perhaps it is time to let some other editors in? Tom94022 (talk) 19:06, 24 August 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────(un-indenting) Here, I'm not citing an editor's calculation of slope. Instead I rely on authoritative references (Marchon, Tuomi and Plumer) who indicate that the rate of density improvement over five decades was similar to Moore's law. 71.128.35.13 (talk) 21:32, 24 August 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  1. ^ RAMAC 2.000 E-06 AD, earliest date June 1, 1956, nominal date Sep 14, 1956, early production date Jan 1, 1958
  2. ^ Toshiba 1.788 E+02 AD, late date Aug 31, 2006, nominal date Aug 15, 2006
  3. ^ The only significant uncertainty is in the date of shipment
  1. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 18 July 2014. "approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years … growth in CAGR from 40% to 60% to 100% which began in the mid 1990s and spanned the following several years"
  2. ^ a b c Marchon, B.; Pitchford, T.; Hsia, Y. T.; Gangopadhyay, S. (2013). "The Head-Disk Interface Roadmap to an Areal Density of Tbit/in2". Advances in Tribology 2013: 1. doi:10.1155/2013/521086. "the compound annual growth rate has reduced considerably from ~100%/annum in the late 1990s to 20–30% today. This rate is now lower than the historical, Moore's law equivalent of ~40%/annum"
  3. ^ "IBM Archives: IBM 350 disk storage unit". 03.ibm.com. Retrieved 2011-07-20. 
  4. ^ "CHM HDD Events: IBM 350 RAMAC". Retrieved 2009-05-22. 
  5. ^ "IBM Details Next Generation of Storage Innovation". 2006-09-06. Retrieved 2007-09-01. 
  6. ^ Preimesberger, Chris (2006-09-08). "IBM Builds on 50 Years of Spinning Disk Storage". eWeek.com. Retrieved 2012-10-16. 
  7. ^ Maleval, Jean-Jacques (2011-06-20). "History: First HDD at 55 From IBM at 100 Ramac 350: 4.4MB, $11,000 per megabyte". storagenewsletter.com. "Ramac 350: 4.4MB, $11,000 per megabyte ... The first delivery to a customer site occurred in June 1956, to the Zellerbach Paper Company, in San Francisco, CA." 
  8. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 18 July 2014. "approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years … growth in CAGR from 40% to 60% to 100% which began in the mid 1990s and spanned the following several years"
  9. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 18 July 2014. "approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years"

Highlights In History Section[edit]

There are five HDD parameters whose improvements over the years are highlighted at the bottom of the History section of this article. Most go down over time and one goes up, so a ratio is used as the one consistent measurement of improvement over time. An IP recently added two additional measurements to just one of the parameters: the CAGR and the inverse CAGR, the latter being megabytes/$, a somewhat uncommon expression. They have been moved to a footnote, which I hope will not reappear in the section.

It is not foolish to expect that consistency across these five parameters will be useful to the reader. So I first think even the footnote should go. If we want a second and totally redundant measurement then I suppose CAGR could be added to all five, but what information does it add? I do object strongly to the inverse, megabytes/$, either as a third measurement on Price/MB or as a sixth parameter with its own measurements. As a third measurement, who needs three? And I suspect a reader will find a decreasing parameter having an increasing growth rate somewhat confusing. Since it is the inverse of the more common Price/MB, it would be totally redundant as a sixth parameter, and it furthermore has the disadvantage of not being very global.

It is pretty clear the IP added it because he wants to show it approximates a Moore's Law growth rate. This is true, but this is not the place for it. To the IP: IMO this is better placed in the Moore's Law article, or the Storage Density article (which needs a lot of work). I suppose it could even go into a new section in this article, but that would have to be far more expansive coverage than just the two points in this summary part of the History section.

In summary, my recommendation is one consistent measurement, ratio, for the five parameters and kill the footnote. Tom94022 (talk) 07:35, 11 August 2014 (UTC)

Thank you for commenting on this! How about turning the bulleted list into a table, with "Old", "New" and "Improvement" columns? That would be more compact and might be more clear to anyone who's looking for the trends in HDD industry. — Dsimic (talk | contribs) 07:46, 11 August 2014 (UTC)
A table would work. The "old" parameter values are RAMAC, circa 1956. The dates associated with the new parameter values are current but not necessarily consistent, so maybe 1956 and Current for the column heads. I believe all the Current parameter values are referenced for anyone needing more detail. Tom94022 (talk) 08:01, 11 August 2014 (UTC)
Ok, went ahead and implemented the table. Looking good? — Dsimic (talk | contribs) 08:26, 11 August 2014 (UTC)
Well done, thanks and good night (I'm in California) Tom94022 (talk) 09:01, 11 August 2014 (UTC)
Thanks. :) Hopefully other editors will also find this compaction to be an improvement to the article. — Dsimic (talk | contribs) 09:06, 11 August 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── This compaction is an improvement; however, as Dsimic put it: I think having more info in the form of a note can't hurt. I restored the footnote to highlight the congruence between price and areal density.

Speaking of areal density, why isn't it on this summary chart? It's mentioned 10 times elsewhere in the article, as frequently as cost and price. To the Tom94022 editor: IMO areal density deserves a place in this table because areal density and price per byte are congruent, while Moore's law is tangential. By the way, that Memory_storage_density article does need a lot of work, as you will see in my comment on its talk page in response to your request for citation. I did manage to find an historical HDD price reference to support the claimed HDD price over there.

To correct the areal density oversight here in this HDD article summary table, I've added areal density. Hopefully, other editors will find areal density to be as relevant to HDD performance characterization as volume (ft^3), mass (lbs) and access time (ms). 71.128.35.13 (talk) 20:37, 11 August 2014 (UTC)

Including areal density into the comparison table is fine with me. However, I'd leave comments regarding the footnote to Jeh and Tom94022 as I'm no longer either supporting or opposing it... From now on, when it comes to that footnote I'm a flat line. :) — Dsimic (talk | contribs) 21:14, 11 August 2014 (UTC)
I am inclined to remove the footnote. One, there is a point where more information is just clutter. Two, expressing something that has undergone periodic great change interleaved with periods of relative stagnation as an "annualized percentage change" is misleading. At least the IP deigned to show up here while simultaneously edit-warring his change back into the article. Jeh (talk) 01:11, 12 August 2014 (UTC)
I agree the footnote should be removed and am doing so.
Speaking of areal density, it, like many other highly technical parameters, was not in the table because it is not meaningful or visible to the average consumer as the other parameters are. So I would like to remove it, but I can live with it if other editors agree; if nothing else I will put it at the bottom of the table. Tom94022 (talk) 06:20, 12 August 2014 (UTC)
First, Jeh, the claim that storage price has undergone periodic great change interleaved with periods of relative stagnation surely sounds reasonable, but the data don't support it. Retail prices are shown on IBM Almaden Research slide 5 http://stratos.seas.harvard.edu/files/stratos/files/db2_blu_academia.pdf and also at http://www.jcmit.com/disk2014.htm Three decades show buttery-smooth storage price progress that is in fact a straight line on a log scale, well characterized by an APR. Therefore, APR is useful and its stability far exceeds expectations.
Second, branding APR as a "misleading" indicator of unstable trends stems from an unrealistic expectation that is contrary to the widespread use of APR in the technology sector and in economic indicators. GDP growth over two centuries has been routinely expressed in terms of an annual rate. We know perfectly well that GDP growth wasn't constant through recessions, wars, financial panics, and depressions. APR, however imperfect, is a useful real-world descriptor of labor productivity per hour, the technical progress of microprocessors (Moore's law) and areal density. It's just the average slope of a not-perfectly-straight line, no more or less misleading than other summary descriptors.
This disputed deletion was hustled through in the literal dead of night, at 08:26 11 August 2014 (UTC) which is 1:26 am in California where one editor claims to be located and 4:26 am on the east coast. It's always faster for the experienced editor to act at night, without allowing the newbies to comment. I'd be interested in your reply, should you deign to do so. 71.128.35.13 (talk) 20:24, 12 August 2014 (UTC)
It is a bit disingenuous to call oneself a newbie when since Dec 2012 u have made 165 edits amounting to about 25,000 words. U have asked that I assume WP:good faith on your part - "hustled through in the literal dead of night" doesn't read like you are willing to reciprocate. I edit when I have the time, sometimes late at night. Be that as it may, I think it is fair to say that HDD pricing ($/MB and $/box) had punctuated stability into the late 1980s, at which point both declined at varying rates until maybe recent times (your "elephant"). APR may make for pretty illustrations, but it can be misleading when there is no underlying linear process. We don't know if the Almaden data are reliable since the sources are not given, and given the kinks and hiccups in the data points it is likely that any AGR covering a long period of time could be misleading. Tom94022 (talk) 00:32, 13 August 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Tom94022, you misrepresent me and you mislead us all. First, your mislabeled "clean up" edit summary misrepresented new data that you added. Then today you only admit to an error in rounding off significant digits, a math mistake. But you do not admit to the deceptive, by omission, mislabeling of the edit summary.

Second, your detective work is actually an exposé on my IP address that should strike fear into the heart of any privacy-respecting editor. It sounds like you have dug deep to uncover a believable, and likely accurate, summary of IP activity from what sounds like my very own address. In fact, I don't know anything of what happened on this IP address before the spring of 2014, and I know only a portion of what happened since.

My lack of knowledge is understood easily, because you have confused me with my IP address. I am in fact a newbie, contrary to your inaccurate accusation of several years and hundreds of posts. My first post was in the spring of 2014. Because I respect both other editors' privacy and my own, I use the same IP address as a multitude of other individuals none of whom I know personally. I have no idea what they do or have done on wikipedia. I take responsibility for my own edits, and I do not deliberately misrepresent edit summaries or leave them blank out of laziness. I've made a lot of edits and contributed/written thousands of factual, unbiased, technically sound, well-referenced, high-quality words since the spring of 2014 from this IP address, alongside those other individuals whom I mentioned earlier.

It takes a long time to learn the ropes, and appreciate how business on wikipedia is really done by experienced and forensic editors like you, sir or madam. I am thankful that I've taken basic measures to protect my privacy from the threat of inquisition that is now revealed for all to see and is posed against any editor in disagreement.

Third, I think it is fair to say that HDD prices declined at a steady rate for three decades. Here's the reference that supports this claim, slide 5: http://stratos.seas.harvard.edu/files/stratos/files/db2_blu_academia.pdf The technical name of the underlying process described by the APR parameter is exponential growth. The effects of exponential growth are seen everywhere in the HDD industry, particularly in the first table of this WP article under the title “Improvement of HDD characteristics over time,” as well as in the Whyte graphic and in Moore's law. It is disingenuous, or perhaps just mathematically dysfunctional, to trumpet exponential areal density improvement while denying the validity of annual percentage change (APR) or the historical existence of an underlying linear process of exponential growth. If these price data are unreliable and their provenance is wholly uncertain, as you would have us believe, you might share your concern with IBM Almaden Research in San Jose, California (and perhaps provide more authoritative data sources to them and to Wikipedia as well). Guy Lohman is the name of the Research Manager who cites this price data as recently as the spring of 2014. I think these retail price data deserve their proper due, much more so than an empty cautionary warning unsupported by any substantive reference or price data.

Fourth, if you dig into the data forensically and deeply, instead of into other editors' IP address histories, you will find that the underlying price data (model numbers, dates, retail stores and advertisers) are located on the website of a Canadian former professor of information sciences. The price data are not at Almaden Research. This is contrary to your totally wrong and deceptive assertion that “We don't know if the Almaden data are reliable since the sources are not given.” The source of the data and a detailed accounting of each of the prices are indeed given. You may recall that I showed this reference to you a month or two ago, here on this very talk page.

Fifth, drawing upon valid price data would improve the quality of this Wikipedia article. This particular article wholly lacks long-term and comprehensive magnetic storage price data, and the editors' actions in the last two days have even removed the one footnote detailing an example of the rate of storage price (APR) improvement over the span of five decades. The footnote was labeled as “clutter.” The above jcmit/IBM price reference, which you poo-poo and decry so as to sweep it under the rug while IBM Almaden Research cites it, can fill the chasm of emptiness in this article. Areal density is not the whole story, as this article misleads its readers into believing. Price improvement, a vital concern to users, has gone hand in hand with areal density growth. This article's price blindness is not the result of happenstance; the blindness is engineered deliberately by the preconceptions and agendas of those who wrote this article. As for me, I have no affiliation with the magnetic storage industry.

I support restoring the storage price footnote to this article; improving the long-standing (going back many years according to historical records) editor disharmony by writing better-quality unbiased edits; stopping the deceptive edit summaries and the tendentious talk-page discussions that are largely irrelevant to article quality; halting false claims that a source of un-liked but accurate data is unreliable; and respecting the privacy of WP contributors and their IP address histories. 71.128.35.13 (talk) 03:17, 13 August 2014 (UTC)

Just as a note, there's nothing wrong with (re)viewing other editors' contributions, and that requires no voodoo. Everything that's submitted to Wikipedia becomes public, as does the history of edits; please see Help:User contributions for more information. — Dsimic (talk | contribs) 03:58, 13 August 2014 (UTC)
OK, copy that. For the performance improvement table, would it not be more accurate to use characteristics for just one of the $0.05/Gbyte 3.5 inch consumer drives across the board (weight, volume, etc.) rather than mixing and matching the best characteristics from various different products that could never be combined into one real product (low weight, small volume, lowest price, highest areal density, etc.)? The baseline 1956 product characteristics were not mixed and matched: they were all from just one real product.
No editor response as of yet to my request to restore the storage price improvement rate (APR) footnote to this article? 71.128.35.13 (talk) 21:17, 13 August 2014 (UTC)
Hm, regarding what to use for comparisons in the overview table... It all depends on one's point of view, and to me all technology advancements in the HDD industry are equally important. Thus, it might be more suitable to "cherry pick" highlights from multiple current products, as that shows better where HDDs currently are. For example, there are 1.8-inch HDDs that aren't the cheapest, largest or fastest, but they're lightweight and have small volume; then, there's also a variety of large, relatively cheap and quite fast 3.5-inch drives, etc. These examples show two pretty much disparate product categories, but they both "show off" areas of the progress achieved so far. At the same time, the IBM 350 disk storage unit (a component of the IBM 305 RAMAC computer system) is the first HDD, thus there isn't much to "cherry pick" from on that side of the comparisons. That's just my opinion, of course.
By the way, we might consider microdrives to be used in the overview table, as they have much smaller volume and weight than 1.8-inch HDDs. — Dsimic (talk | contribs) 08:49, 14 August 2014 (UTC)
FWIW, I think "cherry picking" from among the current drives is most appropriate for this section where we are trying to give the reader an overview of the progress from RAMAC to date. Thus AD is likely a 2.5-inch, Price/Megabyte is likely a 3.5-inch desktop, etc. Tom94022 (talk) 00:35, 15 August 2014 (UTC)
With regard to price, there are all sorts of things wrong with the IP's cite, I just don't have time to respond in detail, but I would note that in this section end points are sufficient to give the reader an overview, so that trend line slope even if well established would be overkill as indeed is two point CAGR. Tom94022 (talk) 00:35, 15 August 2014 (UTC)
This nitpicking brings to mind Voltaire: the perfect is the enemy of the good. This reference is better than leaving the price vacuum. Editors should at least be concrete when disputing citations, not pontificate darkly: “there are all sorts of things wrong with the IP's cite, I just don't have time to respond in detail”. IBM Almaden Research also uses this cite: it's IBM's cite too. I think the retail prices deserve their proper due, much more so than unsupported warnings (FUD).
Yes, this table depends on one's point of view. Looking at rounding precision leaves important questions hanging. Whether technical benchmarks ought to be considered in isolation or in an integrated HDD hinges on a fundamental issue: what's the subject of this article? Does it examine the progress of HDDs (always a happy story), the HDD as an integrated system, or the HDD as an assemblage of parts? If it's about the system, then à la carte benchmarks are bait, and real users inevitably switch to an integrated system. Marketers respond to today's reality by saying “we've come a long way,” “we're making good progress” in reference to the rate of improvement or APR, and promise great things on the “roadmap.” This HDD article has a bit of each.
Flash memory rendered microdrives obsolete, though used examples might still be found on eBay. Any narrow focus on technical capability will eventually be blindsided by lack of demand. The tide of technical change has turned, and certain benchmarks have already receded from their high-water marks. While progress (areal density, perhaps) may march forward, those roads on the map aren't yet built, and no particular technology is assured of advancement or even of survival. Extinction is the natural result of creative destruction (schöpferische Zerstörung), an idea that carries vestiges of Marxism. Were it not for a lack of demand impeding growth, we could still build faster Concordes, better horse buggies, and smaller microdrives.
A Model T automobile in one column shouldn't be compared with the price of a Yugo, the speed of a Ferrari, and the efficiency of a solar-powered car in the adjacent column. A WP article on the subject of automobiles ought to compare the Model T with a Toyota Corolla or VW Golf. Henry Ford and IBM RAMAC would surely fare better in this comparison had they optimized separately for each benchmark.
This table should give a 30,000 foot view. To keep things simple, it should list a consumer product because enterprise HDDs are a small part of the market and transaction prices are not transparent. High water marks belong in the “History of HDDs” article, and/or farther down in the technical subsections of this HDD article, and/or broken out as separate headings like “1956 system,” “Current HDD,” “Improvement” ratios for a current system, and “Bests” or “Historic bests” for the smallest microdrive, cheapest (per byte) HDD, and biggest capacity HDD. 71.128.35.13 (talk) 19:05, 15 August 2014 (UTC)
Well, at least horse buggies are still around, right? — Dsimic (talk | contribs) 02:48, 16 August 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Around, yes: but not so important to GDP, while HDDs retain importance. 71.128.35.13 (talk) 21:46, 16 August 2014 (UTC)

To IP 71.128.35.13[edit]

I am posting this here because the IP apparently does not access his talk page. A self-professed newbie who is fairly sophisticated in Wiki usage, prolific, argumentative and using a shared IP address is a suspicious editor. From Wikipedia:Sock puppetry: "Wikipedia editors are generally expected to edit using only one (preferably registered) account. Using a single account maintains editing continuity, improves accountability, and increases community trust, which helps to build long-term stability for the encyclopedia." (emphasis added)

May I suggest u at least register, and perhaps over time the suspicion will lessen. You could also try to shorten your responses so as to not filibuster a topic. Tom94022 (talk) 22:50, 15 August 2014 (UTC)

Yup, I edit from 5.13&5.14(once) and I've no conflict of interest. BTW, this is the page for discussing improvements to the article, not for ad hominem puppet stuff. 71.128.35.13 (talk) 22:34, 16 August 2014 (UTC)
If you really want to "take responsibility for all of your edits" you should create an account. Jeh (talk) 23:36, 16 August 2014 (UTC)

Why Whyte Cannot Be Used To Precisely Calculate a Trendline[edit]

The following shows the errors in the first seven data points introduced by the IP when he tries to reverse-engineer data out of a jpg graph.

IP Date   Nearest Product   Actual Year   IP AD      Actual AD   AD Error
1956.8    IBM 350           1956.7        2.06E-06   2.00E-06      -3%
1962.6    IBM 1311          1963          5.12E-05   5.10E-05       0%
1964.5    IBM 2311          1965.3        9.86E-05   1.02E-04       3%
1965.6    IBM 2314          1966          2.16E-04   2.20E-04       2%
1970.6    IBM 3330          1971          8.04E-04   7.80E-04      -3%
1973.6    IBM 3340          1973          1.55E-03   1.69E-03       8%
1975.5    IBM 3350          1976          3.19E-03   3.07E-03      -4%

The source for the actual data is the 1981 IBM JRD article, "A Quarter Century of Disk File Innovation", except for the 2311, which is not in the article but whose date is the FCS of the first System/360 and whose AD is exactly 2× the 1311. The AD errors are not systematic and can be as large as 8%. The JRD article only gave the year of shipment, and 5 out of the 7 data points have the wrong year; for the two products with known FCS months, one is off by almost a year. There are other errors or omissions in Whyte, such as not correctly identifying the 2006 state of the art and leaving out the double-density 350, and I suspect others.

Don't get me wrong, I think Whyte did a great job and is a reliable source as it stands in graphic form, but it is not usable for establishing a trend line to one decimal digit for purposes of comparing to 41.4%.

We don't know whether Whyte was in error or the errors were introduced by the IP's process, or some combination of both, but this sample of 7 of the 42 data points, along with other errors or omissions, demonstrates why calculating a trend line to one decimal digit in this manner is a case of garbage in, garbage out. Tom94022 (talk) 06:48, 19 August 2014 (UTC)

Stability of slope[edit]

Slope calculations are moot, because WP:NOR disallows editor analysis. Updating seven points (-3%, 0%, 3%, 2%, -3%, 8%, and -4%) would not change the slope significantly. With 7 updated and 35 more recent data points, slope would increase from 40.9% to 41.2% per year, still similar to Moore's law. 71.128.35.13 (talk) 22:42, 19 August 2014 (UTC)
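A sketch of this stability claim in Python: adjusting a handful of early points by a few percent barely moves a 42-point least-squares slope. The series below is synthetic (an exponential of roughly 41%/yr with mild scatter), not the digitized Whyte data:

    import math, random

    random.seed(1)
    years = [1956.7 + i * (2009.0 - 1956.7) / 41 for i in range(42)]
    dens = [2.0e-6 * 1.41 ** (y - years[0]) * (1 + random.uniform(-0.05, 0.05))
            for y in years]

    def fit_cagr(ys, ds):
        """Least-squares slope of log10(density) vs. year, as a CAGR."""
        n = len(ys)
        mx = sum(ys) / n
        my = sum(math.log10(d) for d in ds) / n
        num = sum((y - mx) * (math.log10(d) - my) for y, d in zip(ys, ds))
        den = sum((y - mx) ** 2 for y in ys)
        return 10.0 ** (num / den) - 1.0

    # apply the seven corrections listed above to the first seven points
    errs = [-0.03, 0.00, 0.03, 0.02, -0.03, 0.08, -0.04]
    fixed = [d * (1 + e) for d, e in zip(dens, errs)] + dens[7:]
    print(f"before: {fit_cagr(years, dens):.1%}  after: {fit_cagr(years, fixed):.1%}")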

Since the policy of no original research does not apply to talk pages, updating 7 out of 42 points is perfectly permissible. Given that six out of the first seven points u use are incorrect as to AD or date or both, there can be no doubt that most if not all of the remaining 35 points are in error. This means that any slope calculated to one decimal digit from this inaccurate data is meaningless (garbage in, garbage out). You could fix all the points, including adding the missing ones (e.g. IBM 350-3, Toshiba, etc.), but that would be original research, and if you did it I suspect you would discover a trend line greater than 41.4%. But the trend line is still meaningless when we are talking about actual data points, which may be above or below a trend line, meaning the actual growth at that point is above or below the trend. Tom94022 (talk) 18:48, 24 August 2014 (UTC)
There's no need to speculate endlessly. Just update the remaining 35 data points and recalculate the 42-point slope. The slope, regardless of data revisions, has been shown to be stable. The slope was 40.9% per year or about 41% and after updating 7 of the 42 points the slope is still 41.2% or about 41%.
The slope calculation is moot, because WP:NOR relies on authoritative sources not calculations by editors. Reliable sources (Plumer, Tuomi, Marchon) indicate the density improvement rate was similar to, a little more or a little less than, Moore's law over five decades. 71.128.35.13 (talk) 21:19, 24 August 2014 (UTC)
Do we agree that your calculation of slope from Whyte is moot? Tom94022 (talk) 00:43, 25 August 2014 (UTC)
No, that smells Faustian. The calculation based on all 42 points is much more accurate and reliable than your two-point calculation of slope. The straw man here is the issue of measuring point by point, which just diverts attention from the actual stability of the (42-point) slope. WP:NOR states that editor calculations must give way to reliable sources. No reliable reference has said that Moore's law lagged areal density improvement over five decades. In fact, Plumer, Tuomi and Marchon indicate that density growth was similar to Moore's law. Albeit moot, the slope of all valid data points is stable (41% per year with 7 updated points and 35 original points) and entails less editor manipulation and obfuscation than the moot and unreliable slope of just two points. 71.128.35.13 (talk) 00:03, 26 August 2014 (UTC)
I agree that "The slope calculation is moot, because WP:NOR relies on authoritative sources not calculations by editors." They are your 42 points, have been shown to be wrong and it is your calculation. What more is there to be said? Tom94022 (talk) 17:56, 26 August 2014 (UTC)
No reliable reference supports the wholly unfounded claim that Moore's law lagged areal density improvement over five decades. 71.128.35.13 (talk) 01:15, 27 August 2014 (UTC)

RAMAC Price and Ratio[edit]

The IP reverted a change with the comment,

WP:Verifiability "Do not use articles from Wikipedia as sources." Must provide an underlying source (reference). 180 million still has false precision; $0.05 corresponds to 200 million.)

Actually the RAMAC price per MB (a routine calculation) has its two components properly referenced on the cited page, so it is indirectly referenced, meeting the verifiability requirement, which I believe the IP knows; and according to WP:Verifiability the IP should have fixed the reference rather than blowing it away with a tag.

I really don't understand the "false precision" comment; it seems to me that mathematically $9,200/MB divided by less than $0.05/GB (i.e., less than $0.00005/MB) is more than 184 million, which properly rounds to >180 million, not >200 million. Perhaps a better way to look at it is if

0.045 < current price ($/GB) < 0.050 then,
184 million < RAMAC price per MB / current price per MB < 204 million

so to say that the ratio is >200 million as proposed by the IP is misleading at best. Tom94022 (talk) 17:52, 26 August 2014 (UTC)
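
The interval arithmetic, assuming the current price is quoted in $/GB (as in the Backblaze figures discussed below) against the RAMAC's $9,200/MB:

    ramac_per_mb = 9200.0
    for per_gb in (0.050, 0.045):
        ratio = ramac_per_mb / (per_gb / 1000.0)   # convert $/GB to $/MB
        print(f"${per_gb:.3f}/GB -> {ratio / 1e6:.0f} million to 1")
    # prints 184 and 204 million, bracketing the ratio as stated above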

WP:Verifiability is violated here: "Do not use articles from Wikipedia as sources." This inline citation (price of $9,200) is unverified because it merely points to Wikipedia, regardless of the purported source buried several layers deep: History_of_IBM_magnetic_disk_drives#IBM.27s_first_HDD_versus_its_last_HDDs[1]
WP:Citation needed is also violated: "If someone tagged your contributions with [citation needed] and you disagree, discuss the matter on the article's discussion page." You revert without discussing first.
The updated price of $10,000 has an honest-to-goodness (non-WP) inline citation.[2]
The number $0.050 is fabricated by the editor; the actual number is $0.05. That extra trailing zero is deceptive. Therefore, the two significant digits of 180 million are erroneous, incorrect false precision. 71.128.35.13 (talk) 01:15, 27 August 2014 (UTC)
The Backblaze reference states their price in Sept 2013 was $0.044. The chart in Backblaze clearly shows the price in 2013 starting well above $0.050 and descending below $0.050 but remaining well above $0.040, about at the $0.044 reached in September; therefore, the average for 2013 is clearly below 0.050 and above 0.045, and nothing is "fabricated" other than perhaps your objection. Tom94022 (talk) 17:50, 27 August 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

The RAMAC purchase price and capacity sources are clearly given in the cited section, and you are being disingenuous and argumentative by ignoring them; so, just in case you choose to dispute this again, here they are:
IBM Archives: IBM 350 disk storage unit gives the capacity as 5 million characters, which a routine calculation converts to 3.75 MB (not the 5 MB u used above)
Ballistic Research Laboratories "A THIRD SURVEY OF DOMESTIC ELECTRONIC DIGITAL COMPUTING SYSTEMS," March 1961, section on IBM 305 RAMAC (p. 314-331) has a $34,500 purchase price at Boeing Wichita.
As you did, it is then routine to calculate Price/MB. There is nothing in WP:Verifiability that precludes such indirect verification, but if you insist upon direct verification then, as stated in WP:Verifiability, you should now update this article to these sources.
Your [re]search has indeed discovered another source that has a different RAMAC purchase price. There is already consensus on $34,500 at RAMAC Purchase Price, so a discussion should be held there as to what to do about this new source; after all, we really don't need two different numbers for the same thing, do we? Tom94022 (talk) 17:50, 27 August 2014 (UTC)
Wikipedia:Verifiability#What_counts_as_a_reliable_source The $9,200 claim requires support from a direct inline citation, not from another WP article and not from an editor's routine calculation. In fact, this source WP:AD directly says $10,000 per megabyte: http://www.sfgate.com/business/article/Hard-driving-valley-began-50-years-ago-And-most-2469806.php 71.128.35.13 (talk) 20:29, 27 August 2014 (UTC)
Your source is on its face inaccurate since 50000/3.75 does not equal 10,000. Again I suggest u move this duologue to the appropriate page. Tom94022 (talk) 20:54, 27 August 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── These sources directly say $10,000 (or $11,364) per MB: [2] [3] You must directly support your un-referenced and dubious $9,200 claim with a non-Wikipedia inline citation, Wikipedia:Verifiability#What_counts_as_a_reliable_source, because Wikipedia content is not considered reliable unless it is backed up inline by a reliable source.

The claim of 3.75??? megabytes is on its face inaccurate because it is unsupported by an inline reference, and because these reliable sources say 4.4 MB. wp:ad http://www.snopes.com/photos/technology/storage.asp [3] 71.128.35.13 (talk) 23:13, 27 August 2014 (UTC)

I can't believe u are seriously disputing 3.75 MB; u are either very ignorant of storage history or just being argumentative. The IBM Archive and many other places say 5 million characters, and there are many sources establishing that the 350 recorded 6 bits per character, e.g. "stored 5 million 6-bit characters (the equivalent of 3.75 million 8-bit bytes)", "IBM's 305 RAMAC had a 3.75-megabyte internal hard disk drive", "stored 5 million 6-bit characters (the equivalent of 3.75 million 8-bit bytes"; all u have to do is Google "RAMAC 3.75 million". If you still insist there is an argument, we can turn to the technical manuals for the 350, which clearly show 5 million 6-bit characters. You seem to [re]search just to make an argument rather than trying to establish facts suitable to Wikipedia. Since both your references are fundamentally flawed, they cannot be reliable. But you really should take this to the RAMAC page where we can probably gain consensus. If u won't move there, then I guess I will invite those editors here. There really should be only one place where we discuss RAMAC, and other articles such as this should not disagree. Tom94022 (talk) 01:05, 28 August 2014 (UTC)
Those rumored IBM internal documents are not publicly available references and cannot be used in Wikipedia. Several references have 6 bits plus one odd parity bit, a total of 7 bits per character: that's how they arrive at 4.4 MB, not 3.75 MB.[2][3] 71.128.35.13 (talk) 02:37, 28 August 2014 (UTC)
In modern terms the RAMAC 350 used a 6/8 (1,7) run-length-limited (RLL) code, encoding 6 data bits into 8 channel bits with a minimum spacing of 1 and a maximum spacing of 7. The two additional bits are a space bit and a parity bit. RLL codes are used throughout storage, and in all cases the capacity is based upon data bits, not channel bits such as parity and space in this case. A simple example is the CD, which encodes eight data bits into fourteen channel bits (EFM) and then adds a bunch of parity bits, but whose capacity is stated in terms of (data bits)/8, not (channel plus parity bits)/8. My source is an IBM publication on the web and I have a local copy, but it shouldn't be necessary to go there since you agree that the seventh bit is a parity bit. Tom94022 (talk) 18:36, 28 August 2014 (UTC)
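The two disputed capacities follow directly from that bit accounting; a brief sketch:

    chars = 5_000_000            # IBM 350: five million characters
    print(chars * 6 / 8 / 1e6)   # 3.75  -> MB counting the 6 data bits only
    print(chars * 7 / 8 / 1e6)   # 4.375 -> the "4.4 MB" figure, obtained by
                                 #          also counting the parity bit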
References are definitely necessary and absolutely required. Two references say 4.4 megabytes.[2][3] Has any real reference (not purported, not rumored, not alleged, not calculated indirectly) been shown to support 3.75 directly? 71.128.35.13 (talk) 00:13, 29 August 2014 (UTC)
Actually I have given u three references to 3.75 MB, which for some reason u choose to ignore. But better yet, the RAMAC 305 Customer Engineering Manual of Instruction (c) 1959 on page 7 states there are 8 channel bit positions within each character, two of which are not used in the bit coding (Space and Parity), leaving 6 data bits. Figure 86 shows the waveforms of the 8 channel bits within the M350, just in case u are confused by the title of this reference. Will you now agree that the M350 capacity was 3.75 MB, or are you going to look for some other tertiary and incorrect source? Or perhaps u will claim parity should be counted? Tom94022 (talk) 05:21, 31 August 2014 (UTC)


Disputed BRL reference[edit]

The editor calculates $9,200/MB; however, no source supports this claim directly, and IBM internal documents are not useful as references in Wikipedia. Two reliable sources directly indicate $10,000 or around $11,000 per MB.[2][3] 71.128.35.13 (talk) 02:48, 28 August 2014 (UTC)

The IP is again disingenuous; the above-cited 1961 BRL reference is online and specifically states a purchase price of $34,500 for Boeing Wichita. It is a routine calculation to then divide by 3.75 to get $9,200. The question then becomes which is the more reliable source, the 2006 source and its echo in 2011[a] or the 1961 source. On its face, a serious, repeated, contemporaneous survey such as that performed by BRL should carry more weight than a comment made in a speech almost 50 years later.
Furthermore, the known $650/month rental price of the 350 would, at $50,000, give a purchase-price-to-rental ratio of 77:1, far exceeding the 55:1 ratio or less for similar IBM equipment. $34,500 is a 53:1 ratio, far more believable. There is no difference of opinion here; at least one number is wrong, and the best evidence supports $34,500. Tom94022 (talk) 19:16, 28 August 2014 (UTC)
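The sanity check in miniature, using the rental and the two candidate purchase prices quoted above:

    rental = 650.0                   # $/month for the 350 disk file
    for price in (34_500, 50_000):
        print(f"${price:,} -> {price / rental:.0f}:1")   # 53:1 vs 77:1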
The foregoing is disingenuous with respect to WP:civility, and dubious data are rampant. Even the "known" rental data for one unit at $650 per month are faulty. This 1961 source shows $975 per month at Yale Univ, $975 per month at USA ESCO, and $975 per month at Western Electric Co. in Indianapolis.[4]
On the subject of dubious data, the IP has cited from the IBM 650 RAMAC section and not the IBM RAMAC 305 section referenced above. FWIW, the 350 Disk IS NOT listed in the section he cites, so I have a hard time seeing this misrepresentation as a simple mistake. The $975/month is for the 355 Disk, not the lower-priced 350 Disk. Had he searched the correct citation he would have found at least three instances of a $650/month rental of the 350 Disk File. But we don't have to rely on just the BRL survey for a $650/month rental price, since had the IP [re]searched the truth he might have found:
"The monthly rental for a basic RAMAC was $3,200, of which $650 was for the disk storage unit ..."[4] Tom94022 (talk) 06:22, 29 August 2014 (UTC)
The editor has calculated $9,200 based on 3.75 megabytes, but two other references directly show $10,000 or $11,000 and 4.4 megabytes. WP:verifiability disallows reliance on an editor's calculation. Instead of relying on an editor's calculation of one unit's price, the article should list the $10,000 (one significant digit) that represents the three sources. When sources disagree, use a rounded number that represents them all, cite them all inline, and move on. Don't rely solely on an editor's calculation. 71.128.35.13 (talk) 00:32, 29 August 2014 (UTC)
Again the IP misstates the situation; both his cited sources use the same $50,000 from a 2006 speech, shown above to be highly unlikely; one source divides by 5.0 MB and the other by 4.4 MB; both capacities are incorrect. We would be doing a disservice to Wikipedia and failing as editors if we accepted two fallacious divisions as in any way equal to the one simple division of numbers from reliable sources that any editor or reader can do. Sources do say the earth is flat, but that doesn't mean we as editors are obligated to use such assertions in any articles on the earth, even if one IP says so.
Finally, the IP asserts in his edit note that "wp:Verifiability disallows editor calculations," but there is simply no support for this assertion in the policy; no form of the word "calculate" appears in the policy. Routine calculations such as the results of a division are allowed to appear in articles with consensus, so it seems to me that the only implication of verifiability for division is that both the numerator and the denominator be verifiable, and in this case they are. I suppose if we had to we could add more to the reference, but I think the reference makes it pretty clear what constitutes the numerator and the denominator, and both are verifiable. Tom94022 (talk) 06:22, 29 August 2014 (UTC)
The $9,200 is flat-out dubious, misleading and unsupported. No reference, anywhere, ever, listed $9,200/MB. In 1956 IBM announced both the 350 and 355 RAMACs simultaneously. History_of_IBM_magnetic_disk_drives#IBM_350 The $9,200 is tied to the RAMAC 350 at $650 per month and simply ignores the RAMAC 355 at $975 per month. Both RAMACs were real and contemporaneous, so obviously the price is higher than $9,200. Several sources, not a Wikipedia editor's calculation, directly support $10,000/MB. 71.128.35.13 (talk) 18:30, 29 August 2014 (UTC)
For anyone interested in learning more about the RAMAC history I highly recommend the RAMAC 350 Restoration Web Site and the RAMAC oral history project. I have used both in this tedious duologue.
The cited BRL reference clearly gives one instance of a 350 Disk File at $34,500. There are four instances at a rental of $650/month, confirmed by the Pugh reference, which yields a reasonable 53:1 price/rental ratio. Thus $34,500/3.75 MB = $9,200/MB is supported, accurate and correct; the IP's assertion that it is "dubious, misleading and unsupported" is a flat out lie.
Apparently the IP does not understand what the 355 was - it was a 350 with three actuators interfaced to a Model 650 system (the Model 350 disk file initially had only one actuator), and it shipped much later than the M350. From the same BRL data cited by the IP, it seems the 355 Model 1 had a purchase price of $62,200; with a capacity of 6 million 5-bit characters, its capacity was also 3.75 MB, resulting in a price of $16,587/MB. Of course we ignore this and use the M350, because it was first and is lower priced.
What are flat out dubious, misleading, inaccurate and false are the 5.0 MB and 4.4 MB capacities used to arrive at the $10,000/MB and $11,364/MB figures the IP continues to cite. The $50,000 purchase price is also dubious in that it appears in a speech 50 years later without any sourcing. Wikipedia values accuracy, and both accuracy and truth are knowable for mathematical calculations such as the capacity of the M350 or M355! This makes the two citations used by the IP unreliable sources not suitable for citation. To equate these two citations with the BRL survey and the RAMAC capacity would be a false equivalence. Tom94022 (talk) 05:08, 31 August 2014 (UTC) updated: Tom94022 (talk) 18:59, 31 August 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Wikipedia values wp:verifiability, and this tedious defense of $9,200 is bereft of any reference. Apparently the editor can't find direct support for $9,200. No source, besides one Wikipedia editor, ever said $9,200. This is all wp:or original research; it misleadingly focuses on the RAMAC 305 and ignores the contemporaneous RAMAC 355 on the 650; and it is laden with false precision.

Here's another source that is not a calculation and not original research which gives a RAMAC price of $10,000/MB. Both RAMAC 350 and RAMAC 355 were introduced simultaneously in 1956 and withdrawn simultaneously in 1969. As of the late 1950s, the 650 computer with RAMAC 355 storage was produced at the rate of one per day, in higher volume (nearly 2,000 units) than any other computer in history up to that time.[5] We could agree that the RAMAC 355 price was higher than the RAMAC 350's. One dubious editor-produced calculation puts it 65% higher ($15,200 versus $9,200 per disputed MB), and a more rational and reasonable calculation has it 25% higher (($975 / 6 million characters * 5 million characters) per month versus $650 per month). Either way, the 355 price was indeed higher, and these two models were contemporaneous according to IBM. This means the real, true price must have been substantially higher than $9,200. More like $10,000 with one significant digit of precision.

Again, no reference supports the entirely editor-fabricated claim that the 355 was produced later than (not contemporaneous with) the 350. So, the $9,200 price is still dubious, misleadingly over-precise, ignores the higher-volume and contemporaneous 355, and is unsupported by any reference. The real, verifiable, properly supported, not falsely precise price is $10,000. 71.128.35.13 (talk) 23:49, 31 August 2014 (UTC)

I really hate to repeat myself, but there is nothing in Wikipedia's Policy on Verifiability that precludes an editor from performing a routine calculation which can then be used in an article; the IP is simply making up a prohibition. The IP has previously edited CAGR calculations into an article, so it is rather dishonest of him to deny the validity of a routine division yielding $9,200, particularly since both the numerator and the denominator have reliable sources.
The IP again cites an unreliable source since the price/MB therein is based upon an incorrect 5 MB capacity.
The IP asserts the character size of the M350 is the same as that of the M355 without any evidence whatsoever, but is then willing to perform a routine calculation in support of his position. BTW, it may be true - I don't think so - but it is irrelevant to this duologue.
The IP is deliberately misleading when he characterizes the 350 and the 355 as contemporaneous; they were announced simultaneously in a press release on Sept 14, 1956[6], but at that time there were at least two 350s installed (Zellerbach and USN Norfolk[7]), each installation having at least one 350 and some having two. According to Phister, the number of first-generation disk drives installed on all IBM systems in 1961 was 900 units, leaving very little room for 355s on 650s. A low number of disks on M650s is supported by a 1962 survey of educational computing[8] that identified only 2 RAMAC units out of 38 installed M650s. It seems the M355 is yet another red herring by the IP to justify his position, which u should recall proposed $10,000/MB without any reference to the M355 at all. Tom94022 (talk) 06:48, 1 September 2014 (UTC)
There is no false precision in $9,200 since the prices of the various models are known to 5 digits, the capacity is an integer and the 350 was the only product shipping in 1956. What the IP is asserting is a false equivalence between several inconsistent unreliable sources and this routine calculation. Tom94022 (talk) 06:48, 1 September 2014 (UTC)
RAMAC355 passed customer acceptance test at USA ESCO in mid-1957; the reference is here [[5]] A tit-for-tat “deliberately misleading” accusation wouldn't accord with the doctrine of WP:civility, though the editor does spew incivility. In the discussion above, the claim that RAMACs first shipped in 1958 was kicked to the curb by a Zellerbach Paper Company 1956 reference. We hew to reliable sources and place our full faith in the WP:verifiability of RAMAC prices, not restricted to an arbitrary editor-selected 14 units.
The references below show RAMAC 355 and RAMAC 350 dates of customer acceptance and the running periods (a month or more) for measurement of reliability. Furthermore, the 355 RAMAC's price per character is 25% higher than the 350 RAMAC's: $975 per month for 6 million characters versus $650 per month for 5 million characters. Therefore, the $9,200 editor-produced price misrepresents the BRL data by model selection and overprecision. RAMAC is more nearly $10,000 per megabyte with one significant digit of precision.[[6]]
[[7]] (six million characters, RAMAC 355):
  • USA ESCO: Type 355 Disk Storage at $975 ea. per month; running period Oct 59 to May 60; passed Customer Acceptance Test Jul 57
  • Western Electric Co.: Type 355 at $975 per month; running period 16 May 60 to 17 Aug 60; passed Customer Acceptance Test Aug 59
[[8]] (five million characters):
  • Boeing, Wichita: 350 Disk Storage at $34,500; passed Customer Acceptance Test 10 Jun 58; running period included 1 Mar 60 to 31 Mar 60
  • WE Aurora: 350 Disc Storage Unit at $650 per month; running period 1 May 60 to 31 Jul 60; passed Customer Acceptance Test 1 May 60
  • Georgia State: 350 Disc Storage Unit at $650 per month; running period 1 Jun 60 to 30 Jun 60; passed Customer Acceptance Test 1 Jan 60
A few of those purportedly red RAMAC355 herrings, obviously not the true IBM color, are named above. There are a lot of documented Big Blue RAMAC fish in the deep Blue sea. False precision and biased model selection are sinful. The BRL data verify the Official IBM PR Announcement that the RAMAC350 and RAMAC355 are indeed contemporaneous. And yea verily, the RAMAC355 price is indeed higher than the editor-calculated, unreferenced $9,200. So sayeth the Records of IBM, as revealed to us in the gospel of Thelen_BRL. Our faith has been confirmed by the presentation of the Price Data; amen to $10,000 per megabyte with one significant digit of precision.[[9]] 71.128.35.13 (talk) 03:43, 3 September 2014 (UTC)
Thank you for acknowledging BRL as a reliable source.
The ESCO citation for the IBM 355 date is ambiguous in that ESCO reported two IBM 650 systems, Number 800 having 4 IBM 355s and Number 700 without any disk drives. The acceptance test date does not say what was accepted in 1957; it could have been either system or both.
There is good evidence that the IBM 355 character was a 5-bit decimal digit and not a 6-bit alphanumeric character[9]. The IP has presented no evidence that the character size of the two RAMACs was the same, so any ratio between them is suspect. But since he is willing to do such ratio calculations, I fail to understand any objection to $34,500/3.75.
What is indisputable is that in 1956 the only disk drive shipped to customers was the IBM 350, as part of an IBM 305 system, including one to Zellerbach in June and one to the USN Norfolk before Sept 14, 1956. Since the table cell in question in this article is in a column labeled "Started with", all discussion of the IBM 355 is a red herring: the disk drive industry started with the IBM 350, not the IBM 355. Tom94022 (talk) 16:48, 3 September 2014 (UTC)

End of one part of the duologue[edit]

IBM's archivist has provided me with a copy of IBM Announcement Letter 259-38, May 5, 1958, announcing the "IBM 350 Disk Storage Model 2", "similar to the Model 1", to be installed as a second disk file on the 305 system with a rental price of "$700/month" and a purchase price of "$36,400". Initial deliveries were scheduled to begin "September 1958." Note that there is one M350 in the BRL survey at $700/month, presumably a Model 2; the others have a rental of $650/month and one has a purchase price of $34,500. This clearly establishes the BRL as a reliable source for the purchase price of the 350 Model 1 as $34,500 and confirms as unreliable the sources using $50,000 for the purchase price of the original RAMAC, Model 1 or 2. If the IP insists otherwise I suppose I will have to find a way to post it, but I would hope he/she would attribute good faith on my part and stop wasting time on this issue. Tom94022 (talk) 20:09, 1 September 2014 (UTC)

Summary of arguments in favor of current edit[edit]

This dispute is over the contents of the Price row in the following table in the article:

Improvement of HDD characteristics over time

  Parameter | Started with | Developed to | Improvement
  Price | US$9,200 per megabyte[10][dubious] | < $0.05 per gigabyte by 2013[11] | 180-million-to-one

The following facts are from reliable sources and not disputable.

  1. The industry "Started with" the only hard disk drive to ship to customers in 1956, the IBM 350 (shipped to at least Zellerbach[12] and USN Norfolk before September 14, 1956[6]).
  2. The capacity of the IBM 350 was 5 million 6-bit characters, which is precisely 3.75 MB[13].
  3. The purchase price of the IBM 350 was $34,500 (BRL[10], as confirmed by Pugh and the IBM 350 Model 2 announcement).

It is an indisputable fact that $34,500/3.75 MB = $9,200/MB.
It is an indisputable fact that $9,200/MB divided by <$0.05/GB is >184 million ($9,200/$0.05 = 184,000, times 1,000 to convert gigabytes to megabytes), which properly rounds down to 180 million.
The IBM 350 capacity and purchase price are precise whole numbers; any other values reported by any source are on their face incorrect, and such a source cannot be used in Wikipedia other than perhaps in context. Tom94022 (talk) 16:58, 4 September 2014 (UTC)
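
A minimal Python sketch of the arithmetic above (assuming the $34,500 purchase price, 5 million 6-bit characters, and the <$0.05 per gigabyte 2013 figure):

 # RAMAC 350 price per megabyte and the resulting improvement ratio
 purchase_price = 34_500                  # USD, BRL survey (IBM 350 Model 1)
 characters = 5_000_000                   # IBM 350 capacity in characters
 data_bits_per_char = 6                   # data bits only; parity/space bits excluded

 capacity_mb = characters * data_bits_per_char / 8 / 1_000_000
 price_per_mb = purchase_price / capacity_mb
 price_per_mb_2013 = 0.05 / 1_000         # <$0.05/GB is <$0.00005/MB

 print(capacity_mb)                       # 3.75
 print(price_per_mb)                      # 9200.0
 print(price_per_mb / price_per_mb_2013)  # 184000000.0, rounded down to 180 million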

Argument by IP against current edit[edit]

Capacity (based on 6 bit characters) is absolutely disputable. Respected sources say 7 non-binary bits per character, not 6 binary bits, and not 3.75 MB capacity. The Official History of IBM hails the RAMAC350 as an Icon of Progress: “The whole thing could store 5 million binary decimal encoded characters at 7 bits per character.” That might be 5 million characters × (7 bits / 8 bits per modern byte), or around 4.4 megabytes.[[10]] Al Shugart himself is quoted directly saying it had a 7 bit code: “The coding used on the disk drive was a 7-bit code - 6-bit + 1 binary. Very straightforward and simple.”[[11]] The American Society of Mechanical Engineers designated RAMAC as An International Historic Landmark, noting it had 7 non-binary bits per character: “The 350’s … contained a total capacity of 5 million binary decimal encoded characters (7 bits per character) of storage.”[[12]] RAMAC355 likewise had 7 bits per character because it “used the same mechanism as the IBM 350 and stored 6 million 7-bit decimal digits”[[13]], 20% more than the RAMAC350. Again according to Wikipedia, “each character was 7 bits, composed of two zone bits ("X" and "O"), four BCD bits for the value of the digit, and an odd parity bit ("R") in the following format: X O 8 4 2 1 R”[[14]] To recap, sources supporting a non-binary version of 7 bits, not the 6 binary bits that are tied to 3.75 MB, include the Official History of IBM, a direct quote from Al Shugart himself and the American Society of Mechanical Engineers International Historic Landmark designation. Certainly, the non-binary code would carry more than 6 binary bits of information, hence more than 3.75 MB, because four BCD bits take more space than binary bits: "Standard BCD requires four bits per digit, roughly 20 percent more space than a binary encoding (the ratio of 4 bits to log2 10 bits is 1.204)."[[15]] 71.128.35.13 (talk) 20:33, 4 September 2014 (UTC)
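
The 1.204 figure quoted just above can be checked directly, assuming it means the ratio of 4 bits to log2(10) bits per decimal digit:

 import math

 # Space overhead of BCD versus a pure binary encoding of decimal digits
 print(4 / math.log2(10))   # 1.2041199826559246, i.e. roughly 20 percent more space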

Neither "non-binary bits" nor a "non-binary version of 7 bits" has an obvious meaning, but the 305 RAMAC manuals[14] at bitsavers clearly describe 6 data bits per character. Shmuel (Seymour J.) Metz Username:Chatul (talk) 22:57, 18 September 2014 (UTC)
Agreed - specifically, see page 70 in the above-referenced PDF. And note that it is explicitly talking about the data format "on the drum or disk", so there is no arguing that it's merely the 305's internal format and that the format on disk could be different. This is a "horse's mouth" reference. Jeh (talk) 23:49, 18 September 2014 (UTC)

Correct price is $10,000 per Megabyte[edit]

No editor-calculated original research WP:OR is justified here since direct sources are available. Here's one direct source[[16]] that gives a RAMAC350 price per megabyte of US$15,200. So we have numerous reports, not editor calculations. They present an obstacle for the tendentious and calculating editor because none of the sources say precisely $9,200/MB, nor 3.75 MB, nor 6 binary bits per character.

The sourcing for 7 non-binary bits per character (and hence nearly 4.4 MB rather than 3.75 MB) - the Official History of IBM, the direct quote from Al Shugart, the ASME International Historic Landmark designation, the RAMAC355 description and the BCD overhead figure - is given verbatim in the "Argument by IP against current edit" section above.[[17]][[18]][[19]][[20]][[21]][[22]]

So, the problems that produce over-precision in the editor's $9,200 calculation are model selection (350 or 355, which were contemporaneous and priced differently), capacity (6 binary bits or 7 non-binary bits per character) and price (various sources offer different prices per MB). All direct sources, and even the dubious over-precise $9,200 editor calculation, agree that RAMAC “at the start” cost $10,000/MB with one significant digit of precision. 71.128.35.13 (talk) 20:06, 4 September 2014 (UTC)

(Moved IP comment here for better flow) Questioning and civil discussion are the first steps on the path to wisdom. Thanks for the opportunity to summarize the (correct) answers to the questions you've posed:

1. The capacity of the 350 Disk File is "other" (nearly 4.4 MB), because BCD encoding is not precisely 6.000 bits (more nearly 7 bits) per character.

2. The purchase price was $34,500 according to BRL, and the contemporaneous RAMAC355 was 50% higher (with 20% greater capacity as well). No, price/MB is not a routine calculation: here the editor posing these questions selects a biased model type (the 350) and ignores the contemporaneous higher-priced RAMAC355. Any calculation is unneeded and inappropriate original research WP:OR on the part of the Wikipedia editor, because several reliable sources are available that directly give price/MB.

3. The premise of your question (agreement that price is a routine calculation) is wrong. We can calculate the ratio here, without undertaking an editor calculation of the price or agreeing to your premise. The ratio is > 200 million to one. ($10,000 per megabyte versus < $0.05 per gigabyte). 71.128.35.13 (talk) 20:28, 4 September 2014 (UTC)

An End To The RAMAC Price Duologue[edit]

With an apparent acceptance that the 350 Disk File (Model 1) had a purchase price of $34,500, may I now suggest that this duologue can be ended by allowing other editors to discuss the following three questions, perhaps referring to the summaries immediately above.

Questions For Discussion
  1. What was the capacity of the 350 Disk File (Model 1) in MB:   3.750,  4.375,  4.4,  5.000,  other?
  2. Is the division $34,500/(answer 1 above) a routine calculation suitable for inclusion in an article?
  3. If the answer to 2 above is yes,  what is its ratio to <$0.05/GB:   >180 million-to-1,  >200 million-to-1,  other?

Information to answer these questions can be found repeatedly and in detail above this section. I for one will no longer respond to the IP but will briefly answer any questions placed below by any other editor, and am willing to accept any consensus reached by a reasonable number of registered editors. Tom94022 (talk) 16:58, 4 September 2014 (UTC)

Discussion from editors other than the IP would be appreciated below.

  1. Capacity 3.75MB. What we call "capacity" today includes user data only, and excludes all parity, ECC, servo and metadata. This was not the case in the early days; in the 1990s I worked on drives that reported the larger "unformatted" capacity (lawyers made us stop doing that). For any comparison, we must do some conversion between the old and new uses of the term capacity. I can conceive of no other way than to use data (non-parity) bits (modern drives have much larger ratios of non-data to data capacity). That means the parity bit for each 6 bits should not be counted. This is suitably sourced. Al Shugart's comments support the "+1" as not being data. 5 million times 6 bits = 3.75MB (8-bit bytes). BTW, introducing BCD into this mix would reduce the capacity, as 4 BCD bits only allow for values of 0-9, not 0-15 as allowed by 4 binary bits, but evaluating the impact of BCD on capacity isn't sourced. (Also, BCD values of 10-15 could be reserved for "special conditions" not defined by the storage itself.)
  2. Yes; the $34,500/3.75MB calculation is okay to include ($9,200/MB). This is allowed per WP:CALC and does not violate WP:SYNTH.
  3. 180 million-to-1. --A D Monroe III (talk) 23:56, 4 September 2014 (UTC)
  • I agree with 3.75 MB. To pull this off, we must compare apples to apples. Common sense must prevail. On with building Wikipedia. BTW, I would suggest a non-breaking space (&nbsp;) between the value and the unit of measure. Greg L (talk) 03:01, 5 September 2014 (UTC)
  • I'm not commenting on the pricing, but I do agree that the storage was 3,750,000 bytes. To allow correct comparison with modern capacity measures, one should not include meta-bits like parity and stop bits. I'd only change my mind if it was possible to turn off the parity check and use the 7th bit for data, but nobody has produced evidence of that. Incidentally, I'm pretty sure that in those days 3,750,000 bytes was 3.58 MB since "MB" meant 2^20 bytes, but I'm not suggesting to use that. Zerotalk 02:21, 5 September 2014 (UTC)
  • I have no knowledge of the drive under discussion but I agree that early systems gave their capacity in terms of every bit on the platter, not just user data. Perhaps replace US$9,200 with "Over US$9,000" and leave the 180-million-to-one. The footnote should give a very brief explanation per the above summary. The IP will never be satisfied and the formal procedure would be to start an RfC. Johnuniq (talk) 03:54, 5 September 2014 (UTC)
  • Consensus is clear enough. There is no need to wiki-red-tape any further over such a minor point when the answer was so obvious. Greg L (talk) 04:28, 5 September 2014 (UTC)
  • To begin with, here are a few quotes:
Within RAMAC, all data is read, transferred, and written serially by word, character, and bit. There are eight bit positions within each character position. They are identified as bits S, X, 0, 1, 2, 4, 8, and R. Bit S merely provides a space between the recording bit positions of each character, and is not used in the bit coding. Bit R has no numeric or alphabetic value, but is added to certain characters so that every character will have an odd number of bits. This convention makes possible a technique whereby RAMAC may perform a validity check on each character transferred.
The disk drive could store 5 million characters using a 6-bit character code plus a parity and space bit per character.
The 350's fifty 24-inch disks contained a total capacity of 5 million binary decimal encoded characters (7 bits per character) of storage.
Based on the first quote, each RAMAC byte had six end-user-accessible bits, as we have to discard space that's unusable to end users; for RAMAC, those are the space and parity bits in each "platter" byte. As a note, modern HDDs also contain quite a lot of space that isn't accessible to end users, while only the usable amount of space is advertised; one of the motivations behind the 4Kn initiative is the reduction of "wasted" platter space associated with ECC data, and ECC data is only one part of the end-user-inaccessible space. For example, a typical 1 TB HDD with 512-byte sectors also carries additional capacity of about 93 GB for the ECC data; see this PDF and this illustration for more information.
The third quote might seem a bit contradictory, but each character is actually written to the platters as seven bits, out of which only six are accessible to end users, as the seventh ("R") bit is the parity bit used by RAMAC; the eighth ("S") bit is practically empty space. This is probably a small marketing trick.
Based on all that, RAMAC has 5,000,000 characters with six bits each, which is 30,000,000 bits or 3,750,000 bytes. As the "1 MB = 1,000,000 bytes" marketing gimmick applies to modern HDDs, we have to apply it to RAMAC, too; thus we end up with 3.75 MB as the RAMAC capacity that can be used for later comparisons. $34,500 / 3.75 is the logical calculation for price per MB, which results in $9,200 per RAMAC MB of storage. When that is compared to less than $0.05 per MB, we end up with a somewhat-larger-than-184,000-to-one ratio – how is the 180,000,000-to-one ratio calculated? — Dsimic (talk | contribs) 05:00, 5 September 2014 (UTC)
The summary above includes ">184 million which properly rounds to 180 million". Three significant figures are not needed or justified. Johnuniq (talk) 07:02, 5 September 2014 (UTC)
Hm, how can 184,000 be rounded to 180,000,000? That's the question. :) — Dsimic (talk | contribs) 07:21, 5 September 2014 (UTC)
$9,200/$0.05 = 184,000 and the extra 1,000 is because one is in megabytes and the other in gigabytes. Johnuniq (talk) 09:40, 5 September 2014 (UTC)
Sorry, but that simply isn't true, as we're comparing $0.05/MB and $9,200/MB. With today's HDDs, a MB costs $0.05, not a GB. — Dsimic (talk | contribs) 20:36, 5 September 2014 (UTC)
My bad, it's $0.05 per GB for contemporary HDDs (though it seems a bit too low); I got a little confused. However, the third question on top of this section originally specified $0.05 as the MB price; I'll get it corrected. — Dsimic (talk | contribs) 21:09, 5 September 2014 (UTC)
IBM used the term BCD to refer to a six bit character set in which the digits 0-9 were encoded as 000000 through 001001; the term does not mean that the value of the right four bits is restricted to that range for other characters. Typically there will be 63 or 64 valid characters in a BCD character set. Shmuel (Seymour J.) Metz Username:Chatul (talk) 21:34, 9 September 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Perhaps one reason for this question being so hard to settle is that there are good arguments for all the options:

  • 3.75MB If one tried to store a .jpg file on an IBM 305 RAMAC there would only be room for a 3.75MB file.
  • 4.375MB The IBM 350 drive stored seven bit bytes. The decision to allocate one bit to parity is arguably a system design decision, not a disk drive limitation. If we had a working 350 and wanted to interface it to, say, a Raspberry Pi, one could easily put the interface upstream of the parity circuitry and store 4.375MB worth of .jpg files.
  • 5MB Nobody was storing images on computers in 1956. IBM 350 disk drives were used for storing numbers and characters and if we ran the same software today, say with a 305 emulator on a PC, we would store one character per modern byte. So an IBM 350 used in the normal way it was used then would be as useful as a 5 MB modern disk drive used the same way.

In the table in question we are reporting economic utility improvement ratios, and those ratios calculated in any of the ways mentioned are extremely large, almost incomprehensible, perhaps the most dramatic in human history. So my inclination would be to report them in the most conservative way, which would calculate the ratios on the basis that one character on the 350 is the economic equivalent of one 8-bit byte on a modern drive. So e.g. for "Starting with", I would say "5 megacharacters." However, I would then put a footnote at the bottom of the table, that said something like:

“* These ratios assume that one six-bit character on the IBM 350 is equivalent to one eight-bit byte on a modern disk drive. Comparing on a cost per bit basis, one should increase these ratios by a factor of 1.14 or 1.33 depending on whether one includes the seventh parity bit or not.”

In other words, here is a big number and arguably it could be even bigger.

As for rounding, again I would also be conservative and never round up. 180,000,000 is not excessive precision here.--agr (talk) 20:58, 5 September 2014 (UTC)
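
A quick check of the 1.14 and 1.33 factors in the proposed footnote, assuming they re-base the 5-megacharacter figure on 7 or 6 bits per character respectively:

 chars = 5_000_000
 baseline_mb = chars / 1e6             # one character treated as one 8-bit byte: 5 MB
 with_parity_mb = chars * 7 / 8 / 1e6  # parity bit counted as data: 4.375 MB
 data_only_mb = chars * 6 / 8 / 1e6    # data bits only: 3.75 MB

 print(baseline_mb / with_parity_mb)   # 1.142857... (~1.14)
 print(baseline_mb / data_only_mb)     # 1.333333... (~1.33)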

  • Using "5 megacharacters" sounds good to me, with the additional note explaining the details. — Dsimic (talk | contribs) 21:05, 5 September 2014 (UTC)
I disagree. There's no useful way to compare these so-called "megacharacters" to bytes. It's true there are use clashes in this, but unless we just forget about doing any comparison, we can only use the lowest common denominator -- data bits. Since RAMAC had a designated parity bit, I can only assume it wasn't sufficiently reliable without it. On a modern HDD, I can attempt to use Write-Long and Read-Long commands to store extra data in the ECC field for a higher capacity, but I'd never be able to usefully retrieve the data I stored, so it can't count as storage capacity. Also, even if we try to get "smarter" about use, this would be countered by its BCD focus, which reduces effective capacity. But most importantly, by attempting this smarter use-case conversion, we'd need a source that supports it, or we violate WP:SYNTH. --A D Monroe III (talk) 22:50, 5 September 2014 (UTC)
There is much confusion here over "BCD". IBM referred to their six-bit character code (which included almost 64 printable characters) as "binary-coded decimal". Their use of this term does not mean that only decimal digits were stored, so there is no "BCD focus that reduces effective capacity". The "zone bits" correspond to the 12- and 11-zone punches on the IBM punch card. See the character code table in the IBM 1401 article. Jeh (talk) 22:58, 5 September 2014 (UTC)
Hm, then maybe we can go with 3.75 MB and briefly describe how it was calculated in a note? I also don't count parity bits as something that should be used to store user-accessible data. Also, as Jeh already described, IBM practically misused BCD as a term (at least as we know it today), as both letters and numbers were stored on RAMAC. Why would only numbers be stored? You can't sell something that can store only numbers to someone who wants to run a business with it. :) — Dsimic (talk | contribs) 23:24, 5 September 2014 (UTC)
My take: The parity bit should not get counted in usable capacity any more than do the CRC bits on a modern hard drive, or the parity stripes in a RAID array for that matter. So it is 5 million x 6 bits = 30 million bits. There are defensible arguments for calling this 3.75 million bytes. To refer to it as "five million characters" without explanation that these "characters" had only 64 possible values, not 256, is to invite a misunderstanding that the drive had the same usable capacity as a 5 MB Shugart drive, when, of course, it did not. Jeh (talk) 03:16, 6 September 2014 (UTC)
Yes, but "five million 6-bit characters" would be correct and also match the recommended usage. Zerotalk 04:05, 6 September 2014 (UTC)
I would then suggest "five million 6-bit characters, equivalent storage to 3.75 million 8-bit bytes", as "6-bit" by itself may not convey enough meaning to the non-technical reader. I would explicitly spell out "million" as "MB" is ambiguous, and besides, "five million BCD characters" is how IBM described the device iirc. Jeh (talk) 08:25, 6 September 2014 (UTC)
"8-bit" in "8-bit bytes" is pretty much redundant. — Dsimic (talk | contribs) 08:57, 6 September 2014 (UTC)
For all practical purposes today, yes. But it was not always so. Anyway, I think the phrasing I suggested has the advantage of parallel wording, and more directly conveys to the reader the reason for the 5 vs. 3.75 million difference. Good writing is not always about sweating things down to the absolute minimum number of words. Jeh (talk) 19:07, 6 September 2014 (UTC)
Yeah, that's why we also have "machine word" and (as you've suggested in the edit summary) "octet" terms. Agreed, if we take the route of explaining 6-bit characters and everything, an additional explanation could only help. — Dsimic (talk | contribs) 20:04, 6 September 2014 (UTC)
I don't agree that "8-bit byte" is redundant, although I prefer the term "octet". I've seen documentation that assumed byte sizes of 6, 7 and 12. Shmuel (Seymour J.) Metz Username:Chatul (talk) 21:50, 9 September 2014 (UTC)

I don't agree that bit count is the only common denominator for comparison. Addressable characters was arguably as important a metric. The 305 processed characters, which back then came almost exclusively from punch cards, and their coding in 1956 did not require more than 64 distinct characters (6 bits). If one concocted a science fiction plot where one had to go back and fix an IBM 305 with a minimal modern drive, a 3.75 MB drive would not do while that 5 MB Shugart would work just fine. As for "I can only assume it wasn't sufficiently reliable without it" {the parity bit}, there is no reason to assume that. To begin with, the parity bit did not improve reliability; it only told you that a failure occurred. What were you supposed to do then? Stop processing, figure out what track was bad, find the original data and reload it? If the data on the drive was the result of several updates, that would be extremely tedious if not impossible. (Transaction logging was well into the future.) So if parity failures had occurred with any significant frequency, the drives would have been unusable. Also, I can't find any reference in any of the documents that the IBM 650 version of the drive, the 355, used a parity check. An alternate explanation for the parity bit is IBM's need to convince customers that it was safe to switch from punch card storage of their data, something customers had decades of experience with, to magnetic media. Note that the 6-bit BCD with parity format is exactly the same as IBM's 7-track tape format. The obvious solution for our article is to make clear the different comparison approaches. All I am suggesting is that when computing the huge improvement in drive capacity per dollar, a gigantic number, we start with the most conservative number and then point out the ratio is up to 33% bigger if one just looks at bits.--agr (talk) 21:41, 7 September 2014 (UTC)

Hm, if it's about replacing a RAMAC with the smallest possible modern HDD, there's no reason not to create a translation layer that maps 6-bit "words" (or bytes) onto standard 8-bit "words" with no wasted bits; that way, 3.75 MB would still be enough, as we can ditch the parity bits and leave that to a modern HDD's ECC functionality. Also, parity checks aren't usable for restoring corrupted data, for sure, but they at least made it possible to know that something was wrong with a RAMAC, and possibly repair it (I guess). — Dsimic (talk | contribs) 22:04, 7 September 2014 (UTC)
Arnold, a parity check certainly does improve reliability. A lot of read errors are ephemeral (electronic noise, physical vibration, etc.), so the first step upon getting the error is to try the read again. The chance of eventually getting the correct data off the disk is significantly greater than if there were no check. Maybe this was even done automatically by the device controller, as all modern disk controllers do (do we have the documentation to determine that?). Another way that parity checks enhance reliability is that users can keep two copies of critical files and know which is correct if one gets a parity error. Zerotalk 01:17, 8 September 2014 (UTC)
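
As an illustration of the check-and-retry idea, here is a minimal sketch; read_char() is hypothetical, and the R (parity) bit is assumed to be the most significant of the seven recorded bits:

 def has_odd_parity(ch7: int) -> bool:
     # A valid 305 character always has an odd number of one-bits (R bit included)
     return bin(ch7).count("1") % 2 == 1

 def read_with_retries(read_char, retries=3) -> int:
     # Many read errors are ephemeral, so re-read before declaring a failure
     for _ in range(retries):
         ch = read_char()
         if has_odd_parity(ch):
             return ch & 0b111111  # strip the assumed high-order R bit, keep 6 data bits
     raise IOError("persistent parity error")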
FWIW, this is how IBM describes the characters of a 305 and the 350:

Within RAMAC, all data is read, transferred, and written serially by word, character, and bit. There are eight bit positions within each character position. They are identified as bits S, X, 0, 1, 2, 4, 8, and R. Bit S merely provides a space between the recording bit positions of each character, and is not used in the bit coding. Bit R has no numeric or alphabetic value, but is added to certain characters so that every character will have an odd number of bits. This convention makes possible a technique whereby RAMAC may perform a validity check on each character transferred.
See also Figure 86. for File [i.e. 350] Write and Read Waveforms.

RAMAC 305 Customer Engineering Theory Of Operations, IBM Corp, © 1959, pp. 7-8 and 85
Just as a not-so-important note, this quotation was already available in the section above. — Dsimic (talk | contribs) 23:54, 8 September 2014 (UTC)
A closer read of the IBM 305 documentation makes it clear that IBM was not doing any automatic error recovery on the 305. Also, the 305 only had room for 200 instructions total, so keeping live backups was unlikely (and they would have cut the capacity of the system in half). The 6-bit-plus-parity-bit encoding was used throughout the 305, and any parity failure halted the machine. The 350 disk drive just transferred the 7-bit data from and to the CPU's 100-character magnetic-core data buffer. The IBM 305 operation manual says (Ref 4, p.72) "Each character that enters or leaves the magnetic-core unit is checked to insure that it contains an odd number of bits. Because all information transfers (except certain arithmetic operational transfers) take place through the magnetic-core unit, the machine will recognize an error whenever an inadmissible character is transferred. Any combination of bits that give an even count will stop the machine and turn on the parity check light."
The IBM 350 disk system instead achieved reliability with a read and compare after write system. "The file check is a check on the recording of information on the disks. Whenever a record is written in the disk storage, the machine automatically rereads the same record into the core unit. Then the record is read back from the disk storage track and compared, character by character, with the re-reading of the record in the magnetic-core unit. A difference in comparison causes the file error light to be turned on and stops the machine." (p.73) The operator could try the write operation again manually: "The operator may attempt the transfer again by depressing the check reset key and then the program start key." Note that this file check approach does not depend on the parity bit. So it would seem that the IBM 350 drive was a reliable mechanism for storing 7-bit data, hooked to a CPU that then used one of those bits for additional error checking.
Again, I am not saying any one basis for comparison is the right answer, just that there is a good argument for each, and when computing the massive multiple in cost improvement from 1956 to now we should start with the most conservative number. I would also point out that the difference between using 5, 4.375 or 3.75 meg as the comparison point probably presents less variation than there is in establishing "current" disk prices.--agr (talk) 03:43, 9 September 2014 (UTC)
A block of data usable by a system at the bit stream level (the 350 interface and most interfaces into the 1990s) consists of a stream of serial gap bits, followed by a stream of encoded data bits, followed by a stream of check (and now correction) bits.[b] Today there are always many more encoded data bits than system-usable bits; however, block capacity is always stated as (system-usable bits)/8, and drive capacity is then a multiple of these blocks, rounded at MB through TB as appropriate. We don't know why the designers of the 350 recording channel chose to intersperse some of their gap bits (S) and their check bits (R) within their encoded data bits,[c] but it really shouldn't matter, since gap and check bits have never been counted in measuring the capacity available to a system. Since the 350 data bits are not encoded, the 350 sector has precisely 600 data bits available to the system. Seventy-five bytes per sector in today's terms, no more, no less.
I'm not sure that how the bits (bytes/8) are character-encoded matters. Sectors read lately off the 350 at the Computer History Museum are likely stored as Unicode (implying a 10 MB current replacement capacity), and since disk storage is free the sectors themselves may be stored in 4k sectors (implying a 204.8 MB current replacement capacity), but the entire contents of the museum's 350 will fit into precisely 3.7504 MB of modern storage (7,325 512-byte sectors), or maybe 3.751936 MB (4k sectors), either way rounding to 3.75 MB. The capacity currently required for replacement under various other coding schemes is interesting and might be worth discussing someplace, but does it make sense in this summary table?
Read after write was easy in tape but very difficult in disk, and it gradually disappeared as disk drives got more reliable.[d] Regardless, the 350 recording channel engineers achieved their targeted channel reliability, and how it was achieved really shouldn't matter in determining capacity. BTW, in recording channel terms the 350 was probably better than today's drives at soft error rate (say 10^-9 vs. 10^-6) but worse at hard error rate (say 10^-11 vs. 10^-14). Overall reliability improvement is also about 3 orders of magnitude (and more if u count RAID), but isn't that a different parameter from capacity? Tom94022 (talk) 18:01, 9 September 2014 (UTC)
IMO 3.75 is the appropriate number, suitably footnoted as to derivation. In the end, I suppose I could reluctantly go along with using 5 Mchar as the divisor with a footnote to 3.75 MB or the other way around, but I see no justification at all for 4.375 (or 4.4). Tom94022 (talk) 18:01, 9 September 2014 (UTC)
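
The replacement-capacity figures above can be reproduced with a short sketch, assuming 50,000 sectors of 100 characters and the packing choices described:

 import math

 chars, sectors = 5_000_000, 50_000  # IBM 350: 50,000 sectors of 100 characters
 data_bytes = chars * 6 // 8         # 3,750,000 bytes (6 data bits per character)

 print(chars * 2 / 1e6)                            # 10.0 MB as 2-byte Unicode
 print(sectors * 4096 / 1e6)                       # 204.8 MB, one 4k sector per 350 sector
 print(math.ceil(data_bytes / 512) * 512 / 1e6)    # 3.7504 MB in 7,325 512-byte sectors
 print(math.ceil(data_bytes / 4096) * 4096 / 1e6)  # 3.751936 MB in 916 4k sectors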
Whatever we use to compare, it has to come from a source, not be based on our own reasoning, even if (as seems clear to me in some cases) we can do more informed reasoning than the sources, lest we violate WP:SYNTH. Let's just review the sources, and make a selection based on those. --A D Monroe III (talk) 19:24, 9 September 2014 (UTC)
Hm, I'd say that unfortunately we can't simply extract the size of the disk drive in megabytes from available sources, simply because back at the time it was much more important to express the capacity as an equivalent of punch cards; the only important thing was how many characters could be stored. As a matter of fact, that was the purpose of the first disk drive: to replace punch cards. — Dsimic (talk | contribs) 22:26, 9 September 2014 (UTC)
There are sources for 3.75 MB, 4.4 MB and 5.0 MB, so our task is to see if we can get consensus as to whether there is a genuine dispute among the sources or whether all sources for one or more of the values are unreliable. For example, we might arrive at a consensus that the sources citing 4.4 MB are unreliable in this context because, according to a contemporaneous and highly reliable source (the IBM CE manual), the recorded characters were 8 bits, not 7 bits, and in calculating capacity both in 1956 (capacity then in characters) and today (capacity in bytes) only data bits count - therefore, using 7/8 is improper and such sources are not suitable for Wikipedia in this limited context. Tom94022 (talk) 05:59, 10 September 2014 (UTC)
We have a policy on this: Wikipedia:These_are_not_original_research#Conflict_between_sources. Basically, in a situation like this where there are quality sources that appear to conflict, we should explain the different views of the question and not try to pick winners or losers. That would take a lot less effort than all this debate.--agr (talk) 20:09, 10 September 2014 (UTC)
If we go with including stuff from multiple different sources, what should we take as the value used in the comparison table? The average of three values? — Dsimic (talk | contribs) 21:00, 10 September 2014 (UTC)
I suggest using the most conservative number, 5 million, which produces ratios that everyone can agree represents a minimum net improvement. Then add a footnote below the table that says something like: “* These ratios are based on equating one six-bit character on the IBM 350 with one eight-bit byte on a modern disk drive. Comparing on a cost per bit basis, one should increase these ratios by a factor of 1.14 or 1.33 depending on whether one includes the seventh parity bit or not.”--agr (talk) 21:19, 10 September 2014 (UTC)
FWIW, that's something I could live with. — Dsimic (talk | contribs) 21:28, 10 September 2014 (UTC)
The cited policy states, "If reliable sources exist which show that another apparently reliable source is demonstrably factually incorrect, the factually incorrect material should be removed." We should be able to come to a consensus as to the demonstrable factual accuracy of 3.75MB, 4.4MB and/or 5.0MB as the equivalent capacity in modern terms. I believe this particularly applies to 4.4 MB, for which I can find no support for including parity in a capacity calculation, either in characters or bytes. See also WP:Inaccuracy.
Since data bits are the lowest common denominator between 350 characters and today's bytes, I suggest they are the better basis for comparison and should be the value used in the comparison table. It is certainly POV to select the primary value based upon a desire to show a minimum net improvement. It appears we agree 3.75MB is factual.
Whether we footnote the 5.0MB or not depends upon whether we can find a reliable source for the equivalence of one modern byte to one 305 character when measuring capacity. To a certain extent it is a bit of apples and oranges, since a byte can have different character sets depending upon its code page. Did the 305 support binary operations (I'm going to research this)? If so, then perhaps we are falling into a semantics trap, since if one compared 6-bit words to 8-bit words I doubt anyone would say they were equivalent. Furthermore, to the best of my recollection, most of the 5.0 MB sources confuse character and byte, a demonstrably factual inaccuracy. As I said above, I guess I can live with 5.0MB in a footnote, but now I would add: only if we can find a reliable source for the equivalency. Tom94022 (talk) 22:18, 10 September 2014 (UTC)


──────────────────────────────────────────────────────────────────────────────────────────────────── Arnold, I am not sure what u meant by "addressable characters" and how that can be used to normalize capacity. FWIW the 305/350 only had 48 characters (including "blank") whose code map looks nothing like ASCII or anything modern, e.g. "1" is 01H in the 350 vs 31H in ASCII, "A" is 03H vs 41H, etc. There is a (to me) mysterious 350 symbol "□" with bit code 27H which may not exist in ASCII, although certainly 27H is a byte. Storage is character agnostic; there would have to be a translation layer from a modern byte to the 305/350 bit code, and that translation layer would work equally well with packed 305/350 characters as with unpacked characters. So yes, in our science fiction plot a 3.75 MB HDD would store the entire contents of a 350 in a packed format and perform well as a 350 emulator (keep in mind the 350 read and wrote 600 data bits per block while a modern disk drive reads and writes 4096 or 32,768 bits per block, so the fictional emulator has to do lots of parsing in its translation layer). If u meant addressable disk storage locations in a 350, then we all agree it is 5 million, but I don't see how the mapping of the bit values to characters is relevant to normalizing. It's the size of the location that counts; we state RAM size in bytes regardless of whether it is accessed in 32-, 64- or 128-bit chunks. Isn't the size of the storage location in bits the best analogy? Tom94022 (talk) 05:54, 11 September 2014 (UTC)

Tom, here is the addressable character argument. The ratio we are arguing about is an economic one, and economic comparisons between different eras are typically done based on the normal practices in each era. The normal practice these days is to store one character per octet. If we were converting a 305 system to modern technology we’d just convert to ASCII or UTF-8. (BTW, that mysterious 350 symbol "□" with bit code 27H was called the lozenge and has a code point in Unicode.) As of a couple of years ago there was a company in Texas that still used punched card inventory control with IBM 402 accounting machines. When they eventually cut over, they will undoubtedly convert their punched card records to octets on a one-for-one basis (unless they go to Microsoft software, which uses UTF-16, in which case they’ll need 2 octets per character). The character conversion needed is just a 48-character lookup table, much less complex than the translation layer you posit (try designing an 8-bit to 6-bit converter in 1956 vacuum tube logic). Note that ASCII is a 7-bit code, and it is hardly common practice to use such a translation layer. So that 5 mega-character drive back in 1957 arguably was providing the same economic benefit as a 5 megabyte drive would today.
I agree with u that an economic analysis is one way to look at it; however, I conclude such an analysis arrives at 3.75 MB as the appropriate capacity. Storage is character agnostic, and arguably today it takes two octets per character. All drives today are bit serial, just like the 350. There is not necessarily a table lookup in either scheme if the characters were stored in their native bit form, either 6 bits/char or 8 bits/char. In either scheme the hypothetical controller would have to generate sector marks, insert a leading gap (since the first character cannot start at the sector mark) and strip the 600 or 800 bits out of a longer bit stream. The 6 bit/char version would have to generate parity and insert the blank, a rather trivial operation with today's microcontrollers. If for some reason the controller designer wanted to store the characters in today's Unicode, then it would require 10MB (thanks to the lozenge) and a complex table lookup - altogether not a likely implementation. So any modern drive greater than 3.75 MB would provide the same economic value, and since we are talking economics, shouldn't we go with the lowest value? Tom94022 (talk) 19:36, 11 September 2014 (UTC)
The question of seven bits vs six bits depends on where you draw the line between the 350 disk drive and the 305 processor. Remember, the disk drive units themselves neither generated nor checked parity bits. That was done in the CPU’s core memory buffer, which was used for multiple functions besides interfacing to the disk drive. There is no reason to think the parity bit was included just to look for disk errors; remember, this was a vacuum tube machine, and tubes failed regularly. So it’s perfectly reasonable to describe the 350 as a 7-bit disk drive hooked up to a computer that operated on 7-bit bytes that include 6 data bits and one parity bit. That is the view taken by several reliable sources and it is not objectively wrong. It also fits with essentially the same disk drive being used with the IBM 650 to store 7-bit decimal numbers in bi-quinary format. If some museum succeeded in restoring a 350, it could be used to store 7-bit data directly. That’s not to say the six-bit view is wrong either; reliable sources take that view as well, and the article can and should present all three viewpoints.
I don't know of any disk drive recording channel ever that did not include some form of error checking, parity, CRC and/or ECC. You really have no basis for expecting a restored 350 to reliably store 7 bit data directly. An imperfect analogy is the RLL versions of the ST506; by changing from MFM to RLL it would record 50% more data, but not reliably (as many hackers found out). This is WP:OR and unconvincing to this old recording channel engineer.
The line is drawn at the data interface, which is essentially the same for all drives into the 1990s: bit-serial data. The seminal ST506 has a bit-serial data interface with a generally accepted specified capacity of 5 MB, but in fact the raw capacity is up to 6.2 MB, with the difference represented in check bits and gap bits, none of which are counted in the ST506 specified capacity and none of which should be counted in converting the 350. A very reliable source, IBM, tells us that the S bit is always zero and that the R bit "has no numeric or alphabetic value." In memory and storage capacity specifications, parity or ECC bits are not included - think SDRAM with ECC, DRAM with parity, or any disk drive (including the 350, described by IBM as 5 million 6-bit characters, not 8-bit characters) - so without explanation any calculation using 7 bits is a mathematical error which should not be reproduced in Wikipedia.
I find it interesting that the 7-bit advocates ignore the 8th bit. IBM describes the 350 as a 5 million character machine having an 8-bit recorded character. Why is the parity (R) bit counted but the S bit not? Both are there; neither contributes to the character definition. My guess is that 7 bits is an urban legend generated by someone who didn't read the book. In any event, an advocate of seven bits is not accurately describing the 350, and this is another reason to say any such calculation is factually inaccurate and not suitable for Wikipedia. Tom94022 (talk) 19:36, 11 September 2014 (UTC)
When we calculate a comparison between the 350 and modern drives, we are pushing the OR boundary and it is reasonable to start with the most conservative view and then point out the number could be higher if calculated in different but reasonable ways. And we haven’t even gotten to putting in the effects of inflation.
As for the 305’s ability to operate on binary data, that is a story in itself. The 305 could only add and subtract. It did all its compare and testing operations via its plugboard control panel, where everything was done in punch card code using relay logic. The two high order data bits corresponded to the 12 and 11 rows on a card and those could be tested individually. The low order four bits were converted into the digits 0 to 9, so they could not be tested directly.
Anyway Tom, I am not saying that your way of looking at this is wrong, just that there are other reliably sourced ways that have a reasonable basis and we should present them all. —agr (talk) 14:52, 11 September 2014 (UTC)
I've designed, known and/or used many memory and storage devices over the years, and I cannot think of a single memory or storage device whose specified capacity included check or gap bits. Word lengths do differ, but even then I can't think of a single modern case where the specified capacity was not expressed in the common language of bytes (with binary or decimal prefixes). For example, the DEC PDP-10 used a 36-bit word, but its disk drives were specified in both words and bytes (ignoring gap and check bits), converting on a bit basis. Similarly, a 2 GB SDRAM module has 2 GiB regardless of whether the physical interface is 32, 64 or 128 bits, and independent of the number of bits used for checking. Finally, a 750 GB HDD provides 750 GB of user data whether it is bit serial (SATA) or word serial (PATA), and regardless of the number of spare, check and gap bits. It's tough to prove a negative, but we have a whole bunch of history that suggests parity and gap bits do not count in specifying storage capacity, and no reliable source that says why they should.
I am saying that 4.4 MB is factually incorrect by a number of tests and cannot be used. There is an argument to footnote the 5 million characters (not 5 MB), but by any other test the capacity in modern terms is 3.75 MB. Tom94022 (talk) 19:36, 11 September 2014 (UTC)
If someone builds a dedicated word processor that uses a standard hard drive just to store data, with everything including the file system encoded as 7-bit ASCII with parity, would you say the hard drive was now a 7-bit drive?--agr (talk) 13:44, 12 September 2014 (UTC)
In that case, it's the word processor's choice of storage layout, and it implies nothing about the underlying HDD. As an opposite example, what if someone used flash-based storage with such a word processor? As we know, capacities of flash-based storage products are also expressed in user-addressable bytes, despite the fact that virtually all such products include some amount of overprovisioning (even over 30%) and bend over backwards to present such an awkward internal structure through a nice and clean external interface. — Dsimic (talk | contribs) 17:42, 12 September 2014 (UTC)
No, a drive storing 8-bit bytes in blocks of 512 or 4096 bytes has its capacity measured in bytes. How many 7-bit characters it stores is another question and depends upon implementation. One way is to throw away one bit per byte and map 1 character into 1 byte; the drive capacity in MB is unchanged, and the capacity in characters is numerically the same (using only 7/8 of the bits). Another way is to compact strings of 7-bit characters into the standard-size blocks, so that a 4096-byte block stores 4,681 characters with a negligible loss of .003%; the capacity in characters is then essentially 8/7 of that in bytes, while the capacity in bytes is unchanged. To make it a 7-bit drive one would have to store an integer number of 7-bit characters per block.
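
The packing arithmetic in the last paragraph, as a sketch:

 def chars_per_block(block_bytes: int, bits_per_char: int) -> int:
     # Compact n-bit characters into a standard-size block, ignoring byte boundaries
     return (block_bytes * 8) // bits_per_char

 print(chars_per_block(4096, 7))   # 4681 seven-bit characters per 4096-byte block
 print(1 - 4681 * 7 / (4096 * 8))  # 3.05e-05, i.e. the negligible .003% loss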

──────────────────────────────────────────────────────────────────────────────────────────────────── In summary of where we might be at:

3.75 MB has reliable sources and is factually accurate. I believe all except maybe the IP agree.
4.4 MB
I contend that although there are several sources for this number it is factually inaccurate, as follows:
IBM states a 350 character is recorded as 8 bits of which 6 are data bits (source: contemporaneous IBM CE Manual)
IBM discloses the 305 has a set of 48 characters mapping into 6 data bits (source: contemporaneous IBM Programming Manual)
Information theory confirms that 48 characters map into 6 data bits. (no source right now, but can this be disputed?)
One data byte as used in storage capacity specification has 8 data bits. (no source right now, but can this be disputed?)
It is factually inaccurate to equate one 350 character to 7 data bits. While there are multiple sources that make this equivalence none explain their reasoning. All of these sources are much later than the contemporaneous IBM documents and none state their source for 7 bits. Accordingly the use of 7 bits must be considered factually inaccurate and any calculation based thereupon must be excluded according to Wikipedia policy.
5.0 MB
I agree that if the 305 characters are mapped as recorded (8 bits) into a modern drive it would require 5 MB. Unfortunately I cannot find a reliable source that says this. Every one I've looked at just makes an explicit statement; many just say "The HDD weighed over a ton and stored 5 MB of data" (31,100 hits)

I suggest that both 4.4 and 5.0 have reached the point of urban legends, repeated without verifying the underlying facts. Nonetheless, I can accept going with 3.75 and $9,200 in the table with a footnote that reads something like:

"Other equivalent capacities have been reported such as 5.0 MB which corresponds to a one-to-one mapping of the recorded 350 character bits into a byte. Using 5.0 MB would reduce the price/MB from $9,200 to $6,900."

This assumes the 31,100 hits did do their due diligence, otherwise it might be OR :-) Tom94022 (talk) 19:06, 12 September 2014 (UTC)
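(Aside: the price arithmetic in the proposed footnote checks out. A quick sketch; the $9,200/MB figure and the 3.75/5.0 MB capacities are taken from this discussion, not computed from any source document:)

    # Same total drive price spread over a larger claimed capacity.
    price_per_mb = 9200                # $/MB at 3.75 MB
    total_price = price_per_mb * 3.75  # $34,500 for the drive
    print(round(total_price / 5.0))    # 6900 -> $6,900/MB if capacity is taken as 5.0 MB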

Per your first sentence, I think there is a reasonable argument that it was the 305 designers' choice to use a character code with a parity bit, and that choice implies nothing about the underlying 350 HDD, which neither generates nor detects the parity bit and hence can be regarded as a 7-bit drive. Maybe you buy that view, maybe you don't, but it is certainly not "objectively wrong" as Tom claims and there is no basis to reject the sources who take that viewpoint.--agr (talk)
Arnold, your "designers' choice" statement is speculative and at best improbable WP:OR. Virtually all disk drives until the 1990s were incapable of distinguishing bits, be they gap bits (including header), check bits (parity, CRC or ECC) or encoded data bits, yet drives were for the most part specified in terms of data (user) bytes with a given format and channel code. Again the ST506 was specified with MFM, and any controller that could write RLL would apparently increase the capacity by 50%, but as many found out it wasn't reliable and voided the warranty. You might have a better argument about the unused S bit; maybe it could have been used or maybe not, but that is original research, unpersuasive to me, and although u can make an original research argument here u need to gain consensus before it can be used in an article. I don't see why it is so hard: IBM said 6 data bits, 8 channel bits - 7 bits is not supported by any reliable source (albeit there are a lot of people repeating this urban legend). Tom94022 (talk) 21:18, 12 September 2014 (UTC)
The use of the IBM 350 parity bit as a data bit violates IBM's implicit specifications for the drive so such a calculation cannot be used as a basis for establishing a 4.4 MB capacity. Sort of like saying the ST506 was a 7.5 MB drive because a controller could write and read RLL data at the drive interface. I have a long explanation of why at my sandbox and if anyone wants to discuss my reasoning they can do it here or there.
Yet one more way of looking at things! I hope we all agree that MB means one million data bytes where the byte has 256 unconstrained states. According to IBM the IBM 350 recorded character has 6 data bits of which 48 states are used but 64 are available. So dimensional analysis goes like this:
(5,000,000 char/IBM350) * (6 data bits/char) / (8 data bits/data byte) / (1,000,000 data byte/million byte) = 3.75 MB of data
Advocates of any other number use some (other bits/char) dimension to arrive at a number that is (other bits/data bits) MB. Perhaps an interesting and relevant number, but not without context as to what the "other bits" are and in what sense they are meaningful. In this context (recorded bits/char) could be meaningful, but (data bits + parity bit)/(char) doesn't have any meaning I can think of. Tom94022 (talk) 21:38, 13 September 2014 (UTC)
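(Aside: the dimensional analysis above is easy to mechanize. A sketch showing how each contested bits-per-character figure maps to a headline MB number; the 6/7/8 bit cases are the ones already argued in this thread:)

    def mb(chars, bits_per_char, bits_per_byte=8, bytes_per_mb=1_000_000):
        # (chars) * (data bits/char) / (data bits/data byte) / (data bytes/MB)
        return chars * bits_per_char / bits_per_byte / bytes_per_mb

    for bits in (6, 7, 8):   # data only, data + parity, full recorded character
        print(bits, mb(5_000_000, bits))
    # 6 -> 3.75 MB, 7 -> 4.375 MB (the "4.4"), 8 -> 5.0 MB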

Yet another way of looking at the 350 Capacity[edit]

Modern disk drives are specified by the disk drive vendor in data bytes available to a system (1 data byte = 8 unconstrained data bits). IBM at that time specified the 350 as 5 million characters and that at the drive's interface there were 6 unconstrained data bits per character. For Wikipedia purposes this should be the primary value with any other value, particularly something published 50 years later subject to explanation.

There are many bytes and bits under the cover of a modern disk drive but they are not disclosed by the vendors and most are not accessible by a system, but even when they are, they are not normally used by the vendor in specifying the drive's capacity. For some time some serial bit drives were specified by the vendors in two capacities, unformatted and formatted[e]. Even then it was the formatted capacity that most often defined the drive, two examples:

  • The ST506 is generally accepted as a 5.0 MB disk drive; it was specified by Seagate as 5.0 MB formatted and 6.38 MB unformatted. Using the constraints of the Seagate format it is possible to build a 6.1 MB ST506 and I am sure someone did. However, even if there was a reliable source for a 6.1 MB ST506 it would at best rate a footnote in the ST506 article. Note the ST-506 article states, "5 megabytes after formatting."
  • The IBM 3330 with its 3336-1 is generally accepted as a 100 MB disk drive even though IBM's publications acknowledge that this is with an IBM full track record of 13030 bytes per track, that its capacity is less at any number of records per track greater than 1, and that the stated capacity did not include 7 spare tracks. IBM's public maintenance literature for the subsystem discloses an unformatted track length of 13440 bytes/track, corresponding to an unformatted capacity of 105 MB. DEC used the identical pack in a fixed block mode and only stored 83 MB. While there are reliable sources for capacities other than 100 I would argue most are of undue weight, but some may be worthy of a footnote. Note the 3330 section states "Its removable disk packs held 100 MB (404x19x13,030 bytes)"

Thanks to the unpublished original research of the RAMAC Restoration team we know the unformatted capacity of the IBM 350 was "about" 5000 bits/track for unformatted IBM 350 capacity of 6.25 MB. It doesn't matter whether this number is exact or not, because it does allow us to put each of the known 350 bits in proper context. The 350 consisted of:

3.750 MB of formatted capacity (disclosed as 6 data bits per character)
0.625 MB of parity bytes (disclosed as 1 parity bit per character)
0.625 MB of space bytes (disclosed as 1 space bit per character)
1.250 MB of other gap bytes (not disclosed but some number is inherent in magnetic recording)

Totaling

6.250 MB of unformatted capacity (from RAMAC Restoration team)
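(Aside: the breakdown sums as claimed. A sketch; the per-character bit counts are the disclosed ones, the 6.25 MB total is the RAMAC Restoration team figure quoted above, and the gap figure is simply the residual:)

    chars = 5_000_000
    data   = chars * 6 / 8 / 1e6   # 3.750 MB of data bytes
    parity = chars * 1 / 8 / 1e6   # 0.625 MB of parity bits, expressed as bytes
    space  = chars * 1 / 8 / 1e6   # 0.625 MB of space bits, expressed as bytes
    unformatted = 6.25             # MB, per the RAMAC Restoration team
    gaps = unformatted - (data + parity + space)   # 1.250 MB residual
    print(data, parity, space, gaps)               # 3.75 0.625 0.625 1.25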

It is factually correct that 4.375 = 3.75 + 0.625 and rounds to 4.4, but in the context of a 350 it is an incomplete statement of the unformatted capacity, so without such context it is obviously factually incorrect; even if there were a reliable source that placed it in context, using 4.4 MB would violate undue. We don't know how many check bytes are in a modern drive; we do know how many are in similar bit serial drives, but we don't publish such in Wikipedia, so why do we care about those in the 350?

It is factually correct that 5.0 MB is the modern capacity needed to map the 8 bit recorded character on a bit for bit basis into an 8 bit byte of a modern disk drive. I'm not sure there is a reliable source for this mapping and I don't think it is particularly relevant, but I could accept placing it in a footnote if there is consensus that it should be used.

I will change the article to note the Capacity is "formatted"

Can I now take the dubious tag off the article? Tom94022 (talk) 20:28, 15 September 2014 (UTC)

No, this is disputed. The RAMAC350 capacity is 4.4 MB according to Claus Mikkelsen of IBM (another direct source, not calculated) who wrote that RAMAC 305 had "4.4 MB usable capacity."[23]

The following material was copied from the following section so as to separate two threads ----- Tom94022 (talk) 20:37, 18 September 2014 (UTC)

Also, that reference contradicts itself on 305 RAMAC capacity, later listing it as "5MB of storage" on two different pages. It also says its disk platters were '1" thick'! They were certainly not anywhere near 1 inch thick, as shown in the pictures right above this claim. --A D Monroe III (talk) 17:10, 16 September 2014 (UTC)
Also, a 2014 source asserting 4.4 MB without explanation is of questionable reliability with regard to a product last documented in the early 1960s wherein the manufacturer never used such a number. So anything other than 5 million characters is a calculation, and to be reliable the bases should be disclosed. Furthermore, the 355 was rarely if ever called a RAMAC by anyone, so the 650 discussion is a red herring. Anyhow, I have contacted both authors of the IP's latest unreliable source to see what they say. Tom94022 (talk) 18:02, 16 September 2014 (UTC)
I agree. There is reason to support "5 million 6-bit characters"; we can say "3.75 million bytes"; we could even say "30 million bits". We have to regard the "4.4 million bytes" claim as just mistaken. Jeh (talk) 19:21, 16 September 2014 (UTC)
I agree that there were definitely 5 million characters. But each character was not 6 bits. According to the references provided, including Al Shugart himself[24] and [25] each character was 7 bits. This 1960 IBM document says IBM computers used Bi-quinary coded decimal (7 bits per character).[26]
Another red herring; we all agree that the 650 used Bi-quinary coded decimal - the article is about disk drives, not systems. You should read Section XIV Code Translation: what we are doing is translating to an 8 bit unchecked byte code, which turns out to be easy since IBM says there were 6 data bits per character in the 350.
The full Al Shugart transcription states "7-bit code - 6-bit + 1 binary". Al probably said "parity", but we know this is an incomplete recollection in 2001 of something Al hadn't worked on in more than 50 years. Again, not a particularly reliable source. Tom94022 (talk) 00:11, 17 September 2014 (UTC)


Relevance of IBM 355 and IBM 650[edit]

Consider the following pictures: a direct way of looking at the 350 capacity, clearly visible on the operator panel. They show how the IBM 650 and RAMAC represented each seven-bit digit (corresponding to 4.4 MB total RAMAC305 capacity) in Bi-quinary coded decimal. Here are two references that support Bi-quinary coded decimal:[27][28]

IBM 650 – seven bits

Two bi bits: 0 5, and five quinary bits: 0 1 2 3 4, with error checking. Exactly one bi bit and one quinary bit is set in a valid digit. In the pictures of the front panel below and in close-up, the bi-quinary encoding of the internal workings of the machine is evident in the arrangement of the lights: the bi bits form the top of a T for each digit, and the quinary bits form the vertical stem.

(the machine was running when the photograph was taken and the active bits are visible in the close-up and just discernible in the full panel picture)

IBM 650 front panel

Close-up of IBM 650 indicators

Value   Bits (bi bits 05, quinary bits 01234)
  0     10-10000
  1     10-01000
  2     10-00100
  3     10-00010
  4     10-00001
  5     01-10000
  6     01-01000
  7     01-00100
  8     01-00010
  9     01-00001

71.128.35.13 (talk) 01:00, 16 September 2014 (UTC)

I'm not aware of any dispute about how the 650 stores data on the drum or in core, and I see nothing in the cited references to suggest that the character format of a 350 attached to a 305 is anything but six bits plus parity. There is certainly nothing in either reference to suggest that bi-quinary is relevant to the 305 or 350.
No parity bit was used here according to Tom94022 as of 01:11, 10 September 2014 (UTC), with reference to the 650 manual of instruction: "If as is likely it uses a bi-quinary coded decimal code which in modern terms is a self checking 7 bit channel code then no parity would be required." The RAMAC interfaced to the 650 computer, and stored punched card data in a 7 bit Bi-quinary coded decimal format. Here is an IBM reference dated 1960 that describes in great detail the error checking implemented with Bi-quinary coded decimal on the RAMAC computers of that time period:[29]
Please do not misleadingly paraphrase me; there is no evidence regarding the use of a parity bit in the 355; all I said is it would not be required. Frankly I suspect they used a space bit and 7 Bi-quinary coded decimal bits, but all of this is a red herring since the reference is to the beginning of HDDs, the first disk drive, the 350, not the 355. Although there is no evidence that IBM ever formally called either the 350 or the 355 a RAMAC, your discussing them as one could lead to misunderstandings, so please identify the drive you are discussing. Tom94022 (talk) 00:11, 17 September 2014 (UTC)
Au contraire, this usage is not a "red herring," because it is sanctioned by Big Blue. There is indeed documentary evidence that IBM formally called the 350 a "RAMAC," and my discussing them as one is fully sanctioned by the RAMAC 305 Customer Engineering Manual of Instruction, which says on page seven that they are integral: "Development of a machine to perform accounting functions by in-line processing has long been desired. However, the most fundamental requirement of such a machine is its ability to read, alter, and replace any of the file records in any random sequence. Such a machine was not practical until the development by IBM of the 350 Random Access File. This file is an integral part of the RAMAC." 71.128.35.13 (talk) 23:47, 17 September 2014 (UTC)
Regarding "an IBM reference dated 1960 that describes in great detail the error checking implemented with Bi-quinary coded decimal on the RAMAC computers of that time period", it does not say that. It references only the IBM 650. Not the 305. The same manual also describes many other coding schemes that were used on other IBM computers. Your statement makes it sound as if the manual talks only of BQCD and says it was used on all RAMAC computers; that is a misrepresentation of the source. Jeh (talk) 00:08, 19 September 2014 (UTC)
Regarding "RAMAC 350", that is simply not a product name that IBM ever used. That the 350 "random access file" was "integral" to the RAMAC 305 does not mean that "RAMAC 350" was a valid product name. "Fully sanctioned"? Nonsense. That's just you jumping to conclusions. It would be very helpful if everyone here would stick to actual product names. "RAMAC 350" makes it unclear whether you're referring to the disk drive (the 350) or the computer (and typo'd the number). Jeh (talk) 00:59, 19 September 2014 (UTC)
One strange thing is that your first reference cites the RAMAC as the fastest disk drive ever made, when in fact it was the slowest disk drive shipped by IBM and slower than any other disk that I'm aware of with the exception of the RCA Data Record File. Shmuel (Seymour J.) Metz Username:Chatul (talk) 15:27, 16 September 2014 (UTC)
The fastest hard drive ever made characterization was written by an IBM expert in 2014. This may be understood to mean that this product was the fastest as of the 1956 date of manufacture. This is also a truism, because there was no prior hard drive (no competition). [30] 71.128.35.13 (talk) 23:04, 16 September 2014 (UTC)
Also, that reference contradicts itself on 305 RAMAC capacity, later listing it as "5MB of storage" on two different pages. It also says its disk platters were '1" thick'! They were certainly not anywhere near 1 inch thick, as shown in the pictures right above this claim. --A D Monroe III (talk) 17:10, 16 September 2014 (UTC)
Also, a 2014 source asserting 4.4 MB without explanation is of questionable reliability with regard to a product last documented in the early 1960s wherein the manufacturer never used such a number. So anything other than 5 million characters is a calculation, and to be reliable the bases should be disclosed. Furthermore, the 355 was rarely if ever called a RAMAC by anyone, so the 650 discussion is a red herring. Anyhow, I have contacted both authors of the IP's latest unreliable source to see what they say. Tom94022 (talk) 18:02, 16 September 2014 (UTC)
I agree. There is reason to support "5 million 6-bit characters"; we can say "3.75 million bytes"; we could even say "30 million bits". We have to regard the "4.4 million bytes" claim as just mistaken. Jeh (talk) 19:21, 16 September 2014 (UTC)
I agree that there were definitely 5 million characters. But each character was not 6 bits. According to the references provided, including Al Shugart himself[31] and [32] each character was 7 bits. This 1960 IBM document says IBM computers used Bi-quinary coded decimal (7 bits per character).[33]
If you view the RAMAC Oral History Project video at about 1:09:36 you will hear Al correct himself to 6 data bits + 1 parity bit; he apparently forgot about the 8th bit. According to an email conversation with an author of the Share citation, 4.4 MB came from a sign he observed without any further review. Neither is a reliable source for 1950s technology. Tom94022 (talk) 23:54, 18 September 2014 (UTC)
Another red herring; we all agree that the 650 used Bi-quinary coded decimal - the article is about disk drives, not systems. You should read Section XIV Code Translation: what we are doing is translating to an 8 bit unchecked byte code, which turns out to be easy since IBM says there were 6 data bits per character in the 350.
The full Al Shugart transcription states "7-bit code - 6-bit + 1 binary". Al probably said "parity", but we know this is an incomplete recollection in 2001 of something Al hadn't worked on in more than 50 years. Again, not a particularly reliable source. Tom94022 (talk) 00:11, 17 September 2014 (UTC)
The RAMAC305 hard drive, has 7 bits per character. You should read the RAMAC 305 Customer Engineering Manual of Instruction (page 9), specifically with regard to the RAMAC305 process drum, “Each character position is further broken down into the 7 bit system of coding.”[34] 71.128.35.13 (talk) 23:47, 17 September 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────Each character represents the same information, no matter whether it is in the core memory, on the RAMAC hard drive, or on the punched card. According to the 1957 RAMAC305 Manual of Operation, “The IBM RAMAC is built around a random-access memory device that permits the storage of five million characters of business facts in the machine. In effect, the machine stores the equivalent of 62,500 80-column IBM cards.”[35]

What was on those cards? This would determine how much the RAMAC could store. It had five million characters on 62,500 punched cards: (5e6 / 62500) = 80 characters per punched card. Punched cards had 7 bits per bi-quinary coded decimal character as indicated in this reference: “The standard 80-column punchcard … stored about 70 bytes of data“[36] The RAMAC 305 Customer Engineering Manual of Instruction (page 9) likewise states that with respect to the 305 process drum “Each character position is further broken down into the 7 bit system of coding.”[37] So, total storage is (62,500 punched cards)[38] * (70 bytes per punched card)[39] = 4.4 MB, or (7 bits per character)[40] * (5 million characters)[41] * (1 byte / 8 bits) = 4.4 MB.

Numerous sources say that IBM650 and RAMAC had 7 bits per bi-quinary coded decimal character, “Each digit was represented in seven bit "bi-quinary" notation: one bit out of 5 represented a value from zero to four; one bit out of two indicated whether or not to add 5 to that value, giving the electronic equivalent of the abacus.”[42][43]

The American Society of Mechanical Engineers confirms that the RAMAC hard drive “contained a total capacity of 5 million binary decimal encoded characters (7 bits per character) of storage.”[44] This is 4.4MB (5e6 characters * 7 bits / character * 1 byte/ 8 bits). This was a “usable capacity” of 4.4 MB (not raw unformatted) according to an IBM expert.[45]

IBM equipment of that time, including the 650, RAMAC305 and RAMAC350 used bi-quinary coded decimal characters. Characters were the bedrock, the common foundation. The seventh bi-quinary coded decimal bit is not a parity bit. As Al Shugart (another ex-IBM RAMAC expert) said with respect to RAMAC305, “The coding used on the disk drive was a 7-bit code - 6-bit + 1 binary.”[46] 71.128.35.13 (talk) 23:47, 17 September 2014 (UTC)

No doubt about it: Bi-quinary coded decimal has 7 bits per character. 71.128.35.13 (talk) 23:25, 16 September 2014 (UTC)
And the 650 computer used bi-quinary, because they wanted a machine that did arithmetic in decimal, and electronics to do addition and subtraction in bi-quinary were easier than those for BCD.... but so what? Do you not understand that the character coding in an I/O device can be, and often is, different from that stored in the computer to which the device is attached? Punch cards have 12 possible bits per character position, but that doesn't mean the characters take 12 bits each to store when the computer reads them.
We have ample good references that the "five million characters" quoted for the 350 referred to six-bit characters. We have no good evidence that the 350 stored 7 bits per character, unless we count the parity bit... and since we don't count the ECC bits when describing modern hard drive capacities, we shouldn't count the parity bit in the 350. As for the 650 - the drive attached to the 650 was actually a 355, not a 350. All of this handwaving about the data format on the 355 is irrelevant to the 350. We can expect similar technologies, but there's no reason to expect them to be identical. So even if the 355 does store digits in bi-quinary form (to match the computer to which it was attached) there is no reason to extrapolate that to the 350. Jeh (talk) 01:04, 17 September 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────Here's some good evidence from the RAMAC 305 Customer Engineering Manual of Instruction (page 9), specifically with regard to the RAMAC305 process drum, “Each character position is further broken down into the 7 bit system of coding.”[47]

That just about wraps it up: each RAMAC305 character has 7 bits. 71.128.35.13 (talk) 23:47, 17 September 2014 (UTC)

(Again and again and again...) The "RAMAC 305" refers to the computer (and its built-in drum storage). That says nothing conclusive about the data format on the 355 disk. Nor is there any reason to think the 355 disk and the 350 disk (the subject of this argument) were the same in this detail. If "that wraps it up", then I guess you're out of arguments, because this does not provide any evidence that the 350 held anything but 5 million six-bit characters. Jeh (talk) 23:55, 17 September 2014 (UTC) Apologies, I was confusing the "RAMAC 305" computer with the 650. It's the 650 that uses the 355. More in a few minutes. Jeh (talk)
OK, looking at the manual you linked... I think you did not read enough. "Seven-bit system of coding" is indeed mentioned on page 9. But this does not support your claim. The coding scheme is clearly illustrated on page 8. Certainly there are seven bits per character. But near the top of page eight, first column, it says
"There are eight bit positions within each character position. They are identified as bits S, X, 0, 1, 2, 4, 8, and R. Bit S merely provides a space between the recording bit positions of each character, and is not used in the bit coding. Bit R has no numeric or alphabetic value, but is added to certain characters so that every character will have an odd number of bits." (emphasis added)
In other words, it's a six bit code plus a parity bit (bit R) and another bit (bit S) for "spacing".... eight bits total. There is no indication that the 350 disk unit (they call it a "file") stores characters in any other fashion.
Looks like my conclusion was valid anyway: If "that wraps it up", then I guess you're out of arguments, because this does not provide any evidence that the 350 held anything but 5 million six-bit characters... if we're not counting the parity bit or other "overhead" bits. (And we shouldn't, just as we don't count ECC bits in a modern HD.) Jeh (talk) 00:40, 18 September 2014 (UTC)
Look again. There is no parity bit in Bi-quinary coded decimal: it's a binary bit. It's very clearly seven bits per character in the Bi-quinary coded decimal wikipedia article.
At the risk of repetition, this replies to your claim of six bits per character. It should be noted that you haven't deigned to provide here any reference supporting the claimed six bits per character. Mountains of evidence, from the RAMAC 305 Customer Engineering Manual of Instruction[48] to Al Shugart[49] to Claus Mikkelsen of IBM[50] support seven bits per character.
Keep in mind that the RAMAC 305 is an "integral" part of the IBM350, according to the RAMAC 305 Customer Engineering Manual of Instruction: "Development of a machine to perform accounting functions by in-line processing has long been desired. However, the most fundamental requirement of such a machine is its ability to read, alter, and replace any of the file records in any random sequence. Such a machine was not practical until the development by IBM of the 350 Random Access File. This file is an integral part of the RAMAC."[51]
Each character represents the same information, no matter whether it is in the core memory, on the RAMAC hard drive, or on the punched card. According to the 1957 RAMAC305 Manual of Operation with respect to the hard disk drive, “The IBM RAMAC is built around a random-access memory device that permits the storage of five million characters of business facts in the machine. In effect, the machine stores the equivalent of 62,500 80-column IBM cards.”[52]
What was on those 62,500 cards inside the RAMAC350? This would determine how much the RAMAC350 could store. It had five million characters on 62,500 punched cards: (5e6 / 62500) = 80 characters per punched card. Punched cards had 7 bits per bi-quinary coded decimal character as indicated by this reference: “The standard 80-column punchcard … stored about 70 bytes of data“[53] The RAMAC 305 Customer Engineering Manual of Instruction (page 9) likewise states that with respect to the 305 process drum “Each character position is further broken down into the 7 bit system of coding.”[54] So, total storage is (62,500 punched cards)[55] * (70 bytes per punched card)[56] = 4.4 MB, or (7 bits per character)[57] * (5 million characters)[58] * (1 byte / 8 bits) = 4.4 MB.
Numerous sources say that IBM650 and RAMAC had 7 bits per bi-quinary coded decimal character, “Each digit was represented in seven bit "bi-quinary" notation: one bit out of 5 represented a value from zero to four; one bit out of two indicated whether or not to add 5 to that value, giving the electronic equivalent of the abacus.”[59][60]
The American Society of Mechanical Engineers confirms that the RAMAC hard drive “contained a total capacity of 5 million binary decimal encoded characters (7 bits per character) of storage.”[61]
This is 4.4 MB (5e6 characters * 7 bits / character * 1 byte/ 8 bits). This was a “usable capacity” of 4.4 MB (not raw unformatted) according to an IBM expert.[62]
IBM equipment of that time, including the 650, RAMAC305 and RAMAC350 used bi-quinary coded decimal characters. Characters were the bedrock, the common foundation. The seventh bi-quinary coded decimal bit is not a parity bit. As Al Shugart (another ex-IBM RAMAC expert) said with respect to RAMAC305, “The coding used on the disk drive was a 7-bit code - 6-bit + 1 binary.”[63] 71.128.35.13 (talk) 00:52, 18 September 2014 (UTC)
No, YOU look again. Now you're confusing the RAMAC305 and the 650. The reference YOU gave for the RAMAC305 does not say one word about "bi-quinary coded decimal", nor about any character coding scheme that could be confused with that, and the "seven-bit system of coding" includes a parity bit.
I mean, really. What part of "Bit R has no numeric or alphabetic value, but is added to certain characters so that every character will have an odd number of bits" leads you to believe that Bit R is the seventh bit in a BQCD code? (That's a direct quote from the IBM RAMAC305 manual that you linked. Top of page 8, first column.) How can it be a bit in a BQCD code if it "has no numeric or alphabetic value"? That is an excellent description of a parity bit, on the other hand.
Look at the character code chart at the bottom of page 8. It clearly shows that the bits are not weighted as in BQCD, but rather in binary: 1, 2, 4, 8. Digits 1 through 9 were represented in pure BCD: "3" by bits 1 and 2; "5" by bits 1 and 4; "6" by bits 2 and 4; etc. There is a special case for zero, using the "0 bit", so it is not quite pure BCD but it is flatly NOT bi-quinary coded decimal. And just as clearly, the "R" bit that you claim is the seventh bit for BQCD clearly has no role to play in the numeric values! If it did, then "3", being represented by bits 1, 2, and R, would not have a value of 3 at all. The IBM 305 manual furthermore states, as I quoted, that the "R" bit (the seventh bit) is a parity bit (though not using that word), and the character code table proves it.
Here, let me make it easy for you. Here are the bit codes for the digits 0 through 9, from the RAMAC 305 manual that you linked:
     X  0  1  2  4  8  R
  0  -  1  -  -  -  -  -
  1  -  -  1  -  -  -  -
  2  -  -  -  1  -  -  -
  3  -  -  1  1  -  -  1
  4  -  -  -  -  1  -  -
  5  -  -  1  -  1  -  1
  6  -  -  -  1  1  -  1
  7  -  -  1  1  1  -  -
  8  -  -  -  -  -  1  -
  9  -  -  1  -  -  1  1
Honestly, does that look like BQCD to you? What flavor of BQCD has an "8" bit and no "5" bit or "3" bit, represents "3" with bits "1" and "2", or "9" with bits "1" and "8"? Answer: None. This is BCD with a special case for zero.
Or are you just going to pretend that the manual you linked doesn't count, or doesn't exist?
This is a "horse's mouth" reference; there is no other possible interpretation; and any reference that states, implies, or can be reasonably interpreted to mean that the 305 used bi-quinary coded decimal is therefore clearly wrong. (And I expect that Al Shugart meant "parity" where he said "binary". A statement of "six bits, plus one binary" would make little sense - what, the six bits aren't binary too? As for the slide show by Mikkelsen... that's cute, but it hardly trumps the IBM reference material. I expect Mikkelsen just misremembered, or believed Shugart.) Jeh (talk) 01:40, 18 September 2014 (UTC)
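(Aside: the page-8 chart, as transcribed above, can be checked mechanically. A Python sketch; the bit sets are copied from that transcription, not from the manual itself:)

    # Digit -> set bits among (X, 0, 1, 2, 4, 8, R), per the chart above.
    codes = {
        0: {"0"},           1: {"1"},            2: {"2"},
        3: {"1", "2", "R"}, 4: {"4"},            5: {"1", "4", "R"},
        6: {"2", "4", "R"}, 7: {"1", "2", "4"},  8: {"8"},
        9: {"1", "8", "R"},
    }
    for digit, bits in codes.items():
        value = sum(int(b) for b in bits if b not in ("X", "R"))
        assert value == digit      # binary weights 0/1/2/4/8 -- not bi-quinary
        assert len(bits) % 2 == 1  # every character has an odd bit count: R is odd parity
    print("binary weighting and odd parity confirmed for all ten digits")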
And, I just have to respond to this:
"In effect, the machine stores the equivalent of 62,500 80-column IBM cards.... Punched cards had 7 bits per bi-quinary coded decimal character as indicated by this reference:"
Your assertion of BQCD on punched cards is nonsense. Have you ever even looked at one?! IBM punched cards had 80 characters, yes, but they did not use BQCD coding. Each character was represented by digit punches 0 through 9, which for digits simply represented themselves; I think you will agree that there is not a hint of BQCD (or BCD) there. Then there were two "zone punches" called "11" and "12", but these played no part in numeric representations, except that an "11 punch" over the units digit could denote a negative number. (By itself, it represented the minus sign character.) These were combined with the digit punches in various ways to produce letters and a small set of special characters. For more confusion, for some characters, the "0" punch became a zone punch too. Take a look at the character code chart in the IBM 1401 article. (Like the 305, the 1401 made a special case of the internal storage of "0". But instead of having a "zero bit", it represented "0" in core with the "2" and "8" bits set. This probably had something to do with its implementation of BCD arithmetic.)
Your claim that "Punched cards had 7 bits per bi-quinary coded decimal character as indicated by this reference:" is particularly egregious. First, the slide show you referenced doesn't say a word about BQCD, nor "7 bits". If punched cards were coded using BQCD then each column could only have held one BQCD character - i.e. a digit from 0 through 9 - and that was of course not the case. Coding of the 64-character set supported by IBM cards would have required two columns per character, and that was of course not the case either.
Conclusion: Your claim of BQCD on punched cards is ridiculous. A more realistic interpretation is that by the usual character coding, each column could represent one of about 64 different characters, i.e. 6 bits; that's 480 bits per card, or 60 bytes. BUT! I mightily doubt that IBM was worrying about any of that when they said "62,500 80-column cards", 8-bit bytes not being in common use or maybe not even thought of yet. Hm, one could also think of a punched card as holding 80x12 = 960 bits; that's 120 bytes! (And it is possible on some later IBM machines to read and punch cards and use every possible bit.) But.. Nah. The simple fact is that when you read a card into a computer memory by the usual character coding, it takes 80 character positions in the memory. 62,500 x 80 = 5,000,000 characters, each of which has 64 possible values, encoded in six bits. Hence the claim that the 350 held the equivalent of 62,500 IBM cards. It really is that simple. Your machinations to use this quote to support 7-bit characters (or 4.4 MB) on the 350 are a circular argument; you're assuming 7-bit characters from the beginning. But they're not. The RAMAC 305 manual proves it. Jeh (talk) 04:08, 18 September 2014 (UTC)
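(Aside: the card arithmetic in the last two comments, gathered in one place. A sketch; 64 and 128 are the character-set sizes implied by 6 and 7 bits per column, and 12 is the raw punch-position count:)

    COLUMNS = 80
    print(COLUMNS * 6 / 8)   # 60.0 bytes/card: one of ~64 characters (6 bits) per column
    print(COLUMNS * 7 / 8)   # 70.0 bytes/card: would require a 128-character (7 bit) code
    print(COLUMNS * 12 / 8)  # 120.0 bytes/card: all 12 punch positions treated as raw bits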
Nailing it to the wall: Re the 650... We know how digits 0 through 9 were represented, in BQCD. But what of other characters? I have been unable to locate a full character code chart for the 650, but numerous references, including our own WP article, state that characters other than digits were represented as pairs of BQCD digits! Thus it took 14 bits to represent a character out of a set of 100 possible codes, and the machine's fixed-length 10-digit "word" could hold just 5 characters. This is absolutely not storing one character in every seven bits, and it is totally different from the 305's internal character set.
One might ask: But why couldn't they store an arbitrary character in one seven-bit BQCD digit position? After all, seven bits give you 128 possibilities, not just the ten digits represented by BQCD! The answer is that those seven bits are always interpreted by the machine as representing a decimal digit in BQCD form (even if software is interpreting pairs of digits as characters). So there are only a small number of valid configurations. That's the "self-checking" aspect, and it's why they didn't need a parity bit. But if you took advantage of the 128 possible arrangements of seven bits to store (for example) the entire ASCII-7 character set, almost all of them would not be valid BQCD configurations and would cause the machine to raise an error flag ("invalid digit" or some such).
So, your claim that characters were any sort of "common foundation" among these early machines is pretty much belied by the 650. Not only is its character coding radically different from the 305's, but also, as a character-processing machine, it's an absurd design. Besides taking two digit positions to store one character, it was a fixed-length word (10 digits) machine; so individual digits (and therefore individual characters) were not addressable. This makes the handling of both arbitrary-length character strings and of individual characters very awkward. But it does lead to a better understanding of the info we have on the 355: "six million digits", or three million characters, using the 650's coding scheme of two digits per character. That's probably why they never quoted the 355's capacity in characters. Stated that way it looks like barely more than half the capacity of the 350.
Back to the RAMAC 305 and its 350 "file": (By the way, although colloquialisms do abound, in terms of official product names there was no "RAMAC 350"; it was the "RAMAC 305" and the "350 file" that was attached to it - check the manual.) Your assumption that all of these early IBM machines used the same internal representation for characters, and that information on the 650's use of BQCD applies equally to the 305, is very clearly flat-out wrong. And all of your conclusions that depend on that assumption are therefore also wrong. They were just very different machines.
In sum: The 305 absolutely did not use BQCD. The manual never mentions BQCD and the character code chart (page 8 of the manual) does not illustrate anything that could be interpreted as BQCD. Bits are called 1, 2, 4, 8, 0, X, and R. That's seven bits, and the manual does say "seven-bit coding", but per the manual, the seventh bit (R) is a parity bit, "carrying no numeric or logical value." So the 350's "5 million characters" are five million six-bit characters, each with a parity bit, which we do not count when counting its usable capacity. The fact that there is no parity bit in BQCD is completely irrelevant, as this machine did not use BQCD (regardless of how many other IBM machines did). QED. Jeh (talk) 20:22, 18 September 2014 (UTC)
No, each 80-column punched card had 70 bytes, not 60 bytes, as stated in this reference: http://www.extremetech.com/computing/90156-the-history-of-computer-storage-slideshow/2
So RAMAC 350 stored (70 bytes / punched card) * (62,500 punched cards) = 4.4 MB formatted capacity. 71.128.35.13 (talk) 19:34, 18 September 2014 (UTC)
IBM intertwined the 305 RAMAC and 650 RAMAC, or as you put it "confused" them, by their deliberate strategy. On September 14, 1956 Thomas J. Watson, Jr. of IBM announced, or as you put it "confused," the 305 RAMAC and 650 RAMAC (they use nearly the same hard disk drive and the same bit-encoding-method to handle characters from punched cards) in the following IBM press release:[64]
"... revolutionary new products to accelerate the trend toward office and plant automation were announced today by International Business Machines Corp.: 305 RAMAC and 650 RAMAC, two electronic data processing machines using IBM's random access memory, a stack of disks that stores millions of facts and figures..."[65]
"Headline: 650 RAMAC and 305 RAMAC \\ The 650 RAMAC and 305 RAMAC both utilize the magnetic disk memory device announced as experimental by IBM a year ago. ..."[66]
"The 650 RAMAC combines the IBM 650 Magnetic Drum Data Processing Machine with a series of disk memory units which are capable of storing a total of 24-million digits. The 305 RAMAC is an entirely new machine which contains its own input and output devices and processing unit as well as a built-in 5-million-digit disk memory."[67] 71.128.35.13 (talk) 19:38, 18 September 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────We must take your six-bit claim that "bits are not weighted as in BQCD, but rather in binary: 1, 2, 4, 8" on the RAMAC with a grain of salt (shake, shake, shake). It's just an oversalted "two bit" claim.

RAMAC 650 actually had seven bits per character. The IBM 650 "was a two-address, bi-quinary coded decimal computer".[68]

Alex Laird has dismissed the (six-bit) binary claim on the 650 RAMAC: "These days, all computer hardware is designed for two-bit binary communication. However, before massive amounts of standardization occurred in the technological realm, a few computers tinkered with the idea of making a computer run on hardware that wasn't base two. These computers, the Colossus, the UNIVAC, and the IBM 650, to name a few, were coded using bi-quinary coded decimal. Of these, the IBM 650 is the only one that was mass-produced."[69]

Laird goes on to refute the six-bit binary claim on the RAMAC 650 once more, "The hardware communicated using bi-quinary coded decimal instead of in binary coded decimal as all modern computers (and even most historical computers) do."[70]

That salt you're shaking should help with the flavor of this dish of crow. 71.128.35.13 (talk) 19:42, 18 September 2014 (UTC)

IBM never officially referred to the 355 as a RAMAC, nor is there any evidence that it was the disk drive the industry "started with" as the column is captioned in the article. RAMAC is ambiguous! As near as I can tell IBM used it in product and marketing materials only with the 305, 650 and the much later array. What is uniquely identified as the IBM 350 disk storage Model 1 has been frequently referred to as the RAMAC, so it has that meaning also. I made a mistake in captioning this Section with "RAMAC" - it should have been "IBM 350." The IP knows this and continues to use RAMAC ambiguously to assert his point of view. May I suggest the IP is a disruptive user in deliberately and continuously using RAMAC in a confusing manner and we should now proceed to a request for comment? Tom94022 (talk) 20:37, 18 September 2014 (UTC)
IP: Regarding the punched card issue... ah, another cute slideshow, and this one from a tech blog. This number of theirs is just wrong. Simple arithmetic: 80 columns, each of which represents one of about 64 possible characters. It takes six bits to count from 0 through 63, so that's six bits per column. 80 x 6 bits = 480 bits = 60 bytes total of information if it was packed into eight-bit bytes. How they got to "70 bytes" is a mystery, but however they did, it's obviously wrong. To get there, each column would have to encode 7 bits - i.e. one of 128 possible character codes, rather than one of 64. And no commonly-implemented character code ever defined for punched cards ever did that. So it's six bits per column. 70 bytes per card? Rejected. Jeh (talk) 22:12, 18 September 2014 (UTC)
Aside: I worked a LOT with punched cards and I wanted to work with text processing systems; I would have loved to have even the 95 printable characters of ASCII-7 available on punched cards. (n.b.: I regard "space" as printable, but not "del".) They were not. Yes, IBM's EBCDIC codeset did define punch combinations for all possible 8-bit internal byte values, but there was no keypunch ever made that would punch them... so we still were limited to about 64 possible characters, just like on the 1401 but with a few different punch combinations. And I assure you, IBM punched cards absolutely did not use BQCD!
For the rest of it... Honestly, falling back on claims that "IBM deliberately confused" the 350 and the 355 just looks like thrashing on your part. We have well established statements of "six million digits" capacity for the 355, and "five million characters" for the 350. The 355 was used with the 650, which did use BQCD; it was clearly a "digit-oriented machine".

Coalescing the argument[edit]

But this is about the 350. The 350 held 3.75 million end-user usable bytes, formatted as 5 million 6-bit characters, each with a parity bit. The core of the proof - it's not an argument any longer - is as follows:
  1. The 350 goes with the 305.
  2. The 305 has a character set consisting of digits in the range 0 through 9, letters A through Z, and a handful of symbols like currency symbol, slash, etc. (Character code chart on page 8 of the RAMAC 305 manual).
  3. Bits in each character position are called 1, 2, 4, 8, 0, X, and R. (Character code chart on page 8 of the manual).
  4. That's seven bits, but per the text at the top of page 8 of the manual, the seventh bit (R) "carries no numeric or logical value." The manual's description of bit R shows clearly that it is a parity bit, computed to give each character stored an odd number of "1" bits. The character code chart confirms this.
  5. So there are six data bits per character.
  6. It is true that there is no parity bit in 7-bit BQCD, but this is irrelevant, as BQCD is simply not used in the 305.
  7. When IBM says the capacity of the 350 is "5 million characters", this means five million characters of the same format used in the 305.
  8. Claims of 7 bits per character in the 350 are widespread in nontechnical sources. However, if this count did not include a parity bit, then each character would have 128 possible values, not 64... and this does not match the character code set of the 305, nor any other manifestation of reality re the 305 or 350. Hence "7 bits per character" means "6 bits plus one for parity", just as it does in the 305 manual.
  9. This is further supported by pages 70 and 71 of the 305 RAMAC Random Access Method of Accounting and Control Manual of Operation,[14] which is explicitly describing "The method of coding these characters on the disk and drum tracks" (emph. added) and clearly describes the seventh bit on the IBM 350 disk as a parity bit. The character code chart provided on page 71 confirms it; there is no "data significance" to bit R; it is the seventh of seven bits shown, and it is a parity bit, matching the description in the preceding text. This is a "horse's mouth" document and there is just no wiggle room here.
  10. So, five million characters x 6 bits each = 30 million end-user usable bits.
  11. 30 million bits at 8 bits per byte is 3.75 million bytes.
This conclusion is inescapable from this sequence, and I see nothing in any of your comments to refute any of the steps. If you still disagree with this conclusion, please indicate with which numbered point(s) immediately above you disagree; and why; and provide reliable references for your position. Please note: Anything about the 650 or the 355 or BQCD is irrelevant; responses regarding any of those will be interpreted as non-responsive. Jeh (talk) 22:12, 18 September 2014 (UTC)

Counter argument[edit]

Re: 1 We are talking about the history of hard drives. As with modern hard drives, the 350 was a separate unit within the 305 system. The same mechanism was used in the 355 and 1405.
  • Agreed we are talking about the history of hard disk drives, not the systems to which they attach, and that the 350 was sold separately as a component of the 305 RAMAC system. While the 355 and 1405 used many of the same parts as the 350 they were not the same mechanism. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
All sources I am aware of say the same mechanism was used, in particular the CHM RAMAC Oral History.--agr (talk) 21:35, 19 September 2014 (UTC)
Oh, come now. From http://www-03.ibm.com/ibm/history/exhibits/storage/storage_1405.html :

The IBM 1405 Disk Storage of 1960 used improved technology to double the tracks per inch and bits per inch of track -- to achieve a fourfold increase in capacity -- compared to the IBM RAMAC disk file of 1956. Storage units were available in 25-disk and 50-disk models, for a storage capacity of 10 million and 20 million characters, respectively. Recording density was 220 bits per inch (40 tracks per inch) and the head-to-disk spacing was 650 microinches. The disks rotated at 1800 rpm. Data were read or written at a rate of 17.5K bytes a second.

Four times the data density. Heck, the rotation speed wasn't even the same! Oh, I'm sure they were "the same" at the "30,000 foot" level, but "same mechanism?" Rubbish. Jeh (talk) 01:09, 21 September 2014 (UTC)
2. Note that there are only 48 possible characters in the 305 character set.
  • While true it is irrelevant to the history of hard disk drives. The drive was specified to record blocks of 100 8 bit characters with additional requirements on the gaps before and after the data block, all laid out in the CE manual. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
3. Agreed
  • While true it is irrelevant to the history of hard disk drives. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
4. This is the crux of the issue. The 350 neither generated nor was aware of the parity bit. That was internal to the 305 CPU. Nor is there any indication that the parity bit was needed or used specifically to enhance the 350s reliability, indeed any parity failure halted the CPU. If someone today built a text processing machine that exclusively used 7-bit ASCII with parity and stored that text on a modern hard drive, we would not say the hard drive then became a 7-bit drive.
  • The crux of the issue is that the 305 CE manual describes 8 bits, not seven, for the 350, as in fact it does for other components of the 305. It is partially true that the "350 neither generated nor was aware of the parity bit", but the whole truth is that the 350, like most if not all of the later HDDs until the 1990s, neither generated nor was aware of the meaning of any bit. There is no justification for including the parity bit and not the space bit. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
The justification for including the parity bit and not the space bit is that many sources say the drive was 7 bit. Your interpretation of the CE manual is WP:OR, unless you have a source that agrees with your interpretation. Note that on page 189 of the CE manual it says that the space bits are eliminated by a special circuit before the data gets to the core memory.--agr (talk) 21:35, 19 September 2014 (UTC)
5. Strictly speaking a 305 character is not 6-bits of information. There are only 48 possible characters, so the information content per character is log2(48)≈5.585 bits per character.
  • Agree with the original statement, the additional "strictly speaking" it is irrelevant to the history of hard disk drives. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
If the measure is information capacity, it is relevant. That is how information content is measured.--agr (talk) 21:35, 19 September 2014 (UTC)
7. Agreed, BQCD is a red herring
8. yes
Agreed Tom94022 (talk) 18:08, 19 September 2014 (UTC)
9. Not sure what you mean by "nontechnical sources." We don't normally get to pick and choose between reliable sources; instead we present both positions and their arguments. Also no one is disputing that the parity bit was stored on the 350. But again that parity bit is only generated and tested in the 305 CPU, not the disk drive. The 350 could store any 7 bit pattern, and there is a suggestion from a timing chart in the CE manual that 8 bits were possible.
There was no reason for IBM to describe the drive at the time as anything other than 5 million characters. That is what they were selling. Wikipedia is based on secondary sources and we prefer the opinions of modern experts who are knowledgeable about old and new technology, e.g. Al Shugart.
10. The 6 data bits were not "end-user usable." There was no way to access the binary code from the stored program or the plugboard. So arguably five million characters x 5.585 bits each = 27.9 million end-user usable bits.
  • Any purchaser of an IBM 350 could access the 6 data bits. The fact that an end user of a 305 could not, while true, is irrelevant to the history of hard disk drives Tom94022 (talk) 18:08, 19 September 2014 (UTC)
Yes, but it is also true that any purchaser of an IBM 350 could access the 7 data bits. It is not clear if this was true for 8 bits. The space bit was suppressed in hardware.--agr (talk) 21:35, 19 September 2014 (UTC)
"any purchaser of an IBM 350 could access the 7 data bits." This appears to me to be OR on your part, or perhaps rather WP:SYNTH from the info in the 305 and 350 manuals. You need to provide a RS for this claim, one that is as authoritative as the 305/350 manuals we're already referencing. Jeh (talk) 00:56, 21 September 2014 (UTC)
11. If so the 27.9 million bits at 8 bits per byte is 3.49 million bytes. My point is that there is no one "right" way to look at this. Our article should represent the different viewpoints of the sources. --agr (talk) 15:00, 19 September 2014 (UTC)
  • While 3.49 MB is one valid way to look at the capacity it is irrelevant to the history of hard disk drives. There are several ways; 4.4 MB is an obviously incorrect one. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
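(Aside: the information-theoretic figure in points 5, 10 and 11 is easy to reproduce. A sketch; 48 is the 305 character-set size cited above:)

    import math

    chars = 5_000_000
    bits_per_char = math.log2(48)        # ~5.585 bits of information per character
    total_bits = chars * bits_per_char   # ~27.9 million bits
    print(total_bits / 8 / 1e6)          # ~3.49 MB, vs 3.75 MB at a full 6 bits/char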
In summary, the 305 is as irrelevant as the 650. The 350 was sold as a separate product and is not limited by the 305 character set. Absent any evidence to the contrary, we have to accept the limitations of the IBM format as clearly disclosed in the 1959 IBM CE manual, namely that both the parity bit and the space bit are required. There is no basis for an assumption that the parity bit was not needed; it is original research and likely incorrect. The analogy is most every drive thereafter into the 1990s, e.g., the ST506, which had a specified unformatted capacity (i.e., 6.38 MB for the ST506) and a capacity specified by the manufacturer in a given format (i.e., 5.0 MB for the ST506). The only issue here is how to convert 5 million 6 data bits into bytes as we currently use the term, and there are only two answers, 3.75 MB or 5.0 MB, the latter having no reliable source. Tom94022 (talk) 18:08, 19 September 2014 (UTC)
The operations manual tells us that the parity bit is generated and tested in the 305 CPU for all data transfers, not just to and from the Disk Drive. That is no more OR than your 8-bit claim. Remember the argument is about your attempt to throw out certain sources as objectively wrong. That is a very hard case to make on Wikipedia and requires more than your (or my) reading of WP:primary sources: " All interpretive claims, analyses, or synthetic claims about primary sources must be referenced to a secondary source, rather than to an original analysis of the primary-source material by Wikipedia editors." --agr (talk) 21:35, 19 September 2014 (UTC)
It is not OR to quote the manual, a primary source, which clearly states the 350 recorded character is 8 bits. This is not interpretation, a claim, analysis, or synthesis - just a statement of fact from a primary source, and as such should be sufficient to establish that secondary sources using another number without explanation are objectively wrong.
It is OR on your part to assert that because the parity bit is generated and tested in the 305, not the 350, for all data transfers, it somehow matters more than the S bit. The S bit is generated and suppressed in the 305; would u like to provide the OR to say what happens if it is not present, say due to a read error? My OR says parity error. Also, it is OR on your part to speculate that the parity bit could be used for other purposes. Tom94022 (talk) 00:04, 20 September 2014 (UTC)

Reply to counter argument[edit]

1. How do you know it was the exact same mechanism? And how do you know it used the exact same electronics? The 355's "six million digits" does not make it sound like it was formatted the same way. If stored in BQCD straight from the 650, that would take 42 million bits; compare with the 350's 35 million bits (I count parity bits there because parity is inherent in BQCD code). It would have taken a trivial circuit to convert BQCD to and from simple four-bit BCD with a parity bit and store them that way... but that's only 30 million bits including parity, a significant step down from the 350's 35 million bits including parity. No.. the 355 must have had a different formatter.

The CHM oral history says the basic mechanism was the same. One big difference was the addition of two more access arms, but that is not relevant to our discussion. Undoubtedly there were differences in the electronics.--agr (talk) 22:09, 19 September 2014 (UTC)
The "differences in the electronics" are key to the whole thing. If they were the same device they would have had the same number.

2. That's just an artifact of the circuits in the card reader and card punch. There are six bits per character position in the 305, even if the peripheral equipment IBM chose to build doesn't let you get at all 64 possible characters.

And six bits is an artifact of the 305 CPU.--agr (talk) 22:09, 19 September 2014 (UTC)
And the IBM docs on the 305/350 state unequivocally that the seventh bit is a parity bit. Jeh (talk) 22:03, 20 September 2014 (UTC)

4. "The 350 neither generated nor was aware of the parity bit." How do you know this? Anyway, this is a red herring. Even if it is true, since the only way to store data on a 350 was via a 305, and the 305 would flag an error if it detected a character with even parity, there would be no way to store such characters on the 350. Hence the 350 can be used to store five million six-bit characters. I think this is a distinction without a difference.

The 305 operations manual is clear on this. See pages 70–71 and figure 47.--agr (talk) 22:09, 19 September 2014 (UTC)
Um, maybe. You know perfectly well that diagrams of this sort (even the "schematics" in the CE doc) are "cartoons"; they leave out a lot of detail. We can't conclude that there was nothing in the 350 that checked or generated the parity bit. This leaves us with the 305 and 350 docs, which state explicitly that the seventh bit is a parity bit, "carrying no numeric or logical value." It is OR on your part to infer from the IBM docs that it would have been possible in some way to use the 350 without the 305, and thereby (or by any other means for that matter) to store seven data bits per character, no parity. Jeh (talk) 22:03, 20 September 2014 (UTC)
IBM manuals of the era were written carefully. The 305 operations manual section is describing error handling. The notion that IBM had another error checking means and did not bother to mention it in the manual is far fetched. But I suggest you are the one engaging in OR by reading the manual in just one way to exclude certain sources (like the program manager for the 350) as "unreliable."

4a. "If someone today built a text processing machine that exclusively used 7-bit ASCII with parity and stored that text on a modern hard drive, we would not say the hard drive then became a 7-bit drive." And if someone built a computer with integral hard drive that stored six bits per character, but the peripheral units only allowed generation of 48 different character values out of the 64 theoretically possible, we would not say the hard drive became a 5.585-bit drive. You can't have it both ways.

I am not trying to have it both ways. I am saying there is more than one right answer, and we should respect the differences in the sources.--agr (talk) 22:09, 19 September 2014 (UTC)
Not when the sources are not equally reliable. Your assumption that someone could have bought 350s from IBM, without the accompanying 305, and used them successfully to store seven data bits per character position, is not backed up by a WP:RS. So far, it appears to be pure conjecture on your part. Do you have RSs indicating that this was ever done? Do you have docs from IBM, of the same calibre as the 305/350 docs, indicating that this is possible? Jeh (talk) 22:03, 20 September 2014 (UTC)
We are talking about comparing the 350 with a modern drive; that should be done on the same basis, with both drives in isolation.--agr (talk) 21:44, 21 September 2014 (UTC)

5. See reply 2 (and 4a). There are six data bits.

9. Actually, yes, we do pick and choose, in that we evaluate the reliability of sources. When we see a tech blog say "70 bytes per card" with absolutely no justification nor reference, and simple arithmetic (the same sort you're doing in your step 5) proves otherwise, we can decide that that source is not reliable on that point. Similarly, a slide whipped up for a fun talk at a SHARE meeting is not of the same reliability as the manufacturer's user and CE manuals. Editors on WP evaluate the reliability of sources all the time. Otherwise we would just quote everything in every source and be done with it.

Sources like ASME or Al Shugart are not in that category.--agr (talk) 22:09, 19 September 2014 (UTC)
They're not that bad, no, but they're not at the same trust level as the IBM docs. The quote by Al Shugart is from a transcript of an informal round-table discussion that happened decades after the 350 project ended. Furthermore it makes no sense: "a 7-bit code - 6-bit + 1 binary"??? What, the other six bits were not binary? He might have meant to say "parity" instead of "binary"; he might have meant "a 7-bit code - (6-bit + 1) binary", implying that the "+1" bit was special and the core code was 6 bits. Or it might even be a transcription error. Anyway, "a 7-bit code" is still consistent with the 305/350 doc that indisputably describes 7 bits per character... one of them being a parity bit. Jeh (talk) 22:03, 20 September 2014 (UTC)
Al Shugart was program manager for the project and went on to found one of the biggest companies in the modern disk drive industry. And I don't know what transcript you are looking at but the video of the interview is readily available and after saying "a seven bit code, six bits plus one binary bit," he immediately corrects himself, adding "one parity bit rather."--agr (talk) 21:44, 21 September 2014 (UTC)
The same is true of the ASME document. One, it is not exactly at the "peer-reviewed research paper" level. Two, it's from a society of mechanical engineers. I think good MEs walk on water, but we can't expect them to be experts about digital data storage technology. So this document is not of the same trust level as IBM's 305/350 documents. Three, again, "7 bits per character" is not inconsistent with the 305/350 docs, which clearly describe 7 bits per character... and state that one of those bits is a parity bit - and there is just no other way to interpret them. Jeh (talk) 22:03, 20 September 2014 (UTC)
We have no requirement for peer-reviewed sources on Wikipedia. And saying "gee, they are mechanical engineers, so what do they know about electronics?" is really hair splitting.--agr (talk) 21:44, 21 September 2014 (UTC)

9a. "The 350 could store any 7 bit pattern..." How do you know there was no error checking inside the 350? It seems very unlikely to me that IBM would build the 305 to just halt on a parity error but give no indication as to the source of the error. But even if true, this again is a distinction without a difference. The 350, connected to the 305 as it had to be, could not store "any 7 bit pattern", not unless the parity check in the 305 failed.

Again the manual is clear on this. IBM computers of this era halted on a parity check, not just the 305. More sophisticated responses came later. And, no, the 350 drive did not have to be connected to a 305, it just was. In analyzing a hard drive it is reasonable to consider it in isolation, just as we would a modern hard drive incorporated into an integrated system. And if you do look at the 350/305 as a system the data is characters from a 48-character set.--agr (talk) 22:09, 19 September 2014 (UTC)
Was the 350 ever used by anyone, connected to anything other than a 305? RSs please. Jeh (talk) 22:03, 20 September 2014 (UTC)
No one is saying it was, just that the parity bit was not part of the disk drive, as the same manuals make clear.

10. There are six bits, damn you!

Some sources say 7. And please see WP:CIVIL.--agr (talk) 22:09, 19 September 2014 (UTC)
Let it be known that this editor, namely (Jeh at 18:49, 19 September 2014 (UTC)), is a disruptive user, clearly beyond the WP:CIVIL pale. 71.128.35.13 (talk) 20:15, 20 September 2014 (UTC)
Please. That's a ST:TNG reference, meant in a spirit of levity. Don't grasp at broken straws on social AND technical grounds at the same time, 'K? It just makes you look desperate (and would get you laughed out of ANI if you brought it up... so, please do!). Jeh (talk) 22:03, 20 September 2014 (UTC)
My humor detector was apparently improperly calibrated. I withdraw my complaint.--agr (talk) 21:44, 21 September 2014 (UTC)

11. See 9. There are no reliable sources that support, for example, 4.4 million bytes (despite one IP's voluminous and fallacious posts to the contrary), not in the face of overwhelming "horse's mouth" evidence that the claimed seventh bit was a parity bit. Similarly, a claim of "5.585 bits" is not supported by any reliable source, only by your calculation. And while we can use such calculations to evaluate sources, we can't use them to generate material for the article; that's WP:SYNTH. Jeh (talk) 18:49, 19 September 2014 (UTC)

Information content is a standard calculation, done in other articles, but I am not arguing for using it, just saying stick to reporting what the sources say, and stop claiming some are "objectively wrong." And we don't deprecate comments just because the editor is an IP.--agr (talk) 22:09, 19 September 2014 (UTC)
So on the one hand you argue that, because there is nothing in published docs that says you couldn't use a 350 separate from the 305 and store seven data bits per character position, we should regard it as holding 5 million characters of 7 bits each. And on the other hand you argue that it should be regarded as holding only 5.585 bits per character because of the limits of the card reader and punch that IBM happened to sell with the 305. I don't understand how you can argue both of these positions at the same time. Jeh (talk) 22:03, 20 September 2014 (UTC)
Re the IP, I'm not deprecating them because they're an IP. I'm deprecating them because they're wrong and because the IP continued to repeat previously-refuted arguments (arguing in circles) and is in general behaving like a tendentious editor.
And personally, Arnold: if I think something is objectively wrong, I'm not going to feel limited by any orders from you not to say so. Not every source is equally reliable and editors are free to, indeed are expected to, describe the reasons for their evaluations of sources. Jeh (talk) 22:03, 20 September 2014 (UTC)
I've spent a fair bit of time in the 305 Manual[13] which I suggest is the only reliable source for this discussion. The manual clearly shows a bit-serial data interface; bit-serial disk drives are traditionally specified at their interface with the format given by the drive vendor, not by what the system does with it, so much of the chatter about the 305 is irrelevant beyond the format. The format disclosed is a block comprising a leading gap with some bits, 100 8-bit characters and a trailing gap. Arnold is partially correct that "The 350 neither generated nor was aware of the parity bit." - this is pretty clear from the circuit diagrams, but it is more complete to note that the 350 did not know the meaning of any bit. Since the recorded character is 8 bits the manual makes any 7-bit or 4.4 MB assertions suspect. No one yet has stated the basis for omitting the space bit but adding the parity bit. Furthermore, the IBM specified format constrains many bits in the block, including the parity and space bits. The 305 manual specifies at its bit-serial interface 30 million unconstrained bits and lots of other constrained bits, but since modern HDD capacity is measured in bytes of 8 unconstrained bits the only valid capacity for comparison is 3.75 MB. My proof is somewhat different than Jeh's but we arrive at the same conclusion and I hope Arnold now will agree. Tom94022 (talk) 20:42, 19 September 2014 (UTC)
I've pointed out above that the space bit must be suppressed before data gets to the core buffer, according to the CE manual, p.189. On the other hand the parity bit is presented to the core buffer and is only checked there. So there are different valid ways of looking at the capacity of the 305 for comparison purposes. I argue that when we make our smug comparison about how much better modern drives are we should take the most conservative view of that ratio and compare characters with modern bytes, because that is how users employed the drives then and now.--agr (talk) 22:09, 19 September 2014 (UTC)
Arnold, you also pointed out that "In analyzing a hard drive it is reasonable to consider it in isolation, just as we would a modern hard drive incorporated into an integrated system." Exactly, the core buffer is a part of the 305 system, not part of an isolated drive. There is no "check" on any bit in the 350, but if the S bit is not generated by the drive, all subsequent decodes of the characters of one block in the system will be wrong and some will cause a parity error upstream from the drive, but nothing will happen in an isolated drive. So what makes the parity bit so special that we count it but not the space bit? In isolation the data interface of the 350 is just a stream of bits, with the S and P both required and providing no information. In isolation the drive can't distinguish any bit, but a drive failure to properly generate either the S bit or the P bit will cause the system to report an error. There really is no justification for using just 7 bits of the 8 bits per character in the bit stream other than it is an urban legend. There is nothing smug about this comparison, but it sure sounds like you have a point of view to minimize the improvement. Tom94022 (talk) 23:34, 19 September 2014 (UTC)
Please take a careful look at figure 86 on page 86/253 of the 305 CE Manual[13], particularly the lines labeled "Disk Write Data" (the serial input stream), "Disk Flux" (what is written to the disk) and "Disk Data" (the serial output stream). This is the IBM format for the 350. Note there are 200 usec of AGC bits (about 166 bits) at the beginning of a block. You have to write them; you can't use them for anything else or the drive might not replay correctly. Note the roughly 400 usec region with no bits; it's there for a reason, so writing bits into it may have unpredictable results. At the end of this blank region is the S bit of the first character, followed by the remaining 7 bits of the first character and then by the remaining 99 characters of the block. Elsewhere the spec says there is a 180 usec blank gap after the last character. IBM constrains the P and S bits so that a character has only the 64 available states of the 6 data bits. This is the only published IBM format that can be used to calculate the capacity in current terms. Absent a source that explains how they are going to get 128 states or 256 states out of these 8 bits as specified by IBM we have to conclude that such a calculation is wrong.
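(Aside: a rough per-block accounting of the format just described, in Python. The figures are the approximate ones quoted above; the gaps, which contain no bits, are left out.)

    agc_bits        = 166        # ~200 usec of AGC bits at the head of a block
    chars_per_block = 100
    recorded_bits   = agc_bits + chars_per_block * 8   # S + 6 data + P per character
    data_bits       = chars_per_block * 6              # only the 6 data bits are unconstrained
    print(recorded_bits)                        # 966 recorded bits per block (gaps excluded)
    print(round(data_bits / recorded_bits, 2))  # 0.62 -- fraction usable as data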
The industry has come somewhat full circle, from bit-serial data interfaces to bit-serial interfaces with data in one of several packets across that interface. There are lots of serial bits in today's packets, any of which could be redesignated by an adapter designer, but that would clearly violate the spec and be rejected by today's smart drives. Yesterday's dumb drive would accept them and replay whatever it could, but there would be no guarantee that the playback would be reliable, even if one read after writing. Some SAS drives do have additional bits that could be used for additional capacity, the extra bits in the 520 or 528 byte sectors, without posting an error, but they are not counted by the drive vendor in specifying capacity; a Seagate ST2000NX0273 is 2 TB regardless of the sector size. Capacity is based upon the manufacturer's specs, and IBM's specs yield a capacity in current terms of 3.75 MB.
Arnold, I hope this will convince you, but if not I suppose I could live with 3.75 MB as the main value with footnotes saying "Other reported capacities of 4.4 MB (based upon 7 bits per character) and 5.0 MB (based upon 8 bits per character) would result in $xxxx/MB and $yyyy/MB" Tom94022 (talk) 16:48, 20 September 2014 (UTC)
It won't convince any reader of IBM technical documents circa 1960.
IBM RAMAC 305 disks use the same coding system as the drum. This "coding system ... is used throughout the machine for magnetic recording on the disks and drum." according to the RAMAC 305 Manual of Operation dated April 1957.[71] (page 70)
IBM technical manuals do not support 6 bits per character. The horse's mouth spoke of seven bits per character as of 1960: IBM white paper, "Digital Computers - Logical Principles and Operation" by George J. Saxenmeyer, General Products Division Development Laboratory Endicott, N. Y., September 23, 1960 [72] describes the IBM RAMAC 305: "The core buffer has a capacity of 100 characters. ... This buffer consists of an array of seven planes of 100 (10 x 10) cores each, a separate plane for each of the character bits."(page 11) The core buffer capacity is 700 binary bits or 100 characters at seven bits per character.
On the IBM 305 RAMAC "one character position contains eight bit-spaces for the seven bits of the character".(page 11) [73]
"Currently-produced systems use a 51-character seven-bit parity-checked code." ... "the current seven-bit code has a capacity of 64 characters."[74](page 33) Bi-quinary coded decimal has seven bits per character. 71.128.35.13 (talk) 20:15, 20 September 2014 (UTC)
As has been pointed out to you already, the IBM documents specific to the 305/350 do describe a seven-bit code... and go on to describe the seventh bit as a parity bit. Using phrases like "carrying no numeric or logical value" to describe it. (I really don't understand how you can continue to ignore those words.) It's a parity bit. Yes, there are some shorter descriptions that simply say "seven bits" but every more technical document, more specific to the 305/350, notes that the seventh bit is a parity bit.
"Currently-produced systems use a 51-character seven-bit parity-checked code." Yes, seven bits including the parity bit - hence six data bits. As thoroughly described in the 305/350 documentation. The entire capability of this code is not quite fully utilized by the 305/350, as indicated by:
"the current seven-bit code has a capacity of 64 characters." Right, because without the parity bit (which is one of the seven has amply documented in the 305 manuals), you have six data bits, giving 64 possible characters. If the seventh bit was not a parity bit it would have 128 possible characters. These points do not contradict the claim of six data bits, they are consistent with it.
"Bi-quinary coded decimal has seven bits per character." Wrong. BQCD has seven bits per decimal digit. Every IBM computer and device (650, 355, NOT the 305 or 350) that used BQCD had to use two such decimal digits to represent each possible "character", giving 100 possible characters. BQCD and any machine that used BQCD is irrelevant to this discussion. The document you are referencing is a general description, giving the codes and techniques used in a variety of different machines. You can't use anything it says about BQCD in a discussion of the 305, or the 350, or the system that included both. That BQCD uses seven bits per decimal digit (and has an inherent built-in error check, so no parity bit was deemed necessary), while the coding used in the 305 used six bits plus a parity bit, also totaling seven, is mere coincidence.
You have had these facts explained to you several times now, and your bringing up BQCD yet AGAIN, as if it had anything at all to do with the 305/350, is unproductive at best. Jeh (talk) 22:03, 20 September 2014 (UTC)
In sum, the careful, thorough, technically competent, and honest reader of IBM docs of this era—a reader who is not cherry-picking sentences in an attempt to advance an already thoroughly-discredited position—will read the material specific to the 305/350, will find the references to the "seven bit code"... but will also find (and not ignore) the description of the seventh bit as being used only for error checking, "carrying no numeric or logical value". A bit that "carries no numeric or logical value" cannot be used in a capacity calculation, just as we do not use the ECC bits in modern hard drives in their capacity calculations.
Such a reader will also correctly ignore all information about BQCD, as BQCD was never used on the 305/350.
While pondering the above, you might want to review WP:IDIDNTHEARTHAT. Jeh (talk) 22:15, 21 September 2014 (UTC)

Wrapping up[edit]

My conclusion so far: IBM docs presented so far do not explicitly rule out the possibility of using the 350 to store seven data bits per character, with no parity. However, until RSs of the same trust level are presented that explicitly indicate this is possible, such speculation on the part of WP editors is just that. We must go with the most trusted, most detailed, and most specific RSs, the 305/350 documents already extensively cited. They state clearly that the seventh bit was a parity bit and not generally usable, "carrying no numeric or logical value". Until equally reliable sources are presented stating that it was possible to use the seventh bit for arbitrary data, the conclusion of "5 million six-bit characters" must stand. Since the IBM docs state "seven bits per character" but then go on to say that one is a parity bit, this is not contradicted by various claims of "seven bits per character". There were seven bits per character; it's just that one of them was only ever used as a parity bit. To put the calculation on equal footing with specs for modern hard drives, we don't count the parity bit, any more than we would count the ECC bits in modern hard drives. Jeh (talk) 22:12, 20 September 2014 (UTC)
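(Aside: the information-theoretic point being made, as a minimal Python sketch. Odd parity is assumed here purely for illustration; the argument is identical for even parity: the seventh bit is a function of the other six, so it can add no storage capacity.)

    def parity_bit(six_bits):
        # Odd parity: choose the bit so the total count of 1s is odd.
        return 1 - bin(six_bits & 0b111111).count("1") % 2

    for value in (0b000000, 0b101010, 0b111111):
        print(f"{value:06b} -> parity {parity_bit(value)}")
    # The parity bit is fully determined by the data bits, hence
    # "carrying no numeric or logical value."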

I agree with Jeh. The RS weight is heavily against 4.4MB; we cannot even mention it without violating WP:UNDUE. Consensus is also against doing so, with only the IP dragging the rest of us in circles. Please, let's end this here. --A D Monroe III (talk) 16:45, 21 September 2014 (UTC)
I don't agree that it is undue to give weight to the modern characterization of an RS like the ASME, vs. Wikipedian analysis of IBM docs from the era, but I also think the issue is not that important and more than enough time has been spent on it, so I am happy to move on. Cheers.--agr (talk) 22:16, 24 September 2014 (UTC)
Please. ASME's focus, being a society of mechanical engineers, was on the mechanics, not the data stream. And put next to the IBM technical documents, the ASME's award document is little better than a puff piece. I have no doubt that they spent more time on graphic design than they did on technical research. And I don't agree that reading "[one of the seven bits] carries no numeric or logical value" and concluding "the seventh bit should not be counted as part of the drive's capacity" requires any significant amount of "Wikipedian analysis". Jeh (talk) 22:45, 24 September 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── So, this leaves only the IP as a dissenter, and all of the IP's arguments have been refuted. We're done here. The capacity was 5 million 6-bit characters, equivalent to 3.75 million 8-bit bytes. A footnote to the effect that "references to 4.4 MB are counting the seventh bit as a data bit, when in fact it was a parity bit", this claim ref'd to the 350 CE manual, would be not out of place. Jeh (talk) 22:49, 24 September 2014 (UTC)

Six bits per character, as long as the plethora of seven-bit sources are given their proper due:
Seagate "RAMAC 4.4 MB" http://www.thic.org/pdf/Nov02/seagate.dlitvinov.perpendicular.021105.pdf
American Society of Mechanical Engineers ASME "The 350’s fifty 24-inch disks contained a total capacity of 5 million binary decimal encoded characters (7 bits per character) of storage."[75] (This is 4.4 MB.)
The Official History of IBM, 350 Disk Storage Unit "The whole thing could store 5 million binary decimal encoded characters at 7 bits per character" http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/ramac/
Museum (Ross Perot collection) RAMAC "storage capacity 4.4 megabytes."[76]
"in 1957, the original RAMAC ... boasted a storage capacity of 4.4 megabytes."[15]
"the 305 random-access-method of accounting and control (RAMAC) system and its disk file ... system was launched in September 1956 and included near line random disk file storage device. The disk file helped in access of 4.4 megabytes of data"[77]
"The total storage capacity of the RAMAC 305 was 5 million 7-bit characters, or about 4.4 MB."[78]
IBM introduced the 305 RAMAC system: "It started with a product announcement in May of 1955. IBM Corp. was introducing a product that offered unprecedented random-access storage — 5 million characters (not bytes, they were 7-bit, not 8-bit characters)."[79]
"Milestones in the hard disk drive industry" "350 RAMAC formatted capacity 4.4 MB" page 29[80]
"The 350 stored 5 million 7-bit characters (about 4.4 megabytes)."[81]
71.128.35.13 (talk) 23:29, 24 September 2014 (UTC)
Each of the above is obviously less authoritative than the IBM 305/350 documents, which indicate unequivocally that the seventh bit was a parity bit, with "no numeric or logical value". Bits with "no numeric or logical value" do not get counted in modern hard drives, so for a proper comparison, they should not be counted for the 350. I will say the same thing I said to Arnold: unless you can come up with a source that's at least as authoritative as the IBM 305/350 documents, and which states that the 7th bit was ever usable as a data bit, the correct answer is 5 million 6-bit characters = 3.75 million bytes. Until you can come up with such a source, the "proper due" to any or all of these 7-bit claims is to note that, according to the most authoritative sources (the 305/350 documents), they're counting the parity bit in the calculation. If you can't find such a source, your repeating the same arguments and referring to the same not-so-authoritative "sources" yet another time will not advance your position; it will just be tiresome. Jeh (talk) 04:31, 25 September 2014 (UTC)
I agree with Jeh. FWIW, Engineering Design of a Magnetic-Disk Random-Access Memory, Feb 1956, by IBM SJ engineers also states the IBM 350 recorded an 8-bit character. Seven bits as a discrete quantity does not appear in the reliable sources from the time of the IBM 350; 6 data bits and 8 recorded bits do. So a mere assertion of 7 bits or 4.4 MB in a modern source, absent a discussion of which 7 bits are usable and why not 8, is not reliable. Tom94022 (talk) 18:07, 25 September 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  1. ^ In email correspondence with Maleval, the author of the 2011 article, he cited the same 2006 source for $50,000
  2. ^ Sometimes, as in CDs, the check bits are interspersed with the encoded data bits; mostly they are not.
  3. ^ Probably because electronics were very expensive then compared to media
  4. ^ I think the addition of the count field (sector header) was the first step, followed by internal write circuit checks and head checks (causing far more false positives than actual failure detection).
  5. ^ As near as I can tell this began in the early 70s and ended as all drives became intelligent (e.g. SCSI and PATA) in the 1990s
  1. ^ IBM's first HDD versus its last HDDs
  2. ^ a b c Fost, Dan (2006-09-11). "Hard-driving valley began 50 years ago / And most other forms of data storage eventually became a distant memory". San Francisco Chronicle (Mountain View,CA). Retrieved 2014-08-26. "In 1956, the RAMAC cost $50,000, or $10,000 per MB." 
  3. ^ a b c d e Maleval, Jean-Jacques (2011-06-20). "History: First HDD at 55 From IBM at 100 Ramac 350: 4.4MB, $11,000 per megabyte". storagenewsletter.com. Retrieved 2014-08-27. "Ramac 350: 4.4MB, $11,000 per megabyte ... The first delivery to a customer site occurred in June 1956, to the Zellerbach Paper Company, in San Francisco, CA." 
  4. ^ Pugh, Emerson (1995). "Building IBM: Shaping an Industry and Its Technology". p. 226. 
  5. ^ IBM Archives - IBM 650
  6. ^ a b Text of an IBM press release distributed on September 14, 1956
  7. ^ IBM 350 disk storage unit
  8. ^ "Computing In The Universty," American Mathematical Society and the Ohio State Research Center, Datamation, May 1962
  9. ^ Pugh, IBM's Early Computers, p658 fn55, "Because of 650's encoding conventions, 355 capacity was 6 million decimal digits."
  10. ^ a b Ballistic Research Laboratories "A THIRD SURVEY OF DOMESTIC ELECTRONIC DIGITAL COMPUTING SYSTEMS," March 1961, section on IBM 305 RAMAC (p. 314-331) states a $34,500 purchase price which calculates to $9,200/MB.
  11. ^ Farming hard drives: 2 years and $1M later
  12. ^ "in June 1956 ... to Zellerbach ... "
  13. ^ a b c RAMAC 305 Customer Engineering Manual, p.7 describes the 350 character coding as having 6 data bits plus two other bits that do not affect the character coding. Therefore it is a 6-bit character.
  14. ^ a b 305 RAMAC Random Access Method of Accounting and Control Manual of Operation (PDF), April 1957, p. 70, Form 22-6264-1. 
  15. ^ J. M. D. Coey (25 March 2010). Magnetism and Magnetic Materials. Cambridge University Press. ISBN 978-0-521-81614-4. 

355 Capacity in modern terms[edit]

As an added complication, the similar 355 stored data as decimal digits; a character was stored as a pair of digits, with code points from 0 to 99. The capacity was 6 million digits or 3 million characters; how would you express that in 8-bit bytes? Shmuel (Seymour J.) Metz Username:Chatul (talk) 21:50, 9 September 2014 (UTC)

Do we know how each digit was stored? — Dsimic (talk | contribs) 22:30, 9 September 2014 (UTC)
We do not. There is no known document comparable to the 305 CE Manual. The best I know of is the 650 Manual of Operation. What is disclosed is that it has a full-track data organization, reading and writing 60 contiguous words of 10 digits each with a two-word gap between the end of the data and its beginning; see figure 5 on page 18. There is no track-to-track orientation; a read starts with the sensing of the gap, while a track write operation starts at any location with the writing of the gap. There is no mention of parity; it does say that as each digit is read it is checked for validity. Now for some speculation and/or original research:
  • If, as is likely, it uses a bi-quinary coded decimal code, which in modern terms is a self-checking 7-bit channel code, then no parity would be required.
  • It would be possible to pack 6 million decimal digits into 3 million bytes, so one way to look at the capacity is that it would take 3.002368 MB of modern storage using 4k sectors to store the contents of a 355 (see the sketch below).
Bottom line - I know of no reliable source that allows us to determine the 355 capacity in modern terms. Tom94022 (talk) 01:11, 10 September 2014 (UTC)
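(Aside: a minimal Python sketch of the packed-decimal estimate in the second bullet above; the two-digits-per-byte packing and 4 KiB sectors are the assumptions stated there.)

    import math
    digits       = 6_000_000
    packed_bytes = digits // 2      # two BCD digits per byte -> 3,000,000 bytes
    sector       = 4096
    print(math.ceil(packed_bytes / sector) * sector)   # 3,002,368 bytes, i.e. ~3.002368 MB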
Thank you very much for the explanation! This is really interesting, and such a data layout resembles more of a basic file system, rather than the "bare" data storage we have in modern HDDs. Asking "how much of a modern HDD do we need to replace a 355" is pretty much like asking "how much of btrfs data structures do we need to replace a CP/M-formatted floppy disk", which wouldn't be easy to answer at all. — Dsimic (talk | contribs) 04:55, 10 September 2014 (UTC)

"A HDD" vs. "an HDD", and some barebones[edit]

Hey Luigi.a.cruz! Just wanted to discuss the latest few edits on this article, which are well summarized in this edit. Regarding whether "a" or "an" should be used with "HDD", the decision is based on how something is pronounced, not on how it's written. As "HDD" is pronounced "aitch-dee-dee", it starts with a vowel sound, so "an" is to be used; a good example is "an hour", as "hour" is also pronounced with an opening vowel.

Regarding the "IDE-based barebone", what's the "barebone" supposed to mean in the HDD context? When a "barebone" is mentioned to someone who deals with PCs, it usually refers to computers sold with just a motherboard, PSU and case – but how does that make a special case for HDDs? At the same time, for the "Past and present HDD form factors" summary table we want "cherry-picked" best samples, so there's pretty much no need to explain the size limits for various device classes.

Please advise. — Dsimic (talk | contribs) 20:55, 5 September 2014 (UTC)

Agree with Dsimic. "IDE" (or rather "PATA" as it should be called) is not a separate form factor (for the purposes of this table, anyway) and does not need to be mentioned here. (If it is, then I will insist on additional efn's for earlier PATA versions that only supported 128 GiB, for SCSI and SCA, etc.) Jeh (talk) 17:26, 6 September 2014 (UTC)
Agree, we do not distinguish other obsolescent or obsolete interfaces such as SCSI, ST412, SMD, etc. I recommend we remove all the added PATA references. Tom94022 (talk) 23:33, 8 September 2014 (UTC)
BTW, "HDD" is pronounced "haitch-dee-dee" in some versions of British English, and therefore might be proceeded by "an" instead of "a". But this article is in American English, so "a HDD" is correct here. --A D Monroe III (talk) 17:21, 16 September 2014 (UTC)
Hm, why "a HDD", when the American English pronounciation starts with a vowel? Guess that was a typo? — Dsimic (talk | contribs) 02:28, 17 September 2014 (UTC)
"An HDD" is the correct American form. "HDD" starts with a vowel sound, "aitch". Therefore we use "an". I don't understand why ADM III would say otherwise. Jeh (talk) 04:34, 17 September 2014 (UTC)
Oops! Typo. I meant "a HDD" for Brit English, and "an HDD" for Amer English, which is for this article. Sorry about that. (Can we get off dumb ol' text and switch Wikipedia to full audio anytime soon?) --A D Monroe III (talk) 22:05, 17 September 2014 (UTC)
Thanks for the clarification. Speaking about text vs. audio, hm, they both have pros and cons. :) — Dsimic (talk | contribs) 03:45, 18 September 2014 (UTC)

An IBM 305 is not an IBM 650[edit]

The discussion of the capacity of an IBM 350 connected to an IBM 305 RAMAC repeatedly conflates the 305 with the 650. They are different machines, with different data representations.

The IBM 650 had signed 10 digit words, with each digit represented in bi-quinary, and represented character data as pairs of decimal digits. The 305 represented characters as 6 bits plus parity. In both cases the character coding used was influenced by the coding on the Hollerith card. Similarly, the formatted capacity of the 350 disk is not in the same units as the formatted capacity of the 355; the first is in characters (six bits plus parity) while the second is in signed 10-digit words. The relevant manuals are all available at http://bitsavers.org/pdf/ibm/305_ramac/ and http://bitsavers.org/pdf/ibm/650/. Shmuel (Seymour J.) Metz Username:Chatul (talk) 23:44, 18 September 2014 (UTC)

Bad source?[edit]

The reference to the size and capacity of the IBM 350 (here) actually references Wikipedia itself. Surely this can't be used as a reliable source. --Lewis Hulbert (talk) 14:46, 19 September 2014 (UTC)

Good point, the reference will have to change. Thanks Tom94022 (talk) 16:52, 19 September 2014 (UTC)
Although the wording is almost identical to what's written in the article. Perhaps an unsourced copy/paste? --Lewis Hulbert (talk) 18:56, 19 September 2014 (UTC)
Another good point. How about Oracle magazine as a reliable source not obviously relying upon Wikipedia? Tom94022 (talk) 21:06, 19 September 2014 (UTC)

Discuss: was the rate of areal density growth decreased circa 2006 or 2010?[edit]

Data at [82] 71.128.35.13 (talk) 00:41, 26 September 2014 (UTC)

The linked graph does not explicitly disclose a date for a "kink" in the AD curve, nor does the author, Coughlin, state one in the article publishing the graph. It would be difficult to trend data points from the graph since Coughlin apparently used a smooth curve function in connecting the data points. Even if one tried, the "kink" is dependent upon the number of data points selected for a recent trend line, which would be impermissible OR. The displayed Whyte curve has a clear kink between two trend lines in the mid-2000s, and the Anderson 2013 cite, depending upon how you read it, suggests a "kink" occurred between 2008 and 2010. The date of the "kink" appears to be an inadvertent disagreement among sources - inadvertent because we are interpreting the published data in a way that the sources perhaps never intended. Personally I think "since c. 2007-2011" is awkward and would suggest "after 2005" as a way of expressing all sources in a readable manner. It is also acceptable to just leave out a date for the kink as undue, since does it really matter when the kink occurred? Tom94022 (talk) 16:45, 26 September 2014 (UTC)
The date of the inflection is discernible and notable. An analyst with Morningstar has indeed explicitly disclosed “around 2010” as the date of the slowing in areal density.[83] This analyst finds “a slowing areal density curve … that slowed around 2010 and will advance areal density 25% annually till 2020.”[84] 71.128.35.13 (talk) 18:58, 26 September 2014 (UTC)
IMO the date of inflection is not discernible from Coughlin. Yet another analyst stating "around 2010" doesn't help much. It's pretty clear that different sources looking at different sets of data can find different dates for the "kink", which we could treat as a difference among sources requiring some consideration of all of them; or, since it is not clear to me why the specific date of the "kink" is notable, we can just ignore this. Readability is important, isn't it? 22:42, 26 September 2014 (UTC)
Notability and relevance are the foundation of readability. Various analysts using different data sources have noted the inflection around 2010. Hitachi Data Systems Vice President and Chief Technology Officer, Hu Yoshida, writes that a “paper in the IEEE Transactions on Magnetics Vol 48 May 2012 shows that the roadmap for areal density increases in magnetic hard disk drives (HDD) has slowed down” and the paper has a chart “which shows that the decline has been going on since 2010.”[1] 71.128.35.13 (talk) 17:44, 27 September 2014 (UTC)
Several references support the inflection “around 2010.” Is there a citation supporting "around 2006"? 71.128.35.13 (talk) 20:33, 28 September 2014 (UTC)
Other references support other inflection dates. Furthermore, so do some you cite. Yoshida [1] acknowledges he has "written several times about the declining rate of areal bit densities" and the graph he points to clearly has an inflection in the middle 2000s to 40% CAGR, since prior to that time there is universal agreement that the prior growth rate was in the range of 60-100%. Coughlin's chart appears to have an inflection circa 2005 and perhaps circa 2008; he is on the record that the CAGR circa 2010 was about 30%, again a substantial decline from 60-100%. I can't access the Morningstar reference so I can't comment on its perspective. We have to avoid an inappropriate synthesis, which seems to have occurred in your last edit. Actually an inflection date might not be appropriate at all, since what appears to have been going on since the mid-2000s is that the AD rate of change has been more or less continuously decreasing to the now current low of about 8-12% (of course, this might change with the next disk drive announcement, up or down). Since the data show a continuing slowdown rather than an inflection, it's not clear why any one date is more notable than another, and it is more readable to just say something like
"However, the growth rate trend has been decreasing since 2006 and, as of 2014, growth has been in the annual range of 8–12%"
Depending upon the language, either the Yoshida or Coughlin reference would be appropriate, not both. Tom94022 (talk) 21:25, 28 September 2014 (UTC)
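(Aside: for readers unfamiliar with the jargon, the CAGR figures being argued over are read off pairs of points on the areal-density curve; a minimal Python sketch with purely illustrative numbers, not taken from any source.)

    def cagr(ad_start, ad_end, years):
        # Compound annual growth rate between two areal-density points.
        return (ad_end / ad_start) ** (1 / years) - 1

    print(f"{cagr(1.0, 2.0, 7):.1%}")    # a doubling over 7 years is ~10.4% per year
    print(f"{cagr(1.0, 4.0, 2):.1%}")    # a quadrupling over 2 years is 100% per year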

──────────────────────────────────────────────────────────────────────────────────────────────────── The edit simply changes “2006” to “2010”: “However, the trend decreased dramatically around 2010 and,[1][2][3] as of 2014, growth...”

The references are clear, accessible and support “around 2010”: Kaur of Morningstar sees a slowing areal density curve “that slowed around 2010.” [2] Yoshida indicates that “the decline has been going on since 2010.”[1]

Once again, the edit simply updates the inflection date from 2006 to 2010: “However, the trend decreased dramatically around 2010 and,[1][2][3] as of 2014, growth...” Is there a citation for “around 2006”? 71.128.35.13 (talk) 23:44, 28 September 2014 (UTC)

Simply changing the date of an inflection point misleads by implying there has been only one inflection point since the CAGR was 60-100% circa 2005. There are at least two, maybe more, or maybe none. At this point there are so few data points since 2010 that it is not clear that there has been an inflection about 2010. I suggest the important point to the reader is that the rate dropped from a very high level to the current level, and not how it got there, via one or two or no inflection points. Tom94022 (talk) 00:21, 29 September 2014 (UTC)
Just throw it all up against the wall, and see what sticks. Do any of them have support from reliable sources? Without support, none of them belongs in Wikipedia.
Two, maybe more inflections? No source has been presented to support this idea.
Maybe none (no inflection point)? No source has been shown to support this. This looks wacky on its face, and is difficult to reconcile with the historical data.
So few data points since 2010? This is not correct, because the shipping HDD density chart has products spaced eight months to one year apart.[3]
Only one inflection point? Yes indeed, bingo. Reliable sources (Yoshida of HDS and Morningstar) are directly quoted in support of an inflection “around 2010.” The following text has valid citations: “However, the trend decreased dramatically around 2010 and,[1][2][3] as of 2014, growth...” 71.128.35.13 (talk) 02:46, 30 September 2014 (UTC)
As I said before, two of your citations support at least two inflection points. As usual you ignore evidence that contradicts your viewpoint. To repeat:
  1. Coughlin 2014[3] clearly shows at least two possible changes in slope, one about 1Q2005 and the one you prefer about 1Q2009. Furthermore Coughlin in 2010 looking at an earlier version of the same chart stated, "AD growth appears to be slowing to ~30% annually."[4]
  2. Yoshida 2013[1] acknowledges he has "written several times about the declining rate of areal bit densities" and the graph he points to clearly has an inflection in the middle 2000s to 40% CAGR since prior to that time there is universal agreement that the prior growth rate was in the range of 60-100%.
FWIW, Coughlin 2014 looks like a transition from what was exponential growth to what is now low linear growth, in which case the slope through any set of points on a semilog plot would continuously decline. If Coughlin published his data it would be interesting to see whether the data since 2006 best fit a linear curve or an exponential curve. This is one reason why I prefer the simple description rather than finding trends where they may not exist.
Correct me if I am wrong, but unlike you I see about a two-year gap in Coughlin 2014 data points from about 1Q2011 to 1Q2013 and then only one or two points thereafter. At a 10% CAGR it takes 5 to 7 years to clearly see exponential growth, and with only 1-3 data points covering 3 years it's hard to call it an exponential trend (the other two trends in the section span 50 and 15 years with many points).
There is nothing about the quotes you cite that excludes the existence of decreases prior to 2010.
It is not necessary to have an explicit "around 2006" quote since we have reliable sources in the graphs of Coughlin, Yoshida and Whyte showing that the slope changed around 2006. Since you found two of these sources I think that would be sufficient.
Reliable sources show that there are at least two inflection points, so both the original statement and your proposed "simple" edit are incomplete and therefore misleading. The simplest statement is that the growth rate has been declining since about 2006 to about 8-10% in 2014. I suppose it is also accurate and more complete to say that AD CAGR dropped to about 30% per year from about 2006 to 2010 and dropped again to about 8-10% from 2011 to 2014, citing Coughlin 2014 and Anderson, but that seems like too much detail. You should note I have omitted the word "dramatically" since it is not clear which is more dramatic, 60-100% dropping to 30% or 30% dropping to 8-10%, nor has the word been used in any source. I can live with either the complete description or the simple one, but I prefer the simple version. The current one-inflection description is misleading. Tom94022 (talk) 19:06, 1 October 2014 (UTC)
"Around 2010" doesn't rely on an unsupported, mis-leading editor interpretation of what's "shown." Reliable sources are quoted directly in support of "around 2010." Kaur of Morningstar sees a slowing areal density curve “that slowed around 2010.” [2] Yoshida indicates that “the decline has been going on since 2010.”[1] The Wikipedia editor who proposes a gradualist, vague, fuzzy, passive, non-inflected story-line must support that using clear, direct quotes from reliable sources. The citations must interpret this event, not a Wikipedia editor. 71.128.35.13 (talk) 20:09, 1 October 2014 (UTC)
"Around 2006" is neither more or less reliable than your preferred and only "around 2010." Perhaps u forgot that your original "around 2010" edit was based solely on your interpretation of the Coughlin 2014 graph. You continue to repeat yourself ignoring three reliable sources for a decrease around 2006. In the absence of consensus I guess the article will remain as is. Tom94022 (talk) 23:55, 1 October 2014 (UTC)
Actually the policy is "read the source, understand it, and then express what it says in your own words." Your self-invented demand for a "clear direct quote" is actually contrary to the policy to "avoid copying." There are three reliable sources (two of them found by you) showing the decline began about 2006, so unless you can find a reliable source that clearly shows something else I will consider this matter closed and revise the article accordingly. If you revert to your one-inflection construction I will then report you to WP:ANI for both edit warring and tendentious editing. Tom94022 (talk) 15:09, 2 October 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Density was still growing at a healthy rate, 30-40% per year, during 2006Q1-2010Q1.[3] 71.128.35.13 (talk) 00:44, 3 October 2014 (UTC)

Since we apparently agree on the AD rate trend from 2006Q1-2010Q1, you must agree it was a decrease from the 60-100% per year trend of the fifteen years prior to 2006, so I can see no reason for you to object to either of my proposed constructions. Your "simple" change in language ignores the substantial change in trend "around 2006" in favor of an equally substantial change in trend "around 2010." I have proposed two reasonable alternatives that encompass both trend changes; you seem fixated on just the 2010 change. Why don't you propose some language that encompasses the changes over the entire period from 2006 to the 8-12% currently cited and agreed upon? Tom94022 (talk) 22:58, 7 October 2014 (UTC)
Near-normal growth, 30–40% per year, or above-normal growth was sustained through 2010. The sag “around 2010,” not 2006, is notable and departs from the trend of the preceding decades:
71.128.35.13 (talk) 8 October 2014 (UTC)
I won't waste any more time or space; your proposed sentence is unacceptable in that it misleads a reader into thinking the only change occurred "around 2010" when all sources clearly show a substantive change "around 2006" from a then-normal growth of 60-100% per year. Tom94022 (talk) 06:32, 9 October 2014 (UTC)
The rate of growth did not change substantially around 2006.[7][8]
[Chart: Leading-edge hard disk drive areal densities from 1956 through 2009 compared to Moore's law]
[85] Normal growth was 40% per year over five decades, and above normal was 60–100% per year.[9][86] Density was still growing at a healthy rate, 30-40% per year, during 2006Q1-2010Q1.[3] 71.128.35.13 (talk) 19:33, 9 October 2014 (UTC)
I won't waste any more time or space; your proposed sentence is unacceptable. The growth rate from 1990-2005 was 60-100% per year according to most sources, and most sources show a decrease about 2006. Tom94022 (talk) 21:26, 9 October 2014 (UTC)
The claim of a decrease around 2006 has no support (is unreferenced); actually, the references indicate that the rate of growth did not change substantially around 2006.[10][11][87]
I won't waste any more time or space; all of your cited references support a decrease around 2006, you just choose to ignore their support. Tom94022 (talk) 23:23, 9 October 2014 (UTC)
This quote from a journal article contradicts the claim of a decrease below the historic normal rate around 2006: “Over the past five years [through 2011] innovations such as the use of perpendicular recording have allowed for continued growth in AD although at more moderate and historic rates of 40-50%.”[12]
As usual you again cite the same source, ignoring or deliberately distorting it. According to this source, 2006 (i.e., 2011-5) is fairly characterized as a point at which the growth was not as moderate as 40-50%; it was, as we well know, 60-100%. I am really tired of your inability or unwillingness to accurately summarize the many sources, all of whom see a growth rate of 60-100% from 1990 to about 2005, declining thereafter. Tom94022 (talk) 22:51, 11 October 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Growth has been below the historical normal of 40% per year since 2010: "roadmap for areal density increases in magnetic hard disk drives (HDD) has slowed down … the decline has been going on since 2010"[1] 71.128.35.13 (talk) 16:54, 13 October 2014

As usual you again cite a known source, ignoring or deliberately distorting it. This source displays no data prior to 2004 and shows a 40% growth rate from 2004-2006. Since we know the growth rate was 60-100% from 1990 to 2005, this source also supports a decline from 60-100% to 40% around 2006 (that is, the lines would cross in 2005). I am really tired of your inability or unwillingness to accurately summarize the many sources, all of whom see a growth rate of 60-100% from 1990 to about 2005, declining thereafter. Continuously citing the same sources over and over again without any new discussion or insight is really unhelpful. Tom94022 (talk) 00:25, 14 October 2014 (UTC)
Figure 1 shows growth was around 45% per year from 1990 to 2005, and it held steady (did not decline) around 2006.[88] 71.128.35.13 (talk) 02:29, 14 October 2014 (UTC)
As usual you again cite a known source, ignoring or deliberately distorting it. It clearly shows a growth rate of 100% per year into the 2000s - to my eyes circa 2003, which is about 2006 for these purposes. You started this odyssey with Coughlin[3], which has every data point through 1Q2005 to the left of a 60% per year CAGR line; how can you now seriously argue around 45% per year since 1990? Tom94022 (talk) 23:14, 19 October 2014 (UTC)

Please arrive at consensus here, and THEN change the article accordingly[edit]

You two are edit-warring in the article. Edit-warring is not confined to simple reverts. Once a disagreement over content is evident, the subject text should be reverted to its pre-dispute state (I've done that) and no further changes should be made to the affected part of the article until a new consensus has been established via talk page discussion. Repeatedly changing the article back and forth, even if you think your change is backed by sources and that those sources are definitive, is unproductive, tendentious, disrespectful of other editors, and confuses readers. If you persist I'm going to report BOTH of you to AN/I and let them sort it out. Jeh (talk) 19:48, 27 September 2014 (UTC)

Your revert is unjustified. Here's why:
“Sometimes editors will undo a change, justifying their revert merely by saying that there is 'no consensus' for the change.”
“If the only thing you have to say about a contribution to the encyclopedia is that it lacks consensus, it's best not to revert it.” WP:DRNC
“Your bias should be toward keeping the entire edit.” WP:ROWN. BTW, please do report. Outside opinion(s) are welcome. 71.128.35.13 (talk) 20:31, 28 September 2014 (UTC)
You are wikilawyering, grubbing around for WP: pages that seem to support your edit, while ignoring the fact that there is an ongoing dispute and that you should not be continuing to edit the article once a dispute is evident. What you've quoted there comes from essays, the opinions of the individual writers. They don't even have the status of guidelines. Nor are they really applicable: we're not talking about a single edit here that lacked consensus. We're talking about the ongoing dispute being carried out in mainspace. Reversion was IMO necessary if only to give you each a dash of cold water in the face. Jeh (talk) 08:47, 29 September 2014 (UTC)
Anyway, your arguing on this point makes me tired. Just discuss it here, 'K? It won't kill you. Jeh (talk) 08:55, 29 September 2014 (UTC)
I agree with the revert and will try to come to a consensus, but this has been shown to be difficult in the past. FWIW, I only made one edit in the article's relevant section. Tom94022 (talk) 21:25, 28 September 2014 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  1. ^ a b c d e f g h i j Yoshida, Hu (2013-02-19). "HDDs and NAND Flash will be Around for Some Time". blogs.hds. Hitachi Data Systems. Retrieved 2014-09-27. "roadmap for areal density increases in magnetic hard disk drives (HDD) has slowed down … the decline has been going on since 2010" 
  2. ^ a b c d e f Kaur, Simran (2014-09-15). "Seagate treads safely in uncertain demand environment; narrow moat has negative trend.". Morningstar. Retrieved 2014-09-26. "a slowing areal density curve … that slowed around 2010 and will advance areal density 25% annually till 2020." 
  3. ^ a b c d e f g h i Coughlin, Tom (2014-09-16). "shipping HDD maximum areal density over time". Coughlin Associates www.tomcoughlin.com (Atascadero, CA: Forbes). Retrieved 2014-09-23. "technology transitions are coming more slowly now for HDD companies than in the past ... for shipping HDD maximum areal density over time" 
  4. ^ [1] Invest in New Technologies or Divest in Market Share, Coughlin Associates, (c)2010
  5. ^ Coughlin, Tom (2014-04-07). "New Areal Density Point for Cloud Storage HDDs". Coughlin Associates www.tomcoughlin.com (Atascadero, CA: Forbes). Retrieved 2014-06-09. "(overall 12%/yr from 2011Q3 to April 2014) represents roughly a 25% areal density increase ... this is the first increase in drive areal density since a 7% increase in Q2 2013. The increase in Q2 2013 was the first one since Q3 2011" 
  6. ^ Dave Anderson (2013). "HDD Opportunities & Challenges, Now to 2020" (PDF). Seagate. Retrieved 2014-05-23. "PMR CAGR slowing from historical 40+% down to ~8-12%" and "HAMR CAGR = 20-40% for 2015-2020" 
  7. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 8 October 2014. "See figure 1. … Over the past five years innovations such as the use of perpendicular recording have allowed for continued growth in AD although at more moderate and historic rates of 40-50%."
  8. ^ Whyte, Barry (September 18, 2009). "A Brief History of Areal Density". IBM DeveloperWorks. Retrieved July 25, 2014.
  9. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 8 October 2014. "approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years … Over the past five years innovations such as the use of perpendicular recording have allowed for continued growth in AD although at more moderate and historic rates of 40-50%."
  10. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 8 October 2014. "See figure 1. … Over the past five years innovations such as the use of perpendicular recording have allowed for continued growth in AD although at more moderate and historic rates of 40-50%."
  11. ^ Whyte, Barry (September 18, 2009). "A Brief History of Areal Density". IBM DeveloperWorks. Retrieved July 25, 2014.
  12. ^ Plumer, Martin L., et al. (March 2011). "New Paradigms in Magnetic Recording". Physics in Canada 67 (1): 28. Retrieved 8 October 2014. "See figure 1. … Over the past five years innovations such as the use of perpendicular recording have allowed for continued growth in AD although at more moderate and historic rates of 40-50%."

Strange sentence[edit]

"Computers do not internally represent HDD or memory capacity in powers of 1,024; reporting it in this manner is just a convention." — I can't make any sense of this sentence. What does "internally represent HDD or memory capacity" mean? Of course most computers store most numbers in binary. Zerotalk 02:22, 28 September 2014 (UTC)

While you understand that today's computers store information in binary form, it is possible that some readers do not, so the preface is a form of explanation as to why binary prefixes are a convention. If you can come up with better language please do so. Tom94022 (talk) 18:04, 28 September 2014 (UTC)
"Storing a number in binary" is not equivalent to "storing a number scaled by a power of 1024". What the sentence is trying to say is that space on a hard drive (or a partition, or a volume, or free space) is not counted internally in GB or TB or GiB or TiB or whatever. The internal count is rather in blocks. That's how the HD reports its capacity: As the number of available Logical Block Addresses. The typical "2 TB" hard drive with 512-byte blocks would be reported, and recorded internally, as having 3,906,250,000 blocks (assuming that the "2 TB" is exact; usually drives hold a few more blocks than the claimed capacity). It's true that this largish integer would be stored "in binary", but the point here is that the binary number does not have to be multiplied by a power of 1024 to get the size in bytes. It is merely the display utility (such as the "Properties" dialog in Windows) that decides to divide the number of blocks by 1024^4 so as to display the size as "1.82 TB" (really TiB). To display size in bytes, the internal number does have to be multiplied by the block size, is 512 bytes on most drives today, 4096 with 4Kn drives - neither of those are powers of two. As Tom said, if you can come up with better wording on this point, please do so. If you don't want to edit the article directly, suggest your wording here and we can beat it around a bit. Thanks! Jeh (talk) 20:28, 28 September 2014 (UTC)
I know everything you wrote, but it doesn't make the sentence any better. The part "in this manner" seems to refer to reporting in powers of 10, but the grammar would indicate "this" refers to "in powers of 1,024". Also, you are right that many data sizes are measured in blocks or pages, but other things are not (e.g. file sizes). The operating system can keep sizes in any form or format it pleases. I think that's the real point. It is also wrong to put "HDD" and "memory capacity" together because RAM chips are measured with the GB = 2^30 convention. Personally I would just delete the sentence, but if you must have something I'd suggest some variation of "Reporting HDD capacity using powers of ten is just a convention in the industry and does not reflect the way that storage capacities are represented in the computer." Zerotalk 23:35, 28 September 2014 (UTC)
We can clarify the grammar (it has the correct intent). HDD and memory do belong together since in too many cases HDD capacities are reported by systems in powers of 1024. I'm happy to just clarify the grammar, but if you still think it is strange, then perhaps, in your language, how about: "Reporting on HDD and memory capacity (or usage) in powers of 1,024 is a convention and does not reflect the way that they are represented in the computer." Tom94022 (talk) 16:09, 29 September 2014 (UTC)

Subdividing Form Factor Section[edit]

I agreed with Dsimic that there is not sufficient material in the Form Factors section to justify sub-sections and therefore reverted the reversion. The proposed edit seemed overly fragmented, with single-paragraph sections, and the reason suggested, the need for anchors, does not require sub-sections. Tom94022 (talk) 21:47, 12 October 2014 (UTC)

I've not seen any policy assertions so we are in personal taste territory with regard to subsection size but let us make that a moot point. Please recall I asserted having stable anchors is what matters and no one has objected to invisible anchors. I assert "Be Bold" to the invisible anchors alternative. – Conrad T. Pino (talk) 05:46, 13 October 2014 (UTC)
The added {{Anchor|...}} are on new lines to make the diff verification cleaner but that may be adding a little white space. The difference I measured over 1100 vertical pixels seems minimal to me but I'm happy to embed the {{Anchor|...}} within the target paragraph. – Conrad T. Pino (talk) 06:04, 13 October 2014 (UTC)
Hello! I'm glad that you're fine without the introduction of sub-sections. I've just repositioned the anchors a bit as they could be introducing double-spacing, which would be against MOS:BULLETLIST. — Dsimic (talk | contribs) 11:30, 13 October 2014 (UTC)
Thank you! Performing that edit was very kind. Best regards, – Conrad T. Pino (talk) 20:07, 13 October 2014 (UTC)
You're welcome. :) — Dsimic (talk | contribs) 20:12, 13 October 2014 (UTC)