Talk:Orders of magnitude (data)

WikiProject Computing
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

Powers of ten or powers of two

I created this with a list of powers of ten to match the other order of magnitude charts. But perhaps it would make more sense to go with powers of two (and sectionalizing accordingly by prefixes)? Fredrik 16:01, 19 May 2004 (UTC)

Stick with the powers of 10. Wikipedia defines an order of magnitude as a power of 10 and you don't want to generalize that (at least I wouldn't). Sectionalize by SI prefixes (kilobit, megabit etc.) but mention binary prefixes (kibibit, mebibit etc.).
Because the byte is (more?) frequently used as unit of data, kibibyte, mebibyte etc. should also be included. And of course, byte should really be octet… This should yield some delightfully complicated fun!
Herbee 22:45, 2004 May 22 (UTC)
Seems good to me then. Thanks for contributing :) -- Fredrik 12:33, 23 May 2004 (UTC)
Why would you not want to generalise that? To me generalisation would seem a perfectly reasonable thing to do especially in a case like this. Indeed it seems that the Order of magnitude page has been changed ... yeah, in three years what do you expect? Currently it reads
Anyhow, if it had been I who created the article, I'd have used powers of two ... and/or four/thirty-two/1024. But, then on the other hand, who says we can't have our cake and eat it too? Bring on the delightfully complicated fun. Jimp 02:42, 19 April 2007 (UTC)
It's done ... enjoy. Jimp 07:36, 20 April 2007 (UTC)

Kilobyte could refer to either, and many other sizes as well; it is therefore not used in this list.

The point has already been made (in the article) that kilo always means 1000, and never 1024. Seen in that light, the above statement might actually reinforce the confusion we're trying to avoid.
Herbee 16:45, 2004 May 23 (UTC)

I guess I should've added "In casual use...". But I'm fine with it removed too. Fredrik 16:48, 23 May 2004 (UTC)
from the lead--"This article assumes a descriptive attitude towards terminology, reflecting actual usage by the speakers of the language." --wouldn't "reflecting actual usage by the speakers of the language" involve the article at least mentioning somewhere that kilo, etc. are commonly used for the powers of 2? —Preceding unsigned comment added by Evildeathmath (talkcontribs) 17:17, 11 December 2008 (UTC)

6.4 × 10^9 bits – Size of the human genome, 3.2 billion base pairs

and later:

1.28 × 10^10 bits – Capacity of the human genome, 3.2 billion base pairs

At 3.2 billion base pairs, wouldn't the potential information content be 1.28 × 10^10 bits? Transcription from DNA runs in one direction down one strand and can see any of four states at each base pair (A, T, C, G), not just that the pair was one of two (AT, GC) possible pairs. Nor do complementary codons code for the same amino acid.

Ah, but binary numbers have the wonderful ability to encode four states in just two bits. Explicitly, one might encode the AT base pair as 00, CG as 01, GC as 10 and TA as 11. So it's actually 2 bits per base pair. Accordingly, I'm changing the size of the human genome to 6.4 gigabits.
Herbee 22:23, 2005 Apr 21 (UTC)
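Herbee's two-bits-per-base-pair encoding can be sketched in a few lines. This is a minimal illustration only; the particular bit assignments (AT=00 and so on) are an arbitrary choice, as the comment above notes.

```python
# A minimal sketch of the two-bits-per-base-pair encoding described above.
# The specific pair-to-bits mapping is an arbitrary illustration.
PAIR_TO_BITS = {"AT": "00", "CG": "01", "GC": "10", "TA": "11"}

def encode_pairs(pairs):
    """Encode a sequence of base pairs as a bit string, 2 bits per pair."""
    return "".join(PAIR_TO_BITS[p] for p in pairs)

print(encode_pairs(["AT", "CG", "TA"]))   # -> 000111
print(3.2e9 * 2 / 1e9)                    # -> 6.4 (gigabits for the genome)
```

At 2 bits per pair, 3.2 billion pairs give the 6.4 gigabits Herbee put in the article.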

DNA is only found in pairings. Since only AT and GC pairs are found, the theoretical 2 bits per pair is wrong; it is only 1 bit per pair, as the pairing has more to do with how DNA is replicated than anything else. 3.2 billion base pairs == 3.2 billion bits in potential raw storage. I removed the "3.2×10^9 base pairs (Each pair encodes two bits of data.)" bit of incorrectness. --99.49.10.193 (talk) 03:39, 26 January 2010 (UTC)

That said, the amino acid translation for codons is more-than-one-to-one (in fact it's about four-to-one), thus reducing that potential back to around 3 × 10^9 post-translation. But even after that, the human genome doesn't come close to saturating its potential information content, since about 97% is Junk DNA. And that brings us to a content of about 9 × 10^7 useful bits -- a small stack of floppies. Mimirzero 07:25, 7 Jan 2005 (UTC)

I changed the page to use "Capacity" instead of "Size of Genome" to satisfy my second complaint without getting extra obscure on the page.

The meaning of size is perfectly clear while capacity is not. Moreover, the article is about size and not about 'amount of content'. For instance, the 'total amount of printed material in the world' is listed, and that certainly includes a lot of junk and redundancy. Accordingly, I'm changing back to the size of the human genome.
Herbee 22:23, 2005 Apr 21 (UTC)

At a glance?

The article claims that 150 megabits is the approximate amount of data the human eye can "capture at a glance". That sounds pretty cool, but what does it mean? I'm pretty sure nowhere near that much actually makes its way to any useful bit of the brain on a brief glance at a scene. It's not too different from the number of photoreceptors in the human retina; is that what's meant? I'm willing (just barely!) to believe that there may be a useful sense in which the figure is right, but stated so baldly it seems more like a factoid than a fact. Gareth McCaughan 21:56, 2005 Apr 21 (UTC)

The retina article claims an information rate of 0.6 megabits/sec through each optic nerve, so each "glance" would take about two minutes. The 150 megabits seem a factor 100 too high, at least.
Herbee 22:46, 2005 Apr 21 (UTC)
This is all bullshit; nerves transmit information in analog form, so it cannot be measured in bits. And even if you want to give an approximation: the retina has about 100 million "sub-pixels" (there are only about 1 million nerves going to the brain, but it is possible to carry many pieces of information in one signal; a technological example of this is analog television), each capable of measuring the light intensity exactly enough that you can tell the difference between 250-1000 different levels. So this is 8 to 10 bits per sub-pixel, and you need about 800-1000 Mbit for each frame. According to Persistence of vision the frame rate is approx. 50 fps. So this would be about 40-50 Gbit/s for the raw output of each optic nerve. 0.6 Mbit/s might be what becomes conscious after the "post-processing" in the brain, but the actual data rate is about 4 orders of magnitude higher. --MrBurns (talk) 16:16, 6 February 2011 (UTC)
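For what it's worth, the arithmetic in the comment above does work out as stated, whatever one makes of its premises. The sub-pixel count, bit depth, and frame rate below are MrBurns's assumptions, not established facts, and they are disputed in the replies that follow.

```python
# Back-of-envelope check of the figures in the comment above. All three
# inputs are the commenter's assumptions, reproduced for illustration.
subpixels = 100_000_000              # claimed retinal "sub-pixels"
bits_low, bits_high = 8, 10          # claimed precision per sub-pixel
fps = 50                             # claimed "frame rate" of the eye

frame_low = subpixels * bits_low     # 800 Mbit per "frame"
frame_high = subpixels * bits_high   # 1000 Mbit per "frame"

rate_low = frame_low * fps
rate_high = frame_high * fps
print(rate_low / 1e9, rate_high / 1e9)   # -> 40.0 50.0 (Gbit/s)
```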
Nerves are not analog devices - they fire discrete "impulses" and are essentially digital in nature. Read Action potential for more detail. Essentially, as the polarization of the nerve increases, its output stays close to zero until the "action potential" is reached. Then it rapidly undergoes a cascade reaction that results in a gigantic 'pulse' of output...which we describe as the neuron "firing". Once it's fired, it has to recover for a while until it can fire again. Hence it is a distinctly digital process. It either reaches the action potential and generates a maximum possible output - or it doesn't reach that potential and doesn't generate any output. That's pretty much exactly what happens in a digital circuit too. The impressions we get of the world that seem analog in nature come from the cell firing repeatedly - a whole stream of "1" bits - whose frequency and duration are determined by the magnitude of the signal. Humans cannot tell 250-1000 different brightness levels - the number is more like 64...although it's a bit more complicated than that because we have to distinguish between 'absolute' and 'relative' brightnesses - and our ability to distinguish color, brightness and motion varies considerably between the center of the retina and the edges - and is highly dependent on the ambient light levels. We're very good at detecting changes in the rate of brightness change across a surface, for example - but very poor indeed at spotting absolute brightnesses (check out Same color illusion, for example). Persistence of vision doesn't work like that - it doesn't have a "frame rate" per se, and there is actually considerable scientific doubt that the phenomenon exists at all. The optic nerve doesn't even transmit "pixel" data - it sends things like edge slopes and rates of movement. Considerable pre-processing happens in the back of the retina.
Doing calculations about what you can see on a TV screen is an exceedingly naive way of guesstimating the bandwidth out of the eye. SteveBaker (talk) 17:35, 6 February 2011 (UTC)
At least in relative brightness, the human eye can tell more than 64 brightness levels; I can actually see all 256 gray levels on a computer screen at once if it is set up the right way, so my eyes can tell at least 256 brightness levels. If the maximum number was 64, then it would be useless to develop implementations which have more than 8 bits per color (e.g. scRGB). And even if some of the pre-processing is done in the retina, 600,000 bits/s is unrealistically low, because it would mean that each of the 1 million nerve cells of the optic nerve transmits 0.6 bits/s on average. That would mean that when a small dot-like object appears, in most cases it would not be seen in less than a second, which contradicts experience. 600,000 bits per second might be an average, but it is unusual to use averages in a table like the one in this article; it is certainly not the maximum. As a maximum I still think it is as far from reality as the 27 kbits stated for a typical telephone book in the next section. --MrBurns (talk) 21:37, 6 February 2011 (UTC)
As I said, you are dramatically over-simplifying a much more complex problem. I made a test page for you with four photos of our illustrious leader: http://www.sjbaker.org/tmp/t1.html - now, if you look at the backgrounds of the four images - you can easily spot that the one on the bottom-left has "contours" because it's a 64-grey-level image. That's because our eyes are very sensitive to the smoothness of gradients, and 64 (or even 1024 under optimum lighting conditions) grey levels aren't enough to fool us into seeing perfect smoothness.
But what about the two on the right? One of them has 256 grey shades and the other only 64 - it's REALLY hard to tell which is which without cheating and using a histogramming tool! It's not impossible - but it requires just the right ambient lighting around the screen you're viewing it on. That's because there aren't enough smooth gradients - and without that, you really have a hard time of it. 64 grey levels ARE enough for "natural" scenes. Discussing the number of bits our vision has is a truly meaningless thing if you're trying to measure it using the standards of computer images. It's much more complex than that.
Your 'analysis' of how data flows along the optic nerve is ridiculous because the nervous system simply doesn't work like that. You're assuming that some really low-level representation is being used. It is possible, for example, that one of those million nerves does nothing but say "there was a brief flash" - and another that says there is a "small dot-like object someplace" - and another that says "something 'interesting' happened in the top-left quadrant" - for those nerves, 0.6 bits per second is plenty. 600,000 bits (used intelligently) can describe any visual scene you can imagine - because identifying any object you've ever experienced since birth would take maybe 40 bits...and where it is in the visual field (another dozen bits) - and how long it was present for (maybe 8 more bits) - other bits can express motion and color shifts. We don't entirely understand how this is done - but it's VERY clear that it doesn't work like a video camera.
There are all sorts of really clever, subtle experiments that have been done that show that your brain is 'wired' to disguise errors and hide its limitations from your consciousness. Optical illusions are great at demonstrating ways in which that self-deception can be circumvented.
So, let's ignore your "OR" - and use a number that can be backed up with a solid reference from some scientist who is an expert in the field. SteveBaker (talk) 02:09, 7 February 2011 (UTC)
I never wanted to research a number for the article; I just wanted to prevent the 0.6 Mbit/s from being inserted into the table, so what I am doing here is not a violation of WP:OR. And when it is backed up by a "solid reference", I would accept it, but only if the source states the 0.6 Mbit/s as the outcome of an experiment which is exact enough, not as a guess (scientists also guess sometimes). Also, any piece of information is at least 1 bit in size, so if the nerves were not capable of transmitting about 1 bit/s, we would have a much higher failure rate on seeing a small dot-like object in less than 1 second. And if the human eye is capable of telling more than 64 grey shades on a monitor in some conditions, it can also do this in nature under perfect conditions, because a monitor cannot display anything which isn't possible in nature; actually the gamut of a monitor is much smaller than the gamut of all possible natural colors (although this doesn't matter much when we are talking about grey shades, because good CRT monitors are able to produce something which we see as "perfect black" and "perfect white" at the same time). I also guessed your images correctly (I checked with a piece of software which shows the RGB value of the pixel under the mouse pointer), and this was not by coincidence: you can see the difference because on the 256-shades image the edge of the shadow of Jimmy Wales' nose looks smoother on his beard. And my monitor isn't even perfectly calibrated; I used a calibration software but not a colorimeter. Also, the software recommends calibrating the monitor at least every 6 months, but I did the last calibration about 2 years ago. --MrBurns (talk) 02:51, 10 February 2011 (UTC)
I didn't say it was impossible to tell the difference - only that it's quite hard (which it is - you could only find one tiny place with a smooth gradient where it is noticeable). The point (which you seem to be having a hard time understanding) is that you could only see the difference in an area of gradually varying brightness. The eye - and especially the optic nerve - doesn't work like a TV camera. It doesn't send an array of pixels at some fixed "frame rate" and it doesn't have some particular number of "grey levels". You simply cannot relate the bandwidth of one particular nerve fibre to the rate of an object flashing on the screen, because there may just be some nerves whose job it is to say "something just flashed over there" - or "something just flashed three times over there". It can send that kind of data much more slowly than the actual flash rate...so doing thought experiments such as the one you're doing simply cannot tell you anything about that higher-level communication path. Human vision simply doesn't operate using any of the terms and concepts that you're flinging around here.
The true reason that you can only tell the difference between 64 grey levels and 256 in a smoothly varying surface has nothing to do with the precision of the retina or even the bandwidth of the nerve fibres. It's because the retina detects (at high fidelity) whether something is smoothly varying in brightness (to a precision of maybe one part in 256) or has sudden steps - and probably sends that to the brain as a "steppy or smooth" signal. When it sends "this is the brightness of this area of the image", it does it at less than one part in 64 of precision. So you can easily tell when the background (or that area under the beard is at less than 256 grey levels - but your brain can't tell that there are only 64 grey levels where the area isn't "smooth". The reason the brain does this is that absolute grey levels are pretty irrelevant to an animal surviving out there in a wild environment. To the contrary, it's more useful that you can recognize Jimmy Wales instantly in any kind of lighting - which is more about knowing where rough and smooth areas are than about absolute brightnesses. Variations in brightness are important because they help to distinguish shape and curvature in the 3rd dimension...absolute brightnesses are beyond useless, they tell you almost nothing about the world. That's why there is the phenomenon of Mach bands (and in fact, that article confirms what I'm saying by explaining that gradients are processed in the retina - not in the brain).
I'm sorry but it would take a year of cybernetics classes to teach you all of this...and I just don't have the time to impart it all to you in a forum post - but your reasoning is just naive given the 'architecture' of the human visual system. (I have a degree in cybernetics BTW).
SteveBaker (talk) 16:08, 10 February 2011 (UTC)
Ok, but you still don't have any reference for the 600,000 bits/s. And according to the last paragraph of Retina#Physiology, the 600,000 bits/s is only the information capacity of the fovea, not of the whole retina. According to Fovea centralis, about half of the nerve fibers of the optic nerve carry information from the fovea, so the rate would rather be 1.2 Mbit/s than 0.6 Mbit/s, which also seems more reasonable when considering the "small dots problem". --MrBurns (talk) 17:05, 10 February 2011 (UTC)

Phonebook

This is so stoopid:

10^4 bits

• 22,500 bits – Amount of information in a typical non-fiction book.
• 27,000 bits – Amount of information in a typical phone book.
• 42,000 bits – Amount of information in a typical reference book.

Reality check: a phone book has on the order of 1000 pages, which would amount to 27 bits or less than 4 bytes per page. Stoopid! My own phone book (2005/2006 edition for the Arnhem-Zevenaar region, the Netherlands) has 704 pages, 4 columns per page, 129 lines per column, and on average 35 characters per line, for a total of about 100 megabits. Ads have a lower information density, so I estimate that my phone book contains 80 ± 20 megabits of information. The other two entries make no sense either—what is a "typical non-fiction book"?
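Herbee's count is easy to reproduce. The page, column, line, and character figures below are from the comment above; the 8 bits per character is an assumed encoding (one octet per character), used here purely for illustration.

```python
# Reproducing the Arnhem-Zevenaar phone book estimate from the comment
# above. bits_per_char = 8 is an assumption (one octet per character).
pages, columns, lines_per_column, chars_per_line = 704, 4, 129, 35
bits_per_char = 8

total_bits = pages * columns * lines_per_column * chars_per_line * bits_per_char
print(total_bits)          # -> 101713920 bits
print(total_bits / 1e6)    # ~101.7 Mbit, i.e. "about 100 megabits"
```

Compare that with the 27,000 bits claimed in the quoted entry: the listed figure is off by nearly four orders of magnitude.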

As a fix, I removed these three entries and created a new one for the phone book.
Herbee July 3, 2005 23:05 (UTC)

I agree with you. You're an eagle-eyed observer if you spotted that. But I was not here to view this Wikipedia article then; in fact, now in 2012 is the first time I have seen this article. So I do think you did a good job. 69.221.168.185 (talk) 09:37, 4 January 2012 (UTC)

Binary prefixes like kibibyte

A vote has been started on whether Wikipedia should use these prefixes all the time, only in highly technical contexts, or never. - Omegatron 14:57, July 12, 2005 (UTC)

A year & a half later & the debate continues. Jimp 01:26, 19 April 2007 (UTC)

Uninformative Entries

This article seems to have a very large number of entries describing the names of the numbers (e.g. 8,000,000 bits = 8 megabits) and relatively few entries that actually give you an idea of how large that number is. For example, in the entry for 10^15 bits, there are five entries - four of which are names. Are all these names really relevant? Furthermore there is notable redundancy in the names - for example there is the heading "10^18 bits – One exabit" and then the first entry is "1,000,000,000,000,000,000 bits (10^18 bits, 125 petaoctets) – One exabit", which is really not very informative - yes there is some information there but most readers will find it irrelevant. Cornflake pirate 03:02, 13 April 2006

I must concur. The entries are useless and dilute actual information. —Preceding unsigned comment added by 173.79.112.21 (talk) 14:57, 28 March 2009 (UTC)

Octet vs Byte

Byte is a far more common term... shouldn't we use that instead of octet? SigmaEpsilonΣΕ 03:31, 15 May 2006 (UTC)

I tend to agree, Bytes would be useful on this page as they are more common and much easier for most people to understand.--Hibernian 19:32, 16 August 2006 (UTC)
Yes. — Omegatron 15:58, 20 May 2007 (UTC)
I disagree, octet is the correct and accurate term, byte can be ambiguous. Sarenne 17:36, 21 May 2007 (UTC)
Yes - octet is used in a number of fields, notably music. Replacing a word that is used solely in the computer field with one that is used in many is more confusing, regardless of any historical ambiguity (byte = 8 bits since I was a kid, and that's a long time ago). Maury 12:24, 23 May 2007 (UTC)
I find the layout is poor, I'm going to try to look into a more efficient way of displaying the information. Tyler 17:50, 25 May 2007 (UTC)
IMHO the layout is fine. However, it might be worthwhile to split it into two tables? One for the "kibibits" and one for the SI standard "kilobits". It's less confusing to only have to deal with one measuring system at a time. (Or not; doesn't matter.) - Theaveng 12:20, 28 September 2007 (UTC)

I think octet is more appropriate here, as it provides an accurate, numerical definition that doesn't rely on anything else. There can be absolutely no confusion about what an octet is, and I think that goes well with other "orders of magnitude" lists on Wikipedia where absolute values are provided. — Northgrove 09:12, 14 September 2007 (UTC)
Wikipedia is supposed to reflect ACTUAL usage by the common people, not try to redefine how people are supposed to talk. Descriptive of the language, not prescriptive. - Theaveng 12:12, 28 September 2007 (UTC) - P.S. My external hard drive is 300 gigaBYTES, not 300 gigaoctets (sounds unusual doesn't it; that's because it is). Be *descriptive* of the language as its actually used.
P.P.S. I apologize if I sounded a little hostile. It's just that I, as an engineer, get a little annoyed when some English major comes along and tells me I should be saying "gigaoctets" because "it's more proper". (And they say I'm not allowed to use the phrase "can't disagree". Chaucer and Shakespeare used double negatives; why can't I?) ----- I've been using "bytes" for the last thirty years, as have all the engineering & programming colleagues around me. We are not going to change to "octets" just because you tell us we should.
My Atari 2600, Commodore 64, Amiga, and PowerMac were not filled with octets. They were filled with 128 bytes, 64 kilobytes, 2 megabytes, and 1 gigabyte respectively. BYTES, not octets. The article should reflect that common usage. - Theaveng 12:44, 28 September 2007 (UTC)
The term "octet" has been often used in protocol specifications and the like when it is necessary to unambiguously define an 8-bit data word. See Octet (computing) and (to pick an IETF RFC at random) RFC1122. Letdorf 14:04, 28 September 2007 (UTC).
Well that does make sense because specifications are a lot like laws: Filled with lots of jargon only understood by lawyers and politicians. These words have very precise definitions, but are also very confusing to the layman. "Byte" is still the more common term that is used virtually everywhere you look, and IMHO "octet" should be treated the same way as "crumb"... a term that is used, but only rarely, and only with select groups (like specification writers or engineers).
Also an article using the terminology that people know best (bytes) is far more useful to those perusing the encyclopedia, whereas an article using unfamiliar terminology (octets) will just leave the average Joe Smith scratching his head in confusion. He won't get any use out of it. IMHO. - Theaveng 15:44, 28 September 2007 (UTC)

Semioctet vs. "nibble, rarely used"

Say what? I've been using the term nibble since the 1970s. As do many of my engineering colleagues. Hardly rare.

And what the heck is a semioctet??? Where the heck did that come from? I've never heard that terminology, not even once, in the last thirty years. (IMHO it seems this article has its terminology backwards, using rare terms as "common", and common terms as "rare".) (Wikipedia's supposed to reflect ACTUAL usage by the common people, not try to redefine how people are supposed to talk.) - Theaveng 12:12, 28 September 2007 (UTC) P.S. I back up my argument by pointing out that the word "semioctet" has no article of its own; it defaults to the common-usage "nibble" article.

Yes!! Me too!! As a "layman" trying to comprehend these terms I find semioctet as silly as kibibits. When on earth did that term crop up? If we have jargon like kilobits, kilobytes, kibibits or even kibbutz's, how do you abbreviate them so everyone understands? Or is the general idea to confuse? - Meshdunarodny —Preceding unsigned comment added by 84.92.95.190 (talk) 23:56, 21 January 2009 (UTC)

Fractional byte lengths

The phrase "10 bits is the minimum byte length" is inherently misleading. It is correct to say that 10 bits is the minimum bit length for a certain purpose. Byte has already been defined as 8 bits and introducing the non-standard notion of fractional byte lengths will only confuse the reader (e.g. what is the byte length of 3 bits?). I've edited the text accordingly.

This points to a more general semantic confusion in the article between the actual information conveyed in a string of bits/nibbles/bytes/whatever and the information-carrying capacity of such a string. I'm not sure how best to clear up this confusion. —Preceding unsigned comment added by Ross Fraser (talkcontribs) 00:48, 15 November 2007 (UTC)

Size of a DVD

The size of a DVD is stated as 4.04 * 10^10 bits. This equals about 4.7 Gibibyte, not Gigabyte as stated. 4.7 Gigabyte (the correct size) equals only 3.76 * 10^10 bits. 79.206.186.31 (talk) 19:02, 1 August 2008 (UTC)
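The conversion the anonymous editor describes checks out. A quick sketch (4.7 is the nominal single-layer DVD capacity in decimal gigabytes):

```python
# Checking the DVD figures from the comment above: 4.7 decimal gigabytes
# vs 4.7 binary gibibytes, both expressed in bits.
GB = 10**9      # SI gigabyte, in bytes
GiB = 2**30     # IEC gibibyte, in bytes

print(4.7 * GB * 8)    # 3.76e10 bits  (4.7 gigabytes, the correct DVD size)
print(4.7 * GiB * 8)   # ~4.04e10 bits (4.7 gibibytes, the figure the article used)
```

So the article's 4.04 × 10^10 bits corresponds to gibibytes, while the DVD specification's 4.7 GB gives 3.76 × 10^10 bits, exactly as the comment says.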

Wikipedia

Anybody know how much data (in Bytes) is contained in the entirety of en.wikipedia.org? —Preceding unsigned comment added by 68.41.142.17 (talk) 03:40, 26 February 2008 (UTC)

Not much. 189.30.69.113 (talk) 03:27, 17 June 2008 (UTC)

Useful orders of magnitude

I would imagine that one of the more useful orders of magnitude that most non-computer proficient laymen (and yes, I do include myself in that group) need is the ol' byte->kilobyte->megabyte->gigabyte, etc, etc. It is how I ended up here in the first place, looking to confirm my knowledge of it. SteveCoppock (talk) 03:01, 1 May 2008 (UTC)

Reference with Obsolete Comparison

Last-Modified: Sun, 12 Mar 2000 00:12:24 GMT

Their reference to the "entire Internet being roughly 100 Terrabytes" seems a tad dated. It's not uncommon for people to have hard drives 1% of that size in their desktop PC. —Preceding unsigned comment added by 86.155.191.111 (talk) 11:47, 5 May 2008 (UTC)

Not relevant

I don't see these orders of magnitude to be very relevant. If you type "man ls", a basic, and very much used Unix command you will not see the term "mebibyte" used, you will be dealing with the 1024*1024 bytes, in fact all of the usage is in terms of base 2. I really just don't see the usefulness of this sort of documentation for practical purposes. Heck, even my spellcheck sees mebibyte as a spelling error, while megabyte has no issue at all.

--Rahennig (talk) 18:44, 4 August 2009 (UTC)

Creating orders-of-magnitude articles is useful to all sorts of people - just because you don't find it useful doesn't mean that someone won't. You always see things in mainstream media like "The XYZ-2000 hard drive can store XXX Mbytes of data which is like storing the Library-of-Congress YYY times!" - well, if you need a handy analogy for some amount of storage capacity - then this is the place to look. I use the other orders-of-magnitude articles for all sorts of similar things (eg "The deck of the Nimitz aircraft carrier is 25,000 square meters which is about 5 football fields"). So this article is certainly not irrelevant.
As for the Mebibyte, it has been a part of the IEC standard for data capacities since 2000 and is accepted by a large number of standards organizations. So - the reason it's here is to educate people such as yourself into the correct terminology. 1 mebibyte is 1024×1024 bytes; 1 megabyte is 1000×1000 bytes. Yes, a lot of people get it wrong...but a lot of people getting something wrong doesn't somehow make it right. The 'man' page for 'ls' dates back to the early days of UNIX in the late 1970s - and that was before the standard for the mebibyte (and the other binary orders of magnitude) was formulated. Spell checkers are hardly reliable sources. The 'ispell' checker (assuming you are a UNIX user) only has an 87,000-word vocabulary - the Oxford English Dictionary contains 300,000 words. I strongly suggest you read our articles on Gibibyte, Mebibyte, Kibibyte, etc.
SteveBaker (talk) 19:59, 21 September 2010 (UTC)
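For readers following the thread, the SI (decimal) and IEC (binary) units being argued about differ as follows; a simple sketch showing that the discrepancy grows with each prefix step:

```python
# SI vs IEC prefixes side by side. The ratio between the binary and
# decimal unit grows at each step up the scale.
pairs = {
    "kilobyte vs kibibyte": (10**3, 2**10),
    "megabyte vs mebibyte": (10**6, 2**20),
    "gigabyte vs gibibyte": (10**9, 2**30),
}
for name, (si, iec) in pairs.items():
    print(f"{name}: {iec} vs {si} bytes, ratio {iec / si}")
# ratios: 1.024, 1.048576, 1.073741824
```

At the gigabyte scale the two meanings already differ by more than 7%, which is why the disambiguation debate matters at all.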
Even the Mebibyte article says that the mebibyte, quote, "has seen little real world usage in the computer industry". Now, Wikipedia is supposed to reflect real-world usage; at least that seems to be the standard everywhere else. IMHO, the IEC picked a stupid word to replace the megabyte, and it will never be accepted. They fiatted the word into existence in 2000; today, this article is the first time I've ever heard of it, and I'm a professional computer guy. The phrases "binary megabyte" (meaning 2^20) and "digital megabyte" (meaning 10^6) would have been better (it's what I use when it comes up in conversation). Unambiguous, clear both ways, everyone knows what you mean, and you don't sound like you're babbling. Listmeister (talk) 21:11, 5 July 2012 (UTC)
Discussed and consensus arrived at already. See the section "Neutral point of view" below. Btw, I think the term you want is not "digital megabyte" but rather "decimal megabyte". I think all megabytes are digital, no? Jeh (talk) 22:59, 5 July 2012 (UTC)

3 digits

The statement "7 bits – the size of code points in the ASCII character set – minimum length to store a single group of 3 decimal digits" should be only 2 digits, as 2^7=128 thus allowing 00-99. 88.212.102.161 (talk) 10:56, 6 August 2009 (UTC)
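The correction can be checked directly. A small sketch (`bits_for_digits` is an illustrative helper, not anything from the article):

```python
import math

# How many whole bits are needed to store any n-digit decimal number?
# An n-digit group has 10**n possible values, so we need ceil(log2(10**n)).
def bits_for_digits(n):
    return math.ceil(math.log2(10**n))

print(bits_for_digits(2))   # -> 7   (2^7 = 128 >= 100, so 00-99 fits)
print(bits_for_digits(3))   # -> 10  (2^10 = 1024 >= 1000; 7 bits cannot hold 3 digits)
```

So 7 bits suffice for a group of 2 decimal digits, and a group of 3 digits needs 10 bits, confirming the anonymous editor's point.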

Trits are not 1.5 bits

Two trits can store 9 (3x3) unique values. Three bits can only store 8 (2x2x2). If a trit were 1.5 bits, two trits could be fit in 3 bits, but this is clearly not the case. If my math is correct, log 3 base 2 (= log(3)/log(2) ~ 1.584 ~ 1.6) would be more correct. Hopefully I haven't stepped on any toes by correcting this immediately. 24.19.56.119 (talk) 04:09, 21 September 2010 (UTC)
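The logarithm in question, as a quick check of the figure above:

```python
import math

# Information content of one trit: log2(3) bits, not 1.5.
print(math.log2(3))          # ~1.585 bits per trit
print(3**2, 2**3)            # 9 vs 8: two trits store more states than three bits
```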

You're absolutely right - my bad. Thanks for the fix! I added trit because I wanted to dispel the idea that information can only come in integer numbers of bits - I guess I screwed up a tad there! SteveBaker (talk) 19:37, 21 September 2010 (UTC)

Neutral point of view

It looks like this article overstates the use of -bibyte prefixes. As Wikipedia:MOSNUM says they are not familiar and are generally not to be used. I think this article should put forward a more neutral point of view by using the more commonly used prefixes instead of using prefixes that are not familiar. I'll do these modifications in a few days unless someone can come up with a better idea? Glider87 (talk) 05:47, 23 October 2010 (UTC)

I'm loath to remove them entirely because this is an article that's talking rather specifically about these kinds of issues. MOSNUM does provide an exception for such situations. I don't think it's an NPOV issue - but it's definitely something we need to address. IMHO, we should either list both (with the IEC unit name in parentheses, with a link to a footnote explaining the issue) - or at the very least explain the situation in the lead of the article so that people can understand the sources of confusion. We should also explain the ugly issues about disk drive capacities, where 1 Mbyte could mean 1,000,000 bytes or 1024 × 1024 - or even 1000 × 1024. SteveBaker (talk) 15:23, 1 November 2010 (UTC)
I think the -bibyte prefixes in this article should be removed and replaced with exact numbers, as demonstrated in MOSNUM. This is instead of putting -bibyte prefixes in parentheses because as it says in MOSNUM "Disambiguation should be shown in bytes or bits". Glider87 (talk) 21:25, 2 November 2010 (UTC)
In most articles, I'd strongly agree with you - but this one is special. We're specifically talking about orders of magnitude - and we're a part of a series of a dozen or more other articles that we need to follow in the style of. We should not use the 'kilobyte==1024' byte standard because that only applies in computer-like applications...and even then, we know that there is confusion between disk capacities and RAM capacities. That's not ordinarily a problem - but to fit with the other articles in the series, we need to use the 'kilobyte=1000' method and make it super-clear that we're doing that by using the IEC nomenclature for the powers-of-two stuff. MOSNUM is only a guideline - and it admits the need for exceptions. This article is a clear example of an exception. I can't agree with the course you propose. SteveBaker (talk) 02:12, 3 November 2010 (UTC)
Given that this article is all about (the size of) various units, I'd have to disagree with removing the IEC terms. How the 1000 vs. 1024 ambiguity should be dealt with for the popular non-IEC terms is another matter. --Cybercobra (talk) 04:16, 3 November 2010 (UTC)
Making things super clear is the problem because the article doesn't even cover the fact that IEC is not widely used. As it stands at the moment it gives the misleading impression that IEC is the preferred method to use. As MOSNUM says the IEC prefixes shouldn't be used for disambiguation when there exist other methods such as using the power notation. This is because disambiguation should clarify more often than not and so not introduce terms that are used much less frequently. All the disambiguation could just use power notation instead. Glider87 (talk) 05:34, 3 November 2010 (UTC)
To repeat: MOSNUM is only a guideline - and it allows for exceptions. We don't have to follow it slavishly. SteveBaker (talk) 16:28, 3 November 2010 (UTC)
I don't think the exception covers this case. There are other pages used to specifically discuss the virtually unused IEC prefixes. This page isn't one of those pages, so it shouldn't be using the IEC prefixes when better methods already exist. Glider87 (talk) 18:20, 3 November 2010 (UTC)
If you put for example "megabyte" in both the binary and decimal prefix columns then the table becomes more confusing, because "megabyte" will then appear on the rows for both 1000000 and 1048576. MOSNUM even states
```
Avoid inconsistent combinations such as A 64 MB (67,108,864 bytes)
video card and a 100 GB (100 × 1000³ bytes) hard drive.
```
One could say "megabyte (binary)" in the binary column, and/or "megabyte (decimal)" in the decimal column, that would work - but it seems to me that it would add quite a bit of clutter. And picking one or the other (as opposed to using both) would be NNPOV, wouldn't it? Really, I don't see that this is an NPOV issue at all. It's just a question of how best to succinctly and unambiguously refer to a number. Jeh (talk) 19:54, 3 November 2010 (UTC)
The number itself or power notation unambiguously and succinctly refers to a number, not the unfamiliar (to our readers) IEC prefixes. Glider87 (talk) 21:20, 3 November 2010 (UTC)
It is true that the number itself or power notation do refer unambiguously to a number. However, power notation may be unfamiliar to many readers. I have yet to see an operating system describe a hard drive as holding "300 × 10⁹ bytes", for example. OSs often do display exact numbers of bytes if you look hard enough; nevertheless the important thresholds like "4294967296 bytes", let alone "18446744073709551616 bytes", are not numbers that very many people will immediately recognize. Jeh (talk) 00:20, 4 November 2010 (UTC)
Power notation is the method advocated in MOSNUM and it specifically says not to use IEC prefixes. The power notation is a mathematical concept so of course it is more familiar than using unfamiliar IEC prefixes. Glider87 (talk) 05:18, 4 November 2010 (UTC)
I think there may be some confusion about what we're talking about doing here. Right now, we have things like:
``` 2²⁰ | mebibit | 10⁶ | megabit | 1,048,576 bits (128 kibibytes) – RAM capacity of popular 8-bit computers
```
• The first use of 'mebibit' in that second column - I strongly defend. A mebibit is indeed 2²⁰ bits - and since it's an example of an order-of-magnitude term (albeit rare), there is no problem in saying that - MOSNUM notwithstanding. It may be obscure - but so are many of the things in this article. Saying that "9408 bits (1,176 bytes) – uncompressed single-channel frame length in standard MPEG audio (75 frames per second and per channel), with standard 16-bit sampling at 44,100 Hz" is an exceedingly obscure fact. We condone it because it's interesting...the same exact reasoning applies to the use of IEC notation. Just because something is unfamiliar to most readers, that is NOT a reason not to describe it. The whole reason people use an encyclopedia is to be educated - to learn stuff - and learning that there is this IEC notation is a perfectly reasonable idea for this specific article.
• The whole point of these order-of-magnitude articles is to provide terms and examples for different amounts of 'stuff'. However, the use of 'kibibytes' in the last column of our table is a clear problem vis-a-vis MOSNUM and could/should be removed. To that extent (only) I agree with Glider87.
• However, the other problem we have is with 'megabit' in the fourth column because this term "megabit" means different things to different people (specifically, it means different things to disk drive manufacturers and RAM manufacturers). The idea of replacing mebibit in the second column with megabit is just nuts - because now you'd have two sets of terms in two different columns - not lining up. To resolve that - I think that our lead section should explain that with something like: "Because the giga/mega/kilo nomenclature has multiple meanings - in this article we use the (rare) IEC notation for exact powers of two and the giga/mega/kilo notation for powers of ten - note that some applications of data sizes use giga/mega/kilo as exact powers of two". I don't see a problem with MOSNUM for that situation - it keeps things clean and simple.
SteveBaker (talk) 19:48, 4 November 2010 (UTC)
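The 1000-vs-1024 divergence at the heart of this thread is easy to see numerically. An illustrative Python sketch (the dictionary names are mine):

```python
# Decimal (SI) prefixes vs. binary (IEC) prefixes, as discussed above.
SI = {"kilo": 10**3, "mega": 10**6, "giga": 10**9}
IEC = {"kibi": 2**10, "mebi": 2**20, "gibi": 2**30}

# A "100 GB" drive (manufacturer's decimal units) shown by an OS
# that reports in binary units:
drive_bytes = 100 * SI["giga"]
print(drive_bytes / IEC["gibi"])      # ~93.13 "GiB"

# The divergence grows at each step up the scale:
for (si, d), (iec, b) in zip(SI.items(), IEC.items()):
    print(si, "vs", iec, round(b / d - 1, 4))   # ~2.4%, ~4.9%, ~7.4%
```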

────────────────────────────────────────────────────────────────────────────────────────────────────

I pretty much agree with SteveBaker. Well put. Jeh (talk) 01:11, 5 November 2010 (UTC)

Minimum error-correcting size

We say:

``` 10 bits – minimum bit length to store a single byte with error-correcting memory
```

...and I agree that when you buy an error-correcting memory, it'll have 10 bits per byte. But that's not actually the MINIMUM necessary to do error correction in larger blocks of data - which is confusing.

You could (for example) store each 8-bit byte with a single parity bit and each (say) 1 kbyte block with one extra 9-bit word that stored the 'vertical parity' - that is to say, the nth bit of that extra word would store the parity of all of the nth bits in the preceding 1 kbyte block. When a single-bit error occurs anywhere in the block, one of the per-byte parity bits will be wrong - and one of the per-block parity bits will also be wrong. Using these two single-bit values yields the row and column at which the error occurred. RAM chips don't work like that - but they could, in principle. The theoretical minimum error-correcting overhead for large blocks of data using this approach is 2*sqrt(number_of_bits)+1.
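The row/column parity scheme described above can be sketched in a few lines of Python. This is a toy illustration of the idea (not how real ECC RAM works, as noted); the function names are mine:

```python
def encode(rows):
    """rows: list of equal-length bit lists. Returns rows plus parity."""
    # Append an even-parity bit to each row.
    coded = [r + [sum(r) % 2] for r in rows]
    # Vertical parity row: parity of each column, parity column included.
    vparity = [sum(col) % 2 for col in zip(*coded)]
    return coded + [vparity]

def correct(block):
    """Locate and flip a single-bit error in place; return the block."""
    bad_rows = [i for i, r in enumerate(block) if sum(r) % 2 != 0]
    bad_cols = [j for j, c in enumerate(zip(*block)) if sum(c) % 2 != 0]
    if bad_rows and bad_cols:     # one odd row and one odd column
        block[bad_rows[0]][bad_cols[0]] ^= 1
    return block

data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
block = encode(data)
block[1][2] ^= 1                  # inject a single-bit error
fixed = correct(block)
assert fixed == encode(data)      # error located and repaired
```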

I'm not quite sure what to do about that though...should we clarify the present decabit entry - or simply delete it?

SteveBaker (talk) 15:23, 1 November 2010 (UTC)

Memory hierarchy not relevant

Please remove this picture as it is not relevant to data sizes. It needs to find a new home :) —Preceding unsigned comment added by 94.9.2.201 (talk) 19:20, 2 November 2010 (UTC)

Average hard disk size

Where is the reference for tebibytes being considered average for a hard disk as of 2012? What country is that? Is that desktops only? Just graduated from a masters degree, new laptop in Nov 2012... 200 GB seemed more standard for laptops when I went to buy. And for students across the UK at least, laptops are anecdotally more commonly owned. 86.149.75.62 (talk) 23:16, 28 June 2013 (UTC)

Fictitious orders of magnitude

In Terry Preatchett's and Stephen Baxter's 'The Long War', the word 'Godzillabytes' is used: "Godzillabytes: Nelson had an irrational dislike of 'petabytes' [...]. Anything that sounded like a kitten's gentle nip just didn't have the moxie to do the job asked of it. 'Godzillabytes', on the other hand, shouted to the world that it was dealing with something very, very big ... and possibly dangerous." I'm guessing this is a fictional order of magnitude, esp. considering Pratchett's style. It reminded me of Wikipedia's article on Indefinite and fictitious numbers, but it doesn't quite belong there, and indeed isn't. Perhaps it would fit in List of humorous units of measurement, but it doesn't feel right either. Should an article or section on Fictitious orders of magnitude be considered?Corntrooper (talk) 01:11, 4 November 2013 (UTC)

That's Terry Pratchett, not Preatchett. --Thnidu (talk) 06:15, 11 January 2014 (UTC)

Add an entry for the capacity of a ZFS zpool

The ZFS page says "256 zebibytes (2⁷⁸ bytes): maximum size of any zpool"

It might be nice to have an entry for 2⁸¹ bits as the maximum size of a ZFS zpool
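The conversion suggested above checks out. A quick Python verification:

```python
# 256 zebibytes expressed in powers of two.
ZiB = 2**70                  # one zebibyte, in bytes
max_zpool_bytes = 256 * ZiB

print(max_zpool_bytes == 2**78)       # True: 2**8 * 2**70
print(max_zpool_bytes * 8 == 2**81)   # True: the same limit in bits
```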

(199.167.120.229 (talk) 18:18, 18 February 2014 (UTC))

Wikipedia, part 2

Imho it would be better to use the uncompressed text size than the compressed one, because it has more meaning. The compressed size actually doesn't have a lot of meaning; it doesn't even say anything about the information entropy, because no known compression algorithm can compress data to the smallest possible size. But the uncompressed size says how much text is actually stored, so it is the most meaningful number imho. --MrBurns (talk) 04:09, 6 November 2014 (UTC)

Capacity of a punched card

This comment for "80 bits" was removed by , and correctly: There are 80 columns on the typical punched card, and each column carries the equivalent of several bits.

If this is to be corrected and restored, the question of "how many bits per column?" arises.

In the earlier of the two most commonly-used character codings on 80-column punched cards, there were 64 possible characters. (See this table.) This suggests six bits per column, for a total of 480 bits.

But this is a subset of the character codes defined under the later EBCDIC. Under EBCDIC there were still 64 commonly-used characters (some of the non-alphanumerics having different punch codes than they did under BCD). The additional characters were not much used (there were no keypunches that could produce them, as far as I know) but they existed (see the famous IBM S/360 "green card"). However they did not define punch codes for all 8-bit byte values possible on System/360.

On the other hand, some IBM computers and card readers supported a "column binary" mode in which every punch combination was possible. There are 12 punch positions per column, so this gives us 960 bits. On 8-bit byte machines this would be stored in 120 successive byte locations.

So, is it 480 bits, 960, or somewhere in between? It seems to me it's at least 480. Or is this too much OR? Personally I think the "480 bits" claim is just arithmetic, not OR. Jeh (talk) 09:59, 28 January 2015 (UTC)
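The candidate capacities discussed above reduce to plain arithmetic; a small Python sketch:

```python
# Capacity of an 80-column punched card under the schemes discussed above.
columns = 80
print(columns * 6)        # 480 bits with a 6-bit character code (BCD)
print(columns * 12)       # 960 bits in column-binary mode (12 punch rows)
print(columns * 12 // 8)  # 120 bytes when stored on an 8-bit-byte machine
```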

How about an entry for 480 bits, identified as "typical capacity of a punched card"? Any reader wanting to see why this is merely a "typical capacity" can follow the link and see the complexity of the various sizes and encodings. -- John of Reading (talk) 13:48, 28 January 2015 (UTC)