Talk:Byte

From Wikipedia, the free encyclopedia
WikiProject Computing / Software / CompSci / Early / Hardware (Rated C-class, Top-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Top-importance on the project's importance scale.
This article is supported by WikiProject Software (marked as High-importance).
This article is supported by WikiProject Computer science (marked as High-importance).
This article is supported by Early computers task force (marked as High-importance).
This article is supported by Computer hardware task force (marked as High-importance).
 

Decibel is not an SI unit

In the section titled "Unit Symbol" there is an entire paragraph explaining that the symbol 'B' is the SI unit of the bel. This is not true.

Although often used with SI prefixes, e.g. decibel (dB), the bel is not an SI unit, nor can it ever be one: it is a dimensionless ratio. Because it is dimensionless, it is often necessary to indicate how it was calculated by adding an appropriate suffix (e.g. dBi, dBm) in order to make meaningful comparisons.

http://physics.nist.gov/cuu/Units/outside.html

JNBBoytjie (talk) 10:38, 13 April 2016 (UTC)

Thank you. Done. Kbrose (talk) 14:17, 13 April 2016 (UTC)

C 'char' type

AFAICS 'unsigned char' is right. Based on the C Standard, (signed) char need only hold values between -127 and 127 inclusive, in other words only 255 distinct values. If you want a guarantee of 256 distinct values you need unsigned char. Ewx (talk) 08:04, 25 August 2016 (UTC)[]

It is not necessary to specify unsigned char. The C standard already mandates that a value stored in a char is guaranteed to be non-negative. A signed char is an integer type and has to be declared, not an unsigned char. Kbrose (talk) 11:50, 25 August 2016 (UTC)[]
I have two problems with this concept:
What happened to -128? (0x80)
What version of the C standard? The original compiler I was using in the 80s (Turbo-C 1.0, 1.5 and 2.0) had by default 'char' being signed. But you could configure the compiler to make it unsigned by default. That was obviously before the current C standard... Dhrm77 (talk) 13:51, 25 August 2016 (UTC)[]
Formally speaking there is only one C standard, ISO/IEC 9899:2011; the rest have been withdrawn, as you can see on the ISO website. That doesn't stop people referring to older revisions (or drafts, given the excessive cost of the current version), or implementing them, though. In this case however, the question is irrelevant: all versions of the C standard permit char to be either a signed or an unsigned type. As for 'what happened to -128', the point is to permit a variety of representations of signed types; there's more to the world than x86 and two's complement. Ewx (talk) 08:05, 26 August 2016 (UTC)[]
Which C standard? Is the specification for char the same in C89, C90, C99 and C11? If not, the article should reflect that. Shmuel (Seymour J.) Metz Username:Chatul (talk) 18:45, 25 August 2016 (UTC)[]
No, char is not guaranteed to be unsigned. C99 and n1570 are completely explicit about this (6.2.5#15). Ewx (talk) 07:57, 26 August 2016 (UTC)[]

For the record, this is the text of 6.2.5#3:

An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

Kbrose (talk) 11:57, 26 August 2016 (UTC)[]

You're reading the wrong bit. That just tells you that certain characters have non-negative representation in a char. It does not tell you that the type itself is unsigned. Once again, see 6.2.5#15 for text that is actually relevant here. Ewx (talk) 18:50, 26 August 2016 (UTC)[]
I have to agree with Ewx. I think it just says that if you need to store non-negative values, you can; if you want to store something else, it depends on the implementation, which is a way of saying that they are not defining whether a char is signed or unsigned. It even opens the door to implementing a range of -64 to +191 if you wanted to, instead of the classic -128 to +127 or 0 to 255. Dhrm77 (talk) 22:02, 26 August 2016 (UTC)
-64 to 191 would be forbidden (SCHAR_MIN must be at most -127) but -127 to 127 is permitted (and realistic for ones' complement machines); so is -2147483648 to 2147483647 or 0 to 4294967295 (and realistic for word-addressed machines). Ewx (talk) 07:25, 27 August 2016 (UTC)
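
The limits discussed in this thread can be checked on any particular implementation with the macros from <limits.h>. The following is only a minimal sketch of such a check, assuming a C99/C11 compiler; the function names are hypothetical and were chosen for illustration, they do not come from the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when plain char is a signed type on this
 * implementation. The standard leaves the choice to the implementation. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: true when unsigned char can hold at least 256
 * distinct values (0..255), which follows from CHAR_BIT >= 8. */
bool unsigned_char_has_256_values(void)
{
    return UCHAR_MAX >= 255;
}

/* Hypothetical helper: true when signed char covers at least -127..127.
 * C99/C11 do not guarantee -128, which leaves room for ones' complement
 * and sign-magnitude representations. */
bool signed_char_covers_minimum_range(void)
{
    return SCHAR_MIN <= -127 && SCHAR_MAX >= 127;
}
```

On a typical x86 compiler the first function returns true, but that is a property of the implementation and not something portable code can rely on.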

Merge Octet (computing) into Byte

Octet (computing) should be merged into Byte as Octlet (computing) is another name for Byte and Wikipedia does not have two articles for two names of the same thing (instead both are mentioned in the WP:LEAD and the article's title is at the WP:COMMONNAME). -KAP03(Talk • Contributions • Email) 22:47, 26 March 2017 (UTC)[]

  • Oppose. I see your point, but I'm afraid that merging may blur the differences between the terms even more (than how they are confused in the current articles). Byte, Octet, Octlet (and the not mentioned Octad) are related but not multiple names for the same thing in general. They need to be distinguished carefully and we should rather improve/sharpen the articles emphasizing their differences.
While byte is today understood as referring to a group of 8 bits most often, without context it does not define a specific count of bits. Historically, a byte was defined as any group of bits from 1 to 6 (with 5 and 6 bit being the most commonly used forms even at that time). Later it was defined as the group of bits necessary to hold a character, that is 5 to 8 bits. With the advent of micro-computers in the late 1970s / early 1980s, this shifted towards meaning 8 bits by default. Therefore, byte is a platform-specific term.
Octet, however, was specifically defined to avoid the ambiguity of the term byte, and always means 8 contiguous bits, regardless of context and platform. That's why octet (rather than byte) is the term used in formal definitions of, e.g., network protocols, in the telecommunication industry, etc.
Octad is a term similar to octet; however, it has fallen into disuse in recent decades and is no longer in common use. Like octet, it specifically means 8 bits, but it looks at them from the angle of how many bits are necessary to define 129 to 256 states in coding (at least this is what I draw from the usage of similar terms like tetrads and pseudo-tetrads). From that angle, it does not appear to matter whether the 8 bits holding the state are grouped together physically.
Octlet (per IEEE 1754) means 8 octets or 64 bits, so it is clearly different from octets.
--Matthiaspaul (talk) 12:31, 27 March 2017 (UTC)[]
  • Oppose - No, they are not two names for the same thing, but names for different things. As a side note, the VFL instructions of Stretch could specify any byte size from 1 to 8 and 12 was a common byte size for CDC users. Shmuel (Seymour J.) Metz Username:Chatul (talk) 19:22, 28 March 2017 (UTC)
  • Oppose - There has been significant discussion of the distinction on the respective article talk pages. Bytes have not always been 8 bits. An octet is defined as 8 bits. Bytes are used in processors. Octets are used in communications. It is probably possible to cover both in a single article but that would not be a trivial merge and I'm not convinced that what we'd end up with would be an improvement over current coverage. If someone wants to create a sandbox version of a merged Byte article, I'd be happy to assess in more detail. ~Kvng (talk) 18:34, 9 April 2017 (UTC)
  • Support - I think the articles should be merged. It's true that byte has meant other things in the past, but the modern definition (according to the International System of Quantities) is 8 bits. Dondervogel 2 (talk) 18:50, 9 April 2017 (UTC)
  • Oppose - The terms don't always represent the same thing and are used for different purposes. Jko831 (talk) 19:37, 21 September 2017 (UTC)

Architectural support for byte sizes other than 8

Off the top of my head, these machines come to mind as supporting byte sizes other than 8:

  • CDC 3600 and 3800
    1 to 48 bits
  • DEC 36-bit machines
    1 to 36 bits
  • GE, Honeywell and Bull 36-bit machines
    6 or 9 bits
  • RCA 601
    3, 4, 6, 8 or 24 bits
  • UNIVAC and Unisys 36-bit machines
    6, 9, 12 or 18 bits

Shmuel (Seymour J.) Metz Username:Chatul (talk) 18:41, 29 March 2017 (UTC)[]

Shmuel, do you actually mean bytes in this context? I recall the larger sizes to be called words. If you can, please provide some refs, it would be great if we could track this down to historic sources in order to improve the article.
--Matthiaspaul (talk) 00:37, 30 March 2017 (UTC)[]
Yes, I actually mean byte, and both CDC and DEC used the word byte as part of the instruction names, e.g., Deposit Byte. Two easy citations from bitsavers are
  • 3600 Computer System Reference Manual (PDF), CDC, October 1966, 60021300
  • Book1 Programming with the PDP-10 Instruction Set (PDF), PDP-10 System Reference Manual, CDC, August 1969, 60021300
Shmuel (Seymour J.) Metz Username:Chatul (talk) 20:40, 3 April 2017 (UTC)[]
Another more recent example would be the Nintendo 64 with 9-bit bytes. 2003:71:CF10:FD00:A843:F00A:C1FE:7F1F (talk) 20:40, 23 September 2018 (UTC)[]
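
To make the idea of a variable-size byte concrete, here is a rough sketch in C of loading and depositing a byte of arbitrary width within a 36-bit word. It is only loosely modelled on the position-and-size idea behind instructions such as Deposit Byte and is not a faithful emulation of any of the machines listed above; the names load_byte and deposit_byte, the parameter layout, and the use of a 64-bit integer to hold the 36-bit word are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 36u
#define WORD_MASK ((UINT64_C(1) << WORD_BITS) - 1)

/* Load a byte of `size` bits that has `pos` bits to its right in the word.
 * Hypothetical helper; size may be anything from 1 to 36. */
uint64_t load_byte(uint64_t word, unsigned pos, unsigned size)
{
    assert(size >= 1 && size <= WORD_BITS);
    assert(pos + size <= WORD_BITS);
    uint64_t mask = (UINT64_C(1) << size) - 1;  /* size <= 36, so the shift is defined */
    return ((word & WORD_MASK) >> pos) & mask;
}

/* Return a copy of `word` with the selected byte replaced by `value`.
 * Bits of `value` that do not fit in `size` bits are discarded. */
uint64_t deposit_byte(uint64_t word, unsigned pos, unsigned size, uint64_t value)
{
    assert(size >= 1 && size <= WORD_BITS);
    assert(pos + size <= WORD_BITS);
    uint64_t mask = ((UINT64_C(1) << size) - 1) << pos;
    return ((word & WORD_MASK) & ~mask) | ((value << pos) & mask);
}
```

With these helpers the same 36-bit word can be treated as six 6-bit bytes, four 9-bit bytes, or a single 36-bit byte, simply by varying pos and size.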

Status of error checking bits

The IBM 7030, for which the term byte was coined, did not include error checking bits as part of a byte. Nor did the DEC PDP-6, the CDC 3600, or any of the other computers with the ability to access bytes of various size. The System/360 Principles of Operation contains the text "Within certain units of the system, a bit-correction capability is provided by either appending additional check bits to a group of bytes or by converting the check bits of a group of bytes into an arrangement which provides for error checking and correction (ECC). The group of bytes associated with a single ECC code is called an ECC block. The number of bytes in an ECC block, and the manner in which the conversion or appending is accomplished depend on the type of unit involved and may vary among models." Accordingly, I call for the reinstatement of the text "The byte size designates only the data coding and excludes any parity or other error checking bits." Shmuel (Seymour J.) Metz Username:Chatul (talk) 21:30, 21 June 2017 (UTC)[]

I removed that statement because the lead is supposed to summarize the important points in the article body. The statement I removed (a) is not covered in the body and (b) may not be one of the more important points about the topic. I am not at all opposed to including this information in the article body and once that is stable, we can consider it for inclusion in the lead. ~Kvng (talk) 14:51, 24 June 2017 (UTC)

"Octad"

Is the origin unclear?

The article currently states:

"The exact origin of the term is unclear, but it can be found in British, Dutch, and German sources of the 1960s and 1970s, and throughout the documentation of Philips mainframe computers."

Surely this is just the eighth member of the sequence which starts "monad", "dyad", "triad", i.e. a group of eight things (looking toward Greek). "Octet" and "octad" appear similar because the Latin and Greek cardinal number 8 both have the same form (octō, ὀκτώ). Compare e.g. "quintet" vs "pentad" for a group of 5.

Of course the correct term would be an ogdoad (from the genitive of the ordinal) but not everyone who wants to use precise, technical language also knows Greek.

If the question is about who first used the term in its computing sense, that may be unanswerable because it probably slipped in from an earlier technical or mathematical sense. –moogsi(blah) 23:46, 23 October 2018 (UTC)[]

… representing a binary number

oRLY?
Anybody with programming experience, even amateur, knows that bytes more frequently do not represent numbers (serving as opcodes, parts of bitmaps or compressed data…) than do represent numbers explicitly. Even for such a complicated number format as IEEE 754 it wouldn’t be helpful to think of every isolated byte as a sensible numerical value. Objections against complete removal? If any, then change to “are capable of representing a binary number” maybe? Incnis Mrsi (talk) 10:06, 27 July 2019 (UTC)

I'm fine with that. You can change it to “are capable of representing a binary number”. Vmelkon (talk) 02:10, 12 February 2021 (UTC)
I completely agree for the purposes of this article, the following is just being pedantic; Opcodes are a small part of the actual instruction, most of which is just a bunch of numbers that can be directly read on most architectures and for most instructions. For architectures that either limit immediates to 8 bits or actually are 8 bits, the bytes in program code are very likely to have direct meaning as numbers. Sure, you might have to read them as nibbles and mentally stick an "r" in front of them if the thought of assigning numbers to other numbers that represent offsets in sram adds too much confusion on top of the fact that you're trying to read raw instructions, or if you're on ARM calculate the sign bit from a bitwise operation on 3 bits spread around the T4 encoding of the branch instruction because the people designing the instruction set were high on horse tranquilizers that day, but they're numbers with direct meaning as such. The opcode itself is just a number too, although you kinda have to squint in some cases, but if you tell an x86 to 233 it's going to 233, damnit. :D --A Shortfall Of Gravitas (talk) 03:47, 5 August 2021 (UTC)[]

Unit Multiples

Is this really correct? "100 gigabytes is specified when the disk contains 100 billion bytes (93 gibibytes) of storage space." The whole section has no attribution, so I can't check it. But I had always gathered that the difference was available storage space on formatted vs un-formatted disk. — Preceding unsigned comment added by Tsuchan (talkcontribs) 12:42, 27 May 2020 (UTC)[]

This should not need attribution: it is not an opinion, but a straightforward application of prefixes. I have, however, clarified the statement in a more verbose form. Kbrose (talk) 15:08, 27 May 2020 (UTC)
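
For anyone who wants to check the arithmetic behind the "100 gigabytes (93 gibibytes)" statement, it is just 100 × 10^9 / 2^30 ≈ 93.13. A small sketch in C follows; the macro and function names are hypothetical and serve only to illustrate the calculation.

```c
#include <stdint.h>

#define BYTES_PER_GB  UINT64_C(1000000000)  /* 10^9, decimal (SI) prefix */
#define BYTES_PER_GIB UINT64_C(1073741824)  /* 2^30, binary (IEC) prefix */

/* Convert a capacity in decimal gigabytes to a byte count. */
uint64_t gb_to_bytes(uint64_t gigabytes)
{
    return gigabytes * BYTES_PER_GB;
}

/* Express a byte count in gibibytes as a floating-point value. */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (double)BYTES_PER_GIB;
}
```

Calling bytes_to_gib(gb_to_bytes(100)) gives a value just above 93.13, so the difference comes from the prefix definitions alone, independent of any formatting overhead.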

Proposed merger

Moved from Talk:Gigabyte
 – jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:03, 14 October 2020 (UTC)[]

Is it absolutely necessary for kilobyte, kibibyte, megabyte, mibibyte, gigabyte, gibibyte, etc. to all have their own individual wikipedia pages that all say the exact same thing? Can't we put the information on one page, and have all those terms redirect to that one page? — Preceding unsigned comment added by 73.70.13.107 (talk) 10:33, 14 October 2020 (UTC)[]

Wikipedia has no problem with duplication and redundancy not least because it is not paper. As long as each article is properly sourced and notable, there's no reason to replace them all with one "mega sized" article. QuiteUnusual (talk) 11:52, 14 October 2020 (UTC)[]
@QuiteUnusual: The merge suggestion has merit precisely because the articles don't sum up to a massive article. They're 95% (99%?) the same article—just at different levels of development because some get more attention than others.
Clearly the hypothetical merge destination would not be Gigabyte, so I'm moving the discussion here to the talk page of Byte. I see that the IP editor making the original proposal tried to put this plan into motion already and selected Binary prefix for the redirect, but I think that may be confusing the thing with the name. Millilitre redirects to Litre § SI prefixes applied to the litre, not Metric prefix, and that strikes me as appropriate.
Either way, this is a change worth discussing first, and care will need to be taken to preserve the best citations, but yes, I think consolidation is a good way to deal with the existing mess of nearly identical articles. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:03, 14 October 2020 (UTC)[]
I just made an effort to gather some of the best content from the relevant articles and centralized it here. A lot of what was available is dated and esoteric (“In 2013, one expert estimated that the "amount of data generated worldwide" would reach 4 zettabytes by the end of the year”) and of questionable value.
The biggest concern now might be how redirects and disambiguation are handled. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 18:20, 10 December 2020 (UTC)[]
I made an earnest attempt to scrape as many good bits from the 16 redirected articles as possible. Some genuinely interesting facts and citations existed in only one or two of the articles, so I’m hopeful that the consolidation will make those nuggets more findable to readers. There was a lot of cruft to sift through, and I had to make judgment calls, so if anyone wants to go through the trimmings for anything I overlooked, that would be welcome. Cheers —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 02:34, 11 December 2020 (UTC)
There has been no consensus for these actions, so I have reversed them. The articles have been more or less stable for years, with only occasional curiosity about a single article. There is no mess, and separate articles introduce no problems for WP policies or maintenance. It is good to have places for unit-specific content and that shows in some of the articles. kbrose (talk) 14:59, 11 December 2020 (UTC)
@Kbrose: I’m not arguing that the existence of the 16 articles I redirected violates policy, but policy violations are not the only reason for advantageous merging. I’m citing two reasons:
  1. Fragmentation dilutes editor attention. Byte is still rated C-class despite having existed since 2001. The other 16 articles are Start-class dumping grounds for poorly curated trivia. None of these 16 articles is improving at a rate we should be proud of. Take Kilobyte: it’s been something like 22 months since the last substantive improvement (it was yours, and I thank you!); most edits are just fighting entropy. When I tagged a dubious statement there in June, I ended up waiting through three months of silence before removing the statement. This discussion has existed for two months and received no feedback. The stability you cited is not a compelling argument for keeping bad content.
  2. Purposeful redundancy is okay; gratuitous redundancy is not. I’ve noted above that Litre and its many prefixed variants are successfully consolidated in one article. The same is true for Newton, Decibel, and most units. Is there some good reason that this model isn’t appropriate for Byte? I will note that Gram exists alongside Kilogram and Metric tonne but not alongside an exhaustive set of articles, so there is precedent for a middle ground. I don’t see any justification for Kilobyte’s existence within the current version of the article, but it’s not impossible for me to imagine. I doubt Zebibyte will ever need a standalone article.
I don't think there was a truly compelling reason to revert my edits en masse, and I think we should re-implement the redirects, but I’m open to considering the 16 affected articles on a case-by-case basis. Making Byte a better article, though, should be the highest priority. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 16:43, 11 December 2020 (UTC)[]
I don't have any strong objections to a merge but clearly some more discussion is needed. A single article is fairly standard in other cases, although in this case the prefixed units, especially kB and MB, are the ones in more common use which could muddy the issue of an article title. Perhaps formally tagging the articles for merger would generate a little more interest in this discussion. Some musings on a talk page for a few weeks, with no explicit support and some scepticism, is hardly good support for a fairly dramatic action. A formal merger proposal without clear opposition, on the other hand, could be interpreted as consensus. Lithopsian (talk) 15:12, 11 December 2020 (UTC)
Just a pre-vote, but I would definitely support merging anything about say, petabyte. Units not in widespread usage, ones that most people wouldn't recognise, are hardly notable in themselves. Lithopsian (talk) 15:15, 11 December 2020 (UTC)[]
The trend is usually to divide content into more specific topics. Even the small articles in this series have value, because they quickly point the user to a specific definition without having to sort through a lot of information. This has become more important as automated services and devices, such as the Google assistants and apps, increasingly pull up specific definitions from WP for key words and topics and read them to the user. This is also a good reason not to bunch parenthesized comments right after the key word or article title with lots of pronunciations and, in this case, unit symbols. Short, clear, specific sentences have more impact. kbrose (talk) 15:36, 11 December 2020 (UTC)
There are lots of cases where [[foo]] is a redirect to [[bar#baz]]. — Preceding unsigned comment added by Chatul (talkcontribs)
I’m also a big believer in making the first two sentences count for all they can because they’re what Google’s Knowledge Graph harvests and regurgitates. I wouldn’t go so far, though, as to agree that Wikipedia needs to adopt dictionary-like fragmentation just to accommodate Google. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 16:50, 11 December 2020 (UTC)
The Litre, Newton and Decibel articles are convincing examples and, for consistency, I am leaning towards a merger. The pages were so similar that creating the 16 pages looked like a mere programming exercise in automated article writing. And fragmentation truly is a waste of editors' time: I recently made 14 similar edits manually, only to see them all reverted at once. QuiteUnusual's reply gave no reason to keep separate articles: stating that there is no reason to merge is not a reason. Neither did kbrose, who stated that it was the status quo and that dividing was the trend. So, I am for a merger. I would also add that giving the user a complete page is like teaching them to fish, while a single article for each prefixed unit is like giving them a fish. Teaching is encyclopedic. --Robertiki (talk) 03:37, 12 December 2020 (UTC)
In addition to Metre, we have separate articles for Kilometre, Millimetre, Micrometre, Nanometre, Picometre and Femtometre. While I can see a case for keeping Kilometre and (maybe) Millimetre, all others seem frivolous to me because so much of the information is duplicated (making them difficult to maintain), and we could better serve our readers by redirecting those to Metre. The same reasoning applies to multiples of byte. I suggest the following: Let's make a single Byte article that addresses the concerns raised against merging, and then review whether we still need Kilobyte, Megabyte, etc? Dondervogel 2 (talk) 09:46, 12 December 2020 (UTC)[]
@Dondervogel 2: Do you have specific suggestions for changes to Byte you would want to see implemented before a merger? I think all concerns raised thus far are either about procedure or the relative benefits of keeping the diaspora of articles so as to better mesh with Google (and maybe Wikidata?). As I’ve said, I’m interested in making Byte better, so I’m interested in ideas for improvements. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:41, 12 December 2020 (UTC)[]
Moved to #Addressing ambiguous definitions of megabyte and gigabyte
My two cents: we should keep a particular multiple split when it has enough examples, history, etc. to justify a separate article. Example: "The Apollo Guidance Computer had kilobytes of RAM". If the multiple has only a couple of those examples, then that's a case for a merge; if it has 4 or 5 (or more), plus a history with more than a paragraph, etc., then that's a case for a split, in my opinion. Imagine having KiB, MiB, TiB each with its own examples, history, etc. all in the same article; it'd be a mess. And I'd guess that there's a lot of encyclopedic history to be told in many (if not all) of the multiples (at least up to Terabyte), considering the rapid evolution and the impact of digital systems in human history.
For maintainability issues, we can try the template {{excerpt}} in case it's not being used already (I didn't check). Feelthhis (talk) 16:17, 12 December 2020 (UTC)[]
There is another way of reaching the intended purpose of this discussion. I would support developing one article into a full treatment of all units and their histories and relationships, but at the same time keeping minimal versions of each separately, consisting only of a concise definition (to be used in Google fact finds) and a link to the general description of the whole set. They would not be redirects, but bare definitions with a reference for background. −Woodstone (talk) 16:24, 12 December 2020 (UTC)
A lone definition with…what? A warning to editors not to add more? Is there any precedent for this hybrid organization? —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 17:01, 12 December 2020 (UTC)[]

A week has passed since the merger and subsequent reversion, and several days have passed since the last contribution to this conversation. My takeaways are these:

  1. For all eight binary units and the five decimal units larger than Gigabyte, there seems to be clear support for redirecting.
    • The primary argument for retention of all 17 articles is to establish a definition for each word. This justification is at odds with point 1 at WP:NOT#DICT, and there seems to be a general consensus among conversation participants that this is not the right approach.
    • Dondervogel 2’s primary concern was that the expanded section of Byte be up to the task of clearly explaining the multiple systems. Dondervogel 2 and I tag-teamed to improve that section significantly, and I think it’s currently in much stronger shape than the corresponding section in any of the 17 articles has ever been.
  2. For Kilobyte, Megabyte, and Gigabyte, there seems to be a recognition that the articles are problematic but not a consensus on a solution.
    • The strongest argument for not merging everything to Byte, raised by Lithopsian, is that in everyday life, kB and KB are more common than B.
      • This could justify retaining two articles at Byte and at Kilobyte OR retaining a single consolidated article at Kilobyte instead of at Byte. I take the former option more seriously because Byte remains the best article title. Its definition is settled, and the word itself is a constituent part of all the unit names (not just ‘Kilobyte’ and ‘Megabyte’ but ‘Kibibyte’ and ‘Mebibyte’ too). The average reader is going to be able to best understand the relationship between units when the article’s starting point is the base unit.
    • Feelthhis noted that {{excerpt}} might be able to help keep quality up in satellite articles.

Unless it is felt that a formal proposal is genuinely needed, I intend to re-implement the 13 supported redirects in the next day or so, to keep improving Byte, and to give further consideration to what form Kilobyte, Megabyte, and Gigabyte might take to best serve Wikipedia’s readers. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 13:50, 17 December 2020 (UTC)

While not objecting to jameslucas's proposal, I wonder whether we need two more articles Decimal multiples of byte and Binary multiples of byte, that would each describe a subset of multiples as indicated by their titles. I also wonder whether a similar exercise is needed for Kilobit, Megabit, Kibibit etc. Dondervogel 2 (talk) 14:10, 17 December 2020 (UTC)[]
I don't think creating articles with titles that are unlikely search terms helps Wikipedia users as much as it might satisfy us. It also makes it less obvious where kilobyte, megabyte, etc. should redirect. Lithopsian (talk) 15:26, 17 December 2020 (UTC)[]
Again, not an objection so far as this proposal goes. I think we may be underestimating the purpose and value of redirects here. Whether the content for, say kilobyte, is in byte or a separate article called kB or whatever may cause sleepless nights for us editors but is of little consequence to the average reader. They can search, link, or otherwise open something about "kilobyte" and will get information about kilobyte, albeit in an article titled "byte". Bolding of common synonyms in the lead, or giving them sections or anchors, avoids confusion and everyone is happy. Separate articles are only really needed when there is sufficient distinct text to make that article useful and where it would otherwise overload or unbalance the parent article, or where multiple semi-distinct topics just make a single article unwieldy. I'm not convinced any of these cases apply here. The article is not overly huge and the child articles contain little distinct information, largely repeated across each one, and easily mergeable. Perhaps they could be expanded, but the distinct information in each one at this point would barely be worth a section in a parent article. Perhaps the question to ask is "if they were all in a single article today, would we want to do a WP:SPLIT?" Nothing to stop them being split again in the future if it becomes useful. Lithopsian (talk) 15:26, 17 December 2020 (UTC)[]
JamesLucas, in your first attempt at merging I notice that, of all the content from Exabyte#Usage examples and size comparisons, you brought only the phrase "global monthly Internet traffic in 2004" to your merge. Is this how things will be handled? If that's going to be the tone of this process, then I suggest starting a deletion discussion prior to the removal, out of respect for the editors who put their time and effort into writing all the valuable content that is to be removed. Please take this into consideration before starting the process. Feelthhis (talk) 00:05, 18 December 2020 (UTC)
@Feelthhis: It’s undoubtedly helpful to have a few real-world examples to illustrate for the reader the relative sizes of these units, which is why I created Byte § Practical examples and harvested the best examples I could find. Dondervogel 2 has already improved it, and I’ll keep trying to expand it. (And I’d be open to adding more tiers—10 kB, 100 kB, 10 MB, etc.—if you think it’d help readers.)
With that said, I think it’s essential to observe that the vast majority of examples and comparisons present in the 16 articles are not about the units they supposedly illustrate. The average reader is not going to be familiar with “DARPA's ARGUS-IS surveillance system”, so the fact that it could—in 2014—“stream 1 exabyte of high-definition video per day” cannot possibly help most readers understand the size of an exabyte (and those it would help probably don’t need that help). Many of the examples, including the section Exabyte § Library of Congress, are arguably worse because they are dealing with amounts of data an order of magnitude or more different from the example they are supposedly illustrating.
It’s a bit surprising to me that so much of this trivia was allowed to accumulate, and now that it’s thousands of bytes deep, I appreciate that its removal seems dramatic, but I don’t think most editors who spent some time reviewing it fact-by-fact with an eye towards its purpose within the article would deem the content appropriate or its removal controversial. I made a serious effort to gather the best examples, invited others to double-check my work, and in the course of engaging in this discussion have spent additional time with the material on the chopping block. I don’t think it’s defensible. I’m willing to dot the is on this process if it’s judged necessary, but I hope not to. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 01:29, 18 December 2020 (UTC)[]
@JamesLucas: I notice that for Gigabyte you used the example "about half an hour of video". But... just "video"? In a topic about bits and bytes? With all due respect, all examples from Gigabyte are objectively better and were all going to be deleted (in fact, if it weren't for Kbrose rescuing them, they would all have been deleted). Do you mind if I invite users Kbrose and QuiteUnusual to this discussion? I hope not. The way I see it now, too much good content will be lost, and from now on my position is against the mass merging/deletion.
The non-encyclopedic material (for instance outdated trivia) is best handled case by case instead of through a mass deletion across multiple articles, in my opinion. It's like cancer: you want to remove the bad (cancerous) stuff and keep all the good (healthy) stuff. This mass merging/deletion will remove the bad stuff and all the good stuff. Feelthhis (talk) 03:53, 18 December 2020 (UTC)
The video example for gigabyte was added by me (replacing a self-reference to Wikipedia). The source says "2 GB per hour of video (varies greatly)" in the context of a 4.7 GB DVD. I agree it's weak and can be removed as far as I'm concerned. Dondervogel 2 (talk) 08:13, 18 December 2020 (UTC)[]
No one should be excluded from this conversation. I’ve pinged both Kbrose and QuiteUnusual at least once each in the course of this, and I hope I’ve been clear that further curation of examples is being explicitly requested. The cancer analogy, though, I find inapt. If we agree that these “bodies” are now valued primarily as organ donors, it’s probably better for organ-hunting purposes to move them to the freezer intact than to carve them up and in doing so create obfuscating layers of history states. Unlike organic organs, these word organs are still viable after being declared “dead” for a while. (Granted, I’d probably weigh the pros and cons differently if I thought there were many good organs left to be found.) —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 12:39, 18 December 2020 (UTC)[]

I have redirected the articles for binary units, which contain none of the content being discussed yesterday. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:15, 19 December 2020 (UTC)[]

I have redirected the articles for decimal units greater than Gigabyte after giving them one more comb through for not-yet-harvested informative elements and finding none. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 02:00, 21 December 2020 (UTC)[]

Addressing ambiguous definitions of megabyte and gigabyte

Moved from #Proposed Merger
I think some of the information from individual articles needs to be copied across before those individual articles can be replaced with re-directs. I imagine there are multiple examples (I've not carried out an audit or check of any kind) but one that springs to mind is the fact that there exist (or at least have existed) 3 different definitions of "megabyte". Another concern is a false impression in the text of symmetry between decimal and binary definitions of kilo, mega, ... yotta. (Only the table hints at the fact that only the first 3 have binary definitions, whereas all of them are decimal). Dondervogel 2 (talk) 15:33, 12 December 2020 (UTC)[]
The 3½-inch floppy’s “1.44 MB” seems like a marketing simplification rather than a third definition; I think the way it is currently presented in Megabyte, as a definition of equal relevance, is misleading and not to be emulated. It could be retained as a footnote, but that little bit of history more properly belongs at Floppy disk (and, yes, it’s there). —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 17:01, 12 December 2020 (UTC)
@JamesLucas: If the information about the history of "megabyte" is hidden in Floppy disk the reader would need to be told (in Byte) where to find it. But I actually believe the information should not be hidden there. Much better to have a section about the meaning of the word "megabyte" either in Megabyte itself or (if re-directed here) in Byte. The evolving meaning of related terms (kilobyte, gigabyte, terabyte ...) is also not apparent from a re-direct to Byte. Dondervogel 2 (talk) 09:35, 13 December 2020 (UTC)[]
@Dondervogel 2: Maybe I’m missing part of the story? My understanding is that the ‘1.44 MB’ label was a one-off marketing anomaly rather than something that precipitated an evolution of meaning—a funny footnote remembered by few of us and having no relevance to the current definitions of ‘byte’, ‘kilobyte’, or ‘megabyte’. Correct me if I’m wrong! —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:45, 13 December 2020 (UTC)[]
@JamesLucas: I don't think it can be dismissed as a one-off anomaly. See p17 of Hale & Stanney (2014).[1] I see it more as an ongoing symptom of a deeper malaise, namely the ambiguity of KB, MB, GB ... This ambiguity continues (and will continue) for as long as there are two different interpretations of each prefix, and the ambiguity increases with increasing order of the anomaly because while MB can be interpreted 3 different ways, for GB (either 1000 MB or 1024 MB) there are 4 different interpretations, and so on. Dondervogel 2 (talk) 16:54, 13 December 2020 (UTC)
@Dondervogel 2: If the muddling is commonplace, that would be worth mentioning more prominently than I suggested. In the Google Books preview of the Handbook, page 17 is a list of references, so I’m not sure I’m seeing what you’re suggesting I see. If it’s another example besides the 3½-inch floppy, that’d be great. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 22:11, 13 December 2020 (UTC)
I'm not arguing that particular use is widespread, only that the ambiguity is. The statement in question (paraphrasing a little) reads "[MB] may mean 1000x1000 bytes or it may mean 1024x1024 bytes, or even 1000x1024 bytes", and seems to be on p24 of the version your link points to (search for "1024" and you'll find it). Dondervogel 2 (talk) 22:43, 13 December 2020 (UTC)[]
Here's another example. Rata (2009)[2] defines kilobyte as either 1000 bytes or 1024 bytes, megabyte as either one million bytes or 1024 kilobytes (with 3 possible interpretations) and gigabyte as either one billion bytes or 1024 megabytes (4 interpretations). It's the same problem, getting worse at each step increment in the exponent. Dondervogel 2 (talk) 23:09, 13 December 2020 (UTC)[]
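For anyone who wants to check the counting above, here is a minimal Python sketch. It assumes each prefix step is read independently as either 1000 or 1024, which is how the 3 readings of MB and 4 readings of GB arise; the function name and the floppy constant are illustrative only and do not come from either cited source.

```python
def interpretations(order):
    """Distinct byte counts for a prefix of the given order
    (1 = kilo, 2 = mega, 3 = giga, ...), assuming each of the
    `order` steps may be read as either 1000 or 1024."""
    return [1000 ** (order - k) * 1024 ** k for k in range(order + 1)]


for name, order in (("kilobyte", 1), ("megabyte", 2), ("gigabyte", 3)):
    values = interpretations(order)
    print(f"{name}: {len(values)} readings -> {values}")

# The 3.5-inch "1.44 MB" floppy holds 1440 * 1024 bytes, which is a round
# number only under the mixed 1000 * 1024 reading of "megabyte".
FLOPPY_BYTES = 1440 * 1024
for megabyte in interpretations(2):
    print(f"{FLOPPY_BYTES} bytes / {megabyte} = {FLOPPY_BYTES / megabyte:.3f} MB")
```

Under that assumption a prefix of order n has n + 1 readings, so the ambiguity grows by one reading with each step up in prefix.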

References

  1. ^ Hale, K. S., & Stanney, K. M. (Eds.). (2014). Handbook of virtual environments: Design, implementation, and applications. CRC Press.
  2. ^ Raţă, G. (Ed.). (2009). Language education today: between theory and practice. Cambridge Scholars Publishing.

New material added on 28 Jan 2020

@KiridaSenpai: I just reverted (for the second time) the addition of a large amount of unreferenced material. Addition of new material is welcome if it is backed up by reliable sources. If you wish to reinstate this material, please read WP:BRD and then gain consensus for the change by discussing it here. Dondervogel 2 (talk) 10:16, 29 January 2021 (UTC)[]

Call for practical examples

I’m hoping that someone with more technical knowledge than I possess could help source and/or calculate better practical examples for the table. I’ve tried to find published, understandable-to-the-layperson examples, but they are surprisingly hard to come by. Trying to make an example for ‘terabyte’ a few weeks ago, I looked up file sizes of H.264-encoded 1080p video, and a number of independent sources said 30 hours should be very close to 1 TB. Then I went searching for a chunk of video 30 hours long (aiming for something that I had heard of despite having never seen), and I found Avatar: The Last Airbender. I see that Canucka has today calculated a substantially different file size for the same video data. I’m very glad that my amateur calculations are being scrutinized, and there are enough factors (encoding options, aspect ratio, compression of animation vs compression of live action, etc.) that I can believe that my best attempts may have been off by 200+%.

FWIW, I really like the introduction of the footnote, which allows the inclusion of a check-our-math explanation without burying the everybody-gets-it conclusion. This also helps us steer clear of WP:OR since the only original work is the basic math, which is fine as long as the inputs are verifiable. The newly revised table entry is too jargon-heavy, so it’d be good if the next example is something where we don’t feel compelled to mention the aspect ratio for instance. Cheers —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 20:36, 24 February 2021 (UTC)[]

Video is a particularly bad example for this, as most of it is variable-rate based on content, and encoders differ heavily in quality; there are also multiple ways of encoding the same data. As a quick recent example from my own re-encodings of terrible movies from UHD Bluray for streaming over my NAS and adding in Rifftrax commentary: with 10bpc H.265 one can encode with a "constant rate factor" mode that attempts to maintain a steady quality level based on an arbitrary number. The size of the output video varies massively based on content. The first of the spastic Michael Bay Transformers movies is 143 minutes and weighed in at ~15GB, re-encoded from ~70GB on the original disk. The last Avengers movie, around 50GB from disk and 181 minutes long, consists of a surprising number of fairly still scenes and talking heads (so very little motion), and came in at 2.6GB at one step higher quality. I ended up re-encoding again at 4 steps higher quality because I couldn't quite believe the compression would have gone that well, and it was closer to 5GB... once I watched Marvel characters sitting around talking for most of 3 hours, the compression made sense. It's also completely possible to re-encode movies with naive constant bit rate compression so that they're both larger than the originals and lose quality in some scenes (those which required a burst of higher bit rate than the encoder was set to for that particular scene). You won't really find a good "average" for video like this because there isn't any. 1TB for 30 hours sounds like it's roughly based on the encoding settings used for 1080p Bluray (including audio), which was often encoded at gigantic sizes purely to eat up space and make it difficult for people to deal with on home computers (or it was back when Bluray was new, anyway), and may contain huge lossless audio files in multiple languages that people are accidentally factoring into their numbers for the actual video size.
You'll not find many published examples because of this... and I'm talking about two movies at around the same resolution (aspect ratio differs) and the same-ish type of source. (Disney does cripple their UHD Bluray releases by not including DolbyVision, which their movies are mastered in, but this has been accounted for in the sizes above by subtracting both the enhancement layer size and DV metadata sizes from the Transformers movie original and re-encode.) Maybe a good place to look would be Sony's camera manuals? I don't shoot video, but as I recall all the Sony a7 series manuals list approximate video time / SD card size / selected video resolution and bitrate, and that's about as good as you'll get (and they still include the warning that the times vary heavily based on content being filmed). --A Shortfall Of Gravitas (talk) 04:43, 5 August 2021 (UTC)
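To put the figures quoted above on a common scale, here is a rough Python sketch that converts each size and running time into an average bit rate. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and takes the approximate sizes at face value. The helper name is illustrative only.

```python
GB = 10**9   # decimal gigabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def average_bitrate_mbps(size_bytes, duration_minutes):
    """Average bit rate in Mbit/s for a file of the given size and length."""
    return size_bytes * 8 / (duration_minutes * 60) / 1e6


examples = {
    "Transformers re-encode (~15 GB, 143 min)": (15 * GB, 143),
    "Avengers re-encode (2.6 GB, 181 min)": (2.6 * GB, 181),
    "Table claim (1 TB for 30 hours)": (1 * TB, 30 * 60),
}

for label, (size, minutes) in examples.items():
    print(f"{label}: {average_bitrate_mbps(size, minutes):.1f} Mbit/s")
```

On those assumptions the two re-encodes differ by roughly a factor of seven in average bit rate, and the "1 TB for 30 hours" figure implies a rate several times higher than either, which is consistent with the point that no single representative number exists.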

"PiB" listed at Redirects for discussion

A discussion is taking place to address the redirect PiB. The discussion will occur at Wikipedia:Redirects for discussion/Log/2021 May 20#PiB until a consensus is reached, and readers of this page are welcome to contribute to the discussion. User:1234qwer1234qwer4 (talk) 10:55, 20 May 2021 (UTC)

wrongly deleted[edit]

IMHO the way I wrote it, the byte as information, is definitely much clearer when in the context of binary "digits" or flags (and corresponding hexadecimal digits). As it is written now you have many, many words surrounding the concept and not touching it. So my edit is definitely in place. Please, user:Dondervogel 2, next time discuss before deleting work that your fellow Wikipedian put time and effort into.

Here is the deleted section:

Hexadecimal and binary representation: Byte values can be easily represented with hexadecimal digits. Since 4 set bits correspond to the hexadecimal digit F, every four-bit byte value is easily written as a single hexadecimal digit value, and the hexadecimal value of each digit can easily be translated back into its four bit binary value. Thus an 8 bit byte can be read as two 4 bit bytes, each represented by a single hexadecimal digit. So for example with an 8-bit byte, hexadecimal FF is the maximum value with all bits set (corresponding to decimal 255) and hexadecimal 10 is easily translated as binary 0001_0000 (corresponding to decimal 16).
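The two worked values in the paragraph above can be checked with a short Python sketch. It assumes an 8-bit byte and splits it into its high and low four-bit halves, one hexadecimal digit each. The function name is illustrative only.

```python
def nibbles(byte_value):
    """Split an 8-bit value into its (high, low) four-bit halves."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("expected a value that fits in 8 bits")
    return byte_value >> 4, byte_value & 0x0F


for value in (0xFF, 0x10):
    high, low = nibbles(value)
    # Each four-bit half corresponds to exactly one hexadecimal digit.
    print(f"decimal {value:3d} = hex {high:X}{low:X} = binary {high:04b}_{low:04b}")
```

Each hexadecimal digit maps to exactly four bits, so the conversion in either direction can be done digit by digit without any arithmetic on the whole value.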

I also added some short captions so that one can trudge through all the wording:

Byte size: The size of the byte has historically been ...

The 8 bit standard: The modern de facto standard of eight bits...

The unit symbol B: The unit symbol for the byte was designated as...

Thanks in Advance, Moshe aka פשוט pashute ♫ (talk) 11:51, 24 June 2021 (UTC)[]

Work is rarely ever deleted; all of it is still archived in the page history. Per the BOLD, revert, discuss cycle, it is actually good to first revert and then discuss. Anyway, I do think that a paragraph about hexadecimal representation would be good to have, since bytes are often represented that way. However, I think it should not be put in the lead, since the lead section is supposed to be merely a summary of the article, and this is a bit too specific to be a summary. Lastly, captions are normally not applied to individual paragraphs. That makes it look more like some sort of glossary, which would not really be appropriate here. (also pinging involved editor Dondervogel 2) Jochem van Hees (talk) 12:22, 24 June 2021 (UTC)
If there is consensus for it, I would not object to the paragraph being reinserted further down the article, but please not in the lead. I do think it would require a little more context to explain the difference between the byte as a unit of storage, and a byte of information. Dondervogel 2 (talk) 12:52, 24 June 2021 (UTC)[]
First, the text following Hexadecimal and binary representation: doesn't discuss binary representation, so the head is misleading. Second, neither the head nor the text mentions octal representation, which is both important historically and still, alas, in use for octets. Third, as Dondervogel mentioned, it doesn't belong in the lede. --Shmuel (Seymour J.) Metz Username:Chatul (talk) 13:27, 24 June 2021 (UTC)
My main point was that the wording in the lead just doesn't even explain what a byte is. Rather it starts "arguing with itself" on side details like what the standard representation is or what the symbol is. So what I will do is simply move all that out of the lead too and create a short and clear explanatory lead. My explanation through representation, which has EVERYTHING to do with the subject and NOTHING to do with computers, as OPPOSED to the way it is now in the lead, IMHO was better. But, since there are already three who seem to agree that I'm wrong, I will not argue but rather put up an alternative simple explanation that DOES give the definition, perhaps with an example. פשוט pashute ♫ (talk) 21:26, 14 July 2021 (UTC)
It is not "arguing with itself", it is explaining that there is no standard definition for a byte. Given that one of the functions of the lead is to define the subject, this seems very appropriate to me. ―Jochem van Hees (talk) 11:33, 16 July 2021 (UTC)[]

Initial description: not quite on point?

The first sentence of the lead is:

"The byte is a unit of digital information that most commonly consists of eight bits."

This treats the byte as a unit of storage or information, and nothing else. However, use of the term normally relates to how the data is constructed, and in particular that it is a sequence of eight bits that are grouped together. (See Byte | Merriam-Webster.) For example, when we refer to a byte of computer memory, we usually specifically mean one of the 8-bit groupings of memory storage that are addressed together using a single byte address, not individual bits that might be scattered in arbitrary locations. I think it makes sense to put the emphasis on this meaning in this article, and to have the definition as a unit (measure of amount) of storage capacity as a derived meaning. —Quondum 16:52, 29 June 2021 (UTC)[]
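As a rough illustration of the distinction drawn above, the Python sketch below uses a bytearray as a stand-in for byte-addressed memory. This is an assumption for illustration, not a model of any particular machine. Each index selects a whole 8-bit group, and an individual bit is reachable only by first addressing its byte and then shifting and masking. The helper name is hypothetical.

```python
# Four addressable 8-bit groups, at byte addresses 0 through 3.
memory = bytearray(4)

# A store writes all eight bits of one group at a single byte address.
memory[2] = 0b1010_0001


def get_bit(mem, address, bit):
    """Read one bit by addressing its byte, then shifting and masking."""
    return (mem[address] >> bit) & 1


print(len(memory), "addressable bytes")
print("byte at address 2:", format(memory[2], "08b"))
print("bit 0 of that byte:", get_bit(memory, 2, 0))
```

The count reported by len(memory) is the derived "unit of storage" sense, while the indexing is the "addressable group of bits" sense.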