Byte

A byte (pronounced /ˈbaɪt/) is a unit of information storage in computers. It is an ordered collection of bits, with each bit denoting a single binary value of 1 or 0. The byte is the basic addressable element in many computer architectures. The size of a byte is typically hardware dependent, with the 8-bit byte being a modern de facto standard. Some factors behind this particular size may be the IBM System/360, introduced in the 1960s, and the 8-bit microprocessors, introduced in the 1970s. There is no formal definition, however, and other sizes have been used in various computers historically. The term octet is widely used as a more precise synonym where ambiguity is undesirable (in protocol definitions, for example).

Length

Architectures that did not have eight-bit bytes include the CDC 6000 series scientific mainframes, which divided their 60-bit floating-point words into 10 six-bit bytes. These bytes conveniently held character data from punched Hollerith cards, typically the upper-case alphabet and decimal digits. CDC also often referred to 12-bit quantities as bytes, each holding two 6-bit display code characters, due to the 12-bit I/O architecture of the machine. The PDP-10 used the assembly instructions LDB and DPB to load and deposit bytes of any width from 1 to 36 bits; these operations survive today in Common Lisp. Bytes of six, seven, or nine bits were used on some computers, for example within the 36-bit word of the PDP-10. The UNIVAC 1100/2200 series computers (now Unisys) addressed both six-bit (Fieldata) and nine-bit (ASCII) bytes within their 36-bit word.
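
For readers unfamiliar with variable-width byte instructions, the following C sketch gives a rough modern analogue of the PDP-10 LDB and DPB operations. It is an illustration only, assuming a 64-bit container word rather than PDP-10 hardware, and the function names load_byte and deposit_byte are invented for this example.

    #include <stdint.h>
    #include <stdio.h>

    /* Extract a "byte" of the given width (1..36 bits) starting at bit
       position pos (counting from the least significant bit), in the
       spirit of the PDP-10 LDB instruction. */
    static uint64_t load_byte(uint64_t word, unsigned pos, unsigned width) {
        uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
        return (word >> pos) & mask;
    }

    /* Replace the same field with a new value, in the spirit of DPB. */
    static uint64_t deposit_byte(uint64_t word, unsigned pos, unsigned width,
                                 uint64_t value) {
        uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
        return (word & ~(mask << pos)) | ((value & mask) << pos);
    }

    int main(void) {
        uint64_t word = 0;
        /* Pack six 6-bit characters into one 36-bit word, as on the PDP-10. */
        for (unsigned i = 0; i < 6; i++)
            word = deposit_byte(word, i * 6, 6, i + 1);
        for (unsigned i = 0; i < 6; i++)
            printf("field %u = %llu\n", i,
                   (unsigned long long)load_byte(word, i * 6, 6));
        return 0;
    }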

Factors behind the ubiquity of the eight-bit byte include the popularity of the IBM System/360 architecture, introduced in the 1960s, and the 8-bit microprocessors, introduced in the 1970s. The term octet unambiguously specifies an eight-bit byte (in protocol definitions, for example).

History

The term byte was coined by Dr. Werner Buchholz in July 1956, during the early design phase for the IBM Stretch computer.[1][2][3] Originally it was defined in instructions by a 4-bit byte-size field, allowing from one to sixteen bits (the production design reduced this to a 3-bit byte-size field, allowing from one to eight bits to be represented by a byte); typical I/O equipment of the period used six-bit bytes. A fixed eight-bit byte size was later adopted and promulgated as a standard by the System/360. The term byte comes from bite, as in the smallest amount of data a computer could bite at once. The spelling change not only reduced the chance of a bite being mistaken for a bit, but also was consistent with the penchant of early computer scientists to make up words and change spellings. A byte was also often specifically qualified as an 8-bit byte, reinforcing the notion that it was a tuple of 8 bits, and that other sizes were possible.

The term byte has also been used with related but distinct meanings:

  • A contiguous sequence of binary bits in a serial data stream, such as in modem or satellite communications, which is the smallest meaningful unit of data. These bytes might include start bits, stop bits, or parity bits, and thus could vary from 7 to 12 bits to contain a single 7-bit ASCII code.
  • A data type in certain programming languages. The C and C++ programming languages, for example, define byte as an "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is at least capable of holding 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits[4][5]; the actual number of bits in a particular implementation is documented as CHAR_BIT in the limits.h header, as illustrated in the sketch after this list. Java's primitive byte data type is always defined as consisting of 8 bits and being a signed data type, holding values from −128 to 127.
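
As a minimal illustration of the C definition above, the short program below prints CHAR_BIT from limits.h together with the char ranges of whatever implementation compiles it; on most current platforms it reports 8 bits, though the standard only guarantees at least 8.

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* CHAR_BIT is the number of bits in a byte on this implementation;
           the C standard requires it to be at least 8. */
        printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
        printf("unsigned char range: 0 to %u\n", (unsigned)UCHAR_MAX);
        printf("signed char range: %d to %d\n", SCHAR_MIN, SCHAR_MAX);
        return 0;
    }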

Early microprocessors, such as the Intel 8008 (the direct predecessor of the 8080, and then the 8086), could perform a small number of operations on four bits, such as the DAA (decimal adjust) instruction and the half-carry flag, which were used to implement decimal arithmetic routines. These four-bit quantities were called nybbles, in homage to the then-common 8-bit bytes.
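
As a rough modern analogy for the kind of data those 4-bit operations handled (not 8008 assembly), the following C sketch splits a byte into its two nybbles and packs two decimal digits into one byte, the packed-BCD layout that instructions such as DAA supported.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t packed = 0x42;               /* packed BCD for decimal 42   */
        uint8_t high = (packed >> 4) & 0x0F; /* high nybble: tens digit     */
        uint8_t low  = packed & 0x0F;        /* low nybble:  units digit    */
        printf("digits: %u %u\n", (unsigned)high, (unsigned)low);

        /* Pack the digits 7 and 9 back into one byte (decimal 79 -> 0x79). */
        uint8_t repacked = (uint8_t)((7 << 4) | 9);
        printf("repacked: 0x%02X\n", (unsigned)repacked);
        return 0;
    }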

Historical IETF documents cite varying examples of byte sizes. RFC 608 lists byte sizes for FTP hosts (the FTP-BYTE-SIZE attribute in ARPANET host tables) as 36 bits for PDP-10 computers and 32 bits for IBM 360 systems.[6]

Unit symbol or abbreviation

Prefixes for decimal and binary multiples

Decimal (SI)
  Value                  Symbol   Name
  1000    = 10^3         k        kilo
  1000^2  = 10^6         M        mega
  1000^3  = 10^9         G        giga
  1000^4  = 10^12        T        tera
  1000^5  = 10^15        P        peta
  1000^6  = 10^18        E        exa
  1000^7  = 10^21        Z        zetta
  1000^8  = 10^24        Y        yotta
  1000^9  = 10^27        R        ronna
  1000^10 = 10^30        Q        quetta

Binary
  Value                  IEC            JEDEC
  1024    = 2^10         Ki (kibi)      K (kilo)
  1024^2  = 2^20         Mi (mebi)      M (mega)
  1024^3  = 2^30         Gi (gibi)      G (giga)
  1024^4  = 2^40         Ti (tebi)      T (tera)
  1024^5  = 2^50         Pi (pebi)
  1024^6  = 2^60         Ei (exbi)
  1024^7  = 2^70         Zi (zebi)
  1024^8  = 2^80         Yi (yobi)

IEEE 1541 and Metric-Interchange-Format specify B as the symbol for byte (e.g., MB means megabyte), while IEC 60027 seems silent on the subject. Unfortunately, B is also used for bel, another unit used in the same field. The use of B to stand for bel is consistent with the metric system convention that capitalized symbols are for units named after a person (in this case Alexander Graham Bell); usage of a capital B to stand for byte is not consistent with this convention. However, there is little danger of confusion because the decibel (dB) is used almost exclusively for bel measurements, while the decibyte (1/10 of a byte) is never used.

The unit symbol KB is commonly used for kilobyte, but is often confused with kb meaning kilobit. IEEE 1541 specifies b as the symbol for bit; however, IEC 60027 and the Metric-Interchange-Format specify bit (e.g., Mbit for megabit) as the symbol, achieving maximum disambiguation from byte.

The lowercase letter o for octet is a commonly used symbol in several non-English-speaking countries, and is also used with metric prefixes (for example, ko and Mo).

Today the harmonized ISO/IEC 80000-13:2008 standard (Quantities and units – Part 13: Information science and technology) cancels and replaces subclauses 3.8 and 3.9 of IEC 60027-2:2005 (those related to information theory and prefixes for binary multiples). See Units of information for a detailed discussion of names for derived units.

Unit multiples

See also: Binary prefixes
(Figure) Linearly growing percentage difference between the decimal and binary interpretations of the unit prefixes, plotted against the logarithm of storage size.

There has been considerable confusion about the meanings of SI (or metric) prefixes used with the unit byte, especially concerning prefixes such as kilo (k or K) and mega (M), as shown in the table of prefixes for decimal and binary multiples above. Since computer memory is designed with binary logic, multiples are expressed in powers of 2 rather than 10. The software and computer industries often use binary interpretations of the SI-prefixed quantities, while producers of computer storage devices prefer the decimal SI values. This is why a hard drive advertised as 100 GB holds about 93 GiB of storage space.

While the numerical difference between the decimal and binary interpretations is small for the kilo and mega prefixes, it grows to over 20% for the yotta prefix, as illustrated in the linear-log graph of difference versus storage size above.
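
The figures above can be checked with a short calculation. The following C program, a worked example for this article rather than part of any standard, converts a marketed 100 GB capacity into GiB and prints how much larger each binary prefix is than its decimal counterpart, reaching roughly 21% at the yotta level.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* 100 GB as sold (decimal): 100 * 10^9 bytes, expressed in GiB (2^30). */
        double bytes = 100e9;
        printf("100 GB = %.1f GiB\n", bytes / pow(2.0, 30));  /* about 93.1 */

        /* Deviation of each binary prefix from the matching decimal prefix. */
        const char *names[] = {"kilo", "mega", "giga", "tera",
                               "peta", "exa", "zetta", "yotta"};
        for (int n = 1; n <= 8; n++) {
            double deviation = (pow(1024.0, n) / pow(1000.0, n) - 1.0) * 100.0;
            printf("%-5s: binary value is %.1f%% larger\n", names[n - 1], deviation);
        }
        return 0;
    }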

References

  1. Bob Bemer, "Origins of the Term 'BYTE'", accessed 2007-08-12.
  2. "Timeline of the IBM Stretch/Harvest era (1956–1961)", computerhistory.org: "1956 July ... Werner Buchholz ... Werner's term 'Byte' first popularized".
  3. "byte", catb.org: "coined by Werner Buchholz in 1956".
  4. "[26] Built-in / intrinsic / primitive data types", C++ FAQ Lite.
  5. "Integer Types In C and C++".
  6. RFC 608, "Host Names On-Line", M.D. Kudlick, SRI-ARC (January 10, 1974).