Talk:Endianness/Archive 7
This is an archive of past discussions about Endianness. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Archive 1 | ← | Archive 5 | Archive 6 | Archive 7 | Archive 8 | Archive 9 |
XDR
there is no formal standard for transferring floating point values between diverse systems. There is External Data Representation as a standard for transferring data, including floating point, between diverse systems. Maybe not quite as popular as it should be. Gah4 (talk) 19:18, 29 September 2016 (UTC)
- Well, XDR just falls back on IEEE 754 for floating point, which indeed doesn't specify endianness. Perhaps you might do some rewording. --jpgordon𝄢𝄆 𝄐𝄇 03:57, 30 September 2016 (UTC)
- IEEE 754 doesn't define endianness, but XDR does. In either fixed or floating point, XDR is big endian, even when transferring between two little endian hosts, and IEEE 754 even if one or both hosts don't use that format. But mostly, I consider XDR a formal standard. I haven't thought of new wording yet, though. Gah4 (talk) 05:13, 30 September 2016 (UTC)
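For illustration of the point above, here is a minimal C sketch of the kind of encoder an XDR-style format calls for: the wire order is fixed at big-endian, so the most significant byte is written first regardless of the host's own byte order. This is illustrative only, not code from any XDR library; a real program would use the xdr_int()/xdr_u_int() routines of an RPC/XDR implementation.
#include <stdint.h>

/* Write a 32-bit value in XDR's big-endian byte order, independent of
   the byte order of the host doing the writing. */
static void put_xdr_uint32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);   /* most significant byte first */
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* The reader reassembles the value the same way, so two little-endian
   hosts still exchange it correctly through the big-endian wire form. */
static uint32_t get_xdr_uint32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}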
This is the way a hexdump is displayed: because the dumping program is unable to know what kind of data it is dumping,
The VAX/VMS (and maybe other VMS) DUMP program displays ASCII characters with the address increasing to the right on each line, and hex data with the address increasing to the left. That is, it knows numeric vs. character data. Gah4 (talk) 03:11, 31 October 2016 (UTC)
- No, it doesn't. It just dumps all of the requested locations. Bytes that happen to contain printable ASCII characters are displayed that way, and hex data is displayed as makes sense for little-endian because that's how the VAX implements binary integers. But the DUMP program does not "know" which bytes will be interpreted as which sort of data. If it did then it would not display a longword containing 0x64636261 as ABCD in the text part of the display.
- And yes, VMS on the Alpha does the same thing. ia64, I dunno. Jeh (talk) 04:17, 31 October 2016 (UTC)
- Yes, I put the knows in quotes because, as you note, it doesn't actually know. But people would be really surprised if the ASCII data printed backwards. Another explanation is that it assumes that numbers are little endian, and text is big endian. Now, there is still Unix's od -x which prints hexadecimal 16 bit words, so they come out in the wrong order as sequential bytes on little endian machines. Gah4 (talk) 05:18, 31 October 2016 (UTC)
- I have an RX2600. If I renew the hobbyist license, I could test it out. Gah4 (talk) 05:18, 31 October 2016 (UTC)
Recommendation for new section: Endianness in natural language
I'd like to see a section on endianness in natural language, namely a survey of world languages accompanied by notes re: any patterns that appear to apply worldwide or region-wide, but I am not familiar enough with a wide enough variety of the world's languages to draw any conclusions myself. My guess is that this section would benefit (even more than usual) from having a subject-matter expert involved from the start.
Here are my (non-expert and possibly wrong) observations so far:
- Numbers:
- Spoken and transcribed from speech:
- Uniformly big-endian:
- Japanese, at least from what I remember from martial-arts classes
- Korean, [ditto]
- Mixed throughout based on place order alone, i.e., regardless of value within place:
- German, following most-least-[and-]middle value-place order within each set of three value places (corresponding to each group separated by short-scale commas when numerals are used), i.e., hundreds-ones-[and-]tens order both for three-value-place numbers and for designations of value in higher places, as in siebenhundertsechsundneunzigtausenddreihundertvierundzwanzig (796,324)
- (possible) Early Modern and older English: Does the "four and twenty blackbirds baked in a pie" pattern apply when the 24 value units being enumerated are thousands, millions, short-scale billions / long-scale milliards, short-scale trillions / long-scale billions, etc.?
- Mixed throughout when certain places have some values but not when they have others:
- "Modern" Modern English, following most-least-middle order within each set of three value places when middle value is one but most-middle-least order (i.e., "pure" big-endianness) otherwise; contrast "eight hundred fifteen thousand, four hundred seventeen" (815,417) with "six hundred twenty-four thousand, nine hundred fifty-two" (624,952)
- Uniformly little-endian:
- None that I know of; I'd be especially grateful for an example here
- If there are in fact no languages anywhere in the world that employ pure little-endianness in speech or transcription of speech, might that fact be a clue showing how, or more precisely ruling out one possibility for how, the brain processes numbers?
- Written in numeral form:
- Uniformly big-endian:
- Indo-Arabic, at least when used in languages that read from left to right...
- ... But are there any right-to-left writing systems that use Indo-Arabic (as opposed to Eastern Arabic) numerals, and if so, then for each such system, does it maintain the left-to-right aspect, thereby switching the endianness to little-, or does it maintain big-endianness, thereby switching the order to left-to-right?
- Mixed:
- [Are there any of these among "true" numeral systems? The use of numerals would seem to have among its purposes the ease of sorting in one direction or another...]
- (exclusion) I'd argue that Roman numerals don't fall into this category because the placement of smaller numbers out of order denotes not the presence of their value but rather the absence (i.e., "to-be-subtractedness") of that value from the following number placed according to big-endian rules.
- Uniformly little-endian:
- Eastern Arabic: Text is read right to left; numerals are read left to right.
- Dates:
- Spoken:
- Uniformly big-endian (year-month-day):
- [Question: Do East Asian languages that employ year-month-day order in writing also do so in speech?]
- Middle-little-big (month-day-year):
- American English
- Uniformly little-endian (day-month-year):
- Modern western continental European (not sure about modern British English or Slavic languages)
- Written:
- Uniformly big-endian:
- At least some East Asian languages (I'd need an expert to confirm which varieties of which languages)
- Middle-little-big:
- American English (majority practice)
- Uniformly little-endian:
- Modern Western European, including British English (not sure about Slavic languages using Roman alphabet or about languages using Cyrillic alphabet)
- [Question: Have any studies been done about the relationship, if any, between use of endianness in date numbering and the time horizon for the context in which it is used and/or the time scale on which people typically think in the cultures using it in everyday life? Little-endian ordering is most efficient if everything is to occur within the remainder of the current calendar month or occurred within the portion of the current calendar month elapsed to date, but middle- (month-)first endianness is helpful if one is making broad, non-granular plans for a calendar year, and big-endianness is necessary for long-term archival in chronological order and long-term future plans. I wonder whether the date order in common use in an area cognitively primes the people who use it to think on a certain time scale.]
Thanks!
These codes right?
(Contribution of user:108.171.131.169 on 2017-05-16 in section Endianness#Files and byte swap:)
No, these codes are not right.
The notation 0x6789abcd has been introduced by the C language and stands for the 32 bit integer 1737075661 (and this independently of the machine type on which it is running – little or big endian), and NOT for the string "6789abcd". Basically the same is true for notations such as 6789abcdh or 6789abcd₁₆.
If you want to specify a byte sequence on a little endian machine (intended to have the very same meaning as the 32 bit integer) you may use 0xCD,0xAB,0x89,0x67, 0xcd,0xab,0x89,0x67 or CDh,ABh,89h,67h. There may be further possible notations e.g. in assembler languages. But 0xcdab8967 (or 0xCDAB8967) means 3450571111 (or 3450571111dec or 3450571111₁₀) on every machine. --Nomen4Omen (talk) 09:41, 17 May 2017 (UTC)
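A minimal C sketch of the distinction made above: the literal 0x6789abcd denotes the same integer everywhere; only the object representation in memory differs between machines.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t n = 0x6789abcd;      /* the integer 1737075661 on any machine */
    unsigned char bytes[4];

    memcpy(bytes, &n, sizeof n);  /* look at the object representation */
    /* A little-endian machine prints: cd ab 89 67
       A big-endian machine prints:    67 89 ab cd */
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}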
Now resolved (by the deletion of the contribution). --Nomen4Omen (talk) 18:12, 12 June 2017 (UTC)
bytes or digits
Endianness in computing is similar, but it applies to the ordering of bytes, rather than of digits. It is usual today for endianness to apply to bytes. It can also apply to other bit groups. How are decimal numbers stored for the Intel 4004? Gah4 (talk) 17:43, 29 September 2016 (UTC)
- BCD Little-endian. Either unpacked (where each digit takes up an entire byte, the top four bits being wasted) or packed (low-order nibble is least significant digit). Intel 4004 architecture manual. --jpgordon𝄢𝄆 𝄐𝄇 18:04, 29 September 2016 (UTC)
- The link seems to be for more modern IA (Intel Architecture), and yes IA allows for either packed or unpacked bytes. But the 4004 uses four bit data, so big/little endian should be based on four bit decimal digits for BCD data. Gah4 (talk) 05:18, 30 September 2016 (UTC)
- I changed it to usually which is probably close enough. Gah4 (talk) 05:21, 30 September 2016 (UTC)
- And, for fixed point, packed decimal big-endian on IBM System/360 and its successors, all the way up to z/Architecture; the high-order digit is in the upper 4 bits of the byte and the low-order digit is in the lower 4 bits of the byte. See page 34 of the first edition of the System/360 Principles of Operation. Guy Harris (talk) 07:23, 30 September 2016 (UTC)
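A simplified C sketch of the big-endian nibble order just described, with the high-order digit of each pair in the upper 4 bits; it assumes an even number of digits and omits the sign code that real S/360 packed decimal keeps in the low nibble of the last byte.
#include <stddef.h>

/* Pack an even-length string of decimal digits two to a byte, with the
   high-order digit of each pair in the upper 4 bits.  (The sign nibble
   used by real S/360 packed decimal is omitted here.) */
static void pack_digits(const char *digits, size_t ndigits, unsigned char *out)
{
    for (size_t i = 0; i < ndigits; i += 2)
        out[i / 2] = (unsigned char)(((digits[i] - '0') << 4)
                                     | (digits[i + 1] - '0'));
}
/* pack_digits("1234", 4, buf) leaves buf[0] == 0x12 and buf[1] == 0x34. */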
- I'm not sure the example of endianness really fits for most people's perception. We write the most-significant digit first, but we operate on them going the other way. Cognitively, operating is a "more important" activity than writing, and we generally grade things by their importance. (I trained in psych, not csci). 98.118.17.206 (talk) 20:14, 17 October 2016 (UTC)
- With respect to direction there are two kinds of «operations»:
- addition, subtraction, and multiplication go from low to high order
- division and comparison go from high to low order.
- One should not totally ignore the latter 2 «operations». --Nomen4Omen (talk) 21:03, 17 October 2016 (UTC)
- Yes. Architectures that started out without a divide operation had some tendency to be little endian. Ones that included divide from the beginning, such as IBM S/360 and Motorola 680x0, tended to be big endian. For many years, programs were debugged from printed hexadecimal storage dumps, which are easier to read in big-endian form. Gah4 (talk) 06:34, 18 October 2016 (UTC)
As to comparison, IBM S/360 considers two types, one of which is called logical. (I suspect others might call it unsigned.) The normal way to do numeric compare is to subtract, don't store the result, and, appropriately, test the sign of the result. Note that this works even if you don't know the lengths in advance. On the other hand, there is CLC (compare logical character) which does what C calls memcmp(), which is left to right in increasing address order. Comparing the first non-equal byte, if any, gives the result. I suspect that S/360 CP (compare decimal) works right to left, but that would be an implementation decision. (That is, use subtract microcode.) Gah4 (talk) 19:40, 28 February 2018 (UTC)
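An illustrative C sketch of why the left-to-right CLC/memcmp() style of comparison works for big-endian unsigned integers: with the most significant byte first, byte order and numeric significance agree.
#include <stdint.h>
#include <string.h>

/* Compare two 32-bit unsigned values via their big-endian byte images.
   Because the most significant byte comes first, a left-to-right byte
   comparison (what memcmp() or CLC does) yields the same ordering as
   comparing the numeric values, which is why big-endian storage is
   convenient for byte-wise key comparison. */
static int compare_as_big_endian(uint32_t a, uint32_t b)
{
    unsigned char ia[4] = { (unsigned char)(a >> 24), (unsigned char)(a >> 16),
                            (unsigned char)(a >> 8),  (unsigned char)a };
    unsigned char ib[4] = { (unsigned char)(b >> 24), (unsigned char)(b >> 16),
                            (unsigned char)(b >> 8),  (unsigned char)b };
    return memcmp(ia, ib, 4);   /* <0, 0, >0, exactly as for a vs. b */
}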
Writing numbers as words
As a native (UK) English speaker the phrase "one hundred twenty-three" sounds dreadfully wrong to me. I edited the page to fix this, but my edit was reverted by Jeh, who left me a comment to say it was correct as it was. Maybe it's more of an American thing, but I've been around a while and I'd never even heard of such a construction until I read this page. Do people really say numbers like that? Or are they just dropping the "and" when they write it? SystemParadox (talk) 17:54, 28 February 2018 (UTC)
- I don't know about UK, but the rule for writing checks in the US is that you only use and between the dollars and the cents. I suppose writing paper checks will soon be a lost art, but until it is, the most common place where numbers are written out as words is still checks. Spoken, people might add the and. Gah4 (talk) 18:11, 28 February 2018 (UTC)
- I'm far from the only one who has ever heard of such a thing - see for example this online calculator or this one. How many "and"s do you use, by the way? Would you write 123456 as one hundred and twenty-three thousand and four hundred and fifty-six? What purpose does the "and" serve? Once you've written "hundred" the next thing will be the tens' position, if any, no? Hmm, why not "twenty and three" while you're at it?
- WP:MOS and WP:MOSNUM are silent on the issue, no doubt because MOS says that numbers above nine should generally be, and numbers above 100 should almost always be, written with digits rather than words anyway. fwiw, this page says you should leave out the "and"s (item 8a). I've found allusions to the notion that the "and"s are a British-English archaism, but nothing I'd consider authoritative. Jeh (talk) 18:20, 28 February 2018 (UTC)
- I use as many "and"s as I would speak. I would say 123456 as "one hundred and twenty-three thousand four hundred and fifty-six", so I would write it that way too. How would you say 123456? Why should it be written any differently? This chart suggests that even in American English, using the "and" is far more common. SystemParadox (talk) 23:23, 28 February 2018 (UTC)
- See English numerals on this subject. It seems that in American English, the "and" tends to be dropped. Vincent Lefèvre (talk) 00:46, 1 March 2018 (UTC)
16-bit word vs. half-word?
In the late 1960s, I learned that the IBM 360 architecture included an 8-bit byte, represented by two hex digits, a 2-byte "halfword", a 4-byte "word", and an 8-byte "double word". I'm prepared to believe that what I learned half a century ago is long obsolete, but I'd like a citation to justify the term "16-bit word" over, e.g., "16-bit half-word". Accordingly I'm setting a "citation needed" flag. ??? Thanks. DavidMCEddy (talk) 04:41, 20 April 2018 (UTC)
- In the 1970's, there was S/370, successor to S/360, with 32 bit word and 16 bit halfword. There was also VAX, a 32 bit machine with a 16 bit word and 32 bit doubleword. (And sometimes 64 bit quadword, and maybe even 128 bit octaword.) For Intel, 8086 through 80286 naturally have a 16 bit word. With the 32 bit 80386, Intel could have changed to a 32 bit word, but they stuck with the 16 bit word from the 8086. With Itanium, a completely new 64 bit architecture not descended from the 8008, Intel could have gone for a 32 or 64 bit word, but they still kept the 16 bit word from the 8086. Given the dominance of Intel processors, it seems that 16 bits is the most obvious word definition today. Seems that we are going backwards. Gah4 (talk) 05:57, 20 April 2018 (UTC)
- In the context used there, a "word" is whatever and anything the machine can work on as a data unit. That's why it refers to both 16-bit words and 32-bit words. Both are correct in context.
- Re Gah4's comment, DEC brought the "word" and "dword" terminology from the PDP-11 (the GPRs of which were 16 bits, hence 16 bits "naturally" made a word) to the VAX. Since one point of the VAX was back-compatibility with the -11 (it could even run many -11 binaries originally built for RSX-11M with no changes) it would have been nonsensical to have two different "word" definitions in the same machine. "If you're coding in MACRO-11 a .WORD is 16 bits, but in MACRO-32 it's 32..." yeah, no.
- As you can see from the above we can reference a LOT of material that documents 16 bit words. But since "word" actually has a variable definition here in terms of numbers of bits I do not see the point. Jeh (talk) 06:19, 20 April 2018 (UTC)
- +1 for what Jeh says. S/360 started out as 32-bit, so 32 bits was a "word" and 16 bits was a "halfword". The PDP-11 was 16-bit, so 16 bits was a "word" and 32 bits was a "longword". That terminology was continued with the 32-bit VAX and even the 64-bit Alpha. x86, which started out as 16-bit, also had 16-bit "words" and 32-bit "doublewords", and continues that terminology even now, when it's 64-bit.
- So, for better or worse, the size of a "word", these days, has nothing to do with the natural address or data item or register size of an instruction set; it has to do with the history of the instruction set and of its predecessors.
- I.e., anybody who learned either that 1) a "word" is 16 bits or 2) a "word" is 32 bits learned something that only applies for some instruction sets - and at least three of those instruction sets (x86, Alpha as a successor to VAX which was a successor to the PDP-11, and z/Architecture as an S/360 successor) are now 64-bit. In none of them is 16 bits or 32 bits the natural size of an addressable item - they're all 8-bit-byte-oriented instruction sets (these days, almost every instruction set is 8-bit-byte-oriented; are there any exceptions other than the UNIVAC 1100/2200 series, with 36-bit words as the addressable unit, and the Burroughs large systems, with 48-bit words as the addressable unit, albeit with, in both cases as far as I know, special capabilities for character strings and decimal numbers?).
- If somebody wants to remove the term "word" from endianness, referring to "multi-byte data items" or something such as that, I'd have no problem with that, as it'd replace a term that can refer to different numbers of bytes on different architectures with a term that makes no commitment to the number of bytes it represents. Guy Harris (talk) 07:33, 20 April 2018 (UTC)
- I'd agree wholeheartedly if that wasn't such an awkward term. Perhaps instead we could use a simple footnote that explains that "word" in this context simply means a multi-byte datum. Jeh (talk) 09:14, 20 April 2018 (UTC)
- It almost works here, as the discussion works with any word size greater than the addressable unit. But then there is VAX floating point, which stores 16 bit little endian words, in big-endian order. Gah4 (talk) 19:23, 20 April 2018 (UTC)
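A sketch of the byte-order aspect only, assuming the layout described above: the 32-bit value is treated as two 16-bit little-endian words with the more significant word stored first, so swapping the 16-bit halves converts between that "middle-endian" layout and a plain little-endian one. It does not convert the floating-point encoding itself.
#include <stdint.h>

/* A plain little-endian reading of the four bytes of such a value has its
   16-bit halves swapped; swapping them back converts between the two
   layouts.  Only the byte order is touched here. */
static uint32_t swap_16bit_halves(uint32_t x)
{
    return (uint32_t)(x << 16) | (x >> 16);
}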
read backwards
The article mentions both storage order and I/O transmission order. It occurs to me that those aren't always the same. It used to be usual for tape drives to have the ability to read (but not write) backwards. Bytes would come off tape in the reverse order, be stored into memory at consecutively lower addresses, and so end up in the original order. Some sort algorithms are optimized to use such tape drives, and save the time for rewinding the tape. That might be a detail that the article doesn't need, though. Gah4 (talk) 23:21, 5 January 2019 (UTC)
- When reading backward, everything comes backward (e.g. character strings too, and the order the objects are passed). Thus this is not an endianness issue, just a reading backward issue. Vincent Lefèvre (talk) 00:13, 6 January 2019 (UTC)
- They come out in the right order, because it fills the buffer from the end to the beginning. But data on the wire is in the backwards order, and the article mentions data transmission order. There might be some that store the wrong way, but S/360 fills buffers from the end to the beginning. Gah4 (talk) 00:40, 6 January 2019 (UTC)
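A sketch of the mechanism described above: when reading backwards, bytes arrive in reverse order and each is stored at the next lower address, so the buffer ends up in normal order. The function next_byte_from_tape() is a hypothetical stand-in for whatever delivers the reversed byte stream.
#include <stddef.h>

extern int next_byte_from_tape(void);   /* hypothetical source of the
                                           reversed byte stream */

/* Bytes arrive last-first, and each one is stored at the next lower
   address, so when the transfer finishes the buffer holds the block
   in its original order. */
static void read_backwards(unsigned char *buf, size_t len)
{
    for (size_t i = len; i > 0; i--)
        buf[i - 1] = (unsigned char)next_byte_from_tape();
}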
- There's I/O transmission order as in "order in which bytes are transmitted over an I/O bus between a host and peripherals" and there's network transmission order. About the only mention of I/O I see is this paragraph in the "Bi-endianness" section:
Note, too, that some nominally bi-endian CPUs require motherboard help to fully switch endianness. For instance, the 32-bit desktop-oriented PowerPC processors in little-endian mode act as little-endian from the point of view of the executing programs, but they require the motherboard to perform a 64-bit swap across all 8 byte lanes to ensure that the little-endian view of things will apply to I/O devices. In the absence of this unusual motherboard hardware, device driver software must write to different addresses to undo the incomplete transformation and also must perform a normal byte swap.
- The whole "read backwards" mechanism was, as far as I know, a special-purpose mechanism for mag tapes, allowing all the data on a tape to be read without a rewind if you were at the end of the tape; I don't think any network protocol used it or had any need for it. (Even if, for example, UN*Xes had mag tape ioctls to switch between "forwards mode" and "backwards mode", so that, if there were a mag tape that supported read backwards, you could put it in "backwards mode" and all subsequent reads would be done as "read backwards" until you put it back in "forwards mode", the rmt protocol used to dump to and restore from another machine's tape drive wouldn't end up sending data over the wire in reverse order, as, if the rat server did a read after getting a "backwards mode" ioctl sent to it and putting the tape driver into "backwards mode", the data would be in the correct order in memory, and that's what would be sent over the wire.)
- So I suspect "read backwards" is a detail that doesn't need to be mentioned here. It might be an interesting note for magnetic tape data storage#Sequential access to data; I don't see a page discussing sort utilities (as opposed to sort algorithms) - if there were, that might be another place to mention "read backwards". Guy Harris (talk) 02:09, 6 January 2019 (UTC)
- Helical scan drives fundamentally can't do read backwards. I asked in the talk page for LTO. Since LTO is IBM related, read-backwards support might be expected. Gah4 (talk) 00:15, 7 January 2019 (UTC)
- When the article mentions data transmission order, that's implicitly assuming that the data are transmitted in the right direction: from the first byte to the last one (as a sequence of bytes, e.g. like a string of characters). The endianness issue is whether the first byte of a multibyte integer is the most significant one or the least significant one. Vincent Lefèvre (talk) 12:24, 6 January 2019 (UTC)
(Meanwhile, in a digression in the "for the lulz" category, I was curious about modern tape drives.
I have the suspicion that the old Shugart Associates System Interface, to attach disks, tapes, and other peripherals to microcomputer systems, has evolved into something that's now the bus for Enterprise-Grade Storage And Archiving Hardware(TM), even if it's, for example, SCSI-over-Fibre Channel, with SCSI's place as "the low-end microcomputer peripheral attachment" being taken by (S)ATA, except perhaps for flash drives which have their own new standards. Are mainframe disks/tapes ultimately SCSI devices, even if there's some hardware that takes traditional channel commands and implements them by controlling the device over SCSI?
So I did a Web search for "SCSI read backwards", and, indeed, there's a SCSI READ REVERSE command.
However, I also found US Patent 6,148,278, "Emulating the S/390 read backwards tape operation on SCSI attached tape devices", held by IBM:
Disclosed is a method for attaching SCSI tape devices which may only read in a forward direction to an S/390 compatible computer system. This involves translating S/390 I/O operations for channel communication by emulating an S/390 "Read Backwards" channel command with a single "Read Backwards" routine to be used for all SCSI tape drives regardless of read capabilities in the backward direction. The emulation of a S/390 "Read Backwards" channel command includes using a combination of existing tape positioning commands, including: the RF (Read Forward) command, parsing technique to obtain the appropriate bytes to be stored into memory, and calculation for the residual byte count.
for, I guess, the benefit of tape devices that don't support READ REVERSE.
So any idea how much backwards-reading of tape drives is done these days? Does anybody do tape sorts any more?) Guy Harris (talk) 02:49, 6 January 2019 (UTC))
8080 and 6800
The article mentions the endianness of the 8086 and 68000, but these, to at least some extent, are successors to the 8080 and 6800. The 8080 is little-endian in the cases where it matters. I am not sure about the 6800. The 8086 was designed to be assembly source back-compatible with the 8080. (You could map instructions, though the opcodes and lengths were different.) As well as I know, the 6800 to 68000 did not have that feature. Gah4 (talk) 20:33, 1 June 2019 (UTC)
- According to page 3-3 of the M6800 Programming Reference Manual, on an interrupt, the upper 8 bits of the PC are stored in a byte at address m - 1 and the lower 8 bits of the PC are stored in a byte at address m;
- according to page 3-8, the subroutine call instruction saves the return address in a similar fashion;
- according to page A-46, the "load stack pointer" instruction loads the upper 8 bits of the stack pointer from address M and the lower 8 bits of the stack pointer from address M + 1;
- so I guess that counts as big-endian.
- It might have been possible to have a compiler that read 6800 assembly code and emitted equivalent 68000 assembly code (as I think DEC did to compile VAX assembler code to run on Alpha), but the 68000 instruction set was pretty clearly not designed with "being like the 6800" as a primary goal.
- Whether that indicates that the 68000 inherited its byte order from the 6800 is another matter. Guy Harris (talk) 22:26, 1 June 2019 (UTC)
A generalized description of endianness
I'm disappointed that my mathematical definition of endianness (Endianness is the sequential order in a list of arbitrary objects) was rejected in favor of yet another introduction that pretends that endianness only applies to bytes in words. As an electrical engineer, this CS-centric view of the universe drives me nuts. This is only what endianness means to computer programmers switching between Intel and non-Intel architectures, it is not a useful or general definition of endianness for other engineers who require a clearer definition. J.Mayer (talk) 16:42, 27 June 2019 (UTC)
- Can you provide any sources supporting the use of the term endianness in this context? Also, while it may be fine content for the article, it isn't at all clear to me that putting it in the lede would be necessary or appropriate. ␄ –Nucleosynth (t c) 17:25, 27 June 2019 (UTC)
incomplete and misleading
Yes there are various mixed-endian formats. It doesn't seem that we need all the details in the lede paragraph, though. Can we indicate that there are other possibilities without making it too confusing? Fill in the details later? Gah4 (talk) 06:28, 6 June 2019 (UTC)
- +1, the lede has far too much detail, much of which is superfluous or subtly incorrect. It could use a rewrite, IMO. ␄ –Nucleosynth (t c) 19:11, 17 June 2019 (UTC)
- IMO it's gotten even worse since I posted; this level of detail in a lede about stuff like 6800 microprocessors is actively harmful. ␄ ––Nucleosynth (t c) 16:39, 26 June 2019 (UTC)
- I've rewritten the introduction to try to address these issues ␄ –Sivix (talk) 19:17, 26 June 2019 (UTC)
- @Sivix: Thanks, this is a lot better IMO ␄ –Nucleosynth (t c) 19:21, 26 June 2019 (UTC)
- I agree, except for the last paragraph: "In English [...]" may not follow the history. I don't understand the part on bit-shift operations (note that bits may be numbered starting from 0 for the least significant bit, so that bit ordering would be little endian). I disagree on "This can lead to confusion when interacting with little-endian numbers." because literals are high level while endianness is a low-level notion (e.g. most programmers do not have to care about endianness); moreover, there is often a conversion between base 2 and base 10, and who cares about using the same endianness in such a case? Note also that when writing 0x1234 as a 4-byte integer type, 0x12 will not even be placed at the first address on a big-endian machine; so, big endian is not less confusing than little endian when considering literals. Vincent Lefèvre (talk) 19:40, 26 June 2019 (UTC)
- Perhaps they're thinking of confusion when reading memory dumps. In any case, that claim needs a citation; the citation might clear up what, if any, issues truly exist. Guy Harris (talk) 01:43, 27 June 2019 (UTC)
- Yes memory dumps do complicate things. Unix od -x dumps (without other options) 16 bit words, and so the wrong order if expecting to read bytes. VMS DUMP command prints hex data right to left, and ASCII data left to right, with the address in the middle. That solves one problem, though it takes a little while to get used to. Big endian hex dumps are much easier to read. Gah4 (talk) 07:32, 27 June 2019 (UTC)
- OK, but an example would be needed. And does this belong in the introduction? But asking for 16-bit words while expecting bytes is just wrong usage. Vincent Lefèvre (talk) 17:41, 27 June 2019 (UTC)
- I agree, but it seems to go back to Unix version 1. Well, od without options prints 16 bit words in octal, and od -x prints them in hex. I suspect that means we should blame Ritchie, but it is a little late for that now. Gah4 (talk) 21:44, 27 June 2019 (UTC)
- What matters nowadays is POSIX, where the standard output option for od is -t, for which you can choose the number of bytes; the -x option is just an XSI extension. Vincent Lefèvre (talk) 23:20, 27 June 2019 (UTC)
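To make the od -x effect discussed above concrete, here is a small C sketch that does roughly what the old od -x does: interpret memory as 16-bit words in host byte order and print each word's value.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const unsigned char data[] = { 'A', 'B', 'C', 'D' };   /* 41 42 43 44 */

    /* On a little-endian machine this prints "4241 4443", i.e. the bytes
       look swapped if you were expecting to read them one at a time. */
    for (size_t i = 0; i + 1 < sizeof data; i += 2) {
        uint16_t w;
        memcpy(&w, data + i, sizeof w);   /* host byte order */
        printf("%04x ", w);
    }
    putchar('\n');
    return 0;
}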
- Little-endian is definitely more difficult to read when looking at a hex view of any binary source (memory dump, file format, etc.). This is true whether the numbers are 16-bit or 32-bit (or 64-bit). In a big-endian file, the contiguous bytes 56 78 9A BC represent the number 0x56789ABC. In a little-endian file, the reader has to mentally rearrange them to 0xBC9A7856. –Sivix (talk) 18:00, 27 June 2019 (UTC)
- Anyway on both kinds of machines, memory dumps are hard to read, and debugging tools can present data in a structured way, where endianness no longer matters. Vincent Lefèvre (talk) 19:03, 27 June 2019 (UTC)
- w.r.t. bit-shift operations, most programming languages use "left" or "<<" to indicate shifting towards the most significant bit and "right" or ">>" to indicate shifting towards the least significant bit. That language assumes that bits are laid out in big-endian order. In reality, bits are not individually addressable and so don't have a real endianness, but if one thinks about them as `conceptually` big-endian ordered then it makes big-endian byte ordering easier to understand and little-endian byte ordering more difficult. Under this mental model, big-endian byte ordering arranges all the bits contiguously from most significant to least significant, while little-endian byte ordering results in "switch-backs" on byte boundaries (e.g. 31->24, 23->16, 15->8, 7->0 vs. 7->0, 15->8, 23->16, 31->24). –Sivix (talk) 17:52, 27 June 2019 (UTC)
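One thing that can be illustrated unambiguously in C: the shift operators are defined by numeric significance, not by storage layout, so they behave identically on either kind of machine.
#include <stdint.h>

/* x << 1 doubles the value (modulo overflow) and moves every bit toward
   the most significant end on little- and big-endian machines alike.
   Bit positions below are counted from the least significant bit, the
   usual "bit 0 = LSB" numbering. */
static int bit_is_set(uint32_t x, unsigned pos)
{
    return (int)((x >> pos) & 1u);
}
/* bit_is_set(1u << 5, 5) is 1 on any machine, whatever its byte order. */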
- Here, "left" just corresponds to the most significant bit. But for people who usually read right to left, or from the bit number 0 (usually the LSB) to the highest numbered bit, the endianness would be reversed. Vincent Lefèvre (talk) 19:03, 27 June 2019 (UTC)
- The paragraph is specifically talking about numeric representations in English. However, I'm not sure that any such person exists (i.e. one who reads numerals from least significant digit to most significant digit). Can you point to a popular language (or programming language) that uses little-endian numerals? —Sivix (talk) 19:50, 27 June 2019 (UTC)
- What you're saying is just meaningless. A programming language is not English (though it can contain English keywords). In C, what could be the closest to bit endianness is bit-fields: if the first bit-field in a structure is put on the MSB side, that could be regarded as big-endian, and if it is put on the LSB side, that could be regarded as little-endian; the ordering is implementation-defined. One could also talk about bit endianness in languages that have bit arrays, with bit addressing, but this can either be big-endian or little-endian. Outside of such notions, how people regard binary numbers is their matter: it can be MSB first or LSB first... Who cares... Vincent Lefèvre (talk) 23:20, 27 June 2019 (UTC)
- The point I'm trying to get across is that English (and every other major natural language that I'm aware of) places the most significant digit on the left side of a number and the least significant digit on the right side. This is also true for every programming language of which I am aware when it comes to numeric literals. This matters for the purposes of things such as bit fields and bit masks (and bit-shifting operators). For example, say I wanted to mask the three least significant bits of an integer. I would create a mask where the literal was int mask = 0b00000111;, or more realistically int mask = 0x07;. Both of those representations encourage thinking about binary numbers as if the bits were ordered in a big-endian fashion. If one thinks this way, then working with big-endian byte-ordering is conceptually much simpler than working with little-endian byte order for the reasons I've outlined above. Note that this potential confusion only matters when you're dealing with a stream of raw bytes. For example: when looking at a hex editor, or when reading bytes from a file, or when reading a stream of bytes from the network. If you're just moving integers around in memory with your programming language of choice then you (usually) don't need to care about endianness as that is all handled under the hood. However, when interacting with streams of bytes, endianness matters quite a bit, and little-endian byte streams are usually more difficult for humans to read and work with for the reasons I've described. —Sivix (talk) 00:45, 28 June 2019 (UTC)
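A small C illustration of the two levels being discussed: the mask and the literal behave identically everywhere, because & and the literal notation operate on values; the byte order only becomes visible when the raw object representation is examined, e.g. in a hex dump or a byte stream.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t value = 0x12345677;
    uint32_t mask  = 0x07;            /* the three least significant bits */
    uint32_t low3  = value & mask;    /* 7 on any machine: masking acts on
                                         the value, not on its byte layout */
    unsigned char bytes[4];

    memcpy(bytes, &value, sizeof value);
    /* Little-endian memory holds 77 56 34 12; big-endian holds 12 34 56 77.
       Only here, when the raw bytes are examined, does endianness show. */
    printf("low bits %x, first byte in memory %02x\n",
           (unsigned)low3, bytes[0]);
    return 0;
}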
- As well as I know it, that was the reasoning behind the choice for S/360. While for the most part, HLL programmers don't need to know it, it comes up pretty fast in assembler programming. For example, assemblers (at least the OS/360 ones) print the generated hex code to the left of the instruction. Numeric values in instructions are much easier to read if they are in the appropriate order. There are fixes for this, but they are somewhat ugly. A few years ago, Patterson gave a talk about Risc V, and passed out green cards.[1] About the first thing I noticed is that in the instruction formats, the opcode is on the right. Instructions are 32 bit little-endian words! (or 16 bit little-endian words.) Putting the opcode on the right just seems strange, though maybe not so strange for people used to reverse-polish notation. I suspect that the assemblers print the generated code right to left, to match this format, make numbers readable, and otherwise confuse us. Gah4 (talk) 01:03, 28 June 2019 (UTC)
- @Sivix: No, this is not comparable with big endian. If int mask = 0b00000111; were regarded as big endian, then for a 32-bit int, the first (left) 0 would be put on the MSB side of the register (or memory location), and so on, so that you would end up with 0x07000000, not 0x00000007. Vincent Lefèvre (talk) 09:35, 28 June 2019 (UTC)
- Which reminds me of a question I have long wondered, but probably this is the wrong article to discuss it, which is why do all (or at least most) programming languages use English keywords. (I suppose that leaves out APL.) For languages like Hebrew that naturally read right to left, how do they do numbers? Gah4 (talk) 21:44, 27 June 2019 (UTC)
- Numbers are written in the same way. If you consider that they are written left-to-right (opposite to the language writing direction), that would be similar to big-endian, and if you consider that they are written right-to-left (same as the language writing direction), that would be similar to little-endian. But note that independently of the language, with a right-to-left parsing, you know the weight of each digit, while this is not possible with a left-to-right parsing, where you need to know the number of digits. So, in natural languages, things can be more complex than endianness and should probably not be compared with endianness. Vincent Lefèvre (talk) 23:20, 27 June 2019 (UTC)
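To illustrate the parsing point above in C: reading digits from the least significant end gives each digit's weight immediately, while reading from the most significant end only works by rescaling what has been read so far, implicitly accounting for the digits still to come.
#include <stddef.h>

/* Least-significant-digit first: the weight of each digit (1, 10, 100, ...)
   is known immediately, with no need to know the total number of digits. */
static unsigned long parse_lsd_first(const char *digits, size_t len)
{
    unsigned long value = 0, weight = 1;
    for (size_t i = len; i > 0; i--) {
        value += (unsigned long)(digits[i - 1] - '0') * weight;
        weight *= 10;
    }
    return value;
}

/* Most-significant-digit first works only because each step rescales
   everything read so far: value = value*10 + digit. */
static unsigned long parse_msd_first(const char *digits, size_t len)
{
    unsigned long value = 0;
    for (size_t i = 0; i < len; i++)
        value = value * 10 + (unsigned long)(digits[i] - '0');
    return value;
}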
- The short answer is that early programming languages were created by English speakers, or by people employed by English-speaking institutions. After a while, the tradition stuck. Some languages experimented with multi-lingual keywords, but they never caught on. In particular, the dominance of C and C++ really cemented the behavior. Many compilers didn't even accept non-ASCII characters in source for a long time (I'm looking at you, Borland C). —Sivix (talk) 00:45, 28 June 2019 (UTC)
References
- ^ "RISC-V Reference Card" (PDF). riscv.org. Retrieved 28 June 2019.
non-binary?
The first sentence seems to suggest that endianness only applies to binary computers, though it then suggests it might also apply in non-computer applications. But it also applies in non-binary computer applications, though those are rare these days. BCD arithmetic isn't so rare, though there are even non-BCD decimal computers, such as those using a 2-of-5 code. If there were trinary computers, it would apply there, too. (There used to be rumors about Russian trinary computers, I don't know of any being verified.) Gah4 (talk) 19:30, 26 June 2019 (UTC)
- Ternary computer#History has references for the Soviet Setun ternary computer. Guy Harris (talk) 19:41, 26 June 2019 (UTC)
- I think that the introduction should limit to where the endianness issues occurred first: the order of bytes in a multi-byte integer when stored in memory. Then the notion of endianness was extended using the similar idea. Vincent Lefèvre (talk) 19:44, 26 June 2019 (UTC)
- BTW, the endianness issue with BCD is complex because there is the ordering of the two decimal digits in a byte and the ordering of the bytes in memory, thus 2 levels of endianness. Vincent Lefèvre (talk) 19:52, 26 June 2019 (UTC)
- The two-level problem is the origin of VAX middle-endian floating point, from a PDP-11 floating point option. Not to mention the numbering of bits. Gah4 (talk) 20:45, 19 September 2019 (UTC)
verilab
For an interesting reference and discussion on endianness, this paper from Verilab seems pretty good.[1]. Gah4 (talk) 20:10, 20 September 2019 (UTC)
References
- ^ Johnston, Kevin. "Endian: From the Ground Up" (PDF). www.verilab.com. Verilab. Retrieved 20 September 2019.
bits
I removed a claim with {{cn}} related to bit numbering that sounded like WP:OR. Bit numbering is an interesting question. While machines that can directly address individual bits are rare, those that can address bits within words or bytes are not so rare. The 68000[1] BCLR and BTST instructions address bits within a register or byte. One could address arbitrary bits in memory after shifting the bit offset from the byte address. Note also that the 68000 is big-endian in words, but BCLR and BTST use little-endian bit offsets. Machines (and as noted in the article, networking protocols) often number bits in addition to bytes or words. IBM S/360 and successors, which use big-endian order within words, use big-endian bit ordering in documentation. This documentation feature is especially interesting with the z/Architecture extension to 64 bits, as the 32 bit instructions now operate on bits 32 to 63 of registers. I believe that the IBM 7030/Stretch has bit addressing modes, but didn't look it up to be sure. Gah4 (talk) 21:05, 19 September 2019 (UTC)
- For some reason, the reference above doesn't work. No error message, it just doesn't appear! Gah4 (talk) 21:06, 19 September 2019 (UTC)
- Can reference names not be numbers? I changed the name and now it works. Gah4 (talk) 21:09, 19 September 2019 (UTC)
- Application of the term "endianness" to the numbering of bits has little meaning. There are a few cases where bits are transferred serially on a data line in some specified protocols, as noted in the article; these are pretty low-level, so not a very common "endianness" issue.
- The equivalent addressing issue for bytes in words doesn't really apply to bits, since (in all conventional computer hardware), bits are not individually addressable, there's no way to confuse which direction to go to increment to the next bit. Bits can be identified by number, but AFAIK this only affects some assembly/machine languages (like the one referenced here), not high level languages. In low-level computer hardware documentation, everybody but some legacy parts of IBM have the least-significant bit called "bit 0", but that is termed neither "little endian" nor "big endian"; I've only heard it called "the old IBM way" or "the normal way". And anyway, being only textual/verbal, this is also not the same level of issue as the byte-order-within-word endianness.
- Perhaps this could be made more clear in the article, but as I'm not sure how clear I've made it here, I'm unsure how to do it. Also, it's hard to get sources for a lack of application of something. --A D Monroe III(talk) 22:05, 19 September 2019 (UTC)
- It seems that IA32[2] bit instructions (BT, BTC, BTR) and bit scan (BSF, BSR) use bit addressing. Both 680x0 and IA32 seem to use little-endian bit addressing, even though 680x0 is byte big-endian. It seems that bit addressing isn't all that unusual. I believe both ignore high bits in the bit offset, such that one can take a bit address, shift right to get the byte address, then apply the instruction using the bit address in the bit offset operand. The BSF and BSR instructions return the bit position of the first or last (LSB or MSB, respectively) set, again with little endian addressing. So, it seems that bit level addressing isn't all that unusual. Gah4 (talk) 06:51, 20 September 2019 (UTC)
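The bit-offset convention just described, written out as a C sketch: the byte holding bit n is found by shifting the bit offset right by 3, and the bit within the byte is offset & 7, counted from the least significant bit, the numbering the 680x0/IA-32 bit-test instructions use.
#include <stddef.h>

/* Bit n of a memory area lives in byte n >> 3, at position n & 7 within
   that byte, counted from the least significant bit. */
static int test_bit(const unsigned char *base, size_t bitno)
{
    return (base[bitno >> 3] >> (bitno & 7)) & 1;
}

static void set_bit(unsigned char *base, size_t bitno)
{
    base[bitno >> 3] |= (unsigned char)(1u << (bitno & 7));
}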
- No, this is not bit addressing; this is just a convention for the ISA, hidden to programming languages. There is no endianness issue there. Compare with the usual endianness issues: you store a 32-bit word in a file on some platform, and when you re-read it from the file on a different platform, you get a different value. There is no such issue with bits: in a byte, the bits are always stored in the same order on all platforms. Vincent Lefèvre (talk) 09:20, 20 September 2019 (UTC)
- Agree, and better stated than my attempt at the same point. Should we add some form of this version of how endianness doesn't apply to the article in the Endianness#Bit endianness section? --A D Monroe III(talk) 15:03, 20 September 2019 (UTC)
- If you have, for example, a bit image then you need to specify the how the bits in the image map into bytes. In that case, it does matter how it goes into a file, and back out again. For example, Postscript[3] on page 290, specifies that the image operator considers bytes as a bit stream with the high-order bit of each byte first. Why do they specify it? Because it could be the other way. It might be that this order is so common for bit image files, that we don't think about that it could be the other way. As with digits, we write binary values with the MSB on the left, so big-endian seems an obvious way to write bit images. Then again, big-endian seems an obvious way to write bytes, too. Not so obvious, the default user coordinate system for Postscript has the y-axis going up, so data comes in in decreasing Y order, but increasing X order. See page 184 for coordinate system discussion. Are there no bit little-endian image file formats? Gah4 (talk) 18:32, 20 September 2019 (UTC)
- If only high-level languages count, it seems that some C implementations have the _bittestandcomplement intrinsic function, which allows for addressing bits. I don't know if it is in any C standard yet. Gah4 (talk) 18:32, 20 September 2019 (UTC)
- BMP_file_format#Pixel_format also mentions that the leftmost pixel is in the most-significant bit of the first byte. As with the Postscript coordinates, rows go from the bottom to the top, but by making the height negative, you can store them top to bottom. Gah4 (talk) 18:47, 20 September 2019 (UTC)
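The opposite convention within a byte, as in the PostScript and BMP 1-bit-per-pixel rows mentioned above: the leftmost pixel goes into the most significant bit of the first byte. A minimal sketch:
#include <stddef.h>

/* MSB-first packing: pixel x of a row lives in byte x >> 3, with the
   leftmost pixel in the most significant bit, the opposite within-byte
   order from the little-endian bit offsets of the previous sketch. */
static void set_pixel(unsigned char *row, size_t x)
{
    row[x >> 3] |= (unsigned char)(0x80u >> (x & 7));
}

static int get_pixel(const unsigned char *row, size_t x)
{
    return (row[x >> 3] >> (7 - (x & 7))) & 1;
}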
- I don't understand the point of this. Are you speculating that big or little endian for bits could exist? How would this affect the article? --A D Monroe III(talk) 20:37, 20 September 2019 (UTC)
- There is some form of bit endianness, but in the above examples, they do not correspond to bit addressing (this notion was mentioned earlier). Moreover, for file formats, there are no consequences on portability if the endianness is fixed. The only place I can see where bit endianness affects portability, i.e. may be different on different platforms, is bit-fields in C (actually, that's bit-field endianness, but with bit-fields of width 1, this corresponds to bit endianness): in an integer type, they could be ordered from the most significant bit or from the least significant bit, depending on the implementation. Vincent Lefèvre (talk) 21:47, 20 September 2019 (UTC)
- Hmm. I've never heard C's bit-field-order issue described as "endianness", but I agree it does have some similarities. Should this be explicitly added to the article? Can we find a source that links this issue with "endianness"? --A D Monroe III(talk) 15:39, 21 September 2019 (UTC)
To see that bit endianness is real, consider the IBM 2701 Data Adapter Unit for S/360,[4] figure 16, which is a bit-reversed ASCII table. Software using the 2701 has to bit-reverse data, presumably with a TR (translate) instruction, before sending or after receiving. Even though it is 1965, the table isn't labelled as ASCII. Some IBM terminals, such as the 2741, send MSB first, so the hardware works that way. Gah4 (talk) 22:49, 4 October 2019 (UTC)
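Roughly the kind of translation involved, as a C sketch: a 256-entry bit-reversal table, which is what a TR instruction would be given (or what a loop like this computes) before sending to, or after receiving from, a device whose bit-serial order is the reverse of the software's.
/* Reverse the bit order within one byte, and build the 256-entry
   translation table used to fix up such data. */
static unsigned char reverse_bits(unsigned char b)
{
    unsigned char r = 0;
    for (int i = 0; i < 8; i++)
        r = (unsigned char)((r << 1) | ((b >> i) & 1));
    return r;
}

static void build_reverse_table(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = reverse_bits((unsigned char)i);
}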
- It's certainly real when data is serialized, in the sense of "turning it into a serial bit stream". That's not an issue for in-memory data on non-bit-addressible machines. Guy Harris (talk) 23:35, 4 October 2019 (UTC)
- Well, the article is not titled Endianness as visible in memory, but at least 680x0 and IA32 have instructions to modify and test a bit within a byte. I believe they ignore high order bits, so you can take a bit address, copy it to another register and shift right 3. Then use that address for a byte, the unshifted address for the bit within the byte. That is about as easy as addressing bytes in Alpha. (Well, not quite. Alpha load/store instructions ignore the low bits, so you don't have to shift.) So it takes a few instructions, but the bit addressing is built into the bit set/clear/test instructions. Not as convenient as a single instruction to address a bit, though. For other machines, you need a lookup table in memory, and that determines the endianness. Bit set/clear is pretty convenient for generating bitmap graphics files, or memory mapped bitmap displays. An endianness choice was made in designing those instructions. And note that in the 2701 case, it is visible in memory after the data is stored by the channel. (I knew about the 2701 from a story about someone who learned this the hard way, while debugging a program using it.) In theory, it could happen in any I/O device (tape, disk, etc.) but people are pretty good at getting that one right. Those that get it wrong probably go out of business, except that IBM didn't have so much competition for the 2701. Gah4 (talk) 00:06, 5 October 2019 (UTC)
- The 68k family's bit and bit-field instructions can have the bit number/bit offset in a register, so the 68k ISA has a notion of bit order. (At least as of the 68020; the 68000 manual I checked doesn't indicate where the bit number comes from in the bit instructions, and the bit-field instructions first showed up in the 68020.) The same applies to x86 (at least as of IA-32) and VAX.
- For machines where you need a lookup table, the ISA obviously has no notion of bit order, so that's defined by the software. Guy Harris (talk) 03:29, 6 October 2019 (UTC)
- Again, is this searching for things that could be considered bit-endianness? This is an encyclopedia; we reflect current published information, not discover or invent new information. If there's a reliable source that specifically calls something "bit-endianness", then we should probably include it in this article. Without such a source, we cannot include it, per WP:SYNTH and WP:OR. --A D Monroe III(talk) 23:53, 4 October 2019 (UTC)
- Endianness existed a long time before it got named, but otherwise. The reference in the next section has a good discussion of it, including naming it. Gah4 (talk) 00:09, 5 October 2019 (UTC)
- Concerning the order of the bit-fields in C, the ISO C99 standard says in 6.7.2.1p10: "The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined." (this still seems to be the same text in C17/C18, but I don't have the final version). Vincent Lefèvre (talk) 00:10, 5 October 2019 (UTC)
- I agree that source notes that the order of bit-fields in C is implementation-dependent. Although not stated in the source, I'd guess this could be an issue for the same code running on two different machines that are sharing such data. But that doesn't mean it's called "bit endianness". WP can't label it as such without a source that specifically states this, per WP:SYNTH. --A D Monroe III(talk) 23:13, 5 October 2019 (UTC)
- It is not bit endianness (but in the particular case where all the bit-fields have size 1, you get something similar to bit endianness; however, note that the memory is not involved: this is the ordering in the integer, not in its memory representation, i.e. in memory, this will be a combination between the bit-field ordering and the usual endianness). Note also that there is no notion of endianness in C, but that doesn't prevent one to have a WP article on endianness with examples written in C. Then there are many sources that draw a parallel between bit-field ordering and endianness. https://www.google.com/search?q=bit+fields+endianness and https://www.google.com/search?q=%22bit+fields%22+endianness are a start... Vincent Lefèvre (talk) 23:44, 5 October 2019 (UTC)
- Yes the C standard means that you might find different bit field allocation, even on machines with the same byte endianness. Reading it above, I suspect that it would also apply to different bit endianness, even with the same byte endianness, though it would be surprising to see that. I believe bit fields are commonly used in, for example, file system directory entries. Sysgen must then use a compiler with the appropriate bit-field allocation. And, of course, it all fails with different byte endianness. Gah4 (talk) 01:10, 6 October 2019 (UTC)
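An illustrative C fragment of the implementation-defined allocation quoted above: whether the first-declared bit-field lands next to the most significant or the least significant end of the storage unit is up to the compiler/ABI, which is why overlaying bit-field structs on externally defined data isn't portable, while explicit shifts and masks are.
#include <stdint.h>

/* Whether "first" is allocated next to the most or the least significant
   end of the storage unit is implementation-defined (C99 6.7.2.1p10). */
struct flags {
    unsigned int first  : 1;
    unsigned int second : 3;
    unsigned int rest   : 28;
};

/* Portable alternative: pick and document a layout, then use explicit
   shifts and masks.  Here "second" is assumed to occupy bits 1..3. */
static unsigned get_second(uint32_t word)
{
    return (unsigned)((word >> 1) & 0x7u);
}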
OK, so if we look at the BITS_BIG_ENDIAN #define in GCC, which is described in the GCC documentation thus:
- Macro: BITS_BIG_ENDIAN
- Define this macro to have the value 1 if the most significant bit in a byte has the lowest number; otherwise define it to have the value zero. This means that bit-field instructions count from the most significant bit. If the machine has no bit-field instructions, then this must still be defined, but it doesn’t matter which value it is defined to. This macro need not be a constant.
- This macro does not affect the way structure fields are packed into bytes or words; that is controlled by BYTES_BIG_ENDIAN.
and see what values it, and BYTES_BIG_ENDIAN, have for various platforms in GCC 3.4.6, we see, for example, that UNICOS/mk on Alpha has a little-endian bit order and a big-endian byte order (yes, big-endian Alpha!), and IA-64 and MIPS have a little-endian bit order regardless of whether they're running with big-endian or little-endian byte order, so there are, in fact, processors where the compiler's bit order doesn't match the machine's byte order, as well as several where they do match.
I'm not sure how commonly bit fields are used in file systems. A quick look at the headers in the XNU-3247.1.106 implementation of HFS+ found only 3 fields in in-memory data structures using bitfields; they're not used in on-disk data structures, they just do shifting and masking. (And that's a file system implementation that has to work on both big-endian PowerPC processors and little-endian x86 processors, as well as ARM processors running little-endian. It's big-endian on all platforms, which upset Linus Torvalds.) Looking at FreeBSD's file system implementations - UFS and ZFS, as well as ISO 9660 and FAT - in a not-too-old checkout from Subversion reveals the same. Linux's headers in the fs directory have more bitfields, but I didn't take the time to see whether they're used for on-disk data structures or just in-memory data structures. Guy Harris (talk) 02:56, 6 October 2019 (UTC)
- Note that BITS_BIG_ENDIAN is just internal (that's in gccint, not in the main GCC manual), for internal conventions. Concerning the bit-fields ordering, there is a difference with byte endianness: in byte endianness, there is a notion of most significant and least significant, but not in bit-fields ordering. Vincent Lefèvre (talk) 08:33, 14 October 2019 (UTC)
- I presume it is related to the ordering of the bit fields in an int. If all bit fields have length one, then it is the bit ordering in an int. Then there is padding, unless it fills exactly. I always assumed that bit fields were first used in early Unix systems, such as for directories. As long as there was only one compiler, there was only one ordering. Gah4 (talk) 11:40, 14 October 2019 (UTC)
- BITS_BIG_ENDIAN defines how the generated code treats bit fields within a structure, so it's not "just internal" in the sense of only controlling how GCC itself stores bit-fields (which would matter only if GCC is compiled with GCC); it affects code generated by GCC. I.e., it indicates what bit-ordering GCC uses for the platform in question.
- And if we treat Version 7 Unix as an "early UNIX" (Version 6 Unix came out with a C compiler that didn't support bit fields), as an "early UNIX system", a directory entry doesn't contain any bit fields - all it contains is a file name and an inode number. Furthermore, an inode contains no bitfields, either; the bits in the di_mode field of the inode are defined with octal #defines under "modes". Guy Harris (talk) 16:12, 14 October 2019 (UTC)
- BITS_BIG_ENDIAN is not available to the user, thus is entirely internal. Of course, the bit-field order can affect the user code, but this is the case with any C compiler. Vincent Lefèvre (talk) 21:53, 14 October 2019 (UTC)
- None of this depends on BITS_BIG_ENDIAN or BYTES_BIG_ENDIAN being internal; their settings are visible if you look at the generated code, so they are relevant to the issue of whether you could have different bit-endianness on two different platforms with the same byte-endianness, as per Gah4's comment "Reading it above, I suspect that it would also apply to different bit endianness, even with the same byte endianness, though it would be surprising to see that."
- I.e., GCC is, in fact, surprising, as my examination of the settings of BITS_BIG_ENDIAN and BYTES_BIG_ENDIAN on various platforms reveals that, in fact, not all platforms with a given byte order have the same bit order. Guy Harris (talk) 23:13, 14 October 2019 (UTC)
References
- ^ "The 68000's Instruction Set" (PDF). www.tigernt.com. Retrieved 19 September 2019.
- ^ "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z" (PDF). www.intel.com. Intel. Retrieved 20 September 2019.
- ^ "PostScript®LANGUAGE REFERENCE, third edition" (PDF). www.adobe.com. Adobe. Retrieved 20 September 2019.
- ^ "IBM 2701 Data Adapter Unit Principles of Operation" (PDF). bitsavers.org. IBM. p. 27. Retrieved 4 October 2019.