Talk:Endianness

Wordmarks

The section on 'wordmarks' is original research, and isn't really related to endianness. The source doesn't mention the endianness at all, and the wordmark doesn't affect the endianness anyway. As far as I can tell, the IBM 1400 allowed certain instructions to address the end of a variable length word rather than the start, but this doesn't affect how the word was stored in memory. This has little to do with endianness, and is more likely to confuse than enlighten. Pburka (talk) 14:15, 22 March 2015 (UTC)

Endianness is not just about how multi-address data is stored (which is big-endian in those machines), but also how it is addressed and processed in relation to the contents. Starting at the high address and working toward the low is not in and of itself indicative of either endian, but when the high address contains the LSB, that is characteristic of little-endian.
Yeah, "some instructions" - try all of the arithmetic except divide, and the ordinary memory-to-memory move (copy) instruction. Arithmetic is the case where endianness matters most. True, the source does not use the word "endianness", but since the 1401 came out long before that word was applied to computing, that is not surprising.
It is not OR to state that addressing a multi-location operand by its least significant location is characteristic of little-endian; that is exactly per Cohen - not only the paper, but the very quotation in our reference; did you miss it?
It is the question of which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word? The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians.
When the 1401 does arithmetic other than divide, or executes a Move instruction, the "bit" from the "little end" (least significant end) of the "word" (multi-character operand) "travels" (is used in arithmetic or is moved to the destination) first.
i.e. yes, this has everything to do with endianness. The wordmark is related in that it tells where the "big end" is.
However, this section does cover the same ground as the much-longer-standing paragraph on the 1401 in the "mixed endian" section, so I'm puzzled as to why someone thought it was needed. Jeh (talk) 15:10, 22 March 2015 (UTC)
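For readers following the storage-order versus processing-order distinction above, here is a minimal C sketch; it is an editorial illustration, not taken from Cohen or the IBM manuals, and it shows only the storage-order criterion: the same 32-bit value inspected byte by byte in increasing address order.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the bytes of a 32-bit value in increasing address order.
 * On a little-endian host the least significant byte (0x0D) prints first;
 * on a big-endian host the most significant byte (0x0A) prints first. */
int main(void) {
    uint32_t value = 0x0A0B0C0D;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);   /* copy the in-memory representation */
    for (size_t i = 0; i < sizeof value; i++)
        printf("address offset %zu: 0x%02X\n", i, bytes[i]);
    return 0;
}

Nothing in this sketch says anything about the order in which an instruction processes the bytes, which is the separate criterion discussed above.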
Dear Jeh, in the 1401 section, I guess it is terribly important to mention the completely different approach with the wordmarks (which allows addressing a data item at both ends - like a sausage). --Nomen4Omen (talk) 16:36, 22 March 2015 (UTC)
Dear Pburka, the section on 'wordmarks' is not at all OR. You can find it (if you really wish) in the referenced "Principles of Operation". --Nomen4Omen (talk) 16:36, 22 March 2015 (UTC)
But the ability to address things from either end (according to instruction) is not really what makes the 1400 series mixed-endian. That is due to the "mismatch" between the order of storage in memory (relative to increasing memory address) and the order of processing. By the way, most of that aspect of the 1400 series could have been done without wordmarks, by including a length operand with each memory address operand, so characterizing this as a property of "wordmark machines" is not accurate. Jeh (talk) 19:35, 22 March 2015 (UTC)

For me, it is not at all important to classify the 1401 as mixed-endian. (It is a big-endian machine, by the core criterion.) The wordmark architecture (which was given up after the 1401, as far as I know, probably because it cannot be carried through, or is of no interest, for arbitrarily long multiplications and divisions) allows for the sausage kind of access. The sausage kind of access removes a main argument for little-endian, namely the immediate access to the units position of a data field. So, IMHO the discussion has relevance for the article.

By the way, "Endian" is a misnomer: it should be called Big- and Little-Startian or maybe -Beginnian, because the sausage starts at the edge in attention (= its "handle" = its address) and at its end (there where the field ends) big-endian data is small, and little-endian data is big at its right end. (In Germany there is a saying: Everything has a begin and an end, but a sausage has two ends.) --Nomen4Omen (talk) 20:50, 22 March 2015 (UTC)

As I detailed above, the "core criterion" per that particular passage in Cohen (one of the defining sources for this material) is the order in which the data is processed. So it's little-endian for some operations. The fact that you do that by addressing the little end in your instruction confirms that. However he goes on to illustrate several different machines, characterizing them according to the order in which data is stored. Yes, by that criterion, the 1401 is big-endian. So it's mixed. I really see no room for any credible argument there.
One could also have wordmarks with only little-address-end access. Conversely, a machine that implements variable-length operands via length operands (as the VAX uses for its instructions with variable-length operands, like MOVC3, MOVC5, all of the packed decimal arithmetic, etc.) would not need wordmarks for anything. It could even implement 1401-style big-endian storage and addressing, little-endian usage for arithmetic. (Take the start address, add the length, subtract one, the result is the highest addressed location for the operand; start there and work back to lower addresses.) So the wordmark is not essential for either-end access, and does not imply mixed-endian. It is just how this series of machines did it.
I think characterizing the 1401 as "mixed-endian" is very important. The wordmark architecture is described in detail in the 1401 article.
"endian" may be a misnomer, but it's defined in our references, so we're kind of stuck with it. Inventing another term and saying "we should use this" is ok for the talk page, but not for the article. Jeh (talk) 21:25, 22 March 2015 (UTC)
You say: "but when the high address contains the LSB, that is characteristic of little-endian" ? Isn't that characteristic of big-endian ?? --Nomen4Omen (talk) 21:47, 22 March 2015 (UTC)
That's out of context; it's missing the words that came immediately before. I'll make it easier: What makes it little-endian is that the operand addresses the LSB, and the operation starts with the LSB and works toward the MSB. The latter is completely supported by Cohen's description that I quoted above. Jeh (talk) 22:58, 22 March 2015 (UTC)
That "the operation starts with the LSB and works toward the MSB" is not a characteristic of the machine, it's characteristic for the operation. And there are operations which work this way, namely add, sub, mult; but div works from MSB to LSB. --Nomen4Omen (talk) 23:23, 22 March 2015 (UTC)
M (move) also works from LSB to MSB. That's a pretty fundamental operation.
Anyway, IBM disagrees with you. From Page A-7 of the System Operations Reference Manual for the 1401 and 1460:
"A data field in core storage is addressed by specifying the low-order (units) position of the field in the A- or B-address of the instruction. The data field is read from right to left until a word mark in the high-order position is sensed."
This is the general rule; it precedes all of the descriptions of individual instructions. Yes, a few exceptions exist; they are described with the respective instructions. One of those is the record move instruction. Divide is another. Both of those, by the way, were optional on the 1401.
Of course, there is also:
"An instruction in core storage is addressed by giving the high-order (operation code) position of the instruction."
So. Data is addressed by the low-order position in all but a few cases, and some things, including instructions, are addressed by the high-order position.
If that isn't mixed-endian, I don't know what is. Personally, I don't agree that that's what makes it mixed-endian—to me that's the order of operation vs. the order of storage—but it seems to me that you've made a very powerful case for "mixed-endian" based on a completely different criterion. So I don't understand why you're fighting so hard against that term. Perhaps we should call it "confusing-endian"... Jeh (talk) 05:13, 23 March 2015 (UTC)
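To make the SORM rule quoted above concrete, here is a minimal C sketch; the struct location type, add_to_field, and the immediate addend are invented for illustration (the real machine adds one field to another). The point is the access pattern: the instruction addresses the units position, and the scan runs toward lower addresses until the word mark is sensed.

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical model of 1401-style storage: each location holds one decimal
 * digit plus a word-mark flag; the word mark sits at the high-order (lowest
 * addressed) position of a field. */
struct location {
    unsigned char digit;
    bool wordmark;
};

/* Add a small binary constant to the field whose units (low-order) position
 * is addressed by 'units', working from right to left until the word mark in
 * the high-order position is sensed.  Overflow out of the field is dropped. */
static void add_to_field(struct location *memory, int units, unsigned int addend) {
    int addr = units;
    for (;;) {
        unsigned int sum = memory[addr].digit + addend % 10;
        addend = addend / 10 + sum / 10;     /* remaining addend digits plus carry */
        memory[addr].digit = (unsigned char)(sum % 10);
        if (memory[addr].wordmark)           /* high-order end reached: stop */
            break;
        addr--;                              /* next lower address */
    }
}

int main(void) {
    struct location mem[3] = {
        { 8, true  },   /* high-order digit; word mark set here */
        { 4, false },
        { 3, false },   /* units position; this is the address in the instruction */
    };
    add_to_field(mem, 2, 157);   /* 843 + 157 = 1000; only the low three digits fit */
    printf("%d%d%d\n", mem[0].digit, mem[1].digit, mem[2].digit);   /* prints 000 */
    return 0;
}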
@Jeh Very late insert on March 27:
(1) I couldn’t get hold of the SORM (System Operations Reference Manual) for the 1401. Maybe you can give me a pointer.
(2) But I have "A22-0526 IBM 1410 Principles of Operation".
(3) Like you, I thought that M is "a pretty fundamental operation".
(4) But as I found out in the PoO, p. 25, although it is really fundamental, it is a fairly complex operation. Its mnemonic isn't M; it is D for Data Moving, and M is a mnemonic within the "d-character" of that D instruction. This d-character specifies e.g. the direction of the move, i.e. left to right or right to left, but also other information, e.g. how word marks should be handled. All 64 combinations of the 6 bits of the d-character have a specific meaning, which is shown in Figure 21. In Figure 23 you can read M = Move = Move data serial by character. On the same level as M is SCN = scan, which does not move data but only affects the A- and B-address registers. If this is true, one cannot take M as a prime example for right-to-left or big- or little-endian.
(5) You refer to Cohen as one of the defining sources for this material. Now I have taken the trouble to read his article, and I found out: he doesn't get the point. See below the new section Danny Cohen.
--Nomen4Omen (talk) 10:47, 27 March 2015 (UTC)


I would say that there is neither an LSB nor an MSB with a Move instruction, because there is no "significance". Bytes, bits or digits have significance especially in the context of arithmetic, where we have "positional" numeral systems, which are adopted by computer systems. --Nomen4Omen (talk) 05:55, 23 March 2015 (UTC)

That misses the point. You claimed that address-by-LSB-end/high-address-end was not a property of the machine, but rather of the arithmetic instructions. Well, the Move instruction is quite clearly a non-arithmetic instruction that does the same thing. Again: In the SORM the operand address method is not described for each individual instruction. It simply says ~"operands are addressed by LSB end" before any instructions are described.
Besides that: I refer you to the 14xx's Compare (C) instruction. You see, bit significance in memory doesn't only apply to numbers. When we compare two character strings, perhaps intending to determine which comes first in a sorted list, we treat the first letter of the string (the one on the left as it would normally be printed) as the most significant—just like when we compare numbers, the digit on the left is the most significant. Yet the 1401's Compare op starts from the "little end". This is inefficient, and the fact that it is inefficient is further indication (not that any is really needed) that this operand address and access order is a property of the machine. (I believe that all of the instructions that do it the other way are optional on the 1401, leading me to suspect that the "operand access big end first" hardware was itself a major add-on, not part of the base machine.) Jeh (talk) 20:54, 23 March 2015 (UTC)
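Jeh's efficiency point about Compare can be illustrated with a small C sketch; both functions are invented for illustration and are not how any 14xx instruction is implemented. Comparing from the big end can stop at the first difference; comparing from the little end must examine every position, because a more significant difference seen later overrides an earlier one.

#include <stdio.h>
#include <stddef.h>

/* Compare two equal-length character fields starting at the big
 * (most significant) end: the first difference found decides the result. */
static int compare_from_big_end(const char *a, const char *b, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;   /* can stop immediately */
    return 0;
}

/* Compare starting at the little (least significant) end: every position
 * must be examined, since a difference found later is more significant and
 * overrides whatever was decided earlier. */
static int compare_from_little_end(const char *a, const char *b, size_t len) {
    int result = 0;
    for (size_t i = len; i-- > 0; )
        if (a[i] != b[i])
            result = (a[i] < b[i]) ? -1 : 1;
    return result;
}

int main(void) {
    /* Both orders give the same answer; only the amount of work differs. */
    printf("%d %d\n",
           compare_from_big_end("APPLE", "APPLY", 5),
           compare_from_little_end("APPLE", "APPLY", 5));   /* prints -1 -1 */
    return 0;
}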
Sorry, Jeh, I understand very little of what you are saying.
1) What is SORM?
2) What is "address-by-LSB-end/high-address-end", and where did I claim something about that?
A) What I know is: LSB means Least Significant Bit, meaning there is a ranking (called "significance") of the bits, bytes or digits which has meaning (to the operation) beyond (but maybe correlated or anticorrelated with) its address. And with Move there is no such ranking beyond the address. With arithmetic (or with Compare) there indeed is.
B) And the wordmark at first glance has nothing to do with LSB, MSB, big- or little-endian. It just is a feature which separates, or is able to separate, fields in memory. So too in the 1401. Additionally, for arithmetic instructions (and Compare) the 1401 defines the high-order digits at the low-address wordmark and the low-order digits at the high address, and depending on the operation it can choose the more efficient alternative of access.
C) By the way, although the 8086 does not have a long compare (it has to be implemented by a subroutine or a sequence of instructions), the ranking of the bytes in such a long compare is (as in the telephone book and almost everywhere) a big-endian style ranking. But because this is software, where one can do terrible things, this does not make the Intel machines big- or mixed-endian. --Nomen4Omen (talk) 23:22, 23 March 2015 (UTC)
The very fact that this debate is occurring is evidence that the section is original research. It appears that the endianness of the 1401 is ambiguous, the only source provided is primary, the source requires expert interpretation, and experts (Nomen4Omen and Jeh) disagree on how to interpret the source. Unless we can find a reliable source which specifically talks about endianness and the 1401, we shouldn't include this information in the article. Pburka (talk) 00:02, 24 March 2015 (UTC)
I doubt whether it is fair to stop the debate only because it is lengthy. I am not a native speaker of English and sometimes have difficulty understanding some acronyms.
As to the 1401 with its wordmarks, I find it really worth mentioning in the context of this article as a remarkable approach (although it was not successful). Jeh and I are debating how to classify it. He seems to strongly support mixed-endian. For me this classification is not really important; I would be satisfied without an endian classification, or with exotic-endian. Maybe a few short words and a reference to the 1401 article are sufficient. --Nomen4Omen (talk) 07:35, 24 March 2015 (UTC)
For the SORM, here's the System Operation Reference Manual, IBM 1401 Data Processing System, IBM 1460 Data Processing System. The quote in question is on page A-8 in that version. Guy Harris (talk) 04:16, 28 April 2015 (UTC)
I find it remarkable that Nomen4Omen describes a key design point of a machine as successful as the 1401 was as "not successful". Allowing for the context of its era the 1401 was one of the most successful computers of all time.
As for the endianness issue, I see the point that if we consider only how things are stored, the 1401 is a big-endian machine. I don't care about describing it as "mixed endian" any more. Jeh (talk) 07:54, 28 April 2015 (UTC)

Remove Example: Interpretation of a Hexdump

I'm for removing this whole section. Reasons:

  1. Confusing. I work in this field, and can't easily follow the table or text. Readers that don't already know this subject will be utterly confused. It uses terms and notation (like hexadecimal notation) without introducing them.
  2. Inaccurate. It mixes byte-endianness with bit-endianness, which are separate subjects.
  3. Not relevant. I don't think a how-to for interpreting hexdumps belongs in WP.

--A D Monroe III (talk) 16:03, 8 June 2015 (UTC)

I agree. Kbrose (talk) 16:36, 8 June 2015 (UTC)

Telephone number

The article contains the following sentence: "The telephone network has always sent the most significant part first, the area code; doing so allows routing to begin while a telephone number is still being keyed or dialed." However a telephone number is not an integer, and in particular, not a 32-bit or 64-bit integer, just a sequence of digits, which can be seen as a character string. So, this is not related to endianness and IMHO, this example should be removed. – Vincent Lefèvre (talk) 20:48, 8 June 2015 (UTC)

Done. Kbrose (talk) 00:34, 9 June 2015 (UTC)
Indeed, in Danny Cohen's paper defining "endianness", fixed-length integers are his main focus. But even in his first part, MEMORY ORDER, he considers character strings and the English language (which, by the way, is always big-endian). Nevertheless, or therefore, the endianness of a computing machine is defined according to the endianness of its fixed-length integers. In the second part, TRANSMISSION ORDER, he fears the great misunderstanding (and the war) and also uses the terms big- and little-endian. In this part a restriction to fixed-length integers is even less appropriate, and he observes big discrepancies even within one and the same communication standard. So, the observation "The telephone network has always sent the most significant part first, the area code; doing so allows routing to begin while a telephone number is still being keyed or dialed." is very much related to endianness and should receive some room in the article. --Nomen4Omen (talk) 17:09, 9 June 2015 (UTC)
The English language is not big-endian. It is just a sequence of characters. Ditto for telephone numbers. The concept of endianness occurs on integers because they usually don't fit in a single byte, so that they need to be represented as a sequence of bytes, and there is no canonical way to do that. – Vincent Lefèvre (talk) 21:38, 9 June 2015 (UTC)
The argument for telephone numbers isn't entirely without merit. The area code would seem to be the most significant part of the number. However, Wikipedia is not the place to publish original research. Unless someone can find a reliable source describing telephone numbers as being big endian, it doesn't belong in the article. Pburka (talk) 23:33, 9 June 2015 (UTC)
A telephone number is not really a number (despite the term "number" used here). When one writes a number, if a leading 0 is added, the value is not changed. In a telephone number, if a leading 0 is added, everything is changed. I wouldn't say "most significant part", but "leading part". But there's nothing wrong with putting the leading part in the least significant byte of a register, for instance (the "little-endian" way of doing it); with this convention, one could say that the telephone number is represented in little-endian. So, the source needs to be objective, not just say that a phone number is big-endian without justification (or because people tend to think big-endian and thus make wrong assumptions, such as left = most significant). – Vincent Lefèvre (talk) 12:39, 10 June 2015 (UTC)
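A small C sketch of the convention point above; the nibble packing is invented purely for illustration and is not how any signalling system stores numbers. The first-dialed digit can be placed in the least significant nibble of a register just as easily as in the most significant one, so calling the digit string itself big- or little-endian presupposes a storage convention.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Pack a dialed digit string (up to 16 digits) with the first-dialed digit
 * in the LEAST significant nibble of the result. */
static uint64_t pack_leading_digit_low(const char *digits) {
    uint64_t packed = 0;
    size_t len = strlen(digits);
    for (size_t i = 0; i < len; i++)
        packed |= (uint64_t)(digits[i] - '0') << (4 * i);
    return packed;
}

/* Pack the same string with the first-dialed digit ending up in the MOST
 * significant occupied nibble. */
static uint64_t pack_leading_digit_high(const char *digits) {
    uint64_t packed = 0;
    for (size_t i = 0; digits[i] != '\0'; i++)
        packed = (packed << 4) | (uint64_t)(digits[i] - '0');
    return packed;
}

int main(void) {
    printf("%llx\n", (unsigned long long)pack_leading_digit_low("5551234"));   /* 4321555 */
    printf("%llx\n", (unsigned long long)pack_leading_digit_high("5551234"));  /* 5551234 */
    return 0;
}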
We don't require sources to justify their logic. If reliable sources said that telephone numbers have endianness, we could include that, even if they provide no explanation. But it's moot, since, as far as I know, there are no such sources. It's fun to debate these things, but my point stands that Wikipedia articles are based on reliable sources, no more, no less. Pburka (talk) 23:13, 10 June 2015 (UTC)
Telephone numbers are simply character strings representing signaling events, and for the first few decades after their invention they were never stored or even transmitted until digit registers were invented; for some years, machine switching systems simply discarded the signals after a relay was set. I think it is a stretch to call something big-endian simply because it is a defined sequence of signals, like language. For practical purposes time cannot be reversed by some system of engineering, and it is a waste of time to contemplate endianness for these systems. Kbrose (talk) 14:08, 10 June 2015 (UTC)

IBM 1400 series

I am not able to agree with the statement "The IBM 1400 series has characteristics of ... little- ...-endian." The relevant differences between the 14xx (type "A") and the later big-endian machines (type "B"), e.g. the System/360, are:

  1. decimal (A) vs. (B) binary
  2. add-instruction addresses rightmost (A) vs. (B) leftmost byte
  3. machine has wordmarks (A) vs. (B) add-instruction "knows" the length of the operand

Besides these differences, the algorithm for addition is the same, namely the big-endian one as we learnt it in school: starting with the units position (the least significant digit) and working, address-wise, downward to the left to the most significant digit (while propagating the carries). If we program such an addition in COBOL or FORTRAN, we have one name, say "AUGEND", for the field to be added to (the so-called augend). On both types of machines, A and B, the symbol AUGEND stands for the whole field and can be identified with its byte address in memory. Let us assume that the length of the field AUGEND is 4 bytes, so that its least significant byte has the address AUGEND+3. On both systems, A and B, we write AUGEND = AUGEND + 157 for adding 157 to AUGEND.

On an A machine, the compiler generates an add-instruction referring to AUGEND+3. It can do this because the wordmark at address AUGEND+0 will stop the add-operation.
On a B machine, the compiler generates an add-instruction referring to AUGEND+0. At execution time the hardware will have to back up to AUGEND+3, start the addition there and work down to AUGEND+0.

This backing up from the address designating the field to the byte where the addition starts, done by the COBOL or FORTRAN compiler on machine A, can hardly be classified as "little-endian". --Nomen4Omen (talk) 14:19, 14 June 2015 (UTC)
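For concreteness, a C sketch of the AUGEND example above; it is an editorial illustration, with one decimal digit per byte standing in for the 1401's character storage and the word mark modelled simply by a known length. Both "machines" run the identical right-to-left addition; they differ only in which address the generated add instruction carries and in whether the compiler or the hardware computes the units address.

#include <stdio.h>

enum { LEN = 4 };

/* Shared addition kernel: start at the units digit and work toward lower
 * addresses, propagating carries (overflow out of the field is dropped). */
static void add_from_units(unsigned char *units, unsigned int addend) {
    for (int i = 0; i < LEN; i++, units--) {
        unsigned int sum = *units + addend % 10;
        addend = addend / 10 + sum / 10;     /* remaining addend digits plus carry */
        *units = (unsigned char)(sum % 10);
    }
}

int main(void) {
    unsigned char AUGEND_A[LEN] = {0, 8, 4, 3};   /* decimal 0843 on machine A */
    unsigned char AUGEND_B[LEN] = {0, 8, 4, 3};   /* decimal 0843 on machine B */

    /* Machine A: the compiler itself resolves the symbol to AUGEND+3, the
       units position; the word mark (modelled here by the fixed length)
       stops the operation at the high-order end. */
    add_from_units(AUGEND_A + LEN - 1, 157);

    /* Machine B: the compiler emits AUGEND+0; the hardware adds the length,
       subtracts one, and then proceeds exactly as above. */
    unsigned char *b_units = AUGEND_B + 0 + (LEN - 1);
    add_from_units(b_units, 157);

    printf("A: %d%d%d%d  B: %d%d%d%d\n",
           AUGEND_A[0], AUGEND_A[1], AUGEND_A[2], AUGEND_A[3],
           AUGEND_B[0], AUGEND_B[1], AUGEND_B[2], AUGEND_B[3]);   /* both print 1000 */
    return 0;
}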

The section on the 1400 is unreferenced, and no reliable sources have been provided which directly support the claims in that paragraph. The only source which was found in the earlier discussion was the 1400 manual, which didn't explicitly describe endianness. If we're left to interpret the manual ourselves to determine whether it has characteristics of big- or little-endianness, then we're engaged in original research. The entire paragraph should be deleted. Pburka (talk) 05:22, 15 June 2015 (UTC)