Talk:IEEE floating point


Article might benefit from outside views of the new standard

The present article is carefully written and informative, but it appears to be the view 'as seen by members of the committee.' (It reads like an insider view, though there is nothing improper about that.) Has the trade press commented on these activities? Are any companies planning to implement new chips based on it? Has there been any published criticism of the new standard? This kind of thing would be useful to add, if properly sourced. EdJohnston (talk) 16:38, 11 September 2008 (UTC)

Absolutely -- I think the 754r article has been a 'holding pattern' for the new standard revision as it evolved. So that becomes a historical record of that revision. Soon (as soon as the new standard is actually available -- it was published on 29.08.2008) it will be time to put together a new 'IEEE 754' article that reflects the new standard. mfc (talk) 19:43, 12 September 2008 (UTC)
An additional comment on the above -- the revision process was open: anyone could join the mailing list or attend the meetings (no membership or fees required), so there could not really be such a thing as an 'insider view'. More than 90 people participated in one or more meetings (some had 40+ attendees), and over 100 voted in the ballots. And yes, there are three separate hardware implementations, to date. mfc (talk) 16:55, 28 September 2008 (UTC)
Now the standard is finally available from the IEEE, it's probably time to start a new description of the standard for Wikipedia -- perhaps a working document first? mfc (talk) 16:55, 28 September 2008 (UTC)
By an 'insider view', I only mean 'the view by the people who were creating the standard.' If a standard has impact in the real world, there should be people who were not part of the committee who have things to say about it. I know that the original IEEE FP standard had a lot of impact, since it greatly reduced the number of incompatible floating-point implementations that engineers had to program for.
I agree. The current IEEE 754-2008 looks like just a rename/copy of the old IEEE 754r article, which was only about the revision process and a summary of changes. That is probably worth having (as a separate article?), but it's really not what should be in IEEE 754-2008. mfc (talk) 08:57, 29 September 2008 (UTC)
User:mfc noted that there are three separate hardware implementations. Shouldn't that be mentioned in the article? Are these implementations documented anywhere? EdJohnston (talk) 17:49, 28 September 2008 (UTC)
Also agree, and again not really much to do with the 'revision' article. For the record, the three hardware implementations are in POWER6 (full decimal floating-point unit), IBM System z9 (assists+millicode), and IBM System z10 (full decimal floating-point unit). For more details follow links from [1]. mfc (talk) 08:57, 29 September 2008 (UTC)

Still not an 'outsider view', but to kick off a new write-up of the standard I have put together a replacement text. This is mostly new text, but with some from the old IEEE 754-1985 article. The existing text here I will move to IEEE 754 revision before replacing with the new text. All comments and (correct) edits welcome! There is plenty that can be improved. mfc (talk) 16:11, 3 October 2008 (UTC)

IEEE 854 redirects to here, but no mention of it is made within the article. Should a note be added to indicate that it has now merged with this standard? --Wm243 (talk) 13:52, 25 March 2009 (UTC)

Good point – have so added. mfc (talk) 16:13, 27 March 2009 (UTC)

The IEEE 754 revision article mentions a half precision format, but no mention is made of it here. Was this dropped from the specification? --Wm243 (talk) 12:43, 17 April 2009 (UTC)

It ended up being just one of the many interchange formats for binary, and is not a basic format. It is mentioned under 'Interchange formats'. mfc (talk) 14:59, 17 April 2009 (UTC)

Exception handling

Under "Exception handling": what about a "Denormal" exception? -- Preceding unsigned comment added by (talk) 15:06, 15 December 2008 (UTC)

There is no 'denormal exception' in IEEE 754. The five possible exceptions are:
Invalid operation
Division by zero
Overflow
Underflow
Inexact
Underflow occurs when a non-zero result is both subnormal and inexact. mfc (talk) 16:55, 15 December 2008 (UTC)
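
To make the underflow rule above concrete, here is a minimal Python sketch for a binary64 division. The helper name `would_signal_underflow` is made up for this illustration. The sketch assumes default exception handling and that tininess is detected before rounding (the standard leaves the before/after choice to the implementation). Python does not expose the IEEE status flags, so exact rational arithmetic stands in for them.

```python
import sys
from fractions import Fraction

# Smallest positive normal binary64 number, as an exact rational.
MIN_NORMAL = Fraction(sys.float_info.min)

def would_signal_underflow(a, b):
    """Hypothetical helper: would a / b be both tiny and inexact in binary64?

    Assumes b is non-zero and the quotient does not overflow.
    """
    exact = Fraction(a) / Fraction(b)      # mathematically exact quotient
    rounded = Fraction(a / b)              # what the hardware division returns
    tiny = exact != 0 and abs(exact) < MIN_NORMAL   # tininess before rounding (assumption)
    inexact = rounded != exact
    return tiny and inexact

print(would_signal_underflow(sys.float_info.min, 2.0))  # subnormal but exact
print(would_signal_underflow(sys.float_info.min, 3.0))  # subnormal and inexact
print(would_signal_underflow(1.0, 3.0))                 # inexact but not tiny
```

Only the second call reports underflow: halving the smallest normal number gives an exact subnormal, so a subnormal result on its own is not an exception.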

Still about "Exception handling", a note concerning "Division by zero". The 1985 definition was: "If the divisor is zero and the dividend is a finite nonzero number, then the division by zero exception shall be signaled." As new operations were added in the IEEE 754 revision, the definition was changed to cover other operations with similar behavior (e.g. log(0)). The new definition is: "The divideByZero exception shall be signaled if and only if an exact infinite result is defined for an operation on finite operands." Vincent Lefèvre (talk) 20:28, 12 February 2012 (UTC)

Why is it worth sticking that in? It doesn't describe the conditions for all the other exceptions never mind in such detail. There's tons more worthwhile stuff like the IEEE 754 1985 article has, it might be worth moving much of the common stuff there about operations to a common article. Dmcq (talk) 21:02, 12 February 2012 (UTC)
The page was giving explanations for the invalid, overflow and underflow exceptions. For consistency, it should also give an explanation for division by zero, in particular because the name of this exception can be misleading. Saying that the exceptions were the same as in the old standard was also misleading. I agree that there should be a common article. The changes could be mentioned on IEEE_754_revision. Vincent Lefèvre (talk) 23:36, 12 February 2012 (UTC)
On the business of log(0) it does occur to me that it is worth noting that the exceptions have been extended to functions in general as seems appropriate. I see the standard now says that functions should set inexact correctly so it really is going for doing it all exactly right. Dmcq (talk) 20:16, 22 February 2012 (UTC)


Why are half precision, binary16 and 16-bit floating point numbers mentioned a few times in the article, but not shown in the table under "Basic formats"? Same question for decimal32. If there's a good reason for that, it might be useful to explain it in the article. (talk) 08:26, 2 August 2009 (UTC)

Because they are not classified as "basic formats". You find them under "interchange formats". Basic formats have arithmetic operations specified for them; non-basic formats do not. That is, the standard recommends converting binary16 and decimal32 to a more precise format before doing arithmetic, in order to get smaller rounding errors. keka (talk) 10:18, 2 August 2009 (UTC)
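
As an illustration of treating binary16 purely as a storage or interchange format, the Python sketch below encodes a value as binary16 with the standard `struct` module (format code `e`), then widens it to binary64 before doing any arithmetic. The helper names are made up for this example.

```python
import struct

def to_binary16_bytes(x):
    """Encode a Python float (binary64) as 2 bytes of IEEE 754 binary16."""
    return struct.pack(">e", x)

def from_binary16_bytes(raw):
    """Decode 2 bytes of binary16 back to a Python float (binary64)."""
    return struct.unpack(">e", raw)[0]

stored = to_binary16_bytes(0.1)         # rounded to 11 significant bits
widened = from_binary16_bytes(stored)   # widening to binary64 is exact
total = widened + widened               # the arithmetic itself runs in binary64

print(stored.hex())   # 2e66
print(widened)        # 0.0999755859375
```

The only rounding error here comes from the conversion into binary16; nothing further is lost in the addition.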

In the current version, the table now includes the "interchange formats" binary16 and decimal32. However, the table is still in the "basic formats" section, so it's a little confusing as a reference: why does the text say there are five formats when the table attached to it lists seven? You can figure it out by staring at it long enough, so I guess it's not wrong per se, but it might be cleaner to move that table from the "Basic Formats" sub-section higher up to the parent "Formats" section, or to a new sub-section. Bleachpuppy (talk) 22:35, 7 September 2010 (UTC)

VB.Net Conversion Code: IEEE754 to Hex to IEEE754

Hexadecimal to IEEE 754 Conversion...

   Private Function ConvertHexToIEEE754(ByVal hexValue As String) As Single
       Try
           Dim iInputIndex As Integer = 0
           Dim iOutputIndex As Integer = 0
           Dim bArray(3) As Byte
           ' Parse two hex characters at a time; the string is most significant byte first.
           For iInputIndex = 0 To hexValue.Length - 1 Step 2
               bArray(iOutputIndex) = Byte.Parse(hexValue.Chars(iInputIndex) & hexValue.Chars(iInputIndex + 1), Globalization.NumberStyles.HexNumber)
               iOutputIndex += 1
           Next
           ' BitConverter uses the platform byte order, so reverse on little-endian machines.
           If BitConverter.IsLittleEndian Then Array.Reverse(bArray)
           Return BitConverter.ToSingle(bArray, 0)
       Catch ex As Exception
           Throw New FormatException("The supplied hex value is either empty or in an incorrect format. Use the following format: 00000000", ex)
       End Try
   End Function

IEEE 754 to Hexadecimal Conversion...

   Private Function ConvertIEEE754ToHex(ByVal SngValue As Single) As String
       Dim tmpBytes() As Byte
       Dim tmpHex As String = ""
       tmpBytes = BitConverter.GetBytes(SngValue)
       ' Walk from the last byte to the first so that, on a little-endian platform,
       ' the most significant byte comes first in the hex string.
       For b As Integer = tmpBytes.GetUpperBound(0) To 0 Step -1
           If Hex(tmpBytes(b)).Length = 1 Then tmpHex += "0" '0..F -> 00..0F
           tmpHex += Hex(tmpBytes(b))
       Next
       Return tmpHex
   End Function

-- Preceding unsigned comment added by (talk) 06:48, 11 November 2009 (UTC)

For more information please visit URL:

This does not strike me as appropriate material for the article, and this page is for discussions about improving the article. Dmcq (talk) 13:35, 11 November 2009 (UTC)

Restore link to ungated draft of the standard?

In a recent edit, an editor took out the link to a draft of the FP standard hosted at, which is dated 2007. Since viewing of the current standard at the IEEE site is only available to subscribers, I suggest that the link to the ungated draft be restored to the article. EdJohnston (talk) 14:17, 26 July 2010 (UTC)

The draft has a copyright notice on its face. "Permission is hereby granted for IEEE Standards Committee participants to reproduce this document for purposes of international standardization consideration." Not clear that public dissemination fits that description. Glrx (talk) 16:21, 26 July 2010 (UTC)
Yes I agree, from the statements they put on the document it unfortunately looks like even putting a link to a copy would conflict with WP:COPYLINK. Dmcq (talk) 18:45, 26 July 2010 (UTC)
In the edit mentioned above, the motivation for removing the link to the obsolete version was out of concern that people might use or rely on outdated information in the draft version that may have changed in the final version. I didn’t know that access to the standards was restricted (the links worked when I tried them; I hadn’t considered my institution has a campus-wide subscription to IEEE Xplore). Although the link was admittedly not removed for copyright reasons, I think Dmcq read the policy right: “Knowingly and intentionally directing others to a site that violates copyright has been considered a form of contributory infringement in the United States (Intellectual Reserve v. Utah Lighthouse Ministry).” That said, those whose institutions don’t subscribe to IEEE Xplore (such as students in developing countries) can always Google for an illegitimate copy of the standard on the net; it’s not Wikipedia’s obligation to provide that service. Btw, for those in the U. S., buying the standard isn’t really that crazy an idea. I once sent away for a paper copy of the 754-1985 standard for an undergrad processor design project; it was the best $56 I ever spent. —Technion (talk) 10:15, 27 July 2010 (UTC)

The link to draft standard has been reintroduced, and I have been reverting it for the same reasons given above. There is no indication that IEEE has given permission for unlimited public release of the draft. I don't like that the standard isn't freely available, and I don't like removing the link to the draft. Unfortunately, I don't think the link is permitted. Please don't reintroduce the link without getting consensus on this page. Glrx (talk) 19:22, 13 December 2010 (UTC)

Many other draft standard documents are freely available, even when the final document is not free. Can someone figure out officially if this one should be available by link? (Maybe ask IEEE?) Gah4 (talk) 13:24, 24 September 2011 (UTC)

Misleading table?

The digits column of the table in the basic formats section might be a little misleading. I wasn't sure if it meant binary or decimal until I looked at the Wikipedia page on doubles. I think the column should be relabeled "binary digits". —Preceding unsigned comment added by (talk) 04:58, 10 August 2010 (UTC)

I think a second column should also be added that lists approximate numbers of decimal digits of precision. —Preceding unsigned comment added by (talk) 05:13, 10 August 2010 (UTC)

It wouldn't be the number of binary digits for the decimal formats. That's why the previous column gives the base. Didn't you wonder too about the +1 which is explained in the line just above the table? The standard doesn't give approximate decimal digits. The figures that could possibly be given are the various ones in IEEE 754-2008#Character representation. I'll stick a reference to that section under the table. Dmcq (talk) 09:14, 10 August 2010 (UTC)
I just had a look at that section and it might not be best for the purpose. I think it would be allowable for me to stick the value of log10(2^precision) in the table. This would give about 7.2 for binary32, whereas only 6 digits of a decimal number might be recovered going to it and back again, and yet 9 digits are needed in decimal to ensure the binary value is got back again. Dmcq (talk) 10:17, 10 August 2010 (UTC)
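
The three figures quoted above for binary32 (about 7.2, 6 and 9) can be reproduced from the precision p alone. The Python sketch below is only an illustration: the function name is made up, and the two integer formulas are the commonly quoted bounds for decimal-to-binary-to-decimal and binary-to-decimal-to-binary round trips rather than anything taken from the article's table.

```python
import math

def digit_figures(p):
    """Decimal-digit figures for a binary format with p significand bits."""
    approx = p * math.log10(2)                      # log10(2^p)
    survives = math.floor((p - 1) * math.log10(2))  # decimal digits that survive decimal -> binary -> decimal
    needed = 1 + math.ceil(p * math.log10(2))       # decimal digits needed for binary -> decimal -> binary
    return approx, survives, needed

for name, p in [("binary32", 24), ("binary64", 53)]:
    approx, survives, needed = digit_figures(p)
    print(f"{name}: log10(2^{p}) = {approx:.2f}, survives = {survives}, needed = {needed}")
```

For binary32 this gives roughly 7.22, 6 and 9; for binary64 it gives roughly 15.95, 15 and 17.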

Distinguishing zeroes

It might be worth mentioning that +0 and -0 can be distinguished, in a language that does not offer direct access to the bit pattern but allows infinity, by comparing their reciprocals. If X is a number that is not NaN, one can always determine the sign of X by testing (X + 1/X). JavaScript is such a language, and in it parseInt of a signed zero String returns a signed zero Number. (talk) 22:27, 26 October 2010 (UTC)

And if X is a NaN? Don't encourage contorted programming practice / workarounds. If the sign needs to be determined, then use a function such as copysign. If the implementation doesn't have the facility, then it should be added to the implementation. Glrx (talk) 18:04, 27 October 2010 (UTC)
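
For comparison, a short Python sketch of the copysign approach suggested above. The two helper names are invented for the example; `math.copysign` is the standard-library function. In Python the reciprocal trick is not available anyway, because `1 / 0.0` raises `ZeroDivisionError` instead of returning an infinity.

```python
import math

def sign_bit_set(x):
    """True if the sign bit of x is set; works for zeros, infinities and NaN."""
    return math.copysign(1.0, x) < 0.0

def is_negative_zero(x):
    """True only for -0.0. The == test excludes NaN and all non-zero values."""
    return x == 0.0 and sign_bit_set(x)

print(-0.0 == 0.0)             # True: comparison cannot tell the zeros apart
print(is_negative_zero(-0.0))  # True
print(is_negative_zero(0.0))   # False
```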

Citations etc.

Ran across this article mentioned in the Wikipedia:Articles for deletion/IEEE machine. Four articles on this subject is probably too many, but not sure if we want one, two, or three. There is much material here, but very few inline citations, so it seems mainly written from personal experience and thus hard to verify. IEEE floating point, which might be the common name, redirects here, but it is the IEEE 754-1985 article that has the diagrams and background that a general reader might be more interested in, so maybe we should redirect it back there, or else consider a merge. IEEE 754 revision seems mostly personal narrative, with only two inline links instead of citations. It looks like it was split off of this one in October 2008 or so. The articles are otherwise well-written, so not sure if a merge is worth it. It just now requires reading all three to make sense. From the first sentence in the lead, a reader would expect this to be an article on the standard in general, not one revision of it. There could be more in the future too. W Nowicki (talk) 18:15, 21 September 2011 (UTC)

I would also add that the only two inline citations are in the lead, and mention an alphabet soup that would be fairly meaningless to most readers. My guess is that the same format was approved by another standards body, but if so, that should be stated explicitly in the body, with a summary in the lead only. ISO/IEC/IEEE 60559:2011 ... JTC1/SC 25 ... ISO/IEEE PSDO is not English. Maybe someone should propose a Wikipedia written entirely in acronyms and standards numbers. :-) W Nowicki (talk) 18:23, 21 September 2011 (UTC)

My personal choice would be a single article titled with the common name, IEEE floating point, with a redirect for the standards number as being more accessible to most readers, but that could be just me. (Do we have a guideline that expresses a preference either way for common names versus standards numbers?) I also struggled some with the IEEE 754 revision article, wondering just what additional value it offered. Msnicki (talk) 18:27, 21 September 2011 (UTC)
In some sense only one citation is needed -- to the actual Standard; all assertions in the Wikipedia article can be checked against that.
On the articles IEEE 754-2008, IEEE 754 revision, and IEEE 754-1985 -- it might make sense to merge the first and third, but a lot of new material would need to be added to get the 2008 additions covered as well as the 'old' basics from 1985. This would make it very big and long -- perhaps a different structure altogether is needed: the (current) '2008' article as top-level, pointing to "IEEE 754 binary formats" and "IEEE 754 decimal formats" which detail the bits and bytes.
I think the revision article is definitely best kept separate as it refers to the history and process rather than the content. Important stuff, because it shows that due diligence was done over the 7 years by dozens of people, but it is probably not what most readers will be looking for. mfc (talk) 13:08, 22 September 2011 (UTC)
I agree that just merging the 2008 and 1985 articles would make a mess of things. The 2008 standard has a lot of additions and changes in it and the 1985 one is what most machines currently implement. You'd get an article with lots of ifs in it where a lot wasn't applicable to what's mainly out there. This article already refers to separate articles for more down to earth aspects like binary or decimal formats but it could do with a lot more linking to describe other things. Dmcq (talk) 13:54, 22 September 2011 (UTC)

Well if the 1985 standard is what is most common, then perhaps IEEE floating point should at least point there and not here? Given the much sadder state of some other standards articles, it probably makes sense to just work on some of the sourcing for now and avoid the major work of merges, until when and if there is a clear way to improve things. I never formally proposed a merge because it was not clear which way to go. What I am trying to avoid is a deletion fight when someone comes along and cites the rule that any unsourced material may be removed at any time. I do find the narrative of the group history interesting to keep, as long as it does not overlap this article too much. Maybe just rewrite the lead of this one a bit to clarify what we are saying here. Wikipedia articles should talk about the standard (e.g. cite estimates for how widely they are used) but need not be complete enough to allow someone to implement from the article - that is what the standards documents themselves are for, or other books and articles on the subject. W Nowicki (talk) 18:00, 24 September 2011 (UTC)


The spurious space from the change looks rather strange. This change should probably be reverted. If a user has a problem with his browser (the problem described by the user doesn't occur in Firefox), the browser should be fixed, not Wikipedia. Vincent Lefèvre (talk) 01:28, 4 October 2011 (UTC)

256 bit AVX

Is it true that AVX supports 256 bit floats? And if it does, I guess they would be in the style of IEEE 754, or are they about to become part of the standard? -- (talk) 18:08, 4 December 2011 (UTC)

No, what it supports is doing a number of floating point operations in parallel; the V stands for vector, which in effect means a number at a time. Dmcq (talk) 18:21, 4 December 2011 (UTC)
I think the IBM z10 series implements various 128 bit floating point formats in hardware, and some older IBM and VAX systems also supported 128 bit float, though I believe they used a bit of software or microcode support via doubles. Dmcq (talk) 18:33, 4 December 2011 (UTC)
Quoting the FMA instruction set Article:
The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add (FMA) operations
Both contain fused multiply–add (FMA) instructions for floating point scalar and SIMD operations.

Sounds to me as if there were 128 bit registers that could be used either for SIMD operations or for FMA on floats that are 128 bits long. Or am I misunderstanding things? -- (talk) 18:33, 5 December 2011 (UTC)

And Bulldozer_(microarchitecture) :
Two symmetrical 128-bit FMAC (fused multiply–add capability) floating-point pipelines per module that can be unified into one large 256-bit-wide unit if one of integer cores dispatch AVX instruction and two symmetrical x87/MMX/SSE capable FPPs for backward compatibility with SSE2 non-optimized software

Or does it simply mean that four 64 bit floats are processed at the same time? -- (talk) 18:38, 5 December 2011 (UTC)

The operations deal with a number of floating point operations at a time, for instance one can do eight single precision or four double precision operations at a time with the 256 bit registers. The bit about the two 128 bits being used as a 256 bit is that for two cores on a chip their floating point power can be shared rather than spending a pile of silicon on separate parallel floating point units for the 256 bit vectors. Dmcq (talk) 20:57, 5 December 2011 (UTC)
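
A toy model of that lane arithmetic, written in Python purely for illustration: no SIMD instructions are involved, and `lanewise_add` is an invented helper that mimics adding two packed registers one element at a time.

```python
import struct

REGISTER_BITS = 256  # width of one vector register in this toy model

# The same 32 bytes can be viewed as eight binary32 lanes or four binary64 lanes.
singles = struct.pack("<8f", *[float(i) for i in range(8)])
doubles = struct.pack("<4d", *[float(i) for i in range(4)])
assert len(singles) * 8 == REGISTER_BITS
assert len(doubles) * 8 == REGISTER_BITS

def lanewise_add(a, b, fmt):
    """Add two packed 'registers' element by element (toy stand-in for a vector add)."""
    xs = struct.unpack(fmt, a)
    ys = struct.unpack(fmt, b)
    return struct.pack(fmt, *(x + y for x, y in zip(xs, ys)))

print(struct.unpack("<8f", lanewise_add(singles, singles, "<8f")))
print(struct.unpack("<4d", lanewise_add(doubles, doubles, "<4d")))
```

Each lane is still an ordinary binary32 or binary64 value; nothing in the model is a 256-bit float.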

Details please

Question: In IEEE 754, which bits/bytes are used as the significand and the exponent? What code is used for infinity, and NaN? This detail should be included in the article, in case anyone's curious. (talk) 21:06, 28 January 2012 (UTC)
Click on the blue link for the particular format you're interested in to get details about it. Dmcq (talk) 23:09, 28 January 2012 (UTC)
I agree that the details and examples in IEEE_754-1985 are very helpful. Should be included here or in a merged page. — Preceding unsigned comment added by (talk) 13:23, 20 April 2012 (UTC)
You can click on the links. You don't need everything in a single article on the web when you have hyperlinks. The standard was much smaller in 1985; the decimal format needs a great deal of explaining, and sticking all the stuff into this article as well as having separate articles is just unnecessary. Dmcq (talk) 14:14, 20 April 2012 (UTC)
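
For readers who want a quick answer to the original question without following the links, here is a small Python sketch for binary32 only. The helper names are invented; the field widths (1 sign bit, 8 exponent bits with bias 127, 23 stored significand bits) are those of the binary32 format.

```python
import struct

def binary32_fields(x):
    """Round x to binary32 and return (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 stored significand bits
    return sign, exponent, fraction

def classify(x):
    """Name the class of value that the binary32 encoding of x represents."""
    _, exponent, fraction = binary32_fields(x)
    if exponent == 0xFF:             # all-ones exponent: infinity or NaN
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:                # all-zeros exponent: zero or subnormal
        return "zero" if fraction == 0 else "subnormal"
    return "normal"

print(binary32_fields(1.0))           # (0, 127, 0)
print(binary32_fields(float("inf")))  # (0, 255, 0)
print(classify(float("nan")))         # NaN
```

So infinity is an all-ones exponent with a zero fraction, and NaN is an all-ones exponent with any non-zero fraction.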


This article needs examples! — Preceding unsigned comment added by (talk) 12:56, 19 February 2012 (UTC)

Of what? The various formats are described in detail in other articles. Dmcq (talk) 13:04, 19 February 2012 (UTC)

Suggest merge

The two revision steps of the standard should be combined at IEEE 754 to avoid duplicate and inconsistent description of the common material. This would also make for a better presentation of the differences and significance of the differences between the two revisions. Individual technical standards are rarely notable and technical steps within those standards are even less so. --Wtshymanski (talk) 14:16, 6 March 2012 (UTC)

Currently, IEEE 754 is a redirection to IEEE 754-2008. Is this essentially a request for a name change? Benjaminoakes (talk) 14:10, 4 April 2012 (UTC)
A little more than that is needed, I think. I would imagine a combined article would first describe the motives and development of 754, and give the common elements of the two standards. Then the specific limitations of the 1985 release that were addressed in the 2008 version should be shown, and the differences between them explained, with reference to any practical difference this caused. --Wtshymanski (talk) 21:19, 4 April 2012 (UTC)
It seems to be a consistent practice in Wikipedia to have a main article on subjects with a revision history (e.g. COBOL) with sections on subsequent versions. I would concur with "Floating Point Standards" as a title and include information from ISO/IEC 10967. The progression of improvements to standards reflects on the advancement of the technology. I would like to see "How did we get to here?" -- Softtest123 (talk) 10:56, 27 April 2012 (UTC)

The redirect has variously been pointed at the -1985 and -2008 versions. I agree they should be merged. However, a more recognizable and precise title would make sense. The article is not so much about that standard document as about its contents, so I'd say that "IEEE standard floating point" would make more sense than any of the current redirects such as "IEEE floating point standard". Or just "IEEE floating point" would be good, as someone suggested already. Dicklyon (talk) 16:19, 7 April 2012 (UTC)

I agree with the idea that the title should suggest the article content and therefore concur with "IEEE floating point" but would also agree with "IEEE standard floating point". I still concur with the merge of this article with IEEE 754-1985. Softtest123 (talk) 21:41, 2 June 2012 (UTC)
I'll start an RM discussion at the end of this page. Dicklyon (talk) 21:43, 2 June 2012 (UTC)

Link to an Excel spreadsheet showing how to calculate to/from IEEE 754

This link was reverted with the explanation "Not a place to place tools. This site is for encyclopaedic information." I agree that it is not a place for tools, although there is already a link for an on-line calculator. Reference The spreadsheet was not written to be a calculation tool. There are many more efficient calculators available for this purpose. It was written to show how the calculation is done. I use this spreadsheet to show others the details of the actual calculation. The first tab (hex to float) shows how two 16 bit integers (words) are broken down into 4 bytes and then 32 bits, and then how those bits are used to determine the sign, exponent and mantissa that are used in an equation to calculate the floating point number. The second tab (float to hex) does the opposite. It shows how a floating point number is used to determine the sign, exponent and mantissa, and how these are converted into 32 bits, then 4 bytes and finally two words. I believe it is valuable for educational purposes and should be included. Batman2000 (talk) 14:49, 18 March 2012 (UTC)

I agree with the deletion. See WP:ELNO #8: WP doesn't want to link to material that requires external applications such as Excel. That other links exist is not an argument to include another link. Although I agree the current article does not do a good job of explaining or showing the bit encoding (IEEE 754-1985 makes an attempt), I don't see the Excel sheets being so helpful that I'd override #8. For our purposes, a few specific examples with text and a static diagram can be better than a program that can handle arbitrary values. Although the spreadsheet may help you to explain the encoding to others, the layout is busy, takes some effort to interpret, and begs some familiarity with Excel. The spreadsheet took a lot of commendable effort, but I don't think it is an appropriate external link for this article. Glrx (talk) 16:29, 18 March 2012 (UTC)
If you click on any of the links for a particular format eg binary32 it takes you to a page with everything you'd want to know about it. Dmcq (talk) 23:01, 18 March 2012 (UTC)
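
The hex-to-float calculation described at the top of this thread can also be written out in a few lines. The Python sketch below is not a transcription of the spreadsheet; it assumes the most significant 16-bit word comes first (word order varies between systems), and the function name is invented.

```python
def words_to_float(high_word, low_word):
    """Decode two 16-bit words (most significant first, an assumption) as binary32."""
    bits = (high_word << 16) | low_word
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity or NaN: no finite value to compute")
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at -126.
        magnitude = (fraction / 2**23) * 2.0**-126
    else:
        # Normal number: (1 + fraction / 2^23) * 2^(exponent - 127).
        magnitude = (1 + fraction / 2**23) * 2.0**(exponent - 127)
    return -magnitude if sign else magnitude

print(words_to_float(0x3F80, 0x0000))  # 1.0
print(words_to_float(0xC020, 0x0000))  # -2.5
```

Every intermediate step is exact in Python's binary64 arithmetic, so the result is the exact value of the encoded binary32 number.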

Expression evaluation

Concerning the sentences I added to Expression evaluation that were partially reverted, I have two comments: (1) I would agree that the C99 FLT_EVAL_METHOD as standardized only allows one to read the current preferredWidth and there is currently no standardized method to set it, which would be the ideal, and on rereading the relevant section of the current IEEE754 standard, it does indeed indicate that such a setting should be settable at block level (which would be great to see if/when some language implements that). However, given that the C99 FLT_EVAL_METHOD is the only standardized method that I am aware of (except perhaps Fortran 2003) to at least read the preferredWidth setting, I think it would be useful to the reader to reference it in some way-- perhaps "For C99, the FLT_EVAL_METHOD allows the reading of the current preferredWidth setting, although setting of this is not currently standardized."

(2) The second sentence that was reverted covers a separate important issue-- that the compiler bugs that have plagued the usage of double extended format (with compilers randomly not performing conversions to the destination format) are explicitly forbidden by the IEEE 754 standard (and the C99 standard). Many programmers are still not clear on this and so I think it would be very useful to add this sentence back--

For named variables, the language standard is required to respect, and convert to, the specified format and "implementations shall never use an assigned-to variable’s wider [preferredWidth] precursor in place of the assigned-to variable’s stored value when evaluating subsequent expressions". This removes a major source of inconsistency between language implementations. Brianbjparker (talk) 21:31, 6 April 2012 (UTC)

I'm not keen on sticking in random bits of bad implementations into a discussion of the standard. That can go in the C99 article I guess. I'll stick that bit back in about the assigned variable. Dmcq (talk) 22:10, 6 April 2012 (UTC)
Fair enough. Note that I have modified the text to clarify that preferredWidth is defining the format for temporary subexpression result variables within expressions. I think this is important to clarify as the current text could have been read as the calculation was at a higher precision but still rounded to a smaller internal temporary variable. Brianbjparker (talk) 14:25, 7 April 2012 (UTC)
That's interesting. My reading of that indicates the standard mandates double rounding if one sets the preferred width to extended and one adds two doubles and assigns to a double if following their recommendation. That doesn't sound at all right to me. I must check up on that, it sounds a bit wrong to me. Dmcq (talk) 22:45, 6 April 2012 (UTC)
Maybe I'm misreading it, but § 10.2 is not about preferredWidth (which is § 10.3) even though preferredWidth can play a part. It seems to be a save-the-programmer-from-himself provision when the intermediate values are higher precision than a final destination. An intermediate result (say a product) is computed and rounded to an extended format in an FP register. To store the result in the explicit final double destination dfX, it must be rounded a second time to a double. The statement in 10.2 prohibits the language from using the extended double value in the FP register for anything else. If the language did use the wider register value instead of the rounded dfX, then it might subsequently compare the double dfX with the extended FP register and decide they are different (due to rounding).
Right-- the fact that the previous IEEE 754 standard didn't specify that such explicit assignments and rounding must be honored by the compiler was the main cause of varying behaviour between compilers and on changing optimization levels within a given compiler. Many compilers as an "optimisation" would avoid doing the final rounding to the destination, or would do additional roundings during an expression if registers were spilled such that the results were random. Brianbjparker (talk) 14:25, 7 April 2012 (UTC)
In some situations, it can be a good idea. In other situations, it throws away some precision.
Glrx (talk) 00:39, 7 April 2012 (UTC)
Note that preferredWidth is different from the C99 FLT_EVAL_METHOD: In C99, FLT_EVAL_METHOD is chosen by the C implementation (not by the user), while in IEEE 754-2008, preferredWidth is chosen by the user ("preferredWidth attributes are explicitly enabled by the user [...]"). Vincent Lefèvre (talk) 00:44, 7 April 2012 (UTC)
Ok, yes. FLT_EVAL_METHOD is read-only and setting it is currently undefined in C (several compilers allow it to be set at a compilation unit level by compiler options). However, FLT_EVAL_METHOD 0 corresponds to preferredWidthNone (evaluate to type) and FLT_EVAL_METHOD 1 and 2 are the other two possible preferredWidthFormats (presumably if setting of preferredWidth per block is ever implemented in C compilers, which I would like to see, then FLT_EVAL_METHOD would give the default setting). Brianbjparker (talk) 14:25, 7 April 2012 (UTC)
It seems my reading was correct, setting preferred width to extended may cause the add of two doubles producing a double to do double rounding. I guess one always has these corner cases however one sets standards. Dmcq (talk) 14:22, 20 April 2012 (UTC)
Your statement seems confused. Setting preferredWidth to extended means adding two doubles results in an extended value (destination width is widest of operands and preferredWidth; extended is wider than double, so destination is extended; §10.3). If preferredWidth is set to none, then adding two doubles results in a double (destination width is that of the widest operand, which is a double). Setting pW to double would cause single + single to be rounded to double. Setting pW to single would still cause double + double to be rounded to double. Glrx (talk) 17:34, 21 April 2012 (UTC)
What I'm talking about is double a,b,c; ... a=b+c; If preferred width is set to extended then that will very possibly involve double rounding. Dmcq (talk) 20:41, 21 April 2012 (UTC)
Yes. Setting pW to extended can mean the result is rounded twice (aka double rounding). Adding b and c could be first rounded to extended precision, and then the explicit assignment to a would be a second rounding to double precision. forgot to sign: Glrx (talk) 21:09, 21 April 2012 (UTC)‎
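For anyone following the thread, here is a minimal sketch of the a=b+c case in Java. Java has no extended type, so the 64-bit significand is only simulated with BigInteger; the class name, the helper roundToBits, and the operand values b = 2^65 and c = 4097 are illustrative assumptions chosen to hit a double-rounding case, not anything taken from the standard.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical demo: rounding an exact sum once (to 53 bits) versus twice
// (to a simulated 64-bit extended significand, then to 53 bits).
class DoubleRoundingDemo {
    // Round a positive integer to `bits` significant bits, ties to even.
    static BigInteger roundToBits(BigInteger x, int bits) {
        int shift = x.bitLength() - bits;
        if (shift <= 0) return x;                      // already fits
        BigInteger unit = BigInteger.ONE.shiftLeft(shift);
        BigInteger[] qr = x.divideAndRemainder(unit);
        BigInteger q = qr[0];
        int cmp = qr[1].shiftLeft(1).compareTo(unit);  // remainder vs. half a unit
        if (cmp > 0 || (cmp == 0 && q.testBit(0))) q = q.add(BigInteger.ONE);
        return q.shiftLeft(shift);
    }

    public static void main(String[] args) {
        double b = 0x1p65;   // 2^65, exactly representable as a double
        double c = 4097.0;   // 2^12 + 1, exactly representable as a double
        BigInteger exact = new BigDecimal(b).toBigIntegerExact()
                .add(new BigDecimal(c).toBigIntegerExact());

        BigInteger once = roundToBits(exact, 53);                    // straight to double
        BigInteger twice = roundToBits(roundToBits(exact, 64), 53);  // extended, then double
        BigInteger hardware = new BigDecimal(b + c).toBigIntegerExact();

        System.out.println("rounded once:  " + once);
        System.out.println("rounded twice: " + twice);
        System.out.println("results differ: " + !once.equals(twice));
        System.out.println("Java's b + c matches single rounding: " + hardware.equals(once));
    }
}
```

With these operands the single rounding gives 2^65 + 8192, while the two-step rounding first drops the trailing 1 (leaving an exact tie) and then rounds to even, giving 2^65.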

Requested move[edit]

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the move request was: moved. Looks like there's also consensus to merge, but that doesn't require an admin. Jenks24 (talk) 05:07, 10 June 2012 (UTC)

IEEE 754-2008 → IEEE floating point – Per earlier discussions on the talk page, the page should be given a meaningful name and the other standard date article should be merged in. Dicklyon (talk) 21:45, 2 June 2012 (UTC)

  • Support. I don't think there's anything much more than the History section that can be copied over. Most of the rest duplicates stuff in other articles that are referenced here. There is room for extra here though, based on detailing better the changes between the revisions. Dmcq (talk) 22:25, 2 June 2012 (UTC)
  • Support. General name/article is better. Glrx (talk) 03:44, 3 June 2012 (UTC)
  • Support. Per WP:COMMONNAME, "Wikipedia does not necessarily use the subject's "official" name as an article title; it prefers to use the name that is most frequently used to refer to the subject in English-language reliable sources." The common name is "IEEE floating point", not the spec number. Msnicki (talk) 15:48, 3 June 2012 (UTC)
    How about the other question about the merge? Dmcq (talk) 15:54, 3 June 2012 (UTC)
  • Support move and merge. Having all these separate articles is confusing. It makes more sense to merge them into one with a better name. If covering all the different revisions would make this article too large it would be better to split it into a main article and History of the IEEE floating point standard than to create a separate article for each revision. —Ruud 20:52, 3 June 2012 (UTC)
  • Support move and merge. There is additional information in Floating point that maybe should be in this article instead. That article should be aligned with this new one. Softtest123 (talk) 00:45, 4 June 2012 (UTC)
  • Support move and merge, agree with Softtest123's suggestion about moving appropriate bits from Floating point here and with Dmcq's suggestion about improving the coverage of the changes between revisions. 1exec1 (talk) 17:11, 4 June 2012 (UTC)
The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

Other merges[edit]

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
This merge discussion encompasses 3-4 separate wikipedia articles, all of which contain a lot of very technical and detailed information. It appears, based on the discussion below, that this is NOT a trivial merge discussion between two pages and is, in fact, a discussion on a major reorganization and rewrite of the information on the topic. Due to the highly technical nature of this topic, it's unlikely that you're going to find a "merge editor" willing to do the work for you (WP:MERGE really doesn't work that way). So due to the fact that this is not a merge discussion but more a discussion on reorganization, I am removing the merge tags and closing the merger discussion as no consensus. WTF? (talk) 17:12, 2 June 2013 (UTC)

Merges from IEEE 754-1985 and IEEE 754 revision have been proposed. I support the former because there is a lot of overlap between this article and IEEE 754-1985. I don't support dumping all revision information in IEEE 754 revision into this article. I think this topic can be handled in WP:SUMMARY style. --Kvng (talk) 17:49, 22 August 2012 (UTC)

I think IEEE 754 revision needs to be rewritten in a more encyclopedic tone (see for example IEEE 754 revision#Clause 6: Infinity, NaNs, and sign bit, this is what you would expect to see in the "Revision Overview" section of the actual specification, not in an encyclopedic article). In a more concise form it would probably belong here as well. —Ruud 20:10, 22 August 2012 (UTC)
  • Support merging all three. Better to have one well-written article than to split our efforts among three. --Guy Macon (talk) 07:58, 23 August 2012 (UTC)
Perhaps the first thing to do is to edit IEEE 754 revision. I agree with Ruud that it needs work. Once that's finished, a merge might be more palatable to me. --Kvng (talk) 18:50, 24 August 2012 (UTC)
  • Support: I see no reason why these cannot all be merged into a single article. A section on history could detail development and differences, etc. I also am adding IEEE 854-1987 to the mix proposing it also for merging as it is a short article and was superseded by IEEE 754-2008 as well. (talk) 11:50, 9 December 2012 (UTC)

The above discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

If it's wrong can someone fix it?[edit]

Why does the layout example have the following statement at the end?

"The fractional parts are wrong in both these examples. The first one should be 1.171875 rather than 1.34375."

DGerman (talk) 02:08, 13 September 2013 (UTC)

The comment about the values being wrong was added on 9 Sep 2013, and the section was moved into this article on 25 Aug 2013 (apparently it had been at Audio bit depth).
I checked that the comment is correct at least as far as the first example goes (fraction is 1.171875 not 1.34375; the latter would occur if the fraction started with a single zero rather than two). Given that the example is new and incorrect, and that Single-precision floating-point format has a properly formatted and correct example (and is linked in "binary32"), I have removed the section as unnecessary. Johnuniq (talk) 04:00, 13 September 2013 (UTC)
The first one must be a copy error. The second one is correct as far as I can tell. I did both with a computer program and I just double-checked it. I assume the IP was too lazy to count all the way to that 20th 1 to actually check both problems. Radiodef (talk) 05:08, 23 September 2013 (UTC)
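// Use 1.0 rather than 1 in the first three terms: in Java, 1 / 2 is integer division and evaluates to 0.
double m = (1.0 / 2) + (1.0 / 8) + (1.0 / 16) + (1 / Math.pow(2, 8)) + (1 / Math.pow(2, 10)) + (1 / Math.pow(2, 20));
// Significand (1 + m) scaled by 2^(biased exponent - bias), with bias 127 for binary32.
double r = (1 + m) * Math.pow(2, 146 - 127);
System.out.println("result: " + r);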
is the Java code. Radiodef (talk)
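As a cross-check on the point above about one versus two leading zeros, here is a minimal Java sketch that decodes the significand from a binary32 bit pattern. The two bit patterns are illustrative assumptions (the removed example's exact bits are not reproduced here): one fraction field starts with 0010110, the other with the same bits shifted left by one place. The class and method names are hypothetical.

```java
// Hypothetical helper: split a binary32 bit pattern into its fields.
class Binary32Fields {
    static int sign(int bits) { return bits >>> 31; }

    static int biasedExponent(int bits) { return (bits >>> 23) & 0xFF; }

    // Significand of a normal number: implicit leading 1 plus the 23-bit fraction.
    static double significand(int bits) {
        return 1.0 + (bits & 0x7FFFFF) / (double) (1 << 23);
    }

    public static void main(String[] args) {
        // Exponent field 127 (unbiased 0), so the value equals the significand.
        int twoLeadingZeros = (127 << 23) | (0b0010110 << 16); // fraction 0010110 0...0
        int oneLeadingZero  = (127 << 23) | (0b0101100 << 16); // same bits, shifted left by one

        System.out.println(significand(twoLeadingZeros));  // 1 + 1/8 + 1/32 + 1/64
        System.out.println(significand(oneLeadingZero));   // 1 + 1/4 + 1/16 + 1/32

        // Cross-check against the library's own decoding.
        System.out.println(Float.intBitsToFloat(twoLeadingZeros) == 1.171875f);
        System.out.println(Float.intBitsToFloat(oneLeadingZero) == 1.34375f);
    }
}
```

Dropping one of the two leading zeros doubles the fractional part from 0.171875 to 0.34375, which is consistent with the explanation given for the wrong value.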

Rename to “IEEE 754”[edit]

Do rename kindly. There are many IEEE standards, and the only uniform and consistent way to cover them all on Wikipedia is to name each article after the technical standard name. There is no reason whatsoever to think up our own names when there are official names to them, so we will use those as we naturally should. Anyone interested in the verbose standard name won't die from a little effort of obtaining the standard's text and reading it. I would rename myself right away, but I don't have an account, nor do I plan to have it in the first place. — (talk) 18:09, 2 June 2014 (UTC)

Please read the "Requested move" section above. Do you have any reason to believe that the consensus has changed in the two years since we had that discussion? --Guy Macon (talk) 18:54, 2 June 2014 (UTC)

Graph of precision[edit]

On 12 September, User:Ghennessey added a graph showing the precision of floating-point numbers (defined here as ulp(x)) as a function of value (x). I think such a graph is a very good idea, as it helps visualize the actual precision one can expect from float32 vs. float64. The graph has a few shortcomings, however, which I addressed by replacing it with a new graph. Ghennessey then reverted my edit, arguing that his original version “is of greater practical use” because it “can be used to select an appropriate format given the expected value of a number and the required precision”. I do not agree with this argument, and here I explain why. For reference, here are both graphs:

The main difference between the graphs is that my version shows the relative precision (defined as ulp(x)/x), while Ghennessey’s shows the absolute precision (i.e. ulp(x)). Both are relevant metrics of precision, but I argue that relative precision is more useful.
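To make the two metrics concrete, here is a minimal Java sketch that evaluates both for a few float32 values. Math.ulp and Math.nextDown are standard library methods; the class name, the helper relativeUlp, and the choice of sample values are assumptions made for illustration.

```java
// Hypothetical demo: absolute precision ulp(x) versus relative precision ulp(x)/x for float32.
class RelativeUlp {
    // Relative precision as defined above, computed in double to avoid extra float rounding.
    static double relativeUlp(float x) {
        return (double) Math.ulp(x) / x;
    }

    public static void main(String[] args) {
        float[] samples = { 1.0f, 1.5f, Math.nextDown(2.0f), 2.0f, 1024.0f };
        for (float x : samples) {
            System.out.println("x = " + x
                    + "  ulp(x) = " + Math.ulp(x)
                    + "  ulp(x)/x = " + relativeUlp(x));
        }
    }
}
```

Within one binade the absolute ulp stays fixed while ulp(x)/x falls from 2^-23 at a power of two to just above 2^-24 right before the next one, then jumps back up.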

First, floating point numbers are designed to provide an almost constant relative precision across the whole range. This is in contrast to fixed-point, which provides a known absolute precision. Clearly, floating-point has proved to be more useful, as fixed-point is now seldom used outside embedded systems lacking an FPU. Programming with fixed-point is actually difficult, because one has to be aware of the required absolute precision and range at every intermediate step of every computation. The almost-constant relative precision of floating-point makes programming easier. This, in itself, is a proof that relative precision is more useful than absolute precision. Also, in the table before the graph, precisions are compared to numbers of decimal digits, which is a very intuitive way of thinking of precision, and is actually all about relative precision.

Then, there are a few other points which are a lot clearer on my graph:

  1. The precision curves on Ghennessey’s graph look like linear functions of the value, whereas they are actually step functions. My version clearly shows the discontinuities: the relative precisions are shown to be sawtooth functions. It then appears that the relative precisions of floating-point formats are not strictly constant, but instead oscillate within narrow ranges.
  2. I added the precision curves for selected numbers of decimal digits. This makes clear why it is difficult to compare a binary floating point precision to a number of decimal digits: the decimal precision is also a sawtooth function, with a wider range of variation, and is thus less consistent than binary floating point.
  3. There is an apparent contradiction in the article, where the precision of float32 is quoted to be about 7.22 decimal digits, and yet the section Character representation states that 9 decimal digits are required to properly store a float32 in decimal. My graph lifts the contradiction, showing that the precision of float32 lies mostly between 7 and 8 decimal digits, but occasionally gets better than 8 decimals (or rather, 8 decimals gets worse than float32).
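Point 3 can be checked with a small Java sketch. It takes the float32 just above 1000, a region where float32 spacing (2^-14) is finer than 8-digit decimal spacing (10^-4), and tests whether the value survives a trip through a decimal string. The class name, the helper roundTrips, and the sample value are assumptions chosen for illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical demo: does a float32 survive conversion to `digits` significant
// decimal digits and back?
class DecimalRoundTrip {
    static boolean roundTrips(float x, int digits) {
        // new BigDecimal(x) holds the exact binary value; round it to the requested digits.
        String decimal = new BigDecimal(x).round(new MathContext(digits)).toString();
        return Float.parseFloat(decimal) == x;
    }

    public static void main(String[] args) {
        float x = Math.nextUp(1000.0f);  // 1000 + 2^-14
        System.out.println("8 digits round-trip: " + roundTrips(x, 8));
        System.out.println("9 digits round-trip: " + roundTrips(x, 9));
    }
}
```

Here x and its upper neighbour both round to 1000.0001 at 8 digits, so only one of them can be recovered; at 9 digits x becomes 1000.00006, which parses back to x.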

In conclusion, my version provides a more comprehensive view of the precision one can expect from floating point numbers, including its discontinuities, not only in binary but also in decimal. Thus I will be reverting Ghennessey’s revert unless a discussion here shows some consensus against it.

— Edgar.bonet (talk) 09:20, 22 September 2014 (UTC)

Your graph has more useful information in it. However, as things stand, both graphs are probably difficult for non-experts to interpret properly. Yours, having more information, may be more difficult. I think it may be better to present the information available in the graphs as text. We need to improve the Formats section and perhaps add a separate section or sub-section to explain precision. Once that is done, readers may be in a better position to appreciate a graph like this. ~KvnG 13:45, 25 September 2014 (UTC)
I'll second Kvng: both graphs are technical, dense, and repetitive. To first order, both graphs say there's a first order relative error and a second order variation of that error due to the magnitude of the number. The first point is trivially stated (and the obvious trend in the first graph), and the second point does not seem that important. How many calculations are going to depend on that flutter? Can people scale calculations so that all the mantissas have several leading one bits? (That goes against the log distribution of leading digits.) If scaling could be done, what impact would it have on computation? It's only squeezing an extra bit of precision. Glrx (talk) 18:13, 30 September 2014 (UTC)