Talk:Floating-point arithmetic/Archive 5


Zuse's Z3 floating-point format

There are contradictory documents about the overall size and the significand (mantissa) size of the floating-point format of Zuse's Z3. According to Prof. Horst Zuse, the format is 22 bits, with a 15-bit significand (implicit bit + 14 represented bits). There has been a recent anonymous change to the article, based on unpublished work by Raúl Rojas, but I wonder whether this is reliable. Raúl Rojas was already wrong in the Bulletin of the Computer Conservation Society, Number 37, 2006, about single precision (he said 22 bits for the mantissa). Vincent Lefèvre (talk) 14:44, 21 September 2013 (UTC)

Error in diagram

The image "Float mantissa exponent.png" erroneously shows that 10e-4 is the exponent, while the exponent actually is only -4 and the base is 10. — Preceding unsigned comment added by 109.85.65.228 (talk) 12:14, 22 January 2014 (UTC)

Failure at Dhahran - Loss of significance or clock drift

This article states in section http://en.wikipedia.org/wiki/Floating_point#Incidents that the Failure at Dhahran was caused by Loss of significance. However, the article "MIM-104 Patriot" makes it sound like it was rather simply clock drift. This should be cleared up. — Preceding unsigned comment added by 82.198.218.209 (talk) 14:01, 3 December 2014 (UTC)

I agree. It isn't a loss of significance as defined by Loss of significance. It is an accumulation of rounding errors (not compensating each other) due to the fact that 1/10 was represented in binary (with a low precision for its usage). In a loss of significance, the relative error increases while the absolute error remains (almost) the same. Here, it is the opposite: the relative error remains (almost) the same, but the absolute error (which is what matters here) increases. Vincent Lefèvre (talk) 00:49, 4 December 2014 (UTC)
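A minimal sketch of the mechanism Vincent describes (not the Patriot's actual fixed-point code): repeatedly adding a binary approximation of 0.1 in single precision makes the absolute error grow with the number of ticks, even though the relative error of each addition stays roughly constant. The 100-hour uptime is the commonly cited figure for the Dhahran battery; the drift printed here will not match the Patriot's roughly 0.34 s (that came from a 24-bit fixed-point representation), but the mechanism is the same.

#include <stdio.h>

int main(void) {
    /* 100 hours of 0.1 s clock ticks, accumulated in binary32 */
    long ticks = 100L * 3600L * 10L;
    float t = 0.0f;
    for (long i = 0; i < ticks; i++)
        t += 0.1f;                      /* 0.1 is not exact in binary */
    double exact = 360000.0;            /* ticks * 0.1 seconds */
    printf("accumulated = %f s, drift = %f s\n", t, exact - t);
    return 0;
}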

John McLaughlin's Album

Should there be a link to John McLaughlin's album at the top, in case someone was trying to go there but ended up here? 2602:306:C591:4D0:AD55:E334:4141:98FA (talk) 05:49, 7 January 2015 (UTC)

Done. Good catch! --Guy Macon (talk) 07:05, 7 January 2015 (UTC)

needs simpler overview

Put it this way: I'm an IT guy and I can't understand this article. There needs to be a much simpler summary for non-technical people, using simple English. Right now every other word is another tech term I don't fully understand. -- thanks, Wikipedia Lover & Supporter

It seems that Mfwitten removed that simple overview, perhaps to enforce WP:ROWN; he called this "streamlining". I have restored my contribution, additionally reducing the 'bits part'. Yet I am sure the IT department will be happy now. --Javalenok (talk) 18:56, 17 February 2015 (UTC)

Non-trivial Floating-Point Focused computation

The C program intpow.c at www.civilized.com/files/intpow.c may be a suitable link for this topic. If the principal author agrees, please feel free to add it. (Don't assume this is just exponentiation by repeated doubling - it deals with optimal output in the presence of overflow or denormal intermediate results.) — Preceding unsigned comment added by Garyknott (talkcontribs) 23:31, 27 August 2015 (UTC)

Lead

What does "formulaic representation" in the lead sentence mean?

In general, I think we could simplify the lead. I may give it a try over the weekend.... --Macrakis (talk) 18:52, 23 February 2016 (UTC)

Minor technical correctness error

Any integer with absolute value less than 2^24 can be exactly represented in the single-precision format, and any integer with absolute value less than 2^53 can be exactly represented in the double-precision format.

These ought to say "less than or equal" instead of "less than", because the powers of two themselves can be exactly represented in single-precision and double-precision IEEE-754 numbers respectively. They are the last such consecutive integers. -- Myria (talk) 00:12, 16 June 2016 (UTC)
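A quick check of Myria's point, in C (assuming IEEE 754 binary32): 2^24 itself is exactly representable, while 2^24 + 1 rounds back down to 2^24, so 2^24 is the last integer in the unbroken run.

#include <stdio.h>

int main(void) {
    float a = 16777216.0f;   /* 2^24: exactly representable */
    float b = 16777217.0f;   /* 2^24 + 1: rounds to 2^24 (round to even) */
    printf("a = %.1f, b = %.1f, a == b: %d\n", a, b, a == b);
    return 0;
}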

Epsilon vs. Oopsilon

Deep in section Minimizing the effect of accuracy problems there is a sentence

Consequently, such tests are sometimes replaced with "fuzzy" comparisons (if (abs(x-y) < epsilon) ..., where epsilon is sufficiently small and tailored to the application, such as 1.0E−13).

wherein 'epsilon' is linked to Machine epsilon. Unfortunately this is not the same 'epsilon'. Epsilon as a general term for the maximum acceptable error is not the same as machine epsilon, which is a fixed characteristic of a given floating-point implementation.

As used in the sentence, it would be perfectly appropriate to set that constant 'epsilon' to 0.00001, whereas machine epsilon is derived from the floating-point format and is something like 2.22e-16 for double precision. The latter is a fixed value; the former is chosen as a "good enough" guard limit for a particular programming problem.

I'm going to unlink that use of epsilon. I hope that won't be considered an error of sufficiently large magnitude. ;-) Shenme (talk) 08:00, 25 June 2016 (UTC)
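To illustrate the distinction Shenme draws, here is a minimal sketch (the function name nearly_equal is mine, not from the article): the tolerance passed in is an application choice, while DBL_EPSILON is a fixed property of the double format.

#include <stdio.h>
#include <math.h>
#include <float.h>

/* Application-chosen absolute tolerance, as in the article's example */
static int nearly_equal(double x, double y, double tol) {
    return fabs(x - y) < tol;
}

int main(void) {
    double x = 0.1 + 0.2, y = 0.3;
    printf("x == y: %d\n", x == y);                           /* 0 */
    printf("nearly_equal: %d\n", nearly_equal(x, y, 1e-13));  /* 1 */
    printf("DBL_EPSILON: %g\n", DBL_EPSILON);  /* fixed: ~2.22e-16 */
    return 0;
}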

spelling inconsistency floating point or floating-point

The title and first section say "floating point", but elsewhere in the article "floating-point" is used. The article should be consistent in spelling. IEEE 754 uses "floating-point", with a hyphen; I think that is the correct spelling. JHBonarius (talk) 14:18, 18 January 2017 (UTC)

This is not an inconsistency (at least, not always), but the usual English rule: when the term is followed by a noun, one adds a hyphen to avoid ambiguity, e.g. "floating-point arithmetic". Vincent Lefèvre (talk) 14:26, 18 January 2017 (UTC)

hidden bit

The article Hidden bit redirects to this article, but there is no definition of this term here (there are two usages, but they are unclear in context unless you already know what the term is referring to). Either there should be a definition here, or the redirection should be removed and a stub created. JulesH (talk) 05:43, 1 June 2017 (UTC)

It is defined in the Internal representation section. Vincent Lefèvre (talk) 17:56, 1 June 2017 (UTC)

Seeking consensus on the deletion of the "Causes of Floating Point Error" section.

There is a discussion with Vincent Lefèvre seeking consensus on whether the deletion of the "Causes of Floating Point Error" section from this article should be reverted.

Softtest123 (talk) 20:16, 19 April 2018 (UTC)

It started with "The primary sources of floating point errors are alignment and normalization." Both are completely wrong. First, alignment (of the significands) is just for addition and subtraction, and it is just an implementation method of a behavior that has (most of the time) already been specified: correct rounding. Thus alignment has nothing to do with floating-point errors. Ditto for normalization. Moreover, in the context of IEEE 754-2008, a result can be normalized or not (for the decimal formats and non-interchange binary formats), but this is a Level 4 consideration, i.e. it does not affect the rounded value, thus does not affect the rounding error. In the past (before IEEE 754), important errors could come from the lack of normalization before doing an addition or subtraction, but this is the opposite of what you said: the errors were due to the lack of normalization in the implementation of the operation, not due to normalization. Anyway, that's the past. Then this section went on about alignment and normalization...
The primary source of floating-point errors is actually the fact that most real numbers cannot be represented exactly and must be rounded. But this point has already been covered in the article. Then, the errors also depend on the algorithms: those used to implement the basic operations (but in practice, this is fixed by the correct rounding requirement such as for the arithmetic operations +, −, ×, /, √), and those that use these operations. Note also that there is already a section Accuracy problems about these issues.
Vincent Lefèvre (talk) 22:14, 19 April 2018 (UTC)
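A small example of the correct-rounding point above (assuming IEEE 754 binary64): the addition below behaves as if computed exactly and then rounded once, so the error comes from rounding the result to the format, not from the alignment step as such.

#include <stdio.h>

int main(void) {
    double big = 1.0e16;      /* exactly representable in binary64 */
    double sum = big + 1.0;   /* exact sum 10^16 + 1 is not; it rounds back */
    printf("sum == big: %d\n", sum == big);   /* prints 1 */
    return 0;
}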
Perhaps it would be better stated that the root cause of floating-point error is alignment and normalization. Note that either alignment or normalization can discard possibly significant digits, after which the value must be rounded or truncated, both of which introduce error.
Of course, the underlying reason there is floating-point error is that real numbers, in general, cannot be represented without error. But this does not address the cause: what actual operations inside the processor (or software algorithm) cause a floating-point representation of a real number to be incorrect?
Since you have not addressed my original arguments as posted on your talk page, I am reposting them here:
In your reason for this massive deletion, you explained "wrong in various ways." Specifically, how is it wrong? This is not a valid criterion for deletion. See WP:DEL-REASON.
When you find errors in Wikipedia, the alternative is to correct the errors with citations. This edit was a good faith edit WP:GF.
Even if it is " badly presented", that is not a reason for deletion. Again, see WP:DEL-REASON.
And finally, "applied only to addition and subtraction (thus cannot be general)." Addition and subtraction are the major causes of floating point error. If you can make cases for adding other functions, such as multiplication, division, etc., then find a resource that backs your positions and add to the article.
I will give you some time to respond, but without substantive justification for your position, I am going to revert your deletion based on the Wikipedia policies cited. The first alternative is to reach a consensus. I am willing to discuss your point of view.
(talk) 20:08, 19 April 2018 (UTC)
Because you have not responded specifically to these Wikipedia policies (WP:DEL-REASON and WP:GF), I am reverting the section. Please feel free to edit it to correct any errors you might see. I would refer you to the experts on floating point, such as Professor Kahan and David Goldberg.
Softtest123 (talk) 23:03, 24 April 2018 (UTC)
You might not know, but Vincent is one of those experts on floating point. ;-)
Nevertheless, it is always better to correct or rephrase sub-standard content instead of deleting it.
--Matthiaspaul (talk) 11:43, 16 August 2019 (UTC)
@Softtest123 and Matthiaspaul: I think that this is more complex than you may think. The obvious cause of floating-point errors is that real numbers are not, in general, represented exactly in floating-point arithmetic. But if one wants to extend that, e.g. by mentioning solutions as this section attempted, this will necessarily go too far for this article. IMHO, a separate article would be needed, just like the recent Floating point error mitigation, which should be improved and probably renamed to "Numerical error mitigation". Vincent Lefèvre (talk) 14:46, 16 August 2019 (UTC)
I agree that "...real numbers are not, in general, represented exactly in floating-point arithmetic" so then the question is, "How does that manifest itself in the algorithms, and consequently the hardware design?" What is it in the features of these implementations that manifests the errors?" As I have pointed out, rounding error occurs when the results of an arithmetic operation produces more bits than can be represented in the mantissa of a floating point value. There are methods of minimizing the probability of the accumulation of rounding error, however, there is also cancellation error. Cancellation error occurs during normalization of subtraction when the operands are similar, and cancellation amplifies any accumulated rounding error exponentially [Higham,1996, "Accuracy and Stability...", p. 11]. This is the material that I presented that was deleted.
Softtest123 (talk) 18:14, 16 August 2019 (UTC)
Interestingly, it just so happens that this week I have been doing some engineering using my trusty SwissMicros DM42 calculator[1] which uses IEEE 754 quadruple precision decimal floating-point (~34 decimal digits, exponents from -6143 to +6144) and at the same time am writing code for a low end microcontroller used in a toy using bfloat16 (better for this application than IEEE 754 binary16 which I also use on some projects). You really have to watch for error accumulation at half precision. --Guy Macon (talk) 19:28, 16 August 2019 (UTC)
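For the curious, a rough sketch of why accumulation bites so quickly at bfloat16-like precision, emulating bfloat16 by truncating a binary32 value to its top 16 bits (real conversions usually round to nearest, so this is a simplification):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Keep sign, 8 exponent bits and 7 fraction bits: bfloat16's layout */
static float to_bf16(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u &= 0xFFFF0000u;
    memcpy(&x, &u, sizeof u);
    return x;
}

int main(void) {
    float s = 0.0f;
    for (int i = 0; i < 1000; i++)
        s = to_bf16(s + to_bf16(0.01f));
    /* The sum stalls once the increment is smaller than one ulp of s:
       prints about 2 instead of the exact 10. */
    printf("sum = %g (exact: 10)\n", s);
    return 0;
}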
The effect on the algorithms is various. Some algorithms (such as Malcolm's algorithm) are actually based on the rounding errors in order to work correctly. There is no short answer. Correct rounding is nowadays required in implementations of the FP basic operations; as long as this requirement is followed, the implementer has the choice of the hardware design. Cancellation is just the effect of subtracting two numbers that are close to each other; in this case, the subtraction operation itself is exact (assuming the same precision for all variables), and the normalization does not introduce any error. Vincent Lefèvre (talk) 20:13, 16 August 2019 (UTC)
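A tiny illustration of that last point (assuming binary64): the subtraction itself is exact by the Sterbenz lemma, but it exposes the rounding error already committed when the operand was formed, so the relative error jumps.

#include <stdio.h>

int main(void) {
    double x = 1.0 + 1e-15;   /* rounding happens here, in the addition */
    double d = x - 1.0;       /* this subtraction is exact (operands close) */
    printf("d = %.17g (ideally 1e-15)\n", d);  /* ~1.11e-15: about 11% off */
    return 0;
}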

Fastfloat16?

[ https://www.analog.com/media/en/technical-documentation/application-notes/EE.185.Rev.4.08.07.pdf ]

Is this a separate floating point format or another name for an existing format? --Guy Macon (talk) 11:32, 20 September 2020 (UTC)

Same question for [ http://people.ece.cornell.edu/land/courses/ece4760/Math/Floating_point/ ] Somebody just added both to our Minifloat article. --Guy Macon (talk) 11:37, 20 September 2020 (UTC)
As the title of the first document says: Fast Floating-Point Arithmetic Emulation on Blackfin® Processors. So these are formats convenient for a software implementation of floating point ("software implementation" rather than "emulation", as they don't try to emulate anything: they have their own arithmetic, without correct rounding). The shorter of the two formats has a 16-bit exponent and a 16-bit significand (including the sign), so that's a 32-bit format, definitely not a minifloat. And the goal (according to the provided algorithms) is not to emulate minifloat formats either (contrary to what I have done with Sipe, where I use a large format for a software emulation of minifloat formats). In the second document, this is a 24-bit format with a 16-bit significand, so I would not say that this is a minifloat either. — Vincent Lefèvre (talk) 16:23, 20 September 2020 (UTC)
Thanks! That was my conclusion as well, but I wanted someone else to look at it in case I was missing something. As an embedded systems engineer working in the toy industry I occasionally use things like minifloat and brainfloat, but I am certainly not an expert. I fixed the minifloat article. --Guy Macon (talk) 17:50, 20 September 2020 (UTC)
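For concreteness, a hypothetical sketch of the 32-bit layout Vincent describes for the first document (the field names and type name are my own inventions for illustration, not Analog Devices' definitions; see the app note for the real details):

#include <stdint.h>

/* A "fract16 plus exponent" software format: 16-bit signed significand
   (sign included) and a separate 16-bit exponent -- 32 bits in total,
   so not a minifloat. */
typedef struct {
    int16_t significand;   /* signed fractional significand */
    int16_t exponent;      /* binary exponent */
} fastfloat16_like;        /* hypothetical name */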