Talk:Floating-point arithmetic


Time to Archive previous discussion?

Yes, I agree. We're at 107 kilobytes, shouldn't be over 32. William Ackerman 01:54, 7 December 2006 (UTC)

Done, in two batches. We are still a little heavy, at 56 KB. William Ackerman 16:03, 8 December 2006 (UTC)

More discussion of introductory material

[The following is a digest of a discussion between Amoss and William Ackerman].

From Amoss:

In reworking the intro you seem to have reverted the text drastically to a much earlier version focusing on floating point as a numeral representation. There is some discussion on the talk page about why this was changed that you have not added to. In particular: Floating-point is a system of arithmetic that operates on a particular representation (also called floating-point). You've also reverted the intro to the claim that floating-point numbers are representations of real numbers. This makes no sense and there is much controversy on the subject on the talk page. All floating point numbers are instances of a set that is a subset of the rationals. There are no irrational elements, and so it makes no more sense to call them Real than it does to call them Complex. As a result there is now a clash between the description at the top, and that later on. Amoss 16:54, 5 December 2006 (UTC)

From William Ackerman:

I made the "above the TOC" (table of contents) section list 4 ways of representing numbers: (1) integers (point implicitly at the right), (2) ordinary written mathematical notation (with a point), (3) written scientific notation (with "×10^-3"), and (4) the real thing. I believed that understanding (4) required a context of (1), (2) and (3). It was in response to a comment by JakeVortex that I put those 4 below the TOC, listing only (4), without its context, above the TOC. I really believe that this organization, discussing only (4) above the TOC and listing the context shortly afterward, is the right thing, and that at least the first few paragraphs, both above and below the TOC, are starting to be really correct. I really believe that this description, as a sort of numeral representation, is the right way to describe it, and that my sentence "could be thought of as a computer realization of scientific notation" is a good one.

The bit about rationals: I consider the fact that all representable FP numbers are rational to be a coincidence, and not fundamental, which is why I've been downplaying that. (I don't remember at the moment whether I completely reverted your text along those lines, or just seemed to.) Aside from the fact that all FP numbers represent reals that happen to be rational, there is nothing special about the rationals here. People use FP to solve differential equations, invert matrices, etc. etc. These are generally thought of as operations over the field of reals. If FP arithmetic were to be suddenly magically endowed with the ability to represent all rationals exactly, all the usual accuracy problems would remain. Well, most of them. You could invert matrices exactly, but you couldn't solve diff. eq.s, or compute pi or exp or log or sin ....

The fact that FP doesn't represent all reals, and the accuracy problems that arise therefrom, are sort of a hot-button item for me. I've seen too much "floating point mysticism" (or maybe "superstition", but really it's ignorance), and I want to be sure this article dispels same. Therefore, it's important to me that the article say that FP numbers are intended to model the reals, and that they do this only approximately because they exactly represent only a subset of the reals and any result is rounded to the nearest representable number. It's perfectly appropriate to mention somewhere that those representable numbers are rational, but that's not what is fundamental, and it really shouldn't be in the first few paragraphs.

William Ackerman 00:47, 6 December 2006 (UTC)

[End of material from user pages.]

Hi William, thanks for your detailed reply to my message on your talkpage. Firstly, I didn't mean to accuse you of "angry reverts", if my message came across that way then please accept my apologies. Now, onto the business of the page itself:

The changes in the past month are very drastic, and a huge improvement. I think that we should archive all the other sections on this page and continue with what the page needs now. There seem to be three or four active contributors at the moment; I have no idea how many of the old contributors on here are still watching the talkpage.

1. Numeral-representation or system of arithmetic? There does seem to be a lot of old discussion about this. I would say that the term floating-point does refer to two separate things: a representation and a system of arithmetic. All of the old versions of the intro were quite unwieldy, as it is a lot to get across in so few words. The current version is quite straightforward and works well as an intro. Perhaps the nasty details of both could be pushed into one of the body sections?

2. Reals or rationals? I can see your point on this. Strong evidence in favour of the "subset of the reals", shortened to "reals", is that it does make the exposition clearer and less unwieldy, it's still technically correct, and it's the terminology that Kahan uses. But I think that the exact description that they are a particular subset of the rationals needs to go in somewhere. In particular it helps explain the examples like why 0.1 can't be represented exactly. Also, the arithmetic operations make more sense as operating on this set of rationals.

3. Overall structure? The new structure with the Overview split into lots of the old subsections really works well. I would say that the front of the article works well, and it is only the "tail" that still needs serious work.

Amoss 15:02, 6 December 2006 (UTC)
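
To make point 2 above concrete, here is a minimal Python sketch (Python is chosen only for illustration) of why 0.1 cannot be represented exactly. It assumes that Python floats are IEEE 754 binary64, which holds on common platforms. The value actually stored is a rational whose denominator is a power of two, so it cannot equal 1/10.

```python
from fractions import Fraction

# Fraction(float) converts the stored binary value exactly, with no rounding.
stored = Fraction(0.1)

# A positive integer is a power of two when it has a single bit set.
denominator = stored.denominator
is_power_of_two = denominator & (denominator - 1) == 0

# 1/10 has a factor of 5 in its denominator, so no power-of-two
# denominator can express it exactly.
differs_from_tenth = stored != Fraction(1, 10)

print(stored)
print(is_power_of_two, differs_from_tenth)
```

The same check applies to any decimal fraction whose reduced denominator has a prime factor other than 2.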

Even more discussion of introductory material

I think the basic material is here, and the problem is organizing it. The subject matter of floating point is proving to be extremely difficult to get organized and ordered properly. Rather than just going ahead with my vision of how things should be (possibly reverting other people's material) I think it would be good to discuss an outline here on the talk page.

There are many things we want to say, and they are all trying to find the right section, and trying to compete for a spot near the top. I think they include:

The very first paragraph (above the table of contents). Does this properly capture what floating point really is?

I think that it is getting there. I thought the phrasing of "some kind of" designation for the radix point was a little weak. So I've made it more direct and inserted a sentence that mentions arithmetic. It uses a similar weak / glossy mention of accuracy but hopefully it reinforces that the representation is exact, and the operations are approximate. Amoss 17:04, 1 January 2007 (UTC)

Do we want to say something about the "finite window" of significand digits? (I happen to believe that, on a theoretical basis, the finiteness of the window is an important aspect of what FP really is, but I don't know whether it's a point that we can/should make explicitly.)

Yes, it is an important point. Without a finite restriction on the window the approximation becomes exact. I think that it should be explicitly explained, although I'm not quite sure whereabouts it would fit. Amoss 17:04, 1 January 2007 (UTC)
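
A small Python sketch of the "finite window" may help here. It assumes binary64 floats with a 53-bit significand. Adding 1 to 2^53 falls outside the window and is rounded away, whereas Python's unbounded integers, which have no such window, keep the result exact.

```python
# Assumes IEEE 754 binary64: the significand window holds 53 bits.
BIG = 2.0 ** 53

# The sum 2^53 + 1 needs 54 significant bits, so the float result is
# rounded back to 2^53.
absorbed = (BIG + 1.0 == BIG)

# Python ints are arbitrary precision, so the same sum stays exact.
exact_ints_differ = (2 ** 53 + 1 != 2 ** 53)

print(absorbed, exact_ints_differ)
```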

Near the top, there is a list of 5 ways of interpreting things (integer, common notation, fixed point, scientific, FP.) Is this the right thing? Does fixed point belong here? (It's more a computer thing than a human thing.)

On the one hand, it is easiest to explain floating-point as a comparison to other formats. On the other hand, maybe there should be a separate page for formats that explains the differences between them in more detail? Fixed-point is nice because it is essentially floating-point with a constant exponent. I think this is used by people; I remember being taught to evaluate expressions using a fixed number of decimal places (as opposed to significant digits). At the moment there is some redundancy between this section and the following nomenclature; I'll see if it is easy to merge the relevant sentences together. Amoss 17:04, 1 January 2007 (UTC)
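
The remark that fixed-point is floating-point with a constant exponent can be sketched as follows. The scale of 8 fraction bits and the helper names `to_fixed` and `from_fixed` are hypothetical, chosen only for this illustration. Every value shares the exponent -8, so small values lose relative precision that a per-value exponent would preserve.

```python
FRACTION_BITS = 8  # hypothetical constant exponent: every value is n * 2**-8


def to_fixed(x: float) -> int:
    """Store x as the nearest integer multiple of 2**-FRACTION_BITS."""
    return round(x * 2 ** FRACTION_BITS)


def from_fixed(n: int) -> float:
    """Recover the value that the stored integer stands for."""
    return n / 2 ** FRACTION_BITS


# A mid-sized value keeps a few good digits...
approx_pi = from_fixed(to_fixed(3.14159))

# ...but a small value falls below the fixed resolution and becomes zero.
tiny = from_fixed(to_fixed(0.001))

print(approx_pi, tiny)
```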

We have a later separate paragraph about alternative computer representations, e.g. arbitrary precision, bignum, symbolic. Is this the right thing to do? Should fixed point computer representation be listed only here?

Again, this is potentially material for a formats page, but taking it out makes it harder to explain floating-point. Amoss 17:04, 1 January 2007 (UTC)

Do we want to point out early on that FP gives only a subset of the reals, and, in fact, a subset of the rationals? I think this needs to be a recurring theme in the article because it's so important. (We need to go into more detail later, saying that it is [0, 2^p-1] × 2^any.) Where should it be introduced first? Where should we introduce the non-representability of 1/3, 1/10, and π? (I'm inclined to move forward on this, putting this material down around the "value" and "the conversion and rounding" sections.)

The value and rounding sections sound like the right place for this. Non-representable values should go in the value section, I think. Amoss 17:04, 1 January 2007 (UTC)
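
The "[0, 2^p-1] × 2^any" description can be checked with a short Python sketch. It assumes binary64 floats, so p = 53, and the helper name `decompose` is made up for this illustration. It splits a positive finite float into an integer significand and a power-of-two exponent, and confirms that the product reproduces the stored value exactly.

```python
import math
from fractions import Fraction

P = 53  # significand width of binary64 (assumed platform float format)


def decompose(x: float) -> tuple[int, int]:
    """Return (significand, exponent) with x == significand * 2**exponent.

    Intended for positive finite x; the significand lies in [0, 2**P - 1].
    """
    mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    significand = int(mantissa * 2 ** P)  # exact: mantissa carries at most P bits
    return significand, exponent - P


for value in (0.1, 1 / 3, math.pi):
    significand, exponent = decompose(value)
    exact = Fraction(significand) * Fraction(2) ** exponent
    print(value, significand, exponent, exact == Fraction(value))
```

In each case the stored number is an integer times a power of two, which is why 1/3, 1/10 and π can only be approximated.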

Do we need a "misconceptions" section? I tend to think so, because I have seen a lot of FP superstition/mysticism, and I'd like this article to straighten that out. For example, FP numbers are not approximations, they are exact. It's the FP operations that aren't exact. We probably ought to mention, in this section or elsewhere, the real-world things that can interfere with IEEE's vision of making FP deterministic (and thereby perpetuating the superstition.) This includes compilers using 80-bit arithmetic at their whim, and the compiler switches (e.g. "/Op") to prevent this.

I think this section is important, and it should go directly before the numerical analysis part. So firstly explain where the floating-point representation is exact that people generally don't realise. Then lead into where arithmetic can be unstable and what not to do. Amoss 17:04, 1 January 2007 (UTC)
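
As a possible illustration for such a section, the following Python sketch (assuming binary64 floats) separates the two ideas. The operands are exact rationals; it is the addition that rounds, because the exact sum is not itself representable.

```python
from fractions import Fraction

a, b = 0.1, 0.2

# Each operand is an exact rational number (just not 1/10 or 1/5).
exact_sum = Fraction(a) + Fraction(b)

# The float addition must round the exact sum to a representable value.
rounded_sum = Fraction(a + b)

operation_rounded = exact_sum != rounded_sum
differs_from_literal = (a + b) != 0.3

print(operation_rounded, differs_from_literal)
```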

There used to be a mention of the extreme economic importance of FP in science, technology, industry, etc. There was also mention of "FLOPS". It was taken out. Should it be put back? Where? Perhaps in the topmost section, above the TOC?

Someone has put it back in; I think it adds to the intro and should stay. I've added a statement to it that explains why having a large automatically changing range is the important feature that makes floating-point desirable. The applications could be expanded; they are the "classic" list in some sense, but now games and multimedia performance are just as important, and more widespread. Amoss 17:04, 1 January 2007 (UTC)

William Ackerman 02:20, 11 December 2006 (UTC)

Real vs rational

The top section of the article describes a floating point number as representing a real number, whereas this is not really true: floating point cannot fully represent an irrational number, therefore only rational numbers are representable in floating point systems. I'm commenting here rather than fixing it, because I can't believe this hasn't come up before, yet the text is still there at the top, so I'm guessing somebody must have a reason for this. JulesH 11:43, 13 February 2007 (UTC)

Is it not saying that it represents a specific real number (rather than all possible real numbers)? Perhaps could be clarified. mfc 13:48, 13 February 2007 (UTC)

The problem that JulesH is addressing is that the specific real that is represented happens to be a rational number. As to why we talk about it representing a real rather than specifying a rational, it's a convention that is used in most literature. There is no better way to represent a real anyway. So when we use the constant pi we use the best approximation of pi that is possible with the precision we are using. Taemyr 20:43, 9 June 2007 (UTC)

An incorrect convention just the same. Accuracy is preferable to reflex conservatism; of course, we probably should note that such a convention exists. The argument that pi is only approximate also applies to the decimal expansion of 1/3; it too has no complete representation in floating point.

Also, in the first sentence:

In computing, floating-point is a numerical-representation system in which a string of digits (or bits) represents a rational real number.

...rational real number is as redundant as natural rational number or natural real number. The likely intent was that rational applied to the actual FPN, while real was the ideal; the painting as compared to the model; the map and the territory; waffling. --AC 07:55, 21 June 2007 (UTC)

Although it doesn't directly affect the real vs. rational discussion, I'd like to point out that if the IEEE standard is our exemplar model of a floating-point system, and this standard includes representations for infinities and NaN, then a floating-point system isn't just restricted to real/rational quantities. ~ Booya ^Bazooka 16:26, 21 June 2007 (UTC)

Agreed that the IEEE made an excellent standard. However, while NaN and infinities are honorary members of a particularly good implementation and useful aids to machine calculation, (though little known and seldom used, thus needlessly reinvented), I'd wonder if these symbols, opcodes, and underlying silicon shouldn't be considered more as algorithmic conventions rather than (in)finite digital strings. Useful exceptions when used correctly, but not general -- the IEEE infinities can't make FPUs produce all the digits of pi. Those infinities and NaN are floating point in the same fuzzy sense that David Rice Atchison was President of the United States. --AC 08:08, 22 June 2007 (UTC)
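
For readers unfamiliar with the special values under discussion, here is a minimal Python sketch of how the IEEE infinities and NaN behave. It assumes Python floats follow IEEE 754, as they do on common platforms. Neither value behaves like a real or rational number: infinity absorbs finite additions, and NaN compares unequal even to itself.

```python
import math

inf = math.inf
nan = math.nan

# Infinity absorbs any finite addend.
inf_absorbs = (inf + 1.0 == inf)

# inf - inf has no meaningful value, so the result is NaN.
undefined_is_nan = math.isnan(inf - inf)

# NaN is unordered: it is not equal to anything, including itself.
nan_unequal_to_itself = (nan != nan)

print(inf_absorbs, undefined_is_nan, nan_unequal_to_itself)
```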

...wording seems unnecessarily complicated - a "rational real number" is just a rational number, isn't it?

No, that's not the same! A floating point number is a representation of a real value. Because it's finite, it must be from a rational subset of R. So it's rational as a value, and real when used in a longer calculation. Sometimes a phrase is constructed very carefully, and not only in prophecies of modern British authors. This is one of them. But rational real number should perhaps be better explained. --Brf 07:46, 9 July 2007 (UTC)

From the rational number article: "The rationals are a dense subset of the real numbers" - If rationals are a subset of reals, then specifying "real" is redundant. ~ Booya ^Bazooka 11:05, 10 July 2007 (UTC)

Until I read the article on rational numbers I was neutral on which wording was best. This article pointed out that the reals include irrational numbers like pi and e. Floating-point numbers don't include the irrational numbers and so the rational number link is of higher value (i.e., more relevant information), in my opinion, than any alternative link. Derek farn 16:50, 10 July 2007 (UTC)

(I moved the two discussion parts together) --Brf 06:44, 12 July 2007 (UTC)

Hi Guys, this has all been argued through before. It's possible that the talk page has been chopped since then (I even vaguely remember requesting it). The basic argument (as always) is that floating point representations are entirely contained within the set of Rationals and therefore shouldn't be referred to as Reals. In fact they are entirely within the subset of the Rationals given by products of integers and powers of two, but that never seems to crop up as a reason to describe them that way. Every floating point number is a real number. It is not a representation of a real number, but actually a real number. Yes, it is also in a subset of the Rationals, but the semantics of floating point operations (rather than just the representations) assume that these numbers are Real. William Ackerman posted a very good link to an article by Kahan on the subject. If William is reading then perhaps he could post the link again and settle this debate before it starts all over again? Amoss 13:51, 11 July 2007 (UTC)

The problem in this discussion is that floating point numbers (fpn) have several different aspects. As a number set they are a finite subset of the rational numbers (rat) and not even a dense one. They were constructed for approximate calculations in the real range (rl) and so they mostly follow the arithmetic rules of rl. Exceptions are discussed in numerical analysis. Their use is to calculate real results. Concerning their applications, fpns are real. And a reader of the article is sent in the wrong direction when reading that fpns are rational. What would you call an arithmetic using fractions? Are fractions simply integers, because only integers are involved? --Brf 06:44, 12 July 2007 (UTC)
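
Two of the points above can be illustrated with a short Python sketch, assuming binary64 floats. First, the set is not dense: there is a gap just above 1.0, so a small enough addend is rounded away. Second, one of the "exceptions" to the real arithmetic rules is that addition is not associative.

```python
import sys

# Machine epsilon is the gap between 1.0 and the next larger float.
eps = sys.float_info.epsilon

# Half that gap lands between two neighbouring floats and rounds back to 1.0.
gap_above_one = (1.0 + eps / 2 == 1.0)

# Real addition is associative; float addition rounds after each step,
# so the grouping can change the result.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)
not_associative = left_grouped != right_grouped

print(gap_above_one, left_grouped, right_grouped, not_associative)
```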

Request for info

The following was edited into the main article. I have reverted same, and will try to formulate a reply for this person.

Actually floating point are processed in binary format. there formats. 1-> IEEE .

I want to know what is IEEE. and how it is normalised in mantissa and exponent form

Please verify this and reply same

debakanta.rout@feelings.com

William Ackerman 00:37, 15 April 2007 (UTC)

That hyphen

Both floating point and floating-point are used interchangeably here. Which is correct? The article should be standardised on one of them. 172.188.190.67 14:55, 21 August 2007 (UTC)

Both are correct (but may not be used correctly):

floating-point is the adjective, as in 'floating-point number'.

floating point is an adjective and noun, as in 'the number has a floating point'

mfc 13:17, 29 August 2007 (UTC)

Clarification

The article mentions using the natural logarithm of a number stored in fixed-point as an alternative to the IEEE floating point format. Does anyone have a reference to how that may work? Specifically, how would addition and subtraction be implemented without first exponentiating the operands and taking the logarithm of the result? —Preceding unsigned comment added by Somenick 09:02, 26 September 2007 (UTC)
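
One commonly described approach to this question, which may or may not be what the article had in mind, rests on the identity ln(a + b) = ln(a) + ln(1 + e^(ln b − ln a)). The correction term depends only on the difference of the two stored logarithms, so an implementation can take it from a lookup table or an interpolation instead of exponentiating both operands. The Python sketch below shows only the identity. The function name `lns_add` is made up for this illustration, and the sketch calls the math library where real hardware would presumably use a table.

```python
import math


def lns_add(log_a: float, log_b: float) -> float:
    """Return ln(a + b) given ln(a) and ln(b), for positive a and b."""
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    # ln(a + b) = hi + ln(1 + e^(lo - hi)); the second term is a function
    # of the single non-positive difference (lo - hi), so it can be tabulated.
    return hi + math.log1p(math.exp(lo - hi))


# Example: adding 3 and 5 while working only with their logarithms.
log_sum = lns_add(math.log(3.0), math.log(5.0))
print(math.exp(log_sum))
```

Subtraction can be handled the same way with ln(1 − e^(lo − hi)) as the correction term, again a function of one variable.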

Addition

Some of the wording under the Addition heading is completely mangled and unreadable, to the point that I can't clean it up because I don't even know what it's supposed to mean. Examples: "The following example is decimal means base is simply 10."; "This is nothing else as converting to engineering notation. In detail:". Largesock 19:44, 27 September 2007 (UTC)