Talk:Fixed-point arithmetic

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computing (Rated Start-class)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start-Class article Start  This article has been rated as Start-Class on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.

Usage for speed[edit]

My understanding was that games and other graphical applications might use fixed point for speed even when the architecture *does* have an FPU. Perhaps someone who knows about this could add a sentence or two. —The preceding unsigned comment was added by 11:47, 6 July 2005 (talkcontribs)

Standard notation[edit]

Is the standard notation for fixed point "1.15" or "Q1.15" ? —The preceding unsigned comment was added by DavidCary (talkcontribs) 06:53, 24 October 2005 (UTC1)

In Texas Instruments documentation I have only seen Qn notation where n is the fractional bit count. —The preceding unsigned comment was added by Petedarnell (talkcontribs) 23:28, 24 August 2006 (UTC1)

TI describes their use of "Qm.f" notation in their libraries here:, Appendix A.2 "Fractional Q Formats", pg. A-3.

Note, that TI makes the sign bit implied. I.e. the word length (in bits) is m+f+1. This differs from the description in the main page which seem to say that the word length is m+f.

Would the someone kindly verify if this difference is truly a difference in common usage? Bvacaliuc 18:35, 14 October 2006 (UTC)

Hello all

If we are talking about a general fixed-point notation, please be aware that fixed-point notation is often used in the context of hardware descriptions (e.g., VHDL, Verilog): I would strongly prefer a concise notation with a clear definition of the sign bit. According to this thread:

  • TI: implicit additional sign bit, i.e.,
 signed Q1.14 has 16 bits
 unsigned Q1.14 has 15 bits
  • Matlab: implicit sign bit, i.e.,
 q = quantizer('fixed', [16 14], 'round', 'saturate') corresponds to signed Q1.14
 q = quantizer('ufixed', [15 14], 'round', 'saturate') corresponds to unsigned Q1.14

Both notations, Q1.14 and [16 14], are not entirely clear. Personally, I heavily used Matlab with the above notation for fixed-point characterization of VHDL blocks. I do not prefer one of both, but the Matlab notation has the slight advantage that you instantly know the total word width, without requiring any additional information.

best regards, Peter, 2009/07/21 —Preceding unsigned comment added by (talk) 13:02, 21 July 2009 (UTC)

Hi! Q15 is the same as Q1.15. If the number of integer bits is 1 then it can be omitted because it's the most commonplace form of fixed-point arithmetic. That's when signal gets values from -1.000 to a little less than 1.000 . That's massively useful in signal processing, where the magnitudes of signals are traditionally represented this way. That again is due to that it's the native format in which the integer alu of a 16-bit processor (such as the 8086) calculates when using signed integers. Early computers had no floating point logic thus the signal processing had to be done using integer computers. For instance in the 8086 you would do a unsigned Q16 multiplication using the MUL command. It takes the first operand from AX (in 16-bit mode), the second operand from general purpose register or memory and stores the result in DX AX (DX has the high 16 bits and AX has the low 16 bits (16 bit number multiplied by 16 bit number is 32 bit number)). So I would calculate:

 MOV AX, (first Q15 number)
 MOV BX, (second Q15 number)
 ;-- the result is now in DX, I can copy it from there somewhere.

Since by selecting the 16 leftmost bits (DX) as results, I get the shift-by-16 free and my result is thus in Q16 form. Try it! Great fun! I'd imagine this is how the fixed system was "discovered".

What really bugs me is that there doesn't seem to be a standard for differentiating between signed and unsigned fractionals. The only relevance that can be trusted is that the Q notation tells the number of fractional bits. So Q15 means that there are 15 bits after the decimal point (well actually it should be called the binary point). It doesn't globally guarantee anything else about the internal representation. For instance the internal representation could be two's complement (common) or sign+magnitude (uncommon, except in floating point units). But that's not so bad since you seldom need to know the internal representation: It does not matter in which format the data is inside the computer if you don't need to transfer from one computer to another in raw form. It guarantees that you have 15 fractional bits and that's it. Deep down in the depthness of the machine you might even have more (for instance if the hardware is 32-bit for instance) but that should not matter.

If you're really into it, here's an article about fixed point and TI dsp's:

Somebody put something of the above to the article. I've grown tired of edit wars.

P.S. By the way, the line "1:8:23 // signed 8-bit fixed point with 24 bit fractional, the IEEE 754 format (citation needed)" is total crap since IEEE_754 is a floating point, not fixed point standard.

-Panze 06:13, 9 May 2007 (UTC)

a few wiki pages that mention fixed-point arithmetic[edit]

Fixed-point arithmetic

Computers and real numbers

Portable Fixed Point

Inkscape renderer issues

other fixed-point pages[edit]

Getting a Speed Boost with Fixed-Point Math by Daniel W. Rickey

The "double blend trick" by Michael Herf

Developing Smartphone Games by Andy Sjostrom of businessanyplace, 2003-01: For games on the ARM processor, use fixed point math, not floating point math.

"fixed-point" vs. "fixed point" and "floating-point" vs. "floating point"[edit]

Should not these to be consistent? English is not my native language so somebody please help out. I find it strange that wikipedia sticks to "fixed-point arithmetic" and "floating point". Velle 12:26, 25 March 2006 (UTC)

Look at the talk page for Floating point in the section called "hyphen or not?". —The preceding unsigned comment was added by (talkcontribs) 05:53, 28 May 2006 (UTC1)

Modern games[edit]

Tom Forsyth is a respected game developer. Here he explains why even modern games may prefer fixed point over floating point, for reasons of precision rather than speed: A matter of precision. —The preceding unsigned comment was added by (talkcontribs) 12:02, 3 October 2006 (UTC1)

His message: use more bits; and if you want speed use integer ratios. With more bits a floating point number has no hardware support or is slower. It is a very good article. As I said below if the mantissa is the same length as the integer and you restrict yourself as you have to when using integer ratios; they both have the same accuracy. Charles Esson 21:56, 7 April 2007 (UTC)

Name Binary scaling[edit]

It should have this name because that is what it is refferred to in old embedded assembler programs. —The preceding unsigned comment was added by (talk) 14:05, 19 February 2007 (UTC).

Then have Binary scaling redirect to fixed-point arithmetic, and note that fixed-point arithmetic is also known as binary scaling in old embedded assembler programs. These are clearly the same technique, and it is confusing to have two different pages with different descriptions. 06:29, 28 March 2007 (UTC)

No, that won't work because there is lots of old code that still uses B notation. If you re-wrote the fixed point section to accurately include how B notation is used, and how angles use it (including working out transendentals using B0) then it could be merged. But you would need someone who knew what they were doing or you could get it wrong (and there is enough 'wrong' stuff on wiki anyway, don't want any more of that ! (talk) 23:34, 5 January 2009 (UTC)

The general case is integer based arithmetic[edit]

(a/base * b/base)/base = a*b/base

((a/base)*base/(b/base)) = (a/b)/base

a/base + b/base = (a+b)/base


This can speed things up using q numbers; that is make the base a power of 2. The formulas then become

(a/base * b/base)/shift_rightQ = a*b/base

((a/base)shift_leftQ/(b/base)) = (a/b)/base

a/base + b/base = (a+b)/base

I think Q numbers and binary scaling are the same thing being described with different words ( but I am not sure because the purist might say Q numbers have to be signed magnitude). You need to be a little careful with fixed point arithmetic; it is only the same if the radix is binary. Fixed point using a decimal radix would be a different animal.

The comment on floating point not being as accurate is wrong; if you place the same restrictions on your floating point calculations as you have to place on your calculations when doing fixed point maths and the mantissa of the floating point is of the same length as the integer the results are the same. If the floating point number has a longer mantissa than the length of the integer being used for the integer based maths the floating point result will be better. Further many integer calculations get done without proper rounding ( it still matters) the floating point unit generally does it properly.

>> Charles, for the same # of bits, say 32, fixed point is more accurate since all 32 bits of precision is used. 32 bit Floating point will only use 24 bits of precision with 8 bits of exponent. Pete Darnell 22:28, 31 May 2007 (UTC)

The article asked for a comment :-) Charles Esson 11:18, 6 April 2007 (UTC)

And looking at the links in my comment there needs to be a computer article on floating point; the current entry doesn't come close to dealing with the topic (and it is a interesting topic) . Just checked IEEE 754 the fractional part is called the mantissa; so the linked page for mantissa also has some issues.Charles Esson 11:36, 6 April 2007 (UTC)

IEEE 754 (and the current revision, IEEE 754r) both use the term significand for the fractional part, not mantissa. mfc (talk) 12:10, 29 March 2008 (UTC)

It should not be merged[edit]

The main article fixed point arithemetic is a confused presentation of binary based fixed point stuff; the examples in the section Current common uses of fixed-point arithmetic give examples in binary and decimal base fixed point stuff.

I have to do more reading to make sure but I think the general thrust should be:

Fixed point arithmetic is the general set presented in a way that supports binary and decimal radix. Binary scaling and Q numbers have a binary base. Binary numbers probable should go to Q number. I think Q numbers need to describe the format Qn.n at the moment it only mentions Qn which is a subset of the former. Charles Esson 21:37, 9 April 2007 (UTC)

I'm confused. What should not be merged? -- (talk) 03:37, 28 March 2008 (UTC)

Please don't merge the Q format article into the Fixed Point article. The Q format article is substantial and specific to the Q format, not a stubby article full of general fixed-point information. It deserves to stand on its own. I found it very useful as-is. RolfeDH 12:09, 4 Sept 2008 (UTC)

There should not be a merge. Not of any of the articles. That they are/are not the same part of math. is irrelevant. Wiki is not a 'Learned Journal' rather it is a multi-linked learning resource. All that is needed is a link to articles on the 'other parts' of the subject. I found it all via a Qn.Qn enquiry. Now I know it's called Fixed-point. (talk) 17:26, 29 January 2010 (UTC)

I don't think the articles should be merged, for several reasons. First, the resulting article would simply be too big. IMHO one of the benefits of the wiki/hyperlink approach is that you can write a series of fairly short, succinct articles on related subjects and then cross-link them. Articles that are pages long with 25 sub-sections seem to defeat the whole purpose of a general reference encyclopedia.

Secondly, I think the current articles already confuse several issues. "Fixed Point" is a generic term for any finite numeric representation system in which the position of the radix (base) point is not explicitly included as part of the representation itself. One way of categorizing fixed point representation schemes is based on the particular radix/base that is used. For historical reasons, over the last 50 years the most important practical application of fixed point numeric representation has been in the field of computer science where for technical reasons (existing digital logic circuit design constraints) the radix/base used has overwhelmingly been 2; however, it is important to begin any discussion of fixed point by pointing out that it is a concept that can be applied to any radix. It is very possible that in the near future, digital logic gate design constraints may change, resulting in hardware that is capable of operating efficiently in other radices (three, four or more). For historical reasons, the bulk of the discussions should focus on binary fixed point and its applications, but the intro should not confuse the reader that this is the only meaning of the term "fixed point."

"Q Format" is a specific binary (radix=2) fixed point representation scheme that was developed and promoted by a particular company (Texas Instruments) in a particular context (embedded systems). It has specific historical connotations that should probably be clearly explained (independent of the subject of fixed point representation in general). I'm not familiar with the term "B format", but to the extent that it was a precursor to the widespread adoption of TI's Q format it should either be discussed as part of that article or given its own article.

"Binary Fixed Point Arithmetic" is a sub-set of the more general topic of fixed point numeric representation. It deals with the arithmetic operators (+, -, *, /) and their specific properties when applied to numbers represented using the fixed point representation scheme. This article should address issues such as overflow that occur as the result of applying arithmetic operators to fixed point representations. I would argue that the topic of "scaling" (which is a technique that is often used in conjunction with the application of arithmetic operators to fixed point representations schemes as part of implementing an algorithm) should be treated as a separate subject (see below).

"Scaling" (sometimes referred to as "Slope-Bias Scaling" is a very general technique that can be used to map a given set of real values (with a particular range and precision) onto a particular fixed point numeric representation scheme using the following formula: real-world value = (slope x integer) + bias, where the slope can be expressed as, slope = (slope adjustment contant) x radix^exponent. It is the key step in determining a) whether a particular real world requirement (range and precision) can be satisfied by a particular finite fixed point representation scheme (i.e. do you have enough bits to cover the required range at the required precision?) and b) being able to convert between and adjust fixed point representation schemes "on the fly" in order to avoid overflow while preserving precision.

Again, the term "Binary Scaling" refers to the specific case where the radix/base of the underlying fixed point numeric representation scheme = 2. From a historical perspective this is the most significant application and should thus be the focus of the article (without losing sight of the fact that it is a special case of the more general concept of scaling).

Due to the significant conceptual differences between "scaling" and "arithmetic" I would argue that they should be treated as separate (but related) subjects. The purpose of scaling operations (primarily shift and add) are to establish and maintain the range and precision of the underlying fixed point numeric representation; whereas, the purpose of the more general arithmetic (and logic) operations (+, -, *, /, AND, OR, XOR, etc.) are to actually perform the desired computation. In the course of performing a series of arithmetic operations on a fixed point numeric representation (as part of an algorithm) a programmer may be required to recognize the need for, and understand how to apply, additional scaling operations in order to avoid a problem like overflow. Lumping the two topics together makes it harder for the reader to understand this important distinction.

Enderz Game (talk) —Preceding undated comment added 15:56, 3 June 2011 (UTC).


I suggest merging both binary scaling and Q (number format) into the fixed-point arithmetic article. As far as I can tell, they are all identical. So they should all be discussed in one article, with redirects from the other name -- like puma and mountain lion. -- (talk) 03:37, 28 March 2008 (UTC)

As Charles Esson mentions above, binary scaling and Q (number format) are not synonyms for fixed-point arithmetic (although possibly they are synonyms for each other and should be merged). They are a special case (radix two) of fixed-point. The fixed-point article should probably have radix-dependent aspects moved to separate articles, too (with binary going into one of those other two.
So I'd suggest a way forward might be:
  1. Merge binary scaling and Q (number format) into binary fixed-point
  2. Merge binary aspects of fixed-point arithmetic into binary fixed-point
  3. Create a decimal fixed-point (perhaps)
  4. clean up fixed-point arithmetic so it is radix-independent.
mfc (talk) 12:16, 29 March 2008 (UTC)
You're right -- binary scaling is not an exact synonym for fixed-point. Binary scaling is one of several kinds of fixed-point.
I agree that someday, there might be enough information to warrant 3 articles: "binary fixed-point", "decimal fixed-point", and radix-independent "fixed-point arithmetic".
But I think a better way forward is to use the "big buckets first" technique:
  1. Merge all three articles into one big "fixed-point arithmetic" article. That name does cover all these more specific idea, right? I think it is fine for information about one particular kind of thing to not have its own article, when that information is included in the more general article -- the way that "cornbread muffins" don't have their own article, but are instead mentioned in the muffin article.
  2. It's much easier to move text from the "binary" section to the "radix-independent" section and back again (and spot repetitive redundancies) when they are sections in one big article than across several articles. So leave everything in one big article for a few months while we clean it up and re-organize the sections.
  3. If and when the article gets "too big" (WP:SIZE), then split out more specific articles as you suggested -- or perhaps (as suggested by the "big buckets first" essay) by that time the article will have some other clear-cut section divisions that would be even better.
-- (talk) 20:28, 2 April 2008 (UTC)
Support the merge. Besides, I have seen no reference that "Q number format" is an estabished name for binary fixed-point. --Jorge Stolfi (talk) 18:25, 24 June 2009 (UTC)

I support the merge. Without the overlapping content the other articles will become short subsections, decreasing a reader's total time. Ray Van De Walker 23:04, 21 May 2011 (UTC)

Scaling factor: 100 or 1/100?[edit]

I rewrote the sections "Representation" and "Operations" assuming the the "scale factor" if the number that must be multiplied by the underlying integer to give the intended value. In this interpretation, a binary number with b-bits, of which f are fraction bits, has a scale factor of 1/2f. When one stores a dollar amount as an integer number of cents, the scaling factor would be 1/100.

However it seems that the rest of the article assumes the opposite interpretation, namely the intended value times the scaling factor is the underlying integer. In the above examples, the binary scaling factor is 2f and the dollar scaling factor is 100.

I must now fix the inconsistency, but for that I need to know which interpretation of "scaling factor" is the commonly used one. (If both are used in different contexts, we must warn the reader about that). Note that a "factor" is a operand of a multiplication, not of a division. All the best, --Jorge Stolfi (talk) 23:37, 9 February 2010 (UTC)

For the sake of clarity, call 2f a "denominator" because that's what it is. --Damian Yerrick (talk | stalk) 21:25, 19 September 2010 (UTC)

Tautology in summary[edit]

The summary states: "Fixed-point numbers are useful for representing fractional values, usually in base 2 or base 10, when the executing processor has no floating point unit (FPU) or if fixed-point provides improved performance or accuracy for the application at hand. Most low-cost embedded microprocessors and microcontrollers do not have an FPU." The third clause is a tautology. It's saying "Fixed-point numbers are useful if fixed-point provides improved performance or accuracy for the application at hand". Asmor (talk) 16:50, 1 May 2012 (UTC)