Talk:Wilkinson's polynomial

From Wikipedia, the free encyclopedia
Jump to: navigation, search

dispute one[edit]

I don't think that it is true that A small change in one coefficient can lead to drastic changes in the roots. It is the other way around: a small change of roots lead to a drastic change in coefficients. The coefficients of the Wilkinson polynomial are huge, even if the roots are small numbers. So the problem is to compute the polynomial values f(x) with sufficient accuracy. The problem is not to compute the roots from the correct values. Bo Jacoby 10:08, 20 October 2005 (UTC)

I think the change that Wilkinson made was to the degree 20 polynomial. If the coefficient of x^19 is changed from -210 to -210.000000119 (a small change by most standards), the roots are scattered over the complex plane. See the example at the end of http://calcrpnpy.sourceforge.net/ratfunManual.html which demonstrates this effect. The problem I have with the article is I can't figure out what the plot is supposed to be showing. YarLevub 17:41, 29 April 2006 (UTC)
The plot shows f(x) against x. Unfortunately, it's not very informative and perhaps even misleading. As the example that YarLevub points to, shows, the polynomial is not flat around x = 19, contrary to what the plot suggests. -- Jitse Niesen (talk) 03:13, 30 April 2006 (UTC)
Examining the value of the degree 20 polynomial at several points (note you need high precision arithmetic), reveals that it oscillates very rapidly over the range 0..21. The plot obscures the oscillation because the first extremum is 2 orders of magnitude smaller than the value at zero and the others are smaller yet. The plot looks smooth only because you need to zoom in to see the oscillation. This oscillation is likely the reason for the sensitivity to changes of some of the coefficients.
(0,2.433e+18) (1,0) (1.248,-1.183e+16) (2,0) (2.299,8.035e+14) (3,0)
(3.335,-1.053e+14) (4,0) (4.364,2.117e+13) (5,0) (5.39,-5.932e+12)
(6,0) (6.414,2.195e+12) (7,0) (7.437,-1.039e+12) (8,0) (8.458,6.165e+11)
(9,0) (9.479,-4.528e+11) (10,0) (10.5,4.088e+11) (11,0)
(11.52,-4.528e+11) (12,0) (12.54,6.165e+11) (13,0) (13.56,-1.039e+12)
(14,0) (14.59,2.195e+12) (15,0) (15.61,-5.932e+12) (16,0)
(16.64,2.117e+13) (17,0) (17.67,-1.053e+14) (18,0) (18.7,8.035e+14)
(19,0) (19.75,-1.183e+16) (20,0) (21,2.433e+18)
Note that the points in between the roots are the extrema obtained by finding the roots of the derivative polynomial. -- YarLevub 18:33, 30 April 2006 (UTC)

I was obviously wrong in denying the statement. A small change in one coefficient can lead to drastic changes in the roots. Sorry. The roots of x2−ε , the square roots of ε , varies hysterically when ε varies close to zero. Bo Jacoby 12:14, 2 May 2006 (UTC)

dispute two[edit]

There are multiple problems with this article:

  • The polynomial appears to be nothing other than the falling factorial, which is ancient, predating Newton, so its unclear why it gets a modern mathematicians name.
  • The article talks about "varying the coefficient" but there are no coefficients to be varied. Unclear at best, wrong at worst.
  • The second section states that the polynomial may be written in a different basis, but then gives a formula identical to the first.

I have no clue to what this is all about. linas 02:31, 23 April 2006 (UTC)

Hi Linas. The history is that mr Wilkinson, working on numerical rootfinding in the early days of the computer, experienced bad accuracy due to round-off errors in the evaluation of the polynomial in the vicinity of the roots. I don't think that telling this story is important today. There is nothing special about this polynomial. Evaluating any polynomial having a cluster of roots requires high precision arithmetic. Bo Jacoby 13:09, 28 April 2006 (UTC)
The point is that it's not obvious that this polynomial has a cluster of roots. They seem quite nicely separated. I don't agree with the emphasis that Bo is placing on error in evaluating the polynomial, but I cannot pinpoint what I don't like about it. This article is on my to do list, but I can't promise anything. I haven't yet found a nice explanation in the literature — perhaps I should dig out Wilkinson's original comments. -- Jitse Niesen (talk) 03:13, 30 April 2006 (UTC)
Hi Jitse. The set of roots {1,2,3,...,19,20} look from far away like a cluster. If you had no problem evaluating the polynomial between 9.5 and 10.5, then you can compute the root 10.000000 by twenty iterations of the bisection method, (each iteration giving one more bit of precision, so twenty iterations give six significant digits). The only snag is when round-off errors in the evaluation of the polynomial gives the computed polynomial value f(x) the wrong sign, or zero. Then the bisection method choses the wrong subinterval, or reports x as a root, and then the precision is lost for good. The absolute value of the polynomial f(x) for 9.5<x<10.5 is much smaller than the individual terms of the polynomial, so many significant digits are required in the evaluation of f(x) in order to provide the correct sign. Bo Jacoby 11:53, 2 May 2006 (UTC)

In reply to Linas:

  • The polynomial is NOT the falling factorial for at least two reasons:
    • the falling factorial is a sequence of natural numbers? (the article Pochhammer symbol doesn't say what 'x' is...) while wilkinson's polynomial is a function with the complex plane (or real line) as a range (I concur the notation 'x' in both articles may be confusing);
    • the falling factorial while if is wilkinson's polynomial of degree k-1, .
  • about varying the coefficients: of course, since wilkinson's polynomial is a polynomial, it has coefficients that depends on the representation (i.e. the basis) of the polynomial; in the definition, the basis used is the lagrange basis for which all coefficients but one equal zero; if you "defactorize" the formula, you obtain a polynomial expressed in the power basis for which coefficients are huge...
  • indeed the formula is the same which is confusing... maybe we could readd the formula i first put here (see this link), but it is also quite confusing... any ideas?

Since most of both disputes is now solved, I think we can remove now the tag, cannot we? Julien Tuerlinckx 18:22, 24 July 2006 (UTC)

I don't see any part of the dispute as having been solved. There have been no substantive edits to te article since te "accuracy" tag was slapped on the article (Micheal Hardy made a cosmetic change). The article was and remains utterly unclear. To reply to Justin:
  • Regrading "the coefficients": What coefficients? Where? What is being varied? How is it being varied?
This is a terrible article. It really needs to be re-written to provide both the historical motivation, and to actually explain the thing that Wilkinson discovered. linas 00:35, 25 July 2006 (UTC)

Linas: sorry to tell you this but you really should read my remark instead of getting angry. Some further remarks (I don't feel like a "defender" of this article but i cannot accept wrong stuff is written here):

  • the falling factorial might be a function but nothing in the fist part of the article say that the symbol is used to denote a set of functions which makes the article unclear to people unfamiliar with the concept...
  • wilkinson's polynomial is computable in x=0 (it is the factorial of k or the opposite if k is odd) while the falling factorial equals zero following the first definition and is undefined following the second one (factorial of a negative number does not exist to my knowledge). You said it was equal to (x-1)_n which i do not contest but you just scaled the function to prove this so it is not obvious to most readers that it is "the same function".
  • once again for the coefficients, just move from Lagrange basis to power basis (you know, the one with basis vectors 1, x, x^2,...) and you will find that a variation in a coefficient leads to drastic changes in the roots. For instance, let us compute the coefficients of (x-1)(x-2)(x-3):

Of course, since the degree of the polynomial is quite low, a slight change in coefficients (namely [1,-6,11,-6]) will not change the roots too much. However, if you compute the roots of polynomial with coefficients [1.01,-6,11,-6] on this website, you will find that the root in x=3 has moved to 2.853 which is quite impressive... Now just think of a polynomial of degree 20. I hope this is more obvious to you for the coefficients and the way to make them vary.

  • To me the point with Wilkinson is that he tried a numerical method to find roots of a polynomial with "wilkinson's polynomial" of which he obviously knew the roots. But the method failed because the problem of finding roots of wilkinson's polynomial is ill-conditioned which means a small error in the computation of coefficients (remember you must "defactorize" the polynomial in Lagrange form) leads to enormous errors in the roots... That's why he thought this was traumatic: compute numerically roots of a quite trivial polynomial (roots are equally spaced) is a nightmare.
  • To me (once again) the reason why in the paper i suggested they call the polynomial SECOND Wilkinson polynomial is that this polynomial have roots equally spaced if the function is represented in a graph with x-axis ln(x), so in a way it is polynomial with roots having a "special" distribution while most polynomials have a "random" distribution. And this seems to be also a problem with numerical finding of roots. Hence the name they chose here (I cannot confirm this naming is "official").

I hope this will help you to understand the article. Julien Tuerlinckx 16:47, 25 July 2006 (UTC)

Lagrange form[edit]

For those interested in the lagrange form part of this article, they may have a look at this paper. Maybe we can cite it in a "references" section. Julien Tuerlinckx 12:01, 23 July 2006 (UTC)

Based on the usage of the term "wilkinson polynomial" in that article, it appears that the definition in this article is utterly incorrect! I'm going to stick a "delete" tag on this article. It is utterly stupid to have an article on WP as utterly incorrect as this one is. linas 00:54, 25 July 2006 (UTC)
Linas, please explain what is wrong. There are many things I don't like about the article, but I don't see anything which is incorrect. -- Jitse Niesen (talk) 02:15, 25 July 2006 (UTC)
Look at the pdf. Towards the end, it gives two examples of a Wilkinson polynomial. Neither example matches what is given here. I can't imagine what general definition would encompass both examples. It seemed easier to just delete this mess than to start solving other people's puzzles. If someone can fix this article to sound even remotely plausible, I'll vote "keep", but at the moment, starting with a clean slate seems simpler. linas 03:51, 25 July 2006 (UTC)
The first example is a trivial rescaling of the polynomial mentioned here. I don't know why the author uses a rescaling, but it is a trivial change, I'm fairly sure that the original polynomial used by Wilkinson uses the scaling mentioned in the article, and it is not uncommon in maths to use the same name for slightly different equations. -- Jitse Niesen (talk) 04:20, 25 July 2006 (UTC)
OK, what then about the second example? I don't see how that is a "trivial rescaling". In either case, I cannot tell, from this article, why something as old as as the falling factorial, appearing for at least four centuries, suddenly deserves to be named after a 20th century mathematician. Surely there's something called a "Wilkinson polynomial", but surely there's more to the story. Maybe its any "ill-conditioned polynomial" ? ill-conditioned something? ill-conditioned matrix? Bad choice of basis vectors? Is there some determinant that must be greater than one for the corresponding polynomial to be a "Wilkinson polynomial"? something about orthogonal polynomials? What is it? The nom for deletion is mostly because this particular article appears to be nonsense. linas 05:13, 25 July 2006 (UTC)

Wilkinson's polynomial is an important test case for numerical algorithms related to root finding, eigenvalues, etc. Thus it plays a similar role as the Rosenbrock function does for multidimensional optimization algorithms. I don't understand why we shouldn't be allowed to have a stand-alone entry for this well-known function: just because it's trivial as far as mathematical analysis is concerned is not sufficient reason to dismiss it. After all, the same could be said about Rosenbrock's function. --MarkSweep (call me collect) 06:44, 25 July 2006 (UTC)

Regarding the second example in the talk, I'm at a loss why it's called a Wilkinson polynomial. That is not standard terminology in my experience. Wilkinson's polynomial is really just the polynomial mentioned (though I think in the original, he used a smaller number (6? 7?) instead of 20).
Whether it deserves to be named after Wilkinson is not a question for us to decide, we only need to decide that it is. The point of the polynomial is that finding the roots of that polynomial is ill-conditioned, as the article says, even though the roots are (in a sense) well separated.
Mark, I'm not very happy with saying that the graph is horizontal around x = 10, because it oscillates quite a lot (see the comment further up this talk page); it's only the vertical scale which makes it seem horizontal. Furthermore, would it be possible to have the downward spikes in the graph for the logarithm extend all the way down? After all, the log is −∞ when x is an integer. -- Jitse Niesen (talk) 08:11, 25 July 2006 (UTC)
AFAICT when Wilkinson's polynomial gets mentioned in the computer science literature, it's a 20-degree polynomial, which is sufficient to cause problems for standard double-precision floating point arithmetic. Completely agree with you on the second point. Regarding the third point, you're right too: the bit about the graph being flat around y=0 is a remnant in the text referring to the graph that was previously there. It should obviously be changed. Finally, I did try, but failed, to coax gnuplot into extending the downward spikes all the way down, but the reason they don't is partly due to limited accuracy and partly to the very properties that make Wilkinson's polynomial challenging. I'll see if I can get it to work with 19 piecewise plots. --MarkSweep (call me collect) 17:23, 25 July 2006 (UTC)
I utterly fail to see how the problem is "ill-conditioned". You can visually read the roots right off! Its trivially solvable! The roots are utterly stable against small perturbations! There is absolutely nothing "ill conditioned" about the thing!
Again, Wilkinson must have said something else, something more. Maybe he tried to take this polynomial, and re-write it in terms of some other basis, I dunno, Hermite polynomials, and he notices that, when factoring it into Hermite polynomials, there is something ill-conditioned about that. The PDF hints that the ill-conditioning has something to do with the choice of basis; this article says nothing about the choice of basis. linas 13:49, 25 July 2006 (UTC)
Mark, I saw you corrected the plots, thanks. I checked and indeed Wilkinson uses the polynomial of degree 20. By the way, the other polynomial mentioned in the talk, having roots 2^(-k), is also mentioned in Wilkinson's book, so that explains why it was also called "Wilkinson's polynomial". The influence of the choice of basis is a later development. Julien, did you add the Lagrange stuff? In that case, is it in the 1984 paper? It's not in any of the other references listed. -- Jitse Niesen (talk) 11:10, 26 July 2006 (UTC)

Demystify[edit]

OK, I think I get the idea now, and tried to edit the article appropriately. There remains, however, one missing link. I'm thinking of the following intuitive argument: Consider the series

For almost any and every choice of sequence this series will have a radius of convergence of one; there are very few, very special choices of which have a larger radius of convergence. Thus, it is rather unrealistic to think that such a sequence could model something which has roots located at |x|>1.

Now, compare this to the sequence

which, for "almost all" choices of is an entire function, and might be expected to be a better model of something with a larger radius of convergence. In this basis ( instead of -- a subtle but important difference), the coefficient of in Wilkinson's polynomial is no longer 210, but is , which is a huge number, comparable in size of any of the other coefficients. In this basis, a perturbation of is no longer seen to be small: it is changing the coefficient by . In absolute terms, this is *not* a "small change" -- it is actually huge! It should be remarkable that a polynomial with such immensely large coefficients has such tightly-spaced, tightly packed set of roots, instead of having them scattered all over the place. Naive order-of-magnitude arguments might lead one to expect the zeroes to be scattered all over a disk of radius of (Stirling approx). That the falling factorial has huge coefficients yet packs all its roots into a miniscule space is perhaps what is remarkable?? The problem was based on the false assumption that the relative scale in the monomial coefficients is mirrored in the relative change of the roots, when in fact there was a magnification or blow-up on the order of between the two. I don't know, is this the correct handwaving argument for why this is the way it is? linas 04:38, 26 July 2006 (UTC)

Wilkinson's polynomial is pretty well-behaved in the basis {x^n/n!}, if you divide it by 20 factorial: the coefficients range from 1 to approximately 2000, and the nineteenth coefficient is −21/2. Nevertheless, a small change (both in relative and absolute sense) in this coefficient leads to a huge change in the roots.
"the false assumption that the relative scale in the monomial coefficients is mirrored in the relative change of the roots" — this is precisely the point that Wilkinson was trying to make. -- Jitse Niesen (talk) 11:10, 26 July 2006 (UTC)

please review my changes[edit]

I made some additional changes to the article to try to match up the effect to what is being said. I don't have access to texts, so I am at best guessing what it is that Wilkinson tried to say, and I probably guessed wrong. The logic behind the reasoning still sounds wooly and off-kilter. (differences of large numbers are inherent sources of round-off errors; this is not news, not even in 1963. The article on ill-conditioned talks about matrices, but what the matrix is in this problem remains unclear.) linas 05:12, 26 July 2006 (UTC)

Hi Linas. The infinite set of real numbers, providing an unlimited number of significant digits and unlimited range, were on the computer implemented by a finite set of numbers, providing unavoidable roundoff errors and an occasional arithmetic overflow. Still the approach was so amazingly successfull that people got surprised by its limitations. Wilkinson got frustrated when his program failed. He should rather be amazed that it sometimes works. Bo Jacoby 07:22, 26 July 2006 (UTC)
I'm not sure whether he got frustrated or interested, but Wilkinson is one of the founding fathers of the field, and I doubt he would need any lessons from us on the dangers of floating-point arithmetic. -- Jitse Niesen (talk) 11:10, 26 July 2006 (UTC)
Hi Jitse. The words: "the most traumatic experience", indicate that mr Wilkinson got frustrated, if not traumatized, by discovering the dangers of floating-point arithmetic. It was not clear at his time that the precision needed for computation far exceeds the precision of measurement. The butterfly effect had not yet been discovered. Of course I don't express disrespect towards mr Wilkinson. Bo Jacoby 14:04, 26 July 2006 (UTC)

My rewrite[edit]

I wanted to make some small changes, which turned out to be rather big. When saving, I got an edit conflict. Having no time at the moment, I just overwrote KSmrq edits and reverted myself immediately. Anyway, my text needs at least some links and a good look over. I'll have a look when I've time, but feel free to use anything that I wrote (that's why I saved it). -- Jitse Niesen (talk) 14:09, 26 July 2006 (UTC)

The {{inuse}} tag is your freind. linas 00:38, 27 July 2006 (UTC)

Cleanup comments[edit]

I'd like to stress a point from the numerical analysis perspective that may not be obvious from the pure mathematics perspective. In a practical setting, data is presented in some form which may not be under our control. In particular, we are often asked to find the roots of a polynomial presented in power-basis form. Much as we might prefer or recommend a different basis if possible, such as a Bernstein-Bézier basis, we must write software that works with what we are given. A glib response might be to first convert from an inconvenient basis to a preferred one, but that conversion itself is subject to instability!

So in this article it matters a great deal that we always respect the presentation of the data. To speak of "the polynomial" as if it were an abstract mathematical object we can freely represent or perturb as we wish, is to completely miss the point of the numerical analyst's dilemma!

I highly recommend that pure mathematicians who are tempted to edit articles concerning numerical analysis topics read George Forsythe's [1] classical caution, Pitfalls in computation, or why a math book isn't enough (PDF), which appeared in AMM 77, 931–956. As it happens, the Wilkinson polynomial example is mentioned.

One basic example that may come as a shock is that floating-point arithmetic is not associative. Thus simply evaluating a polynomial, never mind finding its roots, can be treacherous. --KSmrqT 14:20, 26 July 2006 (UTC)

The meaning of the Wilkinson polynomial problem[edit]

The logic flow of the problem is:

  1. solving polynomial equations
  2. having difficulties for specific examples such as the Wilkinson polynomial
  3. identifying the responsible circumstance.

We have several suspects.

  1. The Wilkinson polynomial is "ill conditioned".
  2. The monomial basis is unsuitabel.
  3. Insufficient arithmetic precision.

Actually all af these suspects are responsible because

  1. well conditioned polynomial equations are solved
  2. the Wilkinson polynomial in a suitable base is solved
  3. the Wilkinson polynomial will be solved by using sufficient accuracy arithmetic.

so there is no reason to highlight one on the expense of another.

However this does not solve our problem, because

  1. some polynomials happen to be ill-conditioned,
  2. in any base there exist ill-conditioned polynomials
  3. no finite accuracy is sufficient for all problems.

So we face a paradigm shift. We can do a lot of useful calculations, but there are limitations. Reliable long term numerical weather prediction is beyond this limitation. The discovery of the ill-conditioned Wilkinson polynomial was merely an early warning.

Bo Jacoby 07:48, 27 July 2006 (UTC)

We have several suspects.
The Wilkinson polynomial is "ill conditioned.

The Wilkinson polynomial is not ill-conditioned. The problem of finding the roots of any polynomial given its coefficients (in the monomial basis) is ill-conditioned.

The monomial basis is unsuitabel.

Yes, because the problem of finding the roots given the monomial coefficients is ill-conditioned, it would be correct to say the monomial basis is unsuitable for calculating the roots. (It is suitable for other purposes.) NB: this suspect smells to me like the first one (rephrased).

Insufficient arithmetic precision.

See my other comments. A lack of precision may trigger mis-specification of the coefficients, but it is not the central problem here.

in any base there exist ill-conditioned polynomials

Again, polynomials are not ill-conditioned. The problem of finding the roots of a polynomial given the coefficients in the monomial basis is ill-conditioned. Also, there are polynomial bases in which the problem of finding the roots is not ill-conditioned (in other words, is well-conditioned).

no finite accuracy is sufficient for all problems.

A lack of working precision is a hindrance in solving any problem. However, a careful choice/design of an algorithm obviates this problem.

Lunch 07:33, 2 August 2006 (UTC)

how was the example solved?[edit]

The perturbed wilkinson polynomial, where one coefficient was changed one bit, is claimed to be solved. The roots are quoted! How was it solved? If it was solved by standard floating point arithmetic, the result is unreliable and probably very wrong. Bo Jacoby 09:25, 28 July 2006 (UTC)

I don't have at hand information about exactly how Wilkinson found the roots, though I believe the essential idea was to use enough precision to overcome the instability. That's how I verified the roots. (For a single example like this, modern software makes this easy, though still too expensive for routine use.) Another possibility would be an interval arithmetic Newton's method. Since the coefficients are given exactly (including the perturbation), the polynomial can be evaluated using exact rational arithmetic. If you think you're more careful than Wilkinson and all his reviewers, why not compute the roots for yourself in a way you can guarantee is accurate enough to check these values as published by Wilkinson? You may wish to double check your results against my version of the real roots:
  1. 0.999999999999999999999999
  2. 2.000000000000000009762004
  3. 2.999999999999805232975910
  4. 4.000000000261023189141844
  5. 4.999999927551537909560059
  6. 6.000006943952295707203355
  7. 6.999697233936013948676183
  8. 8.007267603450376854893171
  9. 8.917250248517070494295520
  10. 20.84690810148225691492877
The agreement between my values and Wilkinson's is unlikely to be due to coincidence. --KSmrqT 19:05, 28 July 2006 (UTC)

I did not claim to be more careful than Wilkinson or you. I merely asked how it was done. Bo Jacoby 12:09, 29 July 2006 (UTC)

I don't have at hand information about exactly how Wilkinson found the roots, though I believe the essential idea was to use enough precision to overcome the instability.

Actually, using greater precision is not necessary (as Wilkinson noted). Just use to evaluate the polynomial rather than its expanded form. Lunch 22:12, 30 July 2006 (UTC)

After reading more of the comments, I think I'll say it again: instability of the roots with respect to the standard polynomial basis is the major problem here; it's not rounding errors. Another phrasing: if you want to find the roots of a polynomial, it's best not to use the coefficients of 1, x, x2, et cetera to specify your polynomial. If you have no choice (someone else gave you the problem), know that your answer may be unreliable. NB: if someone else gave you the problem in the standard basis, converting to another basis to find the roots won't solve the problem (pushing a bubble under a rug doesn't make it go away; it justs moves the bubble to another spot). Lunch 22:24, 30 July 2006 (UTC)

Another thought regarding my comment on finding the roots of the perturbed polynomial: for those folks at home who're interested in coding this up yourself, I have a few pointers if you haven't already figured things out.
Finding the smaller roots (as the article points out) isn't the problem. Those roots (in relative size) are spaced far apart. Any of several naive choices for codes will probably do. It's the larger magnitude roots that are the problem.
If you want to find the larger roots and avoid using greater precision arithmetic, you're going to have to be a bit careful. Now is the time for careful consideration of rounding and cancellation errors (you're trying to get something for free - solve an ill-conditioned problem using the same underlying method). To use the form as your sole means of finding the roots, you'll need to use a secant method or bisection (or something else that I haven't thought of off-hand). I haven't thought out how to use bisection, but you can probably figure it out on your own. (Especially since the answer is printed above!) (And I forget: can bisection be modified for the 2D problem of finding the complex roots?)
(An intermission: maybe my thoughts here are complete bunk. YMMV. I haven't coded this up myself. For the entrepreneurial reader, it looks like IEEE single precision is enough here. Many, many high-level computer languages will allow you to use floating point; most all of those allow you to use IEEE and use both IEEE single and double precisions. But be careful: many compilers "graciously" promote all single precision calculations to double precision because a lot of hardware nowadays runs both equally fast (they convert all operations internally to double precision to save on having to separately create and optimize single and double precision).)
In the secant method, keep separate track of the iterates and the increments used to calculate the iterates. Use the increments as the divisor in the secant method (don't recalulate them - you'll hit cancellation big time). When calculating the numerator for the secant method, group the difference of the products together and group the difference of the monomials together. Floating point arithmetic is not associative! Cancellation is rampant here; it's a matter of what's the best way to avoid it. I'm thinking that the systematic errors from the product terms and the monomial terms are best cancelled together (but maybe I have that backwards); that is, there are errors, but the output of these terms is "continuous" in some sense. Another approach would be to order the four terms in the numerator by size (absolute value), and add them in order of decreasing size (that may only be best near the root). And, btw, the four terms are about the same order of magnitude near x=21.
One last thing: to find the complex roots you'll need to use Broyden's method. It generalizes the secant method to the multidimensional root-finding problem. That is, we can consider a complex-valued function with complex inputs to be a map from R^2 to R^2. You'll also need an initial guess with a non-zero imaginary part. Because of symmetry (the polynomial coefficients are real), a real initial guess will always yield real iterates.
Grrr. Maybe I will code this up just to check my own sanity. Lunch 01:36, 2 August 2006 (UTC)

OK, so I coded this up. You can find it here. Some notes:

  • IEEE single precision is sufficient to find the largest root of the perturbed polynomial to working precision (20.846909). I don't have enough time right now to try to do this for the complex roots next largest in magnitude. Wilkinson's analysis indicates they're easier to get anyway (their relative separation is greater).
  • Note that the change to the coefficient of x^19 was 1 part in 1.76 billion. The change in the largest root was more than 1 part in 25. This is the defining characteristic of an ill-conditioned problem.
  • Again, slight mis-specification of the coefficients causes large changes in the roots. A lack of precision (or rounding) might be the cause of the initial mis-specification (keep in mind there could be other causes of mis-specification external to the problem as given to you). BUT ONCE THAT MIS-SPECIFICATION IS MADE, NO AMOUNT OF EXTRA PRECISION WILL SAVE YOU FROM CALCULATING THE WRONG ROOTS. Not even exact arithmetic; that's why Wilkinson wasn't interested in analyzing rounding errors. Rounding errors don't matter once you've told me the wrong coefficients (again, no matter the precision and no matter the algorithm).

Lunch 07:01, 2 August 2006 (UTC)


  1. To compute the roots, use the Durand-Kerner method.
  2. Wrong coefficients give wrong results even with high precision arithmetic. Low precision arithmetic give wrong results even with correct coefficients. To get correct results, use high precision arithmetic and correct coefficients. It is as simple as that. Bo Jacoby 12:09, 2 August 2006 (UTC)

What do you find unbelievable about my statement "IEEE single precision is sufficient to find the largest root"? Did you look at the code and understand how it works? Do you realize that IEEE single precision is sufficient to calculate all the roots? Again, precision is not the central problem here. (And your insistence on proselytizing the Durand-Kerner method --- aka (Bo) Jacoby's method --- is frustrating.) Lunch 21:04, 4 August 2006 (UTC)

Lagrange form[edit]

In the subsection on Lagrange form there is confusion between letter f and letter w for the polynomial. Bo Jacoby 09:54, 28 July 2006 (UTC)

No, there is no confusion. The letter w is used for the Wilkinson polynomial defined by the fallling factorial; the letter f is used for the presented function which may be perturbed. If two different letters are not used we cannot possibly define y0, since the definition of f itself depends on y0. --KSmrqT 19:09, 28 July 2006 (UTC)

Thank you. The Lagrange polynomials used in the article seem to assume knowledge of the roots of the polynomial to be solved. So it does not provide a general method for solving polynomial equations. Right? So the point seem to be that if you know the roots of some polynomial, w, and wish to know the roots of some other polynomial, f, which is somehow close to w, then f should be expanded in terms of the Lagrange polynomials of the roots of w. Right? The present formulation added more to my confusion than to my understanding. Bo Jacoby 07:55, 31 July 2006 (UTC)

I found the original discussion of Lagrange basis confusing, rewrote it to be clearer for myself, and then saw it rewritten again in a form I find less clear. It's a bit of a cheat as presented, because it uses the root locations as the interpolation points.
The purpose of the section is to show that it is not the polynomial, but the presentation of the polynomial, that makes root finding ill-conditioned. It is trivial to show that in a Langrange basis where the interpolation points are at the roots, root finding poses little difficulty. --KSmrqT 04:31, 3 August 2006 (UTC)

You indicate that "it is not the polynomial, but the presentation of the polynomial, that makes root finding ill-conditioned". I think that is untrue. Any polynomial basis must contain base polynomials of high degree, and any polynomial of high degree assumes huge values for arguments far away from the roots of this polynomial, and so any tiny perturbation to the coefficient to that base polynomial creates huge effects somewhere where the resulting polynomial might have a root. I agree that the section is "a bit of a cheat". It is not helpful to the reader. I suggest that you remove it (yourself). Bo Jacoby 11:14, 3 August 2006 (UTC)

KSmrq, I tried to clarify what using the Lagrange basis means. Do you think it is distracting to mention this in such detail, or is it just my dismal writing style?
As an alternative to a Lagrange basis with the roots as interpolation points, we might consider a Bernstein basis. It seems that R.T. Farouki and V.T. Rajan, "On the numerical condition of polynomials in Bernstein form", Computer Aided Geometric Design, 4:191-216, 1987, prove that the Bernstein basis always has a lower condition number. However, I get the impression that they also assume some a priori information, namely that the roots are in [0,1].
On the other hand, the text books that I checked do not discuss the effect of the basis, so perhaps we should leave it out alltogether. -- Jitse Niesen (talk) 13:01, 3 August 2006 (UTC)

The Bernstein polynomials is the same thing as the beta distribution functions. They have multiple roots at x=0 and x=1, while the monomials only have one multiple root at x=0. Away from the roots they grow just as wildly as the monomial of the same degree. So the change of basis from monomials to bernstein polynomials does not solve the general problem. Bo Jacoby 14:20, 3 August 2006 (UTC)

The whole idea is to use scaling to make the roots fall in [0,1]; this is what my comment on "a priori information" alluded to. I did not say it solves the problem — indeed it does not since the change of basis is ill-conditioned — I only said that the Bernstein basis yields a lower condition number. -- Jitse Niesen (talk) 03:45, 5 August 2006 (UTC)
I will respond to several issues. First, I would like to keep the Lagrange section. Second, I think the rewrite got better after I last looked at it; though I think I like my version a little better, this one is not so bad as where it began. Third, I'll explain in a minute why I think such a section is essential. Fourth, let's talk about a B-spline basis instead of a Bernstein basis, now.
The usual Bernstein polynomials, (n?k)(1−t)nktk, are a special case of a B-spline basis over a single knot interval, where the knot vector has zeros at one end and ones at the other, with sufficient multiplicity for the degree. For an example like this it is more appropriate to use a knot interval from 1 to 20. Many of the coefficients (control points) w.r.t. this basis are on the order of 1019, but that's to be expected given the swing of function values. They alternate in sign, and a technique like Bézier clipping can find the roots nicely, and the de Boor algorithm provides robust evaluation. This basis is popular because of its many nice properties.
 0
−1.1556284538839040000×1017
 5.8884808817673216000×1017
−1.9095690768572378880×1018
 4.7773099133454589530×1018
−9.8642548398359770048×1018
 1.7376562521073733242×1019
−2.6612000344617893324×1019
 3.5841081584193336135×1019
−4.2744798544074946647×1019
 4.5311859029308549436×1019
−4.2744798544074946647×1019
 3.5841081584193336135×1019
−2.6612000344617893324×1019
 1.7376562521073733242×1019
−9.8642548398359770048×1018
 4.7773099133454589530×1018
−1.9095690768572378880×1018
 5.8884808817673216000×1017
−1.1556284538839040000×1017
 0
The obvious symmetry in the coefficients is a consequence of a symmetry in the basis. More important is the fact that the function segment values are bounded by (the convex hull of) the control points. For root finding, we also take note of the variation diminishing property, which here guarantees us that the function segment cannot cross zero more often than the control polygon. We have no need to explicitly compute the basis polynomials, but for this example they can be produced from the Bernstein polynomials by a parameter change, t = (x−1)/19.
OK, so having gone to some trouble to express w in B-spline form, with all its nice properties, why do I prefer to retain a Lagrange discussion? The answer is that I think the Lagrange exploration provides much more "bang for the buck", in term of the essential message of Wilkinson's polynomial.
And that message is that for a polynomial presented in power-basis form, roots can be extraordinarily sensitive to the coefficient values. One way we reinforce the point is to present the same polynomial in a Lagrange form, where that sensitivity disappears. Maybe one day we'll even convince Bo.
I don't say that just to tease Bo. I worry that Bo may represent a segment of readers who also do not get the point, which suggests that perhaps we have not explained it as well as we would like. I thought my example of intersecting lines (a classic way to present condition issues in linear systems) made the point so vividly there could be no further debate. Obviously I was overly optimistic. But we seem to have degenerated to an argumentation loop with no forward progress.
Incidentally, I have never said that floating point issues have no relevance to root finding. The mere act of writing the polynomial coefficients in floating point can perturb them. Likewise, evaluation of the power-basis polynomial should be done using Horner's rule, as this Wolfram page demonstrates graphically. Floating point also presents us with the necessity of deciding how close to zero is close enough. But all of those potential source of error are present no matter what the basis, and yet the Lagrange basis allows the roots to be found with little ill effect, while the power basis does not. --KSmrqT 05:48, 5 August 2006 (UTC)

Answer to KSmrq[edit]

Note that the Wilkinson problem is (x−1 ! 20) = 0, where (n ! k) is the binomial coefficient. Your coefficients above can all be divided by 1017. They do not have to be big.

The problem of ill condition of Wilkinson's polynomial is solved by using sufficient accuracy, so that is not a problem any more. Are you saying that it can be solved without using sufficient accuracy? What is the point of that attempt? Why not just use sufficient accuracy and stop talking nonsense?

I'm not surprised that computing the intersection of nearly parallel lines needs high accuracy. Some people may be surprised, and it is nice to explain the fact to these people so that they can update their intuition, rather than the fact. You seem to hope that the sensitivity can be got rid of by choosing a suitable basis. Well, it can, but not without cheating. Choose the system of coordinates with origo at the point of intersection and with basis vectors along the lines, then the ill conditioned problem is transformed into a well conditioned problem which is readily solved: the point of intersection between the two coordinate axes is (0,0). But one cannot do it without knowing the solution to the problem, so it is cheating. Using knowledge of the roots (1,2,3,..,20) of the Wilkinson polynomial to find a better basis, is also cheating.

You write that the sensitivity is extraordinary. Well, it is not. It is perfectly normal for polynomials of high degree to assume huge values which require high accuracy in order to compute the roots.

You and I agree that the Horner scheme should be used, but you reverted my edit showing the Horner scheme back into a misleading expression that is not the Horner scheme.

Three conjectures are by Wilkinson's example shown to be false:

  1. The roots rk depend approximately linearly on the coefficients cj. (This is false)
  2. The condition numbers drk/dcj are small. (This is false)
  3. 32 significant bits of accuracy is sufficient for practical purposes. (This is false)

You adress the second point. Some people get surprised and enlightened. That is great. You do not seem to understand the first point, however, as you still use the concept of a condition number, which is useful only for very small dcj. In Wilkinsons's example, −210>c19>−210−2−23, the roots are not at all approximately linear functions of the coefficient. They are not even differentiable, as pairs of roots fusion into double roots which immediately disintegrate into single roots moving apart with infinite speed in directions perpendicular to the original motion. This behavior also occurs with low degree equations such as x 2+t = 0 for −1<t<+1. The whole point of your writing seems to be that you still trust the third conjecture, despite the facts. Bo Jacoby 09:55, 7 August 2006 (UTC)

can you solve an equation without solving the equation?[edit]

I think that Wilkinson's polynomial by itself does not show that the statement "32 significant bits of accuracy is sufficient for practical purposes" is false, because solving Wilkinson's polynomial is not a practical but a theoretical problem. It does show that if you have a practical problem, it is probably not a good idea to reduce the problem to solving a high-degree polynomial in the monomial basis, even though this might theoretically be possible. -- Jitse Niesen (talk) 01:13, 8 August 2006 (UTC)

Conjecture 3 may be considered a definition of "practical purposes", and so it is tautologically true. Older definitions are: "16 significant bits of accuracy is sufficient for practical purposes", and: "paper and pencil is sufficient for practical purposes". Wilkinson expected his problem to be solvable by 30-bit accuracy arithmetic, and he got frustrated by failing, but an unsolved problem is not a practical problem: No road, no traffic; no traffic, no need for a road. The road is blocked with a sign: "No passing for low accuracy vehicles". You may leave your vehicle hoping to find a foot-path to the goal circumventing the roadblock, but there is no such path. You cannot find a solution to an equation without solving the equation; it is the same thing. No method is better than: "reduce the problem to a polynomial equation and solve it using high accuracy arithmetic", because any other solution to the problem would also be a solution to the equation. Bo Jacoby 08:40, 8 August 2006 (UTC)

Misguided edits[edit]

I have backed out edits by Bo Jacoby that reflected a fundamental misunderstanding about the point of Wilkinson's polynomial. Any floating point number is rational, and so we can in principal evaluate the polynomial exactly; yet the numerical instability remains. That is, a tiny perturbation in a coefficient can cause a huge change in a root even with exact arithmetic. Frankly, the graph should have been a hint, because it shows large swings at both ends, yet the lower roots are barely perturbed. --KSmrqT 19:19, 28 July 2006 (UTC)

For exact representation of, say, x20 for x= 123.456 (x having 3 digits before the decimal point and 3 digits after the decimal point), you need 60 digits before the decimal point and 60 digits after the decimal point. The floating point arithmetic available to Wilkinson did not provide that precision. If you do not perturbe the coefficients at all there should be no problem, but there is. Bo Jacoby 12:06, 29 July 2006 (UTC)
Wilkinson could have used exact rational arithmetic (with bignums) or huge precision floating point (with bigfloats), regardless of what floating point capability was provided by hardware of the time. But what if the input data is a result of physical measurements, so contains small perturbations from the truth? The output is unreliable!
Standard practice in NA is to carefully distinguish between difficulties inherent in the problem, like this instability, and difficulties caused by a poor solution method.
Consider a simple geometric example, the intersection of two lines in a plane. If the lines are nearly parallel, tiny perturbations of the lines can move the point of intersection across vast distances. Nothing we do in solving the problem can change that fact; this specific linear system is ill-conditioned. We can make matters worse, and given the conditioning any errors we introduce may be amplified; but no matter what we do the problem of exquisite sensitivity remains. Contrast this with two lines that are nearly perpendicular, where tiny perturbations of the lines move the intersection very little. Now the exact same algorithms can give robust answers, even in floating point.
A similar example in solving initial value problems for differential equations is stiff equations. (A more extreme example is found in chaos theory.)
So again, it is a fundamental misunderstanding to believe this is about floating point. This topic would be covered in NA 101, but that's not a course required in a pure mathematician's training. --KSmrqT 18:12, 29 July 2006 (UTC)
This is all very true. Nevertheless, since Bo is interested in the effect of rounding errors on Wilkinson's results and I'm also curious, let's have a look. Wilkinson performed his calculations on the Automatic Computing Engine (ACE). For floating-point multiplications, the mantissa is stored in one word, which on the ACE has 48 bits. Hence the round-off error is 2^(-48). Since the polynomial goes up to e^37 according to the plot in the article, the error in evaluating the polynomial is approximately 2^(-48) e^(37) < 50. So, on the computer, all x for which |p(x)| < 50 may look like a zero, where p is the perturbed Wilkinson polynomial. How large is this neighbourhood? Well, at any of its zeros, the derivative of p is of the order 10^12 or larger, so the precision with which the zeros can be located is 50 / 10^12 ≈ 10^(-10). Wilkinson gives the zeros with nine digits after the decimal point, and this rough computation shows that he could be confident that these nine digits were correct. -- Jitse Niesen (talk) 05:17, 30 July 2006 (UTC)
Thanks Jitse. Reading the plot, I think that '2e+13' is supposed to mean 2×1013 rather than 2e13, and that log(|w(x)|) refer to the base 10 logarithm rather than to the natural logarithm. So the number e37 should be replaced by 1037 in your above estimate. But neither e37 nor 1037 is the relevant number to use. The round-off error in computing w(x)=Σckxk is ≤ the round-off error in computing Σ|ckxk|. So the numerical noise in the equation Σckxk = 0 is filtered off in the mathematically equivalent equation 1 + (Σckxk)/(Σ|ckxk|) = 1. The number Σ|ckxk| happens in this case to be equal to w(−x) when x>0. Inserting x=20 gives w(−20) = 40!/20! = 298. Working with 48 significant bits you must demand |w(x)| > 250 in order to be certain that w(x) ≠ 0, as 1 + 250/298 = 1 + 2−48 is just one bit > 1. Taking your value 1012=240 for the derivative, you get that the accuracy of the computed root is 250/240 = 210 >> 1. This proofs that the roots of w cannot be computed reliably from the coefficients using 48 bit floating point arithmetic. Bo Jacoby 11:39, 31 July 2006 (UTC)
I agree that '2e+13' is supposed to mean 2×1013. However, I disagree with the rest :) . The first plot shows that the maximum between x=4 and x=5 is 2e+13, hence the base-10 log of the maximum is around 13 and the natural logarithm is around 30. Comparing with the second plot, this shows that log(|w(x)|) refers to the natural logarithm.
Regarding the round-off error in computing w(x), basically I trust Wilkinson when he labels the roots as accurate. After all, he's writing a book about round-off errors. However, I can give arguments supporting this opinion. There are several ways to evaluate w(x) and I don't know which one Wilkinson used. I assumed he used the form Πk (xk), and my error estimate is valid in this case. If he used the form Σckxk, then this would be more accurate as you suggest because the software library on the ACE accumulates floating-point sums in double precision, which means that the mantissa is stored in two 48-bit words (this is almost quad precision in current terminology; I find it truly surprising that the ACE used better precision than current practice). Another possibility, and Lunch's quotation suggests that this is the one that Wilkinson actually used, is to evaluate the polynomial with Horner's rule (aka nested multiplication). I did not analyse this, but I think that would be fairly accurate since the quotient of subsequence coefficients of the polynomial is fairly small. I suspect that this is explained in the text after the quotation provided by Lunch. -- Jitse Niesen (talk) 12:56, 31 July 2006 (UTC)
  1. Wilkinson hardly used Πk (xk), because there is no point in computing the roots from the roots. The challenge is to compute from the coefficients. Lunch quotes the Horner scheme sn=1 ; sp=xsp+1+ap (p=n-1, ..., 1,0); f(x)=s0, or, in our notation, Σckxk = c0+x(...+x(cn−1+xcn)...).
  2. The value of w(4.5) has no implication on the round-off error in computing w(4).
  3. The round-off error in computing Σckxk using single precision 48 bit floating point arithmetic is estimated by 2−48Σ|ckxk| = 2−48+98 = 1015, while double precision gives round-off error 2−96Σ|ckxk| = 2−96+98 = 4. Wilkinson's trauma by using single precision must have made him write the software for better precision than current practice.
  4. I am going to to re-revert from Ksmrq. My understanding of the situation is not worse than his :-). Bo Jacoby 11:06, 1 August 2006 (UTC)
  1. You asked how Wilkinson could have computed the zeros of Πk (xk) − 2−23 x19 accurately. In answer, I explained a possible method, plus an error analysis.
  2. I only gave the value of w(4.5) to show that log |w(x)| in the second plot refers to the natural logarithm, since there was some confusion about this.
  3. I agree with your computation for the round-off error when evaluating the polynomial in the naive way. More interesting is the error analysis for Horner's scheme, which as we agree is probably the method that Wilkinson evaluated the polynomial. By the way, I now think that Horner's scheme is also wildly inaccurate in this case.
  4. In all references (except possibly the 1984 reference which I haven't yet read), Wilkinson's polynomial is used to discuss ill-conditioning. While I do agree that round-off errors should also be discussed in the context of root-finding, it is not clear that it should be mentioned in this article. Furthermore, I'm a bit concerned that this might be original research. On the other hand, it is well possible that the 1984 book talks about this aspect. -- Jitse Niesen (talk) 13:35, 1 August 2006 (UTC)
  1. Thanks
  2. Ok. I thought you used w(4.5) to estimate the round-off error in computing w(4).
  3. Both Horner and the naive method use n additions, and so they have roughly the same round-off errors. The naive method has no advantage against Horner. When computing a small polynomial value for an argument close to a root, the intermediate results are far greater than the final result, and so round-off errors occurs. (When two almost opposite floating-point numbers are added, unreliable bits leak into the register from the right, and the relative error increases. Binary 11-10=1.0 means (11±0.1)-(10±0.1)=1.0±0.01. The correct result is (11±0.1)-(10±0.1)=1.0±1.0 . The error is supposed to be half the least significant bit, but here it is four times that much. Not even the most significant bit is reliable).
  4. Round-off error analysis applies to rootfinding because it applies to evaluation of a polynomial close to a root. Bo Jacoby 11:51, 2 August 2006 (UTC)

Thanks[edit]

Wow! What a difference attention has made. It reminds me of the old movie cliché where the homely librarian removes her glasses and lets down her hair to reveal a hidden beauty. Though I suppose for most of the world a mathematics article will always be homely. :-) --KSmrqT 17:59, 30 July 2006 (UTC)

I'd second that! :) Featured article, anyone? Lunch 15:27, 2 August 2006 (UTC)

Context of "traumatic" quote[edit]

For anyone who's interested:

...
The history of mathematics has, in a sense, contrived to make mathematicians feel 'at home' with polynomials. This is well illustrated by Weierstrass' theorem. Why should one be interested in the fact that continuous functions can be satisfactorily approximated by polynomials? The assumption is implicit in the theorem that this is a desirable thing to do since polynomials are such agreeable functions.
2. NUMERICAL EVALUATION OF POLYNOMIALS
The cosy relationship that mathematicians enjoyed with polynomials suffered a severe setback in the early fifties when electronic computers came into general use. Speaking for myself I regard it as the most traumatic experience in my career as a numerical analyst.
The early electronic computers had no hardware facilities for computation in floating-point arithmetic and operation in this mode was effected by means of subroutines. Having constructed such a set of subroutines for our first electronic computer, PILOT ACE, I looked for a simple problem on which to give them a field trial. Computing the largest real root of a real polynomial equation seemed tailor-made for the exercise. Moreover since polynomials of even quite a modest degree can assume values covering a very wide range over the interval including their zeros, the use of floating-point arithmetic is particularly appropriate.[1] I was well aware that there were practical difficulties associated with the problem but imagined that they were mainly confined to equations with multiple roots or pathologically close roots. I felt that it was not the time to concern myself with these difficulties and I decided, therefore, to make two programs with quite limited objectives.
The first took n prescribed real numbers and constructed the explicit polynomial by multiplying the factors together. This in itself provided a simple test of the basic subroutines.
The second took the resulting polynomial f(x) and, starting with the initial approximation , used the Newton Raphson iterative method
to locate the largest root.
It was my intention to avoid difficulties by choosing well separated zi. An attractive feature of this problem is that the background theory is trivial. With exact computation the approximations tend monotonically to the root z1 and convergence is ultimately quadratic.
After using these programs with complete success on examples of order three, I decided to give them their first genuine test. I took n=20 and the polynomial with roots 20, 19, ..., 2, 1 and submitted it to the Newton Raphson program expecting immediate success. To my disappointment the iteration continued indefinitely and no root emerged. Since the polynomial was clearly devoid of all guile, I assumed that there was an error either in the floating-point subroutines or the two polynomial programs.
Failing to find an error in the subroutines and verifying that the computed polynomial was correct to working accuracy, I returned to the zero finder and examined the successive iterates xr and the computed values f(xr). The xr appeared to be virtually random numbers in the neighbourhood of 20 while the computed values were of arbitrary sign and were in any case much larger than the true values. (Notice that the true values of f(xr) could be computed directly via since we were in the privileged position of knowing the roots a priori.[2])
Now the interesting thing about this behaviour is that it would not have been difficult to predict, but it was so unexpected that it was some time before I could bring myself to attempt the analyis; I was convinced that either one of the programs or the computer was at fault.
3. ELEMENTARY ERROR ANALYSIS
We wish to avoid any detailed rounding error analysis in this article.[3] Fortunately a very simplified analysis is adequate to demonstrate that the observed behaviour is to be expected. Let us concentrate on the evaluation of the polynomial
at z=x. This was acheived via the algorithm
sn=1
sp=xsp+1+ap (p=n-1, ..., 1,0),
f(x)=s0,
usually known as nested multiplication. ...
  1. ^ My (Lunch's) note: floating point is appropriate as compared to fixed point.
  2. ^ Me again: it seems that Wilkinson is implying that he expanded the polynomial to compute (its value and) its derivative in his coding of the Newton-Raphson method, and that therein lay the bugaboo.
  3. ^ Me again: read that sentence twice.

Lunch 23:18, 30 July 2006 (UTC)

Thank you very much for this. The citation: "the computed values were of arbitrary sign and were in any case much larger than the true values" confirms that this is a floating point round-off problem. Bo Jacoby 09:40, 1 August 2006 (UTC)

I think you misinterpret the significance of the quote "We wish to avoid any detailed rounding error analysis in this article." Wilkinson is saying that rounding errors are not the most important issue here. Don't get me (or him) wrong: analysis of rounding errors (and - more to the point - catastrophic cancellation errors here) is important in numerical analysis both practically and theoretically. But using the coefficients of a polynomial to determine its roots is an ill-conditioned problem: two polynomials which have nearly the same coefficients can have wildly different roots. To say it a third way: no matter what precision arithmetic you use, if you mis-specify the coefficients even by tiny amounts, you will mis-calculate the roots by far. Lunch 00:01, 2 August 2006 (UTC)

reversion[edit]

I have removed two major alterations by Bo Jacoby, for two different reasons.

  1. The carefully formatted, easy-to-read display of the power-series form of the polynomial was changed to a wiki format. The only dubious benefit visible in the change was an explicit application of Horner's rule for evaluation. Not only is that a distraction here, the reformatting made the polynomial and its coefficients a huge run-on mess. Sorry, but this was a really bad idea.
  2. A new section was inserted before the stability analysis discussing round-off error. This is unsatisfactory for several reasons. First, the heart of the example — as stated by Wilkinson himself — is not roundoff, and to interpose this in the logical flow before a discussion of the essential "stability analysis" is not appropriate. Discussions on the talk page include comments from three different editors contesting your desire to treat this as about round-off. Since Jitse Niesen happens to be trained in numerical analysis (as he states on his user page), perhaps you should respect his dissent.

There is no question that floating point issues are important in polynomial root-finding, as I think we all agree. Where we disagree is that Wilkinson's example is about round-off. We have his own words saying it is not. If you don't respect us, at least respect the published source. --KSmrqT 16:04, 1 August 2006 (UTC)

I would add that if for some strange reason or another someone decides to put the information about rounding errors back, please note that there was a mistake in interpreting just what's in a 48-bit floating point number. There are not 48 bits in the mantissa. At least one bit is used as a sign bit, and many bits will be used for the exponent. Think scientific notation. I don't know the specifications for the PILOT ACE computer, but I'd hazard a guess that only 40 bits or fewer were used for the mantissa.

Also, don't confuse the term "double precision" with the IEEE double precision floating point format. The term "double precision" means exactly that: double the precision that was used for some prior calculation (to refine it or to check it). Since the users of the PILOT ACE (and other early computers) wrote their own floating point arithmetic routines in software (no FPUs here), they could pick their own formats (and what precision to use). IEEE set a standard "single precision" and "double precision" format to make portable software (predictably portable) possible. (And to set a platform against which the then established hardware could be better optimized, and to suggest to hardware designers and software writers what features would be best included. Among others - Kahan, forgive me what I've left out. ;)

I guess this is a long-winded way of saying: be careful about linking willy-nilly to the Wikipedia article on IEEE double precision arithmetic when this isn't what's intended (or appropriate). Lunch 00:18, 2 August 2006 (UTC)

I don't know about the PILOT ACE, but Wilkinson's 1963 book says that in the floating-point subroutines on the ACE, the mantissa is stored in one 48-bit word, and that another word is used for the exponent. This makes the computations faster. -- Jitse Niesen (talk) 02:59, 2 August 2006 (UTC)
Huh. OK. Forget what I said about the mantissa.  :) Lunch 06:52, 2 August 2006 (UTC)
Actually, I think I still have a couple of questions. Was the sign bit stored among the 48? And was the leading "1" assumed (or stored)? I suppose if it's "yes" to both, then there's still 48 bits of precision. (And a PS: I noticed on rereading what I first wrote, I used "you" to mean collective "you" and not anyone in particular so I changed it to "someone". Sorry if I caused any offense.) Lunch 15:23, 2 August 2006 (UTC)
The answer is "yes" to both: The sign is stored with the mantissa, and the leading "1" is hidden. (I'm doing a bit of reading between the lines here as Wilkinson is not very explicit on the floating-point format being used). -- Jitse Niesen (talk) 03:02, 3 August 2006 (UTC)
Are you reading the interview published in Byte magazine (Feb. '85)? Also, I noticed that R.e.b. added a comment to the article explaining why Wilkinson chose 2^-23 to perturb the coefficient of x^19: this was one ULP for 210.0 with a 30-bit binary mantissa. -- Lunch 05:16, 3 August 2006 (UTC)
My source is the 1963 book mentioned in the references. I don't know anything about the interview. I also don't know where the 30-bit mantissa is coming from. I read in Pilot ACE that that machine has 32-bit words, while the ACE has 48-bit words. -- Jitse Niesen (talk) 05:52, 3 August 2006 (UTC)

In a polynomial, f(xm) = Σj=0..n cj xmj = cnΠj=1..n (xmrj ) , the arguments, xm , and the coefficients, cj , determine the values, f(xm) , which then determine the roots, rj . The trouble is in computing values from coefficients; not in computing roots from values. The trouble is due to rounding error. The polynomial being ill-conditioned means that more significant digits in the representation of numbers are required. As the coefficients are known exactly, the scenario of "two polynomials having nearly the same coefficients" is not what we are dealing with. The citation above about "arbitrary sign and much larger than the true values", has nothing to do with the polynomial being ill-conditioned. Computed with sufficient precision, the polynomial values, f(xm) , do not have arbitrary signs. If I "mis-specify the coefficients even by tiny amounts", then the polynomial values are in error by huge amounts. The following clarification makes the statement quantitative:

If the coefficient of x19 is decreased from −210 by 2−23 to −210.0000001192, then the polynomial value w(20) is decreased from 0 to −2−232019 = −6.25×1017.

Bo Jacoby 09:59, 2 August 2006 (UTC)

Eigenvalues[edit]

An important application is that of finding eigenvalues, rj , of an n×n matrix A. If each matrix element is represented by k bits, then A is represented by kn2 bits. The eigenvalues are the roots of the characteristic polynomial : f(x) = determinant(ExA) = Σ cjxj , where E is the unit matrix. (Note that if A is a diagonal matrix, then f(x) = Πj=1..n (xrj ) ) . So the polynomial is represented by the n coefficients (cj)j=0..n−1 and cn = 1. If each coefficient is represented by k bits, then only kn bits represents the essentials of the matrix, which used to be represented by kn2 bits. Information is thrown away. In order to preserve the information, each coefficient must be represented by kn bits. As n-double precision is not generally available, inferior methods of finding eigenvalues are used in practice, see Eigenvalue#Numerical_computations. This point is very important. Bo Jacoby 09:59, 2 August 2006 (UTC)

In practice, methods like the QR algorithm are used to calculate the eigenvalues of a matrix. Are you saying that calculating the eigenvalues by finding the zeros of its characteristic polynomial is a better method? That would be an interesting opinion for which I'd like to see some evidence. -- Jitse Niesen (talk) 03:02, 3 August 2006 (UTC)
Coincidentally, and perhaps ironically, the very same Wilkinson is the author of one of the most important texts in numerical analysis, The Algebraic Eigenvalue Problem (ISBN 978-0-19-853418-1), which taught us all the benefit of the QR approach, that we use with preliminary reduction to upper Hessenberg (or tridiagonal) form, implicit shifts, and the Gershgorin circle theorem convergence test. It reliably finds eigenvalues and eigenvectors with remarkably few iterations per value. Except for tiny matrices or perhaps special cases, forming the characteristic polynomial and computing its roots is unappealing, for a variety of reasons. --KSmrqT 04:58, 3 August 2006 (UTC)
Of course, how could I have forgotten that. I think the "which taught us all" betrays your age wealth of experience ;) -- Jitse Niesen (talk) 05:52, 3 August 2006 (UTC)

I do not know the "variety of reasons", referred to by KSmrq, why "forming the characteristic polynomial and computing its roots is unappealing". There are reasons to believe that the method was discarded when the art of root-finding was immature. (1): Wilkinson used Newton's method, which is not generally recommendable (finding only one root at a time, and fast only in the final stage of the iteration process). (2): The requirement of sufficient arithmetic precision for evaluating the polynomial was not recognized, and people talked nonsense about "ill-conditioned" problems. These might be the "variety of reasons". Yes, I am saying that calculating the eigenvalues by finding the zeros of its characteristic polynomial is a better method. The exact coefficients of the characteristic polynomial are computed from the matrix elements by a finite number of multiplications and additions, and the Durand-Kerner method of computing the roots of the characteristic polynomial from the coefficients is smooth. Bo Jacoby 07:41, 3 August 2006 (UTC)

A change in wording?[edit]

I have a small nit to pick, and I was wondering if I could get the opinion of others whether it's OK to make a few small changes in wording. My problem is with the way the word "ill-conditioned" is used.

To be super pedantic (bear with me here), a problem can be ill-conditioned but data for the problem is not. Some data will bear the brunt of the ill-conditioning, and approximate solutions to the problem for the data might not be near the true solution. (However, some other data might produce approximate solutions that are just fine.)

For example, in the problem of solving the linear system Au=f for u given A and f, we say that the problem is ill-conditioned if the matrix A has a large condition number (say the ratio of largest to smallest singular values of A is much bigger than one). For a given matrix A with big condition number, applying an approximate solution procedure (such as Gaussian elimination with floating-point arithmetic) to most right-hand-side data f will give a poor approximate u. However, not all data f will give a poor u. For instance, with f=0, Gaussian elimination will give the exact correct answer u=0 (no matter the condition number of A).

Of course, most people get sick and tired of saying over and over again that "the problem of solving Au=f with Gaussian elimination in floating-point arithmetic is ill-conditioned (when the ratio of largest to smallest singular values of A is large)". They'll just say "A is ill-conditioned". This is fine when it's understood what's implied, but I think the whole shebang needs to be said at least once.

I'll make some parallels with the article. The problem of finding the roots of a polynomial given its coefficients is an ill-conditioned problem. Some polynomials vividly demonstrate this ill-conditioning (such as Wilkinson's polynomial). Other polynomials (like x+5 or Wilkinson's second example with the geometrically spaced roots) do not (like f=0 for the linear system above). Of course, people get lazy and say that Wilkinson's polynomial is ill-conditioned. But I think this sort of phrasing was at least partly responsible for some of the misunderstanding in the discussion above.

In the article, the first mention of ill-conditioning is mentioned in the "Background" section. I might suggest rewording the paragraph:

Like all problems in numerical analysis, it is natural to ask whether it's well-conditioned. In this context, well-conditioned means that a small change in the coefficients ci leads to a correspondingly small change in the roots of the polynomial.

to something like:

It is a natural question in numerical analysis to ask whether the problem of finding the roots of p from the coefficients ci is well-conditioned. That is, we hope that a small change in the coefficients will lead to a small change in the roots. Unfortunately, that is not the case here.

I'd also change the first sentence in the next paragraph, "Polynomials with multiple roots are ill-conditioned. For instance, ..." to something like, "Polynomials with multiple roots show the effect of this ill-conditioning. For instance, ...".

I understand that it's much more succinct to say "Wilkinson's polynomial is ill-conditioned" so I don't intend on beating the article to death with my nitpicking. Just a few changes here and there. (And I'm gonna get that comma splice; it's just been razzing me. ;)

Comments or suggestions? Thanks for listening to my rant.  :) Lunch 01:54, 3 August 2006 (UTC)

I agree. -- Jitse Niesen (talk) 03:02, 3 August 2006 (UTC)

Precision of Pilot ACE[edit]

The article

  • J. H. Wilkinson (1984). The perfidious polynomial. Studies in Numerical Analysis, ed. by G. H. Golub, pp. 1–28. (Studies in Mathematics, vol. 24). Washington, D.C.: Mathematical Association of America,

states on page 6 :

  • "In floating-point arithmetic on the PILOT ACE the mantissa had 30 binary digits, ...."

and on page 12 states:

  • "The true a_19 is 210, which requires 8 binary digits for its representation; on a 30-binary digit computer the perturbation 2^-23 therefore represents a unit in the first position beyond the end of the computer representation of a_19."

R.e.b. 16:21, 3 August 2006 (UTC)

Thank you for this clarification. The point is that the two numbers, −210 and −210 − 2−23, have the same 30-bit floating-point representation. The formulation, "2−23 is an error of 1 in the first digit that the computer cannot represent", stinks. (What is it that the computer cannot represent? The digit? the error? 2−23 ? None of these interpretations are true. One must distinguish between digit position and digit value). Why not copy Wilkinson? This also pinpoints that the ill-conditionedness is really about floating-point accuracy. With sufficient accuracy the problem vanishes. This point is severely underemphasized in the article. Bo Jacoby 08:01, 4 August 2006 (UTC)

The formulation, "2−23 is an error of 1 in the first digit that the computer cannot represent", stinks.

No, actually, it is correct. Lunch 21:09, 4 August 2006 (UTC)

The formulation, "2−23 is an error of 1 in the first binary digit position that the computer does not represent", is better. The computer can represent any binary digit, both 0 and 1. Bo Jacoby 06:59, 7 August 2006 (UTC)

Background[edit]

Wilkinson is a better author than any of us, and so I included his own explanation in the background subsection of the article. However, it was immediately removed. What is left is not clear and not true. For example: "It is a natural question in numerical analysis to ask whether the problem of finding the roots of p from the coefficients ci is well-conditioned". No sir, that is not natural question. Wilkinson explains that he expected the software to work, and it did not, and he did not begin to perform the analysis until he had convinced himself that there was no error in the software. He did not ask such a "natural question". The statement is simply not true, nor is it clear to the reader why it should be the case. Bo Jacoby 11:39, 10 August 2006 (UTC)

accuracy versus precision[edit]

The article Accuracy discusses the difference between the two concepts accuracy and precision: "Accuracy is the degree of veracity while precision is the degree of reproducibility". As a floating point computation is perfectly reproducible, it has high precision, but if the result is wrong it has low accuracy. So what is needed is sufficient accuracy, not higher precision, according to that definition. The article Precision (arithmetic) does not agree in this distinction, however. Bo Jacoby 12:43, 10 August 2006 (UTC)

How can we possibly discuss the accuracy of a floating point representation? We may have two hundred bits of precision, and use those bits to compute the wrong result; where's the veracity? The right word is definitely "precision". It is also the conventional word.
You also may not realize that the first letter of an article name does not need to be capitalized in a link, and that use of a pipe ("|") automagically strips any disambiguating parenthetical component (among other things). Thus we need only enter
[[precision (arithmetic)|]]
not
[[Precision (arithmetic)|precision]]
and certainly not
[[Accuracy|accuracy]]
under any circumstances. --KSmrqT 20:15, 10 August 2006 (UTC)

Thanks for the useful hints on writing links. I agree that the article on precision uses the word 'precision' for the number of significant digits. But the article on accuracy puzzled me. If repeated computations give the same result, then the precision is good. If that result is wrong, then the accuracy is bad. The quantitative definitions are upside down, however, identifying deviation and precision, while high deviation means low precision. I suppose that the accuracy of the representation is a measure of the round-off error of the representation, while the accuracy of a calculated result is the distance to the true result. But the bottom line is that the two articles do not agree on terminology. Bo Jacoby 12:42, 11 August 2006 (UTC)

well-conditioned basis[edit]

Hi KSmrq. I inserted, and you deleted, this sentence: "However, approximate knowledge of the roots is required in order to construct a well-conditioned basis, so this is not a path for solving algebraic equations without using high precision arithmetic". As you think it is wrong, you must argue for a low precision path to solving equations which otherwise can only be solved using high precision arithmetic. Bo Jacoby 12:30, 14 August 2006 (UTC)

I think you're turning it around. Statements should not be included unless supported by a reference (see WP:V for the relevant policy). So, first you should explain why the statement should be inserted. -- Jitse Niesen (talk) 14:57, 14 August 2006 (UTC)

The point of the subsection is that there exists a path transforming ill-conditions problems into well-conditioned problems using only standard precision arithmetic, but no reference or proof is given. Really the right thing to do is to remove the whole subsection as pointless. Bo Jacoby 11:09, 15 August 2006 (UTC)

No, transformation has never been the point; it won't work, as has been stated here explicitly. --KSmrqT 12:53, 15 August 2006 (UTC)

What, then, is the point of discussing well-conditioned bases, if you cannot transform your ill-conditioned problem into those bases ? Bo Jacoby 13:45, 15 August 2006 (UTC)

The question being left unanswered for days, I'll remove the confusing subsection. Bo Jacoby 12:07, 18 August 2006 (UTC)

The question has been answered over and over and over again. We're tired of repeating ourselves. The section is important and stays. --KSmrqT 13:18, 18 August 2006 (UTC)

I don't follow. When the point of the subsection does not follow from the subsection, then the subsection has no point. "First you should explain why the statement should be inserted." says Jitse. You write: "If the polynomial is expressed in another basis, then the problem of finding its roots may cease to be ill-conditioned", suggesting that it is possible to express the polynomial in another basis without finding its roots. If that is not the case, then what is the point? An author must have tireless patience with his reader. Bo Jacoby 13:18, 20 August 2006 (UTC)

Monster?[edit]

Sometimes the Wilkinson polynomial is called Wilkinson monster. Actually I know the Wilkinson polynomial as "monster" as not as just a mere polynomial. --84.58.232.187 20:31, 16 July 2007 (UTC)

Arithmetic progression[edit]

This section created to put the phrase "arithmetic progression" on the talk page. Note that zn moves around the unit circle with constant velocity as z moves up the imaginary axis. Granted we only care about z moving along the real axis when searching for roots of the polynomial f(z), but one very easy way of proving the fundamental theorem of arithmetic is to consider a growing circle around which z runs. The fact that the root-producing radii form an arithmetic progression should tell us something about how f(z) is behaving in the complex plane, yes? If you have a degree n polynomial whose roots don't form an arithmetic progression is that any less of a monster? Maybe or maybe not, but why is the fact that the roots form an arithmetic progression being so uniformly ignored? --76.102.51.128 (talk) 14:00, 19 September 2009 (UTC)

What is the point of the Lagrange section?[edit]

Wilkinson's polynomial is about the ill conditioning of finding the roots given the coefficients. I fail to see the relevance of the Lagrange section to this question, since the Lagrange basis employed here presupposes that you know where the roots are. Trivially, the roots are a well-conditioned (identity) function of the roots!

More generally, this article analyzes the problem in more detail and shows that the Lagrange basis leads to ill-conditioned root computations if the Lagrange interpolation nodes are far from the roots:

My inclination is that the Lagrange section as it stands now, which is both unsourced and confusing, should be scrapped.

— Steven G. Johnson (talk) 16:39, 10 June 2011 (UTC)