Jump to content

Sturm's theorem: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
m →‎top: - copy-edit; wl univariate
→‎The theorem: +citation
(35 intermediate revisions by 2 users not shown)
Line 1: Line 1:
In [[mathematics]], the '''Sturm sequence''' of a [[univariate]] [[polynomial]] {{mvar|p}} is a sequence of polynomials associated with {{mvar|p}} and its derivative by a variant of [[Euclid's algorithm for polynomials]]. '''Sturm's theorem''' expresses the number of distinct [[real number|real]] [[Root of a function|root]]s of {{mvar|p}} located in an [[interval (mathematics)|interval]] in terms of the number of changes of signs of the values of the Sturm sequence at the bounds of the interval. Applied to the interval of all the real numbers, it gives the total number of real roots of {{mvar|p}}.
In [[mathematics]], the '''Sturm sequence''' of a [[univariate polynomial]] {{mvar|p}} is a sequence of polynomials associated with {{mvar|p}} and its derivative by a variant of [[Euclid's algorithm for polynomials]]. '''Sturm's theorem''' expresses the number of distinct [[real number|real]] [[Root of a function|root]]s of {{mvar|p}} located in an [[interval (mathematics)|interval]] in terms of the number of changes of signs of the values of the Sturm sequence at the bounds of the interval. Applied to the interval of all the real numbers, it gives the total number of real roots of {{mvar|p}}.<ref name=bpr>{{harv|Basu|Pollack|Roy|2006}}</ref>


Whereas the [[fundamental theorem of algebra]] readily yields the overall number of [[complex number|complex]] roots, counted with [[multiplicity (mathematics)|multiplicity]], it does not provide a procedure for calculating them. Sturm's theorem counts the number of distinct real roots and locates them in intervals. By subdividing the intervals containing some roots, it can isolate the roots into arbitrary small intervals, each containing exactly one root. This yields an arbitrary-precision numeric [[root finding algorithm]] for univariate polynomials.
Whereas the [[fundamental theorem of algebra]] readily yields the overall number of [[complex number|complex]] roots, counted with [[multiplicity (mathematics)|multiplicity]], it does not provide a procedure for calculating them. Sturm's theorem counts the number of distinct real roots and locates them in intervals. By subdividing the intervals containing some roots, it can isolate the roots into arbitrary small intervals, each containing exactly one root. This yields an arbitrary-precision numeric [[root-finding algorithm]] for univariate polynomials.

For computing over the [[real number|reals]], Sturm's theorem is less efficient than other methods based on [[Descartes' rule of signs]]. However, it works on every [[real closed field]], and, therefore, remains fundamental for the theoretical study of the [[computational complexity]] of [[decidability of first-order theories of the real numbers|decidability]] and [[quantifier elimination]] in the [[first order theory]] of real numbers.


The Sturm sequence and Sturm's theorem are named after [[Jacques Charles François Sturm]].
The Sturm sequence and Sturm's theorem are named after [[Jacques Charles François Sturm]].


==Sturm chains==
==The theorem==
A '''Sturm chain''' or '''Sturm sequence''' is a finite sequence of polynomials
The '''Sturm chain''' or '''Sturm sequence''' of a [[univariate polynomial]] {{math|''P''(''x'')}} with real coefficients is the sequence of polynomials <math>P_0, P_1, \ldots,</math> such that
:<math>p_0, p_1, \dots, p_m</math>
of decreasing [[degree of a polynomial|degree]] with these following properties:
* {{math|''p''<sub>0</sub> {{=}} ''p''}} is [[square-free polynomial|square free]] (no square factors, i.e., no repeated roots);
* if {{math|''p''(''x'') {{=}} 0}}, then {{math|sign(''p''<sub>1</sub>(''x'')) {{=}} sign(''p′''(''x''))}};
* if {{math|''p<sub>i</sub>''(''x'') {{=}} 0}} for {{math|0 < ''i'' < ''m''}} then {{math|sign(''p''<sub>''i''−1</sub>(''x'')) {{=}} −sign(''p''<sub>''i''+1</sub>(''x''))}};
* {{math|''p<sub>m</sub>''}} does not change its sign.

Sturm's sequence is a modification of [[Budan's theorem#Fourier sequence|Fourier's sequence]].

To obtain a Sturm chain, Sturm himself proposed to choose the intermediary results when applying [[Euclid's algorithm]] to {{mvar|p}} and its [[derivative]]:

:<math>\begin{align}
:<math>\begin{align}
p_0(x) & := p(x),\\
P_0&=P,\\
p_1(x) & := p'(x),\\
P_1&=P',\\
p_2(x) & := -\operatorname{rem}(p_0, p_1) = p_1(x) q_0(x)- p_0(x),\\
P_{i+1}&=-\operatorname{rem}(P_{i-1},P_i),
\end{align}</math>
p_3(x) & := -\operatorname{rem}(p_1,p_2) = p_2(x) q_1(x) - p_1(x),\\
for {{math|''i'' ≥ 1}}, where {{math|''P'{{void}}''}} is the [[derivative]] of {{mvar|P}}, and <math>\operatorname{rem}(P_{i-1},P_i)</math> is the remainder of the [[Euclidean division of polynomials|Euclidean division]] of <math>P_{i-1}</math> by <math>P_i.</math> The length of the Sturm sequence is at most the degree of {{mvar|P}}.
&{}\ \ \vdots\\
0 & = -\operatorname{rem}(p_{m-1}, p_m),
\end{align} </math>


The number of [[sign variation]]s at {{mvar|ξ}} of the Sturm sequence of {{mvar|P}} is the number of sign changes–ignoring zeros—in the sequence of real numbers
where {{math|rem(''p<sub>i</sub>'', ''p<sub>j</sub>'')}} and {{math|''q<sub>i</sub>''}} are the remainder and the quotient of the [[polynomial long division]] of {{math|''p<sub>i</sub>''}} by {{math|''p<sub>j</sub>''}}, and where {{mvar|m}} is the minimal number of polynomial divisions (never greater than {{math|deg(''p'')}}) needed to obtain a zero remainder. That is, successively take the remainders with [[polynomial#Divisibility|polynomial division]] and change their signs. Since {{math|deg(''p''<sub>''i''+1</sub>) < deg(''p<sub>i</sub>'')}} for {{math|0 ≤ ''i'' < ''m''}}, the algorithm terminates. The final polynomial, {{math|''p<sub>m</sub>''}}, is the [[Polynomial greatest common divisor|greatest common divisor]] of {{mvar|p}} and its derivative. If {{mvar|p}} is square free, it shares no roots with its derivative, hence {{math|''p<sub>m</sub>''}} will be a non-zero constant polynomial. The Sturm chain, called the '''canonical Sturm chain''', then is
:<math>P_0(\xi), P_1(\xi),P_2(\xi),\ldots.</math>
This number of sign variations is denoted here {{math|''V''(''ξ'')}}.


Sturm's theorem states that, if {{mvar|P}} is a [[square-free polynomial]], the number of distinct real roots of {{mvar|P}} in the [[half-open interval]] {{math|(''a'', ''b'']}} is {{math|''V''(''a'') − ''V''(''b'')}} (here, {{mvar|a}} and {{mvar|b}} are real numbers such that {{math|''a'' < ''b''}}).<ref name="bpr" />
:<math>p_0,p_1,p_2,\ldots,p_m .</math>


The theorem extends to unbounded intervals by defining the sign at {{math|+∞}} of a polynomial as the sign of its [[leading coefficient]] (that is, the coefficient of the term of highest degree). At {{math|–∞}} the sign of a polynomial is the sign of its leading coefficient for a polynomial of even degree, and the opposite sign for a polynomial of odd degree.
If {{mvar|p}} is not square-free, this canonical sequence does not formally satisfy the definition of a Sturm chain above; nevertheless it still satisfies the conclusion of Sturm's theorem (below).


In the case of a non-square-free polynomial, if neither {{mvar|a}} nor {{mvar|b}} is a multiple root of {{mvar|p}}, then {{math|''V''(''a'') − ''V''(''b'')}} is the number of ''distinct'' real roots of {{mvar|P}}.
==Statement==
Let {{math| ''p''<sub>0</sub>, ..., ''p<sub>m</sub>''}} be the Sturm chain of a square-free polynomial {{mvar| ''p''}}, and let {{math|''σ''(''ξ'')}} denote the number of sign changes (ignoring zeroes) in the sequence


The proof of the theorem is as follows: when the value of {{mvar|x}} increases from {{mvar|a}} to {{mvar|b}}, it may pass through a zero of some <math>P_i</math> ({{math|''i'' > 0}}); when this occurs, the number of sign variations of <math>(P_{i-1}, P_i, P_{i+1})</math> does not changes. When {{mvar|x}} passes through a root of <math>P_0=P,</math> the number of sign variations of <math>(P_0, P_1)</math> decreases from 1 to 0. These are the only values of {{mvar|x}} where some sign may change.
:<math>p_0(\xi), p_1(\xi), p_2(\xi),\ldots, p_m(\xi).</math>

Sturm's theorem then states that for two real numbers {{math|''a'' < ''b''}}, the number of distinct roots of {{mvar|p}} in the half-open interval {{math|(''a'', ''b'']}} is {{math|''σ''(''a'') − ''σ''(''b'')}}.

===The non-square-free case===
Let {{math| ''p''<sub>0</sub>, ..., ''p<sub>m</sub>''}} be the canonical Sturm sequence of a polynomial {{mvar|p}}, not necessarily square-free. Then {{math|''σ''(''a'') − ''σ''(''b'')}} is the number of distinct roots of {{mvar|p}} in {{math|(''a'', ''b'']}} whenever {{math|''a'' < ''b''}} are real numbers such that neither {{mvar|a}} nor {{mvar|b}} is a multiple root of {{mvar|p}}.


==Example==
==Example==
Line 50: Line 35:
\end{align}</math>
\end{align}</math>


Using [[polynomial long division]] to divide {{math|''p''<sub>0</sub>}} by {{math|''p''<sub>1</sub>}} gives the remainder <math>-\tfrac{3}{16}x^2-\tfrac{3}{4}x-\tfrac{15}{16}</math>, and upon multiplying this remainder by {{math|−1}} we obtain <math>p_2(x)=\tfrac{3}{16}x^2+\tfrac{3}{4}x+\tfrac{15}{16}</math>. Next dividing {{math|''p''<sub>1</sub>}} by {{math|''p''<sub>2</sub>}} and multiplying the remainder by {{math|−1}}, we obtain <math>p_3(x)=-32x-64</math>. And dividing {{math|''p''<sub>2</sub>}} by {{math|''p''<sub>3</sub>}} and multiplying the remainder by {{math|−1}}, we obtain <math>p_4(x)=-\tfrac{3}{16}</math>.
The remainder of the [[Euclidean division]] of {{math|''p''<sub>0</sub>}} by {{math|''p''<sub>1</sub>}} is <math>-\tfrac{3}{16}x^2-\tfrac{3}{4}x-\tfrac{15}{16};</math> multiplying it by {{math|−1}} we obtain
:<math>p_2(x)=\tfrac{3}{16}x^2+\tfrac{3}{4}x+\tfrac{15}{16}</math>.
Next dividing {{math|''p''<sub>1</sub>}} by {{math|''p''<sub>2</sub>}} and multiplying the remainder by {{math|−1}}, we obtain
:<math>p_3(x)=-32x-64</math>.
Now dividing {{math|''p''<sub>2</sub>}} by {{math|''p''<sub>3</sub>}} and multiplying the remainder by {{math|−1}}, we obtain
:<math>p_4(x)=-\tfrac{3}{16}</math>.
As this is a constant, this finishes the computation of the Sturm sequence.


To find the number of real roots between of <math>p_0</math> one has to evaluate the sequences of the signs of these polynomials at {{math|−∞}} and {{math|∞}}, which are respectively {{math|(+, −, +, +, −)}} and {{math|(+, +, +, −, −)}}. Thus
Thus the complete chain of Sturm polynomials is:
:<math>V(-\infty)-V(+\infty) = 3-1=2,</math>
which shows that {{mvar|p}} has two real roots.


This can be verified by noting that {{math|''p''(''x'')}} can be factored as {{math|(''x''<sup>2</sup> − 1)(''x''<sup>2</sup> + ''x'' + 1)}}, where the first factor has the roots {{math|−1}} and {{math|1}}, and second factor has no real roots. This last assertion results from the [[quadratic formula]], and also from Sturm's theorem, which gives the sign sequences {{math|(+, –, –)}} at {{math|−∞}} and {{math|(+, +, –)}} at {{math|+∞}}.
:<math>p_0(x) = x^4+x^3-x-1</math>
:<math>p_1(x) = 4x^3+3x^2-1</math>
:<math>p_2(x) = \tfrac{3}{16}x^2+\tfrac{3}{4}x+\tfrac{15}{16}</math>
:<math>p_3(x) = -32x-64</math>
:<math>p_4(x) = -\tfrac{3}{16}</math>


==Generalization==
To find the number of roots between {{math|−∞}} and {{math|∞}}, first evaluate {{math|''p''<sub>0</sub>, ''p''<sub>1</sub>, ''p''<sub>2</sub>, ''p''<sub>3</sub>,}} and {{math|''p''<sub>4</sub>}} at {{math|−∞}} and note the sequence of signs of the results: {{math|+ − + + −}}, which contains three sign changes ({{math|+}} to {{math|−}}, then {{math|−}} to {{math|+}}, then {{math|+}} to {{math|−}}). The same procedure for {{math|∞}} gives the sign sequence {{math|+ + + − −}}, which contains just one sign change. Hence the number of roots of the original polynomial between {{math|−∞}} and {{math|∞}} is {{math|3 − 1 {{=}} 2.}} That this is correct can be seen by noting that {{math|''p''(''x'') {{=}} ''x''<sup>4</sup> + ''x''<sup>3</sup> − ''x'' − 1}} can be factored as {{math|(''x''<sup>2</sup> − 1)(''x''<sup>2</sup> + ''x'' + 1)}}, where it is readily verifiable that {{math|''x''<sup>2</sup> − 1}} has the two roots {{math|−1}} and {{math|1}} while {{math|''x''<sup>2</sup> + ''x'' + 1}} has no real roots. In more complicated examples in which there is no advance knowledge of the roots because factoring is either impossible or impractical, one can experiment with various finite bounds for the range to be considered, thus narrowing down the locations of the roots.
Sturm sequences have been generalized in two directions. To define each polynomial in the sequence, Sturm used the negative of the remainder of the [[Euclidean division]] of the two preceding ones. The theorem remains true if one replaces the negative of the remainder by its product or quotient by a positive constant or the square of a polynomial. It is also useful (see below) to consider sequences where the second polynomial is not the derivative of the first one.


A ''generalized Sturm sequence'' is a finite sequence of polynomials with real coefficients
==Proof==
:<math>P_0, P_1, \dots, P_m</math>
Polynomials are [[continuous function]]s, and any sign change must occur at a root, so consider the behavior of a Sturm chain around the roots of its constituent polynomials.
such that
* the degrees are decreasing after the first one: <math>\deg P_{i} <\deg P_{i-1}</math> for {{math|1=''i'' = 2, ..., m}};
* <math>P_m</math> does not have any real root or does not changes of sign near its real roots.
* if {{math|''P<sub>i</sub>''(''ξ'') {{=}} 0}} for {{math|0 < ''i'' < ''m''}} and {{mvar|ξ}} a real number, then {{math|''P''<sub>''i'' −1 </sub>(''ξ'') ''P''<sub>''i'' + 1</sub>(''ξ'') < 0}}.


The last condition implies that two consecutive polynomials do not have any common real root. In particular
First, note that two adjacent polynomials can share a common root only when it is a multiple root of {{mvar|p}} (in which case it is a root of every {{math|''p<sub>i</sub>''}}). Indeed, if {{math|''p<sub>i</sub>''(''ξ'') {{=}} ''p''<sub>''i''−1</sub> (''ξ'') {{=}} 0}}, then {{math|''p''<sub>''i''+1</sub>(''ξ'') {{=}} 0}}, since {{math|sign(''p''<sub>''i''−1</sub>(''ξ'')) {{=}} −sign(''p''<sub>''i''+1</sub>(''ξ''))}}. The zero then propagates recursively up and down the chain, so that {{mvar|ξ}} is a root of all the polynomials {{math|''p''<sub>0</sub>, ..., ''p<sub>m</sub>''}}.
the original Sturm sequence is a generalized Sturm sequence, if (and only if) the polynomial has no multiple real root (if a polynomial has a multiple root, the two first polynomials of its Sturm sequence have a common toot).


When computing the original Sturm sequence by Euclidean division, it may happen that one encounters a polynomial that has a factor that is never negative, such a <math>x^2</math> or <math>x^2+1</math>. In this case, if one continues the computation with the polynomial replaced by its quotient by the nonnegative factor, one gets a generalized Sturm sequence, which may also be used for computing the number of real roots, since the proof of Sturm's theorem still applies (because of the third condition). This may sometimes simplify the computation, although it is generally difficult to find such nonnegative factors, except for even powers of {{mvar|x}}.
Next, consider roots of polynomials in the interior (i.e., {{math|0 < ''i'' < ''m''}}) of the Sturm chain that are not multiple roots of {{mvar|p}}. If {{math|''p<sub>i</sub>''(''ξ'') {{=}} 0}}, then from the previous paragraph it is true that {{math|''p''<sub>''i''−1</sub>(''ξ'') ≠ 0}} and {{math|''p''<sub>''i''+1</sub>(''ξ'') ≠ 0}}. Furthermore, {{math|sign(''p''<sub>''i''−1</sub>(''ξ'')) {{=}} −sign(''p''<sub>''i''+1</sub>(''ξ''))}}. Since {{math|''p''<sub>''i''−1</sub>}} and {{math|''p''<sub>''i''+1</sub>}} are continuous, {{math|sign(''p''<sub>''i''+1</sub>(''x'')) {{=}} −sign(''p''<sub>''i''−1</sub>(''x''))}} for every {{mvar|x}} in the vicinity of {{mvar|ξ}}. Similarly, the sign of {{math|''p<sub>i</sub>''(''x'')}} is constant before and after {{mvar|ξ}}, and changes as {{mvar|x}} is crossing {{mvar|ξ}}. Thus, whenever {{mvar|x}} is crossing {{mvar|ξ}}, moving from left to right, the part {{math|''p''<sub>''i''−1</sub>, ''p<sub>i</sub>'', ''p''<sub>''i''+1</sub>}} of the Sturm chain loses a sign change at one side, and acquires a new sign change at the other side. Consequently, the total number of sign changes is never affected by the polynomial variations in the interior of the chain, and only roots of the original polynomial, at the top of the chain, can affect the total number of sign changes.


==Use of pseudo-remainder sequences==
Consider a root {{mvar|ξ}}, so {{math|''p''(''ξ'')}}, and assume first that it is a simple root. Then the derivative of {{mvar|p}}, which is {{math|''p''<sub>1</sub>}}, must be non-zero at {{mvar|ξ}}, so {{mvar|p}} must be either increasing or decreasing at {{mvar|ξ}}. If it's increasing, then its sign is changing from negative to positive when moving from left to right while its derivative (again, {{math|''p''<sub>1</sub>}}) is positive, so the total number of sign changes decreases by one. Conversely, if it's decreasing, then its sign changes from positive to negative while its derivative is negative, so again the total number of sign changes decreases by one.
In [[computer algebra]], the polynomials that are considered have integer coefficients or may be transformed to have integer coefficients. The Sturm sequence of a polynomial with integer coefficients generally contains polynomials whose coefficients are not integers (see above example).


To avoid computation with [[rational number]]s, a common method is to replace [[Euclidean division of polynomials|Euclidean division]] by [[pseudo-remainder|pseudo-division]] for computing [[polynomial greatest common divisor]]s. This amounts to replacing the remainder sequence of the [[Euclidean algorithm for polynomials|Euclidean algorithm]] by a [[pseudo-remainder sequence]], a pseudo remainder sequence being a sequence <math>p_0, \ldots, p_k</math> of polynomials such that there are constants <math>a_i</math> and <math>b_i</math> such that <math>b_ip_{i+1}</math> is the remainder of the Euclidean division of <math>p_{i-1}</math> by <math>p_i.</math> For example, the remainder sequence of the Euclidean algorithm is a pseudo-remainder sequence with <math>a_i=b_i=1</math> for every {{mvar|i}}, and the Sturm sequence of a polynomial is a pseudo-remainder sequence with <math>a_i=1</math> and <math>b_i=-1</math> for every {{mvar|i}}.
Finally, let {{mvar|ξ}} be a multiple root of {{mvar|p}}, and let {{math|''p''<sub>0</sub>, ..., ''p<sub>m</sub>''}} be the canonical Sturm chain. Let ''d'' = gcd(''p'',''p′''), {{math|''q''<sub>0</sub>}} = ''p''/''d'' and {{math|''q''<sub>1</sub>}} = ''p′''/''d'', and let {{math|''q''<sub>0</sub>, ..., ''q<sub>m</sub>''}} be the Sturm chain given by dividing the {{math|''p''<sub>i</sub>}} by ''d''. Then {{math|''σ''(''x'')}} is the same for both chains whenever {{mvar|x}} is not a root of {{mvar|d}}. And the number of sign changes (in either chain) around {{mvar|ξ}} decreases by one, since {{mvar|ξ}} is a simple root of {{math|''q''<sub>0</sub>}} and is not a root of {{math|''q''<sub>1</sub>}}.


Various pseudo-remainder sequences have been designed for computing greatest common divisors of polynomials with integer coefficients without introducing denominators (see [[Pseudo-remainder sequence]]). They can all be made generalized Sturm sequences by choosing the sign of the <math>b_i</math> to be the opposite of the sign of the <math>a_i.</math> This allows the use of Sturm's theorem with pseudo-remainder sequences.
In summary, only sign changes at roots of the original polynomial affect the total number of sign changes across the chain, and the total number of sign changes always decreases by one as roots are passed. The theorem follows directly.


==Root isolation==
==History and related methods==
For counting and isolating the real roots, other methods are usually preferred, because they are computationally more efficient; these methods all use [[Descartes' rule of signs]] (just like Fourier<ref name=Fourier>{{cite journal|last=Fourier|first=Jean Baptiste Joseph|title=Sur l'usage du théorème de Descartes dans la recherche des limites des racines|year=1820|journal=Bulletin des Sciences, par la Société Philomatique de Paris|pages=156–165|url=https://archive.org/details/bulletindesscien20soci}}</ref> did back in 1820) and [[Budan's theorem#Vincent's theorem (1834 and 1836)|Vincent's theorem]]. The very first one of those methods was initially called "modified Uspensky's algorithm" by its inventors,<ref name=CA>{{cite book|last=Collins|first=George E.|author2=Alkiviadis G. Akritas |title =Polynomial Real Root Isolation Using Descartes' Rule of Signs|year = 1976|pages=272–275|series = SYMSAC '76, Proceedings of the third ACM symposium on Symbolic and algebraic computation|publisher = ACM|location = Yorktown Heights, NY, USA|url=http://doi.acm.org/10.1145/800205.806346}}</ref> but it was later shown that there is no Uspensky's method;<ref name="akritas">{{cite book|last=Akritas|first=Alkiviadis G.|title=There is no "Uspensky's Method"|url=http://dl.acm.org/citation.cfm?id=32457|year=1986|publisher=In: Proceedings of the fifth ACM Symposium on Symbolic and Algebraic Computation (SYMSAC '86, Waterloo, Ontario, Canada), pp. 88–90}}</ref> afterwards, people started calling it either "Collins–Akritas method"<ref name=Descartes>{{cite book|last=Akritas|first=Alkiviadis G.|title=There is no "Descartes' method"|url=http://inf-server.inf.uth.gr/~akritas/articles/71.pdf|year=2008|publisher=In: M.J.Wester and M. Beaudin (Eds), Computer Algebra in Education, AullonaPress, USA, pp. 19–35}}</ref> or "Descartes' method" only to be shown that there is no Descartes' method<ref name="Descartes"/> either. Finally, François Boulier, of the University of Lille,<ref>{{cite book|last=Boulier|first=François|title=Systèmes polynomiaux : que signifie " résoudre " ?|url=http://www.lifl.fr/~boulier/polycopies/resoudre.pdf|year=2010|publisher=Université Lille 1}}</ref> p.&nbsp;24, gave it the name "[[Vincent's theorem#Vincent–Collins–Akritas (VCA, 1976)|Vincent–Collins–Akritas]]" (VCA for short) to also give credit to Vincent. VCA is a bisection method; there exists also a continued fractions method based on [[Budan's theorem#Vincent's theorem (1834 and 1836)|Vincent's theorem]] namely, the ''[[Vincent's theorem#Vincent–Akritas–Strzeboński (VAS, 2005)|Vincent–Akritas–Strzeboński]]'' (VAS) method.<ref name=VAS>{{cite journal|last=Akritas|first=Alkiviadis G. |author2=A.W. Strzeboński |author3=P. S. Vigklas|title=Improving the performance of the continued fractions method using new bounds of positive roots|journal=Nonlinear Analysis: Modelling and Control| year=2008| volume=13| pages=265–279| url=http://www.lana.lt/journal/30/Akritas.pdf}}</ref>


For a polynomial with real coefficients, ''root isolation'' consists of finding, for each real root, an interval that contains this root, and no other roots.
VAS is based on [[Budan's theorem#Budan's theorem|Budan's theorem]] whereas Sturm's method was inspired by [[Budan's theorem#Fourier's theorem|Fourier's theorem]]. In fact Sturm himself,<ref>{{cite journal|last=Hourya|first=Benis-Sinaceur|title=Deux moments dans l'histoire du Théorème d'algèbre de Ch. F. Sturm|journal= In: Revue d'histoire des sciences|year=1988|volume=41|number=2|pages=99–132|url=http://www.persee.fr/web/revues/home/prescript/article/rhs_0151-4105_1988_num_41_2_4093}}</ref> p.&nbsp;108, acknowledges the great influence [[Budan's theorem#Fourier's theorem|Fourier's theorem]] had on him: « C'est en m'appuyant sur les principes qu'il a posés, et en imitant ses démonstrations, que j'ai trouvé les nouveaux théorèmes que je vais énoncer. » which translates to "It is by relying upon the principles he has laid out and by imitating his proofs that I have found the new theorems which I am about to announce."


This is useful for [[root-finding algorithm|root finding]], allowing the selection of the root to be found and providing a good starting point for fast numerical algorithms such as [[Newton's method]]; it is also useful for certifying the result, as if Newton's method converge outside the interval one may immediately deduce that it converges to the wrong root.
==Applications==
===Number of real roots===
Sturm's theorem can be used to compute the total number of real roots of a polynomial.


Root isolation is also useful for computing with [[algebraic numbers]]. For computing with algebraic numbers, a common method is to represent them as a pair of a polynomial to which the algebraic number is a root, and an isolation interval. For example <math>\sqrt 2</math> may be unambiguously represented by <math>(x^2-2, [0,2]).</math>
This may be done by choosing {{math|−''a'' {{=}} ''b'' {{=}} ''M''}} where {{mvar|M}} is larger than the absolute value of every root. For example, a bound due to [[Augustin Louis Cauchy|Cauchy]] says that all real roots of a polynomial with coefficients {{math|''a<sub>i</sub>''}} are in the interval {{math|[−''M'', ''M'']}}, where


Sturm's theorem provides a way for isolating real roots that is less efficient (for polynomials with integer coefficients) than other methods involving [[Descartes' rule of signs]]. However, it remains useful in some circumstances, mainly for theoretical purposes, for example for algorithms of [[real algebraic geometry]] that involve [[infinitesimal]]s.
:<math>M = 1 + \frac{\max_{i=0}^{n-1} |a_i|}{|a_n|}.</math>


For isolating the real roots, one starts from an interval <math>(a,b]</math> containing all the real roots, or the roots of interest (often, typically in physical problems, only positive roots are interesting), and one computes <math>V(a)</math> and <math>V(b).</math> For defining this starting interval, one may use bounds on the size of the roots (see {{slink|Properties of polynomial roots|Bounds on (complex) polynomial roots}}). Then, one divides this interval in two, by choosing {{mvar|c}} in the middle of <math>(a,b].</math> The computation of <math>V(c)</math> provides the number of real roots in <math>(a,c]</math> and <math>(c,b],</math> and one may repeat the same operation on each subinterval. When one encounters, during this process an interval that does not contain any root, it may be suppressed from the list of intervals to consider. When one encounters an interval containing exactly one root, one may stop dividing it, as it is an isolation interval. The process stops eventually, when only isolating intervals remain.
Although theoretically the above approach is the simplest, in practice bounds on the positive (only) roots of polynomials are used and the positive roots are isolated and evaluated first; the negative roots are treated by first substituting {{mvar|x}} by {{mvar|−x}}, then compute a new (positive root) bound to isolate and evaluate the negative roots. Sturm's method and VCA need to compute only one bound to isolate the positive roots. In contrast, to isolate the positive roots VAS needs to compute various positive bounds, for the various polynomials that appear in the process.<ref>{{cite book|last=Vigklas|first=Panagiotis, S.|title=Upper bounds on the values of the positive roots of polynomials|year=2010|publisher=Ph.D. Thesis, University of Thessaly, Greece|url=http://www.inf.uth.gr/cced/wp-content/uploads/formidable/phd_thesis_vigklas.pdf}}</ref><ref>{{cite journal|last=Akritas|first=Alkiviadis, G.|title=Linear and Quadratic Complexity Bounds on the Values of the Positive Roots of Polynomials|journal=Journal of Universal Computer Science|year=2009|volume=15|number=3|pages=523–537|url=http://www.jucs.org/jucs_15_3/linear_and_quadratic_complexity}}</ref>


This isolating process may be used with any method for computing the number of real roots in an interval. Theoretical [[complexity analysis]] and practical experiences show that methods based on [[Descartes' rule of signs]] are more efficient. It follows that, nowadays, Sturm sequences are rarely used for root isolation.
Another method is computationally simpler. One can use the fact that for large {{mvar|x}}, the sign of


==Application==
:<math>p(x)=a_n x^n+\cdots</math>
Generalized Sturm sequences allow counting the roots of a polynomial where another polynomial is positive (or negative), without computing these root explicitly. If one knows an isolating interval for a root of the first polynomial, this allows also finding the sign of the second polynomial at this particular root of the first polynomial, without computing a better approximation of the root.


Let {{math|''P''(''x'')}} and {{math|''Q''(''x'')}} be two polynomials with real coefficients such that {{mvar|P}} and {{mvar|Q}} have no common root and {{mvar|P}} has no multiple roots. In other words, {{mvar|P}} and {{math|''P'{{space|hair}}Q''}} are [[coprime|coprime polynomials]]. This restriction does not really affect the generality of what follows as [[polynomial greatest common divisor|GCD]] computations allows reducing the general case to this case, and the cost of the computation of a Sturm sequence is the same as that of a GCD.
is {{math|sign(''a<sub>n</sub>'')}}, whereas {{math|sign(''p''(−''x''))}} is {{math|(−1)<sup>''n''</sup>''a<sub>n</sub>''}}. In this way, simply counting the sign changes in the leading coefficients in the Sturm chain readily gives the number of distinct real roots of a polynomial.


Sturm's theorem allows also to determine the multiplicity of a given root, say {{mvar|ξ}}. Indeed, suppose that {{math|''a'' < ''ξ'' < ''b''}}, with {{math|''σ''(''a'') ''σ''(''b'') {{=}} 1}}. Then, {{mvar|ξ}} has multiplicity {{mvar|k}} precisely when {{mvar|ξ}} is a root with multiplicity {{math|''k'' 1}} of {{math|''p<sub>m</sub>''}} (since it is the GCD of {{mvar|p}} and its derivative). Thus the multiplicity of {{mvar|ξ}} may be computed by recursively applying Sturm's theorem to {{math|''p<sub>m</sub>''}}. However, this method is rarely used, as [[square-free factorization]] is computationally more efficient for this purpose.
Let {{math|''W''(''a'')}} denote the number of sign variations at {{mvar|a}} of a generalized Sturm sequence starting from {{mvar|P}} and {{math|''P'{{space|hair}}Q''}}. If {{math|''a'' < ''b''}} are two real numbers, then {{math|''W''(''a'') ''W''(''b'')}} is the number of roots of {{mvar|P}} in the interval <math>(a,b]</math> such that {{math|''Q''(''a'') > 0}} minus the number of roots in the same interval such that {{math|''Q''(''a'') < 0}}. Combined with the total number of roots of {{mvar|P}} in the same interval given by Sturm's theorem, this gives the number of roots of {{mvar|P}} such that {{math|''Q''(''a'') > 0}} and the number of roots of {{mvar|P}} such that {{math|''Q''(''a'') < 0}}.<ref name="bpr" />

===Quotient===
The remainder is needed to compute the chain using Euclid's algorithm. For two polynomials

:<math>p(x)=\sum_{k=0}^i p_k x^k, \qquad \widetilde{p}(x)=\sum_{k=0}^{i-1} \widetilde{p}_k x^k, \quad \widetilde{p}_{i-1} \neq 0,</math>

this is accomplished by

:<math>\operatorname{rem} \left (p, \widetilde{p} \right )= \widetilde{p}_{i-1}\cdot p(x)- p_i \cdot\widetilde{p}(x)\left(x+\frac{p_{i-1}}{p_i}-\frac{\widetilde{p}_{i-2}}{\widetilde{p}_{i-1}}\right),</math>

where the quotient is built solely of the first two leading coefficients.

==Generalized Sturm chains==
Let {{mvar|ξ}} be in the compact interval {{math|[''a'', ''b'']}}. A '''generalized Sturm chain''' over {{math|[''a'', ''b'']}} is a finite sequence of real polynomials {{math|(''X''<sub>0</sub>, ''X''<sub>1</sub>, ..., ''X<sub>r</sub>'')}} such that:
#{{math|''X''(''a'')''X''(''b'') ≠ 0}}
#{{math|sign(''X<sub>r</sub>'')}} is constant on {{math|[''a'', ''b'']}}
#If {{math|''X<sub>i</sub>''(''ξ'') {{=}} 0}} for {{math|1 ≤ ''i'' ≤ ''r'' − 1}}, then {{math|''X''<sub>''i''−1</sub>(''ξ'')''X''<sub>''i''+1</sub>(''ξ'') < 0}}.

One can check that each Sturm chain is indeed a generalized Sturm chain.


==See also==
==See also==
Line 131: Line 106:


{{reflist}}
{{reflist}}

* {{cite book
|first1=Saugata
|last1=Basu
|first2=Richard
|last2=Pollack
|authorlink2=Richard M. Pollack
|first3=Marie-Françoise
|last3=Roy
|authorlink3=Marie-Françoise Roy
|title=Algorithms in real algebraic geometry
|year=2006
|publisher=[[Springer Science+Business Media|Springer]]
|pages=52–57
|section=Section 2.2.2
|isbn=978-3-540-33098-1
|edition=2nd
|url=https://s3.amazonaws.com/academia.edu.documents/30838872/Saugata_Basu_Algorithms_in_Real_Algebraic_Geome.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1542452131&Signature=qTJoc%2FARfSeT01%2FOI%2F7W6zaHZ9w%3D&response-content-disposition=inline%3B%20filename%3DAlgorithms_in_real_algebraic_geometry.pdf
}}
* {{cite journal
* {{cite journal
|first1=Jacques Charles François
|first1=Jacques Charles François
Line 222: Line 216:
* D.G. Hook and P. R. McAree, "Using Sturm Sequences To Bracket Real Roots of Polynomial Equations" in Graphic Gems I (A. Glassner ed.), Academic Press, pp.&nbsp;416&ndash;422, 1990.
* D.G. Hook and P. R. McAree, "Using Sturm Sequences To Bracket Real Roots of Polynomial Equations" in Graphic Gems I (A. Glassner ed.), Academic Press, pp.&nbsp;416&ndash;422, 1990.


==External links==
* [https://webdocs.cs.ualberta.ca/~graphics/books/GraphicsGems/gems/Sturm/ C code] from Graphic Gems by D.G. Hook and P. R. McAree.


{{DEFAULTSORT:Sturm's Theorem}}
{{DEFAULTSORT:Sturm's Theorem}}
Line 230: Line 222:
[[Category:Polynomials]]
[[Category:Polynomials]]
[[Category:Computer algebra]]
[[Category:Computer algebra]]
[[Category:Real algebraic geometry]]

Revision as of 11:15, 17 November 2018

In mathematics, the Sturm sequence of a univariate polynomial p is a sequence of polynomials associated with p and its derivative by a variant of Euclid's algorithm for polynomials. Sturm's theorem expresses the number of distinct real roots of p located in an interval in terms of the number of changes of signs of the values of the Sturm sequence at the bounds of the interval. Applied to the interval of all the real numbers, it gives the total number of real roots of p.[1]

Whereas the fundamental theorem of algebra readily yields the overall number of complex roots, counted with multiplicity, it does not provide a procedure for calculating them. Sturm's theorem counts the number of distinct real roots and locates them in intervals. By subdividing the intervals containing some roots, it can isolate the roots into arbitrary small intervals, each containing exactly one root. This yields an arbitrary-precision numeric root-finding algorithm for univariate polynomials.

For computing over the reals, Sturm's theorem is less efficient than other methods based on Descartes' rule of signs. However, it works on every real closed field, and, therefore, remains fundamental for the theoretical study of the computational complexity of decidability and quantifier elimination in the first order theory of real numbers.

The Sturm sequence and Sturm's theorem are named after Jacques Charles François Sturm.

The theorem

The Sturm chain or Sturm sequence of a univariate polynomial P(x) with real coefficients is the sequence of polynomials such that

for i ≥ 1, where P' is the derivative of P, and is the remainder of the Euclidean division of by The length of the Sturm sequence is at most the degree of P.

The number of sign variations at ξ of the Sturm sequence of P is the number of sign changes–ignoring zeros—in the sequence of real numbers

This number of sign variations is denoted here V(ξ).

Sturm's theorem states that, if P is a square-free polynomial, the number of distinct real roots of P in the half-open interval (a, b] is V(a) − V(b) (here, a and b are real numbers such that a < b).[1]

The theorem extends to unbounded intervals by defining the sign at +∞ of a polynomial as the sign of its leading coefficient (that is, the coefficient of the term of highest degree). At –∞ the sign of a polynomial is the sign of its leading coefficient for a polynomial of even degree, and the opposite sign for a polynomial of odd degree.

In the case of a non-square-free polynomial, if neither a nor b is a multiple root of p, then V(a) − V(b) is the number of distinct real roots of P.

The proof of the theorem is as follows: when the value of x increases from a to b, it may pass through a zero of some (i > 0); when this occurs, the number of sign variations of does not changes. When x passes through a root of the number of sign variations of decreases from 1 to 0. These are the only values of x where some sign may change.

Example

Suppose we wish to find the number of roots in some range for the polynomial . So

The remainder of the Euclidean division of p0 by p1 is multiplying it by −1 we obtain

.

Next dividing p1 by p2 and multiplying the remainder by −1, we obtain

.

Now dividing p2 by p3 and multiplying the remainder by −1, we obtain

.

As this is a constant, this finishes the computation of the Sturm sequence.

To find the number of real roots between of one has to evaluate the sequences of the signs of these polynomials at −∞ and , which are respectively (+, −, +, +, −) and (+, +, +, −, −). Thus

which shows that p has two real roots.

This can be verified by noting that p(x) can be factored as (x2 − 1)(x2 + x + 1), where the first factor has the roots −1 and 1, and second factor has no real roots. This last assertion results from the quadratic formula, and also from Sturm's theorem, which gives the sign sequences (+, –, –) at −∞ and (+, +, –) at +∞.

Generalization

Sturm sequences have been generalized in two directions. To define each polynomial in the sequence, Sturm used the negative of the remainder of the Euclidean division of the two preceding ones. The theorem remains true if one replaces the negative of the remainder by its product or quotient by a positive constant or the square of a polynomial. It is also useful (see below) to consider sequences where the second polynomial is not the derivative of the first one.

A generalized Sturm sequence is a finite sequence of polynomials with real coefficients

such that

  • the degrees are decreasing after the first one: for i = 2, ..., m;
  • does not have any real root or does not changes of sign near its real roots.
  • if Pi(ξ) = 0 for 0 < i < m and ξ a real number, then Pi −1 (ξ) Pi + 1(ξ) < 0.

The last condition implies that two consecutive polynomials do not have any common real root. In particular the original Sturm sequence is a generalized Sturm sequence, if (and only if) the polynomial has no multiple real root (if a polynomial has a multiple root, the two first polynomials of its Sturm sequence have a common toot).

When computing the original Sturm sequence by Euclidean division, it may happen that one encounters a polynomial that has a factor that is never negative, such a or . In this case, if one continues the computation with the polynomial replaced by its quotient by the nonnegative factor, one gets a generalized Sturm sequence, which may also be used for computing the number of real roots, since the proof of Sturm's theorem still applies (because of the third condition). This may sometimes simplify the computation, although it is generally difficult to find such nonnegative factors, except for even powers of x.

Use of pseudo-remainder sequences

In computer algebra, the polynomials that are considered have integer coefficients or may be transformed to have integer coefficients. The Sturm sequence of a polynomial with integer coefficients generally contains polynomials whose coefficients are not integers (see above example).

To avoid computation with rational numbers, a common method is to replace Euclidean division by pseudo-division for computing polynomial greatest common divisors. This amounts to replacing the remainder sequence of the Euclidean algorithm by a pseudo-remainder sequence, a pseudo remainder sequence being a sequence of polynomials such that there are constants and such that is the remainder of the Euclidean division of by For example, the remainder sequence of the Euclidean algorithm is a pseudo-remainder sequence with for every i, and the Sturm sequence of a polynomial is a pseudo-remainder sequence with and for every i.

Various pseudo-remainder sequences have been designed for computing greatest common divisors of polynomials with integer coefficients without introducing denominators (see Pseudo-remainder sequence). They can all be made generalized Sturm sequences by choosing the sign of the to be the opposite of the sign of the This allows the use of Sturm's theorem with pseudo-remainder sequences.

Root isolation

For a polynomial with real coefficients, root isolation consists of finding, for each real root, an interval that contains this root, and no other roots.

This is useful for root finding, allowing the selection of the root to be found and providing a good starting point for fast numerical algorithms such as Newton's method; it is also useful for certifying the result, as if Newton's method converge outside the interval one may immediately deduce that it converges to the wrong root.

Root isolation is also useful for computing with algebraic numbers. For computing with algebraic numbers, a common method is to represent them as a pair of a polynomial to which the algebraic number is a root, and an isolation interval. For example may be unambiguously represented by

Sturm's theorem provides a way for isolating real roots that is less efficient (for polynomials with integer coefficients) than other methods involving Descartes' rule of signs. However, it remains useful in some circumstances, mainly for theoretical purposes, for example for algorithms of real algebraic geometry that involve infinitesimals.

For isolating the real roots, one starts from an interval containing all the real roots, or the roots of interest (often, typically in physical problems, only positive roots are interesting), and one computes and For defining this starting interval, one may use bounds on the size of the roots (see Properties of polynomial roots § Bounds on (complex) polynomial roots). Then, one divides this interval in two, by choosing c in the middle of The computation of provides the number of real roots in and and one may repeat the same operation on each subinterval. When one encounters, during this process an interval that does not contain any root, it may be suppressed from the list of intervals to consider. When one encounters an interval containing exactly one root, one may stop dividing it, as it is an isolation interval. The process stops eventually, when only isolating intervals remain.

This isolating process may be used with any method for computing the number of real roots in an interval. Theoretical complexity analysis and practical experiences show that methods based on Descartes' rule of signs are more efficient. It follows that, nowadays, Sturm sequences are rarely used for root isolation.

Application

Generalized Sturm sequences allow counting the roots of a polynomial where another polynomial is positive (or negative), without computing these root explicitly. If one knows an isolating interval for a root of the first polynomial, this allows also finding the sign of the second polynomial at this particular root of the first polynomial, without computing a better approximation of the root.

Let P(x) and Q(x) be two polynomials with real coefficients such that P and Q have no common root and P has no multiple roots. In other words, P and P'Q are coprime polynomials. This restriction does not really affect the generality of what follows as GCD computations allows reducing the general case to this case, and the cost of the computation of a Sturm sequence is the same as that of a GCD.

Let W(a) denote the number of sign variations at a of a generalized Sturm sequence starting from P and P'Q. If a < b are two real numbers, then W(a) – W(b) is the number of roots of P in the interval such that Q(a) > 0 minus the number of roots in the same interval such that Q(a) < 0. Combined with the total number of roots of P in the same interval given by Sturm's theorem, this gives the number of roots of P such that Q(a) > 0 and the number of roots of P such that Q(a) < 0.[1]

See also

References

  • Basu, Saugata; Pollack, Richard; Roy, Marie-Françoise (2006). "Section 2.2.2". Algorithms in real algebraic geometry (PDF) (2nd ed.). Springer. pp. 52–57. ISBN 978-3-540-33098-1.
  • Sturm, Jacques Charles François (1829). "Mémoire sur la résolution des équations numériques". Bulletin des Sciences de Férussac. 11: 419–425.
  • Sylvester, J. J. (1853). "On a theory of the syzygetic relations of two rational integral functions, comprising an application to the theory of Sturm's functions, and that of the greatest algebraical common measure". Phil. Trans. R. Soc. Lond. 143: 407–548. doi:10.1098/rstl.1853.0018. JSTOR 108572.
  • Thomas, Joseph Miller (1941). "Sturm's theorem for multiple roots". National Mathematics Magazine. 15 (8): 391–394. JSTOR 3028551. MR 0005945.
  • Heindel, Lee E. (1971), "Integer arithmetic algorithms for polynomial real zero determination", Proc. SYMSAC '71: 415, doi:10.1145/800204.806312, MR 0300434
  • Panton, Don B.; Verdini, William A. (1981). "A fortran program for applying Sturm's theorem in counting internal rates of return". J. Financ. Quant. Anal. 16 (3): 381–388. JSTOR 2330245.
  • Akritas, Alkiviadis G. (1982). "Reflections on a pair of theorems by Budan and Fourier". Math. Mag. 55 (5): 292–298. doi:10.2307/2690097. JSTOR 2690097. MR 0678195.
  • Petersen, Paul (1991). "Multivariate Sturm theory". Lecture Notes in Comp. Science. 539: 318–332. doi:10.1007/3-540-54522-0_120. MR 1229329.
  • Yap, Chee (2000). Fundamental Problems in Algorithmic Algebra. Oxford University Press. ISBN 0-19-512516-9.
  • Rahman, Q. I.; Schmeisser, G. (2002). Analytic theory of polynomials. London Mathematical Society Monographs. New Series. Vol. 26. Oxford: Oxford University Press. ISBN 0-19-853493-0. Zbl 1072.30006.
  • Baumol, William. Economic Dynamics, chapter 12, Section 3, "Qualitative information on real roots"
  • D.G. Hook and P. R. McAree, "Using Sturm Sequences To Bracket Real Roots of Polynomial Equations" in Graphic Gems I (A. Glassner ed.), Academic Press, pp. 416–422, 1990.