The Horner scheme for evaluation of polynomials is one of the most commonly used algorithms for this purpose and unlike Estrin's scheme it is optimal in the sense that it minimizes the number of multiplications and addition required to evaluate an arbitrary polynomial. On a modern processor architecture that allows out-of-order execution, instructions that do not depend on each other's results may run in parallel. The Horner scheme contains a series of multiplications and additions that depend on the previous instruction and so cannot execute in parallel. Estrin's scheme is one method that attempts to overcome this serialization while still being reasonably close to optimal.
Description of the algorithm
Given an arbitrary polynomial Pn(x)= C0 + C1x + C2x2 + C3x3 + ... + Cnxn one can isolate sub-expressions of the form (A + Bx) and of the form x2n.
Rewritten using Estrin's scheme we get Pn(x) = (C0 + C1x) + (C2 + C3x) x2 + ((C4 + C5x) + (C6 + C7x) x2))x4 + ...
x2n can be evaluated once and kept until no longer required. As is evident from looking at this expression there are many sub-expression that may be evaluated in parallel.
Take Pn(x) to mean the nth order polynomial of the form: Pn(x) = C0 + C1x + C2x2 + C3x3 + Cnxn
Written with Estrin's scheme we have:
- P3(x) = (C0 +C1x) + (C2 +C3x) x2
- P4(x) = (C0 +C1x) + (C2 +C3x) x2 + C4x4
- P5(x) = (C0 +C1x) + (C2 +C3x) x2 + (C4 +C5x) x4
- P6(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + C6x2)x4
- P7(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + (C6 +C7x) x2)x4
- P8(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + (C6 +C7x) x2)x4 +C8x8
- P9(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + (C6 +C7x) x2)x4 + (C8 +C9x) x8
- Jean-Michel Muller, Elementary Functions: Algorithms And Implementation, 2nd edition, Springer Verlag, page 58.
- G. Estrin, Organization of computer systems - The fixed plus variable structure computer, in Proc. Western Joint Comput. Conf., May 1960, pp. 33-40.