Estrin's scheme

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Njuffa (talk | contribs) at 20:10, 27 April 2013 (Add cross reference to existing entry for Gerald Estrin, who first published this scheme). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In numerical analysis, Estrin's scheme (after Gerald Estrin), also known as Estrin's method, is an algorithm for numerical evaluation of polynomials.

The Horner scheme for evaluation of polynomials is one of the most commonly used algorithms for this purpose and unlike Estrin's scheme it is optimal in the sense that it minimizes the number of multiplications and addition required to evaluate an arbitrary polynomial. On a modern processor architecture that allows out-of-order execution, instructions that do not depend on each other's results may run in parallel. The Horner scheme contains a series of multiplications and additions that depend on the previous instruction and so cannot execute in parallel. Estrin's scheme is one method that attempts to overcome this serialization while still being reasonably close to optimal.

Description of the algorithm

Given an arbitrary polynomial Pn(x)= C0 + C1x + C2x2 + C3x3 + ... + Cnxn one can isolate sub-expressions of the form (A + Bx) and of the form x2n.

Rewritten using Estrin's scheme we get Pn(x) = (C0 + C1x) + (C2 + C3x) x2 + ((C4 + C5x) + (C6 + C7x) x2))x4 + ...

x2n can be evaluated once and kept until no longer required. As is evident from looking at this expression there are many sub-expression that may be evaluated in parallel.

The sub-expressions of form (ABx) can be evaluated using a native multiply–accumulate instruction on some architectures, an advantage that is shared with the Horner scheme.

Examples

Take Pn(x) to mean the nth order polynomial of the form: Pn(x) = C0 + C1x + C2x2 + C3x3 + Cnxn

Written with Estrin's scheme we have:

P3(x) = (C0 +C1x) + (C2 +C3x) x2
P4(x) = (C0 +C1x) + (C2 +C3x) x2 + C4x4
P5(x) = (C0 +C1x) + (C2 +C3x) x2 + (C4 +C5x) x4
P6(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + C6x2)x4
P7(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + (C6 +C7x) x2)x4
P8(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + (C6 +C7x) x2)x4 +C8x8
P9(x) = (C0 +C1x) + (C2 +C3x) x2 + ((C4 +C5x) + (C6 +C7x) x2)x4 + (C8 +C9x) x8
...

References

  • Jean-Michel Muller, Elementary Functions: Algorithms And Implementation, 2nd edition, Springer Verlag, page 58.
  • G. Estrin, Organization of computer systems - The fixed plus variable structure computer, in Proc. Western Joint Comput. Conf., May 1960, pp. 33-40.