Frameworks supporting the polyhedral model

From Wikipedia, the free encyclopedia

Use of the polyhedral model (also called the polytope model) within a compiler requires software to represent the objects of this framework (sets of integer-valued points in regions of various spaces) and perform operations upon them (e.g., testing whether the set is empty).

For more detail about the objects and operations in this model, and an example relating the model to the programs being compiled, see the polyhedral model page.

There are many frameworks supporting the polyhedral model. Some of these frameworks use one or more libraries for performing polyhedral operations. Others, notably Omega, combine everything in a single package. Some commonly used libraries are the Omega Library[1] (and a more recent fork),[2] piplib,[3][4] PolyLib,[5][6] PPL,[7] isl,[8] the Cloog polyhedral code generator,[9][10] and the barvinok library for counting integer solutions.[11] Of these libraries, PolyLib and PPL focus mostly on rational values, while the other libraries focus on integer values. The polyhedral framework of gcc is called Graphite.[12] Polly[13] provides polyhedral optimizations for LLVM, and R-Stream[14] has had a polyhedral mapper since ca. 2006.

Common strengths[edit]

Polyhedral frameworks are designed to support compilers techniques for analysis and transformation of codes with nested loops, producing exact results for loop nests with affine loop bounds and subscripts ("Static Control Parts" of programs). They can be used to represent and reason about executions (iterations) of statements, rather than treating a statement as a single object representing properties of all executions of that statement. Polyhedral frameworks typically also allow the use of symbolic expressions.

Polyhedral frameworks can be used for dependence analysis for arrays, including both traditional alias analysis and more advanced techniques such as the analysis of data flow in arrays or identification of conditional dependencies. They can also be used to represent code transformation, and provide features to generate the transformed code in a high-level language. The transformation and generation systems can typically handle imperfectly nested loops.

An example to contrast polyhedral frameworks with prior work[edit]

To compare the constraint-based polyhedral model to prior approaches such as individual loop transformations and the unimodular approach, consider the question of whether we can parallelize (execute simultaneously) the iterations of following contrived but simple loop:

for i := 0 to N do
      A(i) := (A(i) + A(N-i))/2

Approaches that cannot represent symbolic terms (such as the loop-invariant quantity N in the loop bound and subscript) cannot reason about dependencies in this loop. They will either conservatively refuse to run it in parallel, or in some cases speculatively run it completely in parallel, determine that this was invalid, and re-execute it sequentially.

Approaches that handle symbolic terms but represent dependencies via direction vectors or distance vectors will determine that the i loop carries a dependence (of unknown distance), since for example when N=10 iteration 0 of the loop writes an array element (A(0)) that will be read in iteration 10 (as A(10-10)) and reads an array element (A(10-0)) that will later be overwritten in iteration 10 (as A(10)). If all we know is that the i loop carries a dependence, we once again cannot safely run it in parallel.

In reality, there are only dependencies from the first N/2 iterations into the last N/2, so we can execute this loop as a sequence of two fully parallel loops (from 0...N/2 and from N/2+1...N). The characterization of this dependence, the analysis of parallelism, and the transformation of the code can be done in terms of the instance-wise information provided by any polyhedral framework.

Instance-wise analysis and transformation allows the polyhedral model to unify additional transformations (such as index set splitting, loop peeling, tiling, loop fusion or fission, and transformation of imperfectly nested loops) with those already unified by the unimodular framework (such as loop interchange, skewing, and reversal of perfectly nested loops). It has also stimulated the development of new transformations, such as Pugh and Rosser's iteration-space slicing (an instance-wise version of program slicing; note that the code was never released with the Omega Library).

A more interesting example[edit]

Authors of polyhedral frameworks have explored the simple 1-dimensional finite difference heat equation stencil calculation expressed by the following pseudocode:

for t := 0 to T do
    for i := 1 to N-1 do
        new(i) := (A(i-1) + A(i) + A(i) + A(i+1)) * .25   // explicit forward-difference with R = 0.25
    end
    for i := 1 to N-1 do
        A(i) := new(i)
    end
end

This code confounds many of the transformation systems of the 20th century, due to the need to optimize an imperfect loop nest. Polyhedral frameworks can analyze the flow of information among different executions of statements in the loop nest, and transform this code to simultaneously exploit scalable parallelism and scalable locality.

A re-cap here, of the two approaches on this example, might be nice, but for now see the individual papers of Wonnacott,[15][16] and Sadayappan et al.[17] as well as others who have studied this code using different frameworks, such as Song and Li.[18]

Differences in presentation or vocabulary[edit]

Comparison of work using different frameworks is complicated by both technical differences (discussed later) and differences in vocabulary and presentation. Examples are provided below to aid in translation:

Classification of dependences[edit]

Polyhedral Frameworks support dependence analysis in a variety of ways, helping to capture the impact of symbolic terms, identify conditional dependences, and separating out the effects of memory aliasing. The effects of memory aliasing, in particular, have been described in two ways: many authors distinguish between "true" data dependences (corresponding to actual flow of information) from false dependences arising from memory aliasing or limited precision of dependence analysis.

The Omega Project publications use specific terms to identify specific effects on analysis. They maintain the traditional distinction of flow-, output-, and anti-dependences, based on the types of array access (write to read, write to write, or read to write, respectively). Dependences can independently be classified as memory-based or value-based --- the former corresponds to memory aliasing, and the latter does not include dependences interrupted by intervening writes. A dependence test may produce information that is exact or approximate, depending on the nature of the program being analyzed and the algorithms used in the test. Finally, the results of dependence analysis will be reported in a dependence abstraction that provides a certain degree of precision.

For example, the "dependence relations" produced by the Omega Test, and the "quasts" produced by the algorithms of Feautrier or Maydan and Lam, contain precise information (though in different forms) about the loop iterations involved in a dependence. The results of either test can be converted into the more traditional "dependence vector" form, but since this abstraction provides less precision, much of the information about the dependence will be lost. Both techniques produce exact information for programs with affine control and subscript expressions, and must approximate for many programs outside this domain (i.e., in the presence of non-affine subscripts such as index arrays). The original work of Feautrier focused on describing true dependences, which would be referred to as exact value-based flow dependences by the Omega Project. The Omega Project also described the use of their algorithms for value-based output- and anti-dependences, though Feautrier's quasts could presumably be easily adapted to this as well.

Visualization of transformations and tiling[edit]

There are many ways to produce a visual depiction of the process of transforming and tiling an iteration space. Some authors depict transformations by changing the location of points on the page, essentially aligning the picture with the coordinate axes of the transformed space; in such diagrams, tiles appear as axis-aligned rectangles/rectangular solids containing iterations. Examples of this approach can be found in the publications and transformation-visualization software of Michelle Mills Strout.[19]

Other authors depict different transformations as different wavefronts of execution that move across the points of the original coordinate system at different angles. In such diagrams, tiles appear as parallelograms/parallelepipeds. Examples of this approach can be found in the time-skewing publications of David G. Wonnacott.[20]

Differences in approach or implementation status[edit]

Some of the libraries have seen more extensive development than the Omega Library in the early 2000s, and in many places has much more sophisticated algorithms. In particular, users have reported good results with the Cloog code generator (both in terms of the code generated, and in terms of ability to control trade-offs when generating code), and with the algorithms for counting integer solutions (Alexander Barvinok's[21] work requires a vertex description of the polytope, which is not supported in the Omega Library).

There are several other points on which the frameworks differ, specifically:

Precision and speed[edit]

Integer programming is NP-complete, and Maydan showed that the problem of checking array aliasing in nested loops with affine bounds and subscripts is equivalent to integer programming; other operations, such as array dataflow analysis, are even more complex (the algorithms of the Omega Library handle the full language of Presburger Arithmetic, which is O(2^2^2^n)). Thus, it is clearly unrealistic to expect exact fast results for arbitrary problems of array aliasing or array data flow, even over the affine domain. Fortunately, many problems fall into a subset of this domain where general algorithms can produce an exact answer in polynomial time.[22][23]

Outside of this domain, the Omega Library, piplib and isl emphasize the production of an exact result (except in the cases of certain uses of uninterpreted function symbols in Omega), despite the high complexity. In some cases, such as variable elimination ("projection"), PolyLib and PPL primarily use algorithms for the rational domain, and thus produce an approximation of the result for integer variables. It may be the case that this reduces the common experience with the Omega Library in which a minor change to one coefficient can cause a dramatic shift in the response of the library's algorithms.

Polylib has some operations to produce exact results for Z-polyhedra (integer points bounded by polyhedra), but at the time of this writing, significant bugs have been reported.[24] Note that bugs also exist in the Omega Library, including reliance on hardware-supplied integer types and cases of the full Presburger Arithmetic algorithms that were not implemented in the library. Users who need exact results for integer variables may need to be wary with either library.

Barvinok's techniques for counting integer solutions require a description of the vertices (and bounding rays) of the polyhedron, but produce an exact answer in a way that can be far more efficient than the techniques described by Pugh. Barvinok's algorithm is always polynomial in the input size, for fixed dimension of the polytope and fixed degree of weights, whereas the "splintering" in Pugh's algorithm can grow with the coefficient values[25] (and thus exponentially in terms of input size, despite fixed dimension, unless there is some limit on coefficient sizes).

Vertex enumeration[edit]

Polyhedral libraries such as PolyLib and PPL exploit the double description of polyhedra and therefore naturally support vertex enumeration on (non-parametric) polytopes. The Omega Library internally performs vertex enumeration during the computation of the convex hull. PolyLib and isl provide vertex enumeration on parametric polytopes, which is essential for applying Barvinok's algorithm to parametric polytopes.

Indication of an approximate result[edit]

In some parts of a compiler, an approximate result is acceptable in certain cases. For example, when dependence analysis is used to guide loop transformation, it is generally acceptable to use a superset of the true dependencies—this can prevent an optimization but does not allow illegal code transformations. When the Omega Library produces an approximate answer, the answer is appropriately marked as an upper bound (e.g., via "and UNKNOWN") or a lower bound (e.g., via "or UNKNOWN"). Answers not marked this way are exact descriptions of sets of integer-valued points (except in cases of bugs in the software).

Handling nonlinear terms[edit]

When code contains a mixture of affine and non-affine terms, polyhedral libraries can, in principle, be used to produce approximate results, for example by simply omitting such terms when it is safe to do so. In addition to providing a way to flag such approximate results, the Omega Library allows restricted uses of "Uninterpreted Function Symbols" to stand in for any nonlinear term, providing a system that slightly improves the result of dependence analysis and (probably more significantly) provides a language for communication about these terms (to drive other analysis or communication with the programmer). Pugh and Wonnacott discussed a slightly less restricted domain than that allowed in the library, but this was never implemented (a description exists in Wonnacott's dissertation).

Transitive closure operation[edit]

Some kinds of analysis, such as Pugh and Rosser's iteration space slicing, can be most easily stated in terms of the transitive closure of the dependence information. Both the Omega Library and isl provide a transitive closure operation that is exact for many cases that arise in programs with simple dependence patterns. In other cases, the Omega Library produces a subset of the transitive closure, while isl produces a superset. In the case of the Omega Library, the subset itself may be approximate, resulting in an upper bound (tagged) of a lower bound (not tagged) of the transitive closure. Note that the computation of an exact transitive closure is undecidable.[26]

See also[edit]

  • Loop nest optimization
  • Jean-Francois Collard's Reasoning About Program Transformations,[27] covers some of the shared philosophy of these projects.
  • Cédric Bastoul's thesis[28] gives an introduction to the polyhedral model.
  • The "Omega Test" entry in Springer'sforthcoming Encyclopedia of Parallel Computing[29] describes the applications and algorithms of the Omega Library, indicating the major Omega Project publications where further detail can be found. An earlier draft of this content can be found in tech report form as a Haverford College Computer Science Tech Report.[30]
  • Links to relevant open-source libraries are given in the first paragraph of this article.
  • Reservoir Labs[31] provides "Jolylib", a Java implementation of Polylib etc. that "provides improved performance, stability, and features". Jolylib is available for commercial and academic use.

References[edit]

  1. ^ "Frameworks and Algorithms for the Analysis and Transformation of Scientific Programs". Cs.umd.edu. Retrieved 2012-08-20.
  2. ^ "Tools". Compiler Technology to Optimize Performance. University of Utah. Retrieved 2012-08-20.
  3. ^ Cédric Bastoul. "www.PipLib.org the Parametric Integer Programming home". www.piplib.org. Retrieved 2014-06-04.
  4. ^ Paul Feautrier. Parametric Integer Programming. 1988
  5. ^ "Polylib". Icps.u-strasbg.fr. Retrieved 2012-08-20.
  6. ^ Wilde, Doran K. (1993). "A Library for Doing Polyhedral Operations". tech report. Ftp.irisia.fr.
  7. ^ "PPL". Bugseng. Retrieved 2012-08-20.
  8. ^ "isl – Freecode". Freshmeat.net. Retrieved 2012-08-20.
  9. ^ Cédric Bastoul. "www.CLooG.org the Chunky Loop Generator home". www.cloog.org. Retrieved 2014-06-04.
  10. ^ Cedric Bastoul. Code Generation in the Polyhedral Model Is Easier Than You Think. PACT'13 IEEE International Conference on Parallel Architecture and Compilation Techniques (2004)
  11. ^ gvy (2007-04-28). "barvinok – Freecode". Freshmeat.net. Retrieved 2012-08-20.
  12. ^ Sebastian Pop, Albert Cohen, Cedric Bastoul, Sylvain Girbal, Pierre Jouvelot, Georges-André Silber et Nicolas Vasilache. Graphite: Loop optimizations based on the polyhedral model for GCC. 4th GCC Developer's Summit. Ottawa, Canada, June 2006.
  13. ^ "Polly - Polyhedral optimizations for LLVM". Polly.llvm.org. Retrieved 2014-06-04.
  14. ^ Benoit Meister, Nicolas Vasilache, David Wohlford, Muthu Baskaran, Allen Leung and Richard Lethin. R-Stream Compiler. In Encyclopedia of Parallel Computing, David Padua Ed., pp 1756-1765, Springer, 2011.
  15. ^ David Wonnacott. Achieving Scalable Locality with Time Skewing. International Journal of Parallel Programming 30.3 (2002)
  16. ^ Wonnacott, D. (2000). "Using time skewing to eliminate idle time due to memory bandwidth and network limitations". Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000. pp. 171–180. doi:10.1109/IPDPS.2000.845979. ISBN 0-7695-0574-0. S2CID 9949169.
  17. ^ Uday Bondhugula, Muthu Manikandan Baskaran, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model. CC 2008 - International Conference on Compiler Construction
  18. ^ Yonghong Song, Zhiyuan Li. New Tiling Techniques to Improve Cache Temporal Locality. Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
  19. ^ "Michelle Mills Strout". Cs.colostate.edu. Retrieved 2012-08-20.
  20. ^ "David G. Wonnacott". Cs.haverford.edu. Retrieved 2012-08-20.
  21. ^ "Alexander Barvinok". Math.lsa.umich.edu. 2012-06-16. Retrieved 2012-08-20.
  22. ^ Pugh, William (1991). "The Omega test: A fast and practical integer programming algorithm for dependence analysis". Proceedings of the 1991 ACM/IEEE conference on Supercomputing - Supercomputing '91. pp. 4–13. doi:10.1145/125826.125848. ISBN 0897914597. S2CID 3174094.
  23. ^ Seater, Robert; Wonnacott, David (2003). "Polynomial Time Array Dataflow Analysis". Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science. Vol. 2624. pp. 411–426. doi:10.1007/3-540-35767-X_27. ISBN 978-3-540-04029-3.
  24. ^ "Frameworks supporting the polyhedral model". lipforge.ens-lyon.fr.[permanent dead link]
  25. ^ Verdoolaege, Sven; Seghir, Rachid; Beyls, Kristof; Loechner, Vincent; Bruynooghe, Maurice. "Counting Integer Points in Parametric Polytopes using Barvinok's Rational Functions]. Section 6.1 discusses Pugh's method and splintering" (PDF). Lirias.kuleuven.be.
  26. ^ Wayne Kelly, William Pugh, Evan Rosser, Tatiana Shpeisman. Transitive Closure of Infinite Graphs and Its Applications. Languages and Compilers for Parallel Computing, 8th International Workshop (LCPC 1995)
  27. ^ Jean-Francois Collard, Reasoning About Program Transformations,, 2003 Springer-Verlag
  28. ^ Bastoul, Cedric. Improving Data Locality in Static Control Programs (PDF). icps.u-strasbg.fr (Thesis).
  29. ^ "Encyclopedia of Parallel Computing". Springer.com. Retrieved 2012-08-20.
  30. ^ Wonnacott, David G. "A Retrospective of the Omega Project" (PDF). Haverford Computer Science Tech Report 2010-01. Haverford College.
  31. ^ "Reservoir Labs, Inc". Reservoir.com. Retrieved 2014-06-04.