E-graph
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let range over the function symbols of the term language, and let range over the terms. Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. Then an e-node is an n-ary function symbol together with e-class IDs. An e-class is a set of e-nodes. An e-graph contains a union-find structure over e-class IDs with standard operations , , and .
An e-class ID is canonical if . An e-node (with ) is canonical if each is canonical ( in ).
An e-graph is the combination of:[1]
- the union-find structure ,
- a hashcons (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map that maps e-class IDs to e-classes, such that maps equivalent IDs to the same set of e-nodes:
Invariants
In addition to the above structure, a valid e-graph conforms to several data structure invariants.[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes are congruent when . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
Operations
![]() | This section needs expansion. You can help by adding to it. (June 2021) |
E-graphs expose wrappers around the , , and operations from the union-find that preserve the e-graph invariants. Additionally, the e-matching operation maps "patterns" (terms with variables) to tuples of substitutions (from patterns to e-class IDs) and e-class IDs such that the returned e-classes contain nodes that "match" the pattern after applying the substitution.
Equality saturation
![]() | This section needs expansion. You can help by adding to it. (June 2021) |
Equality saturation is a technique for building optimizing compilers using e-graphs.[3] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3[4] and CVC4.
Equality saturation is used in specialized optimizing compilers,[5] e.g. for deep learning[6] and linear algebra.[7] Equality saturation has also been used for translation validation applied to the LLVM toolchain.[8]
References
- ^ (Willsey 2021)
- ^ (Willsey 2021)
- ^ (Tate 2009)
- ^ de Moura, Leonardo; Bjørner, Nikolaj (2008). Ramakrishnan, C. R.; Rehof, Jakob (eds.). "Z3: An Efficient SMT Solver". Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. 4963. Berlin, Heidelberg: Springer: 337–340. doi:10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3.
- ^ Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566. ISSN 0362-1340.
- ^ Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv:2101.01332 [cs.AI].
- ^ Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv:2002.07951 [cs.DB].
- ^ Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). "Equality-Based Translation Validator for LLVM". Computer Aided Verification. Lecture Notes in Computer Science. 6806. Berlin, Heidelberg: Springer: 737–742. doi:10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1.
- de Moura, Leonardo; Bjørner, Nikolaj (2007). Pfenning, Frank (ed.). "Efficient E-Matching for SMT Solvers". Automated Deduction – CADE-21. Lecture Notes in Computer Science. 4603. Berlin, Heidelberg: Springer: 183–198. doi:10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3.
- Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv:2004.03082. doi:10.1145/3434304. S2CID 226282597.
- Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation: a new approach to optimization". Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery: 264–276. doi:10.1145/1480881.1480915. ISBN 978-1-60558-379-2. S2CID 2138086.