Higher-order abstract syntax

In computer science, higher-order abstract syntax (abbreviated HOAS) is a technique for the representation of abstract syntax trees for languages with variable binders.

Relation to first-order abstract syntax

An abstract syntax tree is abstract because it is a mathematical object that has certain structure by its very nature. For instance, in first-order abstract syntax (FOAS) trees, as commonly used in compilers, the tree structure implies the subexpression relation, meaning that no parentheses are required to disambiguate programs (as they are in the concrete syntax). HOAS exposes additional structure: the relationship between variables and their binding sites. In FOAS representations, a variable is typically represented with an identifier, with the relation between binding site and use being indicated by using the same identifier. With HOAS, there is no name for the variable; each use of the variable refers directly to the binding site.

There are a number of reasons why this technique is useful. First, it makes the binding structure of a program explicit: just as there is no need to explain operator precedence in a FOAS representation, there is no need to have the rules of binding and scope at hand to interpret a HOAS representation. Second, programs that are alpha-equivalent (differing only in the names of bound variables) have identical representations in HOAS, which can make equivalence checking more efficient.

Implementation

One mathematical object that could be used to implement HOAS is a graph where variables are associated with their binding sites via edges. Another popular way to implement HOAS (in, for example, compilers) is with de Bruijn indices.

Use in logical frameworks

In the domain of logical frameworks, the term higher-order abstract syntax is usually used to refer to a specific representation that uses the binders of the meta-language to encode the binding structure of the object language.

For instance, the logical framework LF has a λ-construct, which has arrow (→) type. A first-order encoding of an object language construct let would be (using Twelf syntax):

exp : type.
var : type.
v : var -> exp.
let : exp -> var -> exp -> exp.


Here, exp is the family of object language expressions. The family var is the representation of variables (implemented perhaps as natural numbers, which is not shown); the constant v witnesses the fact that variables are expressions. The constant let is an expression that takes three arguments: an expression (that is being bound), a variable (that it is bound to) and another expression (that the variable is bound within).

The canonical HOAS representation of the same object language would be:

exp : type.
let : exp -> (exp -> exp) -> exp.


In this representation, object level variables do not appear explicitly. The constant let takes an expression (that is being bound) and a meta-level function expexp (the body of the let). This function is the higher-order part: an expression with a free variable is represented as an expression with holes that are filled in by the meta-level function when applied. As a concrete example, we would construct the object level expression

let x = 1 + 2
in x + 3


(assuming the natural constructors for numbers and addition) using the HOAS signature above as

let (plus 1 2) ([y] plus y 3)


where [y] e is Twelf's syntax for the function ${\displaystyle \lambda y.e}$.

This specific representation has advantages beyond the ones above: for one, by reusing the meta-level notion of binding, the encoding enjoys properties such as type-preserving substitution without the need to define/prove them. In this way using HOAS can drastically reduce the amount of boilerplate code having to do with binding in an encoding.

Higher-order abstract syntax is generally only applicable when object language variables can be understood as variables in the mathematical sense (that is, as stand-ins for arbitrary members of some domain). This is often, but not always, the case: for instance, there are no advantages to be gained from a HOAS encoding of dynamic scope as it appears in some dialects of Lisp because dynamically scoped variables do not act like mathematical variables.