# Program synthesis

Program synthesis is a special form of automatic programming that is most often paired with a technique for formal verification. The goal is to construct automatically a program that provably satisfies a given high-level specification. In contrast to other automatic programming techniques, the specifications are usually non-algorithmic statements of an appropriate logical calculus.[1]

## Origin

The idea originated in the 60s[citation needed] with the aim of using techniques from artificial intelligence to build an automatic programmer[citation needed], exploiting deep connections between mathematics and the theory of programming. Lack of early success meant that the mathematical approach soon fell out of favour, along with enthusiasm for AI, in general. Although some researchers[who?] still work on formal approaches, more success has been obtained by combining pure deductive techniques with powerful heuristics, and limiting their application to specific domains.[citation needed]

## Problems and Limitations

Some feel[who?] that the concept of automated program generation often results in poor "factoring" of information.[citation needed] Known redundancy should be[why?] factored out, not introduced, it is said[who?][citation needed]. However, sometimes[vague] specific programming languages are limited such that one has to introduce repetition of a concept or pattern in order to keep using the same language.[citation needed] Here is a simplified illustration of factoring:

Poor Factoring: x = a + a + a + a + a

Good Factoring: x = a * 5 (where the asterisk means "multiply")

Program generation tends to focus on automating the repetition seen in the first example, when a better approach is perhaps[vague] to find a higher-order abstraction, which is multiplication in this case. Other examples include putting parameters into a file or database instead of inside application code.

## The Framework of Manna and Waldinger

The framework of Manna and Waldinger [2] starts from a user-given first-order specification formula. For that formula, a proof is constructed, thereby also synthesizing a functional program from unifying substitutions. Proof rules include non-clausal resolution, logical transformations, splitting of conjunctive assertions and of disjunctive goals, and structural induction. Murray has shown these rules to be complete for first-order logic.[3] In 1986, Manna and Waldinger added generalized E-resolution and paramodulation rules to handle also equality;[4] later, these rules turned out to be incomplete (but nevertheless sound).[5]

The framework has been designed to enhance human readability of intermediate formulas: contrary to classical resolution, it does not require clausal normal form, but allows one to reason with formulas of arbitrary structure and containing any junctors. Programs obtained by this approach are guaranteed to satisfy the specification formula started from; in this sense the are correct by construction. Only a minimalistic, yet turing-complete functional programming language, consisting of conditional, recursion, and arithmetic and other operators is supported.

As a toy example, a functional program to compute the maximum $M$ of two numbers $x$ and $y$ can be derived as follows.

Example synthesis of maximum function
Nr Assertions Goals Program Origin
1 $A=A$ Axiom
2 $A \leq A$ Axiom
3 $A \leq B \lor B \leq A$ Axiom
10 $x \leq M \land y \leq M \land (x = M \lor y = M)$ M Specification
11 $(x \leq M \land y \leq M \land x = M) \lor (x \leq M \land y \leq M \land y = M)$ M Distr(10)
12 $x \leq M \land y \leq M \land x = M$ M Split(11)
13 $x \leq M \land y \leq M \land y = M$ M Split(11)
14 $x \leq x \land y \leq x$ x Resolve(12,1)
15 $y \leq x$ x Resolve(14,2)
16 $\lnot (x \leq y)$ x Resolve(15,3)
17 $x \leq y \land y \leq y$ y Resolve(13,1)
18 $x \leq y$ y Resolve(17,2)
19 $\text{true}$ x<y ? y : x Resolve(18,16)

Starting from the requirement description "The maximum is larger than any given number, and is one of the given numbers", the first-order formula $\forall X \forall Y \exists M: X \leq M \land Y \leq M \land (X=M \lor Y=M)$ is obtained as its formal translation. This formula is to be proved. By reverse Skolemization,[6] the specification in line 10 is obtained, an upper- and lower-case letter denoting a variable and a Skolem constant, respectively. After applying the distributive law in line 11, the proof goal is a disjunction, and hence can be split into two cases, viz. lines 12 and 13. Turning to the first case, resolving line 12 with the axiom in line 1 leads to instanciation of the program variable $M$ in line 14. Intuitively, the last conjunct of line 12 prescribes the value that $M$ must take in this case. Formally, the non-clausal resolution rule shown in line 57 below is applied to lines 12 and 1, with

• $p$ being the common instance $x=x$ of $A=A$ and $x=M$, obtained by syntactically unifying the latter formulas,
• $F[$$p$$]$ being $\text{true} \land$$x=x$, obtained from instantiated line 1 (appropriately padded to make the conext $F[\cdot]$ around $p$ visible), and
• $G[$$p$$]$ being $x \leq x \land y \leq x \land$$x = x$, obtained from instantiated line 12,

yielding $\lnot ($$\text{true} \land$$\text{false}$$) \land ($$x \leq x \land y \leq x \land$$\text{true}$$)$, which simplifies to $x \leq x \land y \leq x$. In a similar way, line 14 yields line 15 and then line 16 by resolution. Also, the second case, $x \leq M \land y \leq M \land y = M$ in line 13, is handled similarly. In a last step, both cases (i.e. lines 16 and 18) are joined, using the resolution rule from line 58; to make that rule applicable, the preparatory step 15→16 was needed. Since both line 16 and 18 comes with a program term, a conditional expression results in the program column. Since the goal formula $\text{true}$ has been derived, the proof is done, and the program column of the "$\text{true}$" line contains the program.

Non-clausal resolution rules (unifying substitutions not shown)
Nr Assertions Goals Program Origin
51 $E[p]$
52 $F[p]$
53 $G[p]$ s
54 $H[p]$ t
55 $E[\text{true}] \lor F[\text{false}]$ Resolve(51,52)
56 $\lnot F[\text{true}] \land G[\text{false}]$ s Resolve(52,53)
57 $\lnot F[\text{false}] \land G[\text{true}]$ s Resolve(53,52)
58 $G[\text{true}] \land H[\text{false}]$ p ? s : t Resolve(53,54)