# Complexity of constraint satisfaction

The complexity of constraint satisfaction is the application of computational complexity theory on constraint satisfaction. It has mainly been studied for discriminating between tractable and intractable classes of constraint satisfaction problems on finite domains.

Solving a constraint satisfaction problem on a finite domain is an NP-complete problem in general. Research has shown a number of polynomial-time subcases, mostly obtained by restricting either the allowed domains or constraints or the way constraints can be placed over the variables. Research has also established relationship of the constraint satisfaction problem with problems in other areas such as finite model theory and databases.

## Overview

Establishing whether a constraint satisfaction problem on a finite domain has solutions is an NP complete problem in general. This is an easy consequence of a number of other NP complete problems being expressible as constraint satisfaction problems. Such other problems include propositional satisfiability and three-colorability.

Tractability can be obtained by considering specific classes of constraint satisfaction problems. As an example, if the domain is binary and all constraints are binary, establishing satisfiability is a polynomial-time problem because this problem is equivalent to 2-SAT, which is tractable. Research has shown a number of tractable subcases. Some of these classes are based on restricting the allowed domains or relations, some on restricting the way constraints are placed over variables, and some on both kinds of restrictions.

One line of research used a correspondence between constraint satisfaction problem and the problem of establishing the existence of a homomorphism between two relational structures. This correspondence has been used to link constraint satisfaction with topics traditionally related to database theory.

A considered research problem is about the existence of dichotomies among sets of restrictions. This is the question of whether a set of restrictions contains only polynomial-time restrictions and NP-complete restrictions. This question is settled for some sets of restrictions, but still open for the set of all restrictions based on a fixed domain and set of relations, as of 2007. This is considered by some authors the most important open question about the complexity of constraint satisfaction.

## Restrictions

Tractable subcases of the general constraint satisfaction problems can be obtained by placing suitable restrictions on the problems. Various kinds of restrictions have been considered.

### Structural and relational restrictions

Tractability can be obtained by restricting the possible domains or constraints. In particular, two kinds of restrictions have been considered:

• relational restrictions bounds the domain and the values satisfying the constraints;
• structural restrictions bounds the way constraints are distributed over the variables.

More precisely, a relational restriction specifies a constraint language, which is a domain and a set of relations over this domain. A constraint satisfaction problem meets this restriction if it has exactly this domain and the relation of each constraint is in the given set of relations. In other words, a relational restriction bounds the domain and the set of satisfying values of each constraints, but not how the constraints are placed over variables. This is instead done by structural restrictions. Structural restriction can be checked by looking only at the scopes of constraints (their variables), ignoring their relations (their set of satisfying values).

A constraint language is tractable if there exists a polynomial algorithm solving all problems based on the language, that is, using the domain and relations specified in the domain. An example of a tractable constraint language is that of binary domains and binary constraints. Formally, this restriction corresponds to allowing only domains of size 2 and only constraints whose relation is a binary relation. While the second fact implies that the scopes of the constraints are binary, this is not a structural restriction because it does not forbid any constraint to be placed on an arbitrary pair of variables. Incidentally, the problem becomes NP complete if either restriction is lifted: binary constraints and ternary domains can express the graph coloring problem, while ternary constraints and binary domains can express 3-SAT; these two problems are both NP-complete.

An example of a tractable class defined in terms of a structural restriction is that of binary acyclic problems. Given a constraint satisfaction problem with only binary constraints, its associated graph has a vertex for every variable and an edge for every constraint; two vertices are joined if they are in a constraint. If the graph of a problem is acyclic, the problem is called acyclic as well. The problem of satisfiability on the class of binary acyclic problem is tractable. This is a structural restriction because it does not place any limit to the domain or to the specific values that satisfy a constraints; rather, it restricts the way constraints are placed over variables.

While relational and structural restrictions are the ones mostly used to derive tractable classes of constraint satisfaction, there are some tractable classes that cannot be defined by relational restrictions only or structural restrictions only. The tractable class defined in terms of row convexity cannot be defined only in terms of the relations or only in terms of the structure, as row convexity depends both on the relations and on the order of variables (and therefore cannot be checked by looking only at each constraint in turn).

### Uniform and non-uniform restrictions

The subcase obtained by restricting to a finite constraint language is called a non-uniform problem. These problems are mostly considered when expressing constraint satisfaction in terms of the homomorphism problem, as explained below. Uniform problems were also defined in the settings of homomorphism problems; a uniform problem can be defined as the union of a (possibly infinite) collection of non-uniform problems. A uniform problem made of an infinite set of non-uniform problems can be intractable even if all these non-uniform problems are tractable.

### Tree-based restrictions

Some considered restrictions are based on the tractability of the constraint satisfaction problem where the constraints are all binary and form a tree over the variables. This is a structural restriction, as it can be checked by looking only at the scopes of the constraints, ignoring domains and relations.

This restriction is based on primal graph of the problem, which is a graph whose vertices are the variables of the problem and the edges represent the presence of a constraint between two variables. Tractability can however also be obtained by placing the condition of being a tree to the primal graph of problems that are reformulations of the original one.

## Equivalence conditions

Constraint satisfaction problems can be reformulated in terms of other problems, leading to equivalent conditions to tractability. The most used reformulation is that in terms of the homomorphism problem.

### Constraint satisfaction and the homomorphism problem

A link between constraint satisfaction and database theory has been provided in the form of a correspondence between the problem of constraint satisfiability to the problem of checking whether there exists a homomorphism between two relational structures. A relational structure is a mathematical representation of a database: it is a set of values and a set of relations over these values. Formally, $A=(V,R^A_1,\ldots,R^A_n)$, where each $R^A_i$ is a relation over $V$, that is, a set of tuples of values of $V$.

A relational structure is different from a constraint satisfaction problem because a constraint is a relation and a tuple of variables. Also different is the way in which they are used: for a constraint satisfaction problem, finding a satisfying assignment is the main problem; for a relation structure, the main problem is finding the answer to a query.

The constraint satisfaction problem is however related to the problem of establishing the existence of a homomorphism between two relational structures. A homomorphism is a function from the values of the first relation to the values of the second that, when applied to all values of a relation of the first structure, turns it into a subset of the corresponding relation of the second structure. Formally, $h$ is a homomorphism from $A=(V,R^A_1,\ldots,R^A_n)$ to $B=(D,R^B_1,\ldots,R^B_n)$ if it is a function from $V$ to $D$ such that, if $(x_1,\ldots,x_k) \in R^A_i$ then $(h(x_1),\ldots,h(x_k)) \in R^B_i$.

A direct correspondence between the constraint satisfaction problem and the homomorphism problem can be established. For a given constraint satisfaction problem, one can build a pair of relational structures, the first encoding the variables and the signatures of constraints, the second encoding the domains and the relations of the constraints. Satisfiability of the constraint satisfaction problem corresponds to finding a value for every variable such that replacing a value in a signature makes it a tuple in the relation of the constraint. This is possible exactly if this evaluation is a homomorphism between the two relational structures.

The inverse correspondence is the opposite one: given two relational structures, one encodes the values of the first in the variables of a constraint satisfaction problem, and the values of the second in the domain of the same problem. For every tuple of every relation of the first structure, there is a constraint having as values the correspondent relation of the second structure. This way, a homomorphism corresponds to mapping every scope of every constraint (every tuple of every relation of the first structure) into a tuple in the relation of the constraint (a tuple in the corresponding relation of the second structure).

A non-uniform constraint satisfaction problem is a restriction where the second structure of the homomorphism problem is fixed. In other words, every relational structure defines a non-uniform problem, that of telling whether a relation structure is homomorphic to it. A similar restriction can be placed on the first structure; for any fixed first structure, the homomorphism problem is tractable. A uniform constraint satisfaction problem is an arbitrary restriction to the sets of structures for the first and second structure of the homomorphism problem.

### Conjunctive query evaluation and containment

Since the homomorphism problem is equivalent to conjunctive query evaluation and conjunctive query containment, these two problems are equivalent to constraint satisfaction as well.

### Join evaluation

Every constraint can be viewed as a table in a database, where the variables are interpreted as attributes names and the relation is the set of records in the table. The solutions of a constraint satisfaction problem are the result of an inner join of the tables representing its constraints; therefore, the problem of existence of solutions can be reformulated as the problem of checking whether the result of an inner join of a number of tables is empty.

## Dichotomy theorems

Some constraint languages (or non-uniform problems) are known to correspond to problems solvable in polynomial time, and some others are known to express NP-complete problems. However, it is possible that some constraint languages are neither. It is known by Ladner's theorem that if P is not equal to NP, then there exist problems in NP that are neither polynomial-time nor NP-hard. As of 2007, it is not known if such problems can be expressed as constraint satisfaction problems with a fixed constraint language. If Ladner languages were not expressible in this way, the set of all constraint languages could be divided exactly into those defining polynomial-time and those defining NP-complete problems; that is, this set would exhibit a dichotomy.

Partial results are known for some sets of constraint languages. The best known such result is Schaefer's dichotomy theorem, which proves the existence of a dichotomy in the set of constraint languages on a binary domain. More precisely, it proves that a relation restriction on a binary domain is tractable if all its relations belong to one of six classes, and is NP-complete otherwise. Bulatov proved a dichotomy theorem for domains of three elements.

Another dichotomy theorem for constraint languages is the Hell-Nesetril theorem, which shows a dichotomy for problems on binary constraints with a single fixed symmetric relation. In terms of the homomorphism problem, every such problem is equivalent to the existence of a homomorphism from a relational structure to a given fixed undirected graph (an undirected graph can be regarded as a relational structure with a single binary symmetric relation). The Hell-Nesetril theorem proves that every such problem is either polynomial-time or NP-complete. More precisely, the problem is polynomial-time if the graph is 2-colorable, that is, it is bipartite, and is NP-complete otherwise.

## Sufficient conditions for tractability

Some complexity results prove that some restrictions are polynomial without giving proving that all other possible restrictions of the same kind are NP-hard.

### Datalog

A sufficient condition for tractability is related to expressibility in Datalog. A Boolean Datalog query gives a truth value to a set of literals over a given alphabet, each literal being an expression of the form $L(a_1,\ldots,a_n)$; as a result, a Boolean Datalog query expresses a set of sets of literals, as it can be considered semantically equivalent to the set of all sets of literals that it evaluates to true.

On the other hand, a non-uniform problem can be seen as a way for expressing a similar set. For a given non-uniform problem, the set of relations that can be used in constraints is fixed; as a result, one can give unique names $R_1,\ldots,R_n$ to them. An instance of this non-uniform problem can be then written as a set of literals of the form $R_i(x_1,\ldots,x_n)$. Among these instances/sets of literals, some are satisfiable and some are not; whether a set of literals is satisfiable depends on the relations, which are specified by the non-uniform problem. In the other way around, a non-uniform problem tells which sets of literals represent satisfiable instances and which ones represent unsatisfiable instances. Once relations are named, a non-uniform problem expresses a set of sets of literals: those associated to satisfiable (or unsatisfiable) instances.

A sufficient condition of tractability is that a non-uniform problem is tractable if the set of its unsatisfiable instances can be expressed by a Boolean Datalog query. In other words, if the set of sets of literals that represent unsatisfiable instances of the non-uniform problem is also the set of sets of literals that satisfy a Boolean Datalog query, then the non-uniform problem is tractable.

### Local consistency

Satisfiability can sometimes be established by enforcing a form of local consistency and then checking the existence of an empty domain or constraint relation. This is in general a correct but incomplete unsatisfiability algorithm: a problem may be unsatisfiable even if no empty domain or constraint relation is produced. For some forms of local consistency, this algorithm may also require exponential time. However, for some problems and for some kinds of local consistency, it is correct and polynomial-time.

The following conditions exploit the primal graph of the problem, which has a vertex for each variable and an edge between two nodes if the corresponding variables are in a constraint. The following are conditions on binary constraint satisfaction problems where enforcing local consistency is tractable and allows establishing satisfiability:

1. enforcing arc consistency, if the primal graph is acyclic;
2. enforcing directional arc consistency for an ordering of the variables that makes the ordered graph of constraint having width 1 (such an ordering exists if and only if the primal graph is a tree, but not all orderings of a tree generate width 1);
3. enforcing strong directional path consistency for an ordering of the variables that makes the primal graph having induced width 2.

A condition that extends the last one holds for non-binary constraint satisfaction problems. Namely, for all problems for which there exists an ordering that makes the primal graph having induced width bounded by a constant i, enforcing strong directional i-consistency is tractable and allows establishing satisfiability.

### Tree-based conditions

Constraint satisfaction problems composed of binary constraints only can be viewed as graphs, where the vertices are variables and the edges represent the presence of a constraint between two variables. This graph is called the Gaifman graph or primal constraint graph (or simply primal graph) of the problem.

If the primal graph of a problem is acyclic, establishing satisfiability of the problem is a tractable problem. This is a structural restriction, as it can be checked by looking only at the scopes of the constraints, disregarding their relations and the domain. An acyclic graph is a forest, but connectedness is usually assumed; as a result, the condition that is usually considered is that primal graphs are trees.

This property of tree-like constraint satisfaction problems is exploited by decomposition methods, which convert problems into equivalent ones that only contain binary constraints arranged as a tree. The variables of these problems correspond to sets of variables of the original problem; the domain of such a variable is obtained by considering some of the constraints of the original problem whose scope is contained in the corresponding original set of variables; constraints of these new problems represent equality of variables that are contained in two sets.

If the graph of one such equivalent problem is a tree, the problem can be solved efficiently. On the other hand, producing one such equivalent problem may be not be efficient because of two factors: the need to determine the combined effects of a group of constraints over a set of variables, and the need to store all tuples of values that satisfy a given group of constraints.

## Necessary condition for tractability

A necessary condition for the tractability of a constraint language based on the universal gadget has been proved. The universal gadget is a particular constraint satisfaction problem that was initially defined for the sake of expressing new relations by projection.

A relation that is not present in a constraint language may be "simulated" by constraints using the relations in the language. In particular, a new relation can be created by placing a set of constraints and using only some of their variables. If all other constraints use only these variables, this set of constraints forces these variable to only assume specific values, practically simulating a new relation.

Every constraint satisfaction problem and subset of its variables defines a relation, which is composed by all tuples of values of the variables that can be extended to the other variables to form a solution. Technically, this relation is obtained by projecting the relation having the solutions as rows over the considered variables.

The universal gadget is based on the observation that every relation that contains $k$-tuples can be defined by projecting a relation that contains all possible columns of $k$ elements from the domain. As an example, the following tables shows such a projection:

a b c d e f g h             b d
---------------             ---
1 1 1 1 0 0 0 0        ->   1 1
1 1 0 0 1 1 0 0             1 0
1 0 1 0 1 0 1 0             0 0


If the table on the left is the set of solutions of a constraint satisfaction problem, its variables $b$ and $d$ are constrained to the values of the table to the right. As a result, the constraint satisfaction problem can be used to set a constraint whose relation is the table on the right, which may not be in the constraint language.

As a result, if a constraint satisfaction problem has the table on the left as its set of solutions, every relation can be expressed by projecting over a suitable set of variables. A way for trying to obtain this table as the set of solution is to place every possible constraint that is not violated by the required solutions.

As an example, if the language contains the binary relation representing the Boolean disjunction (a relation containing all tuples of two elements that contains at least a 1), this relation is placed as a constraint on $a$ and $b$, because their values in the table above are $(1,1)$, $(1,1)$ again, and $(1,0)$. Since all these values satisfy the constraint, the constraint is placed. On the other hand, a constraint with this relation is not placed on $b$ and $d$, since the restriction of the table above to these two variables contains $(0,0)$ as a third row, and this evaluation violates that constraint.

The universal gadget of order $k$ is the constraint satisfaction problem containing all constraints that can be placed in order to obtain the table above. The solutions of the universal gadget include the rows of this table, but can contain other rows. If the solutions are exactly the rows of the table, every relation can be expressed by projecting on a subset of the variables. However, even if the solutions include other rows, some relations can still be expressed. A property of the universal gadget is that it is able to express, by projection, every relation that can be expressed by projection from an arbitrary constraint satisfaction problem based on the same language. More precisely, the universal gadget of order $k$ expresses all relations of $k$ rows that can be expressed in the constraint language.

Given a specific relation, its expressibility in the language can be checked by considering an arbitrary list of variables whose columns in the table above (the "ideal" solutions to the universal gadget) form that relation. The relation can be expressed in the language if and only if the solutions of the universal gadget coincides with the relation when projected over such a list of variables. In other words, one can check expressibility by selecting variables "as if" the solutions of the universal gadget were like in the table, and then check whether the restriction of the "real" solutions is actually the same as the relation. In the example above, the expressibility of the relation in the table on the right can be checked by looking whether the solutions of the universal gadget, when restricted to the variables $b$ and $d$, are exactly the rows of this table.

### Solutions as functions in the universal gadget

A necessary condition for tractability can be expressed in terms of the universal gadget. The solutions of such a gadget can be tabulated as follows:

a b c d e f g h
---------------
1 1 1 1 0 0 0 0
1 1 0 0 1 1 0 0    (solutions that exist by definition)
1 0 1 0 1 0 1 0
---------------
....
1 0 0 1 1 1 0 0    (other solutions are possible)
....


This table is made of two parts. The first part contains the solutions that exist by definition of this problem; the second part (that may be empty) contains all other solutions). Since the columns of the table are by definition associated to the possible $k$-tuples of values of the domain, every solution can be viewed as a function from a $k$-tuple of elements to a single element.

The function corresponding to a solution can be calculated from the first part of the table above and the solution. As an example, for the last solution marked in the table, this function can be determined for arguments $1,0,1$ as follows: first, these three values are the first part of the row "c" in the table; the value of the function is the value of the solution in the same column, that is, 0.

A necessary condition for tractability is the existence of a solution for a universal gadget of some order that is part of some classes of functions. This result however only holds for reduced languages, which are defined below.

### Squashing functions and reduced domains

Squashing functions are functions used to reduce the size of domain of constraint languages. A squashing function is defined in terms of a partition of the domain and a representative element for each set in the partition. The squashing function maps all elements of a set in the partition to the representative element of that set. For such a function being a squashing function it is also necessary that applying the function to all elements of a tuple of a relation in the language produces another tuple in the relation. The partition is assumed to contain at least a set of size greater than one.

Formally, given a partition $D_1,\ldots,D_n$ of the domain $D$ containing at least a set of size greater than one, a squashing function is a function $s:D \rightarrow D$ such that $s(x)=s(y)$ for every $x,y$ in the same partition, and for every tuple $\langle x_1,\ldots,x_n \rangle \in R$, it holds $\langle s(x_1),\ldots,s(x_n) \rangle \in R$.

For constraint problems on a constraint language has a squashing function, the domain can be reduced via the squashing function. Indeed, every element in a set in the partition can be replaced with the result of applying the squashing function to it, as this result is guaranteed to satisfy at least all constraints that were satisfied by the element. As a result, all non-representative elements can be removed from the constraint language.

Constraint languages for which no squashing function exist are called reduced languages; equivalently, these are languages on which all reductions via squashing functions have been applied.

### The necessary condition for tractability

The necessary condition for tractability based on the universal gadget holds for reduced languages. Such a language is tractable if the universal gadget has a solution that, when viewed as a function in the way specified above, is either a constant function, a majority function, an idempotent binary function, an affine function, or a semi-projection.

## References

• Dechter, Rina (2003). Constraint processing. Morgan Kaufmann. ISBN 1-55860-890-7
• Vardi, Moshe Y. (2000). "Constraint Satisfaction and Database Theory: a Tutorial". PODS 2000. pp. 76–85.
• Bulatov, Andrei A. (2006). "A dichotomy theorem for constraint satisfaction problems on a 3-element set". Journal of the ACM 53 (1): 66–120. doi:10.1145/1120582.1120584.