# Semi-Thue system

In theoretical computer science and mathematical logic, a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a (usually finite) alphabet. Given a binary relation $R$ between fixed strings over the alphabet, called rewrite rules and denoted by $s\rightarrow t$, an SRS extends the rewriting relation to all strings in which the left-hand side of a rule appears as a substring, that is $usv\rightarrow utv$, where $s$, $t$, $u$, and $v$ are strings.

The notion of a semi-Thue system essentially coincides with that of a monoid presentation. Semi-Thue systems therefore constitute a natural framework for solving the word problem for monoids and groups.

An SRS can be defined directly as an abstract rewriting system. It can also be seen as a restricted kind of term rewriting system. As a formalism, string rewriting systems are Turing complete. The name semi-Thue comes from the Norwegian mathematician Axel Thue, who introduced a systematic treatment of string rewriting systems in a 1914 paper.[1] Thue introduced this notion hoping to solve the word problem for finitely presented semigroups. It was not until 1947 that the problem was shown to be undecidable; this result was obtained independently by Emil Post and A. A. Markov Jr.[2][3]

## Definition

A string rewriting system or semi-Thue system is a tuple $(\Sigma, R)$ where

• $\Sigma$ is an alphabet, usually assumed finite.[4] The elements of the set $\Sigma^*$ (* is the Kleene star here) are finite (possibly empty) strings on $\Sigma$, sometimes called words in formal languages; we will simply call them strings here.
• $R$ is a binary relation on strings from $\Sigma$, i.e., $R \subseteq \Sigma^* \times \Sigma^*.$ Each element $(u,v) \in R$ is called a (rewriting) rule and is usually written $u \rightarrow v$.

If the relation $R$ is symmetric, then the system is called a Thue system.
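As an illustration only, the tuple $(\Sigma, R)$ can be represented directly as data. The following Python sketch assumes that each symbol of $\Sigma$ is a single character, so that strings in $\Sigma^*$ are ordinary Python strings; the class name `SemiThueSystem` and its method `is_thue_system` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiThueSystem:
    """A pair (Sigma, R); each symbol of Sigma is assumed to be one character."""

    alphabet: frozenset[str]             # Sigma, assumed finite here
    rules: frozenset[tuple[str, str]]    # R, a subset of Sigma* x Sigma*

    def __post_init__(self) -> None:
        # Each rule u -> v must be a pair of strings over the alphabet.
        for u, v in self.rules:
            if not set(u + v) <= self.alphabet:
                raise ValueError(f"rule {u!r} -> {v!r} uses symbols outside the alphabet")

    def is_thue_system(self) -> bool:
        # A Thue system is one whose rule relation R is symmetric.
        return all((v, u) in self.rules for u, v in self.rules)


# Usage: the rules ab -> e, ba -> e (e the empty string) are not symmetric.
free_group = SemiThueSystem(frozenset("ab"), frozenset({("ab", ""), ("ba", "")}))
print(free_group.is_thue_system())  # False
```

Symmetry is checked rule by rule: the system is a Thue system exactly when every rule $u \rightarrow v$ is accompanied by $v \rightarrow u$.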

The rewriting rules in $R$ can be naturally extended to other strings in $\Sigma^*$ by allowing substrings to be rewritten according to $R$. More formally, the one-step rewriting relation $\rightarrow_R$ induced by $R$ on $\Sigma^*$ is defined, for any strings $s$ and $t$ in $\Sigma^*$, by:

$s \rightarrow_R t$ if and only if there exist $x$, $y$, $u$, $v$ in $\Sigma^*$ such that $s = xuy$, $t = xvy$, and $u \rightarrow v$ is a rule in $R$.
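The one-step relation can be computed by scanning $s$ for every occurrence of each left-hand side. The sketch below, again assuming single-character symbols and using the hypothetical function name `one_step`, yields every $t$ with $s \rightarrow_R t$; a string may be produced more than once if several rule applications lead to it.

```python
from typing import Iterator

Rule = tuple[str, str]  # (u, v) encodes the rule u -> v


def one_step(s: str, rules: list[Rule]) -> Iterator[str]:
    """Yield every t with s ->_R t, i.e. s = x u y and t = x v y for a rule (u, v)."""
    for u, v in rules:
        i = s.find(u)
        while i != -1:
            # Here x = s[:i] and y = s[i + len(u):].
            yield s[:i] + v + s[i + len(u):]
            i = s.find(u, i + 1)


# Usage: the rule ab -> ba applies at two positions of "abab".
print(sorted(set(one_step("abab", [("ab", "ba")]))))  # ['abba', 'baab']
```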

Since $\rightarrow_R$ is a relation on $\Sigma^*$, the pair $(\Sigma^*, \rightarrow_R)$ fits the definition of an abstract rewriting system. Taking $x$ and $y$ to be empty shows that $R$ is a subset of $\rightarrow_R$. Some authors use a different notation for the arrow in $\rightarrow_R$ (e.g. $\Rightarrow_R$) to distinguish it from $R$ itself ($\rightarrow$), because they later want to drop the subscript without confusing $R$ with the one-step rewriting relation it induces.

In a semi-Thue system one can form a (finite or infinite) sequence of strings by starting with an initial string $s_0 \in \Sigma^*$ and repeatedly rewriting it, one substring replacement at a time:

$s_0 \ \rightarrow_R \ s_1 \ \rightarrow_R \ s_2 \ \rightarrow_R \ \ldots$

A zero-or-more-step rewriting like this is captured by the reflexive transitive closure of $\rightarrow_R$, denoted by $\stackrel{*}{\rightarrow}_R$ (see the basic notions of abstract rewriting systems). This is called the rewriting relation or reduction relation on $\Sigma^*$ induced by $R$.
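Since $\stackrel{*}{\rightarrow}_R$ may relate a string to infinitely many others, a program can only explore it up to some bound. The following sketch (hypothetical names `one_step` and `reachable_within`, single-character symbols assumed) collects the strings reachable from $s$ in at most $k$ steps; the case $k = 0$ returns just $\{s\}$, reflecting reflexivity.

```python
from typing import Iterator

Rule = tuple[str, str]  # (u, v) encodes the rule u -> v


def one_step(s: str, rules: list[Rule]) -> Iterator[str]:
    """Yield every t with s ->_R t."""
    for u, v in rules:
        i = s.find(u)
        while i != -1:
            yield s[:i] + v + s[i + len(u):]
            i = s.find(u, i + 1)


def reachable_within(s: str, rules: list[Rule], k: int) -> set[str]:
    """Return the strings t with s ->*_R t via a derivation of at most k steps."""
    seen = {s}        # zero steps: s itself (reflexivity)
    frontier = {s}
    for _ in range(k):
        # Strings not seen before that are one step away from the frontier.
        frontier = {t for w in frontier for t in one_step(w, rules)} - seen
        seen |= frontier
    return seen


print(sorted(reachable_within("aab", [("ab", "ba")], 5)))  # ['aab', 'aba', 'baa']
```

With the single rule ab → ba, the string aab rewrites to aba and then to baa, after which no rule applies, so larger bounds add nothing further.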

## Thue congruence

In general, the set $\Sigma^*$ of strings on an alphabet forms a free monoid together with the binary operation of string concatenation (denoted as $\cdot$ and written multiplicatively by dropping the symbol). In an SRS, the reduction relation $\stackrel{*}{\rightarrow}_R$ is compatible with the monoid operation, meaning that $x\stackrel{*}{\rightarrow}_R y$ implies $uxv\stackrel{*}{\rightarrow}_R uyv$ for all strings $x$, $y$, $u$, $v$ in $\Sigma^*$. Since $\stackrel{*}{\rightarrow}_R$ is by definition a preorder, $(\Sigma^*, \cdot, \stackrel{*}{\rightarrow}_R)$ forms a preordered monoid.

Similarly, the reflexive transitive symmetric closure of $\rightarrow_R$, denoted $\stackrel{*}{\leftrightarrow}_R$ (see the basic notions of abstract rewriting systems), is a congruence, meaning that it is an equivalence relation (by definition) that is also compatible with string concatenation. The relation $\stackrel{*}{\leftrightarrow}_R$ is called the Thue congruence generated by $R$. In a Thue system, i.e. if $R$ is symmetric, the rewrite relation $\stackrel{*}{\rightarrow}_R$ coincides with the Thue congruence $\stackrel{*}{\leftrightarrow}_R$.
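Because $\stackrel{*}{\leftrightarrow}_R$ is the reflexive transitive closure of $\rightarrow_R$ together with its inverse, it can be explored by adding the reversed rule $v \rightarrow u$ for each rule $u \rightarrow v$, which yields exactly the symmetric rule set of a Thue system. The sketch below uses that observation with a step bound $k$; the names are hypothetical, single-character symbols are assumed, and a `False` answer only means that no connecting chain of length at most $k$ was found.

```python
from typing import Iterator

Rule = tuple[str, str]  # (u, v) encodes the rule u -> v


def one_step(s: str, rules: list[Rule]) -> Iterator[str]:
    """Yield every t with s ->_R t."""
    for u, v in rules:
        i = s.find(u)
        while i != -1:
            yield s[:i] + v + s[i + len(u):]
            i = s.find(u, i + 1)


def reachable_within(s: str, rules: list[Rule], k: int) -> set[str]:
    """Strings reachable from s in at most k rewriting steps."""
    seen, frontier = {s}, {s}
    for _ in range(k):
        frontier = {t for w in frontier for t in one_step(w, rules)} - seen
        seen |= frontier
    return seen


def thue_congruent_within(s: str, t: str, rules: list[Rule], k: int) -> bool:
    """True if s and t are joined by at most k steps of ->_R or its inverse."""
    symmetric = rules + [(v, u) for u, v in rules]  # R together with its reverse
    return t in reachable_within(s, symmetric, k)


bicyclic = [("ab", "")]
print(thue_congruent_within("a", "aab", bicyclic, 1))  # True
print("aab" in reachable_within("a", bicyclic, 5))     # False
```

With the rule ab → ε, the strings a and aab are Thue-congruent (the reversed rule inserts ab), even though a cannot be rewritten to aab using the rule in its forward direction. Reversing a rule with an empty right-hand side makes the search grow quickly, so only small bounds are practical.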

## Factor monoid and monoid presentations

Since $\stackrel{*}{\leftrightarrow}_R$ is a congruence, we can define the factor monoid $\mathcal{M}_R = \Sigma^*/\stackrel{*}{\leftrightarrow}_R$ of the free monoid $\Sigma^*$ by the Thue congruence in the usual manner. If a monoid $\mathcal{M}$ is isomorphic with $\mathcal{M}_R$, then the semi-Thue system $(\Sigma, R)$ is called a monoid presentation of $\mathcal{M}$.

This gives direct connections with other areas of algebra. For example, the alphabet {a, b} with the rules { ab → ε, ba → ε }, where ε is the empty string, is a presentation of the free group on one generator. If instead the rules are just { ab → ε }, then we obtain a presentation of the bicyclic monoid.
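In both examples every rule shortens the string, so rewriting always terminates, and the overlaps between left-hand sides (aba and bab in the first system; none in the second) rewrite to the same result, so each string has a unique irreducible form. Under those assumptions, applying rules until none applies computes a canonical representative of each monoid element. The helper `normal_form` below is a hypothetical name for this sketch, and it is not a general method: arbitrary systems need not terminate or have unique irreducible forms.

```python
def normal_form(word: str, rules: dict[str, str]) -> str:
    """Rewrite with the given rules until no left-hand side occurs in the word.

    Assumes the system is terminating and confluent, as the two examples are.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules.items():
            if lhs in word:
                word = word.replace(lhs, rhs, 1)  # one rewriting step
                changed = True
    return word


FREE_GROUP = {"ab": "", "ba": ""}  # <a, b | ab = ba = 1>
BICYCLIC = {"ab": ""}              # <a, b | ab = 1>

print(repr(normal_form("aabab", FREE_GROUP)))  # 'a'
print(repr(normal_form("baab", BICYCLIC)))     # 'ba'
```

Irreducible strings of the first system have the form $a^n$ or $b^n$, matching the integers $n$ and $-n$; those of the second system have the form $b^i a^j$.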

The importance of semi-Thue systems as presentations of monoids is made stronger by the following:

Theorem: Every monoid has a presentation of the form $(\Sigma, R)$; thus it may always be presented by a semi-Thue system, possibly over an infinite alphabet.[5]

In this context, the set $\Sigma$ is called the set of generators of $\mathcal{M}$, and $R$ is called the set of defining relations of $\mathcal{M}$. We can immediately classify monoids based on their presentation. $\mathcal{M}$ is called

• finitely generated if $\Sigma$ is finite.
• finitely presented if both $\Sigma$ and $R$ are finite.

## The word problem for semi-Thue systems

The word problem for semi-Thue systems can be stated as follows: given a semi-Thue system $T:=(\Sigma, R)$ and two words (strings) $u, v \in \Sigma^*$, can $u$ be transformed into $v$ by applying rules from $R$? This problem is undecidable, i.e. there is no general algorithm for solving it. This holds even if the input is restricted to finite systems.
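Although no algorithm decides the problem, it is semi-decidable for a finite rule set: if $u \stackrel{*}{\rightarrow}_R v$, a breadth-first search over $\rightarrow_R$ will eventually find a derivation. The sketch below (hypothetical name `derivation`, finite rule list and single-character symbols assumed) adds a step bound so that it always halts; returning `None` then means only that no derivation of length at most `max_steps` exists, not that $v$ is unreachable.

```python
from collections import deque
from typing import Optional

Rule = tuple[str, str]  # (u, v) encodes the rule u -> v


def derivation(u: str, v: str, rules: list[Rule], max_steps: int) -> Optional[list[str]]:
    """Breadth-first search for a shortest derivation u ->_R ... ->_R v.

    Returns the list of intermediate strings, or None if no derivation of
    length <= max_steps exists (which does not prove v is unreachable).
    """
    parent: dict[str, Optional[str]] = {u: None}
    queue = deque([(u, 0)])
    while queue:
        s, depth = queue.popleft()
        if s == v:
            # Walk back through the parent links to recover the derivation.
            path: list[str] = []
            node: Optional[str] = s
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth == max_steps:
            continue
        for lhs, rhs in rules:
            i = s.find(lhs)
            while i != -1:
                t = s[:i] + rhs + s[i + len(lhs):]
                if t not in parent:
                    parent[t] = s
                    queue.append((t, depth + 1))
                i = s.find(lhs, i + 1)
    return None


print(derivation("aab", "baa", [("ab", "ba")], 5))  # ['aab', 'aba', 'baa']
print(derivation("baa", "aab", [("ab", "ba")], 5))  # None
```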

Martin Davis offers the lay reader a two-page proof in his article "What is a Computation?" (pp. 258–259, with commentary on p. 257). Davis casts the proof in this manner: "Invent [a word problem] whose solution would lead to a solution to the halting problem."

## Connections with other notions

A semi-Thue system is also a term-rewriting system: one whose left- and right-hand sides are monadic terms (compositions of unary function symbols) ending in the same variable.[6] For example, the term rule $f_2(f_1(x)) \rightarrow g(x)$ is equivalent to the string rule $f_1f_2 \rightarrow g$, where the string lists the function symbols from the innermost outward.

A semi-Thue system is also a special type of Post canonical system, but every Post canonical system can also be reduced to an SRS. Both formalisms are Turing complete, and thus equivalent to Noam Chomsky's unrestricted grammars, which are sometimes called semi-Thue grammars.[7] A formal grammar differs from a semi-Thue system only by the partition of the alphabet into terminals and non-terminals and the choice of a start symbol among the non-terminals. A minority of authors actually define a semi-Thue system as a triple $(\Sigma, A, R)$, where $A\subseteq\Sigma^*$ is called the set of axioms. Under this "generative" definition of a semi-Thue system, an unrestricted grammar is just a semi-Thue system with a single axiom in which the alphabet is partitioned into terminals and non-terminals and the axiom is a non-terminal.[8] The simple artifice of partitioning the alphabet into terminals and non-terminals is a powerful one; it allows the Chomsky hierarchy to be defined according to which combinations of terminals and non-terminals the rules contain. This was a crucial development in the theory of formal languages.
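To illustrate the generative definition, the sketch below treats a grammar as a semi-Thue system with one axiom and keeps only the derived strings that contain no non-terminals. It assumes single-character symbols; the name `generated_words` and the example grammar S → aSb, S → ε are hypothetical choices for this illustration, and the step bound $k$ is needed because derivations can go on indefinitely.

```python
from typing import Iterator

Rule = tuple[str, str]  # (u, v) encodes the rule u -> v


def one_step(s: str, rules: list[Rule]) -> Iterator[str]:
    """Yield every t with s ->_R t."""
    for u, v in rules:
        i = s.find(u)
        while i != -1:
            yield s[:i] + v + s[i + len(u):]
            i = s.find(u, i + 1)


def generated_words(axiom: str, rules: list[Rule], nonterminals: set[str], k: int) -> set[str]:
    """Terminal-only strings derivable from the axiom in at most k steps."""
    seen, frontier = {axiom}, {axiom}
    for _ in range(k):
        frontier = {t for w in frontier for t in one_step(w, rules)} - seen
        seen |= frontier
    # Keep only sentential forms with no non-terminal left.
    return {w for w in seen if not set(w) & nonterminals}


grammar = [("S", "aSb"), ("S", "")]  # hypothetical grammar for a^n b^n
print(sorted(generated_words("S", grammar, {"S"}, 4), key=len))
# ['', 'ab', 'aabb', 'aaabbb']
```

With $k = 4$ the result is the empty string together with ab, aabb and aaabbb, the strings $a^n b^n$ with $n \le 3$.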

## History and importance

Semi-Thue systems were developed as part of a program to add constructs to logic, so as to create systems such as propositional logic that would allow general mathematical theorems to be expressed in a formal language, and then proven and verified in an automatic, mechanical fashion. The hope was that the act of theorem proving could then be reduced to a set of defined manipulations on a set of strings. It was subsequently realized that semi-Thue systems are equivalent in expressive power to unrestricted grammars, which in turn are known to be equivalent to Turing machines. This line of research succeeded, and computers can now be used to verify proofs of mathematical and logical theorems.

At the suggestion of Alonzo Church, Emil Post, in a paper published in 1947, first proved "a certain Problem of Thue" to be unsolvable, which Martin Davis describes as "...the first unsolvability proof for a problem from classical mathematics -- in this case the word problem for semigroups." (Undecidable p. 292)

Davis (ibid.) states that the proof was obtained independently by A. A. Markov (C. R. (Doklady) Acad. Sci. U.S.S.R. (n.s.) 55 (1947), pp. 583–586).

## Notes

1. ^ Book and Otto, p. 36
2. ^ Abramsky et al., p. 416
3. ^ Salomaa et al., p. 444
4. ^ In Book and Otto, a semi-Thue system is defined over a finite alphabet through most of the book, except in chapter 7, where monoid presentations are introduced and this assumption is quietly dropped.
5. ^ Book and Otto, Theorem 7.1.7, p. 149
6. ^ Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite Systems (1990) p. 6
7. ^ D.I.A. Cohen, Introduction to Computer Theory, 2nd ed., Wiley-India, 2007, ISBN 81-265-1334-9, p. 572
8. ^ Dan A. Simovici, Richard L. Tenney, Theory of formal languages with applications, World Scientific, 1999 ISBN 981-02-3729-4, chapter 4

## References

### Textbooks

• Martin Davis, Ron Sigal, Elaine J. Weyuker, Computability, complexity, and languages: fundamentals of theoretical computer science, 2nd ed., Academic Press, 1994, ISBN 0-12-206382-1, chapter 7
• Elaine Rich, Automata, computability and complexity: theory and applications, Prentice Hall, 2007, ISBN 0-13-228806-0, chapter 23.5.

### Surveys

• Samson Abramsky, Dov M. Gabbay, Thomas S. E. Maibaum (ed.), Handbook of Logic in Computer Science: Semantic modelling, Oxford University Press, 1995, ISBN 0-19-853780-8.
• Grzegorz Rozenberg, Arto Salomaa (ed.), Handbook of Formal Languages: Word, language, grammar, Springer, 1997, ISBN 3-540-60420-0.

### Landmark papers

• Emil Post (1947), Recursive Unsolvability of a Problem of Thue, The Journal of Symbolic Logic, vol. 12 (1947) pp. 1–11. Reprinted in Martin Davis ed. (1965), The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions, Raven Press, New York. pp. 293ff