Brzozowski derivative

In theoretical computer science, in particular in formal language theory, the Brzozowski derivative u⁻¹S of a set S of strings with respect to a string u is the set of all strings obtainable from a string in S by cutting off the prefix u; strings in S that do not start with u contribute nothing. Formally, u⁻¹S = { v ∈ Σ^* : uv ∈ S }.^[1] It is named after the computer scientist Janusz Brzozowski, who investigated its properties and gave an algorithm to compute the derivative of a generalized regular expression.
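For a finite set of strings the definition can be evaluated directly. The following minimal sketch assumes strings are Python `str` values; the function name `derivative` is chosen for illustration only and does not come from the source.

```python
def derivative(u, S):
    """Return u⁻¹S = { v : uv ∈ S } for a finite set S of strings."""
    # Keep only the strings that start with u, then cut that prefix off.
    return {w[len(u):] for w in S if w.startswith(u)}


S = {"ab", "abc", "b", "abab"}
print(sorted(derivative("ab", S)))  # ['', 'ab', 'c']
print(derivative("c", S))           # set(): no string in S starts with "c"
```

Deriving by the empty string leaves the set unchanged, since every string starts with the empty prefix.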

Derivative of a regular expression

Given a finite alphabet A of symbols,^[2] a generalized regular expression denotes a possibly infinite set of finite-length strings of symbols from A. It may be built of:

∅ (denoting the empty set of strings),
ε (denoting the singleton set containing just the empty string),
a symbol a from A (denoting the singleton set containing the single-symbol string a),
R∨S (where R and S are, in turn, generalized regular expressions; denoting the union of R's and S's sets),
R∧S (denoting the intersection of R's and S's sets),
¬R (denoting the complement of R's set with respect to the set of all strings of symbols from A),
RS (denoting the set of all possible concatenations of a string from R's set with a string from S's set),
R^* (denoting the set of n-fold repetitions of strings from R's set, for any n≥0, including the empty string).

In an ordinary regular expression, neither ∧ nor ¬ is allowed. The string set denoted by a generalized regular expression R is called its language, denoted as L(R).
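The set semantics of the eight constructs can be made concrete with a brute-force membership test. This is only a sketch: the nested-tuple encoding and the name `member` are assumptions made here, the test tries every way of splitting the string and therefore takes exponential time, and the input string is assumed to consist of symbols from A.

```python
# Expressions as nested tuples (an encoding chosen for this sketch only):
#   ("empty",)  ("eps",)  ("sym", a)  ("or", R, S)  ("and", R, S)
#   ("not", R)  ("cat", R, S)  ("star", R)

def member(r, w):
    """Decide whether the string w belongs to L(r), straight from the definitions."""
    tag = r[0]
    if tag == "empty":
        return False
    if tag == "eps":
        return w == ""
    if tag == "sym":
        return w == r[1]
    if tag == "or":
        return member(r[1], w) or member(r[2], w)
    if tag == "and":
        return member(r[1], w) and member(r[2], w)
    if tag == "not":
        return not member(r[1], w)
    if tag == "cat":
        # w must split into a part from L(R) followed by a part from L(S).
        return any(member(r[1], w[:i]) and member(r[2], w[i:])
                   for i in range(len(w) + 1))
    if tag == "star":
        # ε is always included; otherwise peel off a non-empty first factor.
        return w == "" or any(member(r[1], w[:i]) and member(r, w[i:])
                              for i in range(1, len(w) + 1))
    raise ValueError(f"unknown construct: {tag}")


ab = ("cat", ("sym", "a"), ("sym", "b"))
print(member(("star", ab), "abab"))  # True
print(member(("star", ab), "aba"))   # False
print(member(("not", ab), "ba"))     # True: "ba" is not in L(ab)
```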

Computation

For any given generalized regular expression R and any string u, the derivative u⁻¹R is again a generalized regular expression.^[3] It may be computed recursively as follows.^[4]

(ua)⁻¹R	= a⁻¹(u⁻¹R)	for a symbol a and a string u
ε⁻¹R	= R

Using the previous two rules, the derivative with respect to an arbitrary string reduces to repeated derivatives with respect to single-symbol strings a. The latter can be computed as follows:^[5]

a⁻¹a	= ε
a⁻¹b	= ∅	for each symbol b≠a
a⁻¹ε	= ∅
a⁻¹∅	= ∅
a⁻¹(R^*)	= (a⁻¹R)R^*
a⁻¹(RS)	= (a⁻¹R)S ∨ ν(R)(a⁻¹S)
a⁻¹(R∧S)	= (a⁻¹R) ∧ (a⁻¹S)
a⁻¹(R∨S)	= (a⁻¹R) ∨ (a⁻¹S)
a⁻¹(¬R)	= ¬(a⁻¹R)

Here, ν(R) is an auxiliary function yielding a generalized regular expression that evaluates to the empty string ε if R's language contains ε, and otherwise evaluates to ∅. This function can be computed by the following rules:^[6]

ν(a)	= ∅	for any symbol a in A
ν(ε)	= ε
ν(∅)	= ∅
ν(R^*)	= ε
ν(RS)	= ν(R) ∧ ν(S)
ν(R ∧ S)	= ν(R) ∧ ν(S)
ν(R ∨ S)	= ν(R) ∨ ν(S)
ν(¬R)	= ε	if ν(R) = ∅
ν(¬R)	= ∅	if ν(R) = ε
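The rules for ν and for the derivative translate almost line by line into a recursive program. The sketch below assumes a nested-tuple encoding of expressions; the names `nu`, `deriv` and `deriv_string` are chosen here for illustration. It applies the rules literally and performs no simplification, so the results grow quickly.

```python
# Encoding assumed for this sketch: ("empty",), ("eps",), ("sym", a),
# ("or", R, S), ("and", R, S), ("not", R), ("cat", R, S), ("star", R)
EMPTY, EPS = ("empty",), ("eps",)

def nu(r):
    """ν(R): EPS if ε ∈ L(R), otherwise EMPTY."""
    tag = r[0]
    if tag in ("eps", "star"):
        return EPS
    if tag in ("empty", "sym"):
        return EMPTY
    if tag == "not":
        return EPS if nu(r[1]) == EMPTY else EMPTY
    if tag == "or":
        return EPS if EPS in (nu(r[1]), nu(r[2])) else EMPTY
    # "cat" and "and": both operands must contain ε
    return EPS if nu(r[1]) == nu(r[2]) == EPS else EMPTY

def deriv(a, r):
    """a⁻¹R for a single symbol a, following the rules one by one."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "star":
        return ("cat", deriv(a, r[1]), r)
    if tag == "cat":
        return ("or", ("cat", deriv(a, r[1]), r[2]),
                      ("cat", nu(r[1]), deriv(a, r[2])))
    if tag in ("and", "or"):
        return (tag, deriv(a, r[1]), deriv(a, r[2]))
    if tag == "not":
        return ("not", deriv(a, r[1]))
    raise ValueError(f"unknown construct: {tag}")

def deriv_string(u, r):
    """u⁻¹R: ε⁻¹R = R, and (ua)⁻¹R = a⁻¹(u⁻¹R), so symbols are consumed left to right."""
    for a in u:
        r = deriv(a, r)
    return r


ab = ("cat", ("sym", "a"), ("sym", "b"))
print(deriv("a", ab))
# ('or', ('cat', ('eps',), ('sym', 'b')), ('cat', ('empty',), ('empty',)))
```

The printed result is the unsimplified expression (εb) ∨ (∅∅), which denotes the same language as b.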

Properties

A string u is a member of the string set denoted by a generalized regular expression R if and only if ε is a member of the string set denoted by the derivative u⁻¹R.^[7]
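As a worked illustration of this membership criterion, take R = (ab)^* and u = ab. The derivation below applies the rules for star and concatenation and uses ν(a) = ν(b) = ∅, since the language of a single symbol does not contain ε. Here ≡ is used informally for "denotes the same language", a notation introduced only for this example.

```latex
\begin{align*}
a^{-1}\bigl((ab)^*\bigr)
  &= \bigl(a^{-1}(ab)\bigr)(ab)^* \\
  &= \bigl((a^{-1}a)\,b \lor \nu(a)\,(a^{-1}b)\bigr)(ab)^* \\
  &= (\varepsilon\, b \lor \emptyset\,\emptyset)(ab)^*
   \;\equiv\; b\,(ab)^* \\[4pt]
b^{-1}\bigl(b\,(ab)^*\bigr)
  &= (b^{-1}b)(ab)^* \lor \nu(b)\,b^{-1}\bigl((ab)^*\bigr) \\
  &= \varepsilon\,(ab)^* \lor \emptyset\; b^{-1}\bigl((ab)^*\bigr)
   \;\equiv\; (ab)^* \\[4pt]
\nu\bigl((ab)^*\bigr) &= \varepsilon
\end{align*}
```

Hence (ab)⁻¹R denotes the same language as (ab)^*, which contains ε, so ab is a member of L(R).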

Taking the derivatives of a fixed generalized regular expression R with respect to all strings yields only finitely many different languages. If their number is denoted by d_R, all these languages can be obtained as derivatives of R with respect to strings of length below d_R.^[8] Furthermore, there is a complete deterministic finite automaton with d_R states which recognises the regular language given by R, as laid out by the Myhill–Nerode theorem.
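One way to make this property tangible is to explore the derivatives of R symbol by symbol and use them as the states of a deterministic automaton. The sketch below is an illustration under stated assumptions, not Brzozowski's construction verbatim: expressions are nested tuples, and the helper names (`mk_or`, `mk_cat`, `mk_and`, `build_dfa`, `accepts`) and the `limit` safeguard are hypothetical. Derivatives are compared syntactically after light normalisation, so the number of states found can exceed d_R when two different expressions denote the same language.

```python
# Encoding assumed: ("sym", a), ("or", r, s), ("and", r, s), ("not", r),
# ("cat", r, s), ("star", r), plus the two constants below.
EMPTY, EPS = ("empty",), ("eps",)

def mk_or(r, s):
    """r ∨ s, normalised modulo associativity, commutativity, idempotence and ∅."""
    alts, stack = set(), [r, s]
    while stack:
        x = stack.pop()
        if x[0] == "or":
            stack += [x[1], x[2]]
        elif x != EMPTY:
            alts.add(x)
    out = EMPTY
    for x in sorted(alts, key=repr, reverse=True):
        out = x if out == EMPTY else ("or", x, out)
    return out

def mk_cat(r, s):
    """rs, using ∅s = r∅ = ∅, εs = s and rε = r."""
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    return r if s == EPS else ("cat", r, s)

def mk_and(r, s):
    """r ∧ s, using ∅ ∧ s = r ∧ ∅ = ∅ and r ∧ r = r."""
    if EMPTY in (r, s):
        return EMPTY
    return r if r == s else ("and", r, s)

def nullable(r):
    """True exactly when ε ∈ L(r), i.e. when ν(r) = ε."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "not":
        return not nullable(r[1])
    if tag == "or":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat" and "and"

def deriv(a, r):
    """a⁻¹r, built with the simplifying constructors above."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "star":
        return mk_cat(deriv(a, r[1]), r)
    if tag == "cat":
        left = mk_cat(deriv(a, r[1]), r[2])
        return mk_or(left, deriv(a, r[2])) if nullable(r[1]) else left
    if tag == "or":
        return mk_or(deriv(a, r[1]), deriv(a, r[2]))
    if tag == "and":
        return mk_and(deriv(a, r[1]), deriv(a, r[2]))
    return ("not", deriv(a, r[1]))

def build_dfa(r, alphabet, limit=1000):
    """Explore all derivatives of r; states are expressions, r is the start state."""
    states, todo, delta = {r}, [r], {}
    while todo:
        q = todo.pop()
        for a in alphabet:
            t = deriv(a, q)
            delta[q, a] = t
            if t not in states:
                if len(states) >= limit:  # safeguard assumed for this sketch
                    raise RuntimeError("too many syntactically distinct derivatives")
                states.add(t)
                todo.append(t)
    return states, delta, {q for q in states if nullable(q)}

def accepts(start, delta, accepting, w):
    """Run the automaton on w, starting from the expression `start`."""
    q = start
    for a in w:
        q = delta[q, a]
    return q in accepting


star_ab = ("star", ("cat", ("sym", "a"), ("sym", "b")))
states, delta, accepting = build_dfa(star_ab, "ab")
print(len(states))                                 # 3
print(accepts(star_ab, delta, accepting, "abab"))  # True
```

For (ab)^* over the alphabet {a, b} the three states correspond to the languages of (ab)^*, b(ab)^* and ∅, and only the first is accepting.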

References

[1] Janusz A. Brzozowski (1964). "Derivatives of Regular Expressions". JACM. 11: 481–494. doi:10.1145/321239.321249.
[2] Brzozowski (1964), p.481, required A to consist of the 2ⁿ combinations of n bits, for some n.
[3] Brzozowski (1964), p.483, Theorem 4.1
[4] Brzozowski (1964), p.483, Theorem 3.2
[5] Brzozowski (1964), p.483, Theorem 3.1
[6] Brzozowski (1964), p.482, Definition 3.2
[7] Brzozowski (1964), p.483, Theorem 4.2
[8] Brzozowski (1964), p.484, Theorem 4.3
