Talk:Regular language

This is the talk page for discussing improvements to the Regular language article.
This is not a forum for general discussion of the article's subject.

Put new text under old text. Click here to start a new topic.
New to Wikipedia? Welcome! Learn to edit; get help.

Article policies

Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL

Philosophy: Logic Start‑class Low‑importance

This article is within the scope of WikiProject Philosophy, a collaborative effort to improve the coverage of content related to philosophy on Wikipedia. If you would like to support the project, please visit the project page, where you can get more details on how you can help, and where you can join the general discussion about philosophy content on Wikipedia.PhilosophyWikipedia:WikiProject PhilosophyTemplate:WikiProject PhilosophyPhilosophy articles

Start

This article has been given a rating which conflicts with the project-independent quality rating in the banner shell. Please resolve this conflict if possible.

Low

This article has been rated as Low-importance on the project's importance scale.

Associated task forces:
/
	Logic

Computer science High‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science articles

High

This article has been rated as High-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

Mathematics High‑priority

	Mathematics portal This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.MathematicsWikipedia:WikiProject MathematicsTemplate:WikiProject Mathematicsmathematics articles
High	This article has been rated as High-priority on the project's priority scale.

Linguistics

	Linguistics portal This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.LinguisticsWikipedia:WikiProject LinguisticsTemplate:WikiProject LinguisticsLinguistics articles
???	This article has not yet received a rating on the project's importance scale.

The word regular[edit]

I removed the following:

The word regular is used because these expressions are built up using very distinct limited and regulated syntax rules.

I think one could argue that context-free languages are also built according to limited syntax rules. I'm pretty sure the word "regular" was chosen without much thinking; it's one of those meaningless terms, like "normal", that are used all over the place. Also, the above "these expressions" seems to refer to regular expressions, but regular expressions hadn't been mentioned in the article at that point. --AxelBoldt

Regular expressions[edit]

A regular language is a formal language (i.e. a possibly infinite set of finite sequences of symbols from a finite alphabet) that has one of the following equivalent properties:

(...)
it can be described by a regular expression.

Modern "regular" expressions can describe more languages than just regular ones. Example: /([a-z]*)\\1/ (weird software problem - there should be 1 backslash there) --Taw

Obviously, this refers to regular expressions in the sense used in the theory of computation, not Perl regular expressions etc. -- Schnee 12:16, 13 Jun 2004 (UTC)

Examples and other questions[edit]

I'm reading about regular languages for the first time, so am curious about few things which I could not find in the article.

Why the word 'regular' ? I can see a line on this talk page about it, but it'd be good if a line is added to the main article as well.
Examples of some regular languages
What do you call a language thats not a regular language. An "irregular language" ? If there is anything of the sort, a link to it and maybe an example would help understand this article better.

Jay 07:48, 14 Nov 2003 (UTC)

An example of a regular language might be that consisting of all words formed by three a's followed by any number of b's (with the alphabet consisting of a and b). I don't know where the term "regular" comes from, but I guess it's historic; as for non-regular languages, I never heard of a special moniker for them (outside of "non-regular"). -- Schnee 12:16, 13 Jun 2004 (UTC)

As @Jay: mentions, it is not obvious that a language that is not regular should be called nonregular or non-regular. The reason is that irregular fits better as a correct English word (my students were also confused by this and were using irregular). I did add this clarification in the page. @Jochen Burghardt:, can you please reconsider or comment on your revert? Cheater no1 (talk) 16:31, 28 October 2020 (UTC)[reply]

@Cheater no1: As I said in my edit summary, the prefix "non-" means negation in common language, so I guess it is obvious that a non-regular language is a language that is not regular, and this needn't be explained. We don't explain "non-zero number", "non-empty set", etc. either.

If there is a subtle distinction in English between "irregular" and "non-regular", things might be different, and my English knowledge is insufficient to comment about this. In the standard textbook by Hopcroft.Ullman.1979,^[1] none of these negations appears in the index. I wonder if you have a citation for "non-regular" or/and "irregular".

As for the name "regular", it doesn't appear in Chomsky.1956^[2] ("finite-state grammar" is used instead, p.114), but does in Chomsky.1959a^[3] (however with a different meaning, p.149), so it appears to have been introduced in between. - Jochen Burghardt (talk) 18:47, 28 October 2020 (UTC)[reply]

Thank you for the swift comment, @Jochen Burghardt:. I indeed don't see a definition of "nonregular" in our textbook Juraj Hromkovic -- Theoretical Computer Science. The author instead uses only "not regular" as a negation of "regular". I will update this thread if I find a reliable source (other than university presentations) that defines "nonregular", or commenting on the usage of "nonregular" instead of "irregular". Cheater no1 (talk) 12:36, 29 October 2020 (UTC)[reply]

Continuing research on the origin of the name "regular", I found that it doesn't appear in Chomsky.Miller.1958,^[4] while Chomsky.1959b^[5] says in fn.2 Finite state languages are what are called "regular events" in Kleene (1956). Indeed, Kleene.1956^[6] introduces "regular events" on p.1, and the precursor Kleene.1951^[7] defines them on p.46, noting "We would welcome any suggestions as to a more descriptive term." So, it seems that Kleene introduced the term "regular" in 1951, and Chomsky adopted it in 1959. - Jochen Burghardt, 29 October 2020

References

^ John E. Hopcroft and Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Reading/MA: Addison-Wesley. ISBN 0-201-02988-X.
^ Noam Chomsky (Sep 1956). "Three models for the description of language" (PDF). IRE Transactions on Information Theory (PGIT). 2 (3): 113–124.
^ Noam Chomsky (1959). "On Certain Formal Properties of Grammars" (PDF). Information and Control. 2: 137–167. {{cite journal}}: Cite has empty unknown parameter: |month= (help)
^ Noam Chomsky and George A. Miller (1958). "Finite State Languages" (PDF). Information and Control. 1: 91–112.
^ Noam Chomsky (1959). "A Note on Phrase Structure Grammars" (PDF). Information and Control. 2: 393–395.
^ Stephen Cole Kleene (1956). "Representation of Events in Nerve Nets and Finite Automate" (PDF). Automata Studies, Annals of Math. Studies. 34. Princeton, New Jersey: Princeton Univ. Press.
^ Stephen Cole Kleene (Dec 1951). Representation of Events in Nerve Nets and Finite Automate (PDF) (Research Memorandum). U.S. Air Force / RAND Corporation.

NFA's[edit]

'# it can be accepted by a deterministic finite state machine.' is superfluous. Every NFA has an equivalent DFA. So by requiring a regular language to be accepted by an NFA requires it to be accepted by a (the corrisponding DFA)

That's not superfluous. All the properties listed are equivalent, after all, and to the uninitiated, it's not obvious that DFAs and NFAs are basically the same. -- Schnee 12:16, 13 Jun 2004 (UTC)

Formal definition[edit]

This article describes the properties of regular languages and gives a few examples, but I think a kind of formal definition was missing. I added a well-known recursive definition. I think it fits pretty well in the article. What do you think? --Alexandre 22:27, 25 Jun 2004 (UTC)

The formal definition needs to include {ε} as an atomic language.

Union only produces {ε} when one argument is {ε} and the other is {ε} or ∅
Concatenation only produces {ε} when both arguments are {ε}
Kleene Star only produces {ε} when its argument is {ε}

Tobinyehle (talk) 16:56, 22 February 2016 (UTC)[reply]

The section Regular language#Examples says that ∅^* yields {ε}. That was my reason to revert the edit of 21 Feb. I just looked it up: the definition of Aho.Hopcroft.Ullman.1974 (sect.9.1, p.318) agrees with the example. - Jochen Burghardt (talk) 21:36, 22 February 2016 (UTC)[reply]

Kleene star[edit]

The recursive definition is quite nice. There's something I don't understand here, though. Why is the kleene star required in the recursive definition? It seems to me that you can construct it by induction using concatenation.

Here's a proof. Is it incorrect somewhere?

Suppose L is regular. If $\{\epsilon \}\cup L\cup L^{2}\cup \cdots \cup L^{n}$ is regular, then concatenate $L$ to get that $L\cup L^{2}\cup \cdots \cup L^{n+1}$ is regular. Union with $\{\epsilon \}$ . Therefore L* is regular.

The trick is that ordinary induction cannot "go to infinity". Your proof only establishes that the language

\{\epsilon \}\cup L\cup L^{2}\cup \cdots \cup L^{n}

is regular for all integers n. But none of these languages includes all of L^*, only their union does, and regular languages are not closed under infinite unions. (if they were, every countable language would be regular) Deco 21:56, 5 December 2005 (UTC)[reply]

Properties of non-regular languages[edit]

Can anybody talk about properties of non - regular languages? (unsigned, undated)

it would be of great help if some one could post more info about non regular languages 7 OCT 2007 —Preceding unsigned comment added by 64.234.33.191 (talk) 19:18, 7 October 2007 (UTC)[reply]

What are half, log, and sqrt?[edit]

The article states that regular languages are closed under the "half", "log" and "sqrt" operations. What are these? I could guess that $\mathrm {sqrt} (L)=\{x\in \Sigma ^{*}|xx\in L\}$ , but that is not the inverse of the square of a language. I have no idea for the other two. Eric119 19:31, 14 August 2006 (UTC)[reply]

I have removed the claim, since it seems bogus to me. If these functions are real and anyone knows what they are, please add them back in (with an explanation!). Eric119 05:32, 18 August 2006 (UTC)[reply]

I don't have a definition, but I think I have some examples (just guessing)-- for example, if

L=a^{*}b^{*}

(i.e.

L=\Sigma ^{*}

for

\Sigma =\{a,b\}

) then half(L) would be the set of strings starting only with the letter b. The other half would be the strings starting only with the letter a. The two halves add to the whole language. The square-root would be

sqrt(L)=(ab)^{*}=a^{*}=b^{*}

-- i.e. just take half number of the generators. Finally, log just extracts the generators from out of the language, so if

L=a^{*}

then

log(L)=\{a\}

. Or something like that. I don't know the general definition. linas 04:00, 26 April 2007 (UTC)[reply]

The statement is true. http://www.cs.caltech.edu/~cs20/a/hw/hw2/solution/sol2.pdf (i hope i am not infringing anything by posting this link)

Not at all. And in the homework assignment corresponding to those solutions [1] is the definition of these sets:

Half(L) = { x | ∃ y ( |y| = |x| and xy ∈ L)}

Sqrt(L) = { x | ∃ y ( |y| = |x|² and xy ∈ L)}

Log(L) = { x | ∃ y ( |y| = 2^|x| and xy ∈ L)}

What is not clear to me is that these are significant at all, other than as a homework problem. — Carl (CBM · talk) 18:10, 11 September 2007 (UTC)[reply]

The above operations are discussed in some detail in the recent textbook

Jeffrey Shallit: A second course in formal languages and automata theory, 2009, Chapter 3.

The book also points to further studies of regularity preserving transformations, for example

Dexter Kozen: On Regularity-Preserving Functions. Bulletin of the EATCS 58:131-138 (1996)

An additional recent reference in the research literature studying Half(L) is

Michael Domaratzki: State Complexity of Proportional Removals. Journal of Automata, Languages and Combinatorics 7(4): 455-468 (2002).

I do not know if that answers your question, that is, if you think that coverage in the (mathematically oriented) research literature is enough to witness significance of a topic. You might find some additional motivation for studying these operations in one of the above references. Hermel (talk) 10:59, 16 February 2009 (UTC)[reply]

HALF(L)[edit]

Regular languages most definitely are closed under this operation. The proof is a little long and I am not including it here. It can be found in the text of Problem Solving in Automata, Languages, and Complexity by Ding-Zhu Du and Ker-I Ko. The basic idea however is that if L is regular, then a DFA exists that accepts it. We want to make a new machine that has this original machine and also a copy of it that runs backwards (non-deterministically). We want to run these in parallel on input. If the parallel machines stops with both machines in the same state (note that this state is not necessarily in the set of final states of the original machine) then we have half of L Isometric 05:40, 17 October 2006 (UTC)[reply]

Thanks. I was not aware of what the "half" function is, and didn't succeed in finding out. Though just now I did a Google search that found the answer. I'll add a definition to the article. Eric119 06:05, 18 October 2006 (UTC)[reply]

Incidentally, I believe I proved that the regular languages are closed under the "sqrt" operation I defined above, but it was rather tricky. Eric119 06:12, 18 October 2006 (UTC)[reply]

I removed the statement

* HALF(L), the set of strings that are the first half of a string in L

from the article, as its (clearly?) wrong. A standard textbook example of a language that is not regular is

\{a^{n}b^{n}\vert n\geq 0\}

. I'll try to supply a proof that its not regular shortly. linas 03:47, 26 April 2007 (UTC)[reply]

Isometric is talking about something completely different; he is not talking about "half of a string", he is talking about "half of a language", which has "half" as many strings in it as the original language does. Similarly for the definition sqrt(L), and log(L). I can kinda guess what these are, but this claim would need to be backed up by a longer article defining them so that we don't get mistakes like this. linas 03:55, 26 April 2007 (UTC)[reply]

Linas, I don't get the point of your example. That is a non-regular langauge whose "half" a* is regular. Here I am defining half as it said in the article which more explicitly is half(L) = {x | exists y length(x) = length(y) and xy in L}. Incidentally, the document at http://www.comp.nus.edu.sg/~sanjay/cs3231/half.pdf claims a proof of closure, but I haven't verified it myself. Eric119 20:15, 28 April 2007 (UTC)[reply]

Regular language is closed under HALF. If A is a regular language, then HALF(A) = {x|xy in A and |x|=|y|}. The proof is like this: Given DFA D that recognizes A, you can guess that, given a string w in A, after running through the first half of w, you will end up in some state q in D. So you make another machine M with initial state q, and make every transition in A to be a transition with 0,1, assume alphabet {0,1}. M has the same set of accept states as A. Now you can run D and A in parallel, this can be done with subset construction. You accept x if running x on A ends up in state q, and running {0,1}* for |x| steps on M ends in an accept state. You repeat this for every state q of D and accept x if one of them both have D end up in state q and M start from q and end up in an accept state.

This proof works in general, that is, any constant fraction of a regular language is also regular. E.g. if A is regular, then the first third, or the middle third, or the last third, all three languages are regular.

The following comment was deleted by IP editor 192.5.109.34 (talk). I'm restoring it. Hermel (talk) 19:38, 11 March 2011 (UTC)[reply]

On the other hand, if A is regular, then the language derived by remove the middle third of a string in A is not regular. An example is a*b+c*, assume alphabet {a,b,c}. You can see the language with middle third removed is not regular by observing a*b+c* intersects with a*c*. The intersection is a^nc^n, but this one is not regular.

The editor 192.5.109.34 (talk) replaced the above paragraph with the following. Hermel (talk) 19:38, 11 March 2011 (UTC)[reply]

On the other hand, if A is regular, then the language derived by remove the middle third of a string in A is not regular. An example is (00*1)*0*. You can see the language with middle third removed is not regular since the language contains all the sequences of the form

0^{i}110^{i},

but does not contain the sequences

0^{i}110^{j}

for i≠j.

Reference: Sterns and Hartmanis, Regularity Preserving Modifications of Regular Expressions, Information and Control, v6, 1963, pp 55-69

The following comment by 128.174.24.69 was deleted as well by 192.5.109.34 (talk). I'm restoring this as well. Hermel (talk) 19:38, 11 March 2011 (UTC)[reply]

The a*b+c* intersection of a*c* does not equal a^n c^n, it equals the empty set. This also appears to be completely unrelated to the problem of removing the middle third. Also, the source cited doesn't appear to be available anywhere on the internet; you could just as well cite nothing at all. —Preceding unsigned comment added by 128.174.24.69 (talk) 05:44, 24 February 2011 (UTC)[reply]

It is related if one third and last third are regular their concatenation is regular. This is not true

BTW, the paper is available in many university libraries; But googling it brings up the following link

Stearns, R; Hartmanis, J (1963). "Regularity preserving modifications of regular expressions". Information and Control. 6 (1): 55–69. doi:10.1016/S0019-9958(63)90110-4.

Admittedly, it is behind a WP:PAYWALL, but still... Hermel (talk) 19:58, 11 March 2011 (UTC)[reply]

The language formed by catenating first and last thirds may be regular, but it isn't the language formed by excising the middle third. In general it has many additional elements, e.g. {aaa, bbb} yields extreme thirds {a, b}, but their catenation is {aa, ab, ba, bb}.

The counterexample for excising middle third (dub EMT) seems to have been garbled at some point. It should be not a*b+c*, but a*bc*. Then to find the overlap of the derived language with a*c*, note that the 'b' in the original word must be in the middle third, so all the c's come from the final third, and thus constitute the original third. Then EMT(a*bc*) ∩ a*c* = {a^nc^n}, so EMT(a*bc*) is not a regular language. QED. --RichardW57m (talk) 10:02, 31 March 2022 (UTC)[reply]

More Than One Definition?[edit]

The Article first describes regular language as one that is accepted by a deterministic finite state machine. Then it defines it using the recursive definition. The point is that either one can be used as definition. However, if one is used as definition, the other one must be proved as theorem.--CBKAtTopsails 16:20, 6 June 2007 (UTC)[reply]

This is implied by the word "equivalent", but may deserve clarification. Dcoetzee 18:15, 11 September 2007 (UTC)[reply]

"On the other hand, it is not known to contain AC0."[edit]

Um... I'm not exactly an expert on the subject, but doesn't it certainly not contain AC0? I'm hesitant to take it out, because I don't really know what I'm talking about (not to mention that I don't want to say they're separate without a source, and just cutting it might imply to the casual reader that it does contain it), but it seems to me as though certain languages in AC0 being regular would allow you to use their DFAs to construct DFAs for L-hard languages, violating the space hierarchy theorem. Twin Bird (talk) 04:40, 1 July 2011 (UTC)[reply]

Merger proposal[edit]

I propose that Rational language (currently orphaned) be merged into Regular language. I think that the content in the rational language article can easily be explained in the context of regular languages, and this article is of a reasonable size that the merging of Rational language will not cause any problems as far as article size or undue weight is concerned. Hermel (talk) 22:01, 3 August 2013 (UTC)[reply]

Are you proposing this just because the names are similar? Regular language is also a particular case of context-free language and a lot of others... That doesn't mean all notions more general than Regular should be merged into Regular. If anything should be done with Rational language is to merge it to rational series—its generating notion. JMP EAX (talk) 23:00, 17 August 2014 (UTC)[reply]

Actually, I'm unable to verify the [more general] def given on the Rational language from either of the sources cited! Both simply use "Rational language" as a 100% synonym for Regular language as far as I can tell. (See Talk:Rational language for specifics.) So simply redirecting that page here (and perhaps mentioning the synonym) is the best solution because there's nothing (verifiable) worth merging anywhere in that stub unless I've missed something in the tomes cited as refs there. JMP EAX (talk) 00:08, 18 August 2014 (UTC)[reply]

Done. JMP EAX (talk) 00:22, 19 August 2014 (UTC)[reply]

Mistake in "Equivalent formalisms" section[edit]

"the number of equivalence classes of its syntactic congruence is finite.[note 8][note 9] (This number equals the number of states of the minimal deterministic finite automaton accepting L.)"

I'm quite sure that the sentence in parentheses is not true. The number of equivalence classes of the syntactic congruence is the size of the syntactic monoid. This is also the size of the transition monoid of the minimal deterministic finite automaton accepting L. However, there are usually more elements in the monoid than states in the automaton. — Preceding unsigned comment added by 81.218.76.25 (talk) 08:45, 6 October 2019 (UTC)[reply]

Indeed, for (ab+ba)* the minimum automaton has 4 states but the syntactic monoid has 15 elements. (The congruences are generated by aba ≡ a, bab ≡ b, abba ≡ baab and aaa ≡ bbb ≡ aaaa ≡ bbbb.) I think we should split the equivalent formalism into:

"the number of equivalence classes of its syntactic congruence is finite.[note 8][note 9]

"the number of equivalence classes of its left syntactic equivalence (also known as the right syntactic congruence.) is finite.[note 8][note 9] (This number equals the number of states of the minimal deterministic finite automaton accepting L.)"

The notes will need corresponding review. --RichardW57m (talk) 14:33, 1 April 2022 (UTC)[reply]

Trace Languages[edit]

We also have regular and rational languages in the domain of trace languages. They are defined using different definitions that coincide for the free monoid of strings in an alphabet, namely recognition by DFAs and the purely algebraic definitions respectively, for not all rational trace languages are regular trace languages. The bad boy example is (U+0F73 TIBETAN VOWEL SIGN II}* in the domain of Unicode strings under canonical equivalence, with decomposable characters treated as syntactic sugar.

Should we mention such divergences here, or just have a headnote redirecting the read to Trace monoid for the use of the terms with respect to trace languages? That article doesn't yet have a section on these languages. --RichardW57m (talk) 12:02, 30 March 2022 (UTC)[reply]

IMO this deserves a mention under the Generalizations section, but not elsewhere, and the main content related to this should go at trace monoid. Caleb Stanford (talk) 14:56, 30 March 2022 (UTC)[reply]

The existence of a generalisation needs to be mentioned somewhere before the reader concludes that this article has nothing to say about these terms for traces. --RichardW57m (talk) 16:04, 30 March 2022 (UTC)[reply]

Wiki Education assignment: Linguistics in the Digital Age[edit]

This article was the subject of a Wiki Education Foundation-supported course assignment, between 21 August 2023 and 11 December 2023. Further details are available on the course page. Student editor(s): Sseligman26 (article contribs).

— Assignment last updated by Fedfed2 (talk) 00:53, 9 December 2023 (UTC)[reply]

[1] John E. Hopcroft and Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Reading/MA: Addison-Wesley. ISBN 0-201-02988-X.

[2] Noam Chomsky (Sep 1956). "Three models for the description of language" (PDF). IRE Transactions on Information Theory (PGIT). 2 (3): 113–124.

[3] Noam Chomsky (1959). "On Certain Formal Properties of Grammars" (PDF). Information and Control. 2: 137–167. {{cite journal}}: Cite has empty unknown parameter: |month= (help)

[4] Noam Chomsky and George A. Miller (1958). "Finite State Languages" (PDF). Information and Control. 1: 91–112.

[5] Noam Chomsky (1959). "A Note on Phrase Structure Grammars" (PDF). Information and Control. 2: 393–395.

[6] Stephen Cole Kleene (1956). "Representation of Events in Nerve Nets and Finite Automate" (PDF). Automata Studies, Annals of Math. Studies. 34. Princeton, New Jersey: Princeton Univ. Press.

[7] Stephen Cole Kleene (Dec 1951). Representation of Events in Nerve Nets and Finite Automate (PDF) (Research Memorandum). U.S. Air Force / RAND Corporation.

[1]

[2]

[3]

[4]

[5]

[6]

[7]