# Template talk:Formal languages and grammars

I disagree with Tyler McHenry's version of this table, and much prefer the earlier one by Chris Pressey. An unrestricted grammar as defined in the Chomsky hierarchy is well defined and explicit. Listing it for both the Turing Machine and Decider rows seems to imply that both are equivalent. I'm co-teaching a formal languages course this semster, and all the students found this table confusing when "unrestricted" is listed in multiple rows. Jim Mahoney 19:09, 27 March 2006 (UTC)

I just want to note that this was addressed, by now (I'm stating it clearly for future readers of this talk page). --Blaisorblade (talk) 15:07, 16 June 2008 (UTC)

## Formal languages and grammars vs. Chomsky hierarchy

Since this template is about formal languages and grammars in general, and not strictly the Chomsky hierarchy (as specified in Chomsky (1959, 1963)), would anyone have a problem if we listed other well-documented proper subset formal languages and grammars that have been discovered since then? For example, indexed languages & grammars have been around since Aho (1968) and have been well studied since then, in e.g. Hopcroft & Ullman (1979), not to mention mildly context-sensitive (Joshi et al, 1975), deterministic context-free, and other major formal languages and grammars. –jonsafari 03:37, 21 September 2006 (UTC)

Yes, this table is too heavily tied up in the Chomsky hierarchy — an important classification scheme, to be sure, but not a good way of organizing the information this template needs to convey, seeing as this template needs to include many other kinds of classifications. Please be bold. :-)   —RuakhTALK 03:48, 21 September 2006 (UTC)

## Subsets not proper

As far as I know, indexed grammars and tree adjoining grammars, as well as context-free and deterministic context-free grammars, generate the same language. Therefore they are no proper subsets. Math1985 21:22, 7 August 2007 (UTC)

Where do you get this information from? I get my information from Hopcroft & Ullman (1979:233,390), Partee et al (1990:536-542), and Sipser (1997), not to mention many works by Vijay-Shanker & Weir. Most of these sources are cited in the respective language articles, where they should be. –jonsafari 21:18, 8 August 2007 (UTC)

I agree that the information proposed by Math1985 is incorrect.

Weir and Joshi actually proved that Combinatory Categorial Grammmars generate the same languages as 3 other families already known to (weakly) generate the same class of languages: tree adjoining grammars, Head Grammars and linear indexed grammars. So there is no problem there.

However, in his book "Taking Scope: The Natural Semantics of Quantifiers", Mark Steedman states on page 105: "Full LCFRS and IG are still properly contained within, and much less expressive than, Context-Sensitive (Type 1) grammars. Hoever, they characterize incommensurable overlapping sets of languages and do not stand in a containment relation" This is in contradiction with the table that suggest that no-name languages generated by Linear Context-Free Rewriting Systems (LCFRS) are properly contained in Indexed Languages.

However LCFRS is an important class, and (as I recall) a lot more tractable for practical purposes than Indexed languages, But that is of course a matter of opinion. I would suggest a mark in the table, referring to a note stating there is no containment relation in that case. Bernard Lang (talk) 14:21, 15 February 2014 (UTC)

I'd prefer to have only proper containments in the table, and (to repeat my Nov 2013 posting) to add a link to an own article devoted to the discussion of formalisms related in a more complex way than just set inclusion. Chomsky hierarchy is a possible place for the latter article. - Jochen Burghardt (talk) 12:19, 14 March 2014 (UTC)
Another reason to remove the LCFRS line is that it is (currently) not even clear that they are weakly equivalent to Thread automata from the involved articles. - Jochen Burghardt (talk) 12:32, 14 March 2014 (UTC)
A source for the latter statement: The article Mildly_context-sensitive_language#Formalisms says: "The larger language class is generated by the following formalisms: ... LCFRS ... The larger class is a subset of the class of languages generated by thread automata, but whether this inclusion is proper is not known..[1]" - Jochen Burghardt (talk) 09:31, 25 May 2014 (UTC)
1. ^ Kallmeyer 2010, p. 216.

## Range of Mildly context-sensitive languages

If I understand correctly, the term "mildly context-sensitive languages" refers to a range of languages broader than the TAL/LIL and EPDA (= L2 in Weir's Control Language Hierarchy). I'm referring to Joshi, Vijay-Shanker and Weir's "The Convergence of Mildly Context-Sensitive Grammar Formalisms" and Weir's "A Geometric hierarchy beyond context-free languages".--Ippei (talk) 16:13, 21 April 2008 (UTC)

If you would like to contribute this information in mildly context-sensitive languages, please do, always citing your reliable sources specifically. –jonsafari (talk) 06:25, 24 April 2008 (UTC)
I will definitely add there mention to Control Language Hierarchy when I get some time. Meanwhile, the mildly context-sensitive language article already says TAL alone is not the MCSL. It kind of bothers me the mismatch in this template and wondering if anyone could come up with an alternative. Fortunately (Weir 1992) has extended EPDA for his hierarchy in the same name. It's just that TAG not corresponding well to the MCSLs (or maybe the language column should be TAL).--Ippei (talk) 21:36, 3 May 2008 (UTC)
If TAL's are indeed a proper subset of MCSL's, then both rows should appear in this template. –jonsafari (talk) 01:57, 5 May 2008 (UTC)
That's a good point. The four weakly equivalent grammars (TAG,CCG,LIG,HG) indeed defines the language which Weir's Control Language Hierarchy calls Level-2 (Level-1 is CFL). The properties of Level-k (for some finite k>1) corresponds well with the "rough" definition of MCSL by Joshi. Probably the problem is MCSL not defined in a very proper way, as such does not fit very well into this table. --Ippei (talk) 14:15, 5 May 2008 (UTC)
I've edited the table minimally reflecting the facts but keeping the convenience. Hope it's the right way to deal with it. --Ippei (talk) 22:27, 8 May 2008 (UTC)

According to the article Mildly context-sensitive language#Formalisms, "The notion of mild context-sensitivity does not designate a single class of languages, but applies to any language class meeting the criteria in the definition". However, the template currently suggests that there is a single class called "mildly context-sensitive languages", which is recognized by thread automata and generated by linear context-free rewriting systems. I'd like to replace the "mildly context-sensitive" entry by "(no common name)"; possibly a note could say that this language class, as well as that of tree-adjoining languages below it, belongs to the set of language classes meeting the "mildly context-sensitive" criteria.

By the way: shouldn't the template somewhere explicitly state the horizontal correspondence (e.g. regular grammars generate regular languages, which are accepted by finite automata)? - Jochen Burghardt (talk) 22:38, 2 February 2014 (UTC)

## Catagorisation of further automata

Hello, this is a great template! There are however a significant number of (sometimes oprhaned) articles pertaining rather more obscure automata that are not included. Could the template be extended to include these? Here's a rough list of candidates, where indentation is a possible subset example;

Parity automaton
Büchi automaton / Muller automaton / Streett automaton / Rabin automaton
Kripke structure
Tree walking automaton
Pebble automaton
Quantum finite automata
Learning Automata
Levenshtein automaton
Lattice gas automaton
Continuous spatial automaton
Semiautomaton
Probabilistic automaton
Continuous automaton

(Some of the above could be examples of Cellular Automata, I didn't investigate very far.) A fair few of these I am completely unfamiliar with. I plan to read up on the articles and draft a possible template extension, depending on consensus. --BlueNovember (talk) 12:07, 17 October 2008 (UTC)

## Recursive $\subset$ Recursively Enumerable?

I wonder why the "Type 0" row is at the top. Aren't recursively enumerable languages a subset of recursive languages, thus the second row should be at the top? —Preceding unsigned comment added by 79.211.162.151 (talk) 22:53, 1 December 2009 (UTC)

I think the table is correct having Type 0 row on top and recursive languages next. Recursive languages are a subset of recursively enumerable languages, not the other way round. --Ippei (talk) 05:43, 11 January 2010 (UTC)

## CFG $\cup$ CSG $\neq$ CSG

In other words: There are context-free grammars (in particular those with non-harmless $\varepsilon$-rules) that are no context-sensitive grammars (since the latter permit only harmless $\varepsilon$-rules in order to produce the empty string). Thus, the line saying "languages or grammars are proper subsets of ..." should be corrected. Instead, grammars should be added to the sentence about automata. --Zahnradzacken (talk) 18:05, 27 February 2010 (UTC)

Well then, it's correct now. --Zahnradzacken (talk) 18:53, 19 March 2010 (UTC)

## Star-free grammar?

Rather than offering no name for the grammar of a star-free language, could it not unambiguously be referred to as a Star-free grammar and be linked directly to the Star-free language article? --24.26.130.82 (talk) 21:03, 22 May 2011 (UTC)

## Undescribable languages

Over any given finite alphabet, there are uncountably many possible formal languages but only countably many possible formal grammars and/or Turing machines. This implies that there are formal languages which cannot be described by any formal grammar or recursively enumerated by any Turing machine. Is it worth recording the existence of these languages? This would be the highest row in the table if so. 195.212.29.92 (talk) 13:09, 27 May 2011 (UTC)

I consider undescribable languages worth to be mentioned in some article(s). However, they wouldn't fit into the template, as they are not a superset of any language there, but are disjoint to each of them. Concerning the complementary notion "describable language", some care must be taken not to run into paradoxes like "the smallest language class than cannot be described in fewer than fourteen words".
Referring to the above section #Catagorisation of further automata, I agree that an overview over the automata mentioned there (plus "Van Wijngaarden grammar", which came to my mind) should be given somewhere outside this talk page. Maybe, an own article should be devoted to the discussion of formalisms related in a more complex way than just set inclusion. The template should remain restricted to the main simple-inclusion hierarchy, but link to such a full-overview article. - Jochen Burghardt (talk) 11:04, 6 November 2013 (UTC)

## Problem with table rendering

There is a layout problem in this table with at least two browsers: IE7 and Google. It could be purely a browser bug but it could also be a problem in the Wikimedia system.

Problem: When text in a table cell is too long and must span more than one line, the table rows lose their alignment, which makes the table confusing. The cell that does this is "Linear context-free rewriting systems etc."

IE7 seems to have a permanent limit on cell width while the line break occurs in Google Chrome 16 only if you make the window narrow (not wide enough). In both cases is the table layout broken. 83.226.178.77 (talk) 22:41, 8 January 2012 (UTC)

I can confirm this occurs on Firefox as well whenever the window isn't too wide (narrower than 1000px or so on my setup). Definitely needs to be fixed. /blahedo (t) 06:18, 10 February 2012 (UTC)
I don't get this bug on Chromium. A temporary fix could be to remove the "rewriting systems" part. Andreas vc (talk) 23:44, 16 June 2012 (UTC)

## Removing "recursive grammar"

I am going to revert a recent edit that added "recursive grammar" to the hierarchy. As far as I know, there is no such category in the hierarchy for grammars. This issue is being discussed at Wikipedia:Articles for deletion/Recursive grammar. If there is such a category and you wish to revert my edit, please provide a reference to that effect from a reliable source. Thanks, --Mark viking (talk) 00:44, 31 March 2013 (UTC)

Today, such an entry has been added again in good faith. Apparently, the name "Recursive" in column "Languages" is tempting to draw such a connection. However, a "recursive language" (meaning a decidible language) does not correspond to a "recursive grammar" (meaning just a grammar with recursive rules).
In order to avoid repetion of this confusion, I changed in column "Languages" the entry "Recursive" to "Decidable" which
• redirects to "recursive language",
• fits well with "Decider" in column "Minimal automaton", and
• doesn't invite to insert "recursive grammar" (an article "Decidable grammar" does not exist, and need not exist).
Jochen Burghardt (talk) 18:11, 11 January 2014 (UTC)
This looks good to me, thanks. --Mark viking (talk) 22:33, 19 January 2014 (UTC)
I changed it purely as part of a clean-up exercise on navboxes. Among other things, redirects should, in general (redirects to sections of another article are less clear-cut), be avoided in navboxes, so that the link can be displayed (automatically) in bold when the reader is looking at that article - this helps in navigation, which is what navboxes are designed for. You can pipe "decidable" to "recursive language" if you want, but it should not be left as a redirect. --NSH001 (talk) 20:21, 21 January 2014 (UTC)
Sorry, I wasn't aware of that redirect problem in navboxes. Now I piped "decidable" to "recursive language" as you suggested. - Jochen Burghardt (talk) 22:02, 21 January 2014 (UTC)