Q-systems

From Wikipedia, the free encyclopedia

Q-Systems are a formalism for transforming directed graphs according to given grammar rules, developed by Alain Colmerauer at the Université de Montréal in 1967–70 for use in natural language processing. The Université de Montréal's machine translation system, TAUM-73, used the Q-Systems as its language formalism.

The data structure manipulated by a Q-system is a Q-graph: a directed acyclic graph with one entry node and one exit node, in which each arc bears a labelled ordered tree. An input sentence is usually represented by a linear Q-graph in which each arc bears a word (a tree reduced to a single node labelled by that word). After analysis, the Q-graph is usually a bundle of one-arc paths, each arc bearing a possible analysis tree. After generation, the goal is usually to obtain one path per desired output, again with one word per arc.
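As a minimal sketch of the data structure described above, the following Python code represents a Q-graph as numbered nodes with a distinguished entry and exit, where each arc carries a labelled ordered tree. The class names `Tree` and `QGraph` and the helper `linear_qgraph` are hypothetical illustrations, not part of any Q-systems implementation; the sketch builds the linear Q-graph of an input sentence and enumerates its entry-to-exit paths.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tree:
    """A labelled ordered tree: a node label plus an ordered tuple of subtrees."""
    label: str
    children: tuple = ()


@dataclass
class QGraph:
    """A Q-graph: one entry node, one exit node, and arcs bearing trees."""
    entry: int
    exit: int
    # Each arc is a triple (source node, target node, tree borne by the arc).
    arcs: list = field(default_factory=list)

    def paths(self, node=None):
        """Yield every path from `node` (default: entry) to the exit as a list of arcs.

        The recursion terminates because a Q-graph is acyclic."""
        node = self.entry if node is None else node
        if node == self.exit:
            yield []
            return
        for arc in self.arcs:
            if arc[0] == node:
                for rest in self.paths(arc[1]):
                    yield [arc] + rest


def linear_qgraph(sentence):
    """Linear Q-graph for a sentence: nodes 0..n, one single-node tree per word."""
    words = sentence.split()
    arcs = [(i, i + 1, Tree(w)) for i, w in enumerate(words)]
    return QGraph(entry=0, exit=len(words), arcs=arcs)


g = linear_qgraph("the cat sleeps")
print([[tree.label for _, _, tree in path] for path in g.paths()])
# [['the', 'cat', 'sleeps']]
```

Appending an arc such as `(0, 3, Tree("S"))` to this graph adds a second entry-to-exit path, which is how a Q-graph can hold several alternative analyses side by side.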

A Q-System consists of a sequence of Q-treatments, each of which is a set of Q-rules of the form <matched_path> == <added_path> [<condition>]. The Q-treatments are applied in sequence, unless one of them produces the empty Q-graph, in which case the result is the last non-empty Q-graph obtained. All three parts of a rule can contain variables for labels, trees, and forests. Every variable appearing after "==" must also appear in the <matched_path> part. Variables are local to rules.
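The rule mechanism relies on instantiation: the left-hand path is matched against trees that contain no variables, and the resulting bindings are substituted into the right-hand path. The sketch below illustrates this under simplifying assumptions: only label variables (`LabelVar`) and tree variables (`TreeVar`) are modelled, forest variables and the optional condition are omitted, and all names, as well as the rule shown, are hypothetical rather than actual Q-systems syntax. `check_rule` enforces the restriction that every variable after "==" appears on the left, which guarantees that `instantiate` never meets an unbound variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelVar:
    """Hypothetical label variable: stands for a single node label."""
    name: str


@dataclass(frozen=True)
class TreeVar:
    """Hypothetical tree variable: stands for a whole subtree."""
    name: str


@dataclass(frozen=True)
class Tree:
    label: object          # a str, or a LabelVar inside a rule pattern
    children: tuple = ()


def match(pattern, tree, env):
    """One-way matching of a pattern against a ground tree.

    Returns the environment extended with new bindings, or None on failure.
    Only the pattern contains variables, so bindings flow in one direction."""
    if isinstance(pattern, TreeVar):
        if pattern.name in env:
            return env if env[pattern.name] == tree else None
        return {**env, pattern.name: tree}
    if isinstance(pattern.label, LabelVar):
        name = pattern.label.name
        if name in env and env[name] != tree.label:
            return None
        env = {**env, name: tree.label}
    elif pattern.label != tree.label:
        return None
    if len(pattern.children) != len(tree.children):
        return None
    for sub_pattern, sub_tree in zip(pattern.children, tree.children):
        env = match(sub_pattern, sub_tree, env)
        if env is None:
            return None
    return env


def match_path(patterns, trees):
    """Match a rule's left-hand path, arc by arc, against a path of trees."""
    if len(patterns) != len(trees):
        return None
    env = {}
    for pattern, tree in zip(patterns, trees):
        env = match(pattern, tree, env)
        if env is None:
            return None
    return env


def instantiate(pattern, env):
    """Replace every variable of a right-hand-side pattern by its binding."""
    if isinstance(pattern, TreeVar):
        return env[pattern.name]
    label = pattern.label
    if isinstance(label, LabelVar):
        label = env[label.name]
    return Tree(label, tuple(instantiate(c, env) for c in pattern.children))


def variables(pattern):
    """Names of all variables occurring in a pattern tree."""
    if isinstance(pattern, TreeVar):
        return {pattern.name}
    own = {pattern.label.name} if isinstance(pattern.label, LabelVar) else set()
    return own.union(*(variables(c) for c in pattern.children))


def check_rule(lhs, rhs):
    """Every variable after '==' must also occur in the matched path."""
    left = set().union(*(variables(t) for t in lhs))
    right = set().union(*(variables(t) for t in rhs))
    return right <= left


# Hypothetical rule: DET(D) + N(W) == NP(D, W)
lhs = [Tree("DET", (TreeVar("D"),)), Tree("N", (TreeVar("W"),))]
rhs = [Tree("NP", (TreeVar("D"), TreeVar("W")))]
path = [Tree("DET", (Tree("the"),)), Tree("N", (Tree("cat"),))]
env = match_path(lhs, path)
print([instantiate(t, env) for t in rhs])
```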

A Q-treatment works in two steps: addition and cleaning. It first applies all its rules exhaustively, using instantiation (one-way unification), thereby adding new paths to the current Q-graph (added arcs and their trees can themselves be used to produce new paths). If and when this addition process halts, all arcs used in some successful rule application are erased, as are all unused arcs that no longer lie on any path from the entry node to the exit node. Hence the result, if any (that is, if the addition step terminates), is again a Q-graph. This allows several Q-Systems to be chained, each performing a specialized task, to form a complex system. For example, TAUM-73 consisted of fifteen chained Q-Systems.
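The two steps of a Q-treatment, and the sequencing of treatments, can be sketched as follows. To keep the example short, it assumes that every arc carries an atomic label rather than a full tree and that rules match literal label sequences (no variables or conditions); a Q-graph is a tuple `(entry, exit, arcs)` with arcs written `(id, source, target, label)`. Each rule is applied at most once to a given sequence of arcs, and a hypothetical `max_steps` bound stands in for the case where addition never halts. This illustrates the behaviour described above; it is not the original algorithm.

```python
from itertools import count


def paths_from(arcs, node, length):
    """Yield every sequence of exactly `length` consecutive arcs starting at `node`."""
    if length == 0:
        yield ()
        return
    for arc in arcs:
        if arc[1] == node:
            for rest in paths_from(arcs, arc[2], length - 1):
                yield (arc,) + rest


def reachable(start, arcs, forward=True):
    """Nodes reachable from `start`, following arcs forwards or backwards."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, src, dst, _ in arcs:
            a, b = (src, dst) if forward else (dst, src)
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def treatment(graph, rules, max_steps=10_000):
    """Apply one Q-treatment: exhaustive addition, then cleaning.

    graph: (entry, exit, arcs), arcs as (id, source, target, label).
    rules: list of (lhs_labels, rhs_labels) pairs, both non-empty."""
    entry, exit_, arcs = graph
    arcs = list(arcs)  # never mutate the caller's Q-graph
    nodes = [entry, exit_] + [n for a in arcs for n in (a[1], a[2])]
    new_node = count(max(nodes) + 1)
    new_id = count(max([-1] + [a[0] for a in arcs]) + 1)
    applied, used = set(), set()

    # Addition step: keep applying rules until no new application is possible.
    changed = True
    while changed:
        changed = False
        for r, (lhs, rhs) in enumerate(rules):
            starts = {a[1] for a in arcs}
            for path in [p for s in starts for p in paths_from(arcs, s, len(lhs))]:
                key = (r, tuple(a[0] for a in path))
                if key in applied or tuple(a[3] for a in path) != tuple(lhs):
                    continue
                applied.add(key)
                used.update(a[0] for a in path)
                # The added path joins the start and end nodes of the matched path.
                hops = [path[0][1]] + [next(new_node) for _ in rhs[1:]] + [path[-1][2]]
                for i, label in enumerate(rhs):
                    arcs.append((next(new_id), hops[i], hops[i + 1], label))
                changed = True
                if len(applied) > max_steps:
                    raise RuntimeError("addition step did not halt within max_steps")

    # Cleaning step: erase used arcs, then arcs lying on no entry-to-exit path.
    arcs = [a for a in arcs if a[0] not in used]
    fwd = reachable(entry, arcs, forward=True)
    bwd = reachable(exit_, arcs, forward=False)
    return (entry, exit_, [a for a in arcs if a[1] in fwd and a[2] in bwd])


def run_q_system(graph, treatments):
    """Apply treatments in sequence; an empty result returns the last non-empty graph."""
    for rules in treatments:
        result = treatment(graph, rules)
        if not result[2]:
            return graph
        graph = result
    return graph


g0 = (0, 3, [(0, 0, 1, "DET"), (1, 1, 2, "N"), (2, 2, 3, "V")])
print(run_q_system(g0, [[(("DET", "N"), ("NP",))], [(("NP", "V"), ("S",))]]))
# (0, 3, [(4, 0, 3, 'S')])
```

For instance, on the path `A B C` with the rules `A B == AB` and `B C == BC`, both rules fire and consume all three original arcs, but `AB` and `BC` overlap rather than chain, so no entry-to-exit path survives cleaning: the treatment yields the empty Q-graph, and `run_q_system` returns its input unchanged.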

An extension of the basic idea of the Q-Systems, namely replacing instantiation by unification (put simply, allowing "new" variables in the right-hand side of a rule and replacing parametrized labelled trees by logical terms), led to Prolog, designed by Alain Colmerauer and Philippe Roussel in 1972. Refinements in the other direction (reducing non-determinism and introducing typed labels) by John Chandioux led to GramR, used for programming METEO from 1985 onward.
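To make the difference from instantiation concrete, here is a textbook-style sketch of first-order unification over logical terms (Robinson-style, with an occurs check), in which variables may appear in both terms and bindings flow in both directions. The term encoding (a `Var` class, tuples of the form `(functor, arg1, ...)`, strings as constants) is an assumption made for illustration; it is not Prolog's actual implementation, and many Prolog systems omit the occurs check for speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logical variable; it may occur in either term being unified."""
    name: str

# A term is a Var, a constant (str), or a compound tuple (functor, arg1, ..., argN).


def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Occurs check: does `var` appear inside `term` under `subst`?"""
    term = walk(term, subst)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, arg, subst) for arg in term[1:])


def unify(a, b, subst=None):
    """Two-way unification; returns a substitution dict, or None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


print(unify(("np", Var("X"), "cat"), ("np", "the", Var("Y"))))
# {Var(name='X'): 'the', Var(name='Y'): 'cat'}
```

Here `X` is bound by the second term and `Y` by the first; a one-way matcher like the one used for Q-rules would reject this pair, because the second term is not ground.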

In 2009, Hong Thai Nguyen of GETALP, LIG (Laboratoire d'Informatique de Grenoble) reimplemented the Q-language in C, using ANTLR to compile the Q-systems and the Q-graphs, and an algorithm proposed by Christian Boitet (as none had been published and the sources of the earlier Fortran implementation had been lost). David Cattanéo corrected, completed and extended this implementation in 2010–11, notably to support labels using Unicode characters rather than only the printable characters of the CDC 6600 used by the historical version.

History

The METEO System is a very high quality machine translation system for weather bulletins that was in operational use at Environment Canada from 1982 to 2001. It stems from a prototype developed in 1975–76 by the TAUM Group, known as TAUM-METEO. As many authors confuse the prototype with the actual system, some history is in order.

The initial motivation for developing the prototype came from a junior translator at Environment Canada, who came to TAUM to ask for help with the job he then had to do: translating weather bulletins, a task that was both extremely tedious and difficult.

Because of the official bilingual services act of 1968, all official communications emanating from the Canadian government must be available in both French and English. Weather bulletins represent a large volume of real-time translation, and junior translators had to spend several months of "purgatory" producing first-draft translations, which senior translators then revised. The job was in fact quite difficult, because of the specific features of the English and French sublanguages used, and not very motivating, as the lifetime of a bulletin is only 4 hours.

TAUM proposed to build a prototype MT system, and Environment Canada agreed to fund the project. A prototype was ready after a few months, crudely integrated into the translation workflow (at the time, source and target bulletins travelled over telex lines, and MT was performed on a mainframe). The first version of the system (METEO 1) went into operation on a Control Data 7600 supercomputer in March 1977.

John Chandioux then left the TAUM group to manage the system's operation and improve it, while the TAUM group embarked on a very different project (TAUM-aviation, 1977–81). Working first with Benoit Thouin and, a year later, alone, he made many improvements to the initial prototype and turned it into a truly operational system. After 3 years, METEO 1 had demonstrated the feasibility of microcomputer-based machine translation to the satisfaction of the Canadian government's Translation Bureau.

METEO 1 was formally adopted in 1981, replacing the junior translators in the workflow. Because very high quality was required, the revision step performed by senior translators was kept. The quality, measured from the edit operations needed to revise the MT results (inserting or deleting a word counts as 1, replacing a word as 2), reached 85% in 1985.
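The article gives only the edit weights, not the exact formula. As a hedged illustration, the sketch below assumes that quality is one minus the weighted number of word edits divided by the number of words in the MT output, and computes the minimum weighted edit cost with the standard dynamic-programming (Levenshtein-style) recurrence. The function names and the normalization are assumptions, not the measure actually used by the Translation Bureau.

```python
def weighted_edits(mt_output, revised):
    """Minimum word-level edit cost from the MT output to its revised version.

    Weights from the text: inserting or deleting a word costs 1, replacing one costs 2."""
    a, b = mt_output.split(), revised.split()
    prev = list(range(len(b) + 1))            # cost of building b[:j] from nothing
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)              # cost of deleting all of a[:i]
        for j in range(1, len(b) + 1):
            keep_or_replace = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else 2)
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         keep_or_replace)
        prev = cur
    return prev[len(b)]


def quality(mt_output, revised):
    """Hypothetical score: 1 - weighted edits / number of words in the MT output."""
    return 1 - weighted_edits(mt_output, revised) / len(mt_output.split())


print(quality("a b c d e f g h i j", "a b c d e f g h i k"))
```

With these weights, replacing a word costs exactly as much as deleting it and inserting another, so replacing one word in a 10-word bulletin lowers this hypothetical score to 80%.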

Until then, the MT part was still implemented as a sequence of Q-systems. The Q-systems formalism is a rule-based SLLP (Specialized Language for Linguistic Programming) invented by Alain Colmerauer in 1967, while he was a postdoc at the TAUM group as a French coopérant (civilian national service abroad). (He invented the famous Prolog language in 1972, after returning to France and becoming a university professor in Marseille-Luminy.)

Because the Q-systems engine is highly non-deterministic, and the data structures it manipulates are in some respects far too simple (with no types such as string or number), J. Chandioux ran into limitations in his efforts to raise translation quality and lower computation time to the point where the system could run on microcomputers.

In 1981, he decided to create a new SLLP, or metalanguage for linguistic applications, based on the same basic algorithmic ideas as the Q-systems, but more deterministic and offering typed labels on tree nodes. Following the advice of Bernard Vauquois and Alain Colmerauer, he created GramR and developed it for microcomputers.

In 1982, he began developing in GramR a new system for translating weather bulletins on a high-end Cromemco microcomputer. METEO 2 went into operation in 1983. The software then ran in 48 KB of main memory, with a 5 MB hard disk for paging. METEO 2 is believed to have been the first MT application to run on a microcomputer.

In 1985, the system retained nothing of the initial prototype and was officially renamed METEO. It translated about 20 million words per year from English into French and 10 million words from French into English, with a quality of 97%. Typically, a bulletin sent in English from Winnipeg came back in French, after MT and human revision, in only 4 minutes.

In 1996, John Chandioux developed a special version of his system (METEO 96) which was used to translate the weather forecasts (different kinds of bulletins) issued by the US Weather Service during the Atlanta Olympic Games.

The latest known version of the system, METEO 5, dates from 1997 and ran on a standard IBM PC network under Windows NT. It translated 10 pages per second and was small enough to fit on a 1.44 MB diskette. Chandioux apparently lost his contract with Environment Canada to a competitor in 2001 or 2002. The heated legal dispute that followed was once described in a Wikipedia article, which seems to have since disappeared. As of 2011, no document on the Environment Canada website indicated which system was then in use.

References

  • Colmerauer, A.: Les systèmes Q ou un formalisme pour analyser et synthétiser des phrases sur ordinateur. Mimeo, Montréal, 1969.
  • Nguyen, H.-T.: Des systèmes de TA homogènes aux systèmes de TAO hétérogènes. PhD thesis, UJF, Grenoble, 2009.

External links