Jump to content

Alphabet (formal languages): Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Tag: Reverted
details of set representation needn't explained here; by analogy, prime factorization doesn't elaborate on multiplication algorithms
Line 7: Line 7:


==Notation==
==Notation==
Alphabets are sets of symbols. As such, they may be described by [[set-builder notation]] as with other mathematical sets. This notation uses curly brackets to denote the set with the members specified explicitly with the brackets as a list, e.g., <math>\{0, 1, 2, 3, 4, 5, 6, 7, 8 ,9\}</math>, or specified implicitly as a variable followed by a vertical bar (or colon) and then conditions that variable satisfies. The set then is comprised of all members from the [[domain of discourse]] that satisfy those conditions. For example, the alphabet consisting of letter <math>v</math>'s sub-scripted by prime numbers could be denoted as <math>\{v_x | x \text{ is a prime number}\}</math>.

If ''L'' is a formal language, i.e. a (possibly infinite) set of finite-length strings, the '''alphabet of ''L''''' is the set of all symbols that may occur in any string in ''L''.
If ''L'' is a formal language, i.e. a (possibly infinite) set of finite-length strings, the '''alphabet of ''L''''' is the set of all symbols that may occur in any string in ''L''.
For example, if ''L'' is the set of all [[Identifier#In computer languages|variable identifiers]] in the programming language [[C (programming language)|C]], ''L''’s alphabet is the set { a, b, c, ..., x, y, z, A, B, C, ..., X, Y, Z, 0, 1, 2, ..., 7, 8, 9, _ }.
For example, if ''L'' is the set of all [[Identifier#In computer languages|variable identifiers]] in the programming language [[C (programming language)|C]], ''L''’s alphabet is the set { a, b, c, ..., x, y, z, A, B, C, ..., X, Y, Z, 0, 1, 2, ..., 7, 8, 9, _ }.

Revision as of 17:48, 27 March 2021

In formal language theory, an alphabet is a non-empty set of symbols/glyphs, typically thought of as representing letters, characters, or digits[1] but among other possibilities the "symbols" could also be a set of phonemes (sound units). Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite (e.g., the alphabet of letters "a" through "z"), countable (e.g., ), or even uncountable (e.g., ).

Strings, also known as "words", over an alphabet are defined as a sequence of the symbols from the set. For example, the alphabet of lowercase letters "a" through "z" can be used to form English words like "iceberg" while the alphabet of both upper and lower case letters can also be used to form proper names like "Wikipedia". A common alphabet is {0,1}, the binary alphabet, and a "00101111" is an example of a binary string. Infinite sequence of symbols may be considered as well.

It is often necessary for practical purposes to restrict the symbols in an alphabet so that they are unambiguous when interpreted. For instance, if the two-member alphabet is {00,0}, a string written on paper as "000" is ambiguous because it is unclear if it is a sequence of three "0" symbols, a "00" followed by a "0", or a "0" followed by a "00".

Notation

If L is a formal language, i.e. a (possibly infinite) set of finite-length strings, the alphabet of L is the set of all symbols that may occur in any string in L. For example, if L is the set of all variable identifiers in the programming language C, L’s alphabet is the set { a, b, c, ..., x, y, z, A, B, C, ..., X, Y, Z, 0, 1, 2, ..., 7, 8, 9, _ }.

Given an alphabet , the set of all strings of length over the alphabet is indicated by . The set of all finite strings (regardless of their length) is indicated by the Kleene star operator as , and is also called the Kleene closure of . The notation indicates the set of all infinite sequences over the alphabet , and indicates the set of all finite or infinite sequences.

For example, using the binary alphabet {0,1}, the strings ε, 0, 1, 00, 01, 10, 11, 000, etc. are all in the Kleene closure of the alphabet (where ε represents the empty string).

Applications

Alphabets are important in the use of formal languages, automata and semiautomata. In most cases, for defining instances of automata, such as deterministic finite automata (DFAs), it is required to specify an alphabet from which the input strings for the automaton are built. In these applications, an alphabet is usually required to be a finite set, but is not otherwise restricted.

When using automata, regular expressions, or formal grammars as part of string-processing algorithms, the alphabet may be assumed to be the character set of the text to be processed by these algorithms, or a subset of allowable characters from the character set.

See also

References

  1. ^ Ebbinghaus, H.-D.; Flum, J.; Thomas, W. (1994). Mathematical Logic (2nd ed.). New York: Springer. p. 11. ISBN 0-387-94258-0. By an alphabet we mean a nonempty set of symbols.

Literature