Reserved word

From Wikipedia, the free encyclopedia

In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is "reserved from use". This is a syntactic definition, and a reserved word may have no meaning.

A closely related and often conflated notion is a keyword, which is a word with special meaning in a particular context. This is a semantic definition. By contrast, names in a standard library but not built into the language are not considered reserved words or keywords. The terms "reserved word" and "keyword" are often used interchangeably – one may say that a reserved word is "reserved for use as a keyword" – and formal use varies from language to language; for this article we distinguish as above.

In general, reserved words and keywords need not coincide, but in most modern languages keywords are a subset of reserved words, as this makes parsing easier, since keywords cannot be confused with identifiers. In some languages, like C or Python, reserved words and keywords coincide, while in other languages, like Java, all keywords are reserved words, but some reserved words are not keywords – these are "reserved for future use". In yet other languages, such as ALGOL and PL/I, there are keywords but no reserved words, with keywords being distinguished from identifiers by other means.

Distinction

The sets of reserved words and keywords in a language often coincide or are almost equal, and the distinction is subtle, so the terms are often used interchangeably. However, in careful usage they are distinguished.

Making keywords be reserved words makes lexing easier, as a string of characters will unambiguously be either a keyword or an identifier, without depending on context; thus keywords are usually a subset of reserved words. However, reserved words need not be keywords – for example, in Java, goto is a reserved word, but has no meaning and does not appear in any production rules in the grammar. This is usually done for forward compatibility, so a reserved word may become a keyword in a future version without breaking existing programs.
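The relationship described above can be sketched in a few lines of Python. This is only an illustration: the word lists are small assumed subsets (with goto and const taken from the Java example), and classify is a hypothetical helper, not part of any real compiler. It assumes its input already has the shape of a word.

```python
# Illustrative subsets only; a real language defines complete lists in its specification.
KEYWORDS = {"if", "else", "while", "return"}   # reserved words with a grammatical role
RESERVED_UNUSED = {"goto", "const"}            # reserved in Java, but without meaning there
RESERVED = KEYWORDS | RESERVED_UNUSED          # keywords form a subset of reserved words


def classify(word):
    """Classify a word without looking at its context (hypothetical helper)."""
    if word in KEYWORDS:
        return "keyword"
    if word in RESERVED:
        return "reserved, not a keyword"
    return "identifier"


for word in ("if", "goto", "x"):
    print(word, "->", classify(word))
```

Because the classification depends only on the word itself, a lexer built this way never needs to consult the surrounding tokens.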

Conversely, keywords need not be reserved words: their role may be understood from context, or they may be distinguished in another manner, such as by stropping. For example, the phrase if = 1 is unambiguous in most grammars, since the condition of an if clause cannot start with an =, and thus is allowed in some languages, such as FORTRAN. Alternatively, in ALGOL 68, keywords must be stropped (marked in some way to distinguish them), which in the strict language is done by writing them in bold, and thus they are not reserved words. Thus in the strict language the following expression is legal, as the bold keyword if does not conflict with the ordinary identifier if:

if if eq 0 then 1 fi

(Bold is not shown in this plain-text rendering: the first if, then, and fi are bold keywords, and the second if is the ordinary identifier.)

However, in ALGOL 68 there is also a stropping regime in which keywords are reserved words, an example of how these distinct concepts often coincide; this is followed in many modern languages.
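A keyword that is understood from context rather than reserved can also be observed in Python, used here as an illustration instead of the FORTRAN case above. Python 3.10 and later treat match as a contextual ("soft") keyword, and in earlier versions it is a plain identifier, so in either case it is not reserved and the sketch below should run. The variable names are arbitrary.

```python
import keyword

# "match" is not in Python's reserved list, so it remains a legal identifier.
print(keyword.iskeyword("match"))   # False
print(keyword.iskeyword("if"))      # True

# Newer Python versions expose their contextual keywords separately;
# getattr keeps this sketch working where the attribute does not exist.
soft_keywords = getattr(keyword, "softkwlist", [])
print(soft_keywords)

# Using the word as an ordinary variable name is accepted by the parser.
source = "match = 1\nresult = match + 1\n"
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
print(namespace["result"])          # 2
```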

Syntax

A reserved word is one that "looks like" a normal word, but is not allowed to be used as a normal word. Formally this means that it satisfies the usual lexical syntax (syntax of words) of identifiers – for example, being a sequence of letters – but cannot be used where identifiers are used. For example, the word if is commonly a reserved word, while x generally is not, so x = 1 is a valid assignment, but if = 1 is not.
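The same contrast can be checked against Python's own parser, where if is a reserved word. The helper name is_valid_statement is an assumption made for this sketch; compile, str.isidentifier, and keyword.iskeyword are standard Python.

```python
import keyword


def is_valid_statement(source):
    """Return True if Python's parser accepts the source text (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Both words satisfy the lexical syntax of identifiers...
print("x".isidentifier(), "if".isidentifier())   # True True
# ...but only one of them is reserved.
print(keyword.iskeyword("x"), keyword.iskeyword("if"))   # False True

print(is_valid_statement("x = 1"))    # True
print(is_valid_statement("if = 1"))   # False
```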

Keywords have varied uses, but primarily fall into a few classes: parts of the phrase grammar (specifically a production rule with nonterminal symbols), with various meanings, often used for control flow, such as the word if in most procedural languages, which indicates a conditional and takes clauses (the nonterminal symbols); names of primitive types in a language that supports a type system, such as int; primitive literal values, such as true for Boolean true; and sometimes special commands, such as exit. Keywords in phrases are also used for input/output, such as print.

The distinct definitions are clear when a language is analyzed by a combination of a lexer and a parser, and the syntax of the language is generated by a lexical grammar for the words, and a context-free grammar of production rules for the phrases. This is common in analyzing modern languages, and in this case keywords are a subset of reserved words, as they must be distinguished from identifiers at the word level (hence reserved words) to be syntactically analyzed differently at the phrase level (as keywords).

In this case reserved words are defined as part of the lexical grammar, and are each tokenized as a separate type, distinct from identifiers. In conventional notation, the reserved words if and then for example are tokenized as types IF and THEN, respectively, while x and y are both tokenized as type Identifier.
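A minimal sketch of such a lexer follows, assuming a toy language whose only reserved words are if and then. The names RESERVED, WORD, and tokenize are hypothetical, and the sketch ignores everything that is not a word (numbers, operators, whitespace).

```python
import re

# Assumed reserved words of a toy language, mapped to their token types.
RESERVED = {"if": "IF", "then": "THEN"}

# Lexical syntax of a word: a letter or underscore followed by letters, digits, underscores.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Return (token_type, text) pairs for every word in the source."""
    tokens = []
    for word in WORD.findall(source):
        # A reserved word gets its own token type; any other word is an Identifier.
        tokens.append((RESERVED.get(word, "Identifier"), word))
    return tokens


print(tokenize("if x then y"))
```

Whole words are matched, so a longer word such as iffy is tokenized as an Identifier even though it begins with a reserved word.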

Keywords, by contrast, syntactically appear in the phrase grammar, as terminal symbols. For example, the production rule for a conditional expression may be IF Expression THEN Expression. In this case IF and THEN are terminal symbols, meaning "a token of type IF or THEN, respectively" – and due to the lexical grammar, this means the string if or then in the original source. As an example of a primitive constant value, true may be a keyword representing the boolean value "true", in which case it should appear in the grammar as a possible expansion of the production BinaryExpression, for instance.
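The role of keywords as terminal symbols can be sketched with a small recursive-descent parser. The grammar is an assumed simplification (Expression is either IF Expression THEN Expression, the literal TRUE, or an Identifier), the tree format is invented for illustration, and the input is a list of (token_type, text) pairs as a lexer might produce.

```python
def parse_expression(tokens, pos=0):
    """Parse one Expression starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "IF":
        # Production: Expression -> IF Expression THEN Expression
        condition, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos][0] != "THEN":
            raise SyntaxError("expected THEN")
        result, pos = parse_expression(tokens, pos + 1)
        return ("conditional", condition, result), pos
    if kind == "TRUE":
        # A keyword used as a primitive literal value.
        return ("literal", True), pos + 1
    if kind == "Identifier":
        return ("name", text), pos + 1
    raise SyntaxError("unexpected token " + kind)


tokens = [("IF", "if"), ("TRUE", "true"), ("THEN", "then"), ("Identifier", "y")]
tree, end = parse_expression(tokens)
print(tree)
```

The parser only ever inspects token types such as IF and THEN; the spelling of the original source words matters to the lexer alone.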

Specification

The lists of reserved words and keywords in a language are defined when the language is developed, and both form part of the language's formal specification. Generally one wishes to minimize the number of reserved words, to avoid restricting valid identifier names. Further, introducing a new reserved word breaks existing programs that use that word as an identifier (the change is not backwards compatible), so this is avoided. To prevent this and provide forward compatibility, words are sometimes reserved without having a current use (a reserved word that is not a keyword), as this allows the word to be used in the future without breaking existing programs. Alternatively, new language features can be implemented as predefined names, which can be overridden, thus not breaking existing programs.

Occasionally, depending on the flexibility of the language specification, vendors implementing a compiler may extend the specification by including non-standard features. Also, as a language matures, standards bodies governing a language may choose to extend the language to include additional features such as object-oriented capabilities in a traditionally procedural language. Sometimes the specification for a programming language will have reserved words that are intended for possible use in future versions. In Java, const and goto are reserved words — they have no meaning in Java but they also cannot be used as identifiers. By "reserving" the terms, they can be implemented in future versions of Java, if desired, without "breaking" older Java source code.

Predefined names

A notion related to reserved words is that of predefined functions, methods, subroutines, or variables, particularly library routines from the standard library. These are similar in that they are part of the basic language, and may be used for similar purposes. However, they differ in that the name of a predefined function, method, or subroutine is typically categorized as an identifier instead of a reserved word, and is not treated specially in the syntactic analysis. Further, reserved words may not be redefined by the programmer, but predefined names can often be overridden in some capacity.

Languages vary as to what is provided as a keyword and what is a predefined. Some languages, for instance, provide keywords for input/output operations whereas in others these are library routines. In Python (versions earlier than 3.0) and many BASIC dialects, print is a keyword. In contrast, the C, Lisp, and Python 3.0 equivalents printf, format, and print are functions in the standard library. Similarly, in Python prior to 3.0, None, True, and False were predefined variables, but not reserved words, but in Python 3.0 they were made into reserved words.[1]
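The difference between a predefined name and a reserved word can be observed directly in Python 3. The sketch below assumes a Python 3 interpreter; the helper name compiles is hypothetical, and the rebinding of print happens inside a separate namespace so the real built-in is left untouched.

```python
def compiles(source):
    """Return True if the Python 3 parser accepts the source (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# print is a predefined name, so the parser accepts rebinding it.
print(compiles("print = 42"))   # True
# None and True are reserved words in Python 3, so assignment is rejected.
print(compiles("None = 42"))    # False
print(compiles("True = 42"))    # False

# Overriding a predefined name: inside this namespace, print now means len.
namespace = {}
exec("print = len\nn = print('abc')", namespace)
print(namespace["n"])           # 3
```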

Definition

Some use the terms "keyword" and "reserved word" interchangeably, while others distinguish usage, say by using "keyword" to mean a word that is special only in certain contexts but "reserved word" to mean a special word that cannot be used as a user-defined name. The meaning of keywords — and, indeed, the meaning of the notion of keyword — differs widely from language to language. Concretely, in ALGOL 68, keywords are stropped (in the strict language, written in bold) and are not reserved words – the unstropped word can be used as an ordinary identifier.

The "Java Language Specification" uses the term "keyword".[2] The ISO 9899 standard for the C programming language uses the term "keyword".[3]

In many languages, such as C and similar languages like C++, a keyword is a reserved word which identifies a syntactic form. Words used in control flow constructs, such as if, else, and while, are keywords. In these languages, keywords cannot also be used as the names of variables or functions.

In some languages, such as ALGOL and ALGOL 68, keywords cannot be written verbatim, but must be stropped. This means that keywords must be marked somehow, e.g. by quoting them or by prefixing them with a special character. As a consequence, keywords are not reserved words, and thus the same word can be used as a normal identifier. However, one stropping regime was to not strop the keywords, and instead have them simply be reserved words.

Some languages, such as PostScript, are extremely liberal in this approach, allowing core keywords to be redefined for specific purposes.

In Common Lisp, the term "keyword" (or "keyword symbol") is used for a special sort of symbol, or identifier. Unlike other symbols, which usually stand for variables or functions, keywords are self-quoting and self-evaluating[4]:98 and are interned in the KEYWORD package.[5] Keywords are usually used to label named arguments to functions, and to represent symbolic values. The symbols which name operators and functions in Lisp are not reserved words. For instance the expression (if if case or) is possible. The leftmost if refers to the if operator; the remaining symbols are interpreted as variable names. Since there is a separate namespace for functions and variables, if could be a variable. In Common Lisp, however, there are two special symbols which are not in the keyword package: the symbols t and nil. When evaluated as expressions, they evaluate to themselves. They cannot be used as the names of functions or variables, so are de facto reserved. (let ((t 42))) is a well-formed expression, but the let operator will not permit the usage.

Typically, when a programmer attempts to use a keyword for a variable or function name, a compilation error will be triggered. In most modern editors, the keywords are automatically set to have a particular text colour to remind or inform the programmers that they are keywords.

In languages with macros or lazy evaluation, control flow constructs such as if can be implemented as macros or functions. In languages without these expressive features, they are generally keywords.

Comparison by language

Not all languages have the same number of reserved words. For example, Java (and other C derivatives) has a rather sparse complement of reserved words, approximately 50, whereas COBOL has approximately 400. At the other end of the spectrum, pure Prolog and PL/I have none at all.

The number of reserved words in a language has little to do with how “powerful” a language is. COBOL was designed in the 1950s as a business language and was made to be self-documenting using English-like structural elements such as verbs, clauses, sentences, sections and divisions. C, on the other hand, was written to be very terse (syntactically) and to get more text on the screen. For example, compare the equivalent blocks of code from C and COBOL to calculate weekly earnings:

// Calculation in C:

if (salaried)
        amount = 40 * payrate;
else
        amount = hours * payrate;

* Calculation in COBOL:

IF Salaried THEN
        MULTIPLY Payrate BY 40 GIVING Amount
ELSE
        MULTIPLY Payrate BY Hours GIVING Amount
END-IF.

* Another example of the calculation in COBOL:

IF Salaried
        COMPUTE Amount = Payrate * 40
ELSE
        COMPUTE Amount = Hours * Payrate
END-IF.

Pure Prolog logic is expressed in terms of relations, and execution is triggered by running queries over these relations. Constructs such as loops are implemented using recursive relationships.

All three of these languages can solve the same types of “problems” even though they have differing numbers of reserved words. This “power” relates to their belonging to the set of Turing-complete languages.

Disadvantages

Defining reserved words in a language raises several problems. The language may be difficult for new users to learn because of a long list of reserved words to memorize which can't be used as identifiers. It may be difficult to extend the language because the addition of reserved words for new features might invalidate existing programs or, conversely, "overloading" of existing reserved words with new meanings can be confusing. Porting programs can be problematic because a word not reserved by one system/compiler might be reserved by another.

Reserved words and language independence

Microsoft’s .NET Common Language Infrastructure (CLI) specification allows code written in 40+ different programming languages to be combined into a final product. Because of this, identifier/reserved word collisions can occur when code implemented in one language tries to execute code written in another language. For example, a Visual Basic.NET library may contain a class definition such as:

' Class Definition of This in Visual Basic.NET:
 
Public Class this
        ' This class does something...
End Class

If this is compiled and distributed as part of a toolbox, a C# programmer wishing to define a variable of type “this” would encounter a problem: this is a reserved word in C#. Thus, the following will not compile in C#:

// Using This Class in C#:
 
this x = new this();  // Won't compile!

A similar issue arises when accessing members, overriding virtual methods, and identifying namespaces.

This is resolved by stropping: the specification allows the C# programmer to place an at-sign before the identifier, which forces the compiler to treat it as an identifier rather than a reserved word:

// Using This Class in C#:
 
@this x = new @this();  // Will compile!

For consistency, this usage is also permitted in non-public settings such as local variables, parameter names, and private members.
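A tool that generates C# source from names defined in another language could apply this stropping rule automatically. The following Python sketch shows the idea under stated assumptions: CSHARP_RESERVED is only a small illustrative subset of the C# reserved words, and to_csharp_identifier is a hypothetical helper, not part of any .NET tooling.

```python
# Partial, illustrative subset; the complete list is given in the C# specification.
CSHARP_RESERVED = {"this", "class", "namespace", "new", "int", "if", "else"}


def to_csharp_identifier(name):
    """Prefix the name with @ if it collides with a C# reserved word (hypothetical helper)."""
    if name in CSHARP_RESERVED:
        return "@" + name
    return name


type_name = to_csharp_identifier("this")
declaration = "{0} x = new {0}();".format(type_name)
print(declaration)   # @this x = new @this();
```

Names that do not collide are passed through unchanged, so the helper can be applied to every identifier without checking first.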


References

  1. Guido van Rossum. "The story of None, True and False (and an explanation of literals, keywords and builtins thrown in)". The History of Python. November 10, 2013.
  2. "The Java Language Specification, 3rd Edition, Section 3.9: Keywords". Sun Microsystems. 2000. Retrieved 2009-06-17. "The following character sequences, formed from ASCII letters, are reserved for use as keywords and cannot be used as identifiers[...]"
  3. "ISO/IEC 9899:TC3, Section 6.4.1: Keywords". International Organization for Standardization JTC1/SC22/WG14. 2007-09-07. "The above tokens (case sensitive) are reserved (in translation phases 7 and 8) for use as keywords, and shall not be used otherwise."
  4. Peter Norvig. Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp. Morgan Kaufmann, 1991. ISBN 1-55860-191-0.
  5. "Type KEYWORD". Common Lisp HyperSpec.
