Undefined behavior

From Wikipedia, the free encyclopedia

In computer programming, undefined behavior is the result of executing code for which the language specification imposes no requirements. It is a feature of some programming languages, most famously C.[1] In these languages the semantics of certain operations are undefined, so an implementation can assume that such operations never occur in program code: whatever the implementation does in such cases is correct, analogous to don't-care terms in digital logic. This assumption can make various program transformations valid or simplify their proofs of correctness, giving flexibility to the implementation. It is the responsibility of the programmer to write code that never invokes undefined behavior, but an implementation is allowed to issue diagnostics when it happens.
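As a hedged illustration of the kind of transformation this assumption permits, consider signed integer overflow, which is undefined in C. The function names below are hypothetical, and the simplification described in the comments is something a compiler may perform, not something it is required to perform.

```c
#include <limits.h>

/* Hypothetical helper: reports whether x + 1 is greater than x.
 * Signed overflow is undefined in C, so a compiler may assume that
 * x + 1 never overflows and is then free to reduce the whole body
 * to "return 1". The call is well defined only for x < INT_MAX. */
int successor_is_greater(int x)
{
    return x + 1 > x;
}

/* A version with no undefined behavior: the overflowing case is
 * tested explicitly before any arithmetic takes place. */
int successor_is_greater_checked(int x)
{
    if (x == INT_MAX) {
        return 0;   /* x + 1 is not representable in int */
    }
    return x + 1 > x;
}
```

Calling successor_is_greater(INT_MAX) would itself invoke undefined behavior, which is why the checked variant performs its comparison against INT_MAX first.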

For example, in C the use of any automatic variable before it has been initialized yields undefined behavior, as does division by zero or indexing an array outside of its defined bounds (see buffer overflow). In general, once undefined behavior has occurred, any subsequent behavior of the program is also undefined. In particular, the compiler is never required to diagnose undefined behavior; programs invoking it may therefore compile and run without apparent failure, fail in seemingly unrelated ways, or behave in ways seemingly inconsistent with the source code.
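Because the compiler need not diagnose an out-of-bounds access, any check has to be written by the programmer. The following is a minimal sketch of such a check; the helper name get_checked and its signature are assumptions made for illustration, not part of any standard library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads a[index] only if index is within bounds.
 * Returns true and stores the element in *out on success; returns
 * false and leaves *out untouched otherwise. */
bool get_checked(const int *a, size_t len, size_t index, int *out)
{
    if (index >= len) {
        return false;   /* a[index] would lie outside the array */
    }
    *out = a[index];
    return true;
}
```

The caller passes the array length explicitly because C does not carry it along with the pointer.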

Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specification of a CPU might leave the behavior of some forms of an instruction undefined. If the CPU supports memory protection, however, the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the operating system's security. An actual CPU would then be permitted to corrupt any or all user registers in response to such an instruction, but would not be allowed to, for example, switch into supervisor mode.

In C and C++, implementation-defined behavior is also used: the language standard does not specify the behavior, but the implementation must choose a behavior, document it, and adhere to the rules it chose. These standards also use unspecified behavior to mean that the standard provides a set of possibilities without specifying which one an implementation must choose. The implementation need not document the choice or even be consistent, but it must choose one of the possibilities.
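The two categories can be sketched in C as follows. This is an illustrative example under stated assumptions: all function and variable names are hypothetical, and the size of int and the evaluation order of operands were chosen as representative cases.

```c
#include <limits.h>

/* Implementation-defined: the size of an int in bits varies between
 * implementations, but each implementation must document its choice
 * (it is visible through sizeof and CHAR_BIT from <limits.h>). */
unsigned int bits_in_int(void)
{
    return (unsigned int)(sizeof(int) * CHAR_BIT);
}

static int call_log[2];   /* records the order of the calls below */
static int call_count;

static int first(void)  { call_log[call_count++] = 1; return 10; }
static int second(void) { call_log[call_count++] = 2; return 20; }

/* Unspecified: the operands of + may be evaluated in either order,
 * so call_log may end up as {1, 2} or {2, 1}. The sum is 30 in both
 * cases because the result does not depend on the order. */
int sum_in_unspecified_order(void)
{
    call_count = 0;
    return first() + second();
}
```

Neither function invokes undefined behavior: every permitted outcome is a valid execution, and a portable program simply must not depend on which one occurs.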

Examples in C and C++

Attempting to modify a string literal causes undefined behavior:[2]

char *p = "wikipedia"; // ill-formed C++11, deprecated C++98/C++03
p[0] = 'W'; // undefined behavior

One way to prevent this is to define p as an array instead of a pointer, so that the program owns a modifiable copy of the string:

char p[] = "wikipedia"; // RIGHT
p[0] = 'W';

In C++, one can use a standard string as follows:

std::string s = "wikipedia"; // RIGHT
s[0] = 'W';

Integer division by zero results in undefined behavior:[3]

int x = 1;
return x / 0; // undefined behavior
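
A division can be guarded by checking the operands first. The sketch below assumes a hypothetical helper named checked_div. It also rejects INT_MIN / -1, whose quotient does not fit in an int on two's complement implementations and is therefore undefined as well.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: divides a by b only when the result is defined.
 * Returns true and stores the quotient in *out on success; returns
 * false and leaves *out untouched otherwise. */
bool checked_div(int a, int b, int *out)
{
    if (b == 0) {
        return false;               /* division by zero is undefined */
    }
    if (a == INT_MIN && b == -1) {
        return false;               /* quotient may not fit in int */
    }
    *out = a / b;
    return true;
}
```

Returning a status flag instead of a sentinel value keeps every int available as a legitimate quotient.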

Certain pointer operations may result in undefined behavior:[4]

int arr[4] = {0, 1, 2, 3};
int *p = arr + 5;  // undefined behavior
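
For the four-element array above, arr + 4 (one past the last element) is the furthest pointer that may be formed, and it may be compared against but not dereferenced. The following sketch uses a hypothetical function sum_elements to show that idiom.

```c
#include <stddef.h>

/* Sums the n elements starting at first. The pointer end is allowed
 * to point one past the last element; it is only compared against
 * and never dereferenced. */
int sum_elements(const int *first, size_t n)
{
    const int *end = first + n;     /* one past the end: well defined */
    int total = 0;
    for (const int *p = first; p != end; ++p) {
        total += *p;
    }
    return total;
}
```

The caller is still responsible for passing an n that does not exceed the real array length, since the function cannot verify it.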

Reaching the end of a value-returning function (other than main()) without a return statement may result in undefined behavior. In C++ it always does; in C the behavior is undefined only if the caller uses the return value:

int f()
{
}  /* undefined behavior */

The original The C Programming Language book cites the following examples of code which “can (and does) produce different results on different machines”[5] (which could be considered just unspecified or implementation-defined behavior in today's terms):

printf("%d %d\n", ++n, power(2, n));    /* WRONG */
a[i] = i++;

The later ANSI C standard chose to leave similar constructions undefined, e.g. “This paragraph renders undefined statement expressions such as i = ++i + 1; while allowing i = i + 1;”.[6]
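
Such statements can usually be made well defined by splitting them so that each variable is modified in its own statement. The sketch below rewrites a[i] = i++ under the assumption that the intent was to store the old value of i at the old index; the function name is hypothetical, and the original expression does not settle which reading was meant.

```c
/* One possible reading of a[i] = i++: store the old value of i at
 * the old index, then advance i. Returns the new index. */
int store_then_increment(int a[], int i)
{
    a[i] = i;   /* i is only read in this statement */
    i++;        /* the increment is a separate full expression */
    return i;
}
```

The printf example can be treated the same way, by incrementing n in a statement of its own before the call.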

Risks of undefined behavior

HTML versions 4 and earlier left error handling undefined. Over time, pages came to rely on the unspecified error recovery implemented in popular browsers. This caused difficulties for vendors of less popular browsers, who were forced to reverse-engineer and implement bug-compatible error recovery. The result was a de facto standard much more complicated than it would have been had this behavior been specified from the start.

Compiler easter eggs

In some languages (including C), even the compiler is not bound to behave in a sensible manner once undefined behavior has been invoked. One instance of such latitude acting as an Easter egg is the behavior of early versions of the GCC C compiler when given a program containing the #pragma directive, which has implementation-defined behavior according to the C standard. In practice, many C implementations recognize, for example, #pragma once as a rough equivalent of #include guards. GCC 1.17, however, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi.[7]

References

  1. ^ Lattner, Chris (May 13, 2011). "What Every C Programmer Should Know About Undefined Behavior". LLVM Project Blog. LLVM.org. Retrieved May 24, 2011. 
  2. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §2.13.4 String literals [lex.string] para. 2
  3. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.6 Multiplicative operators [expr.mul] para. 4
  4. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.7 Additive operators [expr.add] para. 5
  5. ^ Kernighan, Brian W.; Ritchie, Dennis M. (February 1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. p. 50. ISBN 0-13-110163-3. 
  6. ^ ANSI X3.159-1989 Programming Language C, footnote 26
  7. ^ "A Pragmatic Decision" quotes the March 1988 issue of UNIX Review magazine, which referred to GCC version 1.17 but got the order wrong. "Everything2: #pragma" gives the correct order. The actual code is in file "cccp.c" in the GCC 1.21 distribution (although commented out): http://www.oldlinux.org/Linux.old/gnu/gcc-1/

External links