Undefined behavior

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In computer programming, undefined behavior refers to computer code whose behavior is unpredictable. It is a feature of some programming languages—most famously C.[1] In these languages, to simplify the specification and allow some flexibility in implementation, the specification leaves the results of certain operations specifically undefined, meaning that the programmer can't predict what will happen.

For example, in C the use of any automatic variable before it has been initialized yields undefined behavior, as would division by zero or indexing an array outside of its defined bounds (see buffer overflow). This specifically frees the compiler to do whatever is easiest or most efficient, should such a program be submitted. In general, any behavior afterwards is also undefined. In particular, it is never required that the compiler diagnose undefined behavior — therefore, programs invoking undefined behavior may appear to compile and even run without errors at first, only to fail on another system, or even on another date. When an instance of undefined behavior occurs, so far as the language specification is concerned anything could happen, maybe nothing at all. In particular, anything may include apparently-impossible behavior because the compiler has made assumptions that lead to erroneous code generation that does not match the source code.

Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specifications of a CPU might leave the behavior of some forms of an instruction undefined, but if the CPU supports memory protection then the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the operating system's security; so an actual CPU would be permitted to corrupt any or all user registers in response to such an instruction but would not be allowed to, for example, switch into supervisor mode.

In C and C++, implementation-defined behavior is also defined which requires the implementation to document what it does, thus more restrictive than undefined behavior.

Contents

Examples in C and C++ [edit]

Attempting to modify a string literal causes undefined behavior:[2]

char * p = "wikipedia"; // ill-formed C++11, deprecated C++98/C++03
p[0] = 'W'; // undefined behaviour

One way to prevent this is defining it as an array instead of a pointer.

char p[] = "wikipedia"; /* RIGHT */
p[0] = 'W';

In C++ one can use STL string as follows.

std::string s = "wikipedia"; /* RIGHT */
s[0] = 'W';

Division by zero results in undefined behavior; under IEEE 754, the division of float, double, and long double values by zero results in an infinity or NaN:[3]

return x/0; // undefined behavior

Certain pointer operations may result in undefined behavior:[4]

int arr[4] = {0, 1, 2, 3};
int* p = arr + 5;  // undefined behavior

Reaching the end of a value-returning function (other than main()) without a return statement may result in undefined behavior:

int f()
{
}  /* undefined behavior */

The C Programming Language cites the following examples of code that have undefined behavior in Section 2.12:

The order in which function arguments are evaluated is not specified, so the statement

printf("%d %d\n", ++n, power(2, n));    /* WRONG */

can produce different results with different compilers and hence result in undefined behavior.

The following statement is ambiguous as it is not clear whether the array index is the old value of i or the new.

a[i] = i++;

Different compilers can interpret this in different ways. This will too result in undefined behavior as the result of the code will not be uniform.

Risks of undefined behavior [edit]

HTML versions 4 and earlier left error handling undefined. Over time pages started relying on unspecified error-recovery implemented in popular browsers. This caused difficulties for vendors of less-popular browsers who were forced to reverse-engineer and implement bug compatible error recovery. This has led to de facto standard that was much more complicated than it could have been if this behavior was specified from the start.

Compiler easter eggs [edit]

In some languages (including C), even the compiler is not bound to behave in a sensible manner once undefined behavior has been invoked. One instance of undefined behavior acting as an Easter egg is the behavior of early versions of the GCC C compiler when given a program containing the #pragma directive, which has implementation-defined behavior according to the C standard. In practice, many C implementations recognize, for example, #pragma once as a rough equivalent of #include guards — but GCC 1.17, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi.[5]

References [edit]

  1. ^ Lattner, Chris (May 13, 2011). "What Every C Programmer Should Know About Undefined Behavior". LLVM Project Blog. LLVM.org. Retrieved May 24, 2011. 
  2. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §2.13.4 String literals [lex.string] para. 2
  3. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.6 Multiplicative operators [expr.mul] para. 4
  4. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.7 Additive operators [expr.add] para. 5
  5. ^ "A Pragmatic Decision" quotes the March 1988 issue of UNIX Review magazine, which referred to GCC version 1.17 but got the order wrong. "Everything2: #pragma" gives the correct order. The actual code is in file "cccp.c" in the GCC 1.21 distribution (although commented out): http://www.oldlinux.org/Linux.old/gnu/gcc-1/

External links [edit]