Null character: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 15:51, 22 April 2009

The null character (also null terminator) is a character with the value zero, present in the ASCII and Unicode character sets, and available in nearly all mainstream programming languages. Its original meaning was similar to NOP: when sent to a printer or a terminal, it does nothing (some terminals, however, incorrectly display it as a space). On punched tape, this character is represented by no holes at all, so a new unpunched tape is initially filled with null characters.

On many computer and data terminal keyboards, a null character could be typed by holding down the Control key and pressing "@" (which usually also required holding Shift and pressing another key such as "2" or "P"). Consequently, in some contexts the null character is represented visually as "^@". In other contexts it is represented as a subscript, single-em-width "NUL". Unicode also provides a character with a glyph for visually representing the null character, "symbol for null", U+2400 (␀), which is not to be confused with the actual null character, U+0000.

Use as string terminator

The character has special significance in C and its derivatives, where it serves as a reserved character used to signify the end of strings. The null character is often represented as the escape sequence '\0' in source code. Strings ending in a null character are said to be null-terminated.
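As a minimal sketch of the convention described above, the following C fragment walks a string until it reaches the null character and reports how much storage a string literal occupies. The function names `count_until_null` and `literal_storage_size` are hypothetical and chosen only for illustration; the first mirrors the behaviour of the standard `strlen` function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that precede the first
   null character, which is what the standard strlen() also does. */
size_t count_until_null(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {   /* '\0' is the null character, value zero */
        n++;
    }
    return n;
}

/* The literal "abc" is stored as four bytes: 'a', 'b', 'c', '\0'. */
size_t literal_storage_size(void)
{
    return sizeof "abc";
}
```

Called on `"abc"`, `count_until_null` returns 3, while `literal_storage_size` returns 4 because the compiler appends the terminator to the literal.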

This differs from certain other languages (such as Pascal) which traditionally store a string as an array preceded by a string length. The main advantage of using a null character is that strings can be of any length, and only one character of additional storage is required. Null-terminated strings can also have efficiency benefits, since operations that traverse a string do not need to keep track of how many characters have been seen, and operations that modify the string's length do not need to update a stored length. Cache performance can also be better. [citation needed]

Conversely, the advantage of storing the string's length is that it is always immediately available in constant time; a program using null-terminated strings must count every character in a string to find the string's length, which requires linear or O(n) time. Also, storing the length allows strings to contain null characters, which can simplify data processing by eliminating special cases. In a null-terminated string, the first null character encountered is interpreted as the end of the string.
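The contrast between the two representations can be sketched in C as follows. The `counted_string` type and the helper functions are assumptions made for illustration, not part of any standard library; only `strlen` is a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length travels with the data. */
struct counted_string {
    size_t length;
    const char *data;
};

/* Six bytes in total: 'a', 'b', '\0', 'c', 'd', and the literal's terminator. */
static const char raw[] = "ab\0cd";

/* Wraps the five data bytes, including the embedded null character. */
struct counted_string make_example(void)
{
    struct counted_string s = { sizeof raw - 1, raw };
    return s;
}

/* Constant time: the stored length is simply read back. */
size_t counted_length(struct counted_string s)
{
    assert(s.data != NULL);
    return s.length;
}

/* Linear time: strlen() scans until the first null character. */
size_t terminated_length(void)
{
    return strlen(raw);
}
```

For this data, `counted_length(make_example())` yields 5, whereas `terminated_length()` yields 2, because the scan stops at the embedded null character.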

However, the data type used to store the length of a string is also important. If the length is stored as a byte, as in some early implementations of Pascal, strings are limited to a maximum of 255 characters. On the other hand, the use of a larger data type will increase the overhead beyond that of a null-terminated string. In the 1970s, when C was designed, space considerations were much more important than they are at present, which greatly influenced the choice for null-terminated strings.
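The trade-off can be illustrated with a hypothetical Pascal-style layout in C. The type `byte_counted` and the functions below are assumptions for the sake of the example and do not reproduce any particular Pascal implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one length byte followed by the characters. */
struct byte_counted {
    uint8_t length;   /* can only hold values 0..255 */
    char data[255];
};

/* Returns 0 on success, or -1 if the text is too long for a one-byte length. */
int byte_counted_set(struct byte_counted *dst, const char *text)
{
    assert(dst != NULL && text != NULL);
    size_t n = strlen(text);
    if (n > UINT8_MAX) {
        return -1;
    }
    dst->length = (uint8_t)n;
    memcpy(dst->data, text, n);   /* no terminator is stored */
    return 0;
}

/* Bookkeeping overhead per string, in bytes, for each scheme. */
size_t overhead_null_terminated(void) { return sizeof(char); }
size_t overhead_byte_prefix(void)     { return sizeof(uint8_t); }
size_t overhead_size_t_prefix(void)   { return sizeof(size_t); }
```

A one-byte prefix costs the same single byte as a null terminator but caps the length at 255 characters; a `size_t` prefix lifts that cap at the cost of several bytes per string on typical platforms.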

A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string literal.
- ANSI/ISO 9899:1990 (the ANSI C standard), section 5.2.1
A string is a contiguous sequence of characters terminated by and including the first null character.
- ANSI/ISO 9899:1990 (the ANSI C standard), section 7.1.1
A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character).
- ISO/IEC 14882 (the ISO C++ standard), section 17.3.2.1.3.1

Security exploit: Poison null byte

The term "poison null byte" was first used by Olaf Kirch in a Bugtraq post in October 1998. It was further explored in Phrack Issue 55, article 7.

The "poison null byte" exploit takes advantage of the fact that strings with a known length can contain null bytes, and of what happens when such a string is converted for use with an API that expects null-terminated strings. By carefully placing a null byte in the string, the attacker can force the string to end at that point, even after the application has appended more characters to it, such as a filename extension. Examples of poison null byte use include:

  • Terminating a file name string early, for example to drop a mandatory file extension.
  • Terminating/commenting a SQL statement when executing code dynamically, such as Oracle EXECUTE IMMEDIATE.

Typically, the "poison null byte" is exploited along with another type of exploit such as directory traversal or SQL injection; poison null byte is often used to simplify or enhance other attacks.
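The file name case can be sketched in C without touching the file system. The helpers `append_extension` and `has_embedded_null` are hypothetical, as is the ".txt" extension; the sketch only shows how a length-counted input is reinterpreted once it is read as a null-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "<name>.txt" into out, treating name as a
   length-counted buffer. Returns the number of bytes written (excluding
   the final terminator), or 0 if the result does not fit. */
size_t append_extension(char *out, size_t out_size,
                        const char *name, size_t name_len)
{
    static const char ext[] = ".txt";
    size_t total = name_len + (sizeof ext - 1);
    assert(out != NULL && name != NULL);
    if (total + 1 > out_size) {
        return 0;
    }
    memcpy(out, name, name_len);
    memcpy(out + name_len, ext, sizeof ext);  /* copies ".txt" and its '\0' */
    return total;
}

/* Hypothetical mitigation: reject input that contains a null byte. */
int has_embedded_null(const char *name, size_t name_len)
{
    return memchr(name, '\0', name_len) != NULL;
}
```

If the counted input is the 17 bytes of `"secret.cfg\0report"`, `append_extension` writes 21 bytes, yet any API that reads the result as a null-terminated string sees only `"secret.cfg"`: the appended extension lies beyond the embedded null byte. Checking the input with something like `has_embedded_null` before the conversion is one way to detect the condition.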

See also