Caret notation

From Wikipedia, the free encyclopedia

Caret notation is a notation for control characters in ASCII. It assigns ^A to control-code 1 and continues sequentially through the alphabet to ^Z for control-code 26 (0x1A). For control-codes outside the range 1–26, the notation extends to the adjacent, non-alphabetic ASCII characters.

Often a control character can be typed on a keyboard by holding down the Ctrl key and typing the character shown after the caret. The notation is also often used to describe keyboard shortcuts even when no control character is actually involved (as in "type ^X to cut the text").

The caret notation does not prescribe the meaning or interpretation of the individual control-codes, or how to respond to them.

Description

The notation consists of a caret (^) followed by a single character (usually a capital letter). The digraph stands for the control character whose ASCII code is the same as the character's ASCII code with the uppermost bit of the 7-bit encoding inverted. As a useful mnemonic, this renders the C0 control character with code N (where N is from 1 to 26 = 0x1A) as the Nth capital letter of the alphabet, since capital letters occupy the ASCII code range 65–90 (0x41–0x5A). Seven ASCII control characters map outside the upper-case alphabet: 0 (NUL) is ^@, 27 (ESC) is ^[, 28 is ^\, 29 is ^], 30 is ^^, 31 is ^_, and 127 (DEL) is ^?.
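The mapping described above can be sketched in a few lines of Python. The function name `to_caret` is an illustrative choice, not part of any standard library, and rejecting non-control codes with an exception is an assumption of this sketch.

```python
def to_caret(code: int) -> str:
    """Return the caret notation for a C0 control code (0-31) or DEL (127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"not an ASCII control code: {code}")
    # Inverting the top bit of the 7-bit code (value 0x40) moves
    # 0-31 to 64-95 ('@' through '_') and 127 to 63 ('?').
    return "^" + chr(code ^ 0x40)


for code in (0, 1, 13, 26, 27, 31, 127):
    print(code, to_caret(code))
# prints: 0 ^@, 1 ^A, 13 ^M, 26 ^Z, 27 ^[, 31 ^_, 127 ^? (one pair per line)
```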

Examples are "^M^J" for the CR, LF newline pair used by Windows, and "^[[2J" for the ANSI escape sequence that clears the screen (ESC followed by "[2J").

Only the use of characters in the range of 63–95 ("?@ABC...XYZ[\]^_") is specifically allowed in the notation, but use of lower-case alphabetic characters entered at the keyboard is nearly always allowed – they are treated as equivalent to upper-case letters.
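As an illustration of reading the notation back, the following sketch expands caret digraphs in a string into the bytes they denote. The name `decode_caret` is hypothetical, the input is assumed to be plain ASCII, and copying through a caret that is not followed by a character in the range 63–95 (or a lower-case letter) is an assumption of this sketch rather than a rule of the notation.

```python
def decode_caret(text: str) -> bytes:
    """Expand caret digraphs in an ASCII string into control characters."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "^" and i + 1 < len(text):
            nxt = text[i + 1]
            if "a" <= nxt <= "z":
                nxt = nxt.upper()  # lower-case is treated like upper-case
            if 63 <= ord(nxt) <= 95:  # '?' through '_'
                out.append(ord(nxt) ^ 0x40)
                i += 2
                continue
        # Assumption: anything that is not a valid digraph is copied as-is.
        out.extend(ch.encode("ascii"))
        i += 1
    return bytes(out)


print(decode_caret("^M^J"))   # b'\r\n'
print(decode_caret("^[[2J"))  # b'\x1b[2J'
```

Note that only the first "[" in "^[[2J" belongs to the digraph; the second is an ordinary character.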

Inverting the uppermost of 7 bits is accomplished by a bit-wise exclusive or with 0x40 (64). For 7-bit values this is identical to adding 64 modulo 128, or adding 64 and masking with 0x7F. The same operation converts in both directions: from a control code to the character to print after the caret, and from that character back to the control code. When converting to a control character, masking with 0x1F produces the same result for every character except '?', and also turns a lower-case letter into the same control character as its upper-case counterpart.
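These equivalences are easy to check mechanically. The sketch below uses two helper names, `flip` and `ctrl_mask`, which are illustrative only.

```python
def flip(value: int) -> int:
    """Invert the top bit of a 7-bit ASCII value."""
    return value ^ 0x40


def ctrl_mask(ch: str) -> int:
    """Alternative conversion to a control code: keep the low five bits."""
    return ord(ch) & 0x1F


for code in list(range(32)) + [127]:
    # XOR with 0x40 equals adding 64 modulo 128, or adding 64 and masking.
    assert flip(code) == (code + 64) % 128 == (code + 64) & 0x7F
    # Applying the operation twice returns the original control code.
    assert flip(flip(code)) == code

for value in range(64, 96):  # '@' through '_'
    assert ctrl_mask(chr(value)) == flip(value)

for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
    # Masking folds lower-case onto the same control code as upper-case.
    assert ctrl_mask(letter.lower()) == ctrl_mask(letter)

# '?' is the exception: XOR gives DEL (127), masking gives 31.
assert flip(ord("?")) == 127
assert ctrl_mask("?") == 31
```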

There is no corresponding version of the caret notation for control-codes with more than 7 bits such as the C1 control characters from 128–159 (0x80–0x9F). Some programs that produce caret notation show these as backslash and octal ("\200" through "\237"). Also see the bar notation used by Acorn Computers, below.
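A display routine combining both conventions might look like the following sketch. The name `visible` is hypothetical, and two choices here are assumptions rather than the behaviour of any particular program: every byte from 128 upward (not only the C1 range) is shown in octal, and newline and tab are rendered as ^J and ^I instead of being passed through.

```python
def visible(data: bytes) -> str:
    """Render bytes with caret notation for C0/DEL and octal for bytes >= 128."""
    parts = []
    for b in data:
        if b < 32 or b == 127:
            parts.append("^" + chr(b ^ 0x40))  # caret notation
        elif b < 127:
            parts.append(chr(b))               # printable ASCII, unchanged
        else:
            parts.append(f"\\{b:03o}")         # backslash and three octal digits
    return "".join(parts)


print(visible(b"a\x00\x1b\x7f\x80\x9f"))  # a^@^[^?\200\237
```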

Use in software

Many computer systems allow the user to enter a control character by holding down Ctrl and pressing the letter used in the caret notation. This is practical, because many control characters (e.g. EOT) cannot be entered directly from a keyboard. Although there are many ways to represent control characters, this correspondence between notation and typing makes the caret notation suitable for many applications.

Usually the need to hold down ⇧ Shift is avoided; for instance, lower-case letters work just like upper-case ones. On a US keyboard layout, Ctrl+/ produces DEL (^?) and Ctrl+2 produces ^@. It is also common for Ctrl+Space to produce ^@.

Caret notation is used to describe control characters in output by many programs, particularly Unix terminal drivers and text file viewers such as the more and less commands. Although the use of control-codes is somewhat standard, some uses differ from operating system to operating system, or even from program to program. The actual meaning or interpretation of the individual control-codes is not prescribed by the caret notation, and although the ASCII specification does give names to the control-codes, it does not prescribe how software should respond to them.

Alternate notations

Acorn operating systems for the Atom, BBC Micro, Archimedes and later RISC OS machines use the vertical bar character | in place of the caret. For example, |M (pronounced "control em", the same as for the ^M notation) is the carriage return character, ASCII 13. The digraph || is the vertical bar character itself (code 124), and |? is character 127, as above. The prefix |! adds 128 to the code of the character that follows it, so |!|? is character code 128 + 127 = 255.
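The rules listed above can be sketched as a small decoder. This is an illustration built only from those rules, not a reimplementation of the Acorn routines: the name `decode_bar` is hypothetical, the input is assumed to be plain ASCII, and rejecting any other character after a bar with an exception is an assumption of this sketch.

```python
def decode_bar(text: str) -> bytes:
    """Decode a bar-notation ASCII string into bytes, per the rules above."""
    out = bytearray()
    high = 0  # becomes 128 while a preceding |! is pending
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "|" and i + 1 < len(text):
            nxt = text[i + 1]
            i += 2
            if nxt == "!":
                high = 128  # applies to the next decoded character
                continue
            if nxt == "|":
                code = 124  # a literal vertical bar
            elif nxt == "?":
                code = 127  # DEL
            elif 64 <= ord(nxt) <= 95 or "a" <= nxt <= "z":
                code = ord(nxt) & 0x1F  # same mapping as the caret notation
            else:
                # Assumption: other characters after a bar are rejected.
                raise ValueError(f"unsupported sequence |{nxt}")
        else:
            code = ord(ch)  # ordinary character (or a trailing lone bar)
            i += 1
        out.append(code + high)
        high = 0
    return bytes(out)


print(list(decode_bar("|M")))    # [13]
print(list(decode_bar("||")))    # [124]
print(list(decode_bar("|!|?")))  # [255]
```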

See also