Whitespace character

From Wikipedia, the free encyclopedia

In computer science, whitespace is any character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page. For example, the common whitespace character U+0020 space (HTML: &#32;), which is also ASCII 32, represents a blank space punctuation character in text and is used as a word divider in Western scripts.

With many keyboard layouts, a horizontal whitespace character may be entered with the spacebar. Horizontal whitespace may also be entered on many keyboards with the Tab key, although the width of the resulting space may vary. Vertical whitespace is encoded in more varied ways, but the most obvious case in typing is the Enter key, which produces a 'newline' code sequence in application programs. Older keyboards might instead label the key Return, an abbreviation of the typewriter term 'carriage return'. That operation produced an electromechanical return to the left stop (CR, hexadecimal 0D in ASCII) and a line feed, or move to the next line (LF, hexadecimal 0A in ASCII). In some applications these two codes were used independently to draw text-cell-based displays on monitors or to print on tractor-guided printers, whose code sets might also contain reverse-motion and positioning sequences that allowed fancier text-based output. Many early computer games used such codes to draw a screen.
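The code values mentioned above can be checked directly. The following is a minimal Python sketch; the sample string is made up for illustration and the variable names are not taken from any particular system.

```python
# Code values of the characters discussed above.
SPACE, TAB, LF, CR = " ", "\t", "\n", "\r"

for name, ch in [("space", SPACE), ("tab", TAB), ("LF", LF), ("CR", CR)]:
    print(f"{name:5} decimal {ord(ch):3}  hex {ord(ch):02X}")

# A made-up sample that mixes three line-ending conventions:
# CR LF, LF alone, and CR alone.
sample = "one\r\ntwo\nthree\rfour"

# str.splitlines() treats each of these conventions as a line boundary.
lines = sample.splitlines()
print(lines)
```

A program that must preserve the original line endings would need to keep them instead, for example by calling `splitlines(keepends=True)`.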

The term "whitespace" is based on the resulting appearance on ordinary paper. However they are coded inside an application, whitespace characters can be processed like any other character code, and programs can take the action defined for the context in which they occur.

Definition and ambiguity

The most common whitespace characters may be typed via the space bar or the tab key. Depending on context, a line-break generated by the return or enter key may be considered white space as well.


In Unicode (Unicode Character Database) the following 25 characters are defined as whitespace characters:

Whitespace[a] (Unicode character property WSpace=Y)

Code point  Name                       Script  General category      Remark
U+0009      -                          Common  Other, control        HT, horizontal tab
U+000A      -                          Common  Other, control        LF, line feed
U+000B      -                          Common  Other, control        VT, vertical tab
U+000C      -                          Common  Other, control        FF, form feed
U+000D      -                          Common  Other, control        CR, carriage return
U+0020      space                      Common  Separator, space      -
U+0085      -                          Common  Other, control        NEL, next line
U+00A0      no-break space             Common  Separator, space      -
U+1680      ogham space mark           Ogham   Separator, space      -
U+2000      en quad                    Common  Separator, space      -
U+2001      em quad                    Common  Separator, space      -
U+2002      en space                   Common  Separator, space      -
U+2003      em space                   Common  Separator, space      -
U+2004      three-per-em space         Common  Separator, space      -
U+2005      four-per-em space          Common  Separator, space      -
U+2006      six-per-em space           Common  Separator, space      -
U+2007      figure space               Common  Separator, space      -
U+2008      punctuation space          Common  Separator, space      -
U+2009      thin space                 Common  Separator, space      -
U+200A      hair space                 Common  Separator, space      -
U+2028      line separator             Common  Separator, line       -
U+2029      paragraph separator        Common  Separator, paragraph  -
U+202F      narrow no-break space      Common  Separator, space      -
U+205F      medium mathematical space  Common  Separator, space      -
U+3000      ideographic space          Common  Separator, space      -

a. Unicode 6.3 property list

Within the algorithm for bidirectional writing, Unicode uses another definition of "whitespace" (Bidirectional Character Type=WS). These Bidi-WS characters (17 of the 25 listed in the table here) are "neutral": they follow the writing direction of neighboring characters rather than determining their own. The eight other characters listed here have a different bidi type: six are paragraph or segment separators (B or S), which are also "neutral", and the two no-break spaces are common number separators (CS).
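The general category and bidirectional type of each listed character can be inspected with Python's standard `unicodedata` module. The sketch below hard-codes the 25 code points from the table; the names `WHITESPACE` and `describe` are chosen for this illustration only, and the values reported depend on the Unicode database bundled with the Python interpreter in use.

```python
import unicodedata
from collections import Counter

# The 25 code points from the table above (WSpace=Y as of Unicode 6.3).
WHITESPACE = (
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))  # U+2000 through U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

def describe(cp):
    """Return (general category, bidi type) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

categories = Counter()
bidi_types = Counter()
for cp in WHITESPACE:
    cat, bidi = describe(cp)
    categories[cat] += 1
    bidi_types[bidi] += 1
    print(f"U+{cp:04X}  category={cat}  bidi={bidi}")

# Summary: how many characters fall into each category and bidi type.
print(len(WHITESPACE), dict(categories), dict(bidi_types))
```

The two-letter category codes map onto the table's wording: `Cc` is "Other, control", and `Zs`, `Zl`, and `Zp` are the space, line, and paragraph separators.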


Computer languages

Runs of whitespace characters (beyond the first) occurring within source code written in computer programming languages are generally ignored; such languages are called free-form. However, in some languages, such as Haskell and Python, whitespace and indentation are used for syntactical purposes. In the language called Whitespace, whitespace characters are the only valid characters for programming, while any other characters are ignored.
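The contrast between free-form and indentation-sensitive syntax can be sketched in Python. The toy `tokens` function below is a hypothetical stand-in for a free-form tokenizer, not how any real compiler works, and `compiles` simply asks the Python compiler whether a snippet is accepted.

```python
def tokens(source):
    """Toy free-form tokenizer: split on runs of whitespace of any kind."""
    return source.split()

compact = "total = price + tax"
spread_out = "total   =\tprice\n      +   tax"

# Both layouts yield the same token sequence, so the extra whitespace
# carries no meaning for this toy tokenizer.
same_tokens = tokens(compact) == tokens(spread_out)
print(same_tokens)

def compiles(src):
    """Return True if Python accepts src as a module, False on a syntax error."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

indented = "if True:\n    x = 1\n"
not_indented = "if True:\nx = 1\n"

# In Python itself, removing the indentation changes a valid program
# into an invalid one.
print(compiles(indented), compiles(not_indented))
```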

Still, for most programming languages, excessive use of whitespace, especially trailing whitespace at the end of lines, is considered a nuisance. However, correct use of whitespace can make code easier to read and help group related logic. In interpreted languages, parsing of unnecessary whitespace may affect the speed of execution. In markup languages like HTML, unnecessary whitespace increases the file size, and may thus affect the speed of transfer over a network. On the other hand, unnecessary whitespace can also inconspicuously mark code, in a way similar to, but less obvious than, comments in code. This can be useful for proving an infringement of license or copyright that was committed by copying and pasting.
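Removing trailing whitespace is a common clean-up step. The following is a minimal sketch, assuming that only spaces and tabs count as trailing whitespace and that lines are separated by LF; the function name is chosen for this illustration.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))

# A made-up snippet with trailing spaces on line 1 and a trailing tab on line 2.
messy = "x = 1   \n\ty = 2\t\n"
clean = strip_trailing_whitespace(messy)

print(repr(clean))
print(len(messy) - len(clean), "characters removed")
```

Leading whitespace is left alone on purpose, since in indentation-sensitive languages it is part of the syntax.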

The C language defines whitespace characters to be "... space, horizontal tab, new-line, vertical tab, and form-feed".[1] The HTTP network protocol requires different types of white space to be used in different parts of the protocol, such as: only the space character in the status line, CRLF at the end of a line, and "linear white space" in header values.[2]
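How strictly a protocol can treat whitespace is easiest to see on the HTTP status line. The sketch below assumes the RFC 2616 layout of version, status code, and reason phrase separated by single space characters and terminated by CRLF. The function `parse_status_line` is a hypothetical helper written for this illustration, not part of any HTTP library.

```python
def parse_status_line(raw):
    """Parse b'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' strictly."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("status line must end with CRLF")
    line = raw[:-2].decode("latin-1")
    # Split on the single space character only; tabs and other
    # whitespace are deliberately not accepted as separators here.
    parts = line.split(" ", 2)
    if len(parts) != 3:
        raise ValueError("expected three space-separated fields")
    version, code, reason = parts
    if not (len(code) == 3 and code.isascii() and code.isdigit()):
        raise ValueError("status code must be three ASCII digits")
    return version, int(code), reason

print(parse_status_line(b"HTTP/1.1 404 Not Found\r\n"))

try:
    parse_status_line(b"HTTP/1.1\t200\tOK\r\n")  # tabs instead of spaces
except ValueError as exc:
    print("rejected:", exc)
```

Real servers and clients are often more lenient than this; the strictness here is only meant to make the role of each whitespace character visible.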

Visible symbol

Sometimes the visible symbol ␣ (Unicode U+2423, decimal 9251, open box) is used to indicate a space. This symbol is used in a textbook on the Modula-2 computer language published ca. 1985 by Springer-Verlag, where it is necessary to explicitly indicate a space code. The symbol is also used in the keypad silkscreening of TI-8x series graphing calculators from Texas Instruments.[3]
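Making spaces visible in this way is a one-line substitution. The following minimal Python sketch uses names (`OPEN_BOX`, `show_spaces`) invented for the example.

```python
OPEN_BOX = "\u2423"  # U+2423, the open box symbol

def show_spaces(text):
    """Replace every U+0020 space with the visible open box symbol."""
    return text.replace(" ", OPEN_BOX)

print(show_spaces("two  spaces, then one"))
```

Only U+0020 is replaced here; tabs and other whitespace characters would need their own visible stand-ins.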

File names

Using a visible stand-in for a space is similar to the convention for multiword file names written for operating systems and applications that are confused by embedded space codes: such file names instead use an underscore (_) as a word separator, as_in_this_phrase.
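The underscore convention can be applied mechanically. The sketch below assumes that every run of whitespace should become a single underscore and that leading and trailing whitespace is dropped; `underscore_name` is a name made up for this example.

```python
import re

def underscore_name(name):
    """Replace each run of whitespace in a file name with one underscore."""
    return re.sub(r"\s+", "_", name.strip())

print(underscore_name("as in this phrase"))
```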

Another visible symbol for a space was ␢ (U+2422, blank symbol). It was used in the early years of computer programming when writing on coding forms. Keypunch operators immediately recognized the symbol as an "explicit space".

See also


  1. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf, Section 6.4, paragraph 3
  2. R. Fielding et al., "2.2 Basic Rules", Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616
  3. Above the zero "0" or negative "(‒)" key

External links