Whitespace character

From Wikipedia, the free encyclopedia

In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page. For example, the common whitespace symbol " " (Unicode code point U+0020, decimal 32) represents a blank space, used as a word divider in Western scripts.

The term "whitespace" is based on the assumption that the background color used for rendered text is white.

Definition and ambiguity

As is common in technical literature, the two words "white space" have found widespread usage as the single term "whitespace", especially when used as an adjective, as in "whitespace character". Some specifications refer to "white space" while others refer to "whitespace"; there is no difference between the terms, although exactly which characters are being referred to does vary from context to context. For example, the form feed character is "whitespace" in HTML, but is not "white space" in XML.
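The HTML/XML difference can be made concrete with a small sketch. The two sets below are assumptions added for illustration: they are taken to be the ASCII whitespace characters of the HTML Living Standard and the characters allowed by the "S" production of XML 1.0. The names `HTML_WHITESPACE`, `XML_WHITE_SPACE` and `only_in_html` are hypothetical.

```python
# Assumed to match "ASCII whitespace" in the HTML Living Standard:
# tab, line feed, form feed, carriage return, space.
HTML_WHITESPACE = frozenset("\t\n\f\r ")

# Assumed to match the XML 1.0 "S" production:
# tab, line feed, carriage return, space.
XML_WHITE_SPACE = frozenset("\t\n\r ")


def only_in_html():
    """Return the characters that count as whitespace in HTML but not in XML."""
    return HTML_WHITESPACE - XML_WHITE_SPACE


if __name__ == "__main__":
    # Print the code points so that the invisible characters can be read.
    print(sorted(hex(ord(ch)) for ch in only_in_html()))
```

Under these assumptions the only character in the difference is the form feed (U+000C).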

The most common whitespace characters can be typed with the space bar or the Tab key. Depending on context, a line break generated by the Return key (Enter key) may be considered whitespace as well.

Unicode

In Unicode (Unicode Character Database), the following 26 characters are defined as whitespace characters:

Whitespace [a] (Unicode character property WSpace=Y)

Code point  Name                       Script     General category      Remark
U+0009      -                          Common     Other, control        HT, Horizontal Tab
U+000A      -                          Common     Other, control        LF, Line feed
U+000B      -                          Common     Other, control        VT, Vertical Tab
U+000C      -                          Common     Other, control        FF, Form feed
U+000D      -                          Common     Other, control        CR, Carriage return
U+0020      space                      Common     Separator, space
U+0085      -                          Common     Other, control        NEL, Next line
U+00A0      no-break space             Common     Separator, space
U+1680      ogham space mark           Ogham      Separator, space
U+180E      mongolian vowel separator  Mongolian  Separator, space
U+2000      en quad                    Common     Separator, space
U+2001      em quad                    Common     Separator, space
U+2002      en space                   Common     Separator, space
U+2003      em space                   Common     Separator, space
U+2004      three-per-em space         Common     Separator, space
U+2005      four-per-em space          Common     Separator, space
U+2006      six-per-em space           Common     Separator, space
U+2007      figure space               Common     Separator, space
U+2008      punctuation space          Common     Separator, space
U+2009      thin space                 Common     Separator, space
U+200A      hair space                 Common     Separator, space
U+2028      line separator             Common     Separator, line
U+2029      paragraph separator        Common     Separator, paragraph
U+202F      narrow no-break space      Common     Separator, space
U+205F      medium mathematical space  Common     Separator, space
U+3000      ideographic space          Common     Separator, space

a. Unicode 6.0, Chapter 4.6
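
As an illustration, the table can be turned into a simple membership test. This is a minimal sketch that hard-codes the 26 code points listed above rather than querying a Unicode database, because built-in checks such as Python's `str.isspace` follow a somewhat different definition. The names `WSPACE_CODE_POINTS` and `is_table_whitespace` are hypothetical.

```python
# Code points copied from the table above (Unicode 6.0, WSpace=Y).
# Later Unicode versions may classify some of these differently.
WSPACE_CODE_POINTS = frozenset(
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
     0x1680, 0x180E]
    + list(range(0x2000, 0x200B))  # U+2000 en quad through U+200A hair space
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_table_whitespace(ch):
    """Return True if the single character ch appears in the table above."""
    return ord(ch) in WSPACE_CODE_POINTS


if __name__ == "__main__":
    print(len(WSPACE_CODE_POINTS))
    print(is_table_whitespace("\u00a0"))  # no-break space
    print(is_table_whitespace("\u200b"))  # zero width space, not in the table
```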

Within the algorithm for bidirectional writing, Unicode uses another definition of "Whitespace" (Bidirectional Character Type=WS). These Bidi-WS characters (18 of the 26 listed in the table here) are "Neutral": they do not determine a writing direction themselves but follow that of neighboring characters. The eight other characters listed here have a different Bidi type. Six of them are paragraph or segment separators, which are also neutral, while the two no-break spaces (U+00A0 and U+202F) are classed as common number separators, a weak type.
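
The Bidi type of a character can be inspected with Python's standard `unicodedata` module. The sketch below groups code points by the value of `unicodedata.bidirectional`; the helper name `group_by_bidi_type` is hypothetical. The result depends on the Unicode version bundled with the Python interpreter, so the counts for the full table may differ from the Unicode 6.0 figures quoted above.

```python
import unicodedata
from collections import defaultdict


def group_by_bidi_type(code_points):
    """Map each Bidi type (for example 'WS', 'S', 'B', 'CS') to its code points."""
    groups = defaultdict(list)
    for cp in code_points:
        groups[unicodedata.bidirectional(chr(cp))].append(cp)
    return dict(groups)


if __name__ == "__main__":
    # A few characters from the table: space, tab, line feed, no-break space.
    sample = [0x0020, 0x0009, 0x000A, 0x00A0]
    for bidi_type, cps in group_by_bidi_type(sample).items():
        print(bidi_type, ["U+%04X" % cp for cp in cps])
```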

Usage

Programming languages

In most programming languages, runs of whitespace in source code (beyond the first whitespace character) are ignored; such languages are called free-form. In some languages, however, such as Haskell and Python, whitespace and indentation are syntactically significant. In the esoteric language Whitespace, whitespace characters are the only meaningful characters, and all other characters are ignored.
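
Both behaviors can be observed in Python itself, which ignores extra spaces inside a line but treats indentation as syntax. The following is a minimal sketch using the standard `ast` module; the helper names `same_program` and `parses` are hypothetical.

```python
import ast


def same_program(source_a, source_b):
    """Return True if both sources parse to the same syntax tree."""
    # ast.dump omits line and column information by default, so only the
    # structure of the two programs is compared.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


def parses(source):
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


if __name__ == "__main__":
    # Extra spaces within a line do not change the program.
    print(same_program("x = 1 + 2", "x   =   1+2"))
    # Indentation does: the body of the if statement must be indented.
    print(parses("if True:\n    y = 1\n"))
    print(parses("if True:\ny = 1\n"))
```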

Still, in most programming languages, excessive whitespace, especially trailing whitespace at the end of lines, is considered a nuisance. Used well, however, whitespace aids developers: it can make code easier to read and help group related logic. In interpreted languages, parsing unnecessary whitespace may affect the speed of execution. In markup languages such as HTML, unnecessary whitespace increases the file size and so may affect the speed of transfer over a network. On the other hand, unnecessary whitespace can also inconspicuously mark code, in a way similar to, but less obvious than, comments. Such marks can help prove an infringement of license or copyright committed by copying and pasting.
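
Removing trailing whitespace is easy to automate. The sketch below is one possible approach, not a standard tool; it assumes that only spaces and tabs at the end of each line should be removed and that lines are separated by "\n". The function name `strip_trailing_whitespace` is hypothetical.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    # Splitting on "\n" (rather than using splitlines) preserves a final
    # newline, because the last element of the split is then an empty string.
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


if __name__ == "__main__":
    messy = "def f():   \n    return 1\t\n"
    print(repr(strip_trailing_whitespace(messy)))
```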

The C language defines whitespace to be "... space, horizontal tab, new-line, vertical tab, and form-feed". The HTTP network protocol has very strict requirements about what type of whitespace can occur in the control structures (such as the header fields) and where it must and must not occur.
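
Definitions like this one differ in small ways between languages. As an illustration, the sketch below compares the five characters in the C definition quoted above with Python's `string.whitespace` constant. The name `C_SOURCE_WHITESPACE` is hypothetical, and reading "new-line" as the single character "\n" is an assumption.

```python
import string

# The five characters named in the C definition quoted above
# ("new-line" is taken to be "\n" here).
C_SOURCE_WHITESPACE = frozenset(" \t\n\v\f")

# Python's ASCII whitespace constant from the standard library.
PYTHON_ASCII_WHITESPACE = frozenset(string.whitespace)

# Characters that Python's constant contains beyond the C list.
extra_in_python = PYTHON_ASCII_WHITESPACE - C_SOURCE_WHITESPACE

if __name__ == "__main__":
    print(sorted(hex(ord(ch)) for ch in extra_in_python))
```

With this reading, the only difference is the carriage return, which `string.whitespace` includes and the quoted C list does not.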

Literature

Some texts need to show a space character explicitly. A textbook on the Modula-2 programming language published ca. 1985 by Springer-Verlag, for example, used the symbol ␣ (Unicode U+2423, decimal 9251, OPEN BOX) for this purpose. The symbol resembles a closing square bracket (]) that is somewhat narrower, rotated a quarter-turn clockwise, and placed below the writing line. Some fonts render it too narrowly.
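
The same convention is easy to apply when printing text for inspection. This minimal sketch replaces each ordinary space (U+0020) with the OPEN BOX symbol; the function name `show_spaces` is hypothetical, and other whitespace characters are deliberately left unchanged.

```python
OPEN_BOX = "\u2423"  # the visible-space symbol described above


def show_spaces(text):
    """Return text with every U+0020 space replaced by the OPEN BOX symbol."""
    return text.replace(" ", OPEN_BOX)


if __name__ == "__main__":
    print(show_spaces("BEGIN  x := 1 END"))
```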

The TI-8x series graphing calculators from Texas Instruments, at least the early models, use the same symbol to represent the space character in the keypad silkscreening, although on the calculators' display, this character appears as a blank space as on typical computer monitors.

File names

Showing an explicit space symbol is similar to the convention for multiword file names on operating systems and applications that are confused by embedded spaces: such file names instead use an underscore (_) as a word separator, as_in_this_phrase.
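
A minimal sketch of this convention follows. It assumes that any run of whitespace between words should become a single underscore; the function name `to_underscored` is hypothetical.

```python
def to_underscored(name):
    """Join the whitespace-separated words of name with underscores."""
    # str.split() with no argument splits on runs of whitespace and drops
    # leading and trailing whitespace.
    return "_".join(name.split())


if __name__ == "__main__":
    print(to_underscored("as in this phrase"))
```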

Another such symbol is ␢ (U+2422, BLANK SYMBOL). It was used in the early years of computer programming on handwritten coding forms, where keypunch operators immediately recognized it as an "explicit space".
