Soft hyphen

In computing and typesetting, a soft hyphen (U+00AD SOFT HYPHEN, HTML: &#173; &shy;), also called a discretionary hyphen or optional hyphen, is a kind of hyphen that marks a place in text where a hyphenated line break is allowed, without forcing a break at that place if the text is re-flowed. The soft hyphen's semantics and HTML implementation are in many ways similar to those of the zero-width space.

To show the effect of a soft hyphen, the following words have been separated with soft hyphens:

Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­Antidisestablishmentarianism­

In browsers that support soft hyphens, resizing the window re-breaks the text above only at the word boundaries marked by soft hyphens, and a hyphen is displayed at the end of each broken line.

Handling

The additional semantics associated with the soft hyphen vary between standards. According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.[1] HTML4 describes it as a "hyphenation hint", though it suggests that this interpretation is not universal:[2]

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics. If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
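
The last requirement in the quotation, that searching and sorting should ignore the soft hyphen, can be illustrated with a minimal Python sketch. The helper names (`strip_soft_hyphens`, `contains_ignoring_shy`, `sort_ignoring_shy`) are assumptions made for this illustration and do not come from the HTML specification.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed."""
    return text.replace(SOFT_HYPHEN, "")


def contains_ignoring_shy(haystack: str, needle: str) -> bool:
    """Substring search that treats soft hyphens as if they were absent."""
    return strip_soft_hyphens(needle) in strip_soft_hyphens(haystack)


def sort_ignoring_shy(words):
    """Sort words by their text with soft hyphens removed."""
    return sorted(words, key=strip_soft_hyphens)


if __name__ == "__main__":
    hinted = "hy\u00adphen\u00adation"
    # A plain substring test fails because of the embedded soft hyphens.
    print("hyphenation" in hinted)
    # The normalized search succeeds.
    print(contains_ignoring_shy(hinted, "hyphenation"))
```

Comparing the stripped forms leaves the stored text untouched, so the hyphenation hints remain available for display.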

ISO 8859-1 specifies that it is always visible. EBCDIC has a SHY character, with "SHY" an abbreviation for "syllable hyphen",[1][3] which is defined by IBM to mean a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."[4]

In most parts of ISO-8859 the soft hyphen is at position 0xAD (hexadecimal), and since the first 256 positions in Unicode are taken from ISO-8859-1, it has a Unicode code point of U+00AD. HTML 3.2 included the SGML (ISO 8879-1986) character entity for the soft hyphen, "&shy;", which was defined in SGML's "Numeric and Special Graphic" (isonum) character entity set. In troff, the soft hyphen is \%. In TeX and LaTeX, the soft hyphen is represented by the command \-.[5]
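
The correspondence between the ISO-8859-1 position, the Unicode code point, and the HTML references can be checked with Python's standard library. The following is a small sketch; the function name `soft_hyphen_facts` is an assumption made for this illustration.

```python
import html
import unicodedata


def soft_hyphen_facts() -> dict:
    """Collect a few encoding facts about the soft hyphen character."""
    shy = "\u00ad"
    return {
        "code_point": ord(shy),                  # numeric code point
        "name": unicodedata.name(shy),           # Unicode character name
        "latin1": shy.encode("iso-8859-1"),      # single byte in ISO-8859-1
        "utf8": shy.encode("utf-8"),             # two bytes in UTF-8
        "from_entity": html.unescape("&shy;"),   # named character reference
        "from_numeric": html.unescape("&#173;"), # decimal character reference
    }


if __name__ == "__main__":
    for key, value in soft_hyphen_facts().items():
        print(f"{key}: {value!r}")
```

Both HTML references decode to the same single character, which occupies one byte (0xAD) in ISO-8859-1 and two bytes in UTF-8.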

Security issues

Soft hyphens have been used to obscure malicious domains or URLs in e-mail spam.[6][7]
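
One way such obfuscation can defeat naive text matching, and how stripping the character first counters it, is sketched below in Python. This is an illustration under stated assumptions, not a description of any real spam filter: the blocklist, the domain `malicious.example`, and the function name `mentions_blocked_domain` are all hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN

# Hypothetical blocklist used only for this illustration.
BLOCKLIST = frozenset({"malicious.example"})


def mentions_blocked_domain(text: str, blocklist=BLOCKLIST) -> bool:
    """Check text for blocked domains after removing soft hyphens."""
    cleaned = text.replace(SOFT_HYPHEN, "")
    return any(domain in cleaned for domain in blocklist)


if __name__ == "__main__":
    # Soft hyphens are invisible when the line is not broken at them,
    # so this URL can look unremarkable to a reader.
    obfuscated = "http://mali\u00adcious.exam\u00adple/login"
    # A plain substring check is fooled by the embedded characters.
    print(any(domain in obfuscated for domain in BLOCKLIST))
    # Normalizing before matching detects the blocked domain.
    print(mentions_blocked_domain(obfuscated))
```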

References

  1. Jukka Korpela (revision as of January 2011). "Soft hyphen (SHY) – a hard problem?". Tampere University of Technology. Retrieved 2011-04-08.
  2. "9.3.3 Hyphenation". HTML 4.01 Specification. World Wide Web Consortium. 24 December 1999. Retrieved 2011-04-08.
  3. "Extended Binary-Coded Decimal Interchange Code - S/390". comsci.us. Retrieved 2011-04-08.
  4. "Glossary". IBM. Retrieved 2011-04-08.
  5. "Commonly Confused Characters". Greg Baker, Simon Fraser University. Retrieved 2011-07-12.
  6. "Spammers Using Soft Hyphen To Hide Malicious URLs". Slashdot. October 7, 2010. Retrieved 2011-04-08.
  7. "Soft Hyphen – A New URL Obfuscation Technique". Symantec. Retrieved 2011-04-08.