Soft hyphen


This is an old revision of this page, as edited by Mykhal at 13:08, 27 January 2014 (better example). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computing and typesetting, a soft hyphen (U+00AD SOFT HYPHEN, HTML: &#173; &shy;), also called a discretionary hyphen or optional hyphen, is a kind of hyphen used to specify a place in text where a hyphenated break is allowed without forcing a line break in an inconvenient place if the text is re-flowed. The soft hyphen's semantics and HTML implementation are in many ways similar to the zero-width space.

To show the effect of a soft hyphen, the following words have been separated with soft hyphens:

Margaret­Are­You­Grieving­Over­Goldengrove­Unleaving­Leaves­Like­The­Things­Of­Man­You­With­Your­Fresh­Thoughts­Care­For­Can­You­Ah­As­The­Heart­Grows­Older­It­Will­Come­To­Such­Sights­Colder­By­And­By­Nor­Spare­A­Sigh­Though­Worlds­Of­Wanwood­Leafmeal­Lie­And­Yet­You­Will­Weep­And­Know­Why­Now­No­Matter­Child­The­Name­Sorrows­Springs­Are­The­Same­Nor­Mouth­Had­No­Nor­Mind­Expressed­What­Heart­Heard­Of­Ghost­Guessed­It­Is­The­Blight­Man­Was­Born­For­It­Is­Margaret­You­Mourn­For

On browsers supporting soft hyphens, resizing the window will re-break the above text only at word boundaries, and a hyphen will be displayed at the end of each line that is broken this way.
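As a minimal sketch of how a sample like the one above can be produced, the following Python snippet joins a list of words with U+00AD. The function name `join_with_soft_hyphens` is illustrative only and is not taken from any standard or library.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def join_with_soft_hyphens(words):
    """Join words so that a renderer may break, and hyphenate, between them."""
    return SOFT_HYPHEN.join(words)


sample = join_with_soft_hyphens(["Margaret", "Are", "You", "Grieving"])

# The soft hyphens are part of the string even though most renderers
# show nothing for them unless a line is broken at that point.
print(repr(sample))
print(len(sample))
```

The resulting string has three more characters than the visible letters, one for each soft hyphen between adjacent words.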

Handling

Additional semantics associated with the soft hyphen vary. According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.[1] HTML4 describes it as a "hyphenation hint", though it suggests that that interpretation is not universal:[2]

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics. If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
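The semantics quoted above can be illustrated with a simplified sketch in Python. It is not how any particular browser lays out text: the greedy strategy, the fixed-width character model, and the function names `wrap_at_soft_hyphens` and `contains_ignoring_soft_hyphens` are all assumptions made for the example.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_at_soft_hyphens(text, width):
    """Greedily wrap one word-like string, breaking only at soft hyphens.

    A visible "-" is emitted only where a line is actually broken;
    soft hyphens that are not used as break points produce no output.
    A single segment longer than `width` is allowed to overflow.
    """
    segments = text.split(SOFT_HYPHEN)
    lines = []
    current = ""
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        # Reserve one column for the hyphen that a later break would need.
        reserve = 0 if is_last else 1
        if current and len(current) + len(segment) + reserve > width:
            lines.append(current + "-")
            current = segment
        else:
            current += segment
    lines.append(current)
    return lines


def contains_ignoring_soft_hyphens(haystack, needle):
    """Substring search that ignores soft hyphens on both sides."""
    return needle.replace(SOFT_HYPHEN, "") in haystack.replace(SOFT_HYPHEN, "")


word = "Golden\u00adgrove\u00adUnleaving"
print(wrap_at_soft_hyphens(word, 40))  # wide enough: no hyphen is shown
print(wrap_at_soft_hyphens(word, 12))  # narrow: broken with a visible hyphen
```

With a width of 40 the whole string fits on one line and no hyphen appears; with a width of 12 the line is broken after "Goldengrove" and a hyphen is displayed there.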

ISO 8859-1 specifies that the soft hyphen is always visible. EBCDIC has a SHY character, with "SHY" an abbreviation for "syllable hyphen",[1][3] which is defined by IBM to mean a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."[4]

In most parts of ISO-8859 the soft hyphen is at position 0xAD (hexadecimal), and since the first 256 positions in Unicode are taken from ISO-8859-1, its Unicode code point is U+00AD. HTML 3.2 included the SGML (ISO 8879-1986) character entity for the soft hyphen, "&shy;", which was defined in SGML's "Numeric and Special Graphic" (isonum) character entity set. In troff, the soft hyphen is \%. In TeX and LaTeX, the soft hyphen is represented by the command \-.[5]
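The correspondence between the ISO-8859-1 byte, the Unicode code point, and the HTML entities can be checked with Python's standard library. The helper name `soft_hyphen_facts` is made up for this sketch; the calls it uses (`bytes.decode`, `html.unescape`, `unicodedata.name`) are standard.

```python
import html
import unicodedata

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def soft_hyphen_facts():
    """Collect a few checks about U+00AD using only the standard library."""
    return {
        # Byte 0xAD in ISO-8859-1 decodes to the same code point.
        "latin1_byte_0xAD": bytes([0xAD]).decode("iso-8859-1") == SOFT_HYPHEN,
        # The named and numeric HTML character references both resolve to it.
        "named_entity": html.unescape("&shy;") == SOFT_HYPHEN,
        "numeric_entity": html.unescape("&#173;") == SOFT_HYPHEN,
        "unicode_name": unicodedata.name(SOFT_HYPHEN),
        "code_point": f"U+{ord(SOFT_HYPHEN):04X}",
    }


facts = soft_hyphen_facts()
for key, value in facts.items():
    print(key, value)
```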

Security issues

Soft hyphens have been used to obscure malicious domains or URLs in e-mail spam.[6][7]
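One possible countermeasure, sketched here under stated assumptions, is to flag and strip soft hyphens before a URL is inspected or compared against a blocklist. The function names and the `example.com` address are hypothetical, and real mail filters may handle many other invisible characters as well.

```python
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def contains_soft_hyphen(url):
    """Return True if the URL text carries any soft hyphen."""
    return SOFT_HYPHEN in url


def reveal_hostname(url):
    """Strip soft hyphens, then return the hostname the URL actually names."""
    cleaned = url.replace(SOFT_HYPHEN, "")
    return urlsplit(cleaned).hostname


# Hypothetical obfuscated link: invisible soft hyphens split the host name.
suspicious = "http://exam\u00adple.c\u00adom/login"
print(contains_soft_hyphen(suspicious))
print(reveal_hostname(suspicious))
```

A naive string comparison of the obfuscated link against "example.com" fails, while the comparison succeeds after the soft hyphens are removed.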

References

  1. ^ a b Jukka Korpela (January 2011). "Soft hyphen (SHY) – a hard problem?". Tampere University of Technology. Retrieved 2011-04-08.
  2. ^ "9.3.3 Hyphenation". HTML 4.01 Specification. World Wide Web Consortium. 24 December 1999. Retrieved 2011-04-08.
  3. ^ "Extended Binary-Coded Decimal Interchange Code - S/390". comsci.us. Retrieved 2011-04-08.
  4. ^ "Glossary". IBM. Retrieved 2011-04-08.
  5. ^ "Commonly Confused Characters". Greg Baker, Simon Fraser University. Retrieved 2011-07-12.
  6. ^ "Spammers Using Soft Hyphen To Hide Malicious URLs". Slashdot. October 7, 2010. Retrieved 2011-04-08.
  7. ^ "Soft Hyphen – A New URL Obfuscation Technique". Symantec. Retrieved 2011-04-08.