Soft hyphen
In computing and typesetting, a soft hyphen (U+00AD SOFT HYPHEN, HTML: &#173; &shy;), also called a discretionary hyphen or optional hyphen, is a character that marks a place in text where a hyphenated line break is allowed, without forcing a break at that place if the text is re-flowed. The soft hyphen's semantics and HTML implementation are in many ways similar to those of the zero-width space.
To show the effect of a soft hyphen, the following words have been separated with soft hyphens:
MargaretAreYouGrievingOverGoldengroveUnleavingLeavesLikeTheThingsOfManYouWithYourFreshThoughtsCareForCanYouAhAsTheHeartGrowsOlderItWillComeToSuchSightsColderByAndByNorSpareASighThoughWorldsOfWanwoodLeafmealLieAndYetYouWillWeepAndKnowWhyNowNoMatterChildTheNameSorrowsSpringsAreTheSameNorMouthHadNoNorMindExpressedWhatHeartHeardOfGhostGuessedItIsTheBlightManWasBornForItIsMargaretYouMournFor
On browsers supporting soft hyphens, resizing the window will re-break the above text only at word boundaries, and insert a hyphen at the end of each line.
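The behaviour described above can be imitated outside a browser. The following is a minimal Python sketch, not how any particular browser implements line breaking: the function name `wrap_at_soft_hyphens`, the greedy strategy, and the fixed character width are all assumptions made for illustration.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_at_soft_hyphens(text: str, width: int) -> list[str]:
    """Greedy breaker: break only at soft hyphens, showing '-' at each break."""
    lines = []
    current = ""
    for piece in text.split(SHY):
        # Reserve one column for the visible hyphen a break would add.
        # This is slightly conservative for the final piece, which needs none.
        if current and len(current) + len(piece) + 1 > width:
            lines.append(current + "-")
            current = piece
        else:
            current += piece
    if current:
        lines.append(current)
    return lines


# Hypothetical sample: the first few words of the text above, joined by SHY.
words = ["Margaret", "Are", "You", "Grieving", "Over", "Goldengrove", "Unleaving"]
sample = SHY.join(words)

for line in wrap_at_soft_hyphens(sample, 20):
    print(line)
```

With a width of 20 the sample is broken into four lines, the first three ending in a visible hyphen. With a width large enough to hold the whole string, no hyphen appears at all.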
Handling
Additional semantics associated with the soft hyphen vary. According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.[1] HTML4 describes it as a "hyphenation hint", though it suggests that that interpretation is not universal:[2]
In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics. If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
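The last requirement in the quoted passage, that searching and sorting ignore the soft hyphen, can be sketched in Python as follows. The helper names `strip_soft_hyphens` and `contains` are hypothetical, and removing the character before comparison is only one possible way to meet the requirement.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(s: str) -> str:
    """Return s with every soft hyphen removed."""
    return s.replace(SHY, "")


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores soft hyphens on both sides."""
    return strip_soft_hyphens(needle) in strip_soft_hyphens(haystack)


# Searching: the hinted word still matches its plain spelling.
found = contains("a hy\u00adphen\u00adated word", "hyphenated")

# Sorting: compare on the stripped form, but keep the original strings.
terms = ["hy\u00adphen", "hyp\u00adnosis", "hyena"]
ordered = sorted(terms, key=strip_soft_hyphens)
```

Sorting on the raw strings would place "hypnosis" before "hyphen", because U+00AD compares greater than any ASCII letter; sorting on the stripped key gives the expected dictionary order.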
ISO 8859-1 specifies that it is always visible. EBCDIC has a SHY character, with "SHY" an abbreviation for "syllable hyphen",[1][3] which is defined by IBM to mean a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."[4]
In most parts of ISO 8859 the soft hyphen is at position 0xAD (hexadecimal), and since the first 256 positions in Unicode are taken from ISO 8859-1, it has a Unicode code point of U+00AD. HTML 3.2 included the SGML (ISO 8879:1986) character entity for the soft hyphen, "&shy;", which was defined in SGML's "Numeric and Special Graphic" (isonum) character entity set. In troff, the soft hyphen is \%. In TeX and LaTeX, the soft hyphen is represented by the command \-.[5]
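The correspondence between the ISO 8859-1 byte, the Unicode code point, and the HTML references can be checked with the Python standard library. This is a small illustrative sketch; the variable names are arbitrary.

```python
import html
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

# The ISO 8859-1 byte 0xAD decodes to the same code point.
from_latin1 = b"\xad".decode("iso-8859-1")

# Both the named entity and the numeric reference resolve to U+00AD.
from_named = html.unescape("&shy;")
from_numeric = html.unescape("&#173;")

# Unicode metadata for the character.
name = unicodedata.name(SHY)          # "SOFT HYPHEN"
category = unicodedata.category(SHY)  # "Cf" (format character)
```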
Security issues
Soft hyphens have been used to obscure malicious domains or URLs in e-mail spam.[6][7]
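A filter could flag or normalize such URLs before comparing them against a blocklist. The sketch below assumes a simple string-based check; the function names and the example URL are hypothetical, and it does not describe how any particular mail filter works.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def has_soft_hyphen(url: str) -> bool:
    """Report whether the URL contains any soft hyphen."""
    return SHY in url


def normalize_url(url: str) -> str:
    """Remove soft hyphens so the URL can be matched against known strings."""
    return url.replace(SHY, "")


# Hypothetical obfuscated URL: displays like the plain one but differs in bytes.
obfuscated = "http://exam\u00adple.com/lo\u00adgin"
blocklist = {"http://example.com/login"}

naive_match = obfuscated in blocklist                      # missed by exact match
normalized_match = normalize_url(obfuscated) in blocklist  # caught after stripping
```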
References
1. Jukka Korpela (revision as of January 2011). "Soft hyphen (SHY) – a hard problem?". Tampere University of Technology. Retrieved 2011-04-08.
2. "9.3.3 Hyphenation". HTML 4.01 Specification. World Wide Web Consortium. 24 December 1999. Retrieved 2011-04-08.
3. "Extended Binary-Coded Decimal Interchange Code - S/390". comsci.us. Retrieved 2011-04-08.
4. "Glossary". IBM. Retrieved 2011-04-08.
5. "Commonly Confused Characters". Greg Baker, Simon Fraser University. Retrieved 2011-07-12.
6. "Spammers Using Soft Hyphen To Hide Malicious URLs". Slashdot. October 7, 2010. Retrieved 2011-04-08.
7. "Soft Hyphen – A New URL Obfuscation Technique". Symantec. Retrieved 2011-04-08.