Wikipedia:Turkish characters

This page is deprecated but will be updated periodically.
Please direct edits to the Meta-Wikimedia version of this page.

The Turkish alphabet contains six letters that are not present in standard ISO 8859-1 and that therefore present problems for Wikipedia use: dotted uppercase "I", dotless lowercase "i", upper- and lowercase "G" with breve accent, and upper- and lowercase "S" with cedilla.

Turkish computers may use the character set ISO 8859-9 ("Latin 5"), which is identical to Latin 1 except that the rarely used Icelandic characters "eth", "thorn", and "y with acute accent" are replaced with the needed Turkish characters. If you are reading Turkish text in Wikipedia and see these Icelandic characters, they are probably meant to be the Turkish ones (users with Turkish computers may or may not see them properly). If you are entering Turkish text into Wikipedia, be aware that the Bomis server identifies web pages as ISO 8859-1, and there is no way to override this. Even if these characters appear correct to you, they are not properly encoded from the point of view of a non-Turkish Wiki reader.
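The misreading described above can be reproduced, and in simple cases undone, with a short Python sketch. The function name `recover_turkish` and the sample word are illustrative assumptions, not part of any Wikipedia tooling; the codec names `latin_1` and `iso8859_9` are the standard Python names for ISO 8859-1 and ISO 8859-9.

```python
def recover_turkish(misread: str) -> str:
    """Reinterpret text that was written as ISO 8859-9 but decoded as ISO 8859-1."""
    # Re-encoding as Latin 1 restores the original bytes; decoding those
    # bytes as Latin 5 yields the intended Turkish letters.
    return misread.encode("latin_1").decode("iso8859_9")


# Simulate a Turkish author's bytes being labeled ISO 8859-1 by the server.
original = "Işık"
as_seen_by_latin1_reader = original.encode("iso8859_9").decode("latin_1")

print(as_seen_by_latin1_reader)                   # Iþýk
print(recover_turkish(as_seen_by_latin1_reader))  # Işık
```

Note that this reinterpretation is only safe when the text is known to be Turkish: applied to genuine Icelandic text, it would replace real eth, thorn, and y-acute characters with Turkish letters.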

You can check what your browser displays for these six positions in the standard ISO set here: Ð Ý Þ ð ý þ should appear as Eth, Y-acute, Thorn, eth, y-acute, thorn. If your system is using the Turkish 8859-9 set and ignoring the server instructions, these will appear as G-breve, I-dot, S-cedilla, g-breve, dotless-i, s-cedilla. Ironically, if these characters do appear to you as Turkish letters, it means they are incorrectly encoded.
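As a rough cross-check of the six positions, the following Python sketch compares the two character sets byte by byte and prints every position where they disagree. It assumes only Python's built-in `latin_1` and `iso8859_9` codecs; the variable names are illustrative.

```python
import unicodedata

# Collect every byte value that decodes differently under the two sets.
differing = [
    b for b in range(256)
    if bytes([b]).decode("latin_1") != bytes([b]).decode("iso8859_9")
]

for b in differing:
    latin1 = bytes([b]).decode("latin_1")
    latin5 = bytes([b]).decode("iso8859_9")
    print(
        f"0x{b:02X}  "
        f"{latin1} ({unicodedata.name(latin1)})  |  "
        f"{latin5} ({unicodedata.name(latin5)})"
    )
```

The list `differing` should contain exactly the six byte values 0xD0, 0xDD, 0xDE, 0xF0, 0xFD, and 0xFE, in line with the description above.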

Further, HTML 4.0 does not define named character entities for these letters, so the only portable way to enter them into Wikipedia text is as Unicode numeric character references. These are the official standard but only work in recent browsers. Consider anglicizing these characters instead. If you really must enter Turkish words with these characters into Wikipedia, maintaining data integrity at the expense of readability, use the following codes (the characters themselves appear after each code so you can see what your browser does with them):

HTML code   Character   Description                        Anglicized
&#286;      Ğ           Uppercase "G" with breve accent    gh¹
&#304;      İ           Uppercase dotted "I"²              i (as in "tree")
&#350;      Ş           Uppercase "S" with cedilla         sh
&#287;      ğ           Lowercase "g" with breve accent    gh¹
&#305;      ı           Lowercase dotless "i"³             ou (as in "in")
&#351;      ş           Lowercase "s" with cedilla         sh

¹ Silent; lengthens the preceding vowel.
² The uppercase form of "i".
³ The lowercase form of "I".
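For longer passages, the codes can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the text is available as a Unicode string; the function name `to_numeric_references` is hypothetical. It relies on the standard `xmlcharrefreplace` error handler, which replaces every character that cannot be encoded in ISO 8859-1 with a decimal numeric character reference and leaves Latin 1 letters such as "ü" untouched.

```python
import html


def to_numeric_references(text: str) -> str:
    """Replace characters outside ISO 8859-1 with &#NNN; references."""
    # Characters that Latin 1 cannot represent become decimal references;
    # everything else passes through unchanged.
    return text.encode("latin_1", errors="xmlcharrefreplace").decode("latin_1")


encoded = to_numeric_references("İstanbul, Diyarbakır, Atatürk")
print(encoded)                 # &#304;stanbul, Diyarbak&#305;r, Atatürk
print(html.unescape(encoded))  # İstanbul, Diyarbakır, Atatürk
```

Because the handler acts on any character outside Latin 1, not only the six Turkish letters, it would also convert other non-Latin-1 text in the same passage.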

These codes will probably appear correctly in most current browsers but fail in archaic versions, even though they conform to the standards. They are, however, totally unambiguous, and so will appear correctly as users upgrade to more compliant software. IE 5.0 on Windows NT, for example, displays them using the standard supplied Unicode fonts, as do Mozilla and Opera. Their incorrect appearance in older browsers such as Netscape may be acceptable in some situations, such as when an anglicized name is followed by a parenthesized Turkish equivalent given as extra information not crucial to the article itself.

