Help:Special characters

Many characters not on the standard computer keyboard will be useful—even necessary—for projects in a non-Latin-alphabet language. This page contains recommendations for which characters are safe to use and how to enter them.

Editing

See Help:Entering special characters.

Viewing

Most current browsers have some level of Unicode support, but some handle it better than others. The most commonly encountered problem is that Internet Explorer relies on preconfigured font links in the registry rather than searching for a font that can display the character in question. As a result, Internet Explorer often has to be forced to use particular fonts. On English Wikipedia there is a set of templates to do this, for example {{unicode}} for general Unicode text and {{IPA}} for the International Phonetic Alphabet. Characters in Windows Glyph List 4 should be safe to use without such special measures.

<font face="Arial Unicode MS">...</font> may work, but only for readers whose computers have that font installed.

Windows 7

Unicode support can be extended by installing the optional update KB2729094 through Windows Update. This adds emoji support in browsers on Windows 7.

Windows 8 includes emoji support by default.

Displaying special characters

To display Unicode or other special characters on a web page, at least one Unicode font that covers them must first be installed on your computer. The browser's settings may also need to be changed for the characters to display properly.

The default font for Latin scripts in the Internet Explorer (IE) web browser for Windows is Times New Roman, which does not cover many Unicode blocks. To view special characters properly in IE, set the browser's font to one that covers many Unicode blocks, such as TITUS Cyberbit Basic or GNU Unifont, both of which are freely available.

Special symbols should display properly without further configuration in Konqueror, Opera, Safari and most other recent browsers. After following the steps above, an optional further step for better (and correct) display of ligatures and combined characters is to install rendering engine software.

With Mozilla Firefox, the default setting must be changed. Click 'Options' in the 'Tools' menu and select the 'Content' icon. On that panel, click 'Advanced' under 'Fonts and Colors'. Uncheck the box "Allow pages to choose their own fonts", which is checked by default, and choose "Unicode (UTF-8)" in the 'Default character encoding' box. Alternatively, switch the font to 'Arial Unicode MS'; in that case the box can be left checked.

To use one of the available Unicode fonts for special characters inside a table, chart or box, specify class="Unicode" in the table's TR (row) tag. It can also go in each TD tag, but setting it once per TR is easier. In wiki table code, put it after the row marker |-, which is the TR equivalent (e.g., |- class="Unicode").

To display an individual special character, use the template code {{Unicode|char}} for each character. An HTML decimal or hexadecimal numeric entity code can be used in place of char. If a paragraph with many special Unicode characters needs to be displayed, <p class="Unicode"> ... </p> or <span class="Unicode"> ... </span> can be used instead.
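
As a rough illustration of the numeric entity codes mentioned above, the following Python sketch derives the decimal and hexadecimal forms from a character's code point. The function name numeric_entities is hypothetical and chosen only for this example; it is not part of MediaWiki or any template.

```python
import html


def numeric_entities(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal numeric entity codes for one character."""
    code_point = ord(char)  # Unicode code point as an integer
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_entities("€")
print(decimal_ref, hex_ref)

# Both forms decode back to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == "€")
```

Either form could then be placed where char appears in {{Unicode|char}}.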

Use class="Unicode" in HTML or wiki tags where characters from a wide range of Unicode blocks need to be displayed. If the special characters come mostly from the smaller set of Unicode blocks related to Latin scripts, class="latinx" can be used. For characters or symbols of the International Phonetic Alphabet, use class="IPA". For polytonic Greek characters or related symbols, use class="polytonic".

Changing Internet Explorer's (IE) default font

From the IE menu bar, follow the path Tools > Internet Options > (General tab >) Fonts > Webpage Font to reach a scrolling list of fonts. As indicated above, the default selection for Windows is Times New Roman. To view many special characters, select a different font, such as Lucida Sans Unicode, and then select OK.

Fonts for specific writing systems

Ancient scripts

For example, the Phoenician alphabet, the Old Italic alphabet and Linear B.

Windows users

Please download and install one of these freely licensed fonts

Linux users

If you use a Debian-based Linux distribution (e.g. Ubuntu, Linux Mint), please download and install the deb package ttf-ancient-fonts by entering the following in a terminal:

sudo apt-get install ttf-ancient-fonts

Shavian text

  • Copyleft font is available from here.

Glagolitic text

IPA symbols

Most IPA symbols are not included in the most widely used version of Times New Roman, the default font for Latin scripts in Internet Explorer for Windows (though they are included in the version provided with Windows Vista). To view IPA symbols properly in that browser, you must set it to use a font that includes the IPA Extensions characters. Such fonts include Lucida Sans Unicode, which comes with Windows XP; Gentium, Charis SIL, Doulos SIL, DejaVu Sans and TITUS Cyberbit, which are freely available; and Arial Unicode MS, which comes with Microsoft Office.

On this page, we have forced Internet Explorer to use such a font by default, so it should appear correctly, but this has not yet been done to all the other pages containing IPA. This also applies to other pages using special symbols. Bear this in mind if you see error symbols such as "຦" in articles.

Special symbols should display properly without further configuration with Mozilla Firefox, Konqueror, Opera, Safari and most other recent browsers.

What character encoding is Wikipedia using?

From MediaWiki 1.5, all projects use Unicode (UTF-8) character encoding.

Until the end of June 2005, when this new version came into use on Wikimedia projects, the English, Dutch, Danish, and Swedish Wikipedias used Windows-1252. They declared themselves to be ISO-8859-1, but in practice browsers treat the two as synonymous, and the MediaWiki software made no attempt to prevent the use of characters exclusive to Windows-1252. Pre-upgrade wikitext in their databases remains stored in Windows-1252 and is converted on load (some of it may also have been converted by gradual changes in the way history is stored). Edits made since the upgrade are stored as UTF-8 in the database. This conversion-on-load process is invisible to users. It is also invisible to reusers, as Wikimedia now uses XML dumps rather than database dumps.
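
The difference between the two legacy encodings, and the idea of converting on load, can be sketched in Python as follows. This is only an illustration using Python's standard codecs, not the MediaWiki implementation; the function name convert_legacy_text and the sample bytes are made up for the example.

```python
def convert_legacy_text(stored: bytes) -> bytes:
    """Decode bytes stored as Windows-1252 and re-encode them as UTF-8."""
    return stored.decode("cp1252").encode("utf-8")


# Byte 0x80 is the euro sign in Windows-1252 but a C1 control
# character in strict ISO-8859-1.
euro_byte = b"\x80"
print(euro_byte.decode("cp1252"))
print(repr(euro_byte.decode("iso-8859-1")))

# Hypothetical legacy text containing a Windows-1252-only character.
legacy = b"caf\xe9 \x805"
print(convert_legacy_text(legacy).decode("utf-8"))
```

Python's cp1252 codec rejects the few byte values that Windows-1252 leaves undefined, so a real converter would need a policy for those bytes.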

Unicode (UTF-8)
  • a variable number of bytes per character
  • special characters, including CJK characters, can be treated like normal ones: both the webpage and the edit box show the character. Multi-character codes can also be used; they are not automatically converted in the edit box.
ISO 8859-1
  • one byte per character
  • special characters that are not available in the limited character set are stored in the form of a multi-character code. There are usually two or three equivalent representations; for the character €, these are the named character reference &euro;, the decimal character reference &#8364; and the hexadecimal character reference &#x20AC;. The edit box shows the entered code, and the webpage shows the resulting character. Unavailable characters that are copied into the edit box are first displayed as the character, then automatically converted to their decimal codes on Preview or Save.
  • the most common special characters, such as é, are in the character set, so code like &eacute;, although allowed, is not needed.
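
The contrast between the two encodings can be tried out with a short Python sketch. It relies only on Python's built-in codecs; the helper names utf8_length and to_latin1_with_references are hypothetical, and the "xmlcharrefreplace" error handler is used here merely to imitate the conversion to decimal codes described above, not to reproduce what MediaWiki did internally.

```python
def utf8_length(char: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(char.encode("utf-8"))


def to_latin1_with_references(text: str) -> bytes:
    """Encode as ISO 8859-1, replacing unavailable characters
    with decimal character references."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# UTF-8 uses a variable number of bytes per character.
for char in ["e", "é", "€", "中"]:
    print(repr(char), utf8_length(char))

# In ISO 8859-1, é fits in one byte, but € has to become a reference.
print(to_latin1_with_references("é costs 5 €"))
```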

Note that Special:Export exports using UTF-8 even if the database is encoded in ISO 8859-1. At least, that was already the case for the English Wikipedia when it used version 1.4.

To find out which character set applies in a project, use the browser's "View Source" feature and look for something like this:

<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />

or

<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
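
The same check can be automated. The following Python sketch uses the standard library's html.parser to pull the charset out of a meta tag like the ones above; the class name CharsetFinder and the function find_charset are hypothetical, and the sketch only handles the http-equiv form shown here.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the charset declared in the first Content-type meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in attributes.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset":
                self.charset = value.strip().lower()


def find_charset(page_source: str):
    """Return the declared charset, or None if no declaration is found."""
    finder = CharsetFinder()
    finder.feed(page_source)
    return finder.charset


sample = '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />'
print(find_charset(sample))
```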
