Articles on the English Wikipedia may contain words or text written in different languages and scripts. To view and edit these articles correctly, you need to have the appropriate fonts installed and your operating system and browser configured correctly. This guide will help you to do so.
Articles on Wikipedia are encoded using Unicode (specifically UTF-8), an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. Because UTF-8 is backwards compatible with ASCII, and most modern browsers have at least basic Unicode support, most users will experience little difficulty reading and editing most of Wikipedia.
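The backwards compatibility with ASCII mentioned above can be illustrated with a minimal Python sketch. The helper name utf8_bytes is hypothetical and exists only for this illustration; it simply lists the bytes that UTF-8 uses for a piece of text.

```python
def utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))


# Pure ASCII text produces exactly the same bytes in ASCII and in UTF-8.
ascii_text = "Wiki"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# A character outside ASCII needs more than one byte in UTF-8.
print(utf8_bytes("A"))       # a single byte
print(utf8_bytes("\u1234"))  # three bytes for U+1234 (ሴ)
```

Because every ASCII byte keeps its meaning, software that only understands ASCII can usually still handle the ASCII portions of a UTF-8 page.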
For older browsers, MediaWiki (the Wikipedia software) serves the wikitext in a safe mode upon editing. Characters that cannot be represented in ASCII are temporarily converted to hexadecimal character references, looking like &#x1234;. Existing hexadecimal character references get an additional leading zero so they are not converted to actual characters when the page is saved, and look like &#x01234;. Likewise, to create a hexadecimal character reference in safe mode, rather than the character itself, a leading zero should be added. One can check whether safe mode is used by editing this section. If &#x4D; looks like &#x04D; rather than &#x4D;, safe mode is used.
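The safe mode round trip described above can be sketched in Python as follows. This is not MediaWiki's actual code: the function names to_safe_mode and from_safe_mode are hypothetical, the use of uppercase hexadecimal digits is an assumption, and edge cases such as out-of-range references are ignored.

```python
import re

# Matches a hexadecimal character reference such as &#x1234;
_HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_safe_mode(wikitext: str) -> str:
    """Convert stored wikitext to an ASCII-only form for editing (sketch)."""
    # Step 1: protect references already in the text with a leading zero.
    protected = _HEX_REF.sub(lambda m: "&#x0" + m.group(1) + ";", wikitext)
    # Step 2: replace every non-ASCII character with a hex reference.
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in protected
    )


def from_safe_mode(edited: str) -> str:
    """Convert text from the edit box back to its stored form (sketch)."""
    def restore(match: re.Match) -> str:
        digits = match.group(1)
        if digits.startswith("0"):
            # A leading zero marks a literal reference: drop one zero, keep it.
            return "&#x" + digits[1:] + ";"
        # No leading zero: this stands for an actual character.
        return chr(int(digits, 16))

    return _HEX_REF.sub(restore, edited)


stored = "\u1234 and &#x4D;"
print(to_safe_mode(stored))  # &#x1234; and &#x04D;
```

Applying from_safe_mode to the output of to_safe_mode returns the original text, which is the property the extra leading zero is meant to guarantee.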
Most computers running Microsoft Windows, Apple's OS X or one of many Linux variants will already have fonts with support for Latin, Greek, Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean and the International Phonetic Alphabet installed. Many mobile devices, such as the iPhone and iPad, also include such fonts. Several historic and accented characters (used in the transliteration of foreign scripts) may be missing, though.
Supports a wide range of scripts, but is of a slightly lower quality than Arial because it lacks kerning and is not smoothed. Contains a minor bug that causes double-wide diacritics to be placed on the wrong characters.
supports Latin (although not all extended sets), Greek, Cyrillic, Arabic and Hebrew. Support for East Asian and some Indic scripts is available if the corresponding language support has been installed for Windows. As Internet Explorer will only use the default font for other scripts, those are usually not supported (unless the default font supports them).
tries to render any character using all the fonts available on the system so multilingual support is generally good. The default rendering engine can support complex script rendering. Some Linux distributions ship with a Pango-based rendering engine which also does, although this may currently cause some display glitches with justified text.
tries to render any character using all the fonts available on the system so multilingual support is also good. Opera uses the operating system to perform contextual glyph selection, ligature forming, character stacking, combining character support and other character shaping tasks.
Does not directly support the scripts of several South and Southeast Asian languages, and may render some characters as tofu signs (empty boxes) because of a problem with its font fallback mechanism; the Advanced Font Settings extension may help to work around this. Renders the Devanagari (used for Hindi), Bengali, Sinhala, Gurmukhi, and Tibetan scripts in the examples below, but not the scripts of some Southeast Asian languages.
Most operating systems provide support for Syriac scripts natively, but only the Maḏnḥāyā (ܡܕܢܚܝܐ) and ʾEsṭrangēlā (ܐܣܛܪܢܓܠܐ) varieties have correct rendering. In order to render the Serṭā (ܣܪܛܐ) variety, additional fonts are needed. These scripts are supported by the following fonts:
The Tifinagh alphabet is used to write the Berber languages. IRCAM (Institut Royal de la Culture Amazighe) has developed a software suite for Windows XP, available for download, that contains a Tifinagh keyboard and a font. The script is supported by the following fonts:
The Javanese script is used to write the Javanese language. It is supported by Unicode 5.2 and above. The script is a so-called SIL Graphite script and is best supported by Firefox. More recently, however, it can also be rendered using the OpenType and TrueType standards, provided the right font is used. The script is supported by the following fonts:
Baybayin (also known as Alibata, and as the Tagalog script in Unicode) is a pre-Spanish Philippine writing system from which modern minority scripts in the Philippines have descended. It is supported by the following fonts:
MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when editing, the text is converted to a form that is designed to be easier to edit with a standard keyboard.
The characters for which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ. You may enter these directly in the edit box if you have the facilities to do so. However, when you edit the page again, you will see each of them encoded as its base letter followed by x (for example, Ŝ appears as Sx). This form is referred to as "x-sistemo" or "x-kodo". In order to preserve round-trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.
For example, the interlanguage link [[en:Luxury car]] to en:Luxury car has to be entered in the edit box as [[en:Luxxury car]] on eo:. This has caused problems with interwiki update bots in the past.
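The x-sistemo conversion and the doubling rule can be sketched in Python. This is an illustration based only on the description above, not the code MediaWiki uses; the function names to_edit_box and from_edit_box are hypothetical, and only lowercase x is handled, which is a simplifying assumption.

```python
import re

# Accented Esperanto letters and their non-accented base letters.
ACCENTED_TO_BASE = dict(zip("ĈĜĤĴŜŬĉĝĥĵŝŭ", "CGHJSUcghjsu"))
BASE_TO_ACCENTED = {base: acc for acc, base in ACCENTED_TO_BASE.items()}

# An affected letter in stored text, with any x's that follow it.
_STORED = re.compile(r"([ĈĜĤĴŜŬĉĝĥĵŝŭCGHJSUcghjsu])(x*)")
# A base letter in the edit box followed by at least one x.
_EDIT = re.compile(r"([CGHJSUcghjsu])(x+)")


def to_edit_box(stored: str) -> str:
    """Convert stored text to the x-sistemo form shown in the edit box."""
    def repl(match: re.Match) -> str:
        letter, xs = match.group(1), match.group(2)
        doubled = xs * 2  # x's from the stored text are doubled
        if letter in ACCENTED_TO_BASE:
            # Accented letter: base letter plus one x, then the doubled x's.
            return ACCENTED_TO_BASE[letter] + "x" + doubled
        return letter + doubled

    return _STORED.sub(repl, stored)


def from_edit_box(edited: str) -> str:
    """Convert x-sistemo text from the edit box back to stored text."""
    def repl(match: re.Match) -> str:
        letter, count = match.group(1), len(match.group(2))
        if count % 2 == 1:
            # Odd number of x's: one of them marks the accent.
            return BASE_TO_ACCENTED[letter] + "x" * (count // 2)
        # Even number of x's: all are doubled literal x's.
        return letter + "x" * (count // 2)

    return _EDIT.sub(repl, edited)


print(to_edit_box("[[en:Luxury car]]"))  # [[en:Luxxury car]]
print(to_edit_box("Ŝ"))                  # Sx
```

Under this scheme an odd number of x's after a base letter always signals an accented letter and an even number signals literal x's, which is why the conversion can be reversed without loss.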
The Romanian alphabet contains an S-comma (Ș ș) and T-comma (Ț ț). These characters were added to Unicode 3.0 at the request of the Romanian standardization institute. As font support for these characters has been poor in the past, many computer users use the similar characters S-cedilla (Ş ş) and T-cedilla (Ţ ţ) instead. However, on Wikipedia it is recommended to use the correct characters with comma below.
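As a minimal sketch of the recommendation above, the following Python snippet replaces the cedilla characters with their comma-below counterparts. The function name use_comma_below is hypothetical. The replacement should only be applied to text known to be Romanian, since S-cedilla is the correct character in other languages such as Turkish.

```python
# Map the cedilla forms to the comma-below forms used in Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})


def use_comma_below(romanian_text: str) -> str:
    """Replace S-cedilla and T-cedilla with S-comma and T-comma."""
    return romanian_text.translate(CEDILLA_TO_COMMA)


print(use_comma_below("Timi\u015Foara"))  # Timișoara, with U+0219
```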
Until June 2005, when MediaWiki 1.5 came into use on the Wikimedia projects, articles on the English Wikipedia were encoded using ISO/IEC 8859-1 (although the additional characters from the Windows-1252 character set were used in practice). All characters from the ISO/IEC 10646 Universal Character Set could be accessed through numerical entities, as specified by the HTML 4.01 specification. Since then, nearly all pages have been converted to use Unicode directly. Old discussion on the topic can be read at Wikipedia talk:Unicode.