A phonemic orthography is an orthography (system for writing a language) in which the graphemes (written symbols) correspond to the phonemes (significant spoken sounds) of the language. Languages rarely have perfectly phonemic orthographies; a high degree of grapheme-phoneme correspondence can be expected in orthographies based on alphabetic writing systems, but these orthographies differ in the degree to which they are in fact fully phonemic. English orthography, for example, though alphabetic, is highly non-phonemic, whereas Italian and Finnish orthographic systems come much closer to being consistent phonemic representations.
In less formal terms, a language with a highly phonemic orthography may be described as having regular spelling. Another terminology is that of deep and shallow orthographies, where the depth of an orthography is the degree to which it diverges from being truly phonemic (this concept can also be applied to non-alphabetic writing systems like syllabaries).
Ideal phonemic orthography
In an ideal phonemic orthography, there would be a complete one-to-one correspondence (bijection) between the graphemes (letters) and the phonemes of the language, and each phoneme would invariably be represented by its corresponding grapheme. The spelling of a word would then unambiguously and transparently indicate its pronunciation; conversely, a speaker knowing the pronunciation of a word would be able to infer its spelling without any doubt. This ideal situation is rare, but does exist. An example of an ideally phonemic orthography is that of the Serbian language. In the Serbian Cyrillic alphabet there are thirty graphemes, each uniquely corresponding to one of thirty phonemes. This was achieved in the 19th century, when the Serbian linguist Vuk Karadžić reformed the Cyrillic alphabet and presented it to the public with the phrase "Write as you speak, read as it is written" (Piši kao što govoriš, čitaj kako je napisano/Пиши као што говориш, читај како је написано). Such an orthography makes reading and writing Serbian very easy to learn.
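The one-to-one requirement can be stated as a small check. The following is only a sketch: the function name `is_one_to_one` and the toy inventories are assumptions made for illustration, not a complete description of any real orthography.

```python
def is_one_to_one(pairs):
    """Return True if each grapheme has exactly one phoneme and vice versa.

    `pairs` is an iterable of (grapheme, phoneme) tuples.
    """
    grapheme_to_phoneme = {}
    phoneme_to_grapheme = {}
    for grapheme, phoneme in pairs:
        # setdefault stores the first mapping seen and returns it afterwards,
        # so a later conflicting pair is detected by the comparison.
        if grapheme_to_phoneme.setdefault(grapheme, phoneme) != phoneme:
            return False
        if phoneme_to_grapheme.setdefault(phoneme, grapheme) != grapheme:
            return False
    return True


# Toy fragment of a Cyrillic-style inventory (three letters only).
toy_ideal = [("п", "p"), ("и", "i"), ("ш", "ʃ")]

# Two graphemes sharing one phoneme: the mapping is no longer a bijection.
toy_shared_phoneme = [("u", "u"), ("ó", "u")]

print(is_one_to_one(toy_ideal))           # True
print(is_one_to_one(toy_shared_phoneme))  # False
```

A real inventory would list every grapheme of the language; the check itself stays the same.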
There are two distinct types of deviation from this phonemic ideal. In the first case, the exact one-to-one correspondence may be lost (for example, some phoneme may be represented by a digraph instead of a single letter), but the "regularity" is retained, in that there is still an algorithm (though a more complex one) for predicting the spelling from the pronunciation and vice versa. In the second case true irregularity is introduced, as certain words come to be spelled according to different rules than others, and prediction is no longer possible without knowledge about the orthography of individual words. Common cases of both of these types of deviation from the ideal are discussed in the following section.
Deviations from phonemic orthography
Some ways in which orthographies may deviate from the ideal of one-to-one grapheme-phoneme correspondence are listed below. The first list contains deviations that tend only to make the relation between spelling and pronunciation more complex, without affecting its predictability (see above paragraph).
- A phoneme may be represented by a sequence of letters – called a multigraph – rather than by a single letter (as in the case of the digraph ch in English and French, and the trigraph sch in German). (This only retains predictability if the multigraph cannot be broken down into smaller units, for example some languages require diacritics to distinguish between "sch" and "s" + "ch"; cf e.g. goatherd in English.) This is often due to the use of an alphabet that was originally used for a different language (the Latin alphabet in these examples) and thus does not have single letters available for all phonemes in the language currently being written (although some orthographies use devices such as diacritics to increase the number of available letters).
- Sometimes, conversely, a single letter may represent a sequence of more than one phoneme (as x can represent the sequence /ks/ in English and other languages).
- Sometimes the rules of correspondence are more complex and depend on adjacent letters, often as a result of historical sound changes (as with the rules for the pronunciation of c and ci in Italian, and the silent e in English).
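Deviations of this kind still leave pronunciation computable from spelling. As a rough sketch, the Italian rules for c can be written as a handful of context checks. The function name `italian_c_to_phonemes` is hypothetical, the rules are deliberately simplified (for instance, they ignore words in which the i of ci is itself pronounced), and every letter other than c is passed through unchanged as a placeholder.

```python
FRONT_VOWELS = ("e", "i")
BACK_VOWELS = ("a", "o", "u")


def italian_c_to_phonemes(word):
    """Simplified, illustrative treatment of Italian c, ch and ci."""
    phonemes = []
    i = 0
    while i < len(word):
        rest = word[i:]
        if rest.startswith("ch"):
            # Digraph ch: hard /k/.
            phonemes.append("k")
            i += 2
        elif rest.startswith("ci") and rest[2:3] in BACK_VOWELS:
            # ci before a, o, u: /tʃ/, with the i only marking softness.
            phonemes.append("tʃ")
            i += 2
        elif rest.startswith("c") and rest[1:2] in FRONT_VOWELS:
            # c before e or i: /tʃ/.
            phonemes.append("tʃ")
            i += 1
        elif rest.startswith("c"):
            # c elsewhere: /k/.
            phonemes.append("k")
            i += 1
        else:
            # Placeholder: other letters stand for themselves.
            phonemes.append(word[i])
            i += 1
    return phonemes


print(italian_c_to_phonemes("cena"))        # ['tʃ', 'e', 'n', 'a']
print(italian_c_to_phonemes("chi"))         # ['k', 'i']
print(italian_c_to_phonemes("cioccolato"))  # ['tʃ', 'o', 'k', 'k', 'o', 'l', 'a', 't', 'o']
```

The rules are more involved than a one-letter lookup, but the same input always yields the same output, which is what "regular" means here.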
An orthography mainly affected only by the above types of deviation, with only minor instances of other types of deviation, may still be described as phonemic, or regular, since pronunciation and spelling still correspond in a predictable way. However the deviations listed below are more "serious", as they reduce this predictability (in at least one direction), thus introducing irregularity.
- Sometimes different letters correspond to the same phoneme (as u and ó in Polish are both pronounced as the phoneme /u/). This is often for historical reasons (these Polish letters originally stood for different phonemes, which merged later). This affects the predictability of spelling from pronunciation, though not necessarily vice versa. Another example is found in modern Greek, where the phoneme /i/ can be written in six different ways: ι, η, υ, ει, οι and υι.
- Conversely, a letter or group of letters can correspond to different phonemes in different contexts (as th does in English; originally this stood for a single phoneme, which then split).
- Spelling may otherwise represent a historical pronunciation; orthography does not necessarily keep up with sound changes in the spoken language. For example, the sounds once represented by both the k and the digraph gh of English knight are no longer part of the word's phonemic structure or its pronunciation.
- Spelling may represent the pronunciation of a different dialect from the one being considered. Orthographies tend to reflect a standard variety of the language; however for an international language with wide variations in its dialects, such as English, it would be impossible to represent even the major varieties of the language with a single phonemic orthography.
- Spellings of loanwords often adhere to, or are influenced by, the orthography of the source language (as with the English words ballet and fajita, from French and Spanish respectively). With some loanwords, though, regularity is retained – either by nativizing the pronunciation to match the spelling (as with the Russian word шофёр, from French chauffeur, but pronounced [ʂɐˈfʲor] in accordance with the normal rules of Russian vowel reduction; see also spelling pronunciation), or by nativizing the spelling (for example, football is spelt fútbol in Spanish and futebol in Portuguese).
- Spelling may reflect false etymology (as in the English words hiccough, island, so spelt because of an imagined connection with the words cough and isle), or distant etymology (as in the English word debt, where the b was added under the influence of Latin).
- Spelling may reflect morphophonemic structure rather than the purely phonemic (see next section), although this is often also a reflection of historical pronunciation.
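The loss of predictability in the first item of this list can be made concrete by enumerating the spellings that a given pronunciation allows. This is a minimal sketch: the function name `candidate_spellings` is an assumption, plain letters stand in for phoneme symbols, and phonemes without a listed alternative are simply spelled as themselves.

```python
from itertools import product

# Phoneme-to-grapheme alternatives taken from the examples above.
POLISH_U = {"u": ["u", "ó"]}
GREEK_I = {"i": ["ι", "η", "υ", "ει", "οι", "υι"]}


def candidate_spellings(phonemes, phoneme_to_graphemes):
    """List every spelling compatible with a sequence of phonemes."""
    options = [phoneme_to_graphemes.get(p, [p]) for p in phonemes]
    return ["".join(choice) for choice in product(*options)]


# Polish góra "mountain" is pronounced with /u/; sound alone cannot
# decide between the two candidates.
print(candidate_spellings(list("gura"), POLISH_U))  # ['gura', 'góra']

# Each Greek /i/ multiplies the number of candidates by six.
print(len(candidate_spellings(["i", "i"], GREEK_I)))  # 36
```

Choosing the correct candidate requires knowledge of the individual word, which is exactly the irregularity described above.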
Most orthographies do not reflect the changes in pronunciation known as sandhi, where pronunciation is affected by adjacent sounds in neighboring words (however written Sanskrit and other Indian languages do reflect such changes). A language may also use different sets of symbols or different rules for distinct sets of vocabulary items, such as the Japanese hiragana and katakana syllabaries (and the different treatment in English orthography of words derived from Latin and Greek).
Alphabetic orthographies often have features that are morphophonemic rather than purely phonemic. This means that the spelling reflects to some extent the underlying morphological structure of the words, not only their pronunciation. Hence different forms of a morpheme (minimum meaningful unit of language) are often spelt identically or similarly in spite of differences in their pronunciation. This is often for historical reasons; the morphophonemic spelling reflects a previous pronunciation from before historical sound changes that caused the variation in pronunciation of a given morpheme. Such spellings can assist in the recognition of words when reading.
Some examples of morphophonemic features in orthography are described below.
- The English plural morpheme is written -s regardless of whether it is pronounced as /s/ or /z/; it is cats and dogs, not dogz. This is because the [s] and [z] sounds are forms of the same underlying morphophoneme, automatically pronounced differently depending on its environment. (However when this morpheme takes the form /ɪz/, the addition of the vowel is reflected in the spelling: churches, masses.)
- Similarly the English past tense morpheme is written -ed regardless of whether it is pronounced as /d/, /t/ or /ɪd/.
- Many English words retain spellings that reflect their etymology and morphology rather than their present-day pronunciation. For example, sign and signature include the spelling <sign>, which means the same, but is pronounced differently, in the two words. Other examples are science /saɪ/ vs. unconscious /ʃ/, prejudice /prɛ/ vs. prequel /priː/, nation /neɪ/ vs. nationalism /næ/, and special /spɛ/ vs. species /spiː/.
- Phonological assimilation is often not reflected in spelling, even in otherwise phonemic orthographies such as Spanish, where obtener "obtain" and optimista "optimist" are written with b and p respectively, even though both are pronounced /p/ by assimilation with the following /t/. On the other hand, Serbo-Croatian (Serbian, Croatian and Bosnian) spelling reflects assimilation, thus one writes Србија/Srbija "Serbia" but српски/srpski "Serbian".
- The final-obstruent devoicing that occurs in many languages (such as German, Polish, Russian and Welsh) is not normally reflected in the spelling. For example, in German, Bad "bath" is spelt with a final d, even though it is pronounced /t/, thus corresponding to other morphologically related forms such as the verb baden, where the d is pronounced /d/. (Compare Rat, raten, where the t is pronounced /t/ in both positions.) Turkish orthography, however, is more strictly phonemic: for example, the imperative of eder "does" is spelled et, as it is pronounced (and the same as the word for "meat"), not *ed as it would be if the German approach were followed.
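The English plural in the first item of this list shows how one written form can cover several predictable pronunciations. The sketch below picks the pronunciation of the plural morpheme from the final phoneme of the stem. The function name `plural_allomorph` and the two phoneme sets are assumptions for illustration, and the sets are simplified.

```python
# Simplified phoneme classes; any final phoneme not listed is treated as voiced.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_phoneme):
    """Return the pronunciation of the plural ending after a given final phoneme."""
    if final_phoneme in SIBILANTS:
        return "ɪz"  # churches, masses (spelled -es)
    if final_phoneme in VOICELESS:
        return "s"   # cats
    return "z"       # dogs


print(plural_allomorph("t"))   # s
print(plural_allomorph("g"))   # z
print(plural_allomorph("tʃ"))  # ɪz
```

Because the choice between /s/ and /z/ is automatic, writing both as -s loses no information for a reader who knows the rule.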
Korean hangul has changed over the centuries from a highly phonemic to a largely morphophonemic orthography. Japanese kana are almost completely phonemic, but have a few morphophonemic aspects, notably in the use of ぢ di and づ du (rather than じ ji and ず zu, their pronunciation in standard Tokyo dialect), when the character is a voicing of an underlying ち or つ – this is due to the rendaku sound change combined with the yotsugana merger of formerly different morae. Russian orthography is also mostly morphophonemic (it does not reflect vowel reduction, consonant assimilation or final-obstruent devoicing, and some consonant combinations have silent consonants).
A defective orthography is one that is not capable of representing all the phonemes or phonemic distinctions in a language. An example of such a deficiency in English orthography is the lack of distinction between the voiced and voiceless "th" phonemes, occurring in words like then and thin respectively (both have to be written th). More systematic deficiency is found in orthographies based on abjadic writing systems like the Arabic and Hebrew scripts, which do not normally represent the short vowels (although methods are available for doing so in special situations).
Comparison between languages
Many languages of India written in Brahmic scripts, such as Hindi (apart from schwa and nasal vowels), Tamil and Marathi, but not Bengali and Gujarati, have highly phonemic orthographies.
Orthographies with a high grapheme-to-phoneme and phoneme-to-grapheme correspondence (excluding exceptions due to loan words and assimilation) include those of Maltese, Finnish, Albanian, Georgian, Italian, Turkish (apart from ğ and various palatal and vowel allophones), Serbo-Croatian (Serbian, Croatian and Bosnian), Bulgarian, Macedonian (if the apostrophe is counted, though slight inconsistencies may be found), Eastern Armenian (apart from o, v), Basque (apart from palatalized l, n), Haitian Creole, Castilian Spanish (apart from h, x, b/v, and sometimes k, c, g, j, z), Czech (apart from ě, ů, y, ý), Polish (apart from ó, h, rz), Romanian (apart from distinguishing semivowels from vowels), Ukrainian (mainly phonemic with some other historical/morphological rules, as well as palatalization), Belarusian (phonemic for vowels but morphophonemic for consonants except ў written phonetically), Swahili (missing aspirated consonants, which do not occur in all varieties and anyway are sparsely used), Mongolian (apart from letters representing multiple sounds depending on front or back vowels, the soft and hard sign, silent letters to indicate /ŋ/ from /n/ and voiced versus voiceless consonants), Azerbaijani (apart from k), and Kazakh (apart from и, у, х, щ, ю).
Languages with highly phonemic orthographies often lack or rarely use a word corresponding to the English verb "to spell" because the act of spelling out words is rarely needed (careful pronunciation of a word is generally sufficient to convey its spelling). That is the case in Italian: spelling is queried by asking "Come si scrive?" 'How do you write?', and the question is answered by pronouncing the word syllabically (e.g. [tʃok.ko.ˈlaː.to] could be spelled only cioccolato 'chocolate').
Some phonemic orthographies are slightly defective: Malay, Italian, Lithuanian, and Welsh do not fully distinguish their vowels, Serbian and Croatian do not distinguish tone and vowel length, Somali does not distinguish vowel phonation, and graphemes b and v represent the same phoneme in all varieties of Spanish, while in Spanish of the Americas, /s/ can be represented by graphemes s, c, and z.
French, with its silent letters and its heavy use of nasal vowels and elision, may seem to lack much correspondence between spelling and pronunciation, but its rules on pronunciation, though complex, are consistent and predictable with a fair degree of accuracy. The actual letter-to-phoneme correspondence, however, is often low and a sequence of sounds may have multiple ways of being spelt.
Orthographies such as those of German, Hungarian (mainly phonemic with "ly, j" representing the same sound, but consonant and vowel length are not always accurate and various spellings reflect etymology, not pronunciation), Portuguese, and that of the modern Greek language (written with the Greek alphabet), as well as Korean hangul, are sometimes considered to be of intermediate depth (for example they include many morphophonemic features, as described above).
English orthography is highly non-phonemic. It would in any case be hard to construct an orthography that reflected all of the main dialects of English, because of differences in phonological systems (such as between standard British and American English, and between these and Australian English with its bad–lad split). The irregularity of English spelling is partly because the Great Vowel Shift occurred after the orthography was established, and because English has acquired a large number of loanwords at different times, retaining their original spelling at varying levels. However even English has general, albeit complex, rules that predict pronunciation from spelling, and several of these rules are successful most of the time; rules to predict spelling from the pronunciation have a higher failure rate.
The syllabary systems of Japanese (hiragana and katakana) are examples of almost perfectly shallow orthography – exceptions include the use of ぢ and づ (discussed above) and the use of は, を, and へ to represent the sounds わ, お, and え, as relics of historical kana usage.
Realignment of orthography
With time, pronunciations change and spellings become out of date, as has happened to English and French. In order to maintain a phonemic orthography such a system would need periodic updating, as has been attempted by various language regulators and proposed by other spelling reformers.
Sometimes the pronunciation of a word changes to match its spelling; this is called a spelling pronunciation. This is most common with loanwords, but occasionally occurs in the case of established native words too. In some English personal names and place names, the relationship between the spelling of the name and the pronunciation is so distant that associations among phonemes and graphemes cannot be readily identified. Moreover, in many other words, the pronunciation has subsequently evolved from a fixed spelling, so that it has to be said that the phonemes represent the graphemes rather than vice versa. And in much technical jargon, the primary medium of communication is the written language rather than the spoken language, so the phonemes represent the graphemes, and it is unimportant how the word is pronounced. The sounds that literate people perceive in a word are also heavily influenced by its spelling.
Sometimes, countries have the written language undergo a spelling reform to realign the writing with the contemporary spoken language. These can range from simple spelling changes and word forms to switching the entire writing system itself, as when Turkey switched from the Arabic alphabet to a Turkish alphabet of Latin origin.
Methods for phonetic transcription such as the International Phonetic Alphabet (IPA) aim to describe pronunciation in a standard form. They are often used to solve ambiguities in the spelling of written language. They may also be used to write languages with no previous written form. Systems like IPA can be used for phonemic representation or for showing more detailed phonetic information (see Narrow vs. broad transcription).
Phonemic orthographies are different from phonetic transcription; whereas in a phonemic orthography, allophones will usually be represented by the same grapheme, a purely phonetic script would demand that phonetically distinct allophones be distinguished. To take an example from American English: the /t/ sound in the words "table" and "cat" would, in a phonemic orthography, be written with the same character; however, a strictly phonetic script would make a distinction between the aspirated "t" in "table", the flap in "butter", the unaspirated "t" in "stop" and the glottalized "t" in "cat" (not all these allophones exist in all English dialects). In other words, the sound that most English speakers think of as /t/ is really a group of sounds, all pronounced slightly differently depending on where they occur in a word. A perfect phonemic orthography has one letter per group of sounds (phoneme), with different letters only where the sounds distinguish words (so "bed" is spelled differently from "bet").
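The difference between the two levels can be sketched as a many-to-one mapping from allophones to a phoneme. The symbols chosen for the four variants of /t/ are illustrative assumptions (narrow transcription conventions vary, and in a fuller analysis the flap is also an allophone of /d/), as is the function name `to_phonemic`.

```python
# Illustrative narrow symbols for American English /t/ (an assumption).
ALLOPHONE_TO_PHONEME = {
    "tʰ": "t",  # aspirated, as in "table"
    "ɾ": "t",   # flap, as in "butter"
    "t": "t",   # unaspirated, as in "stop"
    "ʔt": "t",  # glottalized, as in "cat"
}


def to_phonemic(phones):
    """Collapse a narrow (phonetic) transcription into phonemes."""
    # Phones without an entry are returned unchanged.
    return [ALLOPHONE_TO_PHONEME.get(phone, phone) for phone in phones]


narrow = ["tʰ", "ɾ", "t", "ʔt"]
print(to_phonemic(narrow))  # ['t', 't', 't', 't']
```

A phonetic script keeps the left-hand symbols apart; a phonemic orthography writes only the right-hand one.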
A narrow phonetic transcription represents phones, the atomic sounds humans are capable of producing, many of which will often be grouped together as a single phoneme in any given natural language, though the groupings vary across languages. English, for example, does not distinguish between aspirated and unaspirated consonants, but other languages, like Korean, Bengali and Hindi, do. On the other hand, Korean, unlike a number of other languages, does not distinguish between voiced and voiceless consonants.
The sounds of speech of all languages of the world can be written by a rather small universal phonetic alphabet. A standard for this is the International Phonetic Alphabet.
See also
- Alphabetic principle
- English spelling reform
- Orthographic depth
- Orthographic transcription