Hindustani phonology

This article contains phonetic transcriptions in the International Phonetic Alphabet (IPA). For an introductory guide on IPA symbols, see Help:IPA. For the distinction between [ ], / / and ⟨ ⟩, see IPA § Brackets and transcription delimiters.

Hindustani is the lingua franca of northern India and Pakistan, and through its two standardized registers, Hindi and Urdu, an official language of India and Pakistan. Phonological differences between the two standards are minimal.

Vowels

Hindustani natively possesses a symmetrical ten-vowel system.^[1] The vowels [ə], [ɪ], [ʊ] are always short, while the vowels [ɑː, iː, uː, eː, oː, ɛː, ɔː] are always considered long (but see the details below). Among the close vowels, what in Sanskrit are thought to have been primarily distinctions of vowel length (that is, /i ~ iː/ and /u ~ uː/) have become in Hindustani distinctions of quality, or of length accompanied by quality (that is, /ɪ ~ iː/ and /ʊ ~ uː/).^[2] The historical opposition of length in the close vowels has been neutralized in word-final position; for example, the Sanskrit loans śakti (शक्ति – شَکتی 'energy') and vastu (वस्तु – وَستُو 'item') are /ʃəkt̪i/ and /ʋəst̪u/, not */ʃəkt̪ɪ/ and */ʋəst̪ʊ/.^[3]

/ə/ is often realized more open than mid [ə], i.e. as near-open [ɐ].^[4]

The vowel represented graphically as ऐ – اَے (romanized as ai) has been variously transcribed as [ɛː] or [æː].^[5] Among sources for this article, Ohala (1999) uses [ɛː], while Shapiro (2003:258) and Masica (1991:110) use [æː]. Furthermore, an eleventh vowel /æː/ is found in English loanwords, such as /bæːʈ/ ('bat').^[6] Hereafter, the former will be represented as [ɛː] to distinguish it from the latter. The open central vowel is transcribed in IPA as either [aː] or [ɑː]. Despite these transcription differences, the Hindustani vowel system is quite similar to that of English, in contrast to the consonants.

In addition, [ɛ] occurs as a conditioned allophone of /ə/ (schwa) in proximity to /h/, if and only if the /h/ is flanked on both sides by schwas.^[3] For example, in kahanā /kəɦ(ə)naː/ (कहना – کَہنا 'to say'), the /h/ has a schwa on each side, so both schwas are fronted to short [ɛ], giving the pronunciation [kɛɦ(ɛ)naː]. Syncope of the phonemic medial schwa can then give [kɛɦ.naː]. The fronting also occurs before word-final /h/, presumably because a lone final consonant carries an unpronounced schwa. Hence, kaha! /kəɦ(ə)/ (कह – کَہہ 'say!') becomes [kɛh] in actual pronunciation. However, the schwa is not fronted in words with a schwa on only one side of the /h/, such as kahānī /kəɦaːniː/ (कहानी – کَہانی 'a story') or bāhar /baːɦər/ (बाहर – باہَر 'outside').
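The fronting rule described above can be sketched as a pair of string rewrites. This is only an illustration under stated assumptions, not a published implementation: the function name `front_schwa` is hypothetical, the input is assumed to be a phonemic transcription in which every schwa (including optional ones) is written out and /h/ is written ɦ, and the sketch leaves the ɦ symbol itself unchanged.

```python
import re


def front_schwa(phonemic: str) -> str:
    """Apply schwa fronting around /ɦ/ (hypothetical helper, illustration only)."""
    # Case 1: /ɦ/ has a schwa on both sides, so both schwas front to [ɛ].
    fronted = re.sub("əɦə", "ɛɦɛ", phonemic)
    # Case 2: a word-final /ɦ/ after a schwa behaves as if a silent schwa followed it.
    fronted = re.sub("əɦ$", "ɛɦ", fronted)
    return fronted


for word in ["kəɦənaː", "kəɦ", "kəɦaːniː", "baːɦər"]:
    print(word, "->", front_schwa(word))
```

Only the first two inputs change; kəɦaːniː and baːɦər come back unchanged because the /ɦ/ has a schwa on only one side.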

As in French and Portuguese, there are nasalized vowels in Hindustani. There is disagreement over the nature of nasalization (barring the English-loaned /æː/, which is never nasalized^[6]). Masica (1991:117) presents four differing viewpoints:

there are no *[ẽ] and *[õ], possibly because of the effect of nasalization on vowel quality;
there is phonemic nasalization of all vowels;
all vowel nasalization is predictable (i.e. allophonic);
nasalized long vowel phonemes (/ĩː ẽː ɛ̃ː ɑ̃ː ɔ̃ː õː ũː/) occur word-finally and before voiceless stops; instances of nasalized short vowels ([ɪ̃ ə̃ ʊ̃]) and of nasalized long vowels before voiced stops (the latter, presumably because of a deleted nasal consonant) are allophonic.

Masica^[7] supports this last view.

Consonants

Hindustani stops

Bilabial stops

पाल फाल बाल भाल
pāl phāl bāl bhāl
[pal pʰal bal bʱal]
'take care of, knife blade, hair, forehead'

Dental stops

ताल थाल दाल धार
tāl thāl dāl dhār
[t̪al t̪ʰal d̪al d̪ʱaɾ]
'rhythm, plate, lentil, knife'

Palatal stops

चल छल जल झल
cal chal jal jhal
[tʃəl tʃʰəl dʒəl dʒʱəl]
'walk, deceit, water, glimmer'

Retroflex stops

टाल ठाल डाल ढाल
ṭāl ṭhāl ḍāl ḍhāl
[ʈal ʈʰal ɖal ɖʱal]
'postpone, wood shop, branch, shield'

Velar stops

कान खान गान घान
kān khān gān ghān
[kan kʰan ɡan ɡʱan]
'ear, mine, song, bundle'

Hindustani has a core set of 28 consonants inherited from earlier Indo-Aryan. Supplementing these are 2 consonants that are internal developments in specific word-medial contexts,^[8] and 7 consonants originally found in loan words, whose expression is dependent on factors such as status (class, education, etc.) and cultural register (Modern Standard Hindi vs Urdu).

Most native consonants may occur geminate (doubled in length; exceptions are /bʱ, ɽ, ɽʱ, ɦ/). Geminate consonants are always medial and preceded by one of the interior vowels (that is, /ə/, /ɪ/, or /ʊ/). They all occur monomorphemically except [ʃː], which occurs only in a few Sanskrit loans where a morpheme boundary could be posited in between, e.g. /nɪʃ + ʃiːl/ for niśśīl [nɪʃʃiːl] ('without shame').^[6]

For the English speaker, a notable feature of the Hindustani consonants is that there is a four-way distinction of phonation among plosives, rather than the two-way distinction found in English. The phonations are:

tenuis, as /p/, which is like ⟨p⟩ in English spin
voiced, as /b/, which is like ⟨b⟩ in English bin
aspirated, as /pʰ/, which is like ⟨p⟩ in English pin, and
murmured, as /bʱ/.

The last is commonly called "voiced aspirate", though Shapiro (2003:260) notes that,

"Evidence from experimental phonetics, however, has demonstrated that the two types of sounds involve two distinct types of voicing and release mechanisms. The series of so-called voice aspirates should now properly be considered to involve the voicing mechanism of murmur, in which the air flow passes through an aperture between the arytenoid cartilages, as opposed to passing between the ligamental vocal bands."

The murmured consonants are believed to be a reflex of murmured consonants in Proto-Indo-European, a phonation that is absent in all branches of the Indo-European family except Indo-Aryan and Armenian.

Hindustani consonants
		Labial	Dental/ Alveolar	Retroflex	Palatal	Velar	Uvular	Glottal
Nasal		m	n	(ɳ)	(ɲ)	(ŋ)
Plosive/ Affricate	voiceless	p	t̪	ʈ	tʃ	k	(q)
	voiceless aspirated	pʰ	t̪ʰ	ʈʰ	tʃʰ	kʰ
	voiced	b	d̪	ɖ	dʒ	ɡ
	voiced aspirated	bʱ	d̪ʱ	ɖʱ	dʒʱ	ɡʱ
Fricative	voiceless	f	s	(ʂ)	ʃ	(x)
Fricative	voiced		z		ʒ	(ɣ)		ɦ
Flap	plain		ɾ	(ɽ)
Flap	voiced aspirated			(ɽʱ)
Approximant		ʋ	l		j

Notes

Marginal and non-universal phonemes are in parentheses.
/ɽ/ is lateral [ɺ̢] for some speakers.^[9]
/ɣ/ is post-velar.^[10]

Stops in final position are not released. /ʋ/ varies freely with [v], and can also be pronounced [w]. /ɾ/ can surface as a trill [r] (mostly in word-initial and syllable-final positions), and geminate /ɾː/ is always a trill, e.g. zarā [zəɾaː] (ज़रा – ذرا 'little') versus well-trilled zarrā [zəraː] (ज़र्रा – ذرّہ 'particle'). This happens in loanwords of Arabic and Persian origin.^[4] The palatal and velar nasals [ɲ, ŋ] occur only in consonant clusters, where each nasal is followed by a homorganic stop, as an allophone of a nasal vowel followed by a stop, and in Sanskrit loanwords.^[8]^[4] There are murmured sonorants, [lʱ, ɾʱ, mʱ, nʱ], but these are considered to be consonant clusters with /ɦ/ in the analysis adopted by Ohala (1999).

The palatal affricates and sibilant are variously classified by linguists as alveolo-palatal or palato-alveolar; hence the sound represented by the grapheme श can be transcribed as [ʃ] or [ɕ], and that of the grapheme च as [tʃ], [cɕ], [tɕ] or even plosive [c]. In this article, the sounds are transcribed as [ʃ] and [tʃ] respectively. The fricative /h/ in Hindustani is typically voiced (as [ɦ]), especially when surrounded by vowels, but there is no phonemic difference between voiceless [h] and its voiced counterpart [ɦ] (Hindustani's ancestor Sanskrit has such a phonemic distinction).

Hindustani also has a phonemic difference between the dental plosives and the so-called retroflex plosives. The dental plosives in Hindustani are laminal denti-alveolar, as in Spanish, and the tongue tip must be well in contact with the back of the upper front teeth. The retroflex series is not purely retroflex; it actually has an apico-postalveolar (also described as apico-pre-palatal) articulation, and sometimes in words such as ṭūṭā /ʈuːʈaː/ (टूटा – ٹُوٹا 'broken') it even becomes alveolar.^[11]

In some Indo-Aryan languages, the plosives [ɖ, ɖʱ] and the flaps [ɽ, ɽʱ] are allophones in complementary distribution, with the former occurring in initial, geminate and postnasal positions and the latter occurring in intervocalic and final positions. However, in Standard Hindi they contrast in similar positions, as in nīṛaj (नीड़ज – نیڑج 'bird') vs niḍar (निडर – نڈر 'fearless').^[12]

Allophony of [v] and [w]

[v] and [w] are distinct phonemes in English, but in Hindustani both are allophones of the phoneme /ʋ/ (written ⟨व⟩ in Hindi or ⟨و⟩ in Urdu), especially in loanwords of Arabic and Persian origin. More specifically, they are conditional allophones: whether ⟨व⟩ is pronounced as [v] or [w] depends on context. Native Hindi speakers pronounce ⟨व⟩ as [v] in vrat (व्रत – ورت, 'oath') and [w] in pakwān (पकवान – پکوان 'food dish'), treating them as a single phoneme and without being aware of the allophonic distinction, though it is apparent to native English speakers. The rule is that the consonant is pronounced as semivowel [w] in onglide position, i.e. between an onset consonant and a following vowel.^[13]
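The onglide rule can be sketched in a few lines. This is a simplified illustration, not the analysis of the cited study: the function name `realize_v` is hypothetical, the input is assumed to be a phonemic transcription without stress marks or syllable dots, and any consonant immediately before /ʋ/ is assumed to count as an onset consonant, because syllabification is not modelled.

```python
import unicodedata

VOWELS = set("əɪʊaɑiueoɛɔæ")
VOWEL_MARKS = {"ː", "\u0303"}  # length mark and combining tilde attach to a vowel


def realize_v(phonemic: str) -> str:
    """Replace each /ʋ/ with [w] in onglide position, [v] elsewhere (hypothetical helper)."""
    word = unicodedata.normalize("NFD", phonemic)
    out = []
    for i, ch in enumerate(word):
        if ch != "ʋ":
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # Assumption: an immediately preceding consonant is treated as the onset consonant.
        after_consonant = prev != "" and prev not in VOWELS and prev not in VOWEL_MARKS
        before_vowel = nxt in VOWELS
        out.append("w" if after_consonant and before_vowel else "v")
    return "".join(out)


print(realize_v("ʋɾət̪"))    # vrat
print(realize_v("pəkʋaːn"))  # pakwān
```

Word-initial /ʋ/ in vrat has no preceding consonant and is followed by /ɾ/, so it comes out as [v]; in pakwān it sits between /k/ and a vowel and comes out as [w].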

However, the allophony becomes obvious when speakers switch languages. When speakers of other languages that distinguish [v] and [w] speak Hindustani, they might pronounce ⟨व و⟩ in vrat (व्रत – ورت) as [w], i.e. as [wɾət̪] instead of the correct [vɾət̪]. This results in an intelligibility problem because [wɾət̪] can easily be confused with aurat (औरत – عورت) [ˈɔːɾət̪], which means 'woman' rather than 'oath' in Hindustani. Similarly, Hindustani speakers might unconsciously apply their native allophony rules to English words, pronouncing war /wɔːɹ/ as [vɔːɹ] or advance /ədˈvɑːns/ as [ədˈwɑːns], which can result in intelligibility problems with native English speakers.^[13]

In some situations, the allophony is non-conditional, i.e. the speaker can choose [v], [w] or an intermediate sound based on personal habit and preference, and still be perfectly intelligible. This includes words such as advait (अद्वैत – ادویت) which can be pronounced equally correctly as [əd̪ˈwɛːt̪] or [əd̪ˈvɛːt̪].^[13]

External borrowing

Loanwords from Sanskrit reintroduced /ɳ/ into formal Modern Standard Hindi. In casual speech it is usually replaced by /n/.^[6] It does not occur initially and has a nasalized flap [ɽ̃] as a common allophone.^[8]

Loanwords from Persian (including some words which Persian itself borrowed from Arabic or Turkish) introduced five consonants, /f, z, q, x, ɣ/. Being Persian in origin, these are seen as a defining feature of Urdu, although these sounds officially exist in Hindi and modified Devanagari characters are available to represent them.^[14]^[15] Among these, /f, z/, also found in English and Portuguese loanwords, are now considered well-established in Hindi; indeed, /f/ appears to be encroaching upon and replacing /pʰ/ even in native (non-Persian, non-English) Hindi words.^[8] This /pʰ/ to /f/ shift also occasionally occurs in Urdu.^[16]

The other three Persian loans, /q, x, ɣ/, are still considered to fall under the domain of Urdu, and are also used by many Hindi speakers; however, some Hindi speakers assimilate these sounds to /k, kʰ, ɡ/ respectively.^[14]^[17] The sibilant /ʃ/ is found in loanwords from all sources (English, Persian, Sanskrit) and is well-established.^[6] The failure to maintain /f, z, ʃ/ by some Hindi speakers (often non-urban speakers who confuse them with /pʰ, dʒ, s/) is considered nonstandard.^[14] Yet these same speakers, having a Sanskritic education, may hyperformally uphold /ɳ/ and [ʂ]. In contrast, for native speakers of Urdu, the maintenance of /f, z, ʃ/ is not commensurate with education and sophistication, but is characteristic of all social levels.^[17]
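The substitutions mentioned above can be written out as two lookup tables. This is an illustrative sketch only: the names `ASSIMILATED`, `NONSTANDARD` and `nativize` are hypothetical, the input is assumed to be a phonemic transcription, and real speakers vary word by word rather than applying a blanket rule.

```python
# Persian-derived sounds that some Hindi speakers assimilate to native stops.
ASSIMILATED = {"q": "k", "x": "kʰ", "ɣ": "ɡ"}
# Substitutions the text describes as nonstandard.
NONSTANDARD = {"f": "pʰ", "z": "dʒ", "ʃ": "s"}


def nativize(phonemic: str, nonstandard: bool = False) -> str:
    """Map loan consonants to their native substitutes (hypothetical helper)."""
    table = dict(ASSIMILATED)
    if nonstandard:
        table.update(NONSTANDARD)
    out = []
    for i, ch in enumerate(phonemic):
        # Leave ʃ alone when it is the second half of the native affricate tʃ.
        if ch == "ʃ" and i > 0 and phonemic[i - 1] == "t":
            out.append(ch)
        else:
            out.append(table.get(ch, ch))
    return "".join(out)


print(nativize("qɪsmət"))
print(nativize("zəɾaː", nonstandard=True))
```

Keeping the two tables separate mirrors the distinction in the text between the substitution of /q, x, ɣ/ and the substitutions for /f, z, ʃ/ that are considered nonstandard.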

As the main sources from which Hindustani draws its higher, learned terms, English, Sanskrit, Arabic, and to a lesser extent Persian provide loanwords with a rich array of consonant clusters. The introduction of these clusters into the language contravenes a historical tendency within its native core vocabulary to eliminate clusters through processes such as cluster reduction and epenthesis.^[18] Schmidt (2003:293) lists distinctively Sanskrit/Hindi biconsonantal clusters of initial /kɾ, kʃ, st̪, sʋ, ʃɾ, sn, nj/ and final /t̪ʋ, ʃʋ, nj, lj, ɾʋ, dʒj, ɾj/, and distinctively Perso-Arabic/Urdu biconsonantal clusters of final /ft̪, ɾf, mt̪, mɾ, ms, kl, t̪l, bl, sl, t̪m, lm, ɦm, ɦɾ/.

Suprasegmental features

Hindustani has a stress accent, but it is not as important as in English. To predict stress placement, the concept of syllable weight is needed:

A light syllable (one mora) ends in a short vowel /ə, ɪ, ʊ/: V
A heavy syllable (two moras) ends in a long vowel /aː, iː, uː, eː, ɛː, oː, ɔː/ or in a short vowel and a consonant: VV, VC
An extra-heavy syllable (three moras) ends in a long vowel and a consonant, or a short vowel and two consonants: VVC, VCC
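The three weight classes can be computed from a transcribed syllable by counting one mora for a short vowel, two for a long vowel, and one for each coda consonant. The following is a rough sketch under stated assumptions: the names `mora_count` and `WEIGHT_NAMES` are hypothetical, each syllable is assumed to contain a single vowel, and the affricates tʃ and dʒ are treated as single consonants.

```python
import unicodedata

VOWELS = set("əɪʊaɑiueoɛɔæ")
LENGTH_MARK = "ː"
RELEASE_MARKS = {"ʰ", "ʱ"}  # aspiration and murmur belong to the preceding consonant
WEIGHT_NAMES = {1: "light", 2: "heavy", 3: "extra-heavy"}


def mora_count(syllable: str) -> int:
    """Count moras in one IPA-transcribed syllable (hypothetical helper)."""
    # Collapse each affricate into one placeholder consonant.
    s = syllable.replace("tʃ", "C").replace("dʒ", "C")
    # Decompose, then drop combining diacritics (nasal tilde, dental bridge)
    # and release marks, so only base letters and the length mark remain.
    chars = [c for c in unicodedata.normalize("NFD", s)
             if not unicodedata.combining(c) and c not in RELEASE_MARKS]
    nucleus = next(i for i, c in enumerate(chars) if c in VOWELS)
    rhyme = chars[nucleus + 1:]
    vowel_moras = 2 if rhyme[:1] == [LENGTH_MARK] else 1
    coda = [c for c in rhyme if c != LENGTH_MARK]
    return vowel_moras + len(coda)


for syl in ["kə", "qɪs", "riː", "ʃoːx"]:
    print(syl, WEIGHT_NAMES[mora_count(syl)])
```

Onset consonants are ignored because only the vowel and what follows it contribute to weight in the classification above.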

Stress falls on the heaviest syllable of the word and, in the event of a tie, on the last such syllable. If all syllables are light, the penultimate is stressed. However, the final mora of the word is ignored when making this assignment (Hussein 1997). Equivalently, the final syllable is stressed if it is extra-heavy and there is no other extra-heavy syllable in the word, or if it is heavy and there is no other heavy or extra-heavy syllable in the word. For example, with the ignored mora in parentheses:^[19]

kaː.ˈriː.ɡə.ri(ː)

ˈtʃəp.kə.lɪ(ʃ)

ˈʃoːx.dʒə.baː.ni(ː)

ˈreːz.ɡaː.ri(ː)

sə.ˈmɪ.t(ɪ)

ˈqɪs.mə(t)

ˈbaː.ɦə(r)

roː.ˈzaː.na(ː)

rʊ.ˈkaː.ja(ː)

ˈroːz.ɡaː(r)

aːs.ˈmaːn.dʒaː(h) ~ ˈaːs.mãː.dʒaː(h)

kɪ.ˈdʱə(r)

rʊ.pɪ.ˈa(ː)

dʒə.ˈnaː(b)

əs.ˈbaː(b)

mʊ.səl.ˈmaː(n)

ɪɴ.qɪ.ˈlaː(b)

pər.ʋər.dɪ.ˈɡaː(r)
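The stress rule can be checked against the examples above with a short sketch. It assumes each syllable has already been reduced to its rhyme shape (V, VV, VC, VVC or VCC, as in the weight definitions); the names `MORAS` and `stressed_syllable` are hypothetical. The penultimate stress of all-light words needs no separate branch here, because discounting the final mora leaves the penultimate as the last of the heaviest syllables.

```python
MORAS = {"V": 1, "VV": 2, "VC": 2, "VVC": 3, "VCC": 3}


def stressed_syllable(rhymes):
    """Return the 0-based index of the stressed syllable (hypothetical helper)."""
    moras = [MORAS[shape] for shape in rhymes]
    moras[-1] -= 1  # the final mora of the word is ignored
    heaviest = max(moras)
    # Among equally heavy syllables, the last one is stressed.
    return max(i for i, m in enumerate(moras) if m == heaviest)


examples = {
    "kaː.riː.ɡə.riː": ["VV", "VV", "V", "VV"],
    "qɪs.mət": ["VC", "VC"],
    "kɪ.dʱər": ["V", "VC"],
    "sə.mɪ.tɪ": ["V", "V", "V"],
    "roːz.ɡaːr": ["VVC", "VVC"],
}
for word, rhymes in examples.items():
    print(word, "-> syllable", stressed_syllable(rhymes) + 1)
```

The printed positions are 1-based and can be compared directly with the stress marks in the list above.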

Content words in Hindustani normally begin on a low pitch, followed by a rise in pitch.^[20]^[21] Strictly speaking, Hindustani, like most other Indian languages, is a syllable-timed language. The schwa /ə/ has a strong tendency to be deleted (syncopated) if its syllable is unstressed.

References

[1] Masica (1991:110)
[2] Masica (1991:111)
[3] Shapiro (2003:258)
[4] Ohala (1999:102)
[5] Masica (1991:114)
[6] Ohala (1999:101)
[7] Masica (1991:117–118)
[8] Shapiro (2003:260)
[9] Masica (1991:98)
[10] Kachru (2006:20)
[11] Tiwari, Bholanath ([1966] 2004) हिन्दी भाषा (Hindī Bhāshā), Kitāb Mahal, Allahabad, ISBN 81-225-0017-X.
[12] Masica (1991:97)
[13] Janet Pierrehumbert, Rami Nair, Implications of Hindi Prosodic Structure (Current Trends in Phonology: Models and Methods), European Studies Research Institute, University of Salford Press, 1996, ISBN 978-1-901471-02-1, ... showed extremely regular patterns. As is not uncommon in a study of subphonemic detail, the objective data patterned much more cleanly than intuitive judgments ... [w] occurs when /व و/ is in onglide position ... [v] occurs otherwise ...
[14] A Primer of Modern Standard Hindi. Motilal Banarsidass. Retrieved 25 August 2009.
[15] "Hindi Urdu Machine Transliteration using Finite-state Transducers" (PDF). Association for Computational Linguistics. Retrieved 25 August 2009.
[16] Jain, Danesh; Cardona, George (26 July 2007). "The Indo-Aryan Languages". Routledge – via Google Books.
[17] Masica (1991:92)
[18] Shapiro (2003:261)
[19] Hayes (1995:276)
[20] http://www.und.nodak.edu/dept/linguistics/theses/2001Dyrud.PDF Dyrud, Lars O. (2001) Hindi-Urdu: Stress Accent or Non-Stress Accent? (University of North Dakota, master's thesis)
[21] http://www.speech.sri.com/people/rao/papers/icslp96_wbhyp.pdf Ramana Rao, G.V. and Srichand, J. (1996) Word Boundary Detection Using Pitch Variations. (IIT Madras, Dept. of Computer Science and Engineering)

Bibliography

Masica, Colin (1991), The Indo-Aryan Languages, Cambridge: Cambridge University Press, ISBN 978-0-521-29944-2.
Hayes, Bruce (1995), Metrical stress theory, University of Chicago Press.
Hussein, Sarmad (1997), Phonetic Correlates of Lexical Stress in Urdu, Northwestern University.
Kachru, Yamuna (2006), Hindi, John Benjamins Publishing, ISBN 90-272-3812-X.
Ohala, Manjari (1999), "Hindi", in International Phonetic Association (ed.), Handbook of the International Phonetic Association: a Guide to the Use of the International Phonetic Alphabet, Cambridge University Press, pp. 100–103, ISBN 978-0-521-63751-0.
Schmidt, Ruth Laila (2003), "Urdu", in Cardona, George; Jain, Dhanesh (eds.), The Indo-Aryan Languages, Routledge, pp. 286–350, ISBN 978-0-415-77294-5.
Shapiro, Michael C. (2003), "Hindi", in Cardona, George; Jain, Dhanesh (eds.), The Indo-Aryan Languages, Routledge, pp. 250–285, ISBN 978-0-415-77294-5.
