= Phonological history of Hindustani =

The inherited, native lexicon of the Hindustani language exhibits a large number of extensive sound changes from its Middle Indo-Aryan and Old Indo-Aryan ancestors. Many of these sound changes are shared with other Indo-Aryan languages such as Marathi, Punjabi, and Bengali.

== Indo-Aryan etymologizing ==

The history of the Hindustani language is marked by a large number of borrowings at all stages. Native grammarians have devised a set of etymological classes for modern Indo-Aryan vocabulary:

- Tadbhava (तद्भव, "arising from that") refers to terms that are inherited from vernacular Apabhraṃśa (अपभ्रंश, "corrupted"), from the dramatic Prakrits, and further from Sanskrit. An example is Hindustani jībh (जीभ ) "tongue", inherited through Prakrit jibbhā, from Sanskrit jihvā. Such words are the focus of this article.
- Tatsama (तत्सम, "same as that") refers to words that are borrowed into Hindi or Old Hindi directly from Sanskrit with minor phonological modification (e.g. lack of pronunciation of the final schwa). The Hindi register of Hindustani is associated with a large number of tatsama words through Sanskritisation. An example is Hindustani rūp (रूप ) "form", directly from Sanskrit rūpa.
- Ardhatatsama (अर्धतत्सम, "half-same as that") refers to words that are semi-learned borrowings from Sanskrit, that is, words that underwent some tadbhava sound changes but were adapted on the basis of a Sanskrit word. An example is Hindustani sūraj (सूरज ) "sun", which is from Prakrit sujja, from Sanskrit sūrya. We would expect Hindustani *sūj from Prakrit, but the -r- was added later on after the Sanskrit word. Such adaptation to Sanskrit occurred continuously and as early as the Middle Indo-Aryan stage. Adapted words are crucial to determining the date and chronology of sound changes.
- Deśaj (देशज, "indigenous") refers to words that may or may not be derived from Prakrit, but cannot be shown to have a clear Sanskrit etymon. This is sometimes complicated by Sanskrit re-borrowing of Prakrit words. Such words sometimes derive from Non-Indo-Aryan languages—primarily Austroasiatic (Munda) languages, as well as Dravidian and Tibeto-Burman languages. An example is Hindustani ōṛhnā (ओढ़ना ) "to cover up, veil", from Prakrit ǒḍḍhaṇa "covering, cloak", from Dravidian, whence Tamil uṭu (உடு) "to wear".

In the context of Hindustani, other etymological classes of relevance are:

- Perso-Arabic loanwords, which came to Old Hindi from Classical Persian. Their pronunciation is closer to Classical Persian than to modern Iranian Persian. The Urdu register of Hindustani is associated with a large number of Perso-Arabic loanwords. An example is Hindustani zubān "tongue, language", from Classical Persian zubān (whence Persian zobân).
- Borrowings from Northwestern Indo-Aryan. Modern Hindustani, while based primarily on the language of the Khariboli region, comes from a dialectal mixture. Many of the Western Hindi dialects are transitional to Punjabi and the Northwestern Indo-Aryan languages, and have donated words to Hindustani that underwent Northwestern sound changes. We often encounter doublets like Hindustani makkhan (मक्खन ) "butter", borrowed from Northwestern dialects (compare Punjabi makkhaṇ, ਮੱਖਣ), and Hindustani mākhan (माखन ), the native tadbhava term which is now archaic/obsolete outside of fossilized phrases.

As in many other languages, many phenomena in the historical evolution of Hindustani are better explained by the wave model than by the tree model. In particular, the oldest changes, like the retroflexion of dental stops and the loss of ṛ, have been subject to a great deal of dialectal variation and borrowing. In the face of doublets like Hindustani baṛhnā (बढ़ना ) "to increase" and badhnā (बधना ) "to increase", where one has undergone retroflexion and the other has not, it is difficult to know exactly under what conditions the sound change operated. One often encounters sound changes described as "spontaneous" or "sporadic" in the literature (such as "spontaneous nasalization"). This means that the sound change's context and/or isogloss (i.e. the dialects in which the sound change operated) have been obscured by inter-dialect borrowing, semi-learned adaptations to Classical Sanskrit or Prakrits, or analogical leveling.

== Changes from late Middle Indo-Aryan up to Old Hindi ==

The changes from this point on distinguish the New Indo-Aryan (NIA) era from the Middle Indo-Aryan (MIA) period. The changes up to Old Hindi (OH) begin to distinguish Hindi from nearby languages like Marathi, Gujarati, and Punjabi. Many of these rules were already sporadically underway in Late Prakrit/Apabhramsha.
- Prakrit ṇ /ɳ/, ḷ /ɭ/ are dentalized to n /n̪/, l /l/ everywhere. In Marathi, Gujarati, and Punjabi, we instead find retroflex forms intervocalically and dental forms elsewhere.
- Intervocalic v /ʋ/ is lost around ī̆ /i(ː)/. — Prakrit ṇāvia- (णाविअ-) > Hindustani nāī (नाई ) "barber". Compare Marathi nhāvī (न्हावी).
- Initial v /ʋ/ > b /b/ and medial vv /ʋː/ > bb /bː/ — Prakrit vāla- (वाल​-) > OH bāla (बाल ) "hair", whence Hindustani bāl. Compare Gujarati vāḷ (વાળ).
- ī is shortened before a vowel. — Prakrit bīa- (बीअ-) > OH/Hindustani biyā (बिया ) "seed".
- Several vowel coalescence rules that reduce the frequency of vowels in hiatus. These rules are present to some degree in all NIA languages:
  - Diphthongs ai /a͡ɪ/, au /a͡ʊ/, āy /ɑːj/, and āv /ɑːʋ/ are the outcomes of the two-vowel sequences aï /ɐ.i/, aü /ɐ.u/, āi /ɑː.i/, and āu /ɑː.u/, respectively.
  - When followed by a stressed vowel, short i /i/ and u /u/ become glides. — Prakrit pivāsā- (पिवासा-, /piʋɑːsɑː/) > Apa. piāsa- (पिआस​-, /piˈɑː.sɐ/) > OH pyāsa (प्यास , /ˈpjɑː.sɐ/) "thirst", whence Hindustani pyās.
  - When a short, unstressed vowel is preceded by /i(ː) u(ː) eː oː/, the second vowel is lost and the first vowel is lengthened if short. — Prakrit sīala- (सीअल​-, /siːɐlɐ/) > OH sīla (सील , /ˈsiː.lɐ/) "cold, damp", whence Hindustani sīl.
  - Prakrit /ɐɐ/ (spelled aa अअ, aya अय) generally coalesces to the diphthong ai /a͡ɪ/ (more rarely /a͡ʊ/), but can sometimes contract further to e /eː/. Similarly, ava /ɐʋɐ/ coalesces to the diphthong au /a͡ʊ/, but can sometimes contract further to o /oː/. — Prakrit ṇaaṇa- (णअण​-, /nɐ.ɐ.nɐ/) > OH naina (नैन , /ˈn̪a͡ɪ.n̪ɐ/) "eye", whence Hindustani nain. Turner explains the occasional further contraction of ai > e and au > o (at least for Gujarati) in terms of inherited words versus semi-learned words: in the former the process has had time to go further. A similar explanation, in terms of word frequency, dialectal borrowing, and semi-learned borrowings, could be drawn up for occasions where -y- possessed more reality.
  - Remaining short/long vowels of like quality coalesce into a single long vowel. — Prakrit duuṇa- (दुउण-, /d̪u.u.n̪ɐ/) > OH dūna (दून , /ˈd̪uː.n̪ɐ/).
  - In the remaining cases, or if a morpheme boundary is felt between the vowels in hiatus, the vowels may not coalesce. A semivowel may optionally appear to fill the hiatus.
- Sound changes relating to the simplification of consonant clusters:
  - For stressed syllables, the general rule is VCː > VːC and VNC > ṼːC. That is, a consonant cluster is simplified and the preceding vowel undergoes compensatory lengthening or lengthening + nasalization. Per usual, a /ɐ/ lengthens and shifts in quality to ā /ɑː/. Short allophonic ĕ /e/ and ǒ /o/ always elongate to e /eː/ and o /oː/. This change occurred in all regions in some form, excluding the Northwest (e.g. Punjabi). Generally, this sound change had already occurred in the East by the eighth century AD, based on inscriptions found in East Bengal and Chinese-Sanskrit dictionaries of the time. It was probably completed in the Central region by the tenth century.
    - Prakrit satta (सत्त​, /sɐt̪ːɐ/) > OH sāta (सात , /ˈsɑː.t̪ɐ/) "seven", whence Hindustani sāt. Compare Punjabi sattă (ਸੱਤ ).
    - Prakrit daṃta- (दंत​-, /d̪ɐn̪.t̪ɐ/) > OH dā̃ta (दाँत , /ˈd̪ɑ̃ː.t̪ɐ/) "tooth", whence Hindustani dā̃t. Compare Punjabi dand (ਦੰਦ ).
  - Compensatory lengthening from older geminates was sometimes accompanied by spontaneous (and regionally random) nasalization of the vowel. In some cases, this goes back to Prakrit or is otherwise reflected in nearby NIA languages.
  - Unstressed syllables generally underwent VCː > VC and VNC > VNC, i.e. the vowel is left short. — Prakrit kappūra- (कप्पूर​-, /kɐpːuːɾɐ/) > OH kapūra (कपूर , /kɐˈpuː.ɾɐ/) "camphor". Compare Old Marathi kāpura (𑘎𑘰𑘢𑘳𑘨), with lengthening of a > ā.
  - When a stressed VCː or VNC syllable is preceded by another heavy syllable (i.e. of the form Vː(C), VCː, or VNC), it will also sometimes undergo VCː > VC and VNC > VNC with no compensatory lengthening, shifting stress onto the preceding syllable. — Prakrit pālakka- (पालक्क​-, /pɑːlɐkːɐ/) > OH pālaka (पालक , /ˈpɑː.lɐ.kɐ/) "spinach", whence Hindustani pālak. Occasionally, though, compensatory lengthening will occur, as in Prakrit bhattijja- (भत्तिज्ज​-, /bʱɐt̪ːid͡ːʒɐ/) > OH bhatījā (भतीजा , /bʱɐˈt̪iː.d͡ʒɑː/) "nephew", whence Hindustani bhatījā.
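
The cluster simplification rules above can be illustrated with a small Python sketch. It is not a full model: it works on the romanization used in this article, handles only the first geminate or nasal + stop cluster in a word, and relies on the caller to say whether that syllable is stressed. The function name `simplify_cluster` and the consonant classes are assumptions made for illustration. Sporadic nasalization and the heavy-syllable exceptions are not modeled.

```python
import re
import unicodedata

# Short vowel -> long counterpart in the article's romanization.
LONG = {
    "a": "\u0101",  # a -> ā
    "i": "\u012b",  # i -> ī
    "u": "\u016b",  # u -> ū
}
STOPS = "kgcjtdpb" + "\u1e6d\u1e0d"          # plus retroflex ṭ, ḍ
NASALS = "nm" + "\u1e43\u1e45\u00f1\u1e47"   # n, m, ṃ, ṅ, ñ, ṇ
TILDE = "\u0303"                             # combining tilde (nasalized vowel)

# VCː: short vowel + doubled consonant (+ optional aspiration, as in "ddh").
GEMINATE = re.compile(rf"([aiu])([{STOPS}lmns])\2(h?)")
# VNC: short vowel + nasal + (possibly aspirated) stop.
NASAL_CLUSTER = re.compile(rf"([aiu])[{NASALS}]([{STOPS}]h?)")


def simplify_cluster(word, stressed=True):
    """Apply VCː > VːC / VNC > ṼːC (stressed) or VCː > VC (unstressed)
    to the first matching cluster of a romanized Prakrit word."""
    word = unicodedata.normalize("NFC", word)
    found = [m for m in (GEMINATE.search(word), NASAL_CLUSTER.search(word)) if m]
    if not found:
        return word
    m = min(found, key=lambda match: match.start())
    vowel = m.group(1)
    if m.re is GEMINATE:
        # Stressed: compensatory lengthening. Unstressed: vowel stays short.
        new_vowel = LONG[vowel] if stressed else vowel
        replacement = new_vowel + m.group(2) + m.group(3)
    else:
        if not stressed:
            return word  # unstressed VNC is left as VNC
        replacement = LONG[vowel] + TILDE + m.group(2)
    return word[:m.start()] + replacement + word[m.end():]
```

Under these assumptions, `simplify_cluster("satta")` gives sāta and `simplify_cluster("daṃta")` gives dā̃ta, while `simplify_cluster("kappūra", stressed=False)` gives kapūra, in line with the Old Hindi forms cited in the list.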

== Changes within Old Hindi and up to Hindustani ==
The following sound changes characterize certain dialects of Old Hindi, later Old Hindi, and modern Hindustani. These changes distinguish Hindustani from other Central Indo-Aryan languages, like Braj Bhasha and Awadhi.

- Final nominative -au (-औ ) > -ā (-आ ). Compare Marathi -ā (-आ), Punjabi -ā (-ਆ ), but Gujarati -o (-ઓ) and Braj -au (-औ).
- Attenuation of post-tonic and final short vowels to /ǝ/. A number of words are saved from this lenition by semi-learned lengthening of the final vowel.
- During the Old Hindi stage, final unstressed -ai (-ऐ ) and -au (-औ ) monophthongized to -e (-ए ) and -o (-ओ ), respectively.
- Long vowels (often resulting from compensatory lengthening) are generally shortened (accompanied by a change in quality if necessary) before two or more syllables where at least one of the syllables is heavy. That is, ā > a (ɑː > ɐ), e ī > i (eː iː > i), o ū > u (oː uː > u). This rule is fairly productive in Modern Hindustani and partially explains Hindi's distinctive ablaut alternations when certain words are suffixed.
  - OH mīṭhāī (मीठाई ) > later OH miṭhāī (मिठाई ) "sweetness", whence Hindustani miṭhāī. As a general rule in modern Hindustani, the stressed suffix -āī causes the root vowel to reduce, hence Hindustani mīṭhā "sweet" + -āī → miṭhāī "sweetness" with short -i-.
  - OH āpanā (आपना ) > later OH apanā (अपना ) "one's, your", whence Hindustani apnā. Compare Gujarati āpno (આપનો), where the ā was never shortened.
- Old Hindi has a huge influx of tatsama borrowings and ardhatatsama (semi-learned) borrowings from Sanskrit. For instance, from Prakrit suddha- (सुद्ध-) we find both OH sūdha (सूध​-) and OH sudha (सुध​-) meaning "pure". The first is the expected reflex and the second term was influenced in vowel length by the Sanskrit etymon śuddha (शुद्ध​) "pure". The tatsama śuddha (शुद्ध) is itself encountered in Old Hindi and Hindustani.
- In verbs, the length of the vowel is frequently manipulated to reflect the transitivity of the verb. This tendency has been known since Sanskrit: compare passive tapyate (तप्यते) "is heated" with active tāpayati (तापयति) "heats, causes to heat up". From Prakrit tappa- (तप्प​-) we get the Hindustani pair tapnā (तपना) "to be heated" and tāpnā (तापना) "to heat (something)".
- In some multi-syllabic words, the VCː or VNC sequence was left unsimplified, perhaps due to borrowing from the northwest (whence Punjabi and Sindhi). The vowel lengthening rules did not take place in the northwestern region (words with the VCː > VːC and VNC > ṼːC sound change in Punjabi and Sindhi are themselves borrowings from other Indo-Aryan languages, like Hindustani). These borrowings, likely from a Western Hindi dialect transitional to Punjabi, result in a large number of doublets in Hindustani.
- Indo-Aryan schwa deletion: ɐ → ∅ / VC_CV, _#, though the application of this rule (particularly when there are many schwas in sequence) is dependent on the morphological boundaries of the word. This change is not indicated in the Devanagari script for Hindustani. — OH rāta (रात ) > Hindustani rāt (रात ) "night".
- When short i /i/ or u /u/ are in the VC_CV or _# contexts and the immediately preceding syllable has short a /ɐ/, the a /ɐ/ will assimilate to the i /i/ or u /u/ and the original i /i/ or u /u/ will be deleted. — OH aṅgulī (अंगुली ) > Hindustani uṅglī (उंगली ) "finger". Compare Punjabi aṅgulī (ਅਂਗੁਲੀ ), uṅgulī (ਉਂਗੁਲੀ ), uṅgal (ਉਂਗਲ ).
- Unstressed (short) vowels are also lost in other positions, particularly initial vowels in words of 3 or more syllables or intertonic short vowels. — OH aṛhāī (अढ़ाई ) > Hindustani ḍhāī (ढाई ) "two and a half".
- Lenition of Ṽbh > Vmh and Ṽb > Vm. — OH ā̃ba (आँब ) > Hindustani ām (आम ) "mango".
- Loss of nasal aspiration if not prevocalic. — OH tumha (तुम्ह ) > Hindustani tum (तुम ) "you". Compare Marathi tumhī (तुम्ही) and Hindustani tumhārā (तुम्हारा ) "your", where the medial -mh- is retained as it is prevocalic.

- Sounds from loanwords: The sounds /f, z, ʒ, q, x, ɣ/ are loaned into Hindi-Urdu from Persian, English, and Portuguese.
  - In Hindi, /f/ and /z/ are the most well-established, but can be /pʰ/ or /dʒ/ in rustic speech. /q, x, ɣ/ are variably (by dialect) assimilated into /k, kʰ, g/, respectively, and /ʒ/ is almost never pronounced, being substituted by /ʃ/ or /dʒʰ/.
  - /pʰ/ is starting to merge into /f/ in a number of Hindustani dialects.
  - Sanskrit ṛ is borrowed into Gujarati, Hindustani and Bengali as /rɪ/, but is pronounced more like /ru/ in languages like Marathi and Odia.
- Monophthongization of ai to /ɛː ~ æː/ and au to /ɔː/ in many non-Eastern dialects. A separate /æː/ arguably exists in Hindustani through English loanwords.
- Shifts before /ɦ/: Before h + a short vowel or deleted schwa, the pronunciation of short a shifts allophonically to short [ɛ], or to [ɔ] if the short vowel is u. This change is part of the prestige dialect of Delhi, but may not occur to the full degree for every speaker. Often, this step is taken further by assimilation of the short vowel after /ɦ/ to [ɛ] or [ɔ], and then by loss of /ɦ/ and coalescence/lengthening of the vowels into long /ɛː/ and /ɔː/. In some cases, different inflections of the same word have differing outcomes.
  - Hindustani bahut (बहुत , /bǝ.ɦʊt̪/) > [bɔ.ɦʊt̪] > [bɔ.ɦɔt̪] > [bɔːt̪] "a lot, many"
  - Hindustani kahnā (कहना , /kǝɦ.näː/) > [kɛɦ.näː] > [kɛː.näː] "to say", but kahegā (कहेगा ) "he will say" is still pronounced regularly as [kǝ.ɦeː.gäː].
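
The schwa deletion rule in the list above (ɐ → ∅ / VC_CV, _#) can be sketched in Python as follows. This is a rough illustration on romanized forms, not a complete account: the function name `delete_schwa` is hypothetical, morphological boundaries are ignored, and the choice to treat the rightmost medial schwa first is an assumption of this sketch rather than something stated in the rule.

```python
import re
import unicodedata

VOWELS = "aeiou" + "\u0101\u012b\u016b"   # a e i o u plus ā ī ū
TILDE = "\u0303"                          # combining tilde on nasalized vowels
V = rf"[{VOWELS}]{TILDE}?"                # a vowel, optionally nasalized
C = rf"[^{VOWELS}{TILDE}]h?"              # one consonant, optionally aspirated

# _# context: word-final short a after a consonant.
FINAL = re.compile(rf"({C})a$")
# VC_CV context: the greedy ".*" makes this pick the rightmost candidate.
MEDIAL = re.compile(rf"^(.*{V}{C})a({C}[{VOWELS}])")


def delete_schwa(word):
    """Delete short a word-finally and in VC_CV, rightmost schwa first."""
    word = unicodedata.normalize("NFC", word)
    word = FINAL.sub(r"\1", word)
    while True:
        reduced = MEDIAL.sub(r"\1\2", word, count=1)
        if reduced == word:
            return word
        word = reduced
```

With these assumptions, `delete_schwa("rāta")` gives rāt and `delete_schwa("apanā")` gives apnā, matching the Hindustani forms cited above. Words where the real outcome depends on morpheme boundaries would need extra information that this sketch does not have.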

== Examples of sound changes ==
The following table shows a possible sequence of changes for some basic vocabulary items, leading from Sanskrit to Modern Hindustani. Words may not be attested at every stage.
Table of sound changes

| Sanskrit | Early Prakrit | Middle Prakrit | Late Prakrit | (Early) Old Hindi | Hindustani | Meaning |
| --- | --- | --- | --- | --- | --- | --- |
| यूथिका yūthikā | जूथिका jūthikā | जूहिआ jūhiā | जूहिअ jūhia | जूही jūhī | जूही jūhī | juhi flower |
| व्याघ्रः vyāghraḥ | वग्घो vaggho | वग्घो vaggho | वग्घु vagghu | बाघ bāgha | बाघ bāgh | tiger |
| उत्पद्यते utpadyate | उप्पज्जति uppajjati | उप्पज्जइ uppajjaï | उप्पज्जइ uppajjaï | उपजै upajai | उपजे upje | (it) grows |
| कुम्भकारः kumbhakāraḥ | कुम्भकारो kumbhakāro | कुंभआरो kuṃbhaāro | कुंभआरु kuṃbhaāru | कुंभार kumbhāra | कुम्हार kumhār | potter |
| श्यामलकः śyāmalakaḥ | सामलको sāmalako | सामलओ sāmalao | सावलउ sāṽalaü | साँवलौ sā̃valau | साँवला sā̃vlā | dusky |
