Wikipedia:Manual of Style/Persian



Persian is a member of the Iranian branch of the Indo-European languages. There are three closely related varieties of Persian:

  • Persian proper or Farsi (فارسی) is spoken in Iran.
  • Dari or Afghani Persian (دری) is spoken in Afghanistan and Pakistan.
  • Tajik (Тоҷикӣ / Tojikī / تاجیکی‎) is spoken in Tajikistan and the former USSR.

The Persian language has been written with a number of different scripts, including Old Persian cuneiform, Pahlavi (Middle Persian) and Avestan. After the Islamic conquest of the Persian Sassanian Empire in 651 AD, Arabic replaced Middle Persian as the language of government, culture and especially religion for the next two centuries.

Written Persian reappeared during the 9th and 10th centuries. Since then it has been written in a modified version of the Arabic script with additional letters. The period of the 13th–15th centuries is known as Classical Persian.

In the Tajik Soviet Socialist Republic of the former USSR, the Tajik language was created on the basis of the local dialects. From 1928 to 1939 it was briefly written in the Latin script, and since 1939 it has been written in the Tajik version of the Cyrillic alphabet.


Several romanization schemes exist for Persian, and none of them can be regarded as definitive or universal. Nevertheless, three broad strategies can be distinguished:

  • Monographic scientific ("strict") romanization thoroughly represents Persian pronunciation as well as Persian orthography, especially redundant Arabic letters. It follows the principle "one letter (sign) to one letter (sign)" and avoids digraphs but favours diacritical signs. Examples of such schemes: by the German Oriental Society (Deutsche Morgenländische Gesellschaft, DMG) or by Encyclopædia Iranica (EI).
  • Digraphic practical ("semi-strict") romanization generally follows the above principles but uses both diacritical signs and digraphs. However, the use of digraphs may lead to confusion when combinations such as سه sh or زه zh occur. Examples of such schemes: the ALA-LC romanization or BGN/PCGN romanization.
  • Simplified romanization employs only the letters of the English alphabet. It generally follows the digraphic romanization schemes but drops all diacritical signs.
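
The relationship between the semi-strict and simplified strategies can be sketched in code. The following is a minimal illustration, not part of any of the schemes named above: it assumes the input is already in a digraphic (semi-strict) romanization and that "dropping diacritical signs" means removing Unicode combining marks. The function name `simplify` is hypothetical.

```python
import unicodedata


def simplify(romanized: str) -> str:
    """Drop diacritical signs from a semi-strict romanization.

    Assumption of this sketch: every diacritic decomposes into a base
    letter plus combining marks under NFD normalization.
    """
    decomposed = unicodedata.normalize("NFD", romanized)
    # Keep only characters that are not combining marks.
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return unicodedata.normalize("NFC", "".join(kept))


print(simplify("Shāhnāmeh"))
print(simplify("Ḥakim Abu-l-Qāsem Ferdowsi Tusi"))
```

This only works in one direction and only from a digraphic scheme: applied to a strict romanization it would turn š into s rather than sh.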

Romanization table

This is a compromise version of romanization that combines the existing schemes.

It is expected that the readers of Wikipedia have no linguistic background, so simplified romanization is advised for use in articles. The original Persian spelling in parentheses is enough for those who need it. However, the semi-strict romanization may be written alongside (usually after) the Persian script to give a clue to the native pronunciation of a name or a word.

The scientific (strict) column is given mainly for reference. If you need a more precise transliteration, use the semi-strict one: it is precise enough, uses fewer diacritical signs and is more intuitive.

Unicode Persian IPA Scientific
U+0627 ا ʔ, ∅[a] ʾ, —[b] ’, —[b]
U+0628 ب b b
U+067E پ p p
U+062A ت t t
U+062B ث s s
U+062C ج j
U+0686 چ č ch
U+062D ح h h
U+062E خ x ḫ/ḵ/x kh
U+062F د d d
U+0630 ذ z z
U+0631 ر r r
U+0632 ز z z
U+0698 ژ ʒ ž zh
U+0633 س s s
U+0634 ش ʃ š sh
U+0635 ص s s
U+0636 ض z ż z
U+0637 ط t t
U+0638 ظ z z
U+0639 ع ʿ
U+063A غ ɣ ġ/ḡ gh
U+0641 ف f f
U+0642 ق ɢ~ɣ q
U+06A9 ک k k
U+06AF گ ɡ g
U+0644 ل l l
U+0645 م m m
U+0646 ن n n
U+0648 و v~w[a][c] v, w[d]
U+0647 ه h[a] h
U+0629 ة ∅, t t[e]
U+06CC ی j[a] y
U+0621 ء ʔ, ∅ ʾ
U+0624 ؤ ʔ, ∅ ʾ
U+0626 ئ ʔ, ∅ ʾ
Unicode Final Medial Initial Isolated IPA Scientific
U+064E ◌َ ◌َ اَ ◌َ æ a
U+064F ◌ُ ◌ُ اُ ◌ُ o o
U+0648 U+064F ◌ﻮَ ◌ﻮَ o[c] o
U+0650 ◌ِ ◌ِ اِ ◌ِ e e
U+064E U+0627 ◌َا ◌َا أ ◌َا ɑː~ɒː ā a
U+0622 ◌ﺂ ◌ﺂ آ ◌آ ɑː~ɒː ā, ʾā ā a
U+064E U+06CC ◌َﯽ ◌َی ɑː~ɒː á ā a
U+06CC U+0670 ◌ﯽٰ ◌یٰ ɑː~ɒː á ā a
U+064F U+0648 ◌ُﻮ ◌ُﻮ اُو ◌ُو uː, oː[d] ū, ō[d] u, ō[d]
U+0650 U+06CC ◌ِﯽ ◌ِﯿ اِﯾ ◌ِی iː, eː[d] ī, ē[d] i, ē[d]
U+064E U+0648 ◌َﻮ ◌َﻮ اَو ◌َو ow~aw[d] ow, aw[d]
U+064E U+06CC ◌َﯽ ◌َﯿ اَﯾ ◌َی ej~aj[d] ey, ay[d]
U+064E U+06CC ◌ﯽ ◌ی –e, –je –e, –ye
U+06C0 ◌ﮥ ◌ﮤ –je –ye


  a. Used as a vowel as well.
  b. Not transliterated at the beginning of words.
  c. At the beginning of words the combination ⟨خو⟩ was pronounced /xw/ or /xʷ/ in Classical Persian. In modern varieties the glide /ʷ/ has been lost, though the spelling has not changed. It may still be heard in Dari as a relic pronunciation. The combination /xʷa/ became /xo/.
  d. In Dari.
  e. When used instead of ⟨ت⟩ at the end of words.
  f. Diacritical signs (harakat) are rarely written.

Redundant letters

Persian has seven redundant letters inherited from Arabic: ⟨ث ص⟩ for ⟨س⟩ s, ⟨ذ ظ ض⟩ for ⟨ز⟩ z, ⟨ط⟩ for ⟨ت⟩ t, and ⟨ح⟩ for ⟨ه⟩ h. Romanizations usually mark them with one diacritical sign or another. Unlike in Arabic, these diacritics do not signal any difference in Persian pronunciation. Their purpose is backward conversion: the original Persian spelling can be restored from the romanization. If the original spelling of a Persian word is already provided, there is no reason to write these diacritical signs, and you do not have to use them.
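
As an illustration of why the diacritics can be dropped, the sketch below maps each redundant letter to the letter it duplicates and then to a single simplified Latin letter. The dictionary and function names are hypothetical, and only the letters discussed in this section are covered.

```python
# Redundant letter -> the Persian letter it duplicates in pronunciation.
REDUNDANT = {
    "ث": "س", "ص": "س",            # both sound like s
    "ذ": "ز", "ظ": "ز", "ض": "ز",  # all sound like z
    "ط": "ت",                      # sounds like t
    "ح": "ه",                      # sounds like h
}

# Simplified romanization of the four base letters.
SIMPLIFIED = {"س": "s", "ز": "z", "ت": "t", "ه": "h"}


def simplified_consonant(letter: str) -> str:
    """Romanize a base or redundant letter without any diacritic."""
    base = REDUNDANT.get(letter, letter)
    return SIMPLIFIED[base]


# A redundant letter and its base letter come out identical.
print(simplified_consonant("ص"), simplified_consonant("س"))
```

Because several Persian letters collapse onto one Latin letter, this mapping cannot be reversed, which is exactly the information the diacritics in stricter schemes preserve.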


When combinations گه gh, که kh, سه sh, زه zh occur, a middle dot ⟨·⟩ or an apostrophe ⟨'⟩ may be employed: g·h, k·h, s·h, z·h.
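
A minimal sketch of this rule, assuming the romanization is assembled letter by letter so that a genuine digraph such as sh arrives as one unit while an accidental s + h arrives as two. The function name `join_letters` and the example unit lists are hypothetical and supplied by hand for illustration.

```python
from typing import Iterable

# Single letters that would form a digraph with a following separate h.
AMBIGUOUS_FIRST = {"g", "k", "s", "z"}
MIDDLE_DOT = "\u00b7"


def join_letters(units: Iterable[str]) -> str:
    """Join per-letter romanizations, separating accidental digraphs."""
    out = []
    for unit in units:
        # A separate h right after g, k, s or z gets a middle dot.
        if out and unit == "h" and out[-1] in AMBIGUOUS_FIRST:
            out.append(MIDDLE_DOT)
        out.append(unit)
    return "".join(out)


print(join_letters(["e", "s", "h", "ā", "l"]))  # s and h are separate letters
print(join_letters(["sh", "ā", "h"]))           # sh is a single unit
```

Replacing `MIDDLE_DOT` with an apostrophe gives the other notation mentioned above.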


In Classical Persian there were three short vowels (a, i, u) and five long ones (ā, ē, ī, ō, ū). In modern varieties the distinction is between three unstable (formerly short) vowels, a, e, o, and three stable (formerly long) ones, ā, i, u. A macron is sometimes written over the latter two (ī and ū), but as there is no short i or u in Modern Persian (whether Farsi or Dari, though not Tajik), there is no need for such redundant notation. In simplified romanization the macron over the stable a may also be omitted. For ē and ō see the section below.

The ending -eh

The Middle Persian nominal ending -ag is written with the Arabic letter ⟨ه⟩ and is pronounced a in Classical Persian and Dari, and e in Iranian Farsi. The tradition is to retain this mute letter h in romanization, so شاهنامه is Shahnameh or Shahnamah. Note that Encyclopædia Iranica prefers -a.

Mute h

The word-final mute ⟨ه⟩ can signify any other final vowel than the above-mentioned ending.

Mute v

The initial combination ⟨خو⟩ that represented either /xʷ/ or /xw/ in Classical Persian has been simplified into /x/ in Modern Persian. It is advised not to transliterate this mute letter but in some cases it may be represented with ⟨ʷ⟩ (U+02B7 MODIFIER LETTER SMALL W). E.g. Khʷārazm or Khārazm.

Dari and Classical Persian

Dari, the variety used in Afghanistan, is more conservative in many ways and retains many traits of Classical Persian:

  • Dari preserves two long vowels ē and ō, while in Iranian Persian they are merged with ī and ū respectively. E.g. the Persian words for "lion" and "milk" are both written شیر but pronounced differently in Dari and Classical Persian, shēr and shīr, whereas in Iran both are pronounced shir. If you want to present this distinction, it is better to write the macron.
  • Dari preserves the quality of the diphthongs ay and aw, whereas in Iran they are ey and ow.
  • Dari preserves a distinct pronunciation of the letter ⟨ق⟩ q, whereas in Iran the letter is merged with ⟨غ⟩ gh in pronunciation.
  • Dari uses the semivowel pronunciation w of the letter ⟨و⟩.
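
The vowel and semivowel differences listed above can be sketched as a naive conversion from a Dari-style romanization to an Iranian-style one. This is an illustration only: the function name `dari_to_iranian` is hypothetical, the input is assumed to use precomposed macron letters, the rule "ay and aw count as diphthongs only when no vowel follows" is an assumption of the sketch, and real words may need manual checking.

```python
import re

VOWELS = "aeiouāēīōū"


def dari_to_iranian(word: str) -> str:
    """Naively apply the Iranian vowel mergers to a Dari-style form."""
    # ē and ō are merged with the stable vowels i and u.
    word = word.replace("ē", "i").replace("ō", "u")
    # Diphthongs ay/aw become ey/ow (assumed: only before a
    # consonant or at the end of the word).
    word = re.sub(rf"ay(?![{VOWELS}])", "ey", word)
    word = re.sub(rf"aw(?![{VOWELS}])", "ow", word)
    # A remaining w before a vowel is the consonant v in Iran.
    word = re.sub(rf"w(?=[{VOWELS}])", "v", word)
    return word


for form in ("shēr", "naw", "may", "wazir"):
    print(form, dari_to_iranian(form))
```

The merger of q and gh is not handled, since it affects pronunciation rather than the romanized spelling.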

It is up to the writer to decide whether or not to represent these linguistic peculiarities in articles concerning Afghanistan. One piece of advice: be consistent and do not mix the two varieties. Articles concerning Classical (pre-modern) periods may follow the romanization of the sources cited.

Old and Middle Persian

For Old and Middle Persian, use the transliteration schemes established by the scholarly community and/or try to follow the sources. Some simplifications may be applied: Zaraϑuštra → Zarathushtra, Gāϑā → Gatha, etc.

Practical use

Lead paragraphs

All Persian-related articles should have a lead paragraph that includes the article title in simplified romanization, followed in parentheses by the original Persian script and the semi-strict romanization. The latter gives the reader a general hint of how the name or word is pronounced by native speakers. The Persian script may be enclosed in {{lang-fa}}, {{lang-prs}} or {{lang}}, and the romanization in {{unicode}} or {{transl}}.

Consider the following examples:

'''Tehran''' ({{lang-fa|تهران}}, ''{{unicode|Tehrān}}'') is the capital of Iran.
'''Kabul''' ({{lang-prs|کابل}}, ''{{unicode|Kābol}}'') is the capital of Afghanistan.

which gives:

Tehran (Persian: تهران‎‎, Tehrān) is the capital of Iran.
Kabul (Dari: کابل‎, Kābol) is the capital of Afghanistan.
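
For editors who generate such lead sentences in bulk, the pattern can be sketched as a small helper. This is an illustration under stated assumptions: the helper names `template` and `lead_sentence` are hypothetical, and only the template names {{lang-fa}}, {{lang-prs}} and {{unicode}} come from this guideline.

```python
def template(name: str, *args: str) -> str:
    """Build a wikitext template call such as {{name|arg}}."""
    return "{{" + "|".join((name,) + args) + "}}"


def lead_sentence(title: str, script: str, semi_strict: str,
                  rest: str, lang_template: str = "lang-fa") -> str:
    """Bold title, then Persian script and italic romanization in parentheses."""
    native = template(lang_template, script)
    roman = "''" + template("unicode", semi_strict) + "''"
    return f"'''{title}''' ({native}, {roman}) {rest}"


print(lead_sentence("Tehran", "تهران", "Tehrān", "is the capital of Iran."))
print(lead_sentence("Kabul", "کابل", "Kābol",
                    "is the capital of Afghanistan.", lang_template="lang-prs"))
```

The title is passed in already simplified, so an accepted English form such as Kabul can be used there while the semi-strict form stays inside the parentheses.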

Some cases may require variations on this format.

Consider the following:

Omar Khayyam (born Ghiyās̱-ad-Din Abu-l-Fatḥ ‘Omar ebn Ebrāhim al-Khayyām Nishāpuri, غیاثالدین ابوالفتح عمر ابراهیم خیام نیشابورﻯ) was a Persian poet and polymath.
Ferdowsi, or Firdawsi (full name in Persian: حکیم ابوالقاسم فردوسی توسی‎‎, Ḥakim Abu-l-Qāsem Ferdowsi Tusi) was a Persian poet.

The articles that are missing this information are listed at Articles needing Persian script or text.

In accordance with the official Wikipedia policy at Wikipedia:Naming conventions (use English), if a name has an accepted English form, use it everywhere: in the article title, in the lead paragraph and in the article body (except for the semi-strict romanization given after the Persian script). For example, use Kabul, not Kabol; Isfahan, not Esfahan; Kunduz, not Qondoz.


All common transliterations should redirect to the article. There may often be many redirects, but this is intentional and does not represent a problem.

In text

Use simplified romanization for Persian names and words whenever possible. When you introduce a Persian name or word for the first time, provide the Persian script and the semi-strict transliteration in parentheses. Example:

An early epic poem of Persian classical literature is the Shahnameh (Persian: شاهنامه‎‎, Shāhnāmeh) by Ferdowsi (Persian: فردوسی‎‎). Ferdowsi wrote the Shahnameh between 977 and 1010 AD. (Not "Ferdowsī wrote the Šāhnāmeh...")

Tajik Cyrillic

Since Tajik is written in a more or less phonetic alphabet, its romanization causes little difficulty. In general it follows the Wikipedia guidance for Russian.

However, there are cases where one has to pay attention:

  • Tajik has four additional consonants, ⟨ғ, қ, ҳ, ҷ⟩. They correspond to the Perso-Arabic letters ⟨غ⟩, ⟨ق⟩, ⟨ه⟩ and ⟨ج⟩, so they are transliterated accordingly: gh, q, h, j.
  • Tajik has two historically "long" vowels: ⟨ӣ⟩ and ⟨ӯ⟩. Since Tajik vocalism differs from that of Farsi and Dari, it is better to keep the macron to prevent any confusion: ī and ū.
  • Unlike Russian, Tajik has no palatalized consonants. The letters ⟨ё, ю, я⟩ are always represented by digraphs: yo, yu, ya. The letter ⟨ё⟩ should never be confused with the letter ⟨е⟩.
  • The letter ⟨е⟩ is transliterated e after consonants and ye in other cases.
  • The obsolete Russian letters ⟨ц, щ, ы, ь⟩ may be encountered; they are transliterated as in Russian.
Cyrillic IPA Romanization
А а /æ/ a
Б б /b/ b
В в /v/ v
Г г /ɡ/ g
Ғ ғ /ɣ/ gh
Д д /d/ d
Е е /je, e/ ye, e
Ё ё /jɒ/ yo
Ж ж /ʒ/ zh
З з /z/ z
И и /ɪ/ i
Ӣ ӣ /i/ ī
Й й /j/ y
К к /k/ k
Қ қ /q/ q
Л л /l/ l
М м /m/ m
Н н /n/ n
О о /ɒ/ o
П п /p/ p
Р р /r/ r
С с /s/ s
Т т /t/ t
У у /u/ u
Ӯ ӯ /ɵ/ ū
Ф ф /f/ f
Х х /χ/ kh
Ҳ ҳ /h/ h
Ч ч /ʧ/ ch
Ҷ ҷ /ʤ/ j
Ш ш /ʃ/ sh
Ъ ъ /ʔ/
Э э /e/ e
Ю ю /ju/ yu
Я я /jæ/ ya
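
The table and the rule for ⟨е⟩ can be combined into a short romanization routine. The following is a minimal sketch, not an official implementation: the function name `romanize_tajik` is hypothetical, the obsolete Russian letters are not covered, and ⟨ъ⟩ is passed through unchanged because no romanization is listed for it above.

```python
import unicodedata

# Lowercase Tajik Cyrillic -> romanization, taken from the table above.
# The letter е is handled separately because it depends on context.
TAJIK = {
    "а": "a", "б": "b", "в": "v", "г": "g", "ғ": "gh", "д": "d",
    "ё": "yo", "ж": "zh", "з": "z", "и": "i", "ӣ": "ī", "й": "y",
    "к": "k", "қ": "q", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ӯ": "ū",
    "ф": "f", "х": "kh", "ҳ": "h", "ч": "ch", "ҷ": "j", "ш": "sh",
    "э": "e", "ю": "yu", "я": "ya",
}
VOWELS = set("аеёиӣоуӯэюя")


def romanize_tajik(text: str) -> str:
    """Romanize Tajik Cyrillic text letter by letter."""
    text = unicodedata.normalize("NFC", text)
    out = []
    prev = ""  # previous letter, lowercased
    for ch in text:
        low = ch.lower()
        if low == "е":
            # e after a consonant, ye in all other positions.
            after_consonant = prev in TAJIK and prev not in VOWELS
            rom = "e" if after_consonant else "ye"
        else:
            rom = TAJIK.get(low)
        if rom is None:
            out.append(ch)                # unknown character: keep as-is
        elif ch != low:
            out.append(rom.capitalize())  # uppercase input letter
        else:
            out.append(rom)
        prev = low
    return "".join(out)


print(romanize_tajik("Тоҷикӣ"))
print(romanize_tajik("Душанбе"))
```

Digraphs are capitalized on their first letter only, so an initial ⟨Х⟩ becomes Kh rather than KH.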