Sotho tonology

  • The orthography used in this and related articles is that of South Africa, not Lesotho. For a discussion of the differences between the two see the notes on Sesotho orthography.
  • Some systems without the necessary monospace fonts may render the diagrams used to illustrate the tonal rules incorrectly.

Like almost all other Niger–Congo languages, Sesotho is a tonal language, spoken with two basic tones, high (H) and low (L). The Sesotho grammatical tone system (unlike the lexical tone system used in Mandarin, for example) is rather complex and uses a large number of "sandhi" rules.

However, the Sesotho system is by no means the most complicated, nor even one of the more complicated. For example, there exist African grammatical tone languages with many more than two tonemes, and the existence of breathy voiced consonants in the Nguni and other languages greatly complicates their tonology. (In Sesotho there is no interaction whatsoever between the tonemes and phones of the syllables.) There are also very few instances of "floating" tones, and few grammatical constructs indicated purely by a change in tone. (The most common instances of this are rule 1 of the plain copulative and the formation of many positive participial sub-mood clauses.) The rules are generally not very dramatic either, and there is a very strong tendency to preserve underlying high tones. (By contrast, in the Nguni languages the underlying high tone of verb stems, subjectival concords, the noun pre-prefix, and/or objectival concords often shifts several syllables to the right, to the antepenultimate or penultimate syllable.)

The tone of a syllable is carried by the vowel, or the nasal, if the nasal is syllabic.[1] The tone carried by syllabic l (and, in Northern Sotho and Setswana, syllabic r) is left over from the elided vowel.

Tone types

Underlyingly, each syllable of every morpheme may be described as having one of two tone types:[2] high (H [ ¯ ]) and null (ø). On the surface, all remaining null tones default to low (the LTA rule below) and the language is therefore spoken with two contrasting tonemes (H and L).

A classic example of a nasal carrying a tone:

To form a locative from a noun, one of the possible procedures involves simply suffixing a low tone -ng to the noun. To form the locative meaning "on the grass" one suffixes -ng to the word jwang [ _ ¯ ], giving jwanng [ _ ¯ _ ] ([ʒʷɑŋ̩ŋ̩]), with the two last syllabic nasals having contrasting tones.

Names, although they are nouns, frequently have a tonal pattern distinct from that of the common noun they are based on:

The Sesotho word for "mother/missus/ma'am" is mme [ _ ¯ ], but a child would call their own mother mme [ ¯ _ ], using it as a first name. Also, ntate [ _ _ ¯ ] means father/mister/sir, while ntate [ _ ¯ ¯ ] might be used by a small child to say "dad."


In speech, the two surface tonemes may be pronounced as one of several allotones due to the influence of surrounding tones and the length of the syllable. These changes naturally occur due to the way the language is spoken, including the effect of the penultimate lengthening, but ultimately each syllable of every morpheme may be completely described as having only high and low tones.[3]

In this and related articles, the tonemes of a word are delimited with square brackets and the specific (approximate) spoken allotones are between curly braces.

lepata euphemism; tonemes: [ _ ¯ _ ] (L — H — L), allotones: { _ _ } (low — high-falling — low)
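
The bracket notation can be read mechanically. The following is a minimal Python sketch, assuming only that a macron stands for H and an underscore for L, as in the example above. The name `parse_tonemes` is hypothetical and is introduced purely for illustration.

```python
from typing import List

# Symbols as used in this article's square-bracket toneme notation.
SYMBOL_TO_TONE = {"¯": "H", "_": "L"}

def parse_tonemes(pattern: str) -> List[str]:
    """Convert a pattern such as '[ _ ¯ _ ]' into ['L', 'H', 'L']."""
    inner = pattern.strip().strip("[]")  # drop the enclosing brackets
    return [SYMBOL_TO_TONE[symbol] for symbol in inner.split()]

print(parse_tonemes("[ _ ¯ _ ]"))  # lepata: ['L', 'H', 'L']
```

The sketch handles tonemes only; the allotones written between curly braces would need a larger symbol table.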
Tonemes and allotones
Toneme | Allotone | Where found | Example
H [ ¯ ] | extra-high { ¯ } | After another H, not penult | moririnyana small hair { _ ¯ ¯ _ }
H [ ¯ ] | high { ¯ } | In the bodies of words | ho lekola to investigate { _ ¯ _ }
H [ ¯ ] | mid {} | Finally in a phrase | Mopedi Mopedi person { _ }
H [ ¯ ] | high-falling { } | Penultimate syllable of phrase before L | mmuo dialect { _ _ }
H [ ¯ ] | high-mid { } | Penult before H | mosadi woman { _ }
L [ _ ] | extra-low { _ } | Finally after another L | temo agriculture { _ }
L [ _ ] | low { _ } | Noun class prefixes, in the bodies of words, and finally after H | ho seba to perform mischief { _ _ }
L [ _ ] | low-falling { } | Penultimate | tweba mouse { _ }

Thus in all there are, at least in our analysis, eight allotones[4] { ¯ ¯ – _ _ }.

Most of these allotones only appear on the final word in the phrase in moderately slow or emphasised speech. When not phrase-final, the mid, high-falling, high-mid, low-falling, and extra-low allotones are normally not heard. Bear in mind that the falling tones only occur on lengthened syllables, and if a word has irregular stress then the falling tones will not appear on the penult (for example, the second form of the first demonstrative pronoun has tonemic pattern [ ¯ ¯ ] which is pronounced { ¯ } due to the stressed final syllable).

It is interesting to note that there are no rising tones. For example, [ _ ¯ ] (where the L is penultimate) is pronounced { } though one might have expected *{ / ¯ }. This is a general trend among almost all Bantu languages with (contrastive or stressed) lengthened vowels, though languages with breathy voiced consonants do have audible upward "swoops" on breathy voiced syllable onsets which may be interpreted as rising allotones.

There are several cases of seemingly tonemic instances of some of these allotones. As expected, some ideophones and radical interjectives have strange tones, but it should also be noted that the relative concord has an irregular extra-high tone (except when used to form demonstrative pronouns). The difference in relative pitch between the high tone and its extra-high allotone is less than that between the low and high tones.

Tone usage

The purpose of the tones can fall into at least one of the following categories:

Characteristic tone

Each complete Sesotho word has an inherent tone pattern for its syllables. Although the pattern is not essential to forming correct speech, failing to use it will betray a foreign accent:

motho [ _ _ ] human being
ntja [ _ ¯ ] dog
Mosotho [ _ ¯ _ ] singular of Basotho
lerata [ _ _ ¯ ] noise

Various factors mean that the tones of a word may change, but the characteristic tone of a Sesotho word is found when the word is the last in a question that does not employ the interrogative adverb na?. In this situation, downdrift is greatly attenuated, the penultimate syllable of the sentence is short (although the vowel of the last syllable may be completely cut), and the tone of the last word is largely preserved (though a final H tone may fall to L).

O batla ho eba setsebi { _ _ } You want to be a scientist
Na o batla ho eba setsebi? { _ _ } Do you want to be a scientist?
O batla ho eba setsebi? { _ ¯ _ } Do you want to be a scientist?

Distinguishing/semantic tone

The most important property of tonal languages which distinguishes them from languages that merely use pitch as part of intonation (such as English) is the existence of numerous tonal minimal pairs. Often, a few words may be composed of exactly the same syllables/phonemes, yet have different characteristic tones (the example H verbs have low final tone due to the Finality Restriction):

ho aka [ _ ¯ _ ] to kiss
ho aka [ _ _ _ ] to tell lies
jwang [ _ ¯ ] grass
jwang [ ¯ _ ] how?
ho tena [ _ ¯ _ ] to wear
ho tena [ _ _ _ ] to annoy/disgust

There are, however, several basic homophones pronounced with exactly the same tonal patterns. In these cases only the context may be used to distinguish between the different meanings.

-laola L verb (i) rule; (ii) divine
-rola H verb (i) to forge metal, to hammer; (ii) to undress
mohlwa [ _ ¯ ] (i) termite(s); (ii) a lawn, lawn grass (of the graminaceae family)

There are instances of words being changed either through inflexion or derivation and as a result ending up sounding exactly like other words.

hlolo [ _ _ ] (i) hare, (ii) creation (from the L verb -hlola)

Grammatical tone

It regularly occurs that two otherwise similar sounding phrases may have two very different meanings mainly due to a difference in tone of one or more words or concords.

Ke ngwana wa hao [ _ _ ¯ ¯ ¯ _ ] I am your child
Ke ngwana wa hao [ ¯ _ ¯ ¯ ¯ _ ] He/she/it is your child
O mobe [ _ _ ¯ ] You are ugly
O mobe [ ¯ _ ¯ ] He/she is ugly
Ke batlana le bona [ _ _ _ _ ¯ _ _ ] I am looking for them (present indicative mood)
Ke batlana le bona [ ¯ ¯ _ _ ¯ _ _ ] As I was looking for them (participial sub-mood i.e. this is not a complete sentence but part of a longer sentence)

Note that when grammatical tone is used the tone of the significant word may influence the relative pitch of the rest of the phrase, although the tones of other words tend to remain intact.


Downdrift, where the absolute pitch (not the tones) of the speaker's voice gradually decreases as the sentence continues (often resulting in initial low tones being pronounced at a higher pitch than final high tones), is a feature of natural speech. Basically, a high tone immediately following a low tone is pronounced at a slightly lower frequency than the previous high tone.
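
The effect can be pictured with a toy numerical model. The Python sketch below is an illustration only: the pitch values are in arbitrary units, the parameters `top`, `interval`, and `step` are invented for the example, and the assumption that the register drops by a fixed amount at every low-to-high transition is a simplification, not a measured property of Sesotho.

```python
from typing import List

def downdrift(tones: List[str], top: float = 100.0,
              interval: float = 30.0, step: float = 8.0) -> List[float]:
    """Return one pitch value per tone, in arbitrary units.

    Assumption: the register drops by `step` each time an H follows an L,
    and an L always sits `interval` below the current register.
    """
    pitches = []
    register = top
    previous = None
    for tone in tones:
        if tone == "H" and previous == "L":
            register -= step  # each L-to-H transition lowers the ceiling
        pitches.append(register if tone == "H" else register - interval)
        previous = tone
    return pitches

contour = downdrift(["L", "H", "L", "H", "L", "H", "L", "H", "L", "H"])
print(contour[0], contour[-1])  # 70.0 60.0
```

With these made-up settings the initial low tone (70.0) ends up higher than the final high tone (60.0), which is the situation described above.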

Additionally, a slightly more dramatic lowering of pitch (a downstep) may occur between certain syllables. In Sesotho, the downstep (indicated with a !) naturally occurs between words (being less noticeable if the first word has no low tones), though there is at least one instance (in rule 1 of the plain copulative) where the lack of downstep (as well as other tonal factors) changes the utterance's meaning. In the following example, a grave accent (à) indicates a low tone and an acute accent (á) indicates a high tone.

Kè bàtlà
         !ó ló mò shébà
                              !ó ìlé

I need you to go look where she has gone to

This downdrift is greatly attenuated when the sentence is a question not using the interrogative adverb na?.

Verb tone

Sesotho verb stems fall into two categories: H stems and L stems. The difference lies in the "underlying tone" of the stem's first syllable (or the stem's "basic tone") being either high or null. When used with an object in the indicative remote future tense (the simple -tla- tense) the verb's stem is monotonous (all syllables high toned or all low toned) with the underlying tone of the first syllable spread to all the following syllables.

Nouns derived from the verb stem are fossilised with the tones of the simple class 15 infinitive as appears in medial positions without a subject or object. The procedure for creating this tonal pattern is intricate and involves several tonal rules.

These factors may also apply in normal verbal conjugations. Adding a verbal suffix (through derivation, not inflexion) creates a new verb stem which falls in the same tone category as the original, and is subject to the same rules.

-paqama (L verb stem) lie (face downwards) ⇒ ho paqama [ _ _ _ _ ] to lie ⇒ ho paqamisa [ _ _ _ _ _ ] to cause to lie ⇒ ho paqamisuwa [ _ _ _ _ _ _ ] to be caused to lie, etc.
-ahlola (H verb stem) judge ⇒ ho ahlola [ _ ¯ ¯ _ ] to judge ⇒ kahlolo [ ¯ ¯ _ ] judgement, moahlodi [ _ ¯ ¯ _ ] judge, boahlodi [ _ ¯ ¯ _ ] state of being a judge

The tones of the noun prefixes of nouns derived from verbs are independent of the tones of the stem.

Some nouns derived from verbs have idiomatic tonal patterns independent of the original verb stem's tones.

-loka (L verb stem) be sufficient, okay ⇒ -lokela be sufficient for ⇒ tokelo human right (irregular tone [ _ ¯ _ ] instead of the expected [ _ _ _ ])

Several "tonal melodies" may be assigned to certain verbal conjugations based on the desired tense, aspect, and mood (for example, with many verb conjugations the only difference between the indicative mood and the participial sub-mood is one of tone). These are applied before most other rules and may be indicated by a code including the symbols H (high tone), L (low tone), B (verb stem's basic tone), and * (iteratively applying the preceding tone).
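
One possible reading of such a code can be sketched in Python. The function `expand_melody` is hypothetical, and two assumptions are made that the text does not state: a code contains at most one starred symbol, and `X*` stands for zero or more copies of X, as many as are needed to fill the remaining syllables. The result is the melody as assigned, before any other rule applies, so it need not match the surface pattern.

```python
from typing import List

def expand_melody(code: str, length: int, basic: str) -> List[str]:
    """Expand a melody code over `length` syllables.

    Assumptions: at most one starred symbol, and 'X*' stands for zero or
    more copies of X, enough to fill the remaining syllables.
    """
    tokens = []  # pairs of (tone, is_starred)
    for char in code:
        if char == "*":
            tone, _ = tokens[-1]
            tokens[-1] = (tone, True)  # mark the preceding tone as repeating
        else:
            # B is replaced by the verb stem's basic tone.
            tokens.append((basic if char == "B" else char, False))
    fixed = sum(1 for _, starred in tokens if not starred)
    fill = length - fixed
    has_star = any(starred for _, starred in tokens)
    if fill < 0 or (fill > 0 and not has_star):
        raise ValueError("melody does not fit the number of syllables")
    result = []
    for tone, starred in tokens:
        result.extend([tone] * (fill if starred else 1))
    return result

print(expand_melody("HL*H", 5, "H"))  # ['H', 'L', 'L', 'L', 'H']
```

Because B is resolved from the `basic` argument, the same code can yield different melodies for H stems and L stems.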

For example, applying the (present) "Subjunctive Melody" (HL*H) to the H verb stem -bona (see) and the L verb stem -sheba (look for/at) results in both Ke shebe tau (So I may look at the lion) and Ke bone tau (So I may see the lion) being pronounced with exactly the same tone pattern [ ¯ ¯ ¯ _ ¯ ].

Another way to designate the melodies is to use a standard template of the tense in question and indicate the melody by assigning tones to specific syllables in the resultant word (for example, the final syllable, the subjectival concord, etc.). So for the above example the Subjunctive Melody (actually, present-future subjunctive) may be specified by putting H tones on the first syllable (the subjectival concord's basic tone is ignored), the second syllable, and the final syllable of the word, and putting an explicit L tone on the fourth syllable (unless the verb is disyllabic, in which case the fourth syllable is the final syllable and has an H tone), thus preventing HTD.

Tonal rules

Sesotho is a grammatical tone language; this means that words may be pronounced with varying tonal patterns depending on their particular function in a sentence. Another interpretation is that the tones of the language interact in their own intricate "tonal grammar."

In order to create certain grammatical constructs, certain tonal rules may be used to modify the underlying tones of the word to create their surface tones. The words are then spoken using the surface tones.

This system is naturally somewhat complex. Indeed, the development of autosegmental phonology was largely motivated by the need for a satisfactory theoretical framework to deal with the tonal grammars of Niger–Congo languages. This article attempts to explain certain aspects of Sesotho tonology in a rule-based autosegmental framework.

The rules presented below are almost exclusively used in constructing the verbal complex as this is the part of speech most radically affected by the tonal grammar.

About autosegmental phonology

Autosegmental phonology was motivated by the need to represent properties which seem to span several "segments" (in our case, syllables) and seem to be somewhat independent of them. Underlyingly (that is, in the speaker's lexicon), some, but not necessarily all, of the segments of morphemes are associated with one or more properties. The segments are on one "tier" and their properties are on another, and the relationships between the two are indicated by joining them with association lines as follows:


An H verb ("see-able")

Each of the rules changes the associations in some way. For example, High Tone Doubling (HTD) causes the underlying H tone on the first syllable of the verb to also be linked to the syllable immediately to the right:


After HTD

In this article, the application of several rules in succession will be indicated with the following abbreviation:

  ├─┘      HTD

Two rules

The fact that the line emanating from the second syllable is only linked on the HTD line means that this is the first time that syllable is associated with that property.


One popular classification of tonal Bantu languages broadly separates them into two groups: shifting languages and spreading languages. The Sotho–Tswana languages are bounded spreading languages as they have primitive rules which directly cause underlying high tones to be associated with (spread to) syllables to the right. The closely related Nguni languages, on the other hand, are unbounded shifting languages as they have primitive rules which directly cause underlying high tones to be moved (shifted to) syllables to the right. The following table presents an informal comparison between the tonal processes found in Sesotho and isiZulu:

Sesotho and isiZulu tonal effects
  Bounded Unbounded

In the table, a process is unbounded if there is no set limit on the number of syllables over which it may occur. Sesotho has basic bounded spread (High Tone Doubling) and isiZulu has basic unbounded shift. Bounded shift in Sesotho occurs as the cumulative effect of bounded right tone spread (High Tone Doubling) and Left Branch Delinking, while various forms of spreading may occur in isiZulu if the word is very short or has two or more underlying highs.

Some tonal rules

In dealing with verbs, the following rules may be applied at various times:

  • High Tone Doubling (HTD) causes the H tone found on the first syllable of the verb stem, or on an H toned subjectival concord (whether it is used as part of a verb or a copulative), to be spread to (associated with) the syllable immediately to the right. For example ("They see" with no direct object; the bullets • are used here to join the parts of single words which would have been written separately in the current disjunctive orthography):

     │    │
     H    H


     ├─┘  ├─┘
     H    H

  • Iterative Tone Spread (ITS) causes the H tone found on the first syllable of the verb stem to be spread repeatedly to the right until the end of the verb complex. This rule is only applied in certain situations (such as when forming the perfect). For example ("I have bought for..." with two direct objects):




  • Right Branch Delinking (RBD) is an application of the obligatory contour principle which causes an H tone spread from a subjectival concord to a verbal auxiliary infix or objectival concord immediately to the left of the verb stem to be removed (delinked) if the verb stem is an H stem. For example ("They see"):

     ├─┘  ├─┘
     H    H


     │    ├─┘
     H    H

  • Left Branch Delinking (LBD) is an application of the "obligatory" contour principle which causes the H tone on the first syllable of an H verb stem to be delinked if the stem immediately follows an H toned subjectival concord, resulting in tonal pattern (HøH). Interestingly, this rule is idiolectal and is not applied by all Sesotho speakers.[5] For example ("They see..." when used with a direct object):

     │  ─┘
     H  H


     │  ┌─┘
     H  H

  • The Finality Restriction (FR) causes any H tones spread to the final syllable of the verb complex to be removed. This rule is not applied under all circumstances, and is never applied if the verb's stem is monosyllabic (that is, it never delinks the H tone on the verb stem's first syllable). It is also never applied when the verb is immediately followed by a direct object (therefore it does not undo ITS, or the high tone copied to a disyllabic H verb's last syllable if it is immediately followed by an object).[6] For example ("I love" with no direct object):




  • Low Tone Assignment (LTA) is the very last rule applied and is always applied in all circumstances (not just when dealing with verbs). It simply assigns an L tone to all unlinked segments (that is, segments with null tone). For example ("She is looking on behalf of" with two direct objects):



    ├──┘  │ │
    H     L L
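
Three of the rules above (HTD, FR, and LTA) can be sketched as operations on association lines. The Python below is a minimal illustration under stated assumptions, not a complete model: each syllable stores the id of the H tone it is linked to (or `None` for null tone), HTD is assumed to spread only onto a toneless syllable and is applied to every H rather than only to stem-initial and subjectival concord tones, and the conditions under which FR is suspended are ignored. The names `Links`, `htd`, `fr`, `lta`, and `surface` are hypothetical.

```python
from typing import List, Optional

# Per syllable: id of the H tone it is linked to, or None for null tone.
Links = List[Optional[int]]

def htd(links: Links) -> Links:
    """High Tone Doubling: spread each H one syllable to the right."""
    out = list(links)
    for i in range(len(links) - 1):
        # Assumption: only a toneless syllable can receive the spread.
        if links[i] is not None and links[i + 1] is None:
            out[i + 1] = links[i]
    return out

def fr(links: Links) -> Links:
    """Finality Restriction: delink an H spread onto the final syllable."""
    out = list(links)
    # A spread H shares its id with the syllable to its left.
    if len(out) >= 2 and out[-1] is not None and out[-1] == out[-2]:
        out[-1] = None
    return out

def lta(links: Links) -> List[str]:
    """Low Tone Assignment: every syllable still unlinked surfaces as L."""
    return ["L" if link is None else "H" for link in links]

def surface(links: Links) -> List[str]:
    return lta(fr(htd(links)))

print(surface([None, 0, None]))        # ho aka "to kiss": ['L', 'H', 'L']
print(surface([None, 0, None, None]))  # ho ahlola "to judge": ['L', 'H', 'H', 'L']
```

Because `fr` only removes an H that shares its id with the preceding syllable, an underlying H on the final syllable is left alone, in keeping with the description of the rule.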

Some examples

To construct many verb forms, including many positive indicative tenses without direct objects as well as infinitives, the following rules are applied in order:

  1. Underlying level (lexical tones)
    1. Verb roots (including melodies)
  2. Lexical level (rule assigned tones)
    1. Subject concords
    2. HTD
      1. Verb roots
      2. Subject concords
    3. OCP
      1. RBD
      2. LBD
  3. Postlexical level
    1. FR
    2. LTA

Note that the three main levels are always applied in this order, though the actual rules contained in the levels will change depending on the parts of speech, verb moods, etc. For the word O a bina ("She is singing") the application of the rules is as follows:

│ │  │ │  underlyingly (H stem)
│ │  H 
│ │  │   subjectival concord
─┤  ├─┤  HTD
  H │
│   ├─┤  RBD
│   │   FR
│   │   LTA
H L  H L

Constructing a word

The word appears on the surface with tonal pattern [ ¯ _ ¯ _ ].

Furthermore, the second last syllable of the word is lengthened (or "stressed"), and the interaction of the tones as well as the penultimate lengthening results in the word being pronounced with pitch levels { ¯ _ _ }.

Extending the word by one syllable (O a bintsha "She is conducting"):

│ │  ││   │  underlyingly (H stem)
│ │  H│   │
│ │  ││   │  subjectival concord
  H│   │
─┤  ├┤   │  HTD
│   ├┤     RBD
│   ├┘     LTA
H L  H    L

Constructing a word

The word appears on the surface with tonal pattern [ ¯ _ ¯ ¯ _ ] (the high beneath the third syllable is associated with two syllables).

The second last syllable of the word is lengthened and the interaction of the tones as well as the penultimate lengthening results in the word being pronounced with pitch levels { ¯ _ ¯ _ }.
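
The two derivations above can be replayed step by step. The following Python sketch assumes an H verb stem and an H toned subjectival concord on the first syllable, omits LBD (which applies to other forms), and records the tone tier after each stage. The function `derive` and its parameters are hypothetical names used for illustration, and the treatment of a spread tone as one that shares its id with the syllable to its left is a modelling choice.

```python
from typing import List, Optional, Tuple

Links = List[Optional[int]]  # per syllable: id of the linked H, or None

def show(links: Links) -> str:
    """Render linked syllables as H and toneless ones as ø."""
    return " ".join("ø" if link is None else "H" for link in links)

def derive(n_syllables: int, stem_start: int) -> List[Tuple[str, str]]:
    """Derive an H-stem verb with an H-toned subject concord on syllable 0."""
    stages = []
    links: Links = [None] * n_syllables
    links[stem_start] = 1                      # underlying H of the stem
    stages.append(("underlying", show(links)))
    links[0] = 0                               # H of the subject concord
    stages.append(("subject concord", show(links)))
    before = list(links)                       # HTD: one step to the right
    for i in range(n_syllables - 1):
        if before[i] is not None and before[i + 1] is None:
            links[i + 1] = before[i]
    stages.append(("HTD", show(links)))
    pre = stem_start - 1                       # RBD: syllable left of the stem
    if pre >= 1 and links[pre] is not None and links[pre] == links[pre - 1]:
        links[pre] = None
    stages.append(("RBD", show(links)))
    if links[-1] is not None and links[-1] == links[-2]:
        links[-1] = None                       # FR: drop H spread to the end
    stages.append(("FR", show(links)))
    surface = " ".join("L" if link is None else "H" for link in links)
    stages.append(("LTA", surface))
    return stages

for name, tier in derive(4, 2):   # O a bina: o, a, bi, na
    print(f"{name:16}{tier}")
print(derive(5, 2)[-1][1])        # O a bintsha: H L H H L
```

For the four-syllable word the final stage is H L H L, and for the five-syllable word it is H L H H L, matching the surface patterns given above.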


  1. That is, the tone-bearing unit (TBU) is basically the syllable. In general, to include languages with long vowels, one may say that the TBU of Bantu languages is the mora, and indeed, when dealing with stressed syllables, many descriptions of Sesotho tonology treat the TBU as the mora (that is, a long stressed syllable is analysed as two moras with different tones), but this is really unnecessary.
  2. One could just as easily say that there are three underlying tone types, namely high (H [ ¯ ]), low (L [ _ ]), and null (ø), and indeed many authors and researchers do. The truth is revealed by noting that all tonal rules work by only manipulating high tones, thus each syllable may be either attached to a high tone (H), or not attached at all (ø). A three tone model would at least require a rule that works exclusively on the L tones.
  3. Doke & Vilakazi cite nine pitch levels (not counting contours) for isiZulu, while admitting that they may have overlooked some factors which could have superficially increased the actual number. Subsequent work on isiZulu tonology and depressor (breathy voiced) consonants suggests that the language, like Sesotho, may be fully described with only two or three basic tonemes.
  4. The number may increase or decrease depending on how one counts, but there are only two contrastive tonemes in the language. The enumeration may be further complicated by considering the effects of downdrift and downstep.
  5. There are numerous examples in rule-based linguistic models (such as autosegmental phonology) where the OCP is broken or only applied under some circumstances. For example, the fact that HTD causes the first two syllables of an H verb stem to be high is yet another "violation" of the OCP. Some Bantu languages also have a "Plateau rule" which changes tone pattern (HøH) to (HHH), a process which actively creates a sequence that "violates" the OCP.
  6. In a nutshell (under syntactic and/or Optimal Domains theory) the finality restriction prevents a high tone from being spread to the last syllable of the "Prosodic phrase" (though an underlying phrase-final high tone will be left alone). See Sesotho deficient verbs for a fuller explanation.


