= Comorian languages =

Comorian
- Nativename: shikomori
- States: Comoros and Mayotte
- Region: Throughout Comoros and Mayotte; also in Madagascar and Réunion
- Ethnicity: Comorians
- Speakers: in Comoros
- Date: 2011
- Speakers2: in Mayotte (2007)
- Script: Arabic, Latin
- Familycolor: Niger-Congo
- Fam2: Atlantic–Congo
- Fam3: Volta-Congo
- Fam4: Benue–Congo
- Fam5: Bantoid
- Fam6: Southern Bantoid
- Fam7: Bantu
- Fam8: Northeast Coast Bantu
- Fam9: Sabaki
- Dia1: Maore
- Lc1: zdj
- Ld1: Ngazidja dialect
- Lc2: wni
- Ld2: Ndzwani (Anjouani) dialect
- Lc3: swb
- Ld3: Maore dialect
- Lc4: wlc
- Ld4: Mwali dialect
- Guthrie: G.44
- Glotto: como1260
- Glottorefname: Comorian Bantu

Comorian (Shikomori, or Shimasiwa, the "language of the islands") is the name given to a group of four Bantu languages spoken in the Comoro Islands, an archipelago in the southwestern Indian Ocean between Mozambique and Madagascar. The Comorian constitution names it as one of the official languages of the Union of the Comoros. Shimaore, one of the languages, is spoken on the disputed island of Mayotte, a French department claimed by the Comoros.

Like Swahili, the Comorian languages are Sabaki languages, part of the Bantu language family. Each island has its own language, and the four are conventionally divided into two groups: the eastern group is composed of Shindzuani (spoken on Ndzuani) and Shimaore (Mayotte), while the western group is composed of Shimwali (Mwali) and Shingazija (Ngazidja). Languages from different groups are usually not mutually intelligible, sharing only about 80% of their lexicon, but the languages within each group are mutually intelligible. This suggests that Shikomori should be considered two languages, each comprising two dialects, rather than four distinct languages.

Historically, the language was written in the Arabic-based Ajami script. The French colonial administration introduced the Latin script. In 2009 the current independent government decreed a modified version of the Latin script for official use. Many Comorians now use the Latin script when writing Comorian, although the Ajami script is still widely used, especially by women. Recently, some scholars have suggested that the language may be moving toward endangerment, citing unstable code-switching and the numerous French words used in daily speech.

It is the language of Umodja wa Masiwa, the national anthem.

== History and classification ==
The first Bantu speakers arrived in the Comoros sometime between the 5th and 10th centuries, before the arrival of the Shirazi Arabs.

=== Shimwali ===
The Shimwali dialect was possibly one of the earliest Bantu languages to be recorded by a European. On July 3, 1613, Walter Payton claimed to have recorded 14 words on the island of Moheli, stating "They speak a kind of Morisco language." Sir Thomas Roe and Thomas Herbert also claimed to have recorded vocabulary.

Until the 1970s, Shimwali was considered a dialect or archaic form of Swahili. This view was first proposed in 1871, when Kersten suggested that it might be a mixture of Shingazija, Swahili, and Malagasy. In 1919 Johnston, referring to it as "Komoro Islands Swahili: the dialect of Mohila" and "the Mohella language", suggested that, taken together with the other two dialects in the Comoros, it might be an ancient and corrupt form of Swahili. However, Ottenheimer et al. (1976) found that this was not the case. Instead, they classified Shimwali and the other Comorian languages as a language group separate from Swahili.

=== Shinzwani ===
Shinzwani was first noted by the South African missionary Reverend William Elliott in 1821 and 1822. During a 13-month mission on the island of Anjouan, he compiled a vocabulary and grammar of the language, including a 900-word vocabulary and 98 sample sentences in Shinzwani. He does not appear to have recognized noun classes (of which Shinzwani has at least six). Nor does he appear to have considered Shinzwani a Bantu language, making only a superficial connection to Swahili.

The dialect was noted again in 1841 by Casalis, who placed it within Bantu, and by Peters, who collected a short word list. In 1875 Hildebrandt published a Shinzwani vocabulary and suggested in 1876 that Shinzwani was an older form of Swahili.

The idea that Shingazija and Shinzwani are distinct from Swahili finally gained prominence in the late 19th and early 20th centuries. In 1883, an analysis by Gust distinguished Shinzwani from Swahili. He discussed Shinzwani and Swahili as two separate languages that had contributed to the port language he called Barracoon.

In 1909 two publications reaffirmed and clarified the distinctness of Shinzwani, Shingazija, and Swahili. Struck published a word list that appears to have been recorded by a Frenchman on Anjouan in 1856. He identified the words as Shinzwani and noted some Swahili influence.

In his Swahili Grammar, Sacleux cautioned that although Swahili was spoken in the Comoros it must not be confused with the native languages of the Comoros, Shinzwani and Shingazija. He said that while Swahili was mostly spoken in cities, the Comorian languages were widely spoken in the countryside.

=== Shingazija ===
Shingazija was not documented until 1869, when Bishop Edward Steere collected a word list and commented that he did not know which language family it belonged to. In 1870 Gevrey characterized both Shingazija and Shinzwani as the 'Souaheli des Comores' (Swahili of the Comoros), merely a 'patois de celui de Zanzibar' (a patois of the Swahili of Zanzibar). However, Kersten noted in 1871 that Shingazija was not at all like Swahili but was a separate Bantu language.

Torrend was the first to identify the difference between Shingazija and Shinzwani in 1891. He attempted to account for Shingazija by suggesting that it was a mixture of Shinzwani and Swahili.

==Phonology==
The consonants and vowels in the Comorian languages:
=== Vowels ===
**Vowels**

| | Front | Central | Back |
| Close | i | | u |
| Mid | e | | o |
| Open | | a | |

=== Consonants ===
**Consonants**

| | Bilabial | Labiodental | Dental/Alveolar | Palatal | Retroflex | Velar | Glottal | |
| Nasal | | | | | | | | |
| Plosive/Affricate | | | | | | | | |
| | | ~ | | ~ | | | | |
| | ~ | | ~ | | | | | |
| Fricative | | | | | | | | |
| Approximant | | | | | | | | |
| Trill | | | | | | | | |

The consonants mb, nd, b, and d are phonemically implosives. Phonetically, however, they range from implosives to voiced stops: [ᵐɓ~ᵐb], [ⁿɗ~ⁿd], [ɓ~b], and [ɗ~d]. A glottal stop [ʔ] may also be heard between vowels.

In Shimaore, adding a prefix can leave a stem-initial consonant between vowels. When this happens, /p/ becomes /β/, /ɗ/ becomes /l/, /ʈ/ becomes /r/, /k/ becomes /h/, and /ɓ/ is deleted.
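The Shimaore mutation rule above can be sketched as a lookup applied where a prefix meets a stem. This is a minimal, hypothetical illustration. It assumes each segment is a single IPA character and that "intervocalic" means the prefix ends in a vowel and the stem's second segment is a vowel. The function name `attach_prefix` and the inputs are schematic, not attested Comorian words.

```python
# Hypothetical sketch of Shimaore intervocalic consonant mutation:
# p -> β, ɗ -> l, ʈ -> r, k -> h, and ɓ is deleted.
# Inputs are schematic IPA strings, not attested Comorian words.

VOWELS = set("aeiou")

# Stem-initial consonant -> its intervocalic reflex ("" means deletion).
MUTATION = {"p": "β", "ɗ": "l", "ʈ": "r", "k": "h", "ɓ": ""}


def attach_prefix(prefix: str, stem: str) -> str:
    """Join prefix + stem, mutating the stem-initial consonant
    when the join places it between two vowels."""
    if not prefix or len(stem) < 2:
        return prefix + stem
    first = stem[0]
    intervocalic = prefix[-1] in VOWELS and stem[1] in VOWELS
    if intervocalic and first in MUTATION:
        stem = MUTATION[first] + stem[1:]
    return prefix + stem


if __name__ == "__main__":
    print(attach_prefix("a", "ka"))  # k between vowels -> "aha"
    print(attach_prefix("m", "pa"))  # prefix ends in a consonant -> "mpa"
```

The sketch only models the five alternations listed in the text. Deleting /ɓ/ can leave two vowels adjacent. The code does not try to repair that sequence.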

There is a preference for polysyllabic words and a CV syllable structure. Vowels are frequently deleted or inserted to better fit the CV structure. An alternative strategy is h-insertion in contexts that would otherwise produce a VV sequence.
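The h-insertion strategy can be illustrated with a small function that joins morphemes and inserts h wherever a vowel would directly follow another vowel. This is a hypothetical sketch. It assumes the repair applies only at morpheme boundaries. It does not model the other strategy of deleting or inserting vowels. The name `join_with_h` and the inputs are illustrative placeholders.

```python
# Hypothetical sketch of h-insertion: when joining morphemes would create
# a vowel-vowel (VV) sequence, an "h" is inserted to keep a CV shape.
# Vowel deletion/insertion, the other repair strategy, is not modelled.

VOWELS = set("aeiou")


def join_with_h(*morphemes: str) -> str:
    """Concatenate morphemes, inserting 'h' at any boundary where a
    vowel would otherwise directly follow another vowel."""
    result = ""
    for morpheme in morphemes:
        if result and morpheme and result[-1] in VOWELS and morpheme[0] in VOWELS:
            result += "h"
        result += morpheme
    return result


if __name__ == "__main__":
    # Schematic, unattested inputs used only to show the rule.
    print(join_with_h("a", "i"))    # VV at the boundary -> "ahi"
    print(join_with_h("ma", "ku"))  # no VV sequence -> "maku"
```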

There is a strong preference for penultimate stress. There was previously a tone system in the language, but it has been mostly phased out and no longer plays an active role in the majority of cases.

== Orthography ==

Today Comorian is most commonly written in the Latin alphabet. The Arabic alphabet, the traditional script, is also used, but to a lesser extent. The Arabic alphabet has long been almost universally known in the Comoros because nearly everyone attended Quranic schools on the islands, whereas knowledge of French and literacy in it were limited. Since independence from France the situation has changed. Secular education, in which French is the language of instruction, has been expanded and improved.

=== Latin alphabet ===
| Comorian Latin alphabet | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Upper Case | A | Ɓ | B | C | Ɗ | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z |
| Lower Case | a | ɓ | b | c | ɗ | d | e | f | g | h | i | j | k | l | m | n | o | p | r | s | t | u | v | w | y | z |
| IPA | | | | | | | | | | | | | | | | | | | | | | | | | | |

**List of digraphs in Comorian**

| Digraphs | dh | dj | dr | dz | gh | ny | sh | pv | th | tr | ts |
| IPA | | | | | | | | | | | |

Note: In Shimaore, the digraphs "vh" and "bv" are used to represent the phoneme .

The 20th century marked the start of a process of orthographic reform and standardization across the Muslim world. In most places this meant standardizing, unifying, and clarifying the Arabic script. In others, such as Soviet Turkistan and the Soviet Caucasus, the Arabic script was abandoned in favour of Latin or Cyrillic. The process extended from these regions to Turkey and Kurdistan, Indonesia and Malaysia, the East African coast (Swahili Ajami), and the Comoros.

In the Comoros, the task of standardizing and improving the Arabic-based orthography was taken up in 1960 by the writer Said Kamar-Eddine (1890-1974). Only two decades earlier, in the 1930s and 1940s, Swahili scholars such as Sheikh el Amin and Sheikh Yahya Ali Omar had likewise developed the Swahili Arabic alphabet.

In Swahili, two new diacritics were added to the three original ones to represent the phonemes /e/ and /o/. In addition, the three matres lectionis (vowel carrier letters) were used according to a fixed convention. The vowel in the stressed (penultimate) syllable of a word is marked with both a diacritic and a carrier letter. The carrier is alif for /a/, yāʼ for /e/ and /i/, and wāw for /o/ and /u/.

Said Kamar-Eddine's proposal for Comorian, however, departed from the Ajami tradition and from the approach of the Swahili scholars. Kamar-Eddine looked instead to Iraqi and Iranian Kurdistan and the orthographic reforms implemented there. The Kurdish reforms eliminated all diacritics and assigned a dedicated letter to every vowel sound, creating a full alphabet. Kurdish orthography was not unique in this regard. Similar reforms were pursued for Turkic languages such as Uzbek, Azerbaijani, Uyghur, and Kazakh. They were also pursued for languages of the Caucasus, such as the Western and Eastern Circassian languages and Chechen. This makes Said Kamar-Eddine's orthography for Comorian a unique case among sub-Saharan African languages written in the Arabic script.

In initial position, vowels are written as a single letter, with no preceding alif or hamza. This is similar to the convention of the Kazakh Arabic alphabet.

**Vowels in Comorian**

| | Final | Medial | Initial | Isolated |
| a | | | | |
| u | | | | |
| i | | | | |
| o | | | | |
| e | | | | |

In Kurdish, new vowel letters were created by adding marks to existing letters, including the letters for the phonemes /o/ and /e/. In Comorian, independent letters were assigned instead. The letter hāʾ, in two of its variants, is used for these two phonemes. The standard Arabic hāʾ, in all four of its positional shapes, is used for the vowel /o/. This use is exclusive to this orthography: hāʾ in these shapes is not used as a vowel letter in any other Arabic-based orthography. A variant of hāʾ in a fixed medial zigzag shape is used for the vowel /e/. This is the medial form of what is known in Urdu as gol he. Its use as a vowel letter is not unique to Comorian. In the early 20th century, the West and East Circassian Arabic orthographies also used this variant of hāʾ to represent the vowel /ə/ (written as ы in Cyrillic).

Letters for consonant phonemes that do not exist in Arabic were formed in one of two ways. The first method, as in Persian and Kurdish, creates new letters by adding or modifying dots. The second places the Arabic gemination diacritic shaddah on the letter closest to the missing consonant phoneme. This follows the tradition of the Sorabe (Arabo-Malagasy) orthography. In Sorabe, a geminated r represents /nd/ or /ndr/, and a geminated f represents /p/ or /mp/.

**Kamar-Eddine's Comorian Arabic Alphabet**

| Arabic (Latin) [IPA] | ‌( A a ) | (B b / Ɓ ɓ) / | (P p) | (T t) | (Tr tr) | (Th th) |
| Arabic (Latin) [IPA] | (J j / Dj dj) | (H h) | (D d / Ɗ ɗ) / | (Dh dh) | (R r / Dr dr) / | (Z z) |
| Arabic (Latin) [IPA] | (Dz dz) | (S s) | (Ts ts) | (Sh sh) | (C c) | (G g / Gh gh) / |
| Arabic (Latin) [IPA] | (F f) | (Pv pv) | (V v) | (K k) | (L l) | (M m) |
| Arabic (Latin) [IPA] | (N n) | (Ny ny) | (O o) | (E e) | (U u / W w) / | (I i / Y y) / |
| Arabic (Latin) [IPA] | ( - ) | | | | | |

Comorian has two types of vowel sequence: a vowel followed by a glide, and a vowel hiatus. The Latin letters w and y are treated as semivowels. When they follow another vowel, their Arabic counterparts are simply written in sequence.

All other vowel sequences are treated as hiatus, and a hamza is written between the two vowels.
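The hiatus rule can be sketched on Latin spellings. A hamza (ء) is marked between two adjacent vowels, and nothing is marked when a vowel is followed by the semivowels w or y. This is a hypothetical illustration, not Kamar-Eddine's own procedure. It assumes lowercase Latin input and only shows where the hamza would fall. It does not convert the rest of the word into Arabic letters. The name `mark_hiatus` is a placeholder. Applied to some words from the Latin sample text, the sketch predicts a hamza in "zahao" and "uhuria" and none in "duniya".

```python
# Hypothetical sketch: mark where a hamza would be written in a
# vowel hiatus. w and y count as semivowels (not in VOWELS), so a
# vowel + w/y sequence gets no hamza. Input is lowercase Latin script;
# the output only shows hamza placement, it is not a transliteration.

VOWELS = set("aeiou")
HAMZA = "\u0621"  # ARABIC LETTER HAMZA (ء)


def mark_hiatus(word: str) -> str:
    """Insert a hamza between any two adjacent vowel letters."""
    out = []
    for i, ch in enumerate(word):
        if i > 0 and ch in VOWELS and word[i - 1] in VOWELS:
            out.append(HAMZA)
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for w in ("zahao", "uhuria", "duniya"):
        print(w, "->", mark_hiatus(w))
```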

Prenasalized consonants are written as digraphs beginning with either m or n.

===Sample text===

Comorian Latin Alphabet:
- Ha mwakinisho ukaya ho ukubali ye sheo shaho wo ubinadamu piya pvamwedja ne ze haki za wadjibu zaho usawa, zahao, uwo ndo mshindzi waho uhuria, no mlidzanyiso haki, ne amani yahe duniya kamili.

Comorian Arabic (Kamar-Eddine's) Alphabet:

== Grammar ==

=== Noun class ===
As in other Bantu languages, Shikomori has a noun class (gender) system in which the nouns of each class share a prefix. Classes 1 through 10 generally form singular/plural pairs.
| Class | Prefix | Class | Prefix |
| 1 | m(u)-, mw- | 2 | wa- |
| 3 | m(u)-, mw- | 4 | m(i)- |
| 5 | Ø- | 6 | ma- |
| 7 | shi- | 8 | zi- |
| 9 | Ø- | 10 | Ø- |
| 10a | nyi- | 11 | u- |
Classes 9 and 10 consist mainly of borrowed words, such as dipe (from French du pain, 'some bread'), and do not take prefixes. Classes 7 and 8 and classes 9 and 10 take the same agreement on adjectives and verbs. Class 10a contains very few words, generally plurals of class 11 nouns. Class 15 consists of verbal infinitives, much like English gerunds.

Class 16 contains only two words, vahana and vahali, both meaning 'place'. It was probably borrowed from Swahili pahali, which was borrowed from Arabic mahal. Class 17 consists of locatives with the prefix ha-, and Class 18 consists of locatives with the prefix mwa-.

=== Numerals ===
Numerals in Comorian follow the noun. If the number is 1 through 5 or 8, it must agree with the class of its noun.
**Numerals**

| Number | Comorian | Num. | Comorian |
| 1 | oja/muntsi | 6 | sita |
| 2 | ili/mbili | 7 | saba |
| 3 | raru/ndraru | 8 | nane |
| 4 | nne | 9 | shendra |
| 5 | tsano/ntsanu | 10 | kumi/kume |

=== Demonstratives ===
There are three demonstratives: one for a proximate object, one for a non-proximate object, and one for an object previously mentioned in the conversation.

=== Possessives ===
The possessive element -a agrees with the possessed noun. It is preceded by an agreement consonant (C) that depends on the noun's class. The general order of a possessive construction is possessed-Ca-possessor.

=== Verbs ===
Comorian languages exhibit a typical Bantu verb structure.
**Comorian Verb Structure**

| Slot | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Content | Verbal preprefix (pre) | Subject Marker (SM) | Tense-Aspect-Mood | Object Marker (OM) | Root | Extension | Final Vowel | Suffix |
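The slot order in the table can be expressed as a small template that assembles a hyphen-separated morpheme breakdown. This is a hypothetical sketch. The slot names follow the table, and the function `build_verb` is illustrative. The subject marker ni- (1sg) and the future prefix tso- come from this section. `ROOT` is a placeholder for an actual verb root, and no real Comorian verb form is claimed.

```python
# Hypothetical sketch of the eight-slot verb template from the table.
# It joins whichever slots are filled, in template order, with hyphens.

SLOTS = ("pre", "SM", "TAM", "OM", "root", "ext", "FV", "suffix")


def build_verb(**morphemes: str) -> str:
    """Return the filled slots in template order, joined with hyphens.
    Missing or empty slots are skipped; unknown slot names are an error."""
    unknown = set(morphemes) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slot(s): {sorted(unknown)}")
    return "-".join(morphemes[slot] for slot in SLOTS if morphemes.get(slot))


if __name__ == "__main__":
    # Keyword order does not matter; the template fixes the linear order.
    print(build_verb(root="ROOT", TAM="tso", SM="ni"))  # ni-tso-ROOT
```

Keeping the slot order in a single tuple means the linear order is stated once. The caller only has to say which morpheme fills which slot.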
Singular persons have two sets of subject markers. Plural persons and subjects belonging to classes 3-18 have only one form.
**Subject Pronouns**

| | Set 1 | Set 2 |
| 1sg | ni- | tsi- |
| 2sg | u- | hu-/u- |
| 3sg | a- | ha-/a- |
| 1pl | ri- | |
| 2pl | m-/mu- | |
| 3pl | wa- | |
In Proto-Sabaki, the 2sg and 3sg subject markers were *ku and *ka, respectively. The *k weakened to h in Shingazija (hu-, ha-) and was lost entirely (Ø) in all other dialects (u-, a-).

Verbs are negated with the prefix ka-. Occasionally, however, other morphemes of the verb take on different meanings in a negative construction. For example, the suffix -i, which usually marks the past tense, takes on a present habitual meaning when the verb is negated.

The present progressive uses the prefix si-/su-, the future tense uses tso-, and the conditional uses a-tso-.

There are two past tense constructions in Comorian. The first is the simple past tense, which uses the structure SM-Root-Suffix 1.

The second is the compound past, using the structure SM-ka SM-Root-Suffix 1.
