- 1 Consonants
- 2 Vowels
- 3 Phonotactics
- 4 Accent
- 5 Sound change
- 6 Notes
- 7 References
- 8 Further reading
- Consonants inside parentheses are allophones that now occur phonemically in recent Western loans.
- Voiceless stops /p, t, k/ are slightly aspirated: less aspirated than English stops, but more so than Spanish.
- /t, d, n/ are laminal denti-alveolar (that is, the blade of the tongue contacts the back of the upper teeth and the front part of the alveolar ridge) and /s z/ are laminal alveolar. Before /i/, the oral sounds are alveolo-palatal [tɕ (d)ʑ ɕ (d)ʑ] and before /u/ they are alveolar [ts (d)z s (d)z].
- /ɴ/ is a moraic nasal with variable pronunciation depending on what follows.
- /z/ is pronounced [dz] in pausa by many speakers. It is [dʑ] before /i/.
- /r/ is an apical postalveolar flap undefined for laterality. That is, it is specified as neither a central nor a lateral flap, but may vary between the two. It is similar to the Korean r. To an English speaker's ears, its pronunciation varies between a flapped d ([ɾ], as in American English buddy) and a flapped l [ɺ], sounding most like d before /i/ and /j/, most like l before /o/, and most like a retracted flap [ɾ̠] before /a/. It is occasionally realized as a trill [r], especially when conveying a vulgar nuance in speech. The phenomenon is called rolled tongue (巻き舌 makijita) in Japanese, and is usually transcribed by repeating katakana ru, e.g. ガルルルル for a dog's growl.
- The compressed velar is essentially a non-moraic version of the vowel /u/. It is not equivalent to a typical IPA [w], since it is pronounced with lip compression ([ɰᵝ]) rather than rounding.
- /h/ is [ç] before /i/ and /j/, and [ɸ] before /u/, coarticulated with the labial compression of that vowel.
|/b/ > bilabial fricative [β]:||/abareru/ > [aβaɾeɺɯᵝ] abareru 暴れる 'to behave violently'|
|/ɡ/ > velar fricative [ɣ]:||/haɡe/ > [haɣe] hage はげ 'baldness'|
However, /ɡ/ is further complicated by its variant realization as a velar nasal [ŋ]. Standard Japanese speakers can be categorized into 3 groups (A, B, C), which will be explained below. If a speaker pronounces a given word consistently with the allophone [ŋ] (i.e. a B-speaker), that speaker will never have [ɣ] as an allophone in that same word. If a speaker varies between [ŋ] and [ɡ] (i.e. an A-speaker) or is generally consistent in using [ɡ] (i.e. a C-speaker), then the velar fricative [ɣ] is always another possible allophone in fast speech.
/ɡ/ may be weakened to nasal [ŋ] when it occurs within words; this includes not only the position between vowels but also the position between a vowel and a consonant. There is a fair amount of variation between speakers, however. Some, such as Vance (1987), have suggested that the variation follows social class; others, such as Akamatsu (1997), suggest that the variation follows age and geographic location. The generalized situation is as follows.
At the beginning of words:
- all present-day standard Japanese speakers generally use the stop [ɡ] at the beginning of words: /ɡaijuu/ > [ɡaijɯᵝɯᵝ] gaiyū 外遊 'overseas trip' (but not *[ŋaijɯᵝɯᵝ])
In the middle of simple words (i.e. non-compounds):
- A. a majority of speakers uses either [ŋ] or [ɡ] in free variation: /kaɡu/ > [kaŋɯᵝ] or [kaɡɯᵝ] kagu 家具 'furniture'
- B. a minority of speakers consistently uses [ŋ]: /kaɡu/ > [kaŋɯᵝ] (but not *[kaɡɯᵝ])
- C. most speakers in western Japan and a smaller minority of speakers in Kantō consistently use [ɡ]: /kaɡu/ > [kaɡɯᵝ] (but not *[kaŋɯᵝ])
In the middle of compound words morpheme-initially:
- B-speakers mentioned directly above consistently use [ɡ].
So, for some speakers the following two words are a minimal pair while for others they are homophonous:
- sengo 1,005 (せんご) 'one thousand five' = [seŋɡo] for B-speakers
- sengo 戦後 (せんこ゜) 'postwar' = [seŋŋo] for B-speakers
To summarize using the example of hage はげ 'baldness':
- A-speakers: /haɡe/ > [haŋe] or [haɡe] or [haɣe]
- B-speakers: /haɡe/ > [haŋe]
- C-speakers: /haɡe/ > [haɡe] or [haɣe]
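The three-way summary above can be sketched as a small lookup, purely as an illustration. The names `ALLOPHONES` and `realize_g` are hypothetical, the group labels A, B, C follow the text, and the sketch assumes the same allophone is chosen for every word-medial /ɡ/ in a simple (non-compound) word. The compound case, where B-speakers use [ɡ] morpheme-initially, is not modeled.

```python
# Illustrative sketch only; names are assumptions, not an established API.
G = "\u0261"      # ɡ, voiced velar stop
NG = "\u014b"     # ŋ, velar nasal
GAMMA = "\u0263"  # ɣ, velar fricative (fast-speech variant)

# Word-medial realizations of /ɡ/ per speaker group, as summarized in the text.
ALLOPHONES = {
    "A": [NG, G, GAMMA],  # free variation between the nasal and the stop
    "B": [NG],            # consistently nasal, never the fricative
    "C": [G, GAMMA],      # consistently the stop
}


def realize_g(phonemic: str, group: str) -> list[str]:
    """Return the possible broad realizations of a simple word.

    Word-initial /ɡ/ is always kept as a stop; only later occurrences vary.
    """
    head, tail = phonemic[:1], phonemic[1:]
    if G not in tail:
        return [phonemic]
    return [head + tail.replace(G, allophone) for allophone in ALLOPHONES[group]]


if __name__ == "__main__":
    for group in "ABC":
        print(group, realize_g("ha" + G + "e", group))
```

Keeping the word-initial segment out of the substitution mirrors the observation that all present-day standard speakers use the stop at the beginning of words.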
Palatalization and affrication
The palatals /i/ and /j/ palatalize the consonants they follow:
|/m/ > palatalized [mʲ]:||/umi/ > [ɯᵝmʲi] umi 海 'sea'|
|/ɡ/ > palatalized [ɡʲ]:||/ɡjoːza/ > [ɡʲoːza] gyōza ぎょうざ 'fried dumpling'|
|/s/ > alveolopalatal fricative [ɕ]:||/sio/ > [ɕi.o] shio 塩 'salt'|
|/z/ > alveolopalatal [dʑ] or [ʑ]:||/zisiɴ/ > [dʑiɕĩɴ] jishin 地震 'earthquake'; /ɡozjuu/ > [ɡodʑɯᵝɯᵝ] ~ [ɡoʑɯᵝɯᵝ] gojuu 50 'fifty'|
|/t/ > alveolopalatal affricate [tɕ]:||/tiziɴ/ > [tɕidʑĩɴ] ~ [tɕiʑĩɴ] chijin 知人 'acquaintance'|
/i/ and /j/ also palatalize /h/ to a palatal fricative ([ç]): /hito/ > [çi̥to] hito 人 ('person')
Of the allophones of /z/, the affricate [dz] is most common, especially at the beginning of utterances and after /ɴ/ (or /n/, depending on the analysis), while fricative [z] may occur between vowels. Both sounds, however, are in free variation.
Historically, when /s/, /z/, and /t/ were followed by /j/, the consonant was palatalized and merged with /j/ into a single sound. In modern Japanese, these are arguably separate phonemes, at least for the portion of the population that pronounces them distinctly in English borrowings.
|/sj/ > [ɕ] (Romanized as sh):||/sjaboɴ/ > /ɕaboɴ/ > [ɕabõɴ] shabon シャボン 'soap'|
|/zj/ > [dʑ] or [ʑ] (Romanized as j):||/zjaɡaimo/ > /dʑaɡaimo/ > [dʑaŋaimo] jagaimo じゃがいも 'potato'|
|/tj/ > [tɕ] (Romanized as ch):||/tja/ > /tɕa/ > [tɕa] cha 茶 'tea'|
The vowel /u/ also affects consonants that it follows:
|/h/ > bilabial fricative [ɸ]:||/huta/ > [ɸɯ̥ᵝta] futa ふた 'lid'|
|/t/ > dental affricate [ts]:||/tuɡi/ > [tsɯᵝŋi] tsugi 次 'next'|
Although [ɸ] and [ts] occur before other vowels in loanwords (e.g. [ɸaito], 'fight'; [tsaitoɡaisu̥to], 'Zeitgeist'; [eɾitsiɴ], 'Yeltsin'), *[hɯᵝ] is still not distinguished from [ɸɯᵝ] (e.g. English hoop > [ɸɯᵝpɯᵝ]). Similarly, *[si] and *[zi] do not occur even in loanwords so that English cinema becomes [ɕinema].
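As a rough illustration, the allophonic rules tabulated above (before /i/, before /j/, and before /u/) can be written as a single left-to-right pass over a phonemic string. This is a minimal sketch under stated assumptions: the function name `surface` and the three rule tables are hypothetical, the input uses one character per phoneme, only the /s z t h/ rules are covered, /z/ is always rendered as the affricate variant, and vowel quality (e.g. /u/ as [ɯᵝ]), devoicing and nasalization are left untouched.

```python
# Illustrative sketch; the rule tables restate the examples in the text.
BEFORE_I = {"s": "\u0255", "z": "d\u0291", "t": "t\u0255", "h": "\u00e7"}  # ɕ dʑ tɕ ç
BEFORE_U = {"h": "\u0278", "t": "ts"}                                      # ɸ ts
MERGE_J = {"s": "\u0255", "z": "d\u0291", "t": "t\u0255"}                  # Cj merges


def surface(phonemic: str) -> str:
    """Apply palatalization and affrication to a phonemic string."""
    out = []
    i = 0
    while i < len(phonemic):
        c = phonemic[i]
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        if nxt == "j" and c in MERGE_J:
            # /sj zj tj/ surface as one alveolo-palatal consonant; /j/ is absorbed.
            out.append(MERGE_J[c])
            i += 2
            continue
        if nxt == "i" and c in BEFORE_I:
            out.append(BEFORE_I[c])
        elif nxt == "u" and c in BEFORE_U:
            out.append(BEFORE_U[c])
        else:
            out.append(c)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("sio", "tja", "huta", "tuki", "hito"):
        print(word, "->", surface(word))
```

The output is a broad transcription only; it is meant to show the conditioning environments, not to reproduce the narrow transcriptions given in the tables.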
The moraic nasal /ɴ/
Some analyses of Japanese treat the moraic nasal as an archiphoneme /N/; other, less abstract approaches take its uvular pronunciation as basic, or treat it as coronal /n/ appearing in the syllable coda. Even when the nasal coda is proposed as /N/, it is in complementary distribution with the nasal onsets within a syllable. In any case, it undergoes a variety of assimilatory processes. Within words, it is variously:
- uvular [ɴ] at the end of utterances and in isolation.
- bilabial [m] before [p], [b] and [m]; this pronunciation is also sometimes found at the end of utterances and in isolation. Singers are taught to pronounce all final and prevocalic instances of this sound as [m], which reflects its historical derivation.
- dental [n] before coronals /d/, /t/, and /n/; never found utterance-finally.
- velar [ŋ] before [k] and [ɡ].
- some sort of nasalized vowel before vowels, approximants (/j/ and /w/), /r/, and fricatives (/s/, /z/, and /h/). Depending on context and speaker, the vowel's quality may closely match that of the preceding vowel or it may be more constricted in articulation. This pronunciation is also found utterance-finally.
Some speakers produce [n] before /z/, pronouncing them as [ndz], while others produce a nasalized vowel before /z/.
These assimilations occur beyond word boundaries.
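The place assimilation listed above can be summarized as a function of the following segment. The sketch below is illustrative only: `moraic_nasal` is a hypothetical name, the return values are plain labels rather than IPA symbols, and the optional variants mentioned in the text ([m] utterance-finally, [n] before /z/ for some speakers, a nasalized vowel utterance-finally) are noted in comments but not returned.

```python
# Illustrative sketch; labels and function name are assumptions.
VOWELS = set("aiueo")


def moraic_nasal(following=None) -> str:
    """Return the default realization of the moraic nasal before `following`.

    `following` is the next segment as a single character, or None when the
    nasal is utterance-final or pronounced in isolation.
    """
    if following is None:
        return "uvular"  # [m] or a nasalized vowel are also reported here
    if following in {"p", "b", "m"}:
        return "bilabial"
    if following in {"t", "d", "n"}:
        return "dental"
    if following in {"k", "g", "\u0261"}:  # accept ASCII g and IPA ɡ
        return "velar"
    if following in VOWELS or following in {"j", "w", "r", "s", "z", "h"}:
        return "nasalized vowel"  # some speakers have [n] before /z/
    raise ValueError(f"unhandled following segment: {following!r}")


if __name__ == "__main__":
    for seg in (None, "p", "t", "k", "s", "a"):
        print(seg, "->", moraic_nasal(seg))
```

Because the assimilations also apply across word boundaries, the same lookup can be used with the first segment of the next word.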
While Japanese features consonant gemination, there are some limitations in what can be geminated. Most saliently, voiced geminates are prohibited in native Japanese words. This can be seen with suffixation that would otherwise feature voiced geminates. For example, Japanese has a suffix, |ri|, that contains what Kawahara (2006) calls a "floating mora" that triggers gemination in certain cases (e.g. |tap| + |ri| > [tappɯᵝɾi], 'a lot of'). When this would otherwise lead to a geminated voiced obstruent, a moraic nasal appears instead as a sort of "partial gemination" (e.g. |zabu| + |ri| > [zambɯᵝɾi], 'splashing').
However, voiced geminates do appear in loanwords. These loanwords can even come from languages, such as English, that do not feature gemination in the first place. For example, when an English word features a coda consonant followed by a lax vowel, it can be borrowed into Japanese featuring a geminate; gemination may also appear as a result of borrowing via written materials, where a word spelled with doubled letters leads to a geminated pronunciation. Because these loanwords can feature voiced geminates, Japanese now exhibits a voice distinction with geminates where it formerly did not:
- suraggā スラッガー ('slugger') vs. surakkā ('slacker')
- kiddo キッド ('kid') vs. kitto ('kit')
This distinction is not very rigorous. For example, when voiced obstruent geminates appear with another voiced obstruent they can undergo optional devoicing (e.g. doreddo ~ doretto, 'dreadlocks'). Kawahara (2006) attributes this to a less reliable distinction between voiced and voiceless geminates compared to the same distinction in non-geminated consonants, noting that speakers may have difficulty distinguishing them due to the partial devoicing of voiced geminates and their resistance to the weakening process mentioned above, both of which can make them sound like voiceless geminates.
There is some dispute about how gemination fits with Japanese phonotactics. One analysis, particularly popular among Japanese scholars, posits a special "mora phoneme" (モーラ 音素 Mōra onso) /Q/, which corresponds to the sokuon 〈っ〉. However, not all scholars agree that the use of this "moraic obstruent" is the best analysis. Even when the non-nasal coda is proposed as /Q/, it is in complementary distribution with the non-nasal onsets. In those approaches that incorporate the moraic obstruent, it is said to completely assimilate to the following obstruent, resulting in a geminate (that is, double) consonant. The assimilated /Q/ remains unreleased and thus the geminates are phonetically long consonants. /Q/ does not occur before vowels or nasal consonants. It can be seen as an archiphoneme in that it has no underlying place or manner of articulation, and instead manifests as several phonetic realizations depending on context, for example:
|[p̚] before [p]:||/niQpoN/ > [nʲip̚.põɴ] nippon 日本 'Japan'|
|[p̚] before [pʲ]:||/haQpjaku/ > [hap̚.pʲa.kɯᵝ] happyaku 八百 '800'|
|[s] before [s]:||/kaQseN/ > [kas.sẽɴ] kassen 合戦 'battle'|
|[t̚] before [tɕ]:||/saQti/ > [sat̚.tɕi] satchi 察知 'inference'|
Another analysis of Japanese dispenses with /Q/ and other mora phonemes entirely. In such an approach, the words above are phonemicized as shown below:
|[p̚] before [p]:||/nippoɴ/ > [nʲip̚.põɴ] nippon 日本 'Japan'|
|[p̚] before [pʲ]:||/happjaku/ > [hap̚.pʲa.kɯᵝ] happyaku 八百 '800'|
|[s] before [s]:||/kasseɴ/ > [kas.sẽɴ] kassen 合戦 'battle'|
|[t̚] before [tɕ]:||/satti/ > [sat̚.tɕi] satchi 察知 'inference'|
In addition to the above two representations, gemination can also be transcribed with a lengthening mark (e.g. [nʲipːõɴ] instead of [nʲip.põɴ]). However, this notation obscures mora boundaries.
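The relation between the two phonemicizations can be sketched as a rewrite in which /Q/ copies the consonant that follows it. This is a minimal illustration, not a claim about any published formalization: `expand_q` is a hypothetical name, the input is an ASCII approximation of the transcriptions above (with `N` for the moraic nasal), and rejecting /Q/ at the end of the string is an assumption made here for simplicity.

```python
# Illustrative sketch; function name and ASCII notation are assumptions.
VOWELS = set("aiueo")
NASALS = {"n", "m", "N"}


def expand_q(phonemic: str) -> str:
    """Rewrite each /Q/ as a copy of the following obstruent."""
    out = []
    for i, c in enumerate(phonemic):
        if c != "Q":
            out.append(c)
            continue
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        # The text states that /Q/ does not occur before vowels or nasals.
        if nxt == "" or nxt in VOWELS or nxt in NASALS:
            raise ValueError(f"/Q/ in an unsupported position at index {i}")
        out.append(nxt)
    return "".join(out)


if __name__ == "__main__":
    for word in ("niQpoN", "haQpjaku", "kaQseN", "saQti"):
        print(word, "->", expand_q(word))
```

The output corresponds to the /Q/-free phonemicization; the phonetic detail (unreleased stops, [tɕ] before /i/) would come from separate allophonic rules.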
/d, z/ neutralization
The contrast between /d/ and /z/ is neutralized before /u/ and /i/: [zɯᵝ, dʑi]. By convention, it is often assumed to be /z/, though some analyze it as /d/, the voiced counterpart to [ts]. The writing system preserves morphological distinctions, though spelling reform has eliminated historical distinctions: つづく[続く] /tuduku/, いちづける[位置付ける] /itizukeru/ from |iti+tukeru|.
Various forms of sandhi exist; the Japanese term for sandhi generally is ren'on (連音), while sandhi within Japanese is referred to as renjō (連声). Most commonly, a terminal /n/ on one morpheme results in an /n/ (or /m/) being added to the start of the next morpheme, as in tennō (天皇, emperor), てん ＋ おう > てんのう (ten + ō = tennō). In some cases, such as this example, the sound change is used in writing as well, and is considered the usual pronunciation, though in other cases, such as abbreviating …-no-uchi (〜の家, …'s house) to 〜んち (-nchi), this is only done in speech, and considered informal. See 連声 (in Japanese) for further examples.
|/a/||This is a low central vowel, [ä]; it is most like RP English 〈u〉 in cut, but with the mouth slightly more open.|
|/i/||This sounds like the English 〈ee〉 in feet.|
|/u/||This is a somewhat centralized close back compressed vowel, [ɯᵝ], pronounced with the lips compressed toward each other but neither rounded like [u] nor spread to the sides like [ɯ].|
|/e/||This is [e̞], somewhat like the English 〈e〉 in set.|
|/o/||This is [o̞], somewhat like the 〈o〉 in English core.|
Vowels have a phonemic length distinction (short vs. long). Compare contrasting pairs of words like ojisan /ozisaɴ/ 'uncle' vs. ojiisan /oziisaɴ/ 'grandfather', or tsuki /tuki/ 'moon' vs. tsūki /tuuki/ 'airflow'.
In most phonological analyses, all vowels are treated as occurring within the time frame of one mora. Phonetically long vowels, then, are treated as a sequence of two identical vowels. For example, ojiisan is /oziisaɴ/, not /oziːsaɴ/.
Within words and phrases, Japanese allows long sequences of phonetic vowels without intervening consonants, pronounced with hiatus, although the pitch accent and slight rhythm breaks help track the timing when the vowels are identical. Sequences of two vowels within a single word are extremely common, occurring at the end of many i-type adjectives, for example, and three vowels within a word also occur, as in aoi 'blue/green'. In phrases, sequences with multiple o sounds are most common, due to the direct object particle を 'wo' (which comes after a word) being realized as o and the honorific prefix お〜 'o', which can occur in sequence, and may follow a word itself terminating in an o sound; these may be dropped in rapid speech. A fairly common construction exhibiting these is 「〜をお送りします」 ... (w)o o-okuri-shimasu 'humbly send ...'. More extreme examples follow:
/hoo.oꜜo.o/ [hòō.óò.ō] hōō o (鳳凰を) 'Phoenix (Fenghuang)' (direct object)
/too.oo.oꜜ.oo.u/ [tòo.ōo.ó.òō.ɯ́ᵝ] tōō o ōu (東欧を覆う) 'to cover Eastern Europe'
(This artificial example would be unlikely in normal speech.)
In many dialects, the high vowels /i/ and /u/ become devoiced when between voiceless consonants. However, when a word contains more than one such environment, devoicing in adjacent syllables does not normally occur. Additionally, /i/ and /u/ are devoiced following a downstep and a voiceless consonant at the end of a prosodic unit.
|/kutuꜜ/ > [kɯ̥ᵝtsɯ́ᵝ]||kutsu 靴 'shoe'|
|/aꜜtu/ > [átsɯ̥ᵝ]||atsu 圧 'pressure'|
|/hikaɴ/ > [çi̥kãɴ́]||hikan 悲観 'pessimism'|
|/hikaku/ > [çi̥kakɯ́ᵝ]||hikaku 比較 'comparison'|
|/kisitu/ > [kʲi̥ɕitsɯᵝ]||kishitsu 気質 'temperament'|
This devoicing is not restricted to only fast speech, though consecutive voicing may occur in fast speech.
To a lesser extent /o/ may devoice with the further requirement that there be two or more adjacent moras containing /o/:
|/kokoꜜro/ > [ko̥kóɺò]||kokoro 心 'heart'|
Japanese speakers are usually not even aware of the difference between the voiced and devoiced pair. On the other hand, gender roles play a part in prolonging the terminal vowel: prolonging it, particularly the terminal /u/ as in 'arimasu', is regarded as effeminate. Some nonstandard varieties of Japanese can be recognized by their hyper-devoicing, while in some Western dialects and some registers of formal speech, every vowel is voiced.
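The core high-vowel rule described in this section (devoicing of /i/ and /u/ between voiceless consonants, but not in adjacent syllables) can be sketched as follows. This is a simplified illustration under several assumptions: `devoiced_positions` is a hypothetical name, the input is a phonemic string with one ASCII character per segment, the set of voiceless consonants is limited to /p t k s h/, and the accent-dependent final devoicing and the /o/ case are deliberately left out.

```python
# Illustrative sketch; names and the segment inventory are assumptions.
VOWELS = set("aiueo")
HIGH_VOWELS = set("iu")
VOICELESS = set("ptksh")


def devoiced_positions(phonemic: str) -> list[int]:
    """Return the indices of high vowels predicted to be devoiced."""
    positions = []
    previous_vowel_devoiced = False
    last = len(phonemic) - 1
    for i, c in enumerate(phonemic):
        if c not in VOWELS:
            continue
        between_voiceless = (
            0 < i < last
            and phonemic[i - 1] in VOICELESS
            and phonemic[i + 1] in VOICELESS
        )
        if c in HIGH_VOWELS and between_voiceless and not previous_vowel_devoiced:
            positions.append(i)
            previous_vowel_devoiced = True
        else:
            # A voiced vowel resets the ban on consecutive devoicing.
            previous_vowel_devoiced = False
    return positions


if __name__ == "__main__":
    for word in ("kutu", "hikaku", "kisitu"):
        print(word, devoiced_positions(word))
```

For /kisitu/ the sketch marks only the first /i/, in line with the transcription [kʲi̥ɕitsɯᵝ] given above, where the second /i/ stays voiced even though it also sits between voiceless consonants.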
Japanese vowels are slightly nasalized when adjacent to nasals /m, n/. Before the moraic nasal /ɴ/, vowels are heavily nasalized:
|/seesaɴ/ > [seesãɴ́]||seisan 生産 'production'|
Glottal stop insertion
At the beginning and end of utterances, Japanese vowels may be preceded and followed by a glottal stop [ʔ], respectively. This is demonstrated below with the following words (as pronounced in isolation):
|/eꜜɴ/ > [ẽ́ɴ̀] ~ [ʔẽ́ɴ̀]:||en 円 'yen'|
|/kisiꜜ/ > [ki̥ɕíʔ]:||kishi 岸 'shore'|
|/uꜜ/ > [ɯ́ᵝʔ] ~ [ʔɯ́ᵝʔ]:||u 鵜 'cormorant'|
When an utterance-final word is uttered with emphasis, this glottal stop is plainly audible, and is often indicated in the writing system with a small letter tsu っ called a sokuon. This is also found in interjections like あっ and えっ.
Japanese words have traditionally been analysed to be composed of moras; that is to say, whereas the "building blocks" of words in English are syllables, in Japanese they are the moras. Each mora occupies one rhythmic unit, i.e. it is perceived to have the same time value. A mora may be "regular" consisting of just a vowel (V) or a consonant and a vowel (CV), or may be one of two "special" moras, /N/ and /Q/. A glide /j/ may precede the vowel in "regular" moras. Some analyses posit a third "special" mora, /R/, the second part of a long vowel. In this table, the period represents a mora break, rather than the conventional syllable break.
|Mora Type||Example||Japanese||moras per word|
|V||/o/||o 尾 'tail'||1-mora word|
|jV||/jo/||yo 世 'world'||1-mora word|
|CV||/ko/||ko 子 'child'||1-mora word|
|CjV||/kjo/^1||kyo 巨 'hugeness'||1-mora word|
|N||/N/ in /ko.N/ or /ko.n/||kon 紺 'deep blue'||2-mora word|
|Q||/Q/ in /ko.Q.ko/ or /ko.k.ko/||kokko 国庫 'national treasury'||3-mora word|
- ^1 Traditionally, moras were divided into plain and palatal sets, the latter of which entailed palatalization of the consonant element.
Consonantal moras are restricted from occurring word initially, though utterances starting with [n] are possible. Vowels may be long, and consonants may be geminate (doubled). Geminate consonants are limited to /ɴn/, /ɴm/ and sequences of /Q/ followed by a voiceless obstruent, though some words are written with geminate voiced obstruents. In the analysis without archiphonemes, geminate clusters are simply two identical consonants, one after the other.
In English, stressed syllables in a word are pronounced louder, longer, and with higher pitch, while unstressed syllables are relatively shorter in duration. In Japanese, all moras are pronounced with equal length and loudness. Japanese is therefore said to be a mora-timed language.
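The mora types in the table above can be illustrated with a small segmenter. This is only a sketch: `moras` is a hypothetical name, the input is an ASCII phonemic string using `N` and `Q` for the special moras, and long vowels are written as two identical vowels, following the analysis described earlier.

```python
# Illustrative sketch; function name and ASCII notation are assumptions.
VOWELS = set("aiueo")
SPECIAL = set("NQ")


def moras(phonemic: str) -> list[str]:
    """Split a phonemic string into moras (V, jV, CV, CjV, N, Q)."""
    result = []
    onset = ""
    for c in phonemic:
        if c in SPECIAL:
            if onset:
                raise ValueError(f"onset {onset!r} has no vowel before {c!r}")
            result.append(c)  # a special mora stands on its own
        elif c in VOWELS:
            result.append(onset + c)  # the vowel closes a regular mora
            onset = ""
        else:
            onset += c  # consonant or glide waiting for its vowel
    if onset:
        raise ValueError(f"trailing onset {onset!r} without a vowel")
    return result


if __name__ == "__main__":
    for word in ("ko", "kjo", "koN", "koQko", "ozisaN", "oziisaN"):
        print(word, moras(word), len(moras(word)))
```

Counting the returned list gives the mora count, so /ozisaN/ 'uncle' has four moras and /oziisaN/ 'grandfather' has five.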
Standard Japanese has a distinctive pitch accent system: a word can have one of its moras bearing an accent or not. An accented mora is pronounced with a relatively high tone and is followed by a drop in pitch. The various Japanese dialects have different accent patterns, and some exhibit more complex tonic systems.
As an agglutinative language, Japanese has generally very regular pronunciation, with much simpler morphophonology than in fusional languages. Nevertheless, there are a number of prominent sound change phenomena, primarily in morpheme combination and in conjugation of verbs and adjectives. Phonemic changes are generally reflected in the spelling, though some non-phonemic changes are not reflected in spelling.
In Japanese, sandhi is prominently exhibited in rendaku – consonant mutation of a leading consonant from unvoiced to voiced when not word-initial, in some contexts. While this is reflected in the spelling via an addition of voicing marks (two dots) as in ka, ga (か／が), in some cases this combines with the yotsugana mergers, notably ji, dzi (じ／ぢ) and zu, dzu (ず／づ) in standard Japanese, with the resulting spelling thus being morphophonemic rather than purely phonemic.
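The voicing step of rendaku can be sketched as a lookup on the first consonant of a non-initial morpheme. The sketch is illustrative only: `rendaku` and `VOICED` are hypothetical names, the input is an ASCII phonemic form, and the function applies the mutation unconditionally, whereas in the language it occurs only in some contexts that are not modeled here.

```python
# Illustrative sketch; it shows the mutation itself, not when it applies.
VOICED = {"k": "g", "s": "z", "t": "d", "h": "b"}


def rendaku(morpheme: str) -> str:
    """Voice the leading consonant of a non-initial morpheme, if it has one."""
    first = morpheme[:1]
    return VOICED.get(first, first) + morpheme[1:]


if __name__ == "__main__":
    for m in ("ka", "hito", "tukeru"):
        print(m, "->", rendaku(m))
```

The phonemic output /dukeru/ corresponds to the morphophonemic spelling づける; its pronunciation then falls under the /d, z/ neutralization described earlier.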
The other common sandhi in Japanese is conversion of つ or く (tsu, ku) as a trailing consonant to a geminate consonant when not word-final – orthographically, the sokuon っ, as this occurs particularly in the context of つ.
Sandhi also occurs much less often in renjō (連声), where, most commonly, a terminal /n/ on one morpheme results in an /n/ (or /m/) being added to the start of the next morpheme, as in ten + ō = tennō (天皇、てん ＋ おう → てんのう).
Another prominent feature is onbin (音便, euphonic sound change), particularly historical sound changes.
In cases where this has occurred within a morpheme, the morpheme itself is still distinct but with a different sound, as in hōki (箒、ほうき, broom), which underwent two sound changes from earlier hahaki (ははき) → hauki (はうき) (onbin) → houki (ほうき) (historical vowel change) → hōki (ほうき) (long vowel, sound change not reflected in kana spelling).
However, certain forms are still recognizable as irregular morphology, particularly forms that occur in basic verb conjugation, as well as some compound words.
Polite adjective forms
The polite adjective forms (used before the polite copula gozaru (ござる, be) and verb zonjiru (存じる, think, know)) exhibit a one-step or two-step sound change. Firstly, these use the continuative form, -ku (〜く), which exhibits onbin, dropping the k as -ku (〜く) → -u (〜う). Secondly, the vowel may combine with the preceding vowel, according to historical sound changes; if the resulting new sound is palatalized, meaning ゆ、よ (yu, yo), this combines with the preceding consonant, yielding a palatalized syllable.
This is most prominent in certain everyday terms that derive from an i-adjective ending in -ai changing to -ō (-ou), which is because these terms are abbreviations of polite phrases ending in gozaimasu, sometimes with a polite o- prefix. The terms are also used in their full form, with notable examples being:
- arigatō (有難う、ありがとう, Thank you), from arigatai (有難い、ありがたい, (I am) grateful).
- ohayō (お早う、おはよう, Good morning), from hayai (早い、はやい, (It is) early).
- omedetō (お目出度う、おめでとう, Congratulations), from medetai (目出度い、めでたい, (It is) auspicious).
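The two-step change behind these forms can be traced explicitly. The sketch below is a hedged illustration: `polite_steps` is a hypothetical name, the input is a romanized i-adjective ending in -ai, and only the -ai → -ō pattern named above is covered (the palatalized outcomes are not). The polite prefix o- is left for the caller to add.

```python
# Illustrative sketch; handles only adjectives ending in -ai.
LONG_O = "\u014d"  # ō


def polite_steps(adjective: str) -> list[str]:
    """Return the derivation: dictionary form, -ku form, -u form, fused form."""
    if not adjective.endswith("ai"):
        raise ValueError("this sketch only handles adjectives ending in -ai")
    stem = adjective[:-1]             # arigata-
    continuative = stem + "ku"        # arigataku (continuative form)
    onbin = stem + "u"                # arigatau  (k dropped by onbin)
    fused = stem[:-1] + LONG_O        # arigatō   (a + u combine into ō)
    return [adjective, continuative, onbin, fused]


if __name__ == "__main__":
    for adj in ("arigatai", "hayai", "medetai"):
        print(" > ".join(polite_steps(adj)))
```

Adding the prefix o- to the last step of hayai and medetai gives the everyday forms ohayō and omedetō.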
The morpheme hito (人、ひと, person) (with rendaku -bito (〜びと)) has changed to uto (うと) or udo (うど), respectively, in a number of compounds. This in turn often combined with a historical vowel change, resulting in a pronunciation rather different from that of the components, as in nakōdo (仲人、なこうど, matchmaker) (see below). These include:
- otōto (弟, younger brother), from otohito (弟人、おとひと) → otouto (おとうと) → otōto.
- imōto (妹、いもうと, younger sister), from imohito (妹人、いもひと) → imouto (いもうと) → imōto.
- shirōto (素人、しろうと, novice), from shirohito (白人、しろひと) → shirouto (しろうと) → shirōto.
- kurōto (玄人、くろうと, veteran), from kurohito (黒人、くろひと) → kurouto (くろうと) → kurōto.
- nakōdo (仲人、なこうど, matchmaker), from nakabito (仲人、なかびと) → nakaudo (なかうど) → nakoudo (なこうど) → nakōdo.
- shūto (舅、しゅうと, father-in-law), from shihito (舅、しひと) → shiuto (しうと) → shuuto (しゅうと) → shūto.
In some cases morphemes have effectively fused, and a word is no longer recognizable as being composed of two separate morphemes.
- Labrune (2012:59)
- Riney et al. (2007)
- Akamatsu (2000:81 fn 5, 135)
- Okada (1991:95)
- Akamatsu (1997) speculates that only 10% of the population are consistent [ɡ] users.
- Japanese academics represent [ɡo] as ご and [ŋo] as こ゜.
- Okada (1991:95)
- Itō & Mester (1995:827)
- Itō & Mester (1995:825)
- Itō & Mester (1995:826)
- Itō & Mester (1995:828)
- Labrune (2012:132–3)
- Labrune (2012:133–4)
- Okada (1991:95)
- see Akamatsu (1997)
- Labrune (2012:104)
- Kawahara (2006:550)
- Labrune (2012:104–5) points out that the prefix |bu| has the same effect.
- Kawahara (2006:537–8), citing Katayama (1998)
- Kawahara (2006:538)
- Kawahara (2006:559, 561, 565)
- Labrune (2012:135)
- Tsuchida (2001:225)
- Tsuchida (2001:242)
- Seward (1992:9)
- Moras are represented orthographically in katakana and hiragana–each mora, with the exception of CjV clusters, being one kana–and are referred to in Japanese as 'on' or 'onji'.
- Labrune (2012:143)
- Labrune (2012:143–144)
- Itō & Mester (1995:827). In such a classification scheme, the plain counterparts of moras with a palatal glide are onsetless moras.
- Akamatsu, Tsutomu (1997). Japanese phonetics: Theory and practice. München: Lincom Europa. ISBN 3-89586-095-6.
- Akamatsu, Tsutomu (2000). Japanese phonology: A functional approach. München: Lincom Europa. ISBN 3-89586-544-3.
- Itō, Junko; Mester, R. Armin (1995). "Japanese phonology". In Goldsmith, John A. The Handbook of Phonological Theory. Blackwell Handbooks in Linguistics. Blackwell Publishers. pp. 817–838.
- Kawahara, Shigeto (2006). "A Faithfulness ranking projected from a perceptibility scale: The case of [+ Voice] in Japanese". Language 82 (3): 536–574.
- Labrune, Laurence (2012). The Phonology of Japanese. Oxford, England: Oxford University Press. ISBN 978-0-19-954583-4.
- Okada, Hideo (1991). "Japanese". Journal of the International Phonetic Association 21 (2): 94–96. doi:10.1017/S002510030000445X.
- Riney, Timothy James; Takagi, Naoyuki; Ota, Kaori; Uchida, Yoko (2007). "The intermediate degree of VOT in Japanese initial voiceless stops". Journal of Phonetics 35 (3): 439–443. doi:10.1016/j.wocn.2006.01.002.
- Seward, Jack (1992). Easy Japanese. McGraw-Hill Professional. ISBN 978-0-8442-8495-8.
- Tsuchida, Ayako (2001). "Japanese vowel devoicing". Journal of East Asian Linguistics 10 (3): 225–245. doi:10.1023/A:1011221225072.
- Bloch, Bernard (1950), "Studies in colloquial Japanese IV: Phonemics", Language 26 (1): 86–125, doi:10.2307/410409, JSTOR 410409, OCLC 486707218
- Haraguchi, Shosuke (1977), The tone pattern of Japanese: An autosegmental theory of tonology, Tokyo, Japan: Kaitakusha, ISBN 0-87040-371-0
- Haraguchi, Shosuke (1999), "Chap. 1: Accent", in Tsujimura, Natsuko, The Handbook of Japanese Linguistics, Malden, Mass.: Blackwell Publishers, pp. 1–30, ISBN 0-631-20504-7
- (dissertation) Katayama, Motoko (1998), Loanword phonology in Japanese and optimality theory, Santa Cruz: University of California, Santa Cruz
- Kubozono, Haruo (1999), "Chap. 2: Mora and syllable", in Tsujimura, Natsuko, The Handbook of Japanese Linguistics, Malden, Mass.: Blackwell Publishers, pp. 31–61, ISBN 0-631-20504-7
- Ladefoged, Peter (2001), A Course in Phonetics (4th ed.), Boston: Heinle & Heinle, Thomson Learning, ISBN 0-15-507319-2
- Martin, Samuel E. (1975), A reference grammar of Japanese, New Haven, Conn.: Yale University Press, ISBN 0-300-01813-4
- McCawley, James D. (1968), The Phonological Component of a Grammar of Japanese, The Hague: Mouton
- Pierrehumbert, Janet; Beckman, Mary (1988), Japanese Tone Structure, Linguistic Inquiry monographs (No. 15), Cambridge, Mass.: MIT Press, ISBN 0-262-16109-5
- Sawashima, M.; Miyazaki, S. (1973), "Glottal opening for Japanese voiceless consonants", Annual Bulletin (Research Institute of Logopedics and Phoniatrics, University of Tokyo) 7: 1–10, OCLC 633878218
- Shibatani, Masayoshi (1990), "Japanese", in Comrie, Bernard, The major languages of east and south-east Asia, London: Routledge, ISBN 0-415-04739-0
- Shibatani, Masayoshi (1990), The Languages of Japan, Cambridge: Cambridge University Press, ISBN 0-521-36070-6
- Vance, Timothy J. (1987), An Introduction to Japanese Phonology, Albany, NY: State University of New York Press, ISBN 0-88706-360-8