INTSINT is an acronym for INternational Transcription System for INTonation.
It was originally developed by Daniel Hirst in his 1987 thesis as a prosodic equivalent of the International Phonetic Alphabet. The INTSINT alphabet was subsequently used in just over half of the chapters of Hirst & Di Cristo (eds) 1998.
INTSINT codes the intonation of an utterance by means of an alphabet of 8 discrete symbols constituting a surface phonological representation of the intonation:
- T (Top), H (Higher), U (Upstepped), S (Same), M (Mid), D (Downstepped), L (Lower), B (Bottom).
These tonal symbols are considered phonological, in that they represent discrete categories, and surface, since each tonal symbol corresponds to a directly observable property of the speech signal.
The tones can be aligned with phonological constituents by means of the following alignment diacritics following the tonal symbol:
- [ (initial), < (early), : (medial), > (late), ] (final)
The phonological constituent with which the tonal segments are aligned is the sequence of symbols enclosed in the pair of slashes /…/ that follows them.
The following is an example of a transcription, using the IPA (International Phonetic Alphabet), of a possible reading of the sentence "It's time to go":
This corresponds to a Mid tone aligned with the middle of the syllable "It's", then a Top tone aligned with the middle of the unit "time to", then a Downstepped tone aligned early in the syllable "go", and finally a Bottom tone aligned with the end of the same syllable.
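The pairing of a tonal symbol with an alignment diacritic can be sketched in a few lines of Python. This is only an illustration: the two-character token format (`"M:"`, `"D<"` and so on) and the names `TONES`, `ALIGNMENTS` and `decode_token` are assumptions made here, and the constituent between slashes is left out.

```python
# Names come from the symbol lists in the text. The token format (one tone
# letter followed by one alignment diacritic) is an assumption for illustration.
TONES = {
    "T": "Top", "H": "Higher", "U": "Upstepped", "S": "Same",
    "M": "Mid", "D": "Downstepped", "L": "Lower", "B": "Bottom",
}
ALIGNMENTS = {"[": "initial", "<": "early", ":": "medial", ">": "late", "]": "final"}


def decode_token(token):
    """Return (tone name, alignment name) for a two-character token such as 'T:'."""
    if len(token) != 2:
        raise ValueError(f"expected a tone letter plus one diacritic, got {token!r}")
    tone, diacritic = token[0], token[1]
    if tone not in TONES or diacritic not in ALIGNMENTS:
        raise ValueError(f"unknown tone or diacritic in {token!r}")
    return TONES[tone], ALIGNMENTS[diacritic]


# The four tones described in the reading of "It's time to go".
decoded = [decode_token(t) for t in ["M:", "T:", "D<", "B]"]]
for tone, alignment in decoded:
    print(f"{tone} tone, {alignment} alignment")
```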
The phonetic interpretation of the INTSINT tonal segments can be carried out using two speaker-dependent (or even utterance-dependent) parameters:
- key: like a musical key, this establishes an absolute point of reference defined by a fundamental frequency value (in hertz).
- range: this determines the interval between the highest and lowest pitches of the utterance.
In the current algorithm (Hirst 2004, 2005), the tonal segments can be converted to target points, like those generated by the Momel algorithm, using the following equivalences. In the formulae below, P(i) refers to the current pitch target and P(i-1) to the preceding pitch target. Pitch targets are normally calculated on a logarithmic scale.
The targets T, M and B are defined 'absolutely', without regard to the preceding targets:
- T: P(i) := key + range/2
- M: P(i) := key
- B: P(i) := key - range/2
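Because the targets are calculated on a logarithmic scale, the three absolute definitions can also be written directly in hertz. The following restatement rests on two assumptions not spelled out in the text: the logarithm is taken to base 2, and range is expressed in octaves. The symbols F_T, F_M and F_B are introduced here only for illustration.

```latex
\begin{aligned}
F_T &= \mathrm{key} \cdot 2^{\,\mathrm{range}/2} \\
F_M &= \mathrm{key} \\
F_B &= \mathrm{key} \cdot 2^{-\mathrm{range}/2}
\end{aligned}
```

With a key of 240 Hz and a range of 1 octave, this gives roughly 339 Hz for T and 170 Hz for B.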
Other targets are defined with respect to the preceding target:
- H: P(i) := (P(i-1) + T) / 2
- U: P(i) := (3*P(i-1) + T) / 4
- S: P(i) := P(i-1)
- D: P(i) := (3*P(i-1) + B) / 4
- L: P(i) := (P(i-1) + B) / 2
A sequence of tonal targets such as:
- [M T L H L H D B]
assuming, for a female speaker, a key of 240 Hz and a range of 1 octave, would be converted to the following F0 targets (in Hz):
- [240 340 240 286 220 273 242 170]
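The conversion can be sketched in Python as follows. This is a minimal illustration, not the published implementation: the function name `intsint_to_targets` is hypothetical, the calculation is assumed to work in log2(Hz) with range in octaves, and starting a relative tone from the key when it has no predecessor is a choice made here.

```python
import math


def intsint_to_targets(tones, key_hz, range_octaves):
    """Convert INTSINT tone letters to F0 targets in Hz, working in log2(Hz)."""
    mid = math.log2(key_hz)
    top = mid + range_octaves / 2
    bottom = mid - range_octaves / 2
    prev = mid  # assumption: a relative tone with no predecessor starts from the key
    targets = []
    for tone in tones:
        if tone == "T":
            p = top
        elif tone == "M":
            p = mid
        elif tone == "B":
            p = bottom
        elif tone == "H":
            p = (prev + top) / 2
        elif tone == "U":
            p = (3 * prev + top) / 4
        elif tone == "S":
            p = prev
        elif tone == "D":
            p = (3 * prev + bottom) / 4
        elif tone == "L":
            p = (prev + bottom) / 2
        else:
            raise ValueError(f"unknown INTSINT tone: {tone!r}")
        targets.append(2 ** p)  # back from log2(Hz) to Hz
        prev = p
    return targets


targets = intsint_to_targets("MTLHLHDB", key_hz=240.0, range_octaves=1.0)
print([round(f) for f in targets])
```

Computed without intermediate rounding, the results fall within about 1 Hz of the values listed above; the small differences come from rounding at each step.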
An interesting consequence of this model is that it automatically produces an asymptotic lowering of sequences such as H L H..., a pattern that has often been described both for languages with lexical tone and for languages where tone is introduced only by the intonation system. No specific downdrift or declination component needs to be added.
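The lowering can be checked numerically. The sketch below applies the L and H formulas alternately, starting from a Top tone, and collects the successive H peaks. The key and range values are illustrative, the calculation is assumed to work in log2(Hz), and the limit used for comparison is the fixed point obtained by solving H = (L + T)/2 and L = (H + B)/2 together, which gives H = (2T + B)/3.

```python
import math

key_hz, range_octaves = 240.0, 1.0  # illustrative values
mid = math.log2(key_hz)
top = mid + range_octaves / 2
bottom = mid - range_octaves / 2

p = top  # start from a Top tone
peaks = []
for _ in range(6):
    p = (p + bottom) / 2  # L: halfway between the previous target and B
    p = (p + top) / 2     # H: halfway between the previous target and T
    peaks.append(2 ** p)  # record the H peak in Hz

# Fixed point of the alternation on the log scale: H = (2T + B) / 3
limit = 2 ** ((2 * top + bottom) / 3)

print([round(f, 1) for f in peaks])
print(round(limit, 1))
```

Each H peak is lower than the one before it, and the peaks approach the limit without reaching it, which is the asymptotic behaviour described in the text.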
The particular coefficients used for calculating D and U were chosen so that in a sequence [T D], for example, the D tone is lowered by the same amount as the H tone in the sequence [T L H]. In many phonological accounts, Downstepped tones are analysed as High tones that are lowered by the presence of a "floating" Low tone, so that the surface tone [D] can be considered as underlyingly [L H].
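The equivalence follows directly from the formulas for D, L and H. Writing T and B for the Top and Bottom targets, and using the subscripted symbols P_D, P_L and P_H (notation introduced here for illustration) for the targets reached in each sequence:

```latex
\begin{aligned}
\text{[T D]:}\quad & P_D = \frac{3T + B}{4} \\
\text{[T L H]:}\quad & P_L = \frac{T + B}{2}, \qquad
P_H = \frac{P_L + T}{2} = \frac{1}{2}\left(\frac{T + B}{2} + T\right) = \frac{3T + B}{4}
\end{aligned}
```

Both sequences therefore end on the same target, one quarter of the range below T.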
- Hirst, D.J. & Di Cristo, A. (eds) 1998. Intonation Systems. A Survey of Twenty Languages. (Cambridge, Cambridge University Press). [ISBN 0-521-39513-5 (Hardback); 052139550X (Paperback)].
- Hirst, D.J. 2004. Lexical and Non-lexical Tone and Prosodic Typology. In Proceedings of International Symposium on Tonal Aspects of Languages. Beijing, March 2004, 81-88.
- Hirst, D.J. 2005. Form and function in the representation of speech prosody. In K. Hirose, D.J. Hirst & Y. Sagisaka (eds) Quantitative prosody modeling for natural speech description and generation (= Speech Communication 46 (3-4)), 334-347.