
Merge model

The Merge model, known as Merge, is an autonomous model of speech processing developed by Dennis Norris, James McQueen, and Anne Cutler in 2000.[1]

Merge attempts to simulate how humans decode speech, as a sequence of sounds, to interpret meaning. Within Merge, once information passes from one stage of processing to the next, it cannot be fed backward to a preceding stage; this strictly feedforward flow is known as bottom-up processing.[2] In interactive models such as TRACE, information is fed forward but can also be fed backward to a preceding stage, known as top-down processing.[2]
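
The contrast can be made concrete with a short sketch. Everything in it is invented for illustration (a two-word lexicon, one-symbol "features", a crude two-out-of-three repair rule); neither published model works by table lookup. The point is only the direction of information flow: the bottom-up decoder has no way to repair a mismatch, while the interactive decoder reinterprets the phonemes using lexical knowledge.

```python
# Toy contrast between bottom-up and top-down flow. The lexicon, the
# one-symbol "features", and the repair rule are all invented; neither
# Merge nor TRACE works by table lookup.

LEXICON = {("d", "o", "g"): "dog", ("d", "o", "t"): "dot"}

def decode_phonemes(features):
    # stand-in phoneme stage: one feature bundle per phoneme
    return tuple(features)

def match_word(phonemes):
    # stand-in lexical stage: exact match or nothing
    return LEXICON.get(phonemes)

def bottom_up(features):
    """Autonomous flow: each stage only feeds forward."""
    return match_word(decode_phonemes(features))

def top_down(features):
    """Interactive flow: lexical knowledge can rewrite phoneme decisions."""
    phonemes = decode_phonemes(features)
    word = match_word(phonemes)
    if word is None:  # feedback step: reinterpret phonemes lexically
        for form, candidate in LEXICON.items():
            if sum(a == b for a, b in zip(form, phonemes)) >= 2:
                return candidate
    return word

print(bottom_up(("d", "o", "g")))  # dog
print(bottom_up(("d", "o", "b")))  # None: no feedback to repair /b/
print(top_down(("d", "o", "b")))   # dog: feedback repairs /b/
```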

Merge is an expansion of the Race model, designed to process mispronunciations using only bottom-up processing,[1] in response to data suggesting that humans decode mispronunciations by top-down processing.[1] Its authors also questioned the efficiency of top-down processing, criticising the complexity of TRACE.[1][3]

Merge was criticised for being unable to account for evidence of top-down processing and for itself containing interactive elements.[4][5][6] It was further criticised for standing in unnecessary opposition to the adaptive resonance theory (ART) model, which recognises that top-down processing occurs in other cognitive processes.[5][7]

Motivation

Alternative to TRACE

Merge was proposed as an alternative to the TRACE model of speech processing.

The key criticisms of TRACE were that:

  • According to the information encapsulation principle, TRACE violates modularity by incorporating top-down processing.[1]
  • TRACE spends unnecessary processing time correcting mispronunciations: it must correct information in all preceding stages before producing an output.[1][8]
  • The top-down processes in TRACE are unnecessarily active even when the input is pronounced correctly.[1][9]

In sum, TRACE was described as inefficient, overcomplicated, and non-modular.[1]

Expansion of the Race model

Merge is an expansion of the Race model, an autonomous system developed by Cutler and Norris in 1979.[10] The Race model was criticised for being unable to process mispronunciations.[1]

To correct mispronunciations without top-down processing, phonemic (individual sound) and lexical (whole word) information had to be merged in a way that maintained modularity.[1]

Architecture

Depiction of the Merge model architecture whilst processing a mispronunciation of the word dog, pronounced /dɒg/.

In Merge, speech enters the (acoustic) feature stage, and the acoustic information is passed to the phoneme stage, which decodes each feature individually into a distinguishable sound, known as a phoneme.[1] The phonemes are then passed to the lexical stage, where a potential word match is identified.[1] If the phonemes are an exact match to the potential word, the potential word is output as text.[1]

If the phonemes are not an exact match to the potential word, as in instances of mispronunciation, decision stages are built within the phoneme stage and the lexical stage.[1]

Lexical decision stages inform phoneme decision stages of other potential word matches, merging lexical and phonemic information.[1] Potential words and potential phonemes then compete with each other until there is enough evidence for a single word match.[1]
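
A minimal sketch of this forward pass, under stated assumptions (a three-word lexicon, one phoneme per feature bundle, and a simple overlap count standing in for the model's activation dynamics), might look as follows. Note that the decision stage is built only when the lexical stage fails, and that nothing is ever sent back to an earlier stage.

```python
# Minimal sketch of Merge's forward pass. The lexicon, the one-phoneme-
# per-feature decoding, and the overlap threshold are invented stand-ins
# for the model's connectionist activation dynamics.

LEXICON = {"dog": ("d", "o", "g"), "dot": ("d", "o", "t"), "fog": ("f", "o", "g")}

def phoneme_stage(features):
    """Decode each acoustic feature bundle into a phoneme (stand-in)."""
    return tuple(features)

def lexical_stage(phonemes):
    """Report a word only on an exact phoneme match."""
    for word, form in LEXICON.items():
        if form == phonemes:
            return word
    return None

def decision_stage(phonemes):
    """Built only on a mismatch: merges lexical candidates with the
    decoded phonemes in fresh decision nodes. Nothing is sent back to
    the phoneme or lexical stages, so the flow stays feedforward."""
    scores = {w: sum(p == q for p, q in zip(f, phonemes))
              for w, f in LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= 2 else None  # illustrative threshold

def merge(features):
    phonemes = phoneme_stage(features)   # feature -> phoneme stage
    word = lexical_stage(phonemes)       # phoneme -> lexical stage
    if word is None:                     # mispronunciation: build the
        word = decision_stage(phonemes)  # decision stages on demand
    return word

print(merge(("d", "o", "g")))  # dog: exact match, no decision stage built
print(merge(("d", "o", "b")))  # dog: decision stage resolves the mismatch
```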

Merge was described as superior to TRACE because:

  • Decision stages are built only when required, so they are not unnecessarily active.[1]
  • There is no top-down processing, because lexical and phonemic information are merged into independent decision stages.[1]
  • Without top-down processing, information in preceding stages never needs correcting before an output is produced.[1]

In sum, Merge was described as more efficient and parsimonious than TRACE, as well as modular.[1]

Criticism

Merge received several criticisms following publication.

Modularity

Critics questioned the modularity of Merge because of the construction of decision stages and the merging of information.[11][12]

It was proposed that the decision stages, which are active only when there is no exact word match, must be activated by a top-down process directed from the lexical stage.[11]

It was further proposed that merging two distinct types of information, from two distinct stages, indicates an interactive aspect: in a strictly modular system, distinct types of information cannot interact with each other.[11]

In the same line of argument, it was noted that phonemic decision stages must repeatedly consult the lexical decision stages to identify a single word match, indicating a further interactive aspect.[11][12]

Evidence for top-down processing

Several critics questioned why top-down processes should not exist in speech processing when evidence for their existence has been found in other cognitive processes.[4][5] For example, it was highlighted that top-down processes have been found in the visual and auditory systems of monkeys.[4][5]

Feedback consistency

Merge was further described as lacking the architecture to accommodate feedback consistency effects,[13] which arise when one pronunciation can be written in several ways, each representing a different meaning.[6]

Feedback inconsistency has been reported to increase auditory processing time during lexical decision tasks,[6][13] suggesting that the lexical stage must direct a top-down process to the phoneme stage to correct the predicted word match.[13] When the increased processing time associated with feedback inconsistency was simulated, purely autonomous architectures were found to be incompatible with the effect.[14]
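
For illustration, a hedged sketch of the phenomenon itself, using an invented three-word lexicon: a word is feedback inconsistent when its pronunciation maps back to more than one spelling. The experimental effect is measured in response times; nothing in Merge or in its critics' proposals computes it this way.

```python
# Illustrative classifier for feedback consistency over an invented
# mini-lexicon of spelling -> pronunciation pairs.

from collections import defaultdict

TOY_LEXICON = {
    "made": "meɪd",   # one pronunciation ...
    "maid": "meɪd",   # ... two spellings: feedback inconsistent
    "dog":  "dɒg",    # one pronunciation, one spelling: consistent
}

# invert the lexicon: pronunciation -> set of spellings
spellings_for = defaultdict(set)
for spelling, pron in TOY_LEXICON.items():
    spellings_for[pron].add(spelling)

def feedback_consistent(word):
    """A word is feedback consistent when its pronunciation maps back
    to exactly one spelling."""
    return len(spellings_for[TOY_LEXICON[word]]) == 1

print(feedback_consistent("dog"))   # True  -> faster lexical decisions
print(feedback_consistent("made"))  # False -> slower lexical decisions
```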

Phonemic restoration

It was also proposed that Merge lacks the architecture needed to accommodate instances of phonemic restoration.[5]

Phonemic restoration occurs when an ambiguous sound cannot be decoded into a phoneme without subsequent contextual information.[5] It was put forward that phonemic restoration must require some form of top-down processing, so that the ambiguous sound can be decoded after the subsequent context has been processed.[4] For example, ?eel, pronounced /?il/, where /?/ represents an ambiguous sound, does not contain enough evidence to form a word match in the lexical stage: /?/ may be replaced with /p/ to form peel (/pil/), or with /h/ to form heel (/hil/).[5] Followed by the context is on the shoe, the ambiguous sound may be decoded as /h/ to form heel;[5] followed by the context is on the orange, it may be decoded as /p/ to form peel.[5]
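
A toy version of this example can make the dependency explicit. The context-to-word associations below are invented; the point is only that the phoneme decision for /?/ depends on material that arrives after the ambiguous sound, which a single-word, feedforward architecture never sees.

```python
# Toy version of the ?eel example. The semantic associations are
# hypothetical; real restoration involves graded perceptual evidence.

LEXICON = {"peel": "pil", "heel": "hil"}
ASSOCIATIONS = {                      # hypothetical semantic cues
    "is on the shoe": "heel",
    "is on the orange": "peel",
}

def restore(ambiguous_form, context):
    """Resolve /?/ in an ambiguous form such as '?il' using later context."""
    candidate = ASSOCIATIONS.get(context)
    # accept the candidate only if it matches the unambiguous remainder
    if candidate and LEXICON[candidate].endswith(ambiguous_form.lstrip("?")):
        return candidate, LEXICON[candidate]
    return None, None

print(restore("?il", "is on the shoe"))    # ('heel', 'hil')
print(restore("?il", "is on the orange"))  # ('peel', 'pil')
```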

It was identified that both Merge and TRACE accept only single-word inputs, and therefore may be unable to accept the context needed to perform phonemic restoration.[5]

ART model

Critics pointed to the interactive ART model,[5][7] which is reported to accommodate both feedback consistency and phonemic restoration.[5][7]

Recognising that top-down processes have been evidenced in cognitive processes outside speech processing, ART was designed with both top-down and bottom-up processing, allowing it to reproduce the longer processing times associated with feedback inconsistency.[5]

Furthermore, by accepting a continuous stream of input, ART is said to process preceding and subsequent context in order to restore ambiguous phonemes.[5][7]

References

  1. ^ a b c d e f g h i j k l m n o p q r s t u Norris, D.; McQueen, J. M.; Cutler, A. (2000). "Merging information in speech recognition: Feedback is never necessary". Behavioral and Brain Sciences. 23 (3): 299–325. doi:10.1017/s0140525x00003241. ISSN 0140-525X.
  2. ^ a b Kennison, S. (2018). Psychology of language: Theory and applications (1st ed.). London: Bloomsbury Publishing. pp. 75–76.
  3. ^ McClelland, J. L.; Elman, J. L. (1986). "The TRACE model of speech perception". Cognitive Psychology. 18 (1): 1–86. doi:10.1016/0010-0285(86)90015-0. ISSN 0010-0285.
  4. ^ a b c d Doeleman, T. L.; Sereno, J. A.; Jongman, A.; Sereno, S. C. (2000). "Features and feedback". Behavioral and Brain Sciences. 23 (3): 328–329. doi:10.1017/s0140525x00263243. ISSN 0140-525X.
  5. ^ a b c d e f g h i j k l m n Grossberg, S. (2000). "Brain feedback and adaptive resonance in speech perception". Behavioral and Brain Sciences. 23 (3): 332–333. doi:10.1017/s0140525x00303247. ISSN 0140-525X.
  6. ^ a b c Lacruz, I.; Folk, J. R. (2004). "Feedforward and Feedback Consistency Effects for High- and Low-Frequency Words in Lexical Decision and Naming". The Quarterly Journal of Experimental Psychology Section A. 57 (7): 1261–1262. doi:10.1080/02724980343000756. ISSN 0272-4987.
  7. ^ a b c d Carpenter, G. A.; Grossberg, S. (1987). "ART 2: self-organization of stable category recognition codes for analog input patterns". Applied Optics. 26 (23): 4919–4926. doi:10.1364/ao.26.004919.
  8. ^ Frauenfelder, U. H.; Segui, J.; Dijkstra, T. (1990). "Lexical effects in phonemic processing: Facilitatory or inhibitory?". Journal of Experimental Psychology: Human Perception and Performance. 16 (1): 77–91.
  9. ^ Frauenfelder, U. H.; Peeters, G. (1998). "Simulating the time-course of spoken word recognition: An analysis of lexical competition in TRACE". In Grainger, J.; Jacobs, A. M. (eds.). Localist Connectionist Approaches To Human Cognition. London: Erlbaum Associates.
  10. ^ Cutler, A.; Norris, D. (1979). "Monitoring sentence comprehension". In Cooper, W. E.; Walker, E. C. T. (eds.). Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett (1st ed.). Hillsdale: Erlbaum Associates. pp. 117–126.
  11. ^ a b c d Appelbaum, I. (2000). "Merging information versus speech recognition". Behavioral and Brain Sciences. 23 (3): 325–326.
  12. ^ a b Connine, C. M.; LoCasto, P. C. (2000). "Inhibition". Behavioral and Brain Sciences. 23 (3): 328.
  13. ^ a b c Ziegler, J. C.; Van Orden, G. C. (2000). "Feedback consistency effects". Behavioral and Brain Sciences. 23 (3): 351–352.
  14. ^ Jacobs, A. M.; Rey, A.; Ziegler, J. C.; Grainger, J. (1998). "MROM-P: An interactive activation, multiple read-out model of orthographic and phonological processes in visual word recognition". In Grainger, J.; Jacobs, A. M. (eds.). Localist Connectionist Approaches To Human Cognition (1st ed.). Erlbaum Associates. pp. 152–153, 171–172.
See also

Types of speech processing: bottom-up versus top-down processing; interactive versus autonomous models of speech processing.