
Merge model

The Merge model, known as Merge, is an autonomous model of speech processing developed by Dennis Norris, James McQueen, and Anne Cutler in 2000.[1]

Merge attempts to simulate how humans decode speech, as a sequence of sounds, to interpret meaning. Within Merge, once information passes from one stage of processing to the next, it cannot be fed backward to a preceding stage; this strictly feedforward flow is known as bottom-up processing.[2] In interactive models such as TRACE, information is fed forward but can also be fed backward to a preceding stage, known as top-down processing.[2]
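
The contrast can be made concrete with a short sketch. Everything in it is invented for illustration (a two-word lexicon, one-symbol "features", a crude two-out-of-three repair rule); neither published model works by table lookup. The point is only the direction of information flow: the bottom-up decoder has no way to repair a mismatch, while the interactive decoder reinterprets the phonemes using lexical knowledge.

```python
# Toy contrast between bottom-up and top-down flow. The lexicon, the
# one-symbol "features", and the repair rule are all invented; neither
# Merge nor TRACE works by table lookup.

LEXICON = {("d", "o", "g"): "dog", ("d", "o", "t"): "dot"}

def decode_phonemes(features):
    # stand-in phoneme stage: one feature bundle per phoneme
    return tuple(features)

def match_word(phonemes):
    # stand-in lexical stage: exact match or nothing
    return LEXICON.get(phonemes)

def bottom_up(features):
    """Autonomous flow: each stage only feeds forward."""
    return match_word(decode_phonemes(features))

def top_down(features):
    """Interactive flow: lexical knowledge can rewrite phoneme decisions."""
    phonemes = decode_phonemes(features)
    word = match_word(phonemes)
    if word is None:  # feedback step: reinterpret phonemes lexically
        for form, candidate in LEXICON.items():
            if sum(a == b for a, b in zip(form, phonemes)) >= 2:
                return candidate
    return word

print(bottom_up(("d", "o", "g")))  # dog
print(bottom_up(("d", "o", "b")))  # None: no feedback to repair /b/
print(top_down(("d", "o", "b")))   # dog: feedback repairs /b/
```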

Merge is an expansion of the Race model, designed to process mispronunciations using only bottom-up processing,[1] in response to data suggesting that humans decode mispronunciations by top-down processing.[1] Its authors also questioned the efficiency of top-down processing, criticising the complexity of TRACE.[1][3]

Merge was criticised for being unable to account for evidence of top-down processing and for itself containing interactive elements.[4][5][6] It was further criticised for standing in unnecessary opposition to the adaptive resonance theory (ART) model, which recognises that top-down processing occurs in other cognitive processes.[5][7]

Motivation

Alternative to TRACE

Merge was proposed as an alternative to the TRACE model of speech processing.

The key criticisms of TRACE were that:

  • According to the information encapsulation principle, TRACE violates modularity by incorporating top-down processing.[1]
  • TRACE spends unnecessary processing time correcting mispronunciations: it must correct information in all preceding stages before producing an output.[1][8]
  • The top-down processes in TRACE are unnecessarily active even when the input is pronounced correctly.[1][9]

In sum, TRACE was described as inefficient, overcomplicated, and non-modular.[1]

Expansion of the Race model

Merge is an expansion of the Race model, an autonomous system developed by Cutler and Norris in 1979.[10] The Race model was criticised for being unable to process mispronunciations.[1]

To correct mispronunciations without top-down processing, phonemic (individual sound) and lexical (whole word) information had to be merged in a way that maintained modularity.[1]

Architecture

Depiction of the Merge model architecture whilst processing a mispronunciation of the word dog, pronounced /dɒg/.

In Merge, speech enters the (acoustic) feature stage, and the acoustic information is passed to the phoneme stage, which decodes each feature individually into a distinguishable sound, known as a phoneme.[1] The phonemes are then passed to the lexical stage, where a potential word match is identified.[1] If the phonemes are an exact match to the potential word, the potential word is output as text.[1]

If the phonemes are not an exact match to the potential word, as in instances of mispronunciation, decision stages are built within the phoneme stage and the lexical stage.[1]

Lexical decision stages inform phoneme decision stages of other potential word matches, merging lexical and phonemic information.[1] Potential words and potential phonemes then compete with each other until there is enough evidence for a single word match.[1]
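
A minimal sketch of this forward pass, under stated assumptions (a three-word lexicon, one phoneme per feature bundle, and a simple overlap count standing in for the model's activation dynamics), might look as follows. Note that the decision stage is built only when the lexical stage fails, and that nothing is ever sent back to an earlier stage.

```python
# Minimal sketch of Merge's forward pass. The lexicon, the one-phoneme-
# per-feature decoding, and the overlap threshold are invented stand-ins
# for the model's connectionist activation dynamics.

LEXICON = {"dog": ("d", "o", "g"), "dot": ("d", "o", "t"), "fog": ("f", "o", "g")}

def phoneme_stage(features):
    """Decode each acoustic feature bundle into a phoneme (stand-in)."""
    return tuple(features)

def lexical_stage(phonemes):
    """Report a word only on an exact phoneme match."""
    for word, form in LEXICON.items():
        if form == phonemes:
            return word
    return None

def decision_stage(phonemes):
    """Built only on a mismatch: merges lexical candidates with the
    decoded phonemes in fresh decision nodes. Nothing is sent back to
    the phoneme or lexical stages, so the flow stays feedforward."""
    scores = {w: sum(p == q for p, q in zip(f, phonemes))
              for w, f in LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= 2 else None  # illustrative threshold

def merge(features):
    phonemes = phoneme_stage(features)   # feature -> phoneme stage
    word = lexical_stage(phonemes)       # phoneme -> lexical stage
    if word is None:                     # mispronunciation: build the
        word = decision_stage(phonemes)  # decision stages on demand
    return word

print(merge(("d", "o", "g")))  # dog: exact match, no decision stage built
print(merge(("d", "o", "b")))  # dog: decision stage resolves the mismatch
```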

Merge was described as superior to TRACE because:

  • Decision stages are built only when required, so they are not unnecessarily active.[1]
  • There is no top-down processing, because lexical and phonemic information are merged into independent decision stages.[1]
  • Without top-down processing, information in preceding stages never needs correcting before an output is produced.[1]

In sum, Merge was described as more efficient and parsimonious than TRACE, as well as modular.[1]

Criticism

Merge received several criticisms following publication.

Modularity

Critics questioned the modularity of Merge because of the construction of decision stages and the merging of information.[11][12]

It was proposed that the decision stages, which are active only when there is no exact word match, must be activated by a top-down process directed from the lexical stage.[11]

It was further proposed that merging two distinct types of information, from two distinct stages, indicates an interactive aspect: in a strictly modular system, distinct types of information cannot interact with each other.[11]

In the same line of argument, it was noted that phonemic decision stages must repeatedly consult the lexical decision stages to identify a single word match, indicating a further interactive aspect.[11][12]

Evidence for top-down processing

Several critics questioned why top-down processes should not exist in speech processing when evidence for their existence has been found in other cognitive processes.[4][5] For example, it was highlighted that top-down processes have been found in the visual and auditory systems of monkeys.[4][5]

Feedback consistency

Merge was further described as lacking the architecture to accommodate feedback consistency effects,[13] which arise when one pronunciation can be written in several ways, each representing a different meaning.[6]

Feedback inconsistency has been reported to increase auditory processing time during lexical decision tasks,[6][13] suggesting that the lexical stage must direct a top-down process to the phoneme stage to correct the predicted word match.[13] When the increased processing time associated with feedback inconsistency was simulated, purely autonomous architectures were found to be incompatible with the effect.[14]
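
For illustration, a hedged sketch of the phenomenon itself, using an invented three-word lexicon: a word is feedback inconsistent when its pronunciation maps back to more than one spelling. The experimental effect is measured in response times; nothing in Merge or in its critics' proposals computes it this way.

```python
# Illustrative classifier for feedback consistency over an invented
# mini-lexicon of spelling -> pronunciation pairs.

from collections import defaultdict

TOY_LEXICON = {
    "made": "meɪd",   # one pronunciation ...
    "maid": "meɪd",   # ... two spellings: feedback inconsistent
    "dog":  "dɒg",    # one pronunciation, one spelling: consistent
}

# invert the lexicon: pronunciation -> set of spellings
spellings_for = defaultdict(set)
for spelling, pron in TOY_LEXICON.items():
    spellings_for[pron].add(spelling)

def feedback_consistent(word):
    """A word is feedback consistent when its pronunciation maps back
    to exactly one spelling."""
    return len(spellings_for[TOY_LEXICON[word]]) == 1

print(feedback_consistent("dog"))   # True  -> faster lexical decisions
print(feedback_consistent("made"))  # False -> slower lexical decisions
```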

Phonemic restoration

It was also proposed that Merge lacks the architecture needed to accommodate instances of phonemic restoration.[5]

Phonemic restoration occurs when an ambiguous sound cannot be decoded into a phoneme without subsequent contextual information.[5] It was put forward that phonemic restoration must require some form of top-down processing, so that the ambiguous sound can be decoded after the subsequent context has been processed.[4] For example, ?eel, pronounced /?il/, where /?/ represents an ambiguous sound, does not contain enough evidence to form a word match in the lexical stage: /?/ may be replaced with /p/ to form peel (/pil/), or with /h/ to form heel (/hil/).[5] Followed by the context is on the shoe, the ambiguous sound may be decoded as /h/ to form heel;[5] followed by the context is on the orange, it may be decoded as /p/ to form peel.[5]
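
A toy version of this example can make the dependency explicit. The context-to-word associations below are invented; the point is only that the phoneme decision for /?/ depends on material that arrives after the ambiguous sound, which a single-word, feedforward architecture never sees.

```python
# Toy version of the ?eel example. The semantic associations are
# hypothetical; real restoration involves graded perceptual evidence.

LEXICON = {"peel": "pil", "heel": "hil"}
ASSOCIATIONS = {                      # hypothetical semantic cues
    "is on the shoe": "heel",
    "is on the orange": "peel",
}

def restore(ambiguous_form, context):
    """Resolve /?/ in an ambiguous form such as '?il' using later context."""
    candidate = ASSOCIATIONS.get(context)
    # accept the candidate only if it matches the unambiguous remainder
    if candidate and LEXICON[candidate].endswith(ambiguous_form.lstrip("?")):
        return candidate, LEXICON[candidate]
    return None, None

print(restore("?il", "is on the shoe"))    # ('heel', 'hil')
print(restore("?il", "is on the orange"))  # ('peel', 'pil')
```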

It was identified that both Merge and TRACE accept only single-word inputs, and therefore may be unable to accept the context needed to perform phonemic restoration.[5]

ART model

Critics pointed to the interactive ART model,[5][7] which is reported to accommodate both feedback consistency and phonemic restoration.[5][7]

Recognising that top-down processes have been evidenced in cognitive processes outside speech processing, ART was designed with both top-down and bottom-up processing, allowing it to reproduce the longer processing times associated with feedback inconsistency.[5]

Furthermore, by accepting a continuous stream of input, ART is said to process preceding and subsequent context in order to restore ambiguous phonemes.[5][7]

References

  1. ^ a b c d e f g h i j k l m n o p q r s t u Norris, D.; McQueen, J. M.; Cutler, A. (2000). "Merging information in speech recognition: Feedback is never necessary". Behavioral and Brain Sciences. 23 (3): 299–325. doi:10.1017/s0140525x00003241. ISSN 0140-525X.
  2. ^ a b Kennison, S. (2018). Psychology of language: Theory and applications (1st ed.). London: Bloomsbury Publishing. pp. 75–76.
  3. ^ McClelland, J. L.; Elman, J. L. (1986). "The TRACE model of speech perception". Cognitive Psychology. 18 (1): 1–86. doi:10.1016/0010-0285(86)90015-0. ISSN 0010-0285.
  4. ^ a b c d Doeleman, T. L.; Sereno, J. A.; Jongman, A.; Sereno, S. C. (2000). "Features and feedback". Behavioral and Brain Sciences. 23 (3): 328–329. doi:10.1017/s0140525x00263243. ISSN 0140-525X.
  5. ^ a b c d e f g h i j k l m n Grossberg, S. (2000). "Brain feedback and adaptive resonance in speech perception". Behavioral and Brain Sciences. 23 (3): 332–333. doi:10.1017/s0140525x00303247. ISSN 0140-525X.
  6. ^ a b c Lacruz, I.; Folk, J. R. (2004). "Feedforward and Feedback Consistency Effects for High- and Low-Frequency Words in Lexical Decision and Naming". The Quarterly Journal of Experimental Psychology Section A. 57 (7): 1261–1262. doi:10.1080/02724980343000756. ISSN 0272-4987.
  7. ^ a b c d Carpenter, G. A.; Grossberg, S. (1987). "ART 2: self-organization of stable category recognition codes for analog input patterns". Applied Optics. 26 (23): 4919–4926. doi:10.1364/ao.26.004919.
  8. ^ Frauenfelder, U. H.; Segui, J.; Dijkstra, T. (1990). "Lexical effects in phonemic processing: Facilitatory or inhibitory?". Journal of Experimental Psychology: Human Perception and Performance. 16 (1): 77–91.
  9. ^ Frauenfelder, U. H.; Peeters, G. (1998). "Simulating the time-course of spoken word recognition: An analysis of lexical competition in TRACE". In Grainger, J.; Jacobs, A. M. (eds.). Localist Connectionist Approaches To Human Cognition. London: Erlbaum Associates.
  10. ^ Cutler, A.; Norris, D. (1979). "Monitoring sentence comprehension". In Cooper, W. E.; Walker, E. C. T. (eds.). Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett (1st ed.). Hillsdale: Erlbaum Associates. pp. 117–126.
  11. ^ a b c d Appelbaum, I. (2000). "Merging information versus speech recognition". Behavioral and Brain Sciences. 23 (3): 325–326.
  12. ^ a b Connine, C. M.; LoCasto, P. C. (2000). "Inhibition". Behavioral and Brain Sciences. 23 (3): 328.
  13. ^ a b c Ziegler, J. C.; Van Orden, G. C. (2000). "Feedback consistency effects". Behavioral and Brain Sciences. 23 (3): 351–352.
  14. ^ Jacobs, A. M.; Rey, A.; Ziegler, J. C.; Grainger, J. (1998). "MROM-P: An interactive activation, multiple read-out model of orthographic and phonological processes in visual word recognition". In Grainger, J.; Jacobs, A. M. (eds.). Localist Connectionist Approaches To Human Cognition (1st ed.). Erlbaum Associates. pp. 152–153, 171–172.
See also

Types of speech processing: bottom-up versus top-down processing; interactive versus autonomous models of speech processing.