
Wikipedia:Sandbox: Difference between revisions

From Wikipedia, the free encyclopedia
Prgraben (talk | contribs)
New entry "Neural Automata"

'''Neural Automata''' ('''NA''') <ref name="CarmantiniEA15"> Carmantini
GS, beim Graben P, Desroches M, Rodrigues S. Turing computation with
recurrent artificial neural networks. In: Proceedings of the NIPS Workshop
on Cognitive Computation: Integrating neural and symbolic approaches
[Internet]. 2015. pp. 5–13, http://arxiv.org/abs/1511.01427.</ref><ref
name="CarmantiniEA17">Carmantini GS, beim Graben P, Desroches M,
Rodrigues S. A modular architecture for transparent computation in
recurrent neural networks. Neural Networks. 2017;85:85–105.</ref> are
[[neural networks]] that emulate either any particular [[Turing Machine]]
(TM) or the [[Universal Turing Machine]] (UTM) in general. Therefore, they
are capable of executing arbitrary partially computable functions.

= General Introduction =

NA provide a theoretical framework for extracting symbolic
representations and computations (i.e. the neural code) directly from
neuronal data. The central question of this framework is: assuming that
the brain performs computations, how can these computations be
understood in terms of TM, and how are they implemented in the brain?
This is addressed by employing [[algebraic representation theory]],
which makes it possible to map TM onto biologically plausible
[[neural network]] models and subsequently to compare (or correlate) the
observables of the model with neuronal data (e.g. from
[[central pattern generator]]s); see Figure [[#fig:1|1]]. Moreover, the
symbols and computations are mapped onto the neural network’s activation
functions, [[synapse]]s and network architecture, which implies that
theoretical principles can be developed to unravel how memory and
computation are achieved by neurophysiological processes.

[[File:Neural_Representation.png|thumb|none|alt=<span>Aim: Mapping
Turing machines to biologically plausible neural networks to ultimately
extract the neural code.</span><span
label="fig:1"></span>|<span>Aim: Mapping Turing machines to
biologically plausible neural networks to ultimately extract the neural
code.</span><span label="fig:1"></span>]]

= Representation of Turing Computations in NA =

Algebraic representation theory together with [[Vector Symbolic
Architectures]] (VSA)<ref name="Gayler06"> Gayler RW. Vector symbolic
architectures are a viable alternative for Jackendoff’s challenges. Behavioral
and Brain Sciences. 2006 Feb;29(01):78–9.</ref><ref name="Plate95">
Plate TA. Holographic reduced representations. IEEE Transactions on Neural
Networks. 1995;6(3):623–41.</ref><ref name="Kanerva09"> Kanerva P.
Hyperdimensional computing: An introduction to computing in distributed
representation with high-dimensional random vectors. Cognitive
Computation. 2009;1(2):139–59.</ref><ref name="GrabenPotthast09a">
beim Graben P, Potthast R. Inverse problems in dynamic cognitive
modeling. Chaos. 2009;19(1):015103.</ref> enables the passage from a
symbolic space (<math display="inline">S</math>) to a vector space
(<math display="inline">V</math>). This is necessary because the space
of symbols is not endowed with sufficient mathematical structure,
whereas measurements and observables are real numbers. Abstractly, one
defines a map, <math display="inline">\pi: S \rightarrow V</math>, that
makes it possible to combine different elements of the symbolic space
(i.e. <math display="inline">s \in S</math>) and to represent them
uniquely in terms of the canonical basis vectors of the vector space
(<math display="inline">\pi(s) \in V</math>). Moreover, Turing
computations (<math display="inline">\lambda : S \rightarrow S</math>)
defined on the space of symbols can be realized via a representation
(<math display="inline">\Lambda : V \rightarrow V</math>) defined on
the vector space. Finally, it is crucial that the map <math
display="inline">\pi</math> be structure-preserving, satisfying the
commutative diagram shown in Figure [[#fig:2|2]]. In other words, the
mapping is well-defined if it satisfies the relation <math
display="inline">\Lambda (\pi(s)) = \pi(\lambda(s))</math>.<br />

[[File:comdiag.png|thumb|none|188px|alt=<span>Commutative diagram and
structure-preserving map <math
display="inline">\pi</math>.</span><span
label="fig:2"></span>|<span>Commutative diagram and
structure-preserving map <math display="inline">\pi</math>.</span><span
label="fig:2"></span>]]
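The structure-preservation condition can be illustrated with a toy example. The symbols, the map <math display="inline">\lambda</math>, and the matrix below are invented for illustration only and are not taken from the cited papers: symbols become canonical basis vectors, and a symbolic map is realized as a permutation matrix.

```python
import numpy as np

# Hypothetical toy example: represent a three-symbol space
# S = {a, b, c} by the canonical basis vectors of R^3.
symbols = ["a", "b", "c"]
pi = {s: np.eye(3)[i] for i, s in enumerate(symbols)}

# A symbolic map lambda: S -> S (here a cyclic shift a->b->c->a) ...
lam = {"a": "b", "b": "c", "c": "a"}

# ... is realized on V = R^3 by a permutation matrix Lambda,
# whose column i holds pi(lam(symbols[i])).
Lam = np.array([[0, 0, 1],
                [1, 0, 0],
                [0, 1, 0]])

# Structure preservation: Lambda(pi(s)) == pi(lam(s)) for every s in S.
for s in symbols:
    assert np.allclose(Lam @ pi[s], pi[lam[s]])
```

Any finite symbolic map can be represented this way, which is the sense in which <math display="inline">\pi</math> makes the commutative diagram close.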

= Specific Implementation of NA =

NA have been introduced by G. Carmantini, P. beim Graben, M. Desroches,
and S. Rodrigues<ref name="CarmantiniEA15" /><ref
name="CarmantiniEA17" /> as parsimonious, transparent and modular
neural network implementations of [[Nonlinear Dynamical Automata]]
(NDA). <ref name="GrabenPotthast09a" /> <ref name="Moore90"> Moore
C. Unpredictability and undecidability in dynamical systems. Physical Review
Letters. 1990;64(20):2354–7.</ref><ref name="Moore91a">Moore C.
Generalized shifts: Unpredictability and undecidability in dynamical systems.
Nonlinearity. 1991;4:199–230.</ref><ref
name="GrabenLiebscherSaddy00"> beim Graben P, Liebscher T, Saddy JD.
Parsing ambiguous context-free languages by dynamical systems:
Disambiguation and phase transitions in neural networks with evidence from
event-related brain potentials (ERP). In: Jokinen K, Heylen D, Njiholt A,
editors. Learning to behave. Enschede: Universiteit Twente; 2000. pp.
119–35. (TWLT 18; vol. II: Internalising Knowledge).</ref> <ref
name="GrabenJurishEA04"> beim Graben P, Jurish B, Saddy D, Frisch S.
Language processing by dynamical systems. International Journal of
Bifurcation and Chaos. 2004;14(2):599–621.</ref><ref
name="GrabenGerthVasishth08"> beim Graben P, Gerth S, Vasishth S.
Towards dynamical system models of language-related brain potentials.
Cognitive Neurodynamics [Internet]. 2008;2(3):229–55.</ref> This
contrasts them with Google DeepMind’s Neural Turing Machines (NTM),
<ref name="GravesWayneDanihelka14">Graves A, Wayne G, Danihelka I.
Neural Turing machines [Internet]. Google DeepMind; 2014.
http://arxiv.org/abs/1410.5401.</ref> which are essentially neural
implementations of conventional von Neumann computers and require
gradient-descent learning of at least 60,000 parameters.

The construction of NA relies essentially on fundamental insights of
automata and dynamical systems theory. The complete mapping is shown in
Figure [[#fig:3|3]]. In a first step, NDA are built from symbolic dynamics,
shift spaces, and Gödel encodings, as a particular instance of VSA, making
NA completely transparent and hence explainable.<ref
name="DoranSchulzBesold17">Doran D, Schulz S, Besold TR. What does
explainable AI really mean? A new conceptualization of perspectives; 2017.
http://arxiv.org/abs/1710.00794 </ref>

[[File:Complete_mapping.png|thumb|none|alt=<span>A complete
mapping from Turing machines to neural networks.</span><span
label="fig:3"></span>|<span>A complete mapping from Turing machines
to neural networks.</span><span label="fig:3"></span>]]

In a second step, the resulting nonlinear dynamics is mediated by piecewise
affine-linear maps that are defined over fractal domains of an appropriately
constructed state space. This state space is given as the unit square
embedded into a two-dimensional real vector space. Each dimension
Gödel-encodes the tape of a Turing machine in one direction from its
read/write head.
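As a minimal sketch of this encoding, assuming a binary tape alphabet and a standard <math display="inline">g</math>-adic expansion (the exact conventions of the cited papers may differ in detail), each half-tape becomes a number in the unit interval:

```python
# Hedged sketch: Goedel encoding of a tape configuration into the unit
# square, assuming a binary tape alphabet {0, 1}, i.e. g = 2.
def godel(cells, g=2):
    """Encode a symbol sequence as sum_i cells[i] * g**-(i+1), in [0, 1)."""
    return sum(c * g ** -(i + 1) for i, c in enumerate(cells))

# Tape ... 0 1 [1] 0 1 ... with the head on the bracketed cell:
left = [1, 0]       # cells left of the head, read outward from the head
right = [1, 0, 1]   # cell under the head, then cells to its right

x = godel(left)     # encodes the left half-tape
y = godel(right)    # encodes the right half-tape, head cell first
# (x, y) is a point in the unit square representing the configuration
```

Because shifting the head one cell corresponds to multiplying or dividing these expansions by <math display="inline">g</math> (plus a symbol term), tape moves become affine operations on <math display="inline">(x, y)</math>.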

In a final third step, the NDA is implemented as a recurrent neural network,
comprising three modules. The ''machine configuration layer'' consists of
two neurons with activation space <math display="inline">[0, 1]</math>
representing the machine configuration in the unit square. The ''branch
selection layer'' carries out a classification task for deciding which branch of
the piecewise affine-linear NDA map is required for the next computation.
The ''linear transformation layer'' stores the parameters of the NDA map
and transforms the present state of the machine configuration layer to its
successor.
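One update of this three-module pipeline can be sketched as follows. The 2×2 partition and the affine coefficients below are invented for illustration and do not correspond to any particular Turing machine from the cited papers:

```python
import numpy as np

def nda_step(state, branches):
    x, y = state
    # Branch selection layer: classify which cell of the unit-square
    # partition the current machine configuration lies in
    # (Heaviside-style threshold comparisons).
    i, j = int(x >= 0.5), int(y >= 0.5)
    # Linear transformation layer: apply the affine map stored for that
    # branch to the machine configuration layer's state.
    A, b = branches[(i, j)]
    return A @ np.array([x, y]) + b

# One affine map (A, b) per partition cell; coefficients are made up
# and only demonstrate the mechanics of the piecewise map.
branches = {
    (0, 0): (np.diag([2.0, 0.5]), np.array([0.0, 0.5])),
    (0, 1): (np.diag([2.0, 2.0]), np.array([0.0, -1.0])),
    (1, 0): (np.diag([0.5, 2.0]), np.array([0.5, 0.0])),
    (1, 1): (np.diag([0.5, 0.5]), np.array([0.5, 0.25])),
}

state = (0.3, 0.7)                    # machine configuration layer
new_state = nda_step(state, branches) # its successor configuration
```

Iterating `nda_step` then corresponds to running the encoded machine, one transition per network update.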

Compared to previous neural implementations of TM in computational
neuroscience, <ref name="SiegelmannSontag95">Siegelmann HT, Sontag
ED. On the computational power of neural nets. Journal of Computer and
System Sciences. 1995;50(1):132–50.</ref> NA are much more
parsimonious, comprising only some 200 neural units. They do not
require any training, as their architecture and synaptic connectivity
are explicitly and transparently constructed from the TM’s transition
table.

NA are different from [[Neural Field Automata]] (NFA), which implement
NDA by means of [[Dynamic Neural Fields]] (DNF).<ref
name="GrabenPotthast14">beim Graben P, Potthast R. Universal neural
field computation. In: Coombes S, Graben P beim, Potthast R, Wright JJ,
editors. Neural fields: Theory and applications. Berlin: Springer; 2014. pp.
299–318. </ref>

= Example =

In the introductory publications,<ref name="CarmantiniEA15" /><ref
name="CarmantiniEA17" /> NA have been used to construct [[central
pattern generator]]s (CPG) as [[finite-state automata]], and a
diagnosis-repair parser for ambiguous [[context-free grammar]]s from an
interacting automata network (IAN) in a modular way. Together with neural
observation models, they allow for correlation analyses with
neurophysiological data from cognitive neuroscience, e.g. simulated ERP,
MEG, or fMRI.

The animated Figure [[#fig:4|4]] shows the dynamics exhibited by the
machine configuration layer for the processing of the linguistic example
in <ref name="CarmantiniEA17" />. It refers to a psycholinguistic
experiment conducted in German using event-related brain potentials
(ERP). Since German allows for relatively free word order, sentences are
not restricted to the canonical subject-verb-object (SVO) order of
English. Thus, two kinds of experimental conditions were presented to
the subjects of the study: (1) control sentences with “normal” SO
structure, and (2) sentences with deviant OS structure. In the ERP
experiment a late positivity (P600) has been reported, reflecting a
“garden path” caused by syntactic diagnosis and repair processes for OS
sentences.

In the simulation, one neural unit encodes the content of the stack, and
the other one the remaining input of a pushdown automaton processing a
context-free grammar (for details see <ref name="CarmantiniEA17" />).
In analogy to the ERP study, many realizations of the NA dynamics have
been simulated, starting from randomly prepared initial conditions. The
moving rectangles shown in Figure [[#fig:4|4]] represent the envelopes
of clouds of individual microstates, which can be regarded as neural
macrostates.
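Such envelopes can be computed, for example, as axis-aligned bounding rectangles of the simulated microstate clouds. The data below is a random placeholder, not the published simulation:

```python
import numpy as np

# A cloud of 500 simulated microstates (x, y) in the unit square,
# standing in for one time step of the ensemble simulation.
rng = np.random.default_rng(1)
cloud = rng.uniform(0.4, 0.6, size=(500, 2))

# Axis-aligned envelope of the cloud: the rectangle spanned by the
# componentwise minima and maxima, regarded as the neural macrostate.
lower, upper = cloud.min(axis=0), cloud.max(axis=0)
```

Tracking `(lower, upper)` over time yields the moving rectangles of the animation.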

Figure [[#fig:5|5]] presents synthetic ERPs of the simulation from
Figure [[#fig:4|4]].<ref name="CarmantiniEA17" /><ref
name="GrabenGerthVasishth08" /> These are mean activations of the NA,
averaged over all network units and over all realizations.
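The averaging described here amounts to a mean over units and realizations at each time step; a sketch with random placeholder data (the array shape and values are assumptions, not the published activations):

```python
import numpy as np

# Placeholder activations of shape (realizations, time steps, units):
# 100 simulated runs, 50 time steps, 2 machine-configuration units.
rng = np.random.default_rng(0)
activations = rng.uniform(0.0, 1.0, size=(100, 50, 2))

# Synthetic ERP: mean over all realizations (axis 0) and all network
# units (axis 2), leaving one average activation per time step.
synthetic_erp = activations.mean(axis=(0, 2))
```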

[[File:synerp.png|thumb|none|alt=<span>Synthetic ERP of an NA,
processing canonical SO sentences (“normal”) versus deviant OS sentences
(“garden path”).</span><span label="fig:5"></span>|<span>Synthetic
ERP of an NA, processing canonical SO sentences (“normal”) versus deviant
OS sentences (“garden path”).</span><span label="fig:5"></span>]]

The network observable clearly differs when processing deviant OS
sentences in comparison to “normal” SO sentences.

The results shown in Figure [[#fig:5|5]] could be used for further
correlation analyses between neurophysiological observables and
simulated data from NA.

= References =

<references />

Revision as of 11:23, 29 October 2019
