Vector Symbolic Architectures

Motivation

Vector Symbolic Architecture(s) (VSA) is a term coined by psychologist Ross W. Gayler [1] to refer to a class of connectionist network architectures developed since the mid 1990s. Such architectures were developed in an attempt to address the challenges to connectionism posed by Jerry Fodor, Zenon Pylyshyn, Ray Jackendoff, and other researchers. These researchers have argued that connectionist networks are – either in principle or in practice – incapable of modeling crucial features cognition (language and thought), such as compositionality (the ability to put together complex thoughts or sentences from simpler thoughts or words) and systematicity (the fact that someone who understands the sentence John kissed Mary can also understand the sentence Mary kissed John).[2] VSA addresses these challenges by providing a binding operator associating individual (John, Mary) with roles (AGENT, PATIENT) and a superposition operator that allows multiple associations to be composed into a coherent whole. The name VSA comes from the fact that vectors are the sole means of representing all entities (roles, fillers, compositions).

Origins

VSA originated in Paul Smolensky's Tensor Product Model [3], which implemented binding via outer product. Although this approach worked in principle, it was subject to a combinatorial explosion problem: binding two vectors of N elements resulted in a square matrix of N2 elements; binding that matrix with another vector of N elements resulted in a cube of N3 elements, etc.

Approaches

All VSA approaches implement composition as vector addition and solve the combinatorial explosion problem by reducing the outer product back to a single vector of N dimensions. Individual approaches differ in the kind of reduction they perform and the contents of the vectors they use. Tony Plate's Holographic Reduced Representations (HRR)[4] use real-valued vectors and reduce the matrix through circular convolution. Pentti Kanerva's Binary Spatter Codes (BSC)[5] use binary (0/1) vectors and ignore the off-diagonal matrix elements, so that binding is implemented as elementwise (Hadamard) product ⊗. Ross Gayler's Multiply / Add / Permute (MAP) architecture likewise ignores the non-diagonal elements but uses +1/-1 instead of 0/1, so that each vector is its own multiplicative inverse. This self-inverse property is convenient for recovering the elements of individual associations but necessitates some additional mechanism to 'protect' or quote associations embedded in others (Bill knows that [John kissed Mary]). Hence, MAP uses a permutation operator for quoting/protecting. Permutation is also useful for encoding order or precedence: rather than simply associating some other item A with an item B, we can represent the fact that A comes before B by binding the vector for A with the permutation of the vector for B; i.e., A⊗P(B).

Having solved the problem of combinatorial explosion, VSA approaches are free to use vectors of arbitrarily large dimensionality (size); N=10,000 is typical. Such large vectors provide a number of desirable features; for example, the space of such vectors contains on the order of 2N nearly-orthogonal vectors [6] , and each such vector can be degraded by up to 30% and still be closer to its original form than to any of the other vectors in the space.[7] As with ordinary arithmetic, vector multiplication and addition are associative and commuative, and multiplication distributes over addition.

The use of large vectors also makes VSA a variety of distributed representation, where the identity of a symbol is encoded across all elements of the representation (vector), and each element the representation takes part in representing the symbol. Chris Eliasmith and his colleagues have argued that, compared to 'localist' representations (one element per symbol), such representations are more capable of scaling up to nontrivial problems. [8] Stewart, Bekolay, and Eliasmith have also provided an implemention of HRR in spiking neurons , which they embed in an architecture that uses them to do processing,[9] adding to the plausibility of VSA as a model of cognition in humans and other animals.

Unlike many traditional neural networks, VSAs do not rely on backpropagation or other compute-intensive learning algorithms, and, as shown in the example below, can often induce a pattern from a single example. Further, elementwise multiplication and addition are embarrassingly parallel operations that can be performed efficiently (in principle, in constant time).

Noise

Because they use finite dimensionality and finite precision, all VSA approaches can be considered a variety of lossy (noisy) compression. To recover the original vectors in the presence of noise, VSA employs a cleanup memory that stores the vector representations of the items of interest (Mary, AGENT, etc.) Cleanup memory can be implemented as a simple list of items, as a Hopfield Network, or in a spiking-neuron model.[15]

Example

This example, due to Pentti Kanerva[10], shows how the MAP variety of VSA can be used to perform analogical reasoning.

Consider the knowledge we have about a particular country: for example, the name of our country is United States of America, and our capital is Washington DC; we might say that Washington DC fills the CAPITAL role for the US. Our monetary unit is the dollar. The name of our neighbor to the south is Mexico; its capital is Mexico City, and its currency is the peso. Using the notation <X> to mean the vector representation of X, we can represent this knowledge as follows:

V1 = [<NAME>⊗<USA> + <CAP>⊗<WDC> + <MON>⊗<DOL> ]

V2 = [<NAME>⊗<MEX> + <CAP>⊗<MXC> + <MON>⊗<PES> ]

To compare the United States to Mexico, we can take the vector product V1V2 of these representations, which is itself just another vector:

[<NAME>⊗<USA> + <CAP>⊗<WDC> + <MON>⊗<DOL> ] ⊗
[<NAME>⊗<MEX> + <CAP>⊗<MXC> + <MON>⊗<PES> ]

Multiplying out, we see that this vector is equal to the vector

<NAME>⊗<USA>⊗<NAME>⊗<MEX> +
<NAME>⊗<USA>⊗<CAP>⊗<MXC> +
<NAME>⊗<USA>⊗<MON>⊗<PES> +
<CAP>⊗<WDC>⊗<NAME>⊗<MEX> +
<CAP>⊗<WDC>⊗<CAP>⊗<MXC> +
<CAP>⊗<WDC>⊗<MON>⊗<PES> +
<MON>⊗<DOL>⊗<NAME>⊗<MEX> +
<MON>⊗<DOL>⊗<CAP>⊗<MXC> +
<MON>⊗<DOL>⊗<MON>⊗<PES>

Thanks to associativity and commutativity, we can rewrite this expression as

<NAME>⊗<NAME>⊗<USA>⊗<MEX> +
<CAP>⊗<CAP>⊗<WDC>⊗<MXC> +
<MON>⊗<MON>⊗<DOL>⊗<PES> +
<NAME>⊗<USA>⊗<CAP>⊗<MXC> +
<NAME>⊗<USA>⊗<MON>⊗<PES> +
<CAP>⊗<WDC>⊗<NAME>⊗<MEX> +
<CAP>⊗<WDC>⊗<MON>⊗<PES> +
<MON>⊗<DOL>⊗<NAME>⊗<MEX> +
<MON>⊗<DOL>⊗<CAP>⊗<MXC>

Because of the self-canceling property of MAP, this expression can be rewritten as

<USA>⊗<MEX> +
<WDC>⊗<MXC> +
<DOL>⊗<PES> +
<NAME>⊗<USA>⊗<CAP>⊗<MXC> +
<NAME>⊗<USA>⊗<MON>⊗<PES> +
<CAP>⊗<WDC>⊗<NAME>⊗<MEX> +
<CAP>⊗<WDC>⊗<MON>⊗<PES> +
<MON>⊗<DOL>⊗<NAME>⊗<MEX> +
<MON>⊗<DOL>⊗<CAP>⊗<MXC>

To ask 'What is the 'dollar of Mexico'? we multiply this vector by <DOL>, our VSA representation of dollar:

<DOL> ⊗
[<USA>⊗<MEX> +
<WDC>⊗<MXC> +
<DOL>⊗<PES> +
<NAME>⊗<USA>⊗<CAP>⊗<MXC> +
<NAME>⊗<USA>⊗<MON>⊗<PES> +
<CAP>⊗<WDC>⊗<NAME>⊗<MEX> +
<CAP>⊗<WDC>⊗<MON>⊗<PES> +
<MON>⊗<DOL>⊗<NAME>⊗<MEX> +
<MON>⊗<DOL>⊗<CAP>⊗<MXC>]

=

[<DOL>⊗<USA>⊗<MEX> +
<DOL>⊗<WDC>⊗<MXC> +
<DOL>⊗<DOL>⊗<PES> +
<NAME>⊗<USA>⊗<CAP>⊗<MXC> +
<DOL>⊗<NAME>⊗<USA>⊗<MON>⊗<PES> +
<DOL>⊗<CAP>⊗<WDC>⊗<NAME>⊗<MEX> +
<DOL>⊗<CAP>⊗<WDC>⊗<MON>⊗<PES> +
<DOL>⊗<MON>⊗<DOL>⊗<NAME>⊗<MEX> +
<DOL>⊗<MON>⊗<DOL>⊗<CAP>⊗<MXC>]

=

[<PES> +
<DOL>⊗<USA>⊗<MEX> +
<DOL>⊗<WDC>⊗<MXC> +
<NAME>⊗<USA>⊗<CAP>⊗<MXC> +
<DOL>⊗<NAME>⊗<USA>⊗<MON>⊗<PES> +
<DOL>⊗<CAP>⊗<WDC>⊗<NAME>⊗<MEX> +
<DOL>⊗<CAP>⊗<WDC>⊗<MON>⊗<PES> +
<DOL>⊗<MON>⊗<DOL>⊗<NAME>⊗<MEX> +
<DOL>⊗<MON>⊗<DOL>⊗<CAP>⊗<MXC>]

This vector will be highly correlated with the vector <PES> representing the peso, and will have a low correlation with any of the other vectors (<DOL>, <USA>, <NAME>, etc.) Hence we can rewrite the expression as

[<PES> + noise]

where noise is a vector uncorrelated with any other vector in our model. If we like, we can get a perfect reproduction of the representation <PES> by passing this noisy vector through a cleanup memory. More important, though, is that we have correctly answered the question 'What is the dollar of Mexico?' with the answer Peso, without the need for decomposing or extracting any of the components of the encoded representation, as we would with a traditional computational model based on data structures and pointers.

Applications

Because of the generality of the role/filler mechanism, VSA can be applied to a variety of tasks. Recent applications have included solutions to intelligence tests[11] and graph isomorphisms [12], as well as a behavior-based robotics [13] and the representation of word meaning and order in natural language [14].

References

1. ^ Ross W. Gayler (2003) Vector Symbolic Architectures Answer Jackendoff's Challenges for Cognitive Neuroscience. In Slezak, P., ed.: ICCS/ASCS International Conference on Cognitive Science. CogPrints, Sydney, Australia, University of New South Wales, 133-138.
2. ^ Jerry A. Fodor and Zenon W. Pylyshyn (1988). Connectionism and Cognitive architecture: A Critical Analysis. Cognition, 28:1–2, 3-71.
3. ^ Paul Smolensky (1990). Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Artificial Intelligence 46, 159-216.
4. ^ Tony Plate (2005) Holographic Reduced Representation: Distributed Representation for Cognitive Structures.Stanford: CSLI Publications.
5. ^ Pentti Kanerva (1994) The Spatter Code for Encoding Concepts at Many Levels. In M. Marinaro and P.G. Morasso (eds.), ICANN '94, Proceedings of International Conference on Artificial Neural Networks (Sorrento, Italy), vol. 1, 226-229. London: Springer-Verlag.
6. ^ Robert Hecht-Nielsen (1994) Context Vectors; General Purpose Approximate Meaning Representations Self-organized from Raw Data. In J.M. Zurada, R. J. Marks II and C. J. Robinson, Computational Intelligence: Imitating Life. IEEE Press.
7. ^ Pentti Kanerva (2009) Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. Cognitive Computation 1(2), 139-159.
8. ^ Terry Stewart and Chris Eliasmith. (2011) Compositionality and Biologically Plausible Models. In W. Hinzen, E. Machery, and M. Werning (eds.) Oxford Handbook of Compositionality. Oxford University Press.
9. ^ Terry Stewart, Trevor Bekolay, and Chris Eliasmith (2011). Neural Representations of Compositional Structures: Representing and Manipulating Vector Spaces with Spiking Neurons. Connection Science 22(3), 145-153.
10. ^ Pentti Kanerva (2010) What We Mean When We Say "What's the Dollar of Mexico?": Prototypes and Mapping in Concept Space. In P D. Bruza, W. Lawless, K. van Rijsbergen, D. A. Sofge, and D. Widdows (eds.), Quantum Informatics for Cognitive, Social, and Semantic Processes: Papers from the AAAI Symposium.
11. ^ Daniel Rasmussen and Chris Eliasmith (2011) A Neural Model of Rule Generation in Inductive Reasoning. Topics in Cognitive Science 3, 140–153
12. ^ Ross W. Gayler and Simon D. Levy (2009) A Distributed Basis for Analogical Mapping. Proceedings of the Second International Analogy Conference. NBU Press.
13. ^ Simon D. Levy, Suraj Bajracharya, and Ross W. Gayler (2013) Learning Behavior Hierarchies via High-Dimensional Sensor Projection. In Learning Rich Representations from Low-Level Sensors: Papers from the 2013 AAAI Workshop.
14. ^ Michael N. Jones and Douglas J. Mewhort (2007) Representing Word Meaning and Order Information in a Composite Holographic Lexicon. Psychological Review 114:1, 1-37.
15. ^ Terrence C. Stewart, Yichuan Tang, and Chris Eliasmith (2010) A Biologically Realistic Cleanup Memory: Autoassociation in Spiking Neurons. Cognitive Systems Research 12, 84-92.