Article

An Information-Geometric Formulation of Pattern Separation and Evaluation of Existing Indices

Harvey Wang, Selena Singh, Thomas Trappenberg and Abraham Nunes
1 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada
2 Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, ON L8S 4L8, Canada
3 Department of Psychiatry, Dalhousie University, Halifax, NS B3H 4R2, Canada
* Author to whom correspondence should be addressed.
Entropy 2024, 26(9), 737; https://doi.org/10.3390/e26090737
Submission received: 11 June 2024 / Revised: 25 August 2024 / Accepted: 26 August 2024 / Published: 29 August 2024
(This article belongs to the Special Issue Entropy and Information in Biological Systems)

Abstract
Pattern separation is a computational process by which dissimilar neural patterns are generated from similar input patterns. We present an information-geometric formulation of pattern separation, where a pattern separator is modeled as a family of statistical distributions on a manifold. Such a manifold maps an input (i.e., coordinates) to a probability distribution that generates firing patterns. Pattern separation occurs when small coordinate changes result in large distances between samples from the corresponding distributions. Under this formulation, we implement a two-neuron system whose probability law forms a three-dimensional manifold with mutually orthogonal coordinates representing the neurons’ marginal and correlational firing rates. We use this highly controlled system to examine the behavior of spike train similarity indices commonly used in pattern separation research. We find that all indices (except scaling factor) are sensitive to relative differences in marginal firing rates, but no index adequately captures differences in spike trains that result from altering the correlation in activity between the two neurons. That is, existing pattern separation metrics appear (A) sensitive to patterns that are encoded by different neurons but (B) insensitive to patterns that differ only in relative spike timing (e.g., synchrony between neurons in the ensemble).

1. Introduction

1.1. Pattern Separation in Computational Neuroscience

The hippocampus plays a key role in the formation of complex associative and episodic memories [1,2]. Classical computational models have proposed that the hippocampus performs two complementary neural computations to minimize interference and maximize information storage: pattern separation and pattern completion [3]. Pattern separation is a computation performed by a neural network that minimizes the similarity between distinct but overlapping input patterns [4]. It is thought to be performed prior to long-term memory storage to reduce the probability of interference in memory recall and enhance downstream pattern completion [5]. Pattern completion, in contrast to pattern separation, is performed by a network during the retrieval of stored patterns when presented with partial or degraded input patterns.
The hippocampal dentate gyrus has a set of properties that makes it ideally suited to perform pattern separation, such as the sparse and competitive firing of granule cells [6,7]. Rodent studies have shown that activity patterns in the dentate gyrus are less correlated than those in the entorhinal cortex and hippocampal CA3 region, consistent with dentate gyrus pattern separation [8]. Importantly, pattern separation is not only found in the hippocampus. The cerebellum and insect mushroom body also have properties that would facilitate pattern separation [9]. Thus, pattern separation is a fundamental computation utilized by the brain to reduce interference between activity patterns to promote functions such as memory encoding.
Physiological changes in the dentate gyrus have been linked to conditions involving cognitive impairments, including schizophrenia [10], Alzheimer’s disease [11], and epilepsy [12]. Patients affected by these conditions present with reduced abilities to distinguish between similar stimuli [13,14,15]. There is, therefore, great interest in the detailed biophysical mechanisms underlying pattern separation. It has been theorized that ensembles of interconnected, transiently active neurons encode and transfer information through the precise timing of neural firing [16], thus enabling numerous operations in the brain [17,18]. Many computational models of such neuron assemblies have been implemented to simulate pattern separation under various physiological conditions [19,20,21,22]. However, “pattern separation” is often defined loosely, reduced to any computation that minimizes similarity or overlap (see further discussion in Section 1.2 and Section 3), and the aforementioned studies employ inconsistent measurements of it. How do we simulate a computation for which there is no precise definition? A precise definition is therefore essential to furthering our understanding of what pattern separation is and is not, and to providing a theoretical basis for further studies.

1.2. There Exists No Single Definition for Pattern Separation

Pattern separation is generally defined as a computation that maximizes dissimilarity or orthogonalization [23,24] between neural patterns, given similar yet distinct input patterns [25,26,27]. Definition 1 formalizes this concept.
Definition 1.
For some dissimilarity or distance measure $d$ and input neural patterns $X^{(1)}$ and $X^{(2)}$, the mapping $f$ performs pattern separation if and only if the dissimilarity between its outputs is greater than that between $X^{(1)}$ and $X^{(2)}$:
$$d\big(X^{(1)}, X^{(2)}\big) < d\big(f(X^{(1)}),\, f(X^{(2)})\big).$$
Clearly, the definition of pattern separation relies heavily on the form of the dissimilarity measure $d$. Definition 1 is in fact not a definition of a single computation but rather of a family of computations. This creates a problem: different forms of $d$ translate to different interpretations of pattern separation. For example, if the Pearson correlation is used for $d$, then pattern separation is a computation that minimizes correlation; if $d$ is the Hamming distance, then pattern separation is a computation that minimizes overlap.
Having a diverse set of indices, all describing “pattern separation”, limits the comparability of pattern separation efficacy across computational studies [20,23,28]. Madar et al. [29] showed that quantifying the pattern separation abilities of a system by applying different measures $d$ to the same dataset leads to different conclusions. For example, they showed that pairs of spike trains can be uncorrelated without being orthogonal, and therefore suggested that pattern separation should be considered a group of potential computations, since it is unclear which spike train features are most relevant to the brain and what constitutes similarity or dissimilarity. For instance, if information were coded in a neuron’s firing rate, a pattern separator would aim to convert a series of input trains with similar rates into ones with dissimilar rates. On the other hand, the firing rate could carry no information, and the system could instead use spike timing (temporal coding) to represent information; in this case, the pattern separator may aim simply to change the relative timing of spike trains. A recent analysis of the most commonly used pattern separation indices demonstrated that some failed to capture the information loss that may occur with high degrees of sparsity (a property often associated with strong pattern separation), instead confounding information loss with excellent pattern separation [30]. Although these studies evaluated existing pattern separation measures, the evaluations did not control for the degree of separation achieved by different pattern separation strategies. Taken together, these analyses highlight major issues with consistency, measurement accuracy, and applicability between studies, and underscore the importance of developing a single, unifying definition of pattern separation, along with a controlled re-evaluation of existing measures.

1.3. An Information-Geometric Formulation of Pattern Separation

In this work, we use tools from information geometry to describe pattern separation as a single computation. This formulation is consistent with the existing notion of pattern separation (Definition 1) but also addresses shortcomings of existing definitions and measures.
A pattern separator (e.g., a neural network such as the hippocampal dentate gyrus) and its inputs (e.g., from the entorhinal cortex) are modeled as a statistical manifold and its coordinates, respectively. A point on this manifold is a probability distribution that (A) is identified by its input coordinates and (B) generates output patterns (i.e., spike trains) as samples. A highly effective separator maps input patterns with small differences to output patterns with large differences. That is, small differences between the input spike trains representing different patterns (which here constitute small changes in coordinates) result in very different output spike trains from the neural pattern separator. Under our formulation, an excellent pattern separator corresponds to a statistical manifold with high curvature, which maps small changes in its coordinates to distributions whose samples are highly distant or distinct from each other. In such manifolds, a single sample from a distribution provides a large amount of information about the underlying parameters or inputs from which it was generated. This is not only consistent with traditional notions of pattern separation but also addresses the problem identified by Bird et al. [30], whereby common measures can fail to distinguish between pattern separation and information loss (which could occur by randomly adding noise to a pattern).
We demonstrate an application of this formalism by implementing a two-neuron system with an analytically tractable probability law, such that distances between spike train distributions on the underlying manifold can be controlled. Using this system, we can simulate pattern separation that occurs by (A) changing which neurons are active across patterns, (B) changing how much the neurons are active, and (C) changing the relative timing of neural spikes across patterns. We then evaluate the degree to which several commonly used pattern separation indices are sensitive to each of these changes. These three circuit properties ultimately sculpt the types of spatiotemporal firing patterns that a circuit is ideally suited to produce and process [16]. Specific groups of neurons may have distinct combinations of the above three properties to facilitate the same neural computation repeatedly [16], which may serve to ensure efficient and reliable neural computation. Precise spike timing has been theorized to underlie information transfer within and between neuronal ensembles [18], and repeating connectivity patterns between these ensembles across the neocortex may additionally facilitate spike-timing-dependent neural computation localized to specific cortical regions [31]. The three parameters we choose here are therefore relevant to known mechanisms of pattern generation and encoding in the brain, and we simulate pattern separation by systematically modulating these parameters.

1.4. An Evaluation of Existing Indices of Pattern Separation

Existing indices appear much more sensitive to pattern separation resulting from changes in the neurons’ firing rates, and appear, in general, insensitive to pattern separation resulting from changes in the rate of the neurons’ coincident firing (and thus the relative timing by which the neurons are active). In addition, our results provide further evidence that, while sensitive to changes in neuronal firing rates, the variety of existing pattern separation indices can lead to inconsistent conclusions about the amount of pattern separation performed.

2. An Information-Geometric Formulation of Pattern Separation

2.1. Pattern Separator Described by a Statistical Manifold

Let us use an $N$-dimensional statistical manifold $\mathcal{P}$ to describe the neural activity of a pattern separator, such as the hippocampal dentate gyrus, in response to different inputs. Consider the manifold’s coordinates $\xi$, representing parameters that influence the separator’s outputs. These parameters can be subdivided into two types: (A) those intrinsic to the system, such as network connectivity and other biophysical properties, and (B) those received as input patterns from other areas, such as the perforant path inputs from the entorhinal cortex to the hippocampal dentate gyrus. Each distinct parameterization $\xi$ specifies a probability distribution $p(x;\xi)$ on $\mathcal{P}$, from which neural patterns $x$ may be drawn. In the case that the intrinsic parameters are fixed, the change in coordinates $d\xi = (d\xi_i)_{i=1,\dots,N}$ describes the change in the inputs, and the distance between the probability distributions reflects the change in the outputs.
A highly effective pattern separator maps input patterns with small differences to output patterns with large differences. Under our information-geometric formulation, this corresponds to a manifold $\mathcal{P}$ with high curvature, which maps small changes in its coordinates, $d\xi$, to distributions whose samples are highly distant or distinct from each other, $p(x;\xi)$ and $p(x;\xi + d\xi)$. Under this model, a single output sample provides a large degree of information about the underlying parameters by which it was generated.
The distance between probability distributions $p(x;\xi)$ and $p(x;\xi + d\xi)$ is defined via the Fisher information matrix. The direction and magnitude of change in a probability distribution, as a result of a small change $d\xi_i$ in the $i$th coordinate, are described by the derivative with respect to $\xi_i$,
$$L(\xi_i) = \frac{\partial \log p(x;\xi)}{\partial \xi_i},$$
where $L$ is known as the score function. Elements of the Fisher information matrix $G = \big(g(\xi_i, \xi_j)\big)_{i,j=1,\dots,N}$ are given by the expectation of the product of the score functions for two coordinates,
$$g(\xi_i, \xi_j) = \big\langle L(\xi_i)\, L(\xi_j) \big\rangle_x = \left\langle \frac{\partial \log p(x;\xi)}{\partial \xi_i}\, \frac{\partial \log p(x;\xi)}{\partial \xi_j} \right\rangle_x,$$
where $\langle f(x) \rangle_x$ denotes the expectation value of a function $f(x)$, defined as the weighted average over all possible values of $x$, $\sum_x f(x)\, p(x)$. The squared distance between two probability distributions defined by $\xi$ and $\xi + d\xi$ can then be computed by the quadratic form of $d\xi$,
$$ds^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} g(\xi_i, \xi_j)\, d\xi_i\, d\xi_j. \qquad (2)$$
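To make this construction concrete, the following minimal sketch (our illustration; the function name and finite-difference approach are our own choices, not part of the paper's implementation) estimates the Fisher information matrix of a discrete distribution by averaging products of numerically differentiated score functions:

```python
import numpy as np

def fisher_information(p, xi, eps=1e-6):
    """Estimate the Fisher information matrix G for a discrete
    distribution p(x; xi) over K states, where `p` is a callable
    returning a length-K probability vector for parameter vector xi."""
    xi = np.asarray(xi, dtype=float)
    N = len(xi)
    probs = p(xi)
    scores = np.empty((N, len(probs)))
    for i in range(N):
        step = np.zeros(N)
        step[i] = eps
        # Score function L(xi_i) = d log p / d xi_i via central differences.
        scores[i] = (np.log(p(xi + step)) - np.log(p(xi - step))) / (2 * eps)
    # g(xi_i, xi_j) = <L(xi_i) L(xi_j)>_x, the expectation over x ~ p.
    return np.einsum('ik,jk,k->ij', scores, scores, probs)

# The squared distance of Equation (2) is then the quadratic form
# ds2 = dxi @ G @ dxi for a small coordinate change dxi.
```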
In our formulation of pattern separation, two different coordinates, $\xi$ and $\xi + d\xi$, represent two different input patterns. These coordinates define two different probability distributions, $p(x;\xi)$ and $p(x;\xi + d\xi)$, whose samples are less overlapping, or more separated, as a result of $d\xi$. This conceptualization of pattern separation is consistent with the existing notion given in Definition 1, and provides a clear mathematical formalism that we demonstrate when analyzing different pattern separation metrics.
It is important to note that while we can directly simulate changes $d\xi$ theoretically, $\xi$ and $d\xi$ are only indirectly influenced in real neural systems. For example, pattern separation may be achieved by drastically altering intrinsic parameters such as network connectivity and neuronal biophysics. We emphasize that these properties may change within biological systems and affect pattern separation performance, but such changes occur over longer time scales than what is most relevant for the separation of two similar, highly overlapping stimuli presented consecutively. We nevertheless still consider these mechanisms of achieving pattern separation. As such, to evaluate the existing indices, we manipulate $d\xi$ to simulate changes in the input patterns or physiological properties of the network, giving us a ground truth for the type and degree of the simulated pattern separation.

2.2. A Two-Neuron Manifold and Its Orthogonal Coordinates

Consider a model of a system of two neurons [32] and its time-independent firing pattern $x = (x_i)_{i=1,2}$ in a given observation window, where $x_i = 1$ and $x_i = 0$ indicate that the $i$th neuron is active (with one or more spikes) or silent, respectively. We use the highly controlled probability distribution $p(x;\xi)$ for this model to separately alter the independent firing rates and the degree of correlation between neurons. Under the formalism introduced above, these manipulations change the coordinates on the system’s manifold. Controlling the amount of pattern separation, as well as the strategy by which it is achieved, facilitates our evaluation of the existing indices. Specifically, we can use this model to simulate the pattern separation that arises from changes in each neuron’s independent firing rates, the relative timing that they are active, or both.
The probability distribution $p(x;\xi)$ for this model is given by the probabilities that each neuron is active or silent,
$$q_{mn} = \mathrm{Prob}\{x_1 = m,\; x_2 = n\}, \qquad m, n \in \{0, 1\}.$$
These probabilities give an exact expansion of the logarithm of the probability distribution of the system,
$$\log p(x;\xi) = \log\frac{q_{10}}{q_{00}}\, x_1 + \log\frac{q_{01}}{q_{00}}\, x_2 + \theta\, x_1 x_2 + \log q_{00},$$
where
$$\theta = \log\frac{q_{11}\, q_{00}}{q_{10}\, q_{01}}.$$
The term $\theta$ specifies the within-ensemble correlation, representing the degree to which neurons 1 and 2 are correlated: $\theta = 0$ when the two neurons are uncorrelated, $\theta \to -\infty$ as the two neurons become maximally anti-correlated (as $q_{11}$ and $q_{00}$ approach 0), and $\theta \to +\infty$ as the two neurons become maximally correlated (as $q_{10}$ and $q_{01}$ approach 0).
Among the set of four probabilities $\{q_{00}, q_{10}, q_{01}, q_{11}\}$, only three variables are free, due to the constraint $q_{00} + q_{10} + q_{01} + q_{11} = 1$. The manifold for this two-neuron model is therefore three dimensional. Though there are many coordinate systems for this model (for example, any three of $\{q_{00}, q_{10}, q_{01}, q_{11}\}$), we use
$$\xi = (\eta_1, \eta_2, \theta)$$
to simulate different types of pattern separation, where the variables
$$\eta_1 = \langle x_1 \rangle = q_{10} + q_{11}, \qquad \eta_2 = \langle x_2 \rangle = q_{01} + q_{11},$$
denote the marginal firing rates of neurons 1 and 2, respectively. Changes in $\eta_1$, $\eta_2$, and $\theta$ represent the means by which pattern separation may be achieved in a neural system in equilibrium: by changing (A) which neurons fire, (B) how much they fire, and/or (C) the relative timing at which they fire.
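As an illustration of this coordinate system, the sketch below (our own helper functions, with hypothetical names) converts between the event probabilities $\{q_{00}, q_{10}, q_{01}, q_{11}\}$ and the coordinates $(\eta_1, \eta_2, \theta)$; the inverse map reduces to a one-dimensional root-finding problem for $q_{11}$:

```python
import numpy as np
from scipy.optimize import brentq

def to_coordinates(q00, q10, q01, q11):
    """Map joint activity probabilities to (eta1, eta2, theta)."""
    eta1 = q10 + q11                            # marginal rate, neuron 1
    eta2 = q01 + q11                            # marginal rate, neuron 2
    theta = np.log(q11 * q00 / (q10 * q01))     # within-ensemble correlation
    return eta1, eta2, theta

def from_coordinates(eta1, eta2, theta):
    """Recover (q00, q10, q01, q11) from (eta1, eta2, theta) by solving
    for q11; the bracket [lo, hi] covers all feasible values of q11."""
    def residual(q11):
        q10, q01 = eta1 - q11, eta2 - q11
        q00 = 1.0 - q10 - q01 - q11
        return np.log(q11 * q00 / (q10 * q01)) - theta
    lo = max(0.0, eta1 + eta2 - 1.0) + 1e-9
    hi = min(eta1, eta2) - 1e-9
    q11 = brentq(residual, lo, hi)
    return 1.0 - eta1 - eta2 + q11, eta1 - q11, eta2 - q11, q11
```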
The parameters $\eta_1$, $\eta_2$, and $\theta$ form a convenient coordinate system because $(\eta_1, \eta_2)$ and $\theta$ are mutually orthogonal, meaning that their directions of change are uncorrelated [32]. Specifically, this means that
$$g(\eta_1, \theta) = g(\eta_2, \theta) = 0.$$
Crucially, the orthogonality between $(\eta_1, \eta_2)$ and $\theta$ allows us to increase the distance between two probability distributions by changing only $d\theta$, or only one of $d\eta_1$ and $d\eta_2$. In other words, for two distributions with the same values of $\eta_1$ and $\eta_2$, the squared distance between them is proportional to the squared difference between their $\theta$ values,
$$ds^2\big|_{d\eta_1 = d\eta_2 = 0} = g(\theta, \theta)\, (d\theta)^2. \qquad (6)$$
Similarly, this orthogonality simplifies Equation (2) in the case where $d\theta = 0$:
$$ds^2\big|_{d\theta = 0} = 2\, g(\eta_1, \eta_2)\, d\eta_1\, d\eta_2 + g(\eta_1, \eta_1)\, (d\eta_1)^2 + g(\eta_2, \eta_2)\, (d\eta_2)^2.$$
Using this setup, we define a “control” system by specifying its probability distribution using these three variables, denoted $p(x;\eta_1, \eta_2, \theta)$. Using similar notation, we define a second “comparison” distribution $p(x;\eta_1 + d\eta_1, \eta_2 + d\eta_2, \theta + d\theta)$, and thus simulate pattern separation by systematically manipulating the values of $d\eta_1$, $d\eta_2$, and $d\theta$ to alter the distance between the two distributions. The changes in the parameters of the two distributions (i.e., $d\eta_1$, $d\eta_2$, and $d\theta$) represent the differences between two input patterns presented to a pattern separator, and the distance between the distributions represents the differences between the corresponding outputs of the separator. Pattern separation is therefore achieved in this system when small $d\eta_1$, $d\eta_2$, and/or $d\theta$ result in a large distance between the control and comparison distributions.
An appropriate index of pattern separation should be sensitive to the increased distance between $p(x;\eta_1, \eta_2, \theta)$ and $p(x;\eta_1 + d\eta_1, \eta_2 + d\eta_2, \theta + d\theta)$. This increased distance corresponds to pattern separation as a result of changes in each neuron’s firing rates, the relative timing of activity, or a combination of both.

2.3. Generating Diverging Patterns

Neural patterns were generated from the control and comparison distributions $p(x;\eta_1, \eta_2, \theta)$ and $p(x;\eta_1 + d\eta_1, \eta_2 + d\eta_2, \theta + d\theta)$ by sampling $N_\text{bins} = 10^3$ times from each distribution. Each sample constituted a “time bin” in which one, both, or neither of the neurons may be active; that is, samples were drawn from categorical distributions with event probabilities $\{q_{00}, q_{10}, q_{01}, q_{11}\}$ for each time bin.
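A minimal sampling routine consistent with this description might look as follows (our sketch; the event ordering $(q_{00}, q_{10}, q_{01}, q_{11})$ and the function name are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pattern(q, n_bins=1000, rng=rng):
    """Draw a 2 x n_bins binary firing pattern: one categorical draw per
    time bin, with events 0..3 meaning (0,0), (1,0), (0,1), (1,1)."""
    events = rng.choice(4, size=n_bins, p=q)
    x1 = ((events == 1) | (events == 3)).astype(int)  # neuron 1 active
    x2 = ((events == 2) | (events == 3)).astype(int)  # neuron 2 active
    return np.vstack([x1, x2])
```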

2.3.1. Patterns with Dissimilar Neuronal Coincident Firing Rates

To assess the sensitivity of existing indices in detecting pattern separation as a result of changes in within-ensemble, neuron–neuron correlation structures, we held the marginal firing rates of the two neurons constant and gradually increased $d\theta$. Specifically, we set the control distribution to $p(x;\eta_1, \eta_2, 0)$ and the comparison to $p(x;\eta_1, \eta_2, d\theta)$. Recall from Equation (6) that the squared distance between the two distributions is proportional to $(d\theta)^2$ in this setup.
We assigned new values of $\theta$ by changing $q_{11}$ and $q_{00}$ by the same amount in the same direction, while changing $q_{10}$ and $q_{01}$ by the same amount in the opposite direction, thereby keeping $\eta_1$ and $\eta_2$ constant. That is, $q_{00}$, $q_{10}$, $q_{01}$, and $q_{11}$ were updated as follows:
$$q_{00} \leftarrow q_{00} + dq_{11}, \quad q_{10} \leftarrow q_{10} - dq_{11}, \quad q_{01} \leftarrow q_{01} - dq_{11}, \quad q_{11} \leftarrow q_{11} + dq_{11},$$
so that the corresponding new $\theta$ was
$$\theta \leftarrow \log\frac{(q_{11} + dq_{11})(q_{00} + dq_{11})}{(q_{10} - dq_{11})(q_{01} - dq_{11})},$$
while $\eta_1$ and $\eta_2$ remained the same:
$$\eta_1 \leftarrow (q_{10} - dq_{11}) + (q_{11} + dq_{11}) = q_{10} + q_{11}, \qquad \eta_2 \leftarrow (q_{01} - dq_{11}) + (q_{11} + dq_{11}) = q_{01} + q_{11}.$$
This procedure was repeated for different values of $\eta_1$ and $\eta_2 \le \eta_1$. We also applied the additional constraint $\eta_2 = 1 - \eta_1$ to keep the overall firing rate of the system constant.
Overall, this experimental setup created comparison distributions whose two neurons ranged from nearly maximally anti-correlated (when $q_{11} = q_{00} = 10^{-2}$, so that $d\theta$ is small) to nearly maximally correlated (when $q_{10} = 10^{-2}$, so that $d\theta$ is large).
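The update above can be written as a one-line transformation of the event probabilities; the sketch below (our illustration, with hypothetical names) applies it and shows the resulting change in $\theta$ for a worked example:

```python
import numpy as np

def shift_theta(q, dq11):
    """Shift theta while holding both marginal rates fixed: q00 and q11
    move together by +dq11, q10 and q01 by -dq11 (Section 2.3.1)."""
    q00, q10, q01, q11 = q
    return (q00 + dq11, q10 - dq11, q01 - dq11, q11 + dq11)

# Starting from independent neurons with eta1 = eta2 = 0.5 (theta = 0):
q = (0.25, 0.25, 0.25, 0.25)
q_new = shift_theta(q, 0.1)          # (0.35, 0.15, 0.15, 0.35)
theta_new = np.log(q_new[3] * q_new[0] / (q_new[1] * q_new[2]))
print(theta_new)                     # ~1.69: the neurons are now correlated
```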

2.3.2. Patterns with Dissimilar Neuronal Firing Rates

To assess the sensitivity of existing indices in detecting pattern separation as a result of changes in one neuron’s firing rate, we held the marginal firing rate of one of the neurons constant, allowed the neurons to fire independently, and gradually increased $d\eta_2$. Specifically, we set the control distribution to $p(x;\eta_1, 0.1, 0)$ and the comparison to $p(x;\eta_1, 0.1 + d\eta_2, 0)$. The squared distance between these two distributions is proportional to $(d\eta_2)^2$:
$$ds^2\big|_{d\eta_1 = d\theta = 0} = g(\eta_2, \eta_2)\, (d\eta_2)^2.$$
We defined our systems by expressing $\{q_{00}, q_{10}, q_{01}, q_{11}\}$ as functions of $\eta_1$ and $\eta_2$ at $\theta = 0$. Since $\theta = 0$ means that neurons 1 and 2 fire independently, $q_{11} = \eta_1 \eta_2$. We can also show this directly from the two-neuron model, with
$$\theta = \log\frac{q_{11}\, q_{00}}{q_{10}\, q_{01}} = 0 \;\Longleftrightarrow\; \frac{q_{11}\, q_{00}}{q_{10}\, q_{01}} = 1 \;\Longleftrightarrow\; q_{11} = \frac{q_{10}\, q_{01}}{q_{00}}.$$
Since $q_{10} = \eta_1 - q_{11}$, $q_{01} = \eta_2 - q_{11}$, and $q_{00} = 1 - q_{10} - q_{01} - q_{11}$, we can rearrange the above:
$$q_{11} = \frac{(\eta_1 - q_{11})(\eta_2 - q_{11})}{1 - \eta_1 - \eta_2 + q_{11}} \;\Longrightarrow\; q_{11}(1 - \eta_1 - \eta_2 + q_{11}) = (\eta_1 - q_{11})(\eta_2 - q_{11}) \;\Longrightarrow\; q_{11} = \eta_1 \eta_2.$$
Overall, this experimental setup created comparison distributions that permitted one of the neurons to range from sparsely active (when $d\eta_2$ was small) to densely firing (when $d\eta_2$ was large).
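Since $\theta = 0$ implies $q_{11} = \eta_1 \eta_2$, the event probabilities for the control and comparison distributions of this experiment can be constructed directly (a short sketch with hypothetical names):

```python
def independent_q(eta1, eta2):
    """Event probabilities (q00, q10, q01, q11) for two independently
    firing neurons (theta = 0), where q11 = eta1 * eta2."""
    q11 = eta1 * eta2
    q10, q01 = eta1 - q11, eta2 - q11
    return (1.0 - q10 - q01 - q11, q10, q01, q11)

# Section 2.3.2: control at eta2 = 0.1, comparison at eta2 + d_eta2.
q_control = independent_q(0.3, 0.1)
q_compare = independent_q(0.3, 0.1 + 0.2)   # d_eta2 = 0.2
```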

3. Existing Indices

Recall from Definition 1 that pattern separation is characterized by an increase in the pairwise dissimilarity between two neural patterns. Specifically, the mapping $f$ performs pattern separation if and only if
$$d\big(X^{(1)}, X^{(2)}\big) < d\big(f(X^{(1)}),\, f(X^{(2)})\big),$$
where $d$ is some index of dissimilarity between two neural patterns $X^{(1)}$ and $X^{(2)}$. The superscripts (1) and (2) denote the first and second patterns examined, respectively. Individual indices are denoted as $d$ with subscripts.

3.1. Pattern Representation

Existing indices of pattern separation operate on one of two representations of neural patterns: discretized vectors $\mathbf{x}$ or non-discretized lists $T$ (the notation $X$ above merely denotes the pattern as a random variable and makes no specification of how the pattern is represented).

3.1.1. Non-Discretized Patterns

Consider a system of $N_\text{neurons}$ neurons. The spiking pattern of this system can be represented as a list of spike trains from each individual neuron, $T = \{T_i\}_{i=1,\dots,N_\text{neurons}}$, where $T_i = \{T_{ij}\}_{j=1,\dots,n_i}$ denotes the list of spike times $T_{ij}$ from the $i$th neuron and $n_i$ is the total number of times the neuron fires.

3.1.2. Discretized Patterns

A pattern $T$ can be binned and binarized so that a wider variety of indices can be applied. This is generally performed by dividing the observation window into $N_\text{bins}$ equally sized bins and assigning each bin a binary value $x_{ij} \in \{0, 1\}$: $x_{ij} = 1$ indicates that the $i$th neuron is active at least once in the $j$th bin, and $x_{ij} = 0$ otherwise. The matrix representation of a binarized pattern is then simply $X = (x_{ij})_{i=1,\dots,N_\text{neurons};\; j=1,\dots,N_\text{bins}}$. These matrices are then generally vectorized into
$$\mathbf{x} = \mathrm{vec}(X) = \big[x_{11}, x_{12}, \dots, x_{1 N_\text{bins}},\; x_{21}, x_{22}, \dots, x_{2 N_\text{bins}},\; \dots,\; x_{N_\text{neurons} 1}, x_{N_\text{neurons} 2}, \dots, x_{N_\text{neurons} N_\text{bins}}\big].$$
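Binning and vectorization can be implemented in a few lines; the sketch below (our own helper, not from the paper's code) maps per-neuron spike-time lists to the binary matrix $X$ and its vectorization:

```python
import numpy as np

def binarize(spike_trains, window, n_bins):
    """Convert a list of per-neuron spike-time lists into a binary
    matrix X (N_neurons x N_bins) and its row-major vectorization."""
    X = np.zeros((len(spike_trains), n_bins), dtype=int)
    for i, train in enumerate(spike_trains):
        idx = np.floor(np.asarray(train) / window * n_bins).astype(int)
        X[i, np.clip(idx, 0, n_bins - 1)] = 1   # active at least once per bin
    return X, X.ravel()

# Example: two neurons observed over a 16 ms window, eight bins.
X, x = binarize([[1.2, 5.0, 15.9], [3.3]], window=16.0, n_bins=8)
```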

3.2. Common Indices

The Pearson correlation,
$$d_\rho\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big) = \frac{\big\langle \mathbf{x}^{(1)} \mathbf{x}^{(2)} \big\rangle - \big\langle \mathbf{x}^{(1)} \big\rangle \big\langle \mathbf{x}^{(2)} \big\rangle}{\sqrt{\big(\big\langle \mathbf{x}^{(1)\,2} \big\rangle - \big\langle \mathbf{x}^{(1)} \big\rangle^2\big)\big(\big\langle \mathbf{x}^{(2)\,2} \big\rangle - \big\langle \mathbf{x}^{(2)} \big\rangle^2\big)}}, \qquad (10)$$
is one of the most widely used indices. It has been used to study pattern separation in both computational [33] and experimental studies [8,34,35,36], as well as under various physiological conditions, such as epileptic hyperexcitability in the dentate gyrus [33].
The cosine similarity, or normalized dot product,
$$d_\theta\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big) = \frac{\mathbf{x}^{(1)} \cdot \mathbf{x}^{(2)}}{\lVert \mathbf{x}^{(1)} \rVert\, \lVert \mathbf{x}^{(2)} \rVert}, \qquad (11)$$
is another popular index of pattern separation. The original Hebb–Marr framework theorized pattern separation as the orthogonalization of input patterns [23,24]. The terms “decorrelation” (as described by the Pearson correlation) and “orthogonalization”, however, are not mathematically equivalent; Madar et al. [29] explicitly showed that pairs of spike trains can be uncorrelated without being orthogonal, or orthogonal without being uncorrelated. The cosine similarity has thus been used to explicitly determine whether spike trains are truly orthogonalized.
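This distinction is easy to verify numerically. In the toy example below (our own vectors, chosen for illustration), the first pair is uncorrelated but far from orthogonal, while the second pair is orthogonal yet perfectly anti-correlated:

```python
import numpy as np

def pearson(x1, x2):
    return np.corrcoef(x1, x2)[0, 1]            # d_rho, Eq. (10)

def cosine(x1, x2):
    return x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2))  # d_theta, Eq. (11)

a, b = np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0])
print(pearson(a, b), cosine(a, b))   # 0.0, 0.5: uncorrelated, not orthogonal

c, d = np.array([1, 0]), np.array([0, 1])
print(pearson(c, d), cosine(c, d))   # -1.0, 0.0: orthogonal, not uncorrelated
```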
While the cosine similarity measures the angle between two vector representations of spike trains, the scaling factor,
$$d_\phi\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big) = \frac{\lVert \mathbf{x}^{(1)} \rVert}{\lVert \mathbf{x}^{(2)} \rVert}, \qquad (12)$$
quantifies their difference in norm. Together, $d_\theta$ and $d_\phi$ are, in theory, sufficient to fully describe the similarity between two vectors in Euclidean space [29], since they capture the complementary features of angle and norm.
One implementation of the Hamming distance, or population distance, between two neural patterns is given by
$$d_f = \frac{\mathrm{HD}}{2N(1 - s)},$$
where $\mathrm{HD}$ denotes the number of positions at which the corresponding values differ [37], and $s$ denotes the sparsity. $d_f$ has been used to evaluate pattern separation in several papers [19,20,38]. This implementation of the Hamming distance is limited by its lack of consideration of the temporal aspect of a given pattern: a cell is considered active if it fires at least once during stimulus presentation, regardless of time and frequency [38]. We therefore use a modified definition of the Hamming distance [30], taking the average of the absolute difference at each time bin,
$$d_\eta\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big) = \frac{1}{N_\text{neurons} N_\text{bins}} \sum_{i=1}^{N_\text{neurons}} \sum_{j=1}^{N_\text{bins}} \big| x^{(1)}_{ij} - x^{(2)}_{ij} \big|, \qquad (13)$$
where permanently inactive spike trains are not removed from the ensemble, and the bins in which each neuron spikes are taken into consideration.
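Both of these vector indices have direct implementations; the short sketch below (ours) assumes the ratio convention for the scaling factor and binary matrices of shape $N_\text{neurons} \times N_\text{bins}$ for the modified Hamming distance:

```python
import numpy as np

def scaling_factor(x1, x2):
    """d_phi, Eq. (12): ratio of the two pattern norms."""
    return np.linalg.norm(x1) / np.linalg.norm(x2)

def hamming(X1, X2):
    """d_eta, Eq. (13): mean absolute difference per neuron per bin."""
    return np.mean(np.abs(X1 - X2))
```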
The SPIKE similarity was designed to assess the dissimilarity between two spike trains [39]. We compute the SPIKE similarity between two neural patterns, collected from an ensemble of neurons, as the average across neurons,
$$d_S\big(T^{(1)}, T^{(2)}\big) = \frac{1}{N_\text{neurons}} \sum_{i=1}^{N_\text{neurons}} \delta_S\big(T^{(1)}_i, T^{(2)}_i\big). \qquad (14)$$
The quantity
$$\delta_S\big(T^{(1)}_i, T^{(2)}_i\big) = 1 - \frac{1}{\tau} \int_0^\tau dt\, D(t)$$
is used to assess the dissimilarity between two spike trains [29]. It is computed by integrating, over the observation window $[0, \tau]$, the distance between two spike trains from Kreuz et al. [39],
$$D(t) = \frac{S^{(1)}\, x^{(2)}_\mathrm{ISI} + S^{(2)}\, x^{(1)}_\mathrm{ISI}}{2\, \big\langle x^{(n)}_\mathrm{ISI} \big\rangle_n^2},$$
where
$$S^{(n)}(t) = \frac{dt^{(n)}_P\, x^{(n)}_F + dt^{(n)}_F\, x^{(n)}_P}{x^{(n)}_\mathrm{ISI}}$$
is the local weighting of the spike time differences of each spike train.
Figure 1 shows the variables involved in computing the SPIKE similarity. For any given time $t$, the preceding (denoted with subscript $P$) and following (denoted with subscript $F$) spike times are respectively given by
$$t^{(n)}_P = \max_j \big\{ T^{(n)}_{ij} \;\big|\; T^{(n)}_{ij} \le t \big\}, \qquad t^{(n)}_F = \min_j \big\{ T^{(n)}_{ij} \;\big|\; T^{(n)}_{ij} > t \big\}.$$
The instantaneous inter-spike interval (denoted with subscript ISI) for each neuron is
$$x^{(n)}_\mathrm{ISI} = t^{(n)}_F - t^{(n)}_P.$$
The intervals to the previous and following spikes for each neuron are denoted as
$$x^{(n)}_P = t - t^{(n)}_P, \qquad x^{(n)}_F = t^{(n)}_F - t.$$
For pattern 1, the instantaneous absolute differences of the preceding and following spike times are, respectively,
$$dt^{(1)}_P = \min_j \big| t^{(1)}_P - T^{(2)}_{ij} \big|, \qquad dt^{(1)}_F(t) = \min_j \big| t^{(1)}_F - T^{(2)}_{ij} \big|.$$
For pattern 2, $dt^{(2)}_P$ and $dt^{(2)}_F$ are defined analogously.
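The quantities above lend themselves to a compact numerical sketch. The version below (our simplification) evaluates $D(t)$ on a uniform grid and averages, rather than integrating the piecewise expressions analytically as in Kreuz et al. [39]; auxiliary spikes at 0 and $\tau$ bound the edge intervals, which is likewise an assumption of this sketch:

```python
import numpy as np

def spike_dissimilarity(t1, t2, tau, n_grid=2000):
    """Approximate delta_S for one pair of spike trains by sampling the
    SPIKE distance profile D(t) at n_grid midpoints of [0, tau]."""
    trains = [np.unique(np.concatenate(([0.0], t, [tau]))) for t in (t1, t2)]
    ts = np.linspace(0.0, tau, n_grid, endpoint=False) + tau / (2 * n_grid)
    D = np.empty(n_grid)
    for k, t in enumerate(ts):
        tP, tF, xISI, xP, xF, S = [], [], [], [], [], []
        for n in (0, 1):
            tr = trains[n]
            tP.append(tr[tr <= t].max())        # preceding spike
            tF.append(tr[tr > t].min())         # following spike
            xISI.append(tF[n] - tP[n])          # instantaneous ISI
            xP.append(t - tP[n])
            xF.append(tF[n] - t)
        for n in (0, 1):
            other = trains[1 - n]
            dtP = np.abs(other - tP[n]).min()   # match to the other train
            dtF = np.abs(other - tF[n]).min()
            S.append((dtP * xF[n] + dtF * xP[n]) / xISI[n])
        D[k] = (S[0] * xISI[1] + S[1] * xISI[0]) / (2 * np.mean(xISI) ** 2)
    return 1.0 - D.mean()   # delta_S = 1 - (1/tau) * integral of D(t)
```

The ensemble-level index $d_S$ of Equation (14) is then simply the average of this quantity across the $N_\text{neurons}$ neuron pairs.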

3.3. Information-Theoretic Measures

In addition to the indices commonly found in pattern separation research, we also evaluate three information-theoretic measures using the toolbox provided by Bird et al. [30], who first applied these measures to neural patterns. These indices are used to show that the common indices described in the previous section can conflate pattern separation with information loss.
The estimated mutual information $\hat{d}_M$ is computed with a modified Kozachenko–Leonenko estimator, adapted from Houghton [40]. Briefly, patterns $T^{(1)}$ and $T^{(2)}$ are each divided into $N$ periods of equal size. The pairwise spike train distances between the periods of each spike train are computed using the Wasserstein distance $\delta$ (see [30,41,42]). This produces a set of pairwise distances between the periods of $T^{(1)}$ and $T^{(2)}$. A biased estimator of the mutual information between $T^{(1)}$ and $T^{(2)}$, in terms of an integer smoothing parameter $1 \le h < N$, is given by
$$\hat{d}_M\big(T^{(1)}, T^{(2)}\big) = \frac{1}{N} \sum_{i=1}^{N} \log \frac{N\, C\big(T^{(1)}_i, T^{(2)}_i\big)}{h^2}, \qquad (15)$$
where $C\big(T^{(1)}_i, T^{(2)}_i\big)$ is the number of pairs $\big(T^{(1)}_j, T^{(2)}_j\big)$ such that both $\delta\big(T^{(1)}_i, T^{(1)}_j\big) < \delta_h\big(T^{(1)}_i\big)$ and $\delta\big(T^{(2)}_i, T^{(2)}_j\big) < \delta_h\big(T^{(2)}_i\big)$, and $\delta_h(T_i)$ is the distance from $T_i$ to its $h$th nearest neighbor amongst all segments in $T$.
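The counting logic of this estimator is compact; the sketch below (ours) assumes the pairwise Wasserstein distances between periods have already been computed into $N \times N$ matrices, and it guards against $\log 0$ when a period has no joint neighbors, which is our own choice rather than part of the published estimator:

```python
import numpy as np

def estimated_mi(d1, d2, h):
    """Sketch of the estimator in Eq. (15). d1[i, j] and d2[i, j] hold
    precomputed spike-train distances between periods i and j of T(1)
    and T(2); h is the integer smoothing parameter, 1 <= h < N."""
    N = d1.shape[0]
    total = 0.0
    for i in range(N):
        r1 = np.sort(np.delete(d1[i], i))[h - 1]   # h-th NN distance in T(1)
        r2 = np.sort(np.delete(d2[i], i))[h - 1]   # h-th NN distance in T(2)
        # Count periods j that are close to period i in *both* patterns.
        close = (np.delete(d1[i], i) < r1) & (np.delete(d2[i], i) < r2)
        total += np.log(N * max(close.sum(), 1) / h**2)
    return total / N
```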
The transfer entropy [43] from one pattern $T^{(1)}$ to another $T^{(2)}$ is the mutual information between $T^{(2)}$ at the current time $\tau$ and the history of $T^{(1)}$, conditioned on the history of $T^{(2)}$:
$$d_T\big[T^{(1)} : T^{(2)}\big] = H\big(T^{(2)}_\tau \;\big|\; T^{(2)}_{t<\tau}\big) - H\big(T^{(2)}_\tau \;\big|\; T^{(2)}_{t<\tau}, T^{(1)}_{t<\tau}\big). \qquad (16)$$
Unlike the other measures, $d_T$ is directional and asymmetric; its arguments are therefore separated by a colon and enclosed in square brackets, $[\,\cdot : \cdot\,]$. In many circumstances, the transfer entropy requires less data than the mutual information to produce an accurate estimate [44,45].
The relative redundancy reduction is given by
$$d_R\big(T^{(1)}, T^{(2)}\big) = \frac{R\big(T^{(1)}\big) - R\big(T^{(2)}\big)}{\hat{d}_M\big(T^{(1)}, T^{(2)}\big)}, \qquad (17)$$
where $R(T)$ is the redundancy of pattern $T$, quantifying the parts of a signal that may encode the same information. The redundancy $R$ is estimated by adapting the approach of Williams and Beer [46] to pattern $T$:
$$R(T) = \min_{T_i} \hat{d}_M\big(T \setminus T_i,\; T_i\big).$$

4. An Evaluation of Existing Pattern Separation Indices

Table 1 lists the indices compared in this study, which are described in greater detail in the previous section. These existing indices are computed on two sample patterns, one drawn from the control distribution and the other from the comparison distribution, as described in Section 2.3. Recall from Section 3.1 that an index $d$ operates on one of two representations of neural patterns: binarized vectors or undiscretized spike times.
For each measurement of an index $d$, the vector $\mathbf{x}^{(1)}$ is generated from the control distribution, and $\mathbf{x}^{(2)}$ from the comparison distribution. Recall that each of these distributions specifies a unique set $\{q_{00}, q_{10}, q_{01}, q_{11}\}$, which is used as the event probabilities of a categorical distribution, from which we sample $N_\text{bins}$ times to construct a neural pattern $\mathbf{x}$. For indices that operate on undiscretized patterns, $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are converted to $T^{(1)}$ and $T^{(2)}$ by dividing a 0 to 16 ms observation period into $N_\text{bins}$ bins and adding the time stamps corresponding to active bins into ordered sets. Each existing index is measured on $N_\text{trials} = 10$ control and comparison sample pattern pairs, then averaged. The Pearson correlation, cosine similarity, scaling factor, and SPIKE similarity are reviewed by Madar et al. [29] and implemented in this study using custom Python scripts. The estimated mutual information, estimated redundancy reduction, and transfer entropy were proposed by Bird et al. [30] and computed using their Matlab toolbox. The values of $N_\text{bins}$ and $N_\text{trials}$ were chosen to be large enough to reduce the standard error of the mean, yet small enough to remain computationally tractable.
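Putting the pieces together, the measurement loop can be sketched as follows (our illustration, reusing sample_pattern from the sketch in Section 2.3 and any of the vector-based index functions above):

```python
import numpy as np

def evaluate_index(index_fn, q_control, q_compare, n_trials=10):
    """Average an index over repeated control/comparison sample pairs,
    mirroring the averaging procedure described above; returns the
    mean and its standard error."""
    vals = [index_fn(sample_pattern(q_control).ravel(),
                     sample_pattern(q_compare).ravel())
            for _ in range(n_trials)]
    return np.mean(vals), np.std(vals) / np.sqrt(n_trials)
```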

4.1. Pattern Separation via Dissimilar Within-Ensemble Correlation

Figure 2 shows the average values of existing indices computed on sampled pattern pairs drawn from the control and comparison distributions $p(x;\eta_1, \eta_2, 0)$ and $p(x;\eta_1, \eta_2, d\theta)$ described in Section 2.3.1. Note that these plots are not symmetrical around $d\theta = 0$, since there is an upper limit to how strongly the two neurons can be correlated, given that $\eta_2 \le \eta_1$. Recall from Equation (6) that the distance between these distributions is proportional to $d\theta$. As such, an appropriate index of pattern separation should be sensitive to the increase in dissimilarity between patterns from $p(x;\eta_1, \eta_2, 0)$ and $p(x;\eta_1, \eta_2, d\theta)$ as the magnitude of $d\theta$ grows.
However, Figure 2 shows that, with the exception of the mutual information and transfer entropy, existing indices are insensitive to $d\theta$. None of the Pearson correlation (Figure 2A), cosine similarity (Figure 2B), Hamming distance (Figure 2C), SPIKE similarity (Figure 2D), or scaling factor (Figure 2E) displays any change with $d\theta$. The lack of change in the scaling factor is unsurprising, since this index is designed only to measure the norm of a vector; in this case, the norm of a pattern is equivalent to the sum of the number of times each neuron fires, which remains fixed throughout the experiment. The transfer entropy (Figure 2G) does not appear to change between systems whose two neurons fire independently and those with anti-correlated neurons (i.e., when $d\theta < 0$); only when the two neurons become strongly correlated (as $d\theta$ becomes sufficiently large when $\eta_1 = \eta_2 = 0.5$) does the transfer entropy decrease. The amount of redundancy reduction (Figure 2H) varies only negligibly around 0, regardless of the value of $d\theta$.
The mutual information (Figure 2F) is the only index that varies with $d\theta$, though the trend is non-monotonic. As the systems become more anti-correlated (i.e., as $d\theta < 0$ grows more negative), the mutual information decreases for patterns in which the neurons’ firing rates are uneven (i.e., when $|\eta_1 - \eta_2|$ is large) but increases for patterns whose neurons fire more evenly. In contrast, it appears to always decrease as the neurons become more correlated (i.e., as $d\theta > 0$ grows larger).

4.2. Patterns with Dissimilar Firing Rates in Neuron 2

Figure 3 shows the average values of existing indices computed on sampled pattern pairs drawn from the control and comparison distributions $p(x;\eta_1, 0.1, 0)$ and $p(x;\eta_1, 0.1 + d\eta_2, 0)$, where the firing rate of neuron 1 remains fixed between the two distributions and the firing rate of neuron 2 is increased (see Section 2.3.2 for details). Recall that the distance between these distributions is proportional to $d\eta_2$. As such, an appropriate index of pattern separation should be sensitive to the increase in dissimilarity between patterns from $p(x;\eta_1, 0.1, 0)$ and $p(x;\eta_1, 0.1 + d\eta_2, 0)$ as the magnitude of $d\eta_2$ grows.
Our results complement those of Madar et al. [29] and provide another demonstration of how different indices can lead to different interpretations of pattern separation. The Pearson correlation (Figure 3A) is less sensitive to neuron 2’s increase in activity when neuron 1 fires sparsely, while the opposite is true for the cosine similarity (Figure 3B). The Hamming distance (Figure 3C) appears to increase linearly with $d\eta_2$, while the SPIKE similarity (Figure 3D) and scaling factor (Figure 3E) decrease. Among the non-information-theoretic measures, the scaling factor appears to be the only index that changes non-linearly with $d\eta_2$. Unlike their non-information-theoretic counterparts, the mutual information (Figure 3F), transfer entropy (Figure 3G), and redundancy reduction (Figure 3H) all exhibit non-monotonic behavior with increasing $d\eta_2$.

5. Discussion

We have conceptualized pattern separation in information-geometric terms, and instantiated a simple, highly controlled example using a two-neuron system with a closed-form probability law. This allowed us to test the behavior of commonly used pattern separation measures under tightly controlled circumstances. Specifically, our simple model can generate separated patterns by either (A) changing which neurons are active (i.e., altering each neuron’s marginal firing rates), or (B) changing the correlation of activity between the two neurons.
Our work builds on previous evaluations of pattern separation indices. Madar et al. [29] evaluated commonly used indices using recordings of input and output spike trains of single hippocampal neurons, and showed that different indices could yield different results and interpretations. Our results further support their observation. We also evaluated different pattern separation indices for specific weaknesses, much like Bird et al. [30] had done, but with a clearer mathematical formulation of pattern separation. In doing so, we show that existing pattern separation measures, including the information-theoretic measures proposed by Bird et al. [30], fail to recognize pattern separation when it is implemented by changing neuronal co-activation rates, rather than individual neurons’ marginal firing rates. This limitation of the existing measures may restrict the study of pattern separation in circuits in which pattern separation occurs via a temporal coding mechanism [22,29].
Our information-geometric formulation of pattern separation can be extended to include more complex neural dynamics. The two-neuron model serves as a simple example that allows us to isolate the effects of neuronal co-activation. Clearly, it omits many detailed behaviors of real neural systems. For example, the distribution $p(x;\eta_1, \eta_2, \theta)$ describes neural systems in equilibrium (i.e., the probability of neural activity is time independent), and different families of distributions (i.e., different statistical manifolds) could be employed to introduce the temporal structure of spikes or refractoriness after a spike [47,48]. Similarly, different sources of background noise could be investigated through more complex distributions. Noise is inherently considered in our simulations, where spike trains are sampled from categorical distributions, with distinct sets of parameters (e.g., the probability that a neuron fires) presumed to store and convey neural information. Two distinct spike trains sampled from the same distribution therefore carry the same information despite not being identical. The indices of pattern separation evaluated in this study are affected by background noise, as demonstrated by Bird et al. [30]; this is reflected in our results by the non-zero error bars (showing the standard error of the mean) in Figure 2 and Figure 3.
Our results encourage a search for new pattern separation indices. Neuronal co-activation, as implemented in our study, may occur due to a third neuron in a separate layer that projects into a larger layer. This neuronal circuit architecture is consistent with expansion re-coding [9,49]. Additionally, increasing the level of connectivity within a neuronal population, such as in dentate gyrus circuits with high degrees of mossy-fiber sprouting, may also lead to increased neuronal co-activation rates. Mossy-fiber sprouting is commonly studied in epilepsy, for which pattern separation deficits have been reported in silico [33]. To sufficiently study pattern separation within these systems, we argue that new measures capable of capturing the effects of neuronal co-activation rates must be applied. Inspiration for these measures might be taken from various existing techniques designed for the in-depth analysis of spatiotemporal patterns, such as generating surrogate data (applicable when an analytical approach is not feasible) [50] or convolving the cross-correlation histogram [51] to detect precise, higher-order temporal correlations between spike trains. In addition to detecting patterns in spike trains, the pattern grouping algorithm [52,53] (built on the pattern detection algorithm of Abeles and Gerstein [16]) is also capable of evaluating these patterns’ statistical significance. These methods may provide more robust ways to capture the various neural features (neuronal co-activation rates, in particular) that are relevant to pattern separation. Another such measure could, of course, be the distance between the probability distributions underlying different neural patterns. For high-dimensional systems (i.e., models with many neurons and/or parameters), however, we would require efficient methods by which geodesic distances between distributions can be estimated. This requires efficiently estimating Fisher information matrices $G$, whose dimension increases exponentially with the number of neurons even in simple models [32]. Progress in this area may help us better understand the computational nature of pattern separation, as well as the neural systems that perform it.

Author Contributions

Conceptualization, A.N. and H.W.; methodology, A.N. and H.W.; software, H.W.; validation, H.W.; formal analysis, H.W.; investigation, H.W.; resources, A.N.; data curation, H.W.; writing—original draft preparation, H.W., S.S. and A.N.; writing—review and editing, All authors; visualization, H.W.; supervision, A.N. and T.T.; project administration, A.N.; funding acquisition, A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Research Nova Scotia grant number RNS-NHIG-2021-1931.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data and code can be made available by request.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Tulving, E.; Markowitsch, H.J. Episodic and declarative memory: Role of the hippocampus. Hippocampus 1998, 8, 198–204.
2. Scoville, W.B.; Milner, B. Loss of Recent Memory after Bilateral Hippocampal Lesions. J. Neurol. Neurosurg. Psychiatry 1957, 20, 11–21.
3. McClelland, J.L.; McNaughton, B.L.; O’Reilly, R.C. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 1995, 102, 419–457.
4. Wigström, H. A model of a neural network with recurrent inhibition. Kybernetik 1974, 16, 103–112.
5. Rolls, E.T.; Kesner, R.P. A computational theory of hippocampal function, and empirical tests of the theory. Prog. Neurobiol. 2006, 79, 1–48.
6. Espinoza, C.; Guzman, S.J.; Zhang, X.; Jonas, P. Parvalbumin+ interneurons obey unique connectivity rules and establish a powerful lateral-inhibition microcircuit in dentate gyrus. Nat. Commun. 2018, 9, 4605.
7. Guzman, S.J.; Schlögl, A.; Espinoza, C.; Zhang, X.; Suter, B.A.; Jonas, P. How connectivity rules and synaptic properties shape the efficacy of pattern separation in the entorhinal cortex–dentate gyrus–CA3 network. Nat. Comput. Sci. 2021, 1, 830–842.
8. Neunuebel, J.; Knierim, J. CA3 Retrieves Coherent Representations from Degraded Input: Direct Evidence for CA3 Pattern Completion and Dentate Gyrus Pattern Separation. Neuron 2014, 81, 416–427.
9. Cayco-Gajic, N.A.; Silver, R.A. Re-evaluating Circuit Mechanisms Underlying Pattern Separation. Neuron 2019, 101, 584–602.
10. Nakahara, S.; Turner, J.A.; Calhoun, V.D.; Lim, K.O.; Mueller, B.; Bustillo, J.R.; O’Leary, D.S.; McEwen, S.; Voyvodic, J.; Belger, A.; et al. Dentate gyrus volume deficit in schizophrenia. Psychol. Med. 2020, 50, 1267–1277.
11. Ohm, T.G. The dentate gyrus in Alzheimer’s disease. In Progress in Brain Research; Elsevier: Amsterdam, The Netherlands, 2007; Volume 163, pp. 723–740.
12. Young, C.C.; Stegen, M.; Bernard, R.; Müller, M.; Bischofberger, J.; Veh, R.W.; Haas, C.A.; Wolfart, J. Upregulation of inward rectifier K+ (Kir2) channels in dentate gyrus granule cells in temporal lobe epilepsy. J. Physiol. 2009, 587, 4213–4233.
13. Das, T.; Ivleva, E.I.; Wagner, A.D.; Stark, C.E.; Tamminga, C.A. Loss of pattern separation performance in schizophrenia suggests dentate gyrus dysfunction. Schizophr. Res. 2014, 159, 193–197.
14. Parizkova, M.; Lerch, O.; Andel, R.; Kalinova, J.; Markova, H.; Vyhnalek, M.; Hort, J.; Laczó, J. Spatial Pattern Separation in Early Alzheimer’s Disease. J. Alzheimer’s Dis. 2020, 76, 121–138.
15. Madar, A.D.; Pfammatter, J.A.; Bordenave, J.; Plumley, E.I.; Ravi, S.; Cowie, M.; Wallace, E.P.; Hermann, B.P.; Maganti, R.K.; Jones, M.V. Deficits in Behavioral and Neuronal Pattern Separation in Temporal Lobe Epilepsy. J. Neurosci. 2021, 41, 9669–9686.
16. Abeles, M.; Gerstein, G.L. Detecting spatiotemporal firing patterns among simultaneously recorded single neurons. J. Neurophysiol. 1988, 60, 909–924.
17. Hebb, D.O. The Organization of Behavior: A Neuropsychological Theory; John Wiley & Sons: New York, NY, USA, 1949.
18. Buzsáki, G. Neural Syntax: Cell Assemblies, Synapsembles, and Readers. Neuron 2010, 68, 362–385.
19. Myers, C.E.; Scharfman, H.E. A role for hilar cells in pattern separation in the dentate gyrus: A computational approach. Hippocampus 2009, 19, 321–337.
20. Chavlis, S.; Poirazi, P. Pattern separation in the hippocampus through the eyes of computational modeling. Synapse 2017, 71, e21972.
21. Santhakumar, V.; Aradi, I.; Soltesz, I. Role of Mossy Fiber Sprouting and Mossy Cell Loss in Hyperexcitability: A Network Model of the Dentate Gyrus Incorporating Cell Types and Axonal Topography. J. Neurophysiol. 2005, 93, 437–453.
22. Madar, A.D.; Ewell, L.A.; Jones, M.V. Pattern separation of spiketrains in hippocampal neurons. Sci. Rep. 2019, 9, 5282.
23. Santoro, A. Reassessing pattern separation in the dentate gyrus. Front. Behav. Neurosci. 2013, 7, 96.
24. McNaughton, B.L.; Nadel, L. Hebb-Marr networks and the neurobiological representation of action in space. In Neuroscience and Connectionist Theory; Lawrence Erlbaum Associates, Inc.: New York, NY, USA, 1990.
25. Wigström, H. Spatial propagation of associations in a cortex-like neural network model. J. Neurosci. Res. 1977, 3, 301–319.
26. Rolls, E.T. The mechanisms for pattern completion and pattern separation in the hippocampus. Front. Syst. Neurosci. 2013, 7, 74.
27. Yassa, M.A.; Stark, C.E. Pattern separation in the hippocampus. Trends Neurosci. 2011, 34, 515–525.
28. Vineyard, C.M.; Verzi, S.J.; James, C.D.; Aimone, J.B. Quantifying neural information content: A case study of the impact of hippocampal adult neurogenesis. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 5181–5188.
29. Madar, A.D.; Ewell, L.A.; Jones, M.V. Temporal pattern separation in hippocampal neurons through multiplexed neural codes. PLoS Comput. Biol. 2019, 15, e1006932.
30. Bird, A.D.; Cuntz, H.; Jedlicka, P. Robust and consistent measures of pattern separation based on information theory and demonstrated in the dentate gyrus. PLoS Comput. Biol. 2024, 20, e1010706.
31. Harris, K.D.; Shepherd, G.M.G. The neocortical circuit: Themes and variations. Nat. Neurosci. 2015, 18, 170–181.
32. Nakahara, H.; Amari, S.-i. Information-Geometric Measure for Neural Spikes. Neural Comput. 2002, 14, 2269–2316.
33. Yim, M.Y.; Hanuschkin, A.; Wolfart, J. Intrinsic rescaling of granule cells restores pattern separation ability of a dentate gyrus network model during epileptic hyperexcitability. Hippocampus 2015, 25, 297–308.
34. Leutgeb, J.K.; Leutgeb, S.; Moser, M.B.; Moser, E.I. Pattern Separation in the Dentate Gyrus and CA3 of the Hippocampus. Science 2007, 315, 961–966.
35. Senzai, Y.; Buzsáki, G. Physiological Properties and Behavioral Correlates of Hippocampal Granule Cells and Mossy Cells. Neuron 2017, 93, 691–704.e5.
36. Danielson, N.B.; Turi, G.F.; Ladow, M.; Chavlis, S.; Petrantonakis, P.C.; Poirazi, P.; Losonczy, A. In Vivo Imaging of Dentate Gyrus Mossy Cells in Behaving Mice. Neuron 2017, 93, 552–559.e4.
37. Hamming, R.W. Error Detecting and Error Correcting Codes. Bell Syst. Tech. J. 1950, 29, 147–160.
38. Chavlis, S.; Petrantonakis, P.C.; Poirazi, P. Dendrites of dentate gyrus granule cells contribute to pattern separation by controlling sparsity. Hippocampus 2017, 27, 89–110.
39. Kreuz, T.; Chicharro, D.; Houghton, C.; Andrzejak, R.G.; Mormann, F. Monitoring spike train synchrony. J. Neurophysiol. 2013, 109, 1457–1472.
40. Houghton, C. Calculating the Mutual Information between Two Spike Trains. Neural Comput. 2019, 31, 330–343.
41. Dobrushin, R.L. Prescribing a System of Random Variables by Conditional Distributions. Theory Probab. Its Appl. 1970, 15, 458–486.
42. Sihn, D.; Kim, S.P. A Spike Train Distance Robust to Firing Rate Changes Based on the Earth Mover’s Distance. Front. Comput. Neurosci. 2019, 13, 82.
43. Schreiber, T. Measuring Information Transfer. Phys. Rev. Lett. 2000, 85, 461–464.
44. Treves, A.; Panzeri, S. The Upward Bias in Measures of Information Derived from Limited Data Samples. Neural Comput. 1995, 7, 399–407.
45. Conrad, M.; Jolivet, R.B. Comparative performance of mutual information and transfer entropy for analyzing the balance of information flow and energy consumption at synapses. Neuroscience, 2020; preprint.
46. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515.
47. Amari, S.-i. Information Geometry as Applied to Neural Spike Data. In Encyclopedia of Computational Neuroscience; Jaeger, D., Jung, R., Eds.; Springer: New York, NY, USA, 2022; pp. 1669–1672.
48. Amari, S.-i. Information Geometry of Multiple Spike Trains. In Analysis of Parallel Spike Trains; Grün, S., Rotter, S., Eds.; Springer: Boston, MA, USA, 2010; pp. 221–252.
49. O’Reilly, R.C.; McClelland, J.L. Hippocampal conjunctive encoding, storage, and recall: Avoiding a trade-off. Hippocampus 1994, 4, 661–682.
50. Stella, A.; Bouss, P.; Palm, G.; Grün, S. Comparing Surrogates to Evaluate Precisely Timed Higher-Order Spike Correlations. eNeuro 2022, 9, ENEURO.0505-21.2022.
51. Stark, E.; Abeles, M. Unbiased estimation of precise temporal correlations between spike trains. J. Neurosci. Methods 2009, 179, 90–100.
52. Tetko, I.V.; Villa, A.E. A pattern grouping algorithm for analysis of spatiotemporal patterns in neuronal spike trains. 1. Detection of repeated patterns. J. Neurosci. Methods 2001, 105, 1–14.
53. Tetko, I.V.; Villa, A.E. A pattern grouping algorithm for analysis of spatiotemporal patterns in neuronal spike trains. 2. Application to simultaneous single unit recordings. J. Neurosci. Methods 2001, 105, 15–24.
Figure 1. Variables used to compute the SPIKE similarity. The green and blue solid bars respectively represent the spike times of spike trains 1 and 2, relative to some observation time t, shown by the red line.
Figure 2. Measurements of existing pattern separation indices versus change in the rate of coincident firing. (A) Pearson correlation, (B) cosine similarity, (C) Hamming distance, (D) SPIKE similarity, (E) scaling factor, (F) mutual information, (G) transfer entropy, and (H) redundancy reduction are applied to spike trains generated from $p(x;\eta_1, \eta_2, 0)$ and $p(x;\eta_1, \eta_2, d\theta)$, over different values of $\eta_1$ and $\eta_2$ shown by different colored plots (with the constraint $\eta_1 + \eta_2 = 1$), and averaged over 10 samples. Error bars show the standard error of the mean.
Figure 3. Measurements of existing pattern separation indices versus change in neuron 2’s marginal firing rate. (A) Pearson correlation, (B) cosine similarity, (C) Hamming distance, (D) SPIKE similarity, (E) scaling factor, (F) mutual information, (G) transfer entropy, and (H) redundancy reduction are applied to spike trains generated from $p(x;\eta_1, 0.1, 0)$ and $p(x;\eta_1, 0.1 + d\eta_2, 0)$, over different values of $\eta_1$ shown by different colored plots, and averaged over 10 samples. Error bars show the standard error of the mean.
Table 1. Existing indices of pattern separation evaluated in this study.

Symbol | Description | Equation
$d_\rho\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big)$ | Pearson correlation | (10)
$d_\theta\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big)$ | Cosine similarity | (11)
$d_\phi\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big)$ | Scaling factor | (12)
$d_\eta\big(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\big)$ | Hamming distance | (13)
$d_S\big(T^{(1)}, T^{(2)}\big)$ | SPIKE similarity | (14)
$\hat{d}_M\big(T^{(1)}, T^{(2)}\big)$ | Estimated mutual information | (15)
$d_T\big[T^{(1)} : T^{(2)}\big]$ | Transfer entropy | (16)
$d_R\big(T^{(1)}, T^{(2)}\big)$ | Relative redundancy reduction | (17)