Article

Efficient Context-Preserving Encoding and Decoding of Compositional Structures Using Sparse Binary Representations

Department of Computer Science, Technion – Israel Institute of Technology, Haifa 3200003, Israel
*
Author to whom correspondence should be addressed.
Information 2025, 16(5), 343; https://doi.org/10.3390/info16050343
Submission received: 8 March 2025 / Revised: 14 April 2025 / Accepted: 18 April 2025 / Published: 24 April 2025
(This article belongs to the Special Issue Optimization Algorithms and Their Applications)

Abstract

Despite their unprecedented success, artificial neural networks suffer from extreme opacity and weakness in learning general knowledge from limited experience. Some argue that the key to overcoming those limitations in artificial neural networks is to combine the continuity and compositionality principles efficiently. While it is unknown how the brain encodes and decodes information in a way that enables both rapid responses and complex processing, there is evidence that the neocortex employs sparse distributed representations for this task, and this remains an active area of research. This work addresses one of the challenges in this field: encoding and decoding nested compositional structures, which are essential for representing complex real-world concepts. One of the algorithms in this field is called context-dependent thinning (CDT). A distinguishing feature of CDT relative to other methods is that the CDT-encoded vector remains similar to each component input and to combinations of similar inputs. In this work, we propose a novel encoding method termed CPSE, based on CDT ideas. In addition, we propose a novel decoding method termed CPSD, based on triadic memory. The proposed algorithms extend CDT by allowing both encoding and decoding of information, including the composition order. In addition, the proposed algorithms allow the amount of computation and memory needed to achieve the desired encoding/decoding performance to be optimized.

1. Introduction

AI based on deep learning is one of the landmark achievements of the 21st century, a turning point likened to the industrial revolution of the 18th century [1]. The field is growing exponentially, and the diversity of network architectures is enormous. In general, a neural network implements a statistical model with many parameters, optimized based on the statistics of the training data. During training, if presented with enough examples spanning the statistical properties of the desired task, the model will ideally converge and produce the correct output not only for the inputs seen during training but also for test data and other novel inputs.
One important property that enables deep learning’s success is the continuity principle. In deep learning, information encoding and processing are performed using real numbers that vary continuously. The network’s weights, even those far from the input and deeply “buried” within the network, can be modified gradually during learning to smoothly improve the model’s statistical inference performance.
Another essential principle is compositionality [2]. This ability potentially enables us to understand an infinite number of novel situations by encoding each situation as a novel composition of familiar, simpler parts and composing together our understanding of those parts. Two of the most successful and widely used types of artificial deep learning networks are Convolutional Neural Networks (CNNs) [3] and transformers [4]. These architectures use compositionality to a certain degree. Inspired by the mammalian visual system, the CNN architecture processes an input image by analyzing image patches using multiple layers of feature detectors. Within a given layer, each feature detector analyzes all patches. Importantly, in addition to continuity, CNNs possess some level of compositionality through spatial structure. Transformers take an entire sequence of symbols as input and produce a sequence of symbols as output. To compute the activation vector for a given symbol at a given layer, the model calculates a weighted sum of information from the vectors encoding the other symbols. It was shown [5] that this is equivalent to a graph with weighted links between symbols: links along which data flow. Some researchers claim that CNNs and transformers derive much of their power from this additional compositional structure [6]. The compositional structure in those networks is not physical but abstract; thus, the use of structure information is limited. For example, in a transformer, the data flow along a particular link cannot be accessed by the rest of the network: once a given layer’s data-flow graph has been used, the graph disappears.
Despite their unprecedented success, deep learning still suffers from extreme opacity and weakness in learning general knowledge from limited experience. Some researchers argue [7] that the key to overcoming those limitations in artificial neural networks is to combine continuity with compositionality principles efficiently.
The brain continually processes multiple streams of sensory information. A central question in this context is how the brain implements compositionality and encodes and decodes information to enable both rapid responses and complex processing. Answering this question might be the key to unlocking the power of human intelligence in artificial neural networks. Empirical data suggest that, at any given time, only a small proportion of neurons across different cortical regions are active. This observation is consistent with energy constraints in the cortex, which limit average sustainable neural activity to approximately 1–3% of the total neuronal population [8]. Many researchers have proposed that the cortex employs sparse distributed representations (SDRs), where a subset of vector components represents each entity, and each component can be reused across multiple entities [9,10].
This work proposes practical algorithms for the efficient encoding and decoding of hierarchical nested compositional structures, which are essential for representing complex real-world concepts, objects, and scenarios. We focus on a family of algorithms that operate on data encoded using sparse distributed representations.
Sparse distributed representations form a prominent area of study in theoretical neuroscience, machine learning, and other computational frameworks, often referred to as Hyperdimensional Computing (HDC) [11] and Vector Symbolic Architectures (VSAs) [12]. A recent survey [13] describes many successful, practical, real-world applications of this type of algorithm. Examples include learning robotic tasks such as viewpoint-invariant object recognition, place recognition, and simple reactive behaviors [14], as well as the multiclass learning and classification of various biosignals [15]. It has also been demonstrated that natural language problems can be mapped to symbolic structures, allowing structural information to be captured for deep learning language models. In addition, this type of data representation has been shown to be well-suited for neural implementations on low-power hardware where sparse activity is desirable [16]. For example, it can play the role of an abstraction layer for emerging computing hardware paradigms such as neuromorphic computing [17,18].
To be useful in practice for real-world applications, algorithms on distributed representations should be “structure-sensitive” [19]. Algorithms that do not change the dimensionality of the information representation are particularly interesting because they allow the same operations to be applied to objects/concepts of different complexity. Numerous methods have been proposed for encoding complex structures using distributed representations [13,19,20,21,22,23,24,25,26,27,28]. One of the proposed algorithms in this domain is called context-dependent thinning (CDT) [29]. A distinguishing feature of CDT is that it preserves both “unstructured” similarity and structured overlap, meaning the CDT-bound vector remains similar to each SDR input and to combinations of similar inputs. As a result, it is possible to detect whether a given item is part of a composite by checking the overlap with the composite’s SDR vector, without decoding the whole structure. The CDT method has several weaknesses: it is slower than some of the other methods, it is commutative, meaning the order of the composition is not encoded, and there is no straightforward algebraic way to unbind (decode) the original components.
This work introduces a novel encoding method for compositional structures called context-preserving SDR encoding (CPSE). Our algorithm builds on CDT ideas and extends them. In particular, CPSE allows for encoding information about the order of composition and optimizing the speed of convergence to the desired output SDR sparsity.
In addition, we propose a novel decoding algorithm for compositional SDRs encoded using the CPSE method, termed context-preserving SDR decoding (CPSD). This algorithm allows a pure compute-based decoding method applicable in a simple setting when base-level components are known. In the general case, when the number of possible base-level components is huge, CPSD allows decoding by using triadic memory (TM) [30].
We analyze CPSE and CPSD performance and show the trade-off between optimum TM memory size, SDR size, and the number of elements that can be encoded and decoded while maintaining a low decoding error probability.
For example, consider components ‘a’, ‘b’, and ‘c’. Using CPSE, one can form components representing ‘ab’, ‘ac’, ‘bc’, and ‘abc’, all of which retain the original size and sparsity of the base-level components. With CPSD, the base-level components and their compositional order can be retrieved. This process can be extended recursively to encode/decode many layers of structure information.
The contributions of this work are as follows:
  • We present a novel algorithm for encoding compositional SDR structures, termed context-preserving SDR encoding (CPSE). Our method builds on the ideas of the CDT [29] method while addressing some of its weaknesses.
  • We propose a novel decoding algorithm called context-preserving SDR decoding (CPSD) based on triadic memory [30].
  • CPSE encoding improves upon CDT by incorporating information about the sequence of base-level components within the composite SDR.
  • CPSE encoding improves upon CDT by allowing us to optimize the convergence speed to achieve the desired output SDR sparsity.
  • We show that CPSE encoding requires a near-constant amount of computation for order-invariant structures containing more than four components, irrespective of the base-level components.
  • We show that CPSD decoding allows for accurate retrieval of the identity and order of base components from a composite SDR.
  • We show the trade-off between SDR size and the number of elements that can be encoded while maintaining a low decoding error rate.
  • We detail a pure compute-based decoding method applicable in a simple setting when base-level components are known and a TM-based method for the general case when the number of possible base-level components is huge.
  • We show the trade-off and optimum TM memory size needed to achieve acceptable decoding bit error probability.

2. Materials and Methods

2.1. Background

2.1.1. Related Works

Various models and methods have been proposed to encode and decode symbolic representations of complex data structures [19]. Tensor Product Representations (TPRs) [24] define data structures using the concept of fillers and roles. Fillers are the different entities of the structure. Roles represent the position of the fillers in the structure. Both roles and fillers are encoded as vectors in some n-dimensional space. The binding is performed by taking the outer product of vectors representing roles and fillers, resulting in a matrix. If the role vectors are linearly independent or orthogonal, a component can be perfectly decoded by projecting the bound tensor onto the subspace for that role. Higher-order tensors can bind any number of vectors, but the dimension grows multiplicatively, which is a significant drawback of the method. Therefore, there is a need to compress the resulting high-order tensors into fixed-size vectors, which is not trivial. TPR (compressed into a vector) tends to produce a vector orthogonal or unrelated to its factors, making the direct similarity-based recall of parts difficult.
Holographic Reduced Representations (HRRs) [21] allow encoding complex structured information into high-dimensional fixed-size vectors of Gaussian-distributed, real-valued elements. Superposition is performed via element-wise summation, and binding is carried out by circular convolution. Binding does not increase dimensionality; it “mixes” the two vectors into one vector of the same length and typically involves scaling the vectors to unit length. HRRs are called “holographic” because information about a component is distributed across all components of the bound vector. Decoding is performed by a circular correlation operation that recovers an approximation of the input. The bound vector is nearly orthogonal to its inputs when the vectors are random; while the bound vectors of similar inputs are similar to each other, they are dissimilar to the inputs themselves.
Fourier Holographic Reduced Representations (FHRRs) [25] implement the same algebra as HRRs but in the frequency domain: each hypervector is like a list of phasors, and convolution in the HRR’s real domain corresponds to the multiplication of phasors in FHRRs. Binding is performed by element-wise complex multiplication. Superposition is the element-wise addition of these complex vectors, followed by optional normalization. Decoding is achieved by element-wise multiplying with the complex conjugate of one of the factors. This method allows more efficient operations, but it does not solve the fundamental challenges of HRRs.
Binary Spatter Codes (BSCs) [23] use binary hypervectors of a fixed length (often with 50% of bits set to 1) to represent items. The binding operation is component-wise XOR. Superposition is typically performed by a bit-wise majority vote: when adding multiple binary vectors, the sum at each bit position is thresholded to 0 or 1, depending on whether more inputs had 0 or 1 in that position. This majority operation yields a binary result with approximately equal numbers of 1 s and 0 s. If there is a tie, a tiebreak rule is used. The BSC is mathematically equivalent to a simplified form of HRRs restricted to a binary phase angle. Using XOR as the binding operation produces an output with independent random bits when combining two vectors, yielding an almost orthogonal code for the pair. This type of encoding does not preserve density: a naive XOR of two sparse vectors can double the vector density if the vectors have disjoint 1 s.
The Sparse Block Code (SBC) [27] is a model designed to marry the benefits of sparse representations with rigorous binding operations. In this method, a hypervector is structured into fixed-size blocks of bits. Each block can be seen as a “segment”, and in the most constrained form a vector has one active bit per block. The binding is implemented as a segment-wise circular shift: essentially, one vector’s bits are used to permute or rotate the other vector’s blocks. This binding can be seen as a permutation-based analog of convolution. The method has the advantage that the binding operation is invertible and non-commutative, meaning that order is encoded without extra mechanisms. The downside is that SBC does not preserve structural similarity: there is no direct similarity between a composite and its components.
The Vector-Derived Transformation Binding (VTB) [28] method operates on dense, real-valued vectors. Binding is implemented by applying a linear transformation derived from one vector to another, which requires constructing and using large matrices derived from vectors. The method possesses a fully invertible linear operation with a well-defined algebra, and it is non-commutative by design. Similarly to the previously described methods, the downside is that VTB does not preserve structural similarity.
The Binary Sparse Distributed Code (BSDC) [26] is a vector symbolic architecture developed for representing complex structured data with sparse binary vectors. Binding of multiple items is performed using the CDT [29] procedure. There is no fundamental difference between the BSDC and CDT in terms of binding/unbinding, so the BSDC can be viewed as a CDT-based approach.
The CDT method [29] allows for encoding nested hierarchical structures using sparse binary hypervectors, ensuring that the bound vector has the same sparsity (percentage of 1 s) as the input vectors. CDT works by a sequence of operations: superimposing (OR-ing) the sparse vectors, then using random permutations and intersections (AND) to “thin out” the superposition, keeping only a subset of active bits that depends on each input’s pattern. The thinning can be performed using additive or subtractive operations (see the detailed algorithms in Section 2.3). The outcome of CDT encoding is a sparse binary vector representing the bound structure.
A distinguishing feature from other methods is that CDT preserves both “unstructured” similarity and structured overlap, meaning the CDT-bound vector remains similar to each input and combinations of similar inputs. As a result, it is possible to detect if a given item is part of a composite by checking the overlap with the composite’s vector. Using sparse binary representations (SDRs) makes CDT well-suited for neural implementations or low-power hardware where sparse activity is desirable [16]. In addition, a recent work [31] performed a comprehensive performance evaluation and comparison of various VSA algorithms on different tasks. For example, they evaluated how efficiently different algorithms bundle information into one representation. They have shown that sparse binary representations perform better than dense binary vectors.
Their conclusion was that methods that use sparse binary vectors (like CDT) have several advantages; for example, they require a relatively low number of dimensions for binding and later restoring information with low probability of error, thus facilitating relatively low memory consumption in real applications.
The CDT method has several weaknesses. The procedure is more complex than a simple XOR or convolution; it involves multiple random permutations and bit-wise operations, which might make it relatively slower. While CDT preserves similarity and sparsity, the trade-off is that the binding might not be as uniquely random as XOR/convolution-based binding, which could cause false positives if many similar composites reside in memory. In general, CDT does not have a straightforward algebraic unbinding: one cannot directly “divide” the thinned vector by one component to recover the other. Also, because CDT is commutative, it does not inherently distinguish the roles or the order of its operands.
This work proposes a novel encoding method, CPSE, based on CDT ideas. In addition, we propose a novel decoding method, CPSD, based on triadic memory (TM) [30]. The proposed algorithms are designed to allow the encoding and decoding of information about the order of composition while optimizing the amount of computing and memory needed to achieve the desired encoding (binding) and decoding (unbinding) performance.

2.1.2. Sparse Binary Representation (SDR)

Sparse binary representations (SDRs) are large binary vectors with small sparsity, meaning that only a small percentage of bits are ‘1’. The work presented in [32] analyzes and derives basic mathematical properties for binary SDRs, such as representation capacity, scaling laws, error bounds, probability of mismatches, robustness to noise, subsampling, and unions. Throughout the remainder of this paper, the term “SDR” refers specifically to a binary sparse distributed representation.
This analysis shows that binary SDRs have many promising properties for distributed information representation. Here, we describe SDR properties that are important for our encoding method. In particular, we describe the ability to reliably recognize SDR patterns by matching a small subset of SDR active bits, even in the presence of noise.
We define the cardinality $w_x$ as the number of ‘1’ bits in a vector x of length n and sparsity s:
$w_x = n \cdot s$ (1)
Given a vector of fixed size n and cardinality w, the number of unique encodings is as follows:
$\binom{n}{w} = \frac{n!}{w!\,(n-w)!}$ (2)
For example, an SDR of size n = 2048, sparsity s = 2%, and cardinality $w_x = 40$ supports a vast number of encodings, approximately $2.37 \times 10^{84}$.
The overlap score between SDR x and y is the number of bits that are ‘1’ in the same locations in both SDR vectors, and it is computed as a dot product. For noise robustness, we will treat two SDRs as matching if their overlap score is above a threshold θ .
We define the overlap set $\Omega_x(n, w, b)$ as the set of vectors of size n with cardinality $w_x$ that have exactly b bits of overlap with x. The number of vectors in the overlap set can be computed by the following:
$|\Omega_x(n, w, b)| = \binom{w_x}{b} \times \binom{n - w_x}{w - b}$ (3)
Given SDRs x and y with size n and cardinality $w_x$, the probability of a false positive match, given threshold $\theta$, is as follows:
$P_{FP}(n, w, \theta) = \frac{\sum_{b=\theta}^{w} |\Omega_x(n, w, b)|}{\binom{n}{w}}$ (4)
This equation can be used to find the required SDR size and sparsity that allow the desired SDR false positive decoding performance, given the amount of noise/subsampling in a particular application. For example, for SDRs of size n = 2048 and sparsity s = 2%, the probability of a false positive partial match of $\theta = 50\%$ with a random SDR is less than $2 \times 10^{-26}$.
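To make these formulas easy to apply when choosing n, s, and θ for a target false positive rate, the following is a small Python sketch of Equations (2)–(4) using only the standard library; the function names are ours and not part of any published package.

from math import comb

def num_encodings(n: int, w: int) -> int:
    """Equation (2): number of unique SDRs of size n with w active bits."""
    return comb(n, w)

def overlap_set_size(n: int, w: int, b: int) -> int:
    """Equation (3): number of SDRs of cardinality w sharing exactly b active bits with x."""
    return comb(w, b) * comb(n - w, w - b)

def false_positive_prob(n: int, w: int, theta: int) -> float:
    """Equation (4): probability that a random SDR overlaps x in at least theta bits."""
    matches = sum(overlap_set_size(n, w, b) for b in range(theta, w + 1))
    return matches / num_encodings(n, w)

# Example from the text: n = 2048, s = 2% (w = 40), theta = 50% of w.
n, w = 2048, 40
print(f"unique encodings: {num_encodings(n, w):.2e}")              # ~2.4e84
print(f"P_FP at theta = 20: {false_positive_prob(n, w, 20):.2e}")  # < 2e-26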

2.2. Problem Statement

Let $X = \{x_1, \ldots, x_M\}$ be a set of M random component vectors encoded as SDRs of size n and sparsity s, where vector $x_k$ is the SDR for component k. Basic components are encoded to SDRs by any previously proposed method, for example, from [33]. The components might represent any type of information, such as points from temporal patterns, values of M different features, etc.
We need an encoding method with the following properties:
  • Given X, the encoder will allow the creation of composite SDR Y representation using M component vectors from X as input.
  • Encoder output SDR will have size n and sparsity similar to base-level components.
  • The encoding method should be deterministic. The same input should result in the same output.
  • As the similarity between inputs in X increases, the similarity (overlap score) between the encoder output SDRs increases.
  • The encoder output SDR will contain information about the identity of base-level components and order of composition in a way that allows decoding.
Given the encoder output SDR Y, we need a decoding method with the following properties:
  • It will be possible to find the number of base components M in Y.
  • The decoder will allow us to find the identity of all M base component vectors and the order of their composition in Y with a low probability of error.
  • Given $x_j$, it will be possible to determine whether $x_j$ is part of Y and its position in the composition.

2.3. CDT Encoding Method

Given a set $X = \{x_1, \ldots, x_M\}$ of M random component vectors, CDT allows the creation of a compositional encoding SDR Y by uniformly selecting bits from each component SDR into the compositional SDR Y, using one of the following procedures [29].

2.3.1. Additive CDT Method

To create compositional encoding Y, perform the following steps:
  1. $\hat{Y} = \bigvee_{i=1}^{M} x_i$ (union of the M SDRs in X);
  2. $Y = 0$;
  3. Create a random permutation matrix P;
  4. $\hat{X} = \hat{Y}$;
  5. $\hat{X} = P\hat{X}$ (apply the random permutation);
  6. $Y = Y \mid (\hat{Y} \,\&\, \hat{X})$;
  7. Repeat steps 5–6 until the sparsity of Y > target.
    Toy example to explain the algorithm steps:
    x1 = [1 1 0 0 0 0 0 0 0 0].
    x2 = [0 0 0 0 0 0 1 1 0 0].
    • Y ^   = [1 1 0 0 0 0 1 1 0 0].
    • Y = 0
    • P = Shift 1 position to the right.
    • X ^ = [1 1 0 0 0 0 1 1 0 0].
    • X ^ = [0 1 1 0 0 0 0 1 1 0].
    • Y = 0 | ([1 1 0 0 0 0 1 1 0 0] & [0 1 1 0 0 0 0 1 1 0]) = [0 1 0 0 0 0 0 1 0 0].
    • Completed after 1 iteration.
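To make the procedure concrete, below is a minimal NumPy sketch of additive CDT as we read the steps above; the fixed seeded permutation stands in for the permutation matrix P, the iteration bound is a safety measure of ours, and M ≥ 2 is assumed so that the union is denser than the target.

import numpy as np

def additive_cdt(components, target_sparsity, seed=0):
    """Additive CDT: OR the components, then grow Y by AND-ing the union
    with successive permutations of itself until the target sparsity is exceeded."""
    rng = np.random.default_rng(seed)
    y_hat = np.bitwise_or.reduce(np.array(components, dtype=np.uint8))  # step 1: union
    n = y_hat.size
    perm = rng.permutation(n)            # step 3: fixed random permutation (stands in for P)
    y = np.zeros(n, dtype=np.uint8)      # step 2
    x_hat = y_hat.copy()                 # step 4
    for _ in range(1000):                # safety bound on iterations
        x_hat = x_hat[perm]              # step 5: apply the permutation again
        y |= y_hat & x_hat               # step 6: keep union bits selected by the permuted vector
        if y.sum() / n > target_sparsity:    # step 7: stop once the target sparsity is exceeded
            break
    return y

# Example: bind two random SDRs of size 2048 with 2% sparsity.
comps = [(np.random.default_rng(i).random(2048) < 0.02).astype(np.uint8) for i in (1, 2)]
y = additive_cdt(comps, target_sparsity=0.02)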

2.3.2. Additive CDT Performance Analysis

In step 1, the additive CDT encoding procedure creates a vector $\hat{Y}$ as the union of M random SDR components. For sparsity s, the probability p of the bit at a given index being ‘1’ is $p = \frac{s}{100}$. The probability of the bit at a given index being ‘1’ for the bit-wise OR of M SDRs of equal sparsity is given by Equation (5):
$p(M, s) \approx 1 - \left(1 - \frac{s}{100}\right)^M = 1 - (1 - p)^M \underset{M < 8}{\approx} M \cdot p$ (5)
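For example, with s = 2% (p = 0.02) and M = 4 components, Equation (5) gives $p(4, 2\%) = 1 - 0.98^4 \approx 0.0776$, slightly below the upper limit $M \cdot p = 0.08$ because of active-bit collisions.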
Figure 1 shows the mean and std for the sparsity of the union of M random SDRs. We also show the expected sparsity according to Equation (5) and the maximum sparsity limit (M*s).
As expected, when the number of component vectors (M) grows, the combined sparsity is slightly lower than M*s because there are more active-bit collisions. Please note that in all experiments, we report the mean and standard deviation over 500 runs for M in the range from two to ten.
We can compute the expected number (K) of times that steps 5–6 need to be repeated for additive CDT to reach the sparsity s. Unrolling steps 5–6 gives the following formula: $Y = \hat{Y} \,\&\, \big(\bigvee_{i=1}^{K} \hat{Y}P^i\big)$, where $\hat{Y}P^i$ is the vector $\hat{Y}$ permuted using the permutation matrix P applied i = 1…K consecutive times. We find K by solving Equation (6):
$p = p_Y = p(\hat{Y}) \cdot p\big(\bigvee_{i=1}^{K} \hat{Y}P^i\big)$ (6)
By using Equation (5), we can write the following:
$p \approx p(M, s) \cdot p(K, p(M, s)) = p(M, s)\left[1 - \left(1 - p(M, s)\right)^K\right]$ (7)
Equation (8) shows the solution for K:
$K \approx \frac{\log\left(1 - \frac{p}{p(M,s)}\right)}{\log\left(1 - p(M,s)\right)} \underset{p(M,s)\approx Mp}{\approx} \frac{\log\left(1 - \frac{1}{M}\right)}{\log\left(1 - Mp\right)}$ (8)
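As a worked example, for p = 0.02 and M = 4 (so $p(M,s) \approx 0.08$), Equation (8) gives $K \approx \log(0.75)/\log(0.92) \approx 3.5$, consistent with the observation below that fewer than five iterations are needed for compositions of four or more elements.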
Figure 2 shows the simulation results for the number of repetitions (K) of steps 5–6 needed for the convergence of additive CDT to target the sparsity as a function of M. The dotted red line shows the expected number K according to Equation (8).
We can see that Equation (8) gives a good estimate of the expected number of iterations. For a composition of four elements or more, on average, less than five iterations of the algorithm are needed. Figure 3 shows the final sparsity of Y at the end of the additive CDT procedure when the target sparsity is 2%.
The result of additive CDT is a compositional SDR with a sparsity between 2% and 3.8%. We can see that additive CDT is fast, but the final SDR sparsity is not preserved, and there is a significant overshoot.

2.3.3. Subtractive CDT Method

To create compositional encoding Y, perform the following steps:
  1. $\hat{Y} = \bigvee_{i=1}^{M} x_i$ (union of the M SDRs in X);
  2. $Y = \hat{Y}$;
  3. Create a random permutation matrix P;
  4. $\hat{X} = \hat{Y}$;
  5. $\hat{X} = P\hat{X}$ (apply the random permutation);
  6. $Y = Y \,\&\, (\sim\hat{X})$;
  7. Repeat steps 5–6 until the sparsity of Y < target.
    Toy example to explain the algorithm steps:
    x1 = [1 1 0 0 0 0 0 0 0 0].
    x2 = [0 0 0 0 0 0 1 1 0 0].
    • Y ^   = [1 1 0 0 0 0 1 1 0 0].
    • Y = [1 1 0 0 0 0 1 1 0 0].
    • P = Shift by 1 position to the right.
    • X ^ = [1 1 0 0 0 0 1 1 0 0].
    • X ^ = [0 1 1 0 0 0 0 1 1 0].
    • Y = [1 1 0 0 0 0 1 1 0 0] & [1 0 0 1 1 1 1 0 0 1] = [1 0 0 0 0 0 1 0 0 0].
    • Completed after 1 iteration.
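For completeness, the corresponding sketch of the subtractive variant, under the same assumptions as the additive sketch above: it starts from the full union and removes bits until the sparsity drops below the target.

import numpy as np

def subtractive_cdt(components, target_sparsity, seed=0):
    """Subtractive CDT: start from the union and AND it with negated
    permutations of itself until the sparsity drops below the target."""
    rng = np.random.default_rng(seed)
    y_hat = np.bitwise_or.reduce(np.array(components, dtype=np.uint8))  # step 1: union
    n = y_hat.size
    perm = rng.permutation(n)            # step 3: fixed random permutation
    y = y_hat.copy()                     # step 2: start from the full union
    x_hat = y_hat.copy()                 # step 4
    for _ in range(1000):                # safety bound on iterations
        x_hat = x_hat[perm]              # step 5
        y &= 1 - x_hat                   # step 6: drop bits selected by the permuted vector
        if y.sum() / n < target_sparsity:    # step 7: stop once below the target sparsity
            break
    return y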

2.3.4. Subtractive CDT Performance Analysis

We can compute the expected number of times (K) that steps 5–6 need to be repeated for subtractive CDT to reach a sparsity of s. Unrolling steps 5–6 gives the following formula: $Y = \hat{Y} \,\&\, \big(\bigwedge_{i=1}^{K} \sim(\hat{Y}P^i)\big)$, where $\hat{Y}P^i$ is the vector $\hat{Y}$ permuted using the permutation matrix P applied i = 1…K times; we negate all bits and perform a bit-wise AND between the vectors. We find K by solving Equation (9):
$p = p_Y = p(\hat{Y}) \cdot p\big(\bigwedge_{i=1}^{K} \sim(\hat{Y}P^i)\big)$ (9)
By using Equation (5), we can write the following:
$p \approx p(M, s)\left(1 - p(M, s)\right)^K$ (10)
Equation (11) shows the solution for K:
$K \approx \frac{\log\left(\frac{p}{p(M,s)}\right)}{\log\left(1 - p(M,s)\right)} \underset{p(M,s)\approx Mp}{\approx} \frac{\log\left(\frac{1}{M}\right)}{\log\left(1 - Mp\right)}$ (11)
Figure 4 shows the number of repetitions (K) of steps 5–6 needed for the convergence of subtractive CDT to the target sparsity as a function of the number of component vectors (M). The dotted red line shows the expected number according to Equation (11). We can see that the equation gives a good estimate of the expected number of iterations. Compared to additive CDT, we can see that subtractive CDT converges much more slowly.
Figure 5 shows the sparsity of Y at the end of the subtractive CDT procedure for a target sparsity of 2%. The result is a compositional SDR with a sparsity between 1.65% and 2%.
Compared to additive CDT, we can see that subtractive CDT converges much closer to the target sparsity. However, the convergence to the target sparsity becomes worse as the number of SDRs in the composition grows.

2.4. Context-Preserving SDR Encoding (CPSE) Method

In this section, we describe a novel SDR encoding method called CPSE. Careful examination of the performance of additive and subtractive CDT leads to several interesting observations. For three or more component vectors (M), the additive CDT procedure converges much faster than subtractive CDT, but the convergence steps of subtractive CDT are smaller; thus, its overshoot relative to the target sparsity is lower. Also, the overshoot of the two methods is in opposite directions. Similarly to CDT, our method also uniformly selects bits from each component SDR into the compositional SDR Y. Contrary to CDT, CPSE dynamically switches the encoding procedure from coarse additive update steps to fine-grained subtractive steps. This allows us to control the convergence speed and achieve a compositional SDR with sparsity very close to the desired target using a minimum number of permutation iterations. In addition, CPSE augments the encoded elements with information about the order of the composition.
To create compositional encoding SDR Y for M components, the CPSE encoder will perform the following steps:
  1. At startup, compute a set of M random permutation matrices $\tilde{P} = \{\tilde{p}_1, \ldots, \tilde{p}_M\}$. For an SDR vector of size n, each permutation matrix has size n × n. The permutation matrices are used to encode information about the position of the base-level SDRs: for each position i = 1…M, a different permutation matrix $\tilde{p}_i$ is used. All invocations of the encoder use the same set $\tilde{P}$ for position-information encoding.
  2. The input to the encoder is $X = \{x_1, \ldots, x_M\}$, a set of M random component vectors.
  3. Add position information to the encoded elements by computing the vector $\tilde{X} = \{\tilde{x}_1, \ldots, \tilde{x}_M\}$, where for each component $x_i \in X$, component i in $\tilde{X}$ is $\tilde{x}_i = x_i\tilde{p}_i$.
  4. $\hat{Y} = \bigvee_{i=1}^{M} \tilde{x}_i$ (union of the M SDRs in $\tilde{X}$).
  5. Create a random permutation matrix P.
    Additive Phase
  6. $Y = 0$.
  7. $\hat{X} = \hat{Y}$.
  8. $\hat{X} = P\hat{X}$ (apply the random permutation).
  9. $Y = Y \mid (\hat{Y} \,\&\, \hat{X})$.
  10. Repeat steps 8–9 until the sparsity of Y > target.
    Subtractive Phase
  11. $\hat{X} = P\hat{X}$.
  12. $Y = Y \,\&\, (\sim\hat{X})$.
  13. Repeat steps 11–12 until the sparsity of Y < target + SOT.
    For Generic CPSD Decoding
  14. For each component i in X, store the following mapping in memory: $(Y, Y\tilde{p}_i^{-1}) \rightarrow x_i$.
Note that the random permutation matrices in step 1 can be computed a priori. The same matrices should be used for all encodings. They can also be computed in real time as $\tilde{p}_i = \tilde{p}^i$, where $\tilde{p}$ is some random permutation and $\tilde{p}^i$ is $\tilde{p}$ raised to the i-th power. SOT stands for the sparsity overshoot threshold (see Equation (13)). This parameter allows for control of the mean sparsity of the output Y.
The purpose of step 3 is to allow the decoding of the component composition order from the final SDR Y. To achieve that goal, we need to ensure that information about the order of the composition is included in the final SDR. We accomplish that by making sure that each component in the set of all possible components has a unique SDR according to the composition order. Next, the encoder performs steps similar to the additive CDT procedure until the sparsity of Y exceeds the desired sparsity. This allows for fast convergence near the desired sparsity using “coarse” sparsity update steps. As seen in the previous section, at the end of this phase, SDR Y will have a sparsity higher than the target sparsity. Then, the encoder performs steps similar to subtractive CDT until the sparsity of Y drops below the desired sparsity + SOT. This allows fine-grained improvement steps to achieve a sparsity that is very close to the target sparsity with minimum overshoot. Step 14 is needed for generic CPSD decoding (see Section 2.5.2). We compute the optimum SOT for the output to be near the target sparsity. For that purpose, we find the change in sparsity ($\Delta p$) achieved by a single step 12 of the CPSE algorithm by using Equation (10) and the following equation:
$\Delta p \approx p(M,s)\left(1 - p(M,s)\right)^N - p(M,s)\left(1 - p(M,s)\right)^{N+1} = p(M,s)\left(1 - p(M,s)\right)^N p(M,s) \approx p \cdot p(M,s)$ (12)
We set the following:
$\mathrm{SOT} = 0.5\,\Delta p = 0.5\,p\,p(M,s) \underset{p(M,s)\approx Mp}{\approx} 0.5\,Mp^2$ (13)
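For example, for p = 0.02 and M = 4, Equation (13) gives $\mathrm{SOT} \approx 0.5 \cdot 4 \cdot 0.02^2 = 0.0008$, i.e., an allowed overshoot of 0.08 percentage points above the target sparsity.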
To compute the total number of iterations, we assume that step 10 is performed at most K + 1 times, where K is computed according to Equation (8). Then, we compute the number of iterations (N) needed for step 12. For that purpose, we solve the following equation,
$p + \mathrm{SOT} = p(\hat{Y}) \cdot p\big(\bigvee_{i=1}^{K+1} \hat{Y}P^i\big) \cdot p\big(\bigwedge_{i=1}^{N} \sim(\hat{Y}P^i)\big)$ (14)
using Equations (5), (7), (10) and (13):
$p + 0.5\,p\,p(M,s) \approx p(M,s)\left[1 - \left(1 - p(M,s)\right)^{K+1}\right]\left(1 - p(M,s)\right)^N$
Substituting the expression for p from Equation (7) gives the following:
$\frac{p\left(1 + 0.5\,p(M,s)\right)}{p(M,s)^2 + p\left(1 - p(M,s)\right)} \approx \left(1 - p(M,s)\right)^N$
Equation (15) shows the solution for N:
$N \approx \frac{\log\left(\frac{p\left(1 + 0.5\,p(M,s)\right)}{p(M,s)^2 + p\left(1 - p(M,s)\right)}\right)}{\log\left(1 - p(M,s)\right)} \underset{p(M,s)\approx Mp}{\approx} \frac{\log\left(\frac{1 + 0.5\,Mp}{1 + Mp(M - 1)}\right)}{\log\left(1 - Mp\right)}$ (15)
For the CPSE algorithm, the number of iterations in step 10 is between K and K + 1.
The number of iterations in step 12 is N (see Figure 6).
In general, the implementation of CPSE can be iterative or deterministic as the needed number of iterations can be computed a priori using the equations we provided. The actual implementation should depend on the application and hardware.
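As an illustration of the iterative variant, the following NumPy sketch follows steps 1–13 as we read them (step 14, the memory write, appears with the decoder sketch in Section 2.5.2). The helper names are ours, the per-position permutations are modeled as powers of one seeded random permutation as suggested in the note above, and M ≥ 2 is assumed so that the union is denser than the target.

import numpy as np

def make_position_perms(n, m, seed=0):
    """Step 1: per-position permutations, modeled as powers of one random permutation."""
    rng = np.random.default_rng(seed)
    base = rng.permutation(n)
    perms, cur = [], np.arange(n)
    for _ in range(m):
        cur = cur[base]                   # compose the base permutation with itself
        perms.append(cur.copy())
    return perms

def cpse(components, target_sparsity, seed=1):
    """Steps 2-13: bind M SDR components into one SDR, preserving composition order."""
    n, m = components[0].size, len(components)
    pos_perms = make_position_perms(n, m)
    tagged = [components[i][pos_perms[i]] for i in range(m)]        # step 3: add position info
    y_hat = np.bitwise_or.reduce(np.array(tagged, dtype=np.uint8))  # step 4: union
    perm = np.random.default_rng(seed).permutation(n)               # step 5
    sot = 0.5 * m * target_sparsity ** 2                            # Equation (13), p(M,s) ~ M*p
    y, x_hat = np.zeros(n, dtype=np.uint8), y_hat.copy()            # steps 6-7
    while y.sum() / n <= target_sparsity:                           # additive phase, steps 8-10
        x_hat = x_hat[perm]
        y |= y_hat & x_hat
    while y.sum() / n >= target_sparsity + sot:                     # subtractive phase, steps 11-13
        x_hat = x_hat[perm]
        y &= 1 - x_hat
    return y, pos_perms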
As a final remark for this section, we would like to address the validity of the assumption that the CPSE encoder input to step 2 is M random SDR vectors. In real systems, encoder input X might contain data from N different feature channels. Accordingly, values in X from different feature channels might have similar SDR encoding. To allow the unique decoding of the identity of component value/feature pairs from the final SDR Y, we need each component in the set of all possible components to have a unique SDR. For that purpose, the following additional steps are required between CPSE steps 1 and 2:
  • $\hat{P} = \{\hat{p}_1, \ldots, \hat{p}_N\}$, a set of N random permutation matrices, one for each feature channel. The permutation matrix for channel i can be computed by $\hat{p}_i = \hat{p}^i$, where $\hat{p}$ is some random permutation and $\hat{p}^i$ is $\hat{p}$ raised to the i-th power.
  • $\hat{F} = \{\hat{f}_1, \ldots, \hat{f}_N\}$, a set of N random SDRs, one for each feature channel.
  • Use the CPSE encoder to compute the vector $\hat{X} = \{\hat{x}_1, \ldots, \hat{x}_M\}$, such that for each component $x_i \in X$ from feature channel j, the component in $\hat{X}$ is the CPSE encoding of two SDRs: $\hat{x}_i = \mathrm{CPSE}(\{\hat{f}_j, x_i\hat{p}_j\})$.
  • The input to CPSE encoding step 2 is $\hat{X}$ instead of X.
The decoder can be used to find final value/feature pairs by finding the ID of the feature channel by decoding f ^ j and then by finding the appropriate feature value, as explained in the next section.

2.5. Context-Preserving SDR Decoding (CPSD) Method

In this section, we describe a novel SDR decoding method called CPSD. The input to the CPSD decoder is SDR Y, which is created by the CPSE encoder. The decoding method has two variations: basic CPSD and generic CPSD, which are described in the next sections.

2.5.1. Basic CPSD Decoding Procedure

To find the components encoded in Y, the decoder needs to perform the following steps to find the original M components:
  1. Calculate the first component candidate: $\dot{x}_1 = Y\tilde{p}_1^{-1}$.
  2. For each possible component value $x_k$, compute the overlap score with $\dot{x}_1$.
  3. Set $x_1$ to be the $x_k$ with the maximum overlap score.
  4. Set the number of components M in Y to be $M = \frac{n \times s}{\text{maximum overlap score}}$.
  5. Set the minimum match threshold $\theta = \frac{n \times s}{M} - 1$.
  6. For i = 2:M, perform the following:
    a. Calculate $\dot{x}_i = Y\tilde{p}_i^{-1}$.
    b. For each possible component value $x_k$, compute the overlap score with $\dot{x}_i$.
    c. Find the list of elements with an overlap score ≥ θ.
    d. If the list contains a single element, set $x_i$ to be that $x_k$; if the list contains multiple elements, set $x_i$ to be the $x_k$ with the maximum overlap score.
    e. Optionally, return the K candidate elements for $x_i$ with the K highest overlap scores; K is a parameter of the decoder.
SDR Y contains a union of M subsampled SDRs from $\tilde{X}$.
Figure 7 shows that the sparsity of Y at the end of the CPSE procedure is approximately s, so Y contains approximately $n \times s$ active bits.
Figure 8 shows that CPSE ensures that each of the original components is represented in Y by $\frac{n \times s}{M} \pm 1$ bits. For that reason, after performing the operation $\dot{x}_i = Y\tilde{p}_i^{-1}$, it is guaranteed that $\dot{x}_i$ has at least $\frac{n \times s}{M} - 1$ bits that match the original component $x_i$. This is the minimum overlap score we expect. The other $\frac{n \times s\,(M-1)}{M}$ bits belong to the other M − 1 components, projected by $\tilde{p}_i^{-1}$ to random positions.
It was shown that, over a random choice of the SDR codebook, given a sufficient subset of the original vector bits, it is possible to decode the original SDR with a very small probability of error [34]. Given a particular use case, the user should use Equation (4) to find suitable values for the sparsity, SDR size, and maximum number of SDRs in a single composition (M) that allow an acceptable FP decoding rate given the threshold $\theta = \frac{n \times s}{M}$ (see Table 1).
Allowing the decoder to return multiple candidates (step 6.e) for each element $x_i$ is an additional way to increase the maximum number of elements in the composition while maintaining a low FP rate. This requires a way to determine which of the candidate elements is the correct one after decoding.
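The following is a sketch of the basic decoder under the assumption that the codebook of all possible base components is available; it mirrors steps 1–6 above and reuses the position permutations produced by the CPSE sketch in Section 2.4, returning codebook indices (with None for undecodable positions).

import numpy as np

def overlap(a, b):
    """Overlap score: number of positions where both SDRs are 1."""
    return int(np.dot(a.astype(np.int32), b.astype(np.int32)))

def basic_cpsd(y, pos_perms, codebook, n, s):
    """Basic CPSD: recover component identities and their order from Y,
    assuming all possible base components (the codebook) are known."""
    inv = [np.argsort(p) for p in pos_perms]                       # inverse position permutations
    x1_dot = y[inv[0]]                                             # steps 1-2
    scores = [overlap(x1_dot, c) for c in codebook]
    first = int(np.argmax(scores))                                 # step 3
    m = min(int(round(n * s / max(scores[first], 1))), len(inv))   # step 4
    theta = n * s / m - 1                                          # step 5
    decoded = [first]
    for i in range(1, m):                                          # step 6
        xi_dot = y[inv[i]]
        scores = [overlap(xi_dot, c) for c in codebook]
        candidates = [k for k, sc in enumerate(scores) if sc >= theta]
        decoded.append(max(candidates, key=lambda k: scores[k]) if candidates else None)
    return decoded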
The basic CPSD decoding method is appropriate when all the possible base-level components are known, and their number is low enough to allow for an acceptable decoding time. For example, if the encoding represents a word in English, the base components are letters of the English alphabet. If the encoding represents a sentence, the base-level components are all the words in the English vocabulary.
If the encoding is a combination of sentences, the number of base-level components is huge and is too high to allow computation using a basic decoding method. The inevitable conclusion is that as we go higher in the hierarchy, the basic decoding method becomes computationally infeasible.
The following section describes the generic CPSD decoding method, which allows a practical and efficient implementation of the decoding procedure for situations when the number of possible component elements is huge.

2.5.2. Generic CPSD Decoding Procedure

As explained in the previous section, we can apply the basic decoding procedure in cases when the base-level component encodings are known a priori and the compute time is acceptable. For the general case, when the basic decoder is not applicable, we need a way to narrow the set of possible base-level components to the set of SDRs used by the encoder and to optimize the search algorithm to allow for a deterministic computation time that does not depend on the number of possible base-level component vectors. To do that, we propose to use triadic memory (TM) [30]. TM is a type of associative memory that takes SDRs as its arguments and supports fast $\Theta(1)$ store/fetch operations of the form $(x, y) \rightarrow z$.
To allow for CPSD decoding, we need the following step to be part of the CPSE procedure: after encoding SDR Y using CPSE, for every base component $x_k$ from composition order k, store in memory the following association: $(Y, Y\tilde{p}_k^{-1}) \rightarrow x_k$ (see Section 2.4, step 14).
CPSD algorithm:
  1. Query the TM: $(Y, Y\tilde{p}_1^{-1}) \rightarrow \{\dot{x}_1\}$.
  2. Find the overlap score between $\dot{x}_1$ and $Y\tilde{p}_1^{-1}$.
  3. If the overlap score < the overlap threshold, stop the decoding procedure, as it is unable to decode Y.
  4. Set $x_1 = \dot{x}_1$.
  5. Set the number of components M in Y to be $M = \frac{n \times s}{\text{overlap score}}$.
  6. Set the minimum match threshold $\theta = \frac{n \times s}{M} - 1$.
  7. For i = 2:M, perform the following:
    a. Query the TM: $(Y, Y\tilde{p}_i^{-1}) \rightarrow \{\dot{x}_i\}$.
    b. If the overlap of $Y\tilde{p}_i^{-1}$ and $\dot{x}_i$ is ≥ θ, set $x_i = \dot{x}_i$.
Step 14 in CPSE (see Section 2.4) stores a pair in the TM, where Y serves as the context and $Y\tilde{p}_i^{-1}$ acts as an “index” to component i in the context of Y. By design, both Y and $Y\tilde{p}_i^{-1}$ are random SDRs; for that reason, the TM capacity can be used to its full potential.
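The sketch below illustrates the store/query pattern of steps 1–7 together with CPSE step 14. A simple exact-match dictionary stands in for the triadic memory: the real TM of [30] exposes analogous store/fetch operations but is an associative memory with bounded capacity and tolerance to noisy keys, which this stand-in does not model.

import numpy as np

class AssocStore:
    """Stand-in for triadic memory: an exact-match (x, y) -> z store."""
    def __init__(self):
        self._mem = {}
    def store(self, x, y, z):
        self._mem[(x.tobytes(), y.tobytes())] = z
    def query(self, x, y):
        return self._mem.get((x.tobytes(), y.tobytes()))

def cpse_store(tm, y, components, pos_perms):
    """CPSE step 14: register every component of Y in the associative memory."""
    inv = [np.argsort(p) for p in pos_perms]
    for i, x in enumerate(components):
        tm.store(y, y[inv[i]], x)

def generic_cpsd(tm, y, pos_perms, n, s):
    """Generic CPSD: recover the components of Y by querying the memory."""
    inv = [np.argsort(p) for p in pos_perms]
    x1 = tm.query(y, y[inv[0]])                                    # step 1
    if x1 is None:
        return None                                                # step 3: unable to decode Y
    score = int(np.dot(y[inv[0]].astype(int), x1.astype(int)))     # step 2
    m = min(int(round(n * s / max(score, 1))), len(inv))           # step 5
    theta = n * s / m - 1                                          # step 6
    decoded = [x1]                                                 # step 4
    for i in range(1, m):                                          # step 7
        xi = tm.query(y, y[inv[i]])
        if xi is not None and np.dot(y[inv[i]].astype(int), xi.astype(int)) >= theta:
            decoded.append(xi)
    return decoded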

3. Results

3.1. CPSE Performance Analysis

Figure 6 shows the total number of repetitions (y-axis) needed for the convergence of CPSE to the target sparsity as a function of the number of component vectors (x-axis), with a breakdown into the number of additive (step 9) and subtractive (step 12) iterations, and the total number of iterations (both actual and calculated according to Equation (15)).
We can see that the number of steps needed by CPSE to reach the target sparsity is small compared to subtractive CDT, and it is no more than five iterations for any number of component vectors equal to or higher than four. Figure 7 shows the sparsity of Y at the end of the CPSE procedure using a target sparsity of 2% and the SOT computed by Equation (13).
We can see from Figure 7 that, for M components with a sparsity of 2%, the CPSE encoder output SDR Y is stable, with a mean sparsity of ~1.95% when using those parameters. Please note that CPSE convergence is stable with the number of component SDRs, contrary to both CDT methods. In the worst case, the result is a compositional SDR Y with a sparsity between 1.8% and 2.1%, which is much closer to the desired target sparsity than the subtractive and additive CDT methods.
Figure 8 shows the number of bits taken from each component SDR into the compositional SDR Y. As needed, the simulation results show that the bits are selected uniformly, and the number of bits from each component is $\approx \frac{n \times s}{M}$. As explained in Section 2.5, this is important to allow a low probability of error when decoding the original M components, given Y.

3.2. CPSD Performance Analysis

Table 1 shows the probability of a false positive match between two random SDRs of size N and sparsity s = 2% for different numbers of components (M) and threshold values $\theta = \frac{n \times s}{M}$ relative to the s = 2% sparsity (100%). The FP values are calculated using Equation (4) from Section 2.1.2.
The table shows that for an SDR of size N = 2000, it is possible to encode a composition of M = 6 elements while maintaining a false positive (FP) probability of decoding the wrong component at 2.70 × 10−4. For N = 4000, it is possible to encode the same number of components with an FP of 2.03 × 10−7. An important conclusion is that with sufficiently high dimensionality, it is possible to create a composition using many elements while keeping low decoding error rates.
After querying for the first element, CPSD checks the overlap score of the fetched SDR value. As shown in Table 1, to maintain the FP below a desired threshold, the maximum number of elements in a composition is limited to a number M, which is a function of the SDR size and sparsity. This defines the minimum overlap score threshold between $\dot{x}_1$ and $Y\tilde{p}_1^{-1}$. If the overlap score is lower, CPSD cannot continue, because either Y is not an output of CPSE or the TM has reached its full capacity, leading to a high error rate when fetching values from memory. The amount of memory we need for the TM to support SDRs of size N is $N^3 \cdot \mathrm{sizeof}(\mathrm{weight})$. For example, for an SDR size of N = 2000 and weights encoded by one bit, the needed memory size is 953.7 Mbytes. We have analyzed the capacity of triadic memory (the ability to restore SDRs without errors) for storing random SDRs of size 2000 and sparsity s = 2% (40 active bits) for different values of sizeof(weight). Figure 9 shows the percentage of SDRs fetched from the TM with zero or more errors as a function of the number of stored SDRs.
We can see from Figure 9 that using one-bit weights gives the best capacity/memory size ratio. The downside of using one-bit weights is that exceeding the maximum capacity leads to a fast degradation of memory performance and loss of information. Using two-bit weights increases the capacity by ~50% and allows graceful performance degradation at the expense of doubling the memory footprint of the TM. Increasing the weight size to four bits allows slower memory degradation and a slightly higher capacity at the expense of more two-bit and three-bit errors; increasing the weight size beyond that is counterproductive and leads to worse performance. Table 2 shows the detailed results presented in Figure 9. The R/W column contains the number of SDRs stored in/fetched from the TM. The columns marked 0, 1, 2, etc., correspond to the relative number of SDRs that had this number of bit errors between the stored and fetched SDRs.
The column marked as ≥5 contains the total number of SDRs with five or more bit errors. For example, when using a one-bit weight and storing/fetching 600,000 random SDRs, 98.7% of the fetched SDRs had zero errors, 1.2% had one-bit errors, and 0.1% of SDRs had two-bit errors. We can see from Table 2 that an eight-bit weight leads to worse performance than a four-bit weight for an SDR of size 2000 and sparsity of 2%.
Based on our results, it is best to use two-bit weights since it allows a significant memory capacity, coupled with graceful memory performance degradation. If a particular application needs a higher capacity than possible with a single TM, it is possible to use multiple TM memory banks, for example, a different TM for different levels of the hierarchy.
In general, the user should use Equation (4) to find suitable values for the SDR size, sparsity, and maximum number of SDRs in a single composition, and choose an appropriate TM size to allow an acceptable FP decoding rate for the particular application’s needs.

4. Discussion

Despite their unprecedented success, artificial neural networks suffer from extreme opacity and weakness in learning general knowledge from limited experience. Some argue that the key to overcoming those limitations in artificial neural networks is to find a way to efficiently combine the continuity and compositionality principles. While it is unknown how the brain encodes and decodes information to enable both rapid responses and complex processing, there is evidence that the neocortex employs sparse distributed representations for information processing. This is an active area of research.
This work deals with one of the challenges in this field related to encoding and decoding nested compositional structures, which are essential for representing complex real-world concepts. In this work, we focus on improving one of the proposed algorithms in this domain called context-dependent thinning (CDT) [29]. A distinguishing feature of CDT is that the CDT-bound vector remains similar to each SDR input and combinations of similar inputs. As a result, it is possible to detect if a given item is part of a composite by checking the overlap with the composite’s SDR vector without decoding the whole structure. In addition, a recent work [31] performed a comprehensive performance evaluation and comparison of various VSA algorithms on different tasks. Their conclusion was that methods that use sparse binary vectors (like CDT) have several advantages; for example, they require a relatively low number of dimensions for binding and later restoring information with a low probability of error, thus facilitating relatively low memory consumption in real applications.
The CDT method has several weaknesses: it is slower than some of the other methods, it is commutative, meaning the order of the composition is not encoded, and there is no straightforward algebraic way to unbind (decode) the original components.
In this work, we have presented a novel algorithm for encoding compositional SDR structures, termed context-preserving SDR encoding (CPSE). Our method builds on the ideas of the CDT [29] method while addressing some of its weaknesses. In addition, we have introduced a novel decoding algorithm called context-preserving SDR decoding (CPSD), based on triadic memory (TM) [30].
We have shown that CPSE encoding improves upon CDT by incorporating information about the sequence of base-level components within the composite SDR. In addition, CPSE improves upon CDT by optimizing the convergence speed to achieve the desired output SDR sparsity. In addition, CPSE convergence is stable with the number of component SDRs, contrary to CDT.
We have shown an interesting quality of our encoding method; encoding order-invariant structures containing more than four components requires a near-constant amount of computation when using CPSE, irrespective of the number of base-level components. This makes the CPSE algorithm deterministic and simplifies the required hardware needed to implement such a use case. In general, the implementation of CPSE can be iterative or deterministic as the needed number of iterations can be computed a priori using the equations we provided. The actual implementation should depend on the application and hardware.
We show that the CPSD decoding method allows us to retrieve the identity and order of base components from a composite SDR while keeping a very low probability of error. We detail both pure compute-based decoding methods applicable in a simple setting when base-level components are known and a TM-based method for the general case when the number of possible base-level components is huge. We show the trade-off between SDR size and the number of elements that can be encoded while maintaining a low decoding bit error rate. We have demonstrated the trade-off and optimum TM memory size needed to achieve an acceptable decoding bit error probability. For single TM, the optimum memory size is one bit or two bits for an SDR size of 2000 and sparsity of 2%.
In this setting, using one bit per triadic memory cell allows us to decode 600,000 encoded vectors with only 1.2% of vectors containing a single error bit relative to the original component. Storing more values essentially leads to memory collapse, leading to a very high recall bit error rate. Doubling the cell memory allows encoding/decoding 800,000 vectors with only 2.8% containing a single bit error. Storing 900,000 vectors will result in 9.2% of vectors containing up to two-bit errors and 46.1% containing a single error.
Based on our results, it is best to use two-bit weights since it allows a significant memory capacity, coupled with graceful memory performance degradation. If a particular application needs a higher capacity than possible with a single TM, it is possible to use multiple TM memory banks, for example, different TM for different levels of the hierarchy.
In general, the user should tune the SDR size, sparsity, and TM memory size for the particular application needs.
Our proposed encoding and decoding methods allow for a practical and efficient implementation that optimizes computation and memory based on desired performance while preserving both “unstructured” similarity and structured overlap, with the ability to decode the order of composition and the identity of the base-level components in the general case when the number of possible components is huge. We believe that the combination of those qualities is unique and hope that it will open the door for the efficient incorporation of structured information based on sparse distributed representations in artificial neural networks.

Author Contributions

Conceptualization, R.M. and A.M.; methodology, R.M. and A.M.; software, R.M.; validation, R.M.; formal analysis, R.M.; investigation, R.M.; resources, R.M. and A.M.; data curation, R.M.; writing—original draft preparation, R.M. and A.M.; visualization, R.M.; supervision, A.M.; project administration, A.M.; funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The encoding/decoding algorithm code used to create results for this paper is available from the following github repository: https://github.com/romanma9999/CDT_SDR/tree/master, accessed on 17 April 2025.

Acknowledgments

We thank the creators of the triadic memory project for providing the code for this research. The code for triadic memory used in this work is available from the following github repository under the MIT license https://github.com/PeterOvermann/TriadicMemory, accessed on 1 July 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial Intelligence
SDR	Binary Sparse Distributed Representation
CNN	Convolutional Neural Network
CDT	Context-Dependent Thinning
TM	Triadic Memory
CPSE	Context-Preserving SDR Encoding
CPSD	Context-Preserving SDR Decoding
HDC	Hyperdimensional Computing
VSA	Vector Symbolic Architectures
TPR	Tensor Product Representations
HRR	Holographic Reduced Representations
FHRR	Fourier Holographic Reduced Representations
SBC	Sparse Block Codes
VTB	Vector-Derived Transformation Binding
BSDC	Binary Sparse Distributed Codes
BSC	Binary Spatter Codes

References

1. Malik, S.; Muhammad, K.; Waheed, Y. Artificial intelligence and industrial applications-A revolution in modern industries. Ain Shams Eng. J. 2024, 15, 102886.
2. Rambelli, G. Constructions and Compositionality: Cognitive and Computational Explorations; Cambridge University Press: Cambridge, UK, 2025.
3. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2323.
4. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5999–6009.
5. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
6. Henderson, J. The Unstoppable Rise of Computational Linguistics in Deep Learning. arXiv 2020, arXiv:2005.06420.
7. Smolensky, P.; McCoy, R.T.; Fernandez, R.; Goldrick, M.; Gao, J. Neurocompositional computing: From the Central Paradox of Cognition to a new generation of AI systems. AI Mag. 2022, 43, 308–322.
8. Lennie, P. The Cost of Cortical Computation. Curr. Biol. 2003, 13, 493–497.
9. Hromádka, T.; DeWeese, M.R.; Zador, A.M. Sparse representation of sounds in the unanesthetized auditory cortex. PLoS Biol. 2008, 6, 124–137.
10. Weliky, M.; Fiser, J.; Hunt, R.H.; Wagner, D.N. Coding of natural scenes in primary visual cortex. Neuron 2003, 37, 703–718.
11. Kanerva, P. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cogn. Comput. 2009, 1, 139–159.
12. Gayler, R.W. Vector symbolic architectures answer Jackendoff's challenges for cognitive neuroscience. arXiv 2003, arXiv:cs/0412059.
13. Kleyko, D.; Rachkovskij, D.; Osipov, E.; Rahimi, A. A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part II: Applications, Cognitive Models, and Challenges. ACM Comput. Surv. 2023, 55, 1–52.
14. Neubert, P.; Schubert, S.; Protzel, P. An Introduction to Hyperdimensional Computing for Robotics. Künstliche Intell. 2019, 33, 319–330.
15. Rahimi, A.; Kanerva, P.; Benini, L.; Rabaey, J.M. Efficient Biosignal Processing Using Hyperdimensional Computing: Network Templates for Combined Learning and Classification of ExG Signals. Proc. IEEE 2019, 107, 123–143.
16. Rahimi, A.; Datta, S.; Kleyko, D.; Frady, E.P.; Olshausen, B.; Kanerva, P.; Rabaey, J.M. High-Dimensional Computing as a Nanoscalable Paradigm. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64, 2508–2521.
17. Kleyko, D.; Davies, M.; Frady, E.P.; Kanerva, P.; Kent, S.J.; Olshausen, B.A.; Osipov, E.; Rabaey, J.M.; Rachkovskij, D.A.; Rahimi, A.; et al. Vector Symbolic Architectures as a Computing Framework for Emerging Hardware. Proc. IEEE 2022, 110, 1538–1571.
18. Ma, Y.; Hildebrandt, M.; Baier, S.; Tresp, V. Holistic representations for memorization and inference. UAI 2018, 1, 403–413.
19. Kleyko, D.; Rachkovskij, D.A.; Osipov, E.; Rahimi, A. A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part I: Models and Data Transformations. ACM Comput. Surv. 2022, 55, 1–40.
20. Frady, E.P.; Kleyko, D.; Sommer, F.T. Variable Binding for Sparse Distributed Representations: Theory and Applications. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 2191–2204.
21. Plate, T.A. Holographic reduced representations. IEEE Trans. Neural Netw. 1995, 6, 623–641.
22. Von Der Malsburg, C. The What and Why of Binding. Neuron 1999, 24, 95–104.
23. Kanerva, P. The Spatter Code for Encoding Concepts at Many Levels. In ICANN '94; Marinaro, M., Morasso, P.G., Eds.; Springer: London, UK, 1994.
24. Smolensky, P. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif. Intell. 1990, 46, 159–216.
25. Yeung, C.; Zou, Z.; Imani, M. Generalized Holographic Reduced Representations. arXiv 2024, arXiv:2405.09689.
26. Rachkovskij, D.A. Representation and processing of structures with binary sparse distributed codes. IEEE Trans. Knowl. Data Eng. 2001, 13, 261–276.
27. Laiho, M.; Poikonen, J.H.; Kanerva, P.; Lehtonen, E. High-dimensional computing with sparse vectors. In Proceedings of the 2015 IEEE Biomedical Circuits and Systems Conference (BioCAS), Atlanta, GA, USA, 22–24 October 2015; pp. 1–4.
28. Gosmann, J.; Eliasmith, C. Vector-Derived Transformation Binding: An Improved Binding Operation for Deep Symbol-Like Processing in Neural Networks. Neural Comput. 2019, 31, 849–869.
29. Rachkovskij, D.A.; Kussul, E.M. Binding and normalization of binary sparse distributed representations by context-dependent thinning. Neural Comput. 2001, 13, 411–452.
30. Available online: https://peterovermann.com/TriadicMemory.pdf (accessed on 1 July 2024).
31. Schlegel, K.; Neubert, P.; Protzel, P. A comparison of vector symbolic architectures. Artif. Intell. Rev. 2022, 55, 4523–4555.
32. Ahmad, S.; Hawkins, J. Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory. arXiv 2015, arXiv:1503.07469.
33. Purdy, S. Encoding Data for HTM Systems. arXiv 2016, arXiv:1602.05925.
34. Thomas, A.; Dasgupta, S.; Rosing, T. A Theoretical Perspective on Hyperdimensional Computing (Extended Abstract). IJCAI Int. Jt. Conf. Artif. Intell. 2022, 72, 5772–5776.
Figure 1. Sparsity of vector Ŷ as a function of the number M of component SDR vectors.
Figure 2. Additive CDT—permutation iterations needed to reach target sparsity.
Figure 3. Additive CDT—final compositional SDR Y sparsity.
Figure 4. Subtractive CDT—permutation iterations needed to reach target sparsity.
Figure 5. Subtractive CDT—final compositional SDR Y sparsity.
Figure 6. CPSE—number of permutation iterations needed to reach target sparsity.
Figure 7. CPSE—final compositional SDR Y sparsity.
Figure 8. CPSE—bits taken from base-level components into the compositional SDR Y.
Figure 9. Percentage of SDRs fetched from TM with zero or more errors as a function of the number of stored SDRs and the memory cell weight size.
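Figures 2–5 characterize how many thinning (permutation) iterations the additive and subtractive CDT procedures need before the composite vector reaches a target sparsity. As a point of reference, the sketch below implements the additive variant described by Rachkovskij and Kussul [29]: the component SDRs are superimposed with bitwise OR, and conjunctions of the superposition with fixed random permutations of itself are accumulated until the result holds the desired number of active bits. The vector size, sparsity, seed, and stopping rule here are illustrative assumptions, not the exact settings used to produce the figures.

```python
import numpy as np

def additive_cdt(components, target_active, n, seed=1, max_iters=200):
    """Additive context-dependent thinning (after Rachkovskij & Kussul [29]).

    components    : list of boolean arrays of length n (the component SDRs)
    target_active : desired number of active bits in the thinned result
    Returns the thinned SDR and the number of permutation iterations used.
    """
    z = np.zeros(n, dtype=bool)
    for c in components:                 # superposition (bitwise OR) of all components
        z |= c
    perms = np.random.default_rng(seed)  # fixed seed => the same permutation sequence
                                         # is reused for every composition
    y = np.zeros(n, dtype=bool)
    iters = 0
    while y.sum() < target_active and iters < max_iters:
        p = perms.permutation(n)
        y |= z & z[p]                    # keep bits of z that survive a permuted copy of z
        iters += 1
    return y, iters

# Illustrative run: three random components, N = 2000, 2% sparsity (40 active bits).
rng = np.random.default_rng(0)
n, w = 2000, 40
comps = []
for _ in range(3):
    c = np.zeros(n, dtype=bool)
    c[rng.choice(n, w, replace=False)] = True
    comps.append(c)
y, iters = additive_cdt(comps, target_active=w, n=n)
print(f"{int(y.sum())} active bits after {iters} permutation iterations")
```

Because every surviving bit is drawn from the superposition z, the thinned vector remains a subset of the union of the components, which is what preserves its similarity to each input.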
Table 1. Probability of false positive match for SDR size N and sparsity 2% as a function of the number of composition components and threshold θ.

Number of Components   θ     FP (N = 2000, s = 2%)   FP (N = 4000, s = 2%)
2                      50%   8.30 × 10^−25           2.59 × 10^−49
3                      33%   2.21 × 10^−12           4.12 × 10^−26
4                      25%   5.10 × 10^−8            7.06 × 10^−16
5                      20%   1.85 × 10^−5            8.34 × 10^−11
6                      16%   2.70 × 10^−4            2.03 × 10^−7
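The false-positive figures in Table 1 come from the kind of overlap analysis used for SDR matching (see Ahmad and Hawkins [32]): the chance that a random SDR with w active bits out of N shares at least a threshold number of bits with a fixed SDR is a hypergeometric tail. The exact counting behind Table 1 also depends on how many bits each component contributes to the composite vector, so the sketch below should be read as an illustration of the standard formula rather than a reproduction of the table.

```python
import math

def sdr_false_positive(n, w, min_overlap):
    """Probability that a random SDR with w active bits out of n overlaps a
    fixed w-bit SDR in at least min_overlap positions (hypergeometric tail)."""
    total = math.comb(n, w)
    return sum(math.comb(w, b) * math.comb(n - w, w - b)
               for b in range(min_overlap, w + 1)) / total

# N = 2000 at 2% sparsity gives w = 40 active bits; a 50% threshold
# corresponds to requiring at least 20 overlapping bits.
print(sdr_false_positive(2000, 40, 20))   # roughly 4e-26 under this simplified counting
```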
Table 2. The capacity of triadic memory as a function of the number of bits allocated for storing each weight. Rows give the number of stored SDRs (R/W); columns give the percentage of retrieved SDRs with 0, 1, 2, 3, 4, or ≥5 bit errors.

1-bit weight
R/W        0     1     2     3     4     ≥5
500,000    100   0     0     0     0     0
600,000    98.7  1.2   0.1   0     0     0
700,000    0     1.6   5.3   11.4  17.2  64.5

2-bit weight
R/W        0     1     2     3     4     ≥5
600,000    100   0     0     0     0     0
700,000    99.9  0.1   0     0     0     0
800,000    97.1  2.8   0.1   0     0     0
900,000    43.7  46.1  9.2   0.8   0.1   0.1
1,000,000  0.1   6.2   20.5  29.8  24.4  19

4-bit weight
R/W        0     1     2     3     4     ≥5
500,000    100   0     0     0     0     0
600,000    99.9  0.1   0     0     0     0
700,000    98.5  1.4   0.1   0     0     0
800,000    88.4  11.3  0.2   0.1   0     0
900,000    57.1  37.9  4.5   0.3   0.1   0.1
1,000,000  22.1  49    23.4  4.8   0.6   0.1
1,100,000  4.8   27.9  36.8  21.6  7.1   1.8

8-bit weight
R/W        0     1     2     3     4     ≥5
500,000    100   0     0     0     0     0
600,000    99.9  0.1   0     0     0     0
700,000    98.2  1.7   0.1   0     0     0
800,000    86.1  13.5  0.3   0.1   0     0
900,000    53.4  40.3  5.7   0.4   0.1   0.1
1,000,000  18.7  47.3  26.6  6.4   0.9   0.1
1,100,000  2.8   20.8  35.1  26.4  11.2  3.7
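Table 2 (and Figure 9) report how retrieval quality degrades as more SDR triples are written into a triadic memory whose counters have a limited bit width. The sketch below is a minimal triadic memory in the spirit of [30]: each stored triple (x, y, z) increments a cube of saturating counters, and z is recalled from (x, y) by summing the addressed counters and keeping the highest-scoring positions. The dimensions, sparsity, and class interface are our own illustrative choices; the saturating-counter behaviour is what the weight bit width in Table 2 controls.

```python
import numpy as np

class TriadicMemorySketch:
    """Minimal triadic associative memory with saturating counters (cf. [30])."""

    def __init__(self, n=200, p=5, weight_bits=8):
        self.n, self.p = n, p                # SDR length and number of active bits
        self.max_w = (1 << weight_bits) - 1  # counter saturation limit
        self.W = np.zeros((n, n, n), dtype=np.uint16)

    def store(self, x, y, z):
        """x, y, z are index arrays of the active bits of three SDRs."""
        for i in x:
            for j in y:
                cell = self.W[i, j, z].astype(np.int64)
                self.W[i, j, z] = np.minimum(cell + 1, self.max_w)

    def recall_z(self, x, y):
        """Recall the third SDR of a stored triple from the first two."""
        scores = self.W[np.ix_(x, y)].sum(axis=(0, 1))
        return np.sort(np.argpartition(scores, -self.p)[-self.p:])

# Illustrative round trip with random sparse index vectors.
rng = np.random.default_rng(0)
tm = TriadicMemorySketch()
triples = [tuple(np.sort(rng.choice(tm.n, tm.p, replace=False)) for _ in range(3))
           for _ in range(1000)]
for x, y, z in triples:
    tm.store(x, y, z)
x, y, z = triples[0]
errors = tm.p - len(np.intersect1d(tm.recall_z(x, y), z))
print(f"bit errors on recall: {errors}")   # typically 0 at this light load
```

With 1-bit counters the same cube reduces to a binary co-occurrence table, which is consistent with the earlier degradation that Table 2 shows for 1-bit weights compared with 4- and 8-bit weights.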