Article

A Variation-Aware Binary Neural Network Framework for Process Resilient In-Memory Computations

by Minh-Son Le, Thi-Nhan Pham, Thanh-Dat Nguyen and Ik-Joon Chang *
Department of Electronic Engineering, Kyung Hee University, Yongin-si 17104, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2024, 13(19), 3847; https://doi.org/10.3390/electronics13193847
Submission received: 12 August 2024 / Revised: 24 September 2024 / Accepted: 26 September 2024 / Published: 28 September 2024
(This article belongs to the Special Issue Research on Key Technologies for Hardware Acceleration)

Abstract

Binary neural networks (BNNs), which use 1-bit weights and activations, have garnered interest because such extreme quantization yields low power dissipation. By implementing BNNs as computation-in-memory (CIM), which performs multiplications and accumulations directly on memory arrays in an analog fashion (analog CIM), we can further improve the energy efficiency of neural-network processing. However, analog CIMs are susceptible to process variation, the manufacturing variability that causes fluctuations in the electrical properties of transistors, resulting in significant degradation of BNN accuracy. Our Monte Carlo simulations demonstrate that in an SRAM-based analog CIM implementing the VGG-9 BNN model, the classification accuracy on the CIFAR-10 image dataset degrades to below 50% under process variations in a 28 nm FD-SOI technology. To overcome this problem, we present a variation-aware BNN framework. The proposed framework is developed for SRAM-based BNN CIMs, since SRAM is the most widely used on-chip memory; however, it is easily extensible to BNN CIMs based on other memories. Our extensive experimental results demonstrate that under the process variation of 28 nm FD-SOI, with an SRAM array size of 128 × 128, our framework significantly enhances classification accuracies on both the MNIST hand-written digit dataset and the CIFAR-10 image dataset. Specifically, for the CONVNET BNN model on MNIST, accuracy improves from 60.24% to 92.33%, while for the VGG-9 BNN model on CIFAR-10, accuracy increases from 45.23% to 78.22%.

1. Introduction

Deep neural networks (DNNs) have shown outstanding performance, surpassing human-level accuracy in many applications such as image processing, voice recognition, and language translation. However, deploying DNNs in resource-constrained edge devices remains challenging for several reasons. DNNs typically require a large number of parameters, leading to substantial memory demands, which are difficult to accommodate in embedded systems. Moreover, the computational demands of DNNs result in high energy dissipation, presenting a major obstacle for edge devices.
To address these issues, computation in memory (CIM) has emerged as a promising paradigm, reducing energy dissipation by storing model parameters in memory and performing computations directly within the memory, thus minimizing energy-intensive data transfers. Prior works have demonstrated notable progress in this area. For example, NeuRRAM, presented in [1], demonstrates a versatile RRAM-based CIM chip that supports multiple model architectures and computational bit precisions. Similarly, the work in [2] focuses on high-precision floating point (FP16 and BF16) computations, proposing an ReRAM-based CIM macro that delivers high throughput and energy efficiency for artificial intelligence (AI) edge devices. Other studies, such as [3,4], investigate multi-bit precision techniques in analog CIM systems by utilizing phase-change memory (PCM). These works leverage the analog computing paradigm to perform multiplications and accumulations within memory. For instance, Ref. [3] introduces a signed multiply-and-accumulation (MAC) feature in embedded PCM arrays, while Ref. [4] develops a multi-bit precision core based on backend-integrated multi-level PCM.
To further enhance energy efficiency and reduce memory demands, binary neural networks (BNNs) have emerged as a promising solution. BNNs binarize weights and activations, significantly reducing model size and enabling efficient deployment on smaller embedded memories. This approach drastically lowers energy consumption without significantly compromising accuracy [5,6,7]. The energy efficiency of BNNs can be further enhanced by directly executing computations in embedded memories such as SRAM, e-FLASH, and STT-MRAM, through CIM techniques [8,9,10,11,12,13,14,15].
However, in reality, we should consider the potential problem that process variation significantly degrades the accuracy of BNNs operating on analog CIM platforms. Process variation occurs due to manufacturing imperfections in semiconductor fabrication, leading to deviations in device parameters such as threshold voltage, channel length, and oxide thickness [16]. These variations affect the behavior of transistors and other components, introducing inaccuracies in analog computations. The low-resolution weights of BNNs make them particularly sensitive to these variations, increasing the likelihood of computation errors. This strongly motivates the development of techniques to alleviate the impact of process variations and maintain the accuracy of BNNs in such platforms.
This work introduces a variation-aware BNN framework designed to ensure accurate analog CIM operations despite process variations in scaled technologies, using the 6T-SRAM bit-cell [9] from Table 1 as an example. Recently, many emerging non-volatile memory (eNVM)-based CIMs have garnered interest due to their high density and low standby power [11,12,17,18,19,20]. However, eNVM-based CIMs face challenges in manufacturing actual hardware, whereas SRAM offers advantages from a design perspective, and thus, plays a dominant role in CIM design. In light of this, we develop the variation-aware BNN framework on an SRAM-based CIM. Nonetheless, the developed framework can be readily extended to CIMs utilizing other memory types and SRAM bit-cell configurations.
Prior work, such as [21], has addressed variation-aware training for memristor-based CIM on crossbars, but this approach does not extend to SRAM-based BNNs. Unlike [21], which focuses on memristor crossbars, we develop more realistic models for weights and activation variations through Monte Carlo simulations. Additionally, we optimize the biasing potentials of word lines and bit lines in SRAM-based CIM circuits, resulting in significant accuracy improvements for more complex BNN models, such as RESNET-18 [22] and VGG-9 [23], when evaluated on the CIFAR-10 dataset [24]. Table 2 presents a summary of our work and a comparison with previous studies on SRAM-based BNN CIM systems. Unlike prior approaches such as [14,15], which introduce additional hardware to address process and temperature variations, our method relies on software-based techniques, avoiding any hardware overhead. Although the variation-aware training process requires multiple training iterations, the overhead is minimal as the trained weights can be reused once training is completed.
The contribution of this paper can be summarized as follows.
  • We develop mathematical models to quantify the impact of process variations on SRAM-based BNN CIM circuits. In these circuits, the current of an SRAM cell represents the multiplication result of the stored weight and the input activation. However, parametric process variations cause fluctuations in the SRAM cell current, directly affecting the accuracy of analog computations, and consequently, the BNN inference accuracy. Our model interprets these fluctuations as variations in the weights of the BNN. To model these weight variations, we utilized the distribution of SRAM cell currents obtained through Monte Carlo (MC) simulations in 28 nm FD-SOI technology. Consequently, our method is applicable to SRAM-CIM circuits employing current-based analog computation.
  • Based on the derived model, we present a variation-aware BNN training framework that produces variation-resilient training results. During training, BNNs are treated as bi-polar neural networks due to the aforementioned weight variations. We demonstrate the efficacy of the developed framework through extensive simulations.
  • We optimize the biasing voltages of word lines (WLs) and bit lines (BLs) in SRAM to achieve a balance between maintaining acceptable accuracy and minimizing power consumption.
The remaining part of this paper is organized as follows. In Section 2, we explain the background regarding BNN, the architecture of SRAM-based CIM, how DNNs can be mapped onto SRAM-based CIM arrays, and in-memory batch normalization. In Section 3, we present the variation-aware framework and optimization methodology for biasing voltages of WLs and BLs of SRAM. Section 4 validates the efficacy of our framework. Lastly, we conclude the paper in Section 5.

2. Preliminaries

2.1. Binary Neural Network

In a BNN, all weights and activations are binarized, significantly enhancing the DNN inference energy efficiency. Many researchers have shown that despite such a low-precision format, BNNs deliver good inference accuracy [5,6,7]. The first BNN introduced [5] used the sign function for the binarization of both weights and activations, so that all weights and activations become ‘+1’ or ‘−1’. However, some state-of-the-art (SOTA) works have improved the accuracy of BNNs by using activations of ‘0’ or ‘1’ [7] while still employing the sign function for the binarization of weights. Considering such a trend, we use the following activation function:
\[
\mathrm{BinAct}(X) =
\begin{cases}
1, & X \geq thresh \\
0, & X < thresh,
\end{cases}
\qquad (1)
\]
where thresh is the activation threshold. In our experiments, thresh is assumed to be 0.5; the corresponding results are shown in Table 3 and follow a similar trend to the SOTA works [7]. Since we utilize the activation function in (1), the activations take values of ‘0’ or ‘1’.
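To make the thresholding in (1) concrete, a minimal NumPy sketch is given below; the threshold of 0.5 follows the assumption stated above, and the sample inputs are illustrative.

```python
import numpy as np

def bin_act(x, thresh=0.5):
    """Binary activation of Eq. (1): outputs 1 where x >= thresh, else 0."""
    return np.where(x >= thresh, 1.0, 0.0)

# Example: pre-activation values around the assumed threshold of 0.5
print(bin_act(np.array([-0.2, 0.49, 0.5, 1.3])))   # -> [0. 0. 1. 1.]
```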

2.2. The Architecture of SRAM-Based CIM

Figure 1 shows the most widely used 6T-SRAM-based CIM architecture and cell configuration for the BNN computation, which refers to the design of Rui Liu et al. [9]. We consider such an architecture for our proposed BNN framework, discussed in Section 3.
In Figure 1, the weights of the BNN are stored in 6T-SRAM cells, and the bitwise multiplications between the weights and the input activations of the network are computed directly in an analog fashion inside the SRAM array. Let us assume that a weight of ‘+1’ is represented by Q = 1, QB = 0, and a weight of ‘−1’ by the inverted cell state. When we operate a BNN on the given configuration, the input activations of a given BNN layer become the digital values of the WLs, since the activation function of (1) is used, as mentioned in Section 2.1. During inference, all WLs are biased according to the input activations, and the product of the i-th weight and the i-th activation becomes the difference between the bl and blb cell currents, i_cell_bl_i − i_cell_blb_i, in Figure 1. All cell currents are accumulated into the BL and BLB currents, so that I_BL − I_BLB becomes the multiply-and-accumulation (MAC) output. I_BL − I_BLB is sensed by the differential current sense amplifier (CSA), producing the binary activation output based on (1). Note that when thresh in (1) is not zero, additional circuitry is necessary to implement it; batch normalization must also be implemented properly. Both are embedded into the sense amplifier shown in Figure 2 and discussed in Section 2.4.
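To make the column computation concrete, the following sketch models the ideal (variation-free) behavior of one SRAM column in Figure 1: bipolar weights scaled by the current margin I_M, gated by the 0/1 WL states, and accumulated as I_BL − I_BLB before the CSA comparison (the same relation is formalized later as Equation (5)). The current-margin value is a placeholder, not a measured figure.

```python
import numpy as np

def ideal_column_mac(weights, activations, i_margin=1e-6):
    """Ideal column MAC of Figure 1 (no process variations).

    weights     : array of +1/-1 values stored in one SRAM column
    activations : array of 0/1 word-line states
    i_margin    : per-cell current margin I_M (placeholder value, in amperes)
    Returns I_BL - I_BLB for the column.
    """
    return np.sum(weights * i_margin * activations)

weights = np.array([+1, -1, +1, +1])
acts    = np.array([ 1,  1,  0,  1])
i_diff  = ideal_column_mac(weights, acts)
# The CSA compares I_BL - I_BLB against the threshold current (see Table 7)
print(i_diff > 0)   # True: activated weighted sum (+1 - 1 + 1) = +1 > 0
```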

2.3. Mapping DNNs onto SRAM-Based CIM Arrays

2.3.1. Input Splitting

In the CIM architecture of Figure 1, we store the weights in SRAM and control the WL potentials according to the input activations. The SRAM then directly computes the matrix multiplications of the convolutional and fully connected (FC) layers using analog computing techniques. In such a scheme, the maximum matrix size that the SRAM can compute at once depends on the SRAM array size, which is set by physical design constraints. Unfortunately, the computed matrix size often exceeds the SRAM array size. Figure 3 illustrates such a situation. Here, some convolution layers have 4-dimensional weights. We can regard a convolution layer with 4-dimensional weights as a 2-dimensional matrix of size (kernel size × kernel size × input channel size) × (output channel size). For instance, in Figure 3, the 4-dimensional convolutional layer whose kernel, input channel, and output channel sizes are 3, 128, and 256, respectively, is treated as a 1152 (= 3 × 3 × 128) × 256 matrix. To compute this matrix on the circuit of Figure 1, we need 1152 memory rows. Since it is challenging to implement an SRAM array with 1152 rows, we need to split the matrix appropriately, taking the SRAM array size into account. Under such a circumstance, an SRAM CIM circuit handles one split part of the matrix and produces the corresponding partial sum. All partial sums delivered by the SRAM CIM circuits must be accumulated to complete the matrix computation. SOTA works have shown that the precision of the partial sums significantly affects the accuracy of the computed BNNs [9,17,20,26]. To obtain multi-bit partial sums in the SRAM CIM circuits, we need analog-to-digital converters (ADCs) to produce multi-bit outputs, incurring large area and energy overheads.
To address this problem, the authors of [26] developed an input splitting technique, which we employ in this work. A large convolutional or FC layer is reconstructed into several smaller groups, as shown in Figure 3, whose input count must be smaller than or equal to the number of rows in an SRAM array. Hence, the SRAM array of Figure 1 computes the weighted sums of each group, and the CSAs produce their own 1-bit outputs. The outputs of all groups are then merged to fit the input size of the following BNN layer; this merging is performed by digital machines so that it is not affected by process variations.
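A minimal sketch of this splitting-and-merging flow is given below. The dimensions follow the 1152-input example above with a 128-row array (nine groups, as in Table 6); the fixed 0.5 threshold is a stand-in for the per-column threshold that batch normalization provides in the actual design.

```python
import numpy as np

def split_groups(w_matrix, x_vector, n_groups):
    """Split a (rows x cols) weight matrix and its input vector into
    n_groups row-wise slices that each fit the SRAM array."""
    return (np.array_split(w_matrix, n_groups, axis=0),
            np.array_split(x_vector, n_groups))

def group_output(w_slice, x_slice, thresh=0.5):
    """Each group yields its own 1-bit CSA output per column: the partial
    weighted sum is thresholded (placeholder threshold; the real design
    uses the per-column BN threshold of Section 2.4)."""
    partial = x_slice @ w_slice                      # idealized analog MAC
    return (partial >= thresh).astype(np.float32)    # 1-bit output per column

# Illustrative layer: 1152 inputs x 256 outputs, 128-row array -> 9 groups
w = np.random.choice([-1.0, 1.0], size=(1152, 256))
x = np.random.choice([0.0, 1.0], size=1152)
w_parts, x_parts = split_groups(w, x, n_groups=9)
outs = [group_output(wp, xp) for wp, xp in zip(w_parts, x_parts)]
merged = np.concatenate(outs)    # digital merge feeding the next BNN layer
print(merged.shape)              # (9 * 256,) = (2304,)
```

In hardware, each group output comes from the CSAs of one array, and only the merge step runs on digital logic.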
The accuracy of BNNs employing the input splitting technique is compared with the baseline BNN accuracy in Table 3. The results demonstrate that accuracy improves with increasing SRAM array size across all BNN models and both binary activation schemes. This improvement is attributed to the reduction in the number of groups required for splitting as the array size increases, as shown in Table 4, Table 5, and Table 6 for the CONVNET, RESNET-18, and VGG-9 BNN models, respectively. This trend aligns with observations from previous work [26]. It is noteworthy that the first layer, which processes the input image, and the last layer, which computes class scores, are excluded from the input splitting and binary quantization, and are instead managed by digital hardware, in line with SOTA BNN implementations [5,6,7,26]. The split BNN accuracies presented in Table 3 serve as baselines for evaluating the techniques introduced in this work.

2.3.2. Mapping

We now discuss in more detail the mapping between the convolutional layers of BNNs and SRAM-based CIM arrays, shown in Figure 3. As mentioned above, convolutional layers are split so that their size is equal to or less than the number of rows in the SRAM array. As shown in Table 6, where the SRAM array size is 256 × 256, the number of split groups of the layer is six (= 1152/256 + 1), and the number of input channels per group is 21 (= 128/6) in the VGG-9 BNN model. Hence, the input size of each group is 189 (= 3 × 3 × 21), which is smaller than the number of rows in the SRAM array. Consequently, each group can be regarded as a 2-dimensional matrix of size (3 × 3 × 21) × 256. Under this circumstance, the mapping strategy is that all weights corresponding to each output channel are stored in one column of the SRAM array. The outputs of each group, which are binary (‘0’ or ‘1’), are obtained from the macros. FC layers are handled with the same mapping strategy.
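The grouping arithmetic above can be sanity-checked in a few lines; the layer dimensions and the group count of six follow the VGG-9 example and Table 6.

```python
kernel, in_channels, out_channels = 3, 128, 256   # VGG-9 layer of the example
array_rows = 256                                  # SRAM array size 256 x 256
n_groups = 6                                      # split count for this layer (Table 6)

ch_per_group   = in_channels // n_groups          # 128 // 6 = 21 input channels
rows_per_group = kernel * kernel * ch_per_group   # 3 * 3 * 21 = 189 rows
assert rows_per_group <= array_rows               # each group fits the array
print(ch_per_group, rows_per_group)               # -> 21 189
```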

2.4. In-Memory Batch Normalization

Batch normalization (BN) is a technique used to stabilize the learning process, significantly reducing the number of training epochs. In BNNs, BN plays a crucial role in enhancing accuracy, making it an essential component [27]. BN can be described by the following equation.
\[
Y = \gamma \frac{X - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta,
\qquad (2)
\]
where X and Y denote the input and output activations of BN, respectively. The parameters μ and σ² represent the mean and variance of the input activations computed across a mini-batch, while γ and β are learnable parameters corresponding to the scaling and shifting operations. The term ε is a small constant introduced to ensure numerical stability during normalization. During the backward propagation of training, these four parameters are updated and used to normalize the output of the current batch. During inference, these parameters become constant, and hence BN can be regarded as a linear transformation. As shown in Figure 4, in inference the output of the BN layer becomes the input of the activation function (1). In this work, we merge (1) and (2) into a single function named BnBinAct(). The merged function can be expressed as
\[
\mathrm{BnBinAct}(X) =
\begin{cases}
1, & X \geq X_{th} \\
0, & X < X_{th},
\end{cases}
\qquad (3)
\]
where X is the output of the weighted-sum layer, and
\[
X_{th} = \frac{thresh - \beta}{\gamma}\,\sqrt{\sigma^2 + \epsilon} + \mu.
\qquad (4)
\]
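A short sketch of this folding is shown below, assuming per-channel BN parameters held as NumPy arrays and γ > 0; the numeric values are illustrative only.

```python
import numpy as np

def fold_bn_threshold(gamma, beta, mu, var, thresh=0.5, eps=1e-5):
    """Per-channel threshold X_th of Eq. (4), assuming gamma > 0:
    BN (2) followed by the activation (1) collapses into X >= X_th."""
    return (thresh - beta) / gamma * np.sqrt(var + eps) + mu

def bn_bin_act(x, x_th):
    """Merged function BnBinAct() of Eq. (3)."""
    return (x >= x_th).astype(np.float32)

# Two example output channels with illustrative BN statistics
gamma, beta = np.array([1.2, 0.8]), np.array([0.1, -0.3])
mu, var     = np.array([4.0, -2.0]), np.array([9.0, 1.0])
x_th = fold_bn_threshold(gamma, beta, mu, var)     # approx. [5.0, -1.0]
print(bn_bin_act(np.array([6.0, -3.0]), x_th))     # -> [1. 0.]
```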
Most previous works assume that BN is computed in software [8,9,19,20], which requires analog-to-digital converters (ADCs) to convert the accumulated BL currents into high-precision (32-bit floating point) digital values. These digital values are then processed by digital processors, a method that incurs a significant energy overhead, especially for edge devices with strict power constraints. To mitigate this overhead, Ref. [28] proposed implementing BN directly in the hardware using additional cells.
In our approach, we address the problem by embedding the BN functionality into the differential CSA, as shown in Figure 2. Specifically, we merge the activation function and BN computation by introducing variable current biasing within the CSA. The required current biasing values, I_Thres_Neg and I_Thres_Pos, are derived from the conversion rule provided in Table 7 and are necessary to handle both positive and negative thresholds (X_th). This approach eliminates the need for energy-intensive high-resolution ADCs.
During the inference phase of a BNN, the BN layer has unique learned parameters for each output channel, corresponding to each column in the SRAM-based CIM array. For each channel, the threshold value X_th is calculated based on Equation (4) and can be quantized from a 32-bit floating-point representation to a fixed-point format with a range of [−2^n/2, 2^n/2 − 1]. In our experiments, X_th can be quantized to a 5-bit integer for CONVNET and VGG-9, and a 6-bit integer for RESNET-18, with an SRAM array size of 256 × 256, resulting in an accuracy loss of less than 1%. The current biasing values are then regulated by a current-steering digital-to-analog converter (DAC), such as [29], to precisely adjust the bias currents, thereby enabling BN directly in the analog domain. A high-resolution ADC, by contrast, would typically consume more power due to its increased complexity and the 32-bit floating-point processing it entails.
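As an illustration, the sketch below quantizes per-channel X_th values to an n-bit signed integer grid and converts them to the current-domain comparison level of Table 7 (X_th × I_M); the X_th values and the current margin are placeholders, and the DAC control itself is not modeled.

```python
import numpy as np

def quantize_xth(x_th, n_bits=5):
    """Round X_th to an n-bit signed integer grid, i.e. [-2^n/2, 2^n/2 - 1]."""
    lo, hi = -(2 ** n_bits) // 2, (2 ** n_bits) // 2 - 1
    return np.clip(np.round(x_th), lo, hi).astype(np.int32)

def threshold_current(x_th_q, i_margin=1e-6):
    """Current-domain comparison level of Table 7: X_th * I_M, realized by the
    sign-dependent I_Thres_Pos / I_Thres_Neg biasing of the CSA."""
    return x_th_q * i_margin

x_th = np.array([3.7, -9.2, 14.9])      # illustrative per-channel thresholds
x_q  = quantize_xth(x_th)               # -> [ 4 -9 15]
print(x_q, threshold_current(x_q))
```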
While the primary focus of this work is on presenting a variation-aware framework, a detailed examination of the CSA operation and the control of variable current biasing is beyond the scope of this study. A more comprehensive discussion on these aspects, including their implications for system performance and energy efficiency, will be addressed in future work.

3. A Variation-Aware Binary Neural Network Framework

3.1. Variation-Aware Models for SRAM-Based BNN CIM

In this section, we present a variation-aware BNN framework to enhance the reliability of CIM under process variations. The framework assumes SRAM-based CIM, whose configuration is discussed in Section 2.2. To develop this framework, we first derive variation-aware models as follows.
In the given configuration (Figure 1), as discussed, the MAC output is defined by I_BL − I_BLB, which is described as
\[
I_{BL} - I_{BLB}
= \sum_{i=0}^{N-1} \left( i_{cell\_bl\_i} - i_{cell\_blb\_i} \right) \times WL_i
= \sum_{i=0}^{N-1} \left( W_i \times I_M \right) \times WL_i,
\qquad (5)
\]
where W_i is the i-th weight stored in the SRAM array (i.e., W_i is ‘+1’ or ‘−1’), WL_i is the i-th word line (WL) status (ON or OFF), which corresponds to the activation value (‘1’ or ‘0’), and I_M is the current margin, that is, the absolute value of the difference between the BL and BLB currents of one cell (i.e., one bitwise multiply operation) when no process variations are assumed. In practice, both i_cell_bl_i and i_cell_blb_i are subject to process variations, which can be interpreted as variations in the BNN weights, denoted as W_i in Equation (5). Let us model the weight variation as Δ(W_i). Consequently, the product of the i-th weight and the i-th activation, i_cell_bl_i − i_cell_blb_i, can be redefined as
\[
i_{cell\_bl\_i} - i_{cell\_blb\_i} = \left( W_i + \Delta(W_i) \right) \times I_M.
\qquad (6)
\]
In this work, we analyze the impact of process variations on both bl and blb cell currents through 10,000 MC simulations in 28 nm FD-SOI technology. For the 6T-SRAM bit-cell configuration shown in Figure 1, we consider the stored-weight scenario of ‘+1’ (i.e., Q = 1 and QB = 0), as depicted in Figure 5. When the WL is activated (i.e., the input neuron is 1), the bl and blb cell currents, affected by process variations, exhibit log-normal or normal distributions, characterized as LN/N(μ_bl, σ_bl²) and LN/N(μ_blb, σ_blb²), respectively, as illustrated in Figure 5. We can then derive the following equations.
\[
\Delta(W_i) =
\begin{cases}
\Delta(+1) = \dfrac{1}{I_M}\left( i_{cell\_bl\_i} - i_{cell\_blb\_i} \right) - 1 \\[4pt]
\Delta(-1) = \dfrac{1}{I_M}\left( i_{cell\_blb\_i} - i_{cell\_bl\_i} \right) + 1
\end{cases}
\qquad (7)
\]
From the distributions of the bl and blb cell currents obtained through the MC simulations and Equation (7), we derive the resulting weight distribution under process variations, as shown in Figure 6. This analysis demonstrates that, due to process variations in the given SRAM-based CIM configuration, the binary weights (−1/+1) of the BNN are transformed into analog weights (−1 + Δ(−1) / +1 + Δ(+1)), whose distributions follow log-normal or normal patterns.
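The sketch below shows how the per-weight deviations of (7) can be sampled from the fitted cell-current distributions; log-normal sampling is used, and the distribution parameters are illustrative placeholders rather than the actual 28 nm FD-SOI fits.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weight_deviations(n, i_margin, mu_bl, sig_bl, mu_blb, sig_blb):
    """Draw Delta(+1) and Delta(-1) of Eq. (7) from cell-current samples.
    Log-normal sampling is used here; the parameters are illustrative
    placeholders, not the MC fits obtained from HSPICE."""
    i_bl  = rng.lognormal(mean=mu_bl,  sigma=sig_bl,  size=n)
    i_blb = rng.lognormal(mean=mu_blb, sigma=sig_blb, size=n)
    delta_pos = (i_bl - i_blb) / i_margin - 1.0    # deviation around W = +1
    delta_neg = (i_blb - i_bl) / i_margin + 1.0    # mirrored case for W = -1
    return delta_pos, delta_neg

# Placeholder fits: "on" current near I_M, "off" current much smaller
i_m = 1e-6
d_pos, d_neg = sample_weight_deviations(
    10000, i_m, mu_bl=np.log(1e-6), sig_bl=0.1, mu_blb=np.log(5e-8), sig_blb=0.3)
print(round(d_pos.mean(), 3), round(d_pos.std(), 3))   # small deviations near 0
```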

3.2. Variation-Aware Framework for Bi-Polar Neural Networks

The discussion in Section 3.1 shows that, under process variations, each weight stored in the memory array experiences a deviation Δ(W_i). The weight stored in each SRAM cell is therefore not an exact digital value of +1 or −1 but can be redefined as
\[
\mathrm{Polarize}(W_i) =
\begin{cases}
+1 + \Delta(+1), & \text{when } W_i = +1 \\
-1 + \Delta(-1), & \text{when } W_i = -1,
\end{cases}
\qquad (8)
\]
where W_i is a binarized weight, and Δ(+1) and Δ(−1) are stochastic parameters expressing the effect of process variations, whose distributions are obtained from (7). Our training framework is described in Algorithm 1, where the function in (8) is exploited. In variation-aware training, we train BNNs based on Algorithm 1 from scratch.
Furthermore, when V_WL exceeds a certain threshold, some SRAM cells experience flipping (both i_cell_bl_i and i_cell_blb_i are flipped) due to process variations. To account for this, Algorithm 1 incorporates a flipping function that flips the binarized weights (the output of the Sign() function) with a specified probability, determined by the number of instances where both i_cell_bl_i and i_cell_blb_i are flipped.
Additionally, due to process variations in the CSA, the activation threshold in (1) varies. To address this, Algorithm 1 employs the stochastic activation function given by (9) instead of the deterministic activation function in (1).
\[
\mathrm{StoQuantize}(X) =
\begin{cases}
1, & X \geq (thresh + \Delta_{act}) \\
0, & X < (thresh + \Delta_{act})
\end{cases}
\qquad (9)
\]
with
\[
\Delta_{act} \sim N(0, stddev),
\qquad (10)
\]
where the standard deviation of Δ_act is suitably assumed (10% of thresh in our experiments; see Section 4.1). When the training step is completed, only the binarized weights are retained for inference on the SRAM-based CIM. During inference, however, the quantized weights must be flipped and polarized again to assess the impact of process variations.
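A minimal NumPy sketch of the three stochastic operators used in Algorithm 1 is given below; the flip probability, deviation samples, and threshold spread are placeholders that would be supplied by the MC characterization of Section 3.1.

```python
import numpy as np

rng = np.random.default_rng(1)

def sto_flip(w_bin, p_flip):
    """StoFlip(): flip binarized weights (+1/-1) with probability p_flip,
    modeling cells whose bl and blb currents are both flipped."""
    flips = rng.random(w_bin.shape) < p_flip
    return np.where(flips, -w_bin, w_bin)

def polarize(w_bin, delta_pos, delta_neg):
    """Polarize(), Eq. (8): map +1 -> +1 + Delta(+1) and -1 -> -1 + Delta(-1)."""
    return np.where(w_bin > 0, 1.0 + delta_pos, -1.0 + delta_neg)

def sto_quantize(x, thresh=0.5, std_dev=0.05):
    """StoQuantize(), Eqs. (9)-(10): binary activation with a noisy CSA
    threshold (std_dev = 10% of thresh, as in our experiments)."""
    delta_act = rng.normal(0.0, std_dev, size=x.shape)
    return (x >= thresh + delta_act).astype(np.float32)

w_bin = np.sign(rng.standard_normal((4, 3)))        # Sign() of real-valued weights
w_var = polarize(sto_flip(w_bin, p_flip=0.01),
                 delta_pos=rng.normal(0.0, 0.1, size=w_bin.shape),
                 delta_neg=rng.normal(0.0, 0.1, size=w_bin.shape))
print(w_var)                                        # analog weights near +/-1
print(sto_quantize(np.array([0.2, 0.55, 0.9])))
```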
In Algorithm 1, C is the cost function for the minibatch, λ is the learning-rate decay factor, and L is the number of layers. ∘ indicates element-wise multiplication. The function Sign() specifies how the weights are binarized. The StoFlip() function flips the binarized weights with a specified probability p, which is determined by the number of cases where both i_cell_bl_i and i_cell_blb_i are flipped (as described in step 4 of Figure 7). The Polarize() function (8) polarizes the binarized weights. The activations are clipped to [0, 1] by the Clip() function. The function StoQuantize() (9) specifies how the variation-aware activations are binarized. BatchNorm() and BackBatchNorm() define how the activations are batch-normalized and back-propagated, respectively. Update() specifies how the parameters are updated when their gradients are known. The straight-through estimator (STE) is used to estimate gradients for (1), as in [5]. The Split() and Merge() functions perform the input splitting and merging steps discussed in Section 2.3.2. ArraySize is the size of the SRAM array, which is set to 128, 256, or 512.
Algorithm 1 Training a reconstructed L-layer BNN with variation-aware weights and activations.
Require: a minibatch of inputs and targets (a_0, a*), previous weights W, previous BatchNorm parameters (γ, β), ArraySize, weight-initialization coefficients α from [30], and previous learning rate η.
Ensure: updated weights W^{t+1}, updated BatchNorm parameters (γ^{t+1}, β^{t+1}), and updated learning rate η^{t+1}.
1. Computing the parameter gradients:
1.1 Forward propagation:
for k = 1 to L do
  // Input size per array
  InputSize = Kernel × Kernel × InputChannels
  // Number of groups
  nGroups = InputSize / ArraySize
  while InputSize % nGroups ≠ 0 do
    nGroups = nGroups + 1
  end while
  // Input splitting
  a_{k-1}^b ← Split(a_{k-1}^b, nGroups)
  W_k ← Split(W_k, nGroups)
  for i = 1 to nGroups do
    W_k^b[i] ← Sign(W_k[i])
    W_k^b[i] ← StoFlip(W_k^b[i])
    W_k^b[i] ← Polarize(W_k^b[i])
    s_k[i] ← a_{k-1}^b[i] · W_k^b[i]
  end for
  a_k ← BatchNorm(s_k, γ_k, β_k)
  if k < L then
    a_k ← Clip(a_k, 0, 1)
    a_k^b ← StoQuantize(a_k)
    a_k^b ← Merge(a_k^b, nGroups)
    a_k^b ← BinAct(a_k^b)
  end if
end for
1.2 Backward propagation:
Compute g_{a_L} = ∂C/∂a_L, knowing a_L and a*
for k = L to 1 do
  if k < L then
    g_{a_k} ← g_{a_k^b} ∘ 1_{0 ≤ a_k ≤ thresh}   (STE)
  end if
  (g_{s_k}, g_{γ_k}, g_{β_k}) ← BackBatchNorm(g_{a_k}, s_k, γ_k, β_k)
  for i = 1 to nGroups do
    g_{a_{k-1}^b}[i] ← g_{s_k}[i] · W_k^b[i]
    g_{W_k^b}[i] ← g_{s_k}[i]^T · a_{k-1}^b[i]
  end for
end for
2. Accumulating the parameter gradients:
for k = 1 to L do
  γ_k^{t+1} ← Update(γ_k, η, g_{γ_k})
  β_k^{t+1} ← Update(β_k, η, g_{β_k})
  for i = 1 to nGroups do
    W_k^{t+1}[i] ← Update(W_k[i], α_k[i] η, g_{W_k^b}[i])
  end for
  η^{t+1} ← λ η
end for
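For reference, the two binarization steps of Algorithm 1 and their straight-through-estimator gradients can be written with tf.custom_gradient as sketched below. This is one possible TensorFlow realization under simplifying assumptions (the gradient window [0, 1] matches the Clip() range, and the pre-activation scaling is only to make the toy threshold meaningful); it is not our full training code.

```python
import tensorflow as tf

@tf.custom_gradient
def binarize_weights_ste(w):
    """Sign() binarization with a straight-through estimator:
    forward -> +1/-1, backward -> identity gradient (as in [5])."""
    w_bin = tf.where(w >= 0, tf.ones_like(w), -tf.ones_like(w))
    def grad(dy):
        return dy                       # pass the gradient straight through
    return w_bin, grad

@tf.custom_gradient
def binary_activation_ste(x):
    """Eq. (1) in the forward pass; backward passes gradients only inside
    the clipped activation range [0, 1] (one common STE choice)."""
    y = tf.cast(x >= 0.5, x.dtype)
    def grad(dy):
        inside = tf.cast((x >= 0.0) & (x <= 1.0), x.dtype)
        return dy * inside
    return y, grad

w = tf.random.normal([8, 4])            # real-valued weights of one group
x = tf.random.uniform([2, 8])           # binary-like inputs of the group
with tf.GradientTape() as tape:
    tape.watch(w)
    s = tf.matmul(x, binarize_weights_ste(w))   # weighted sum of the group
    a = binary_activation_ste(s / 8.0)          # scaled so 0.5 is meaningful
g = tape.gradient(a, w)                  # STE keeps this gradient non-zero
print(g.shape)                           # (8, 4)
```

In the complete flow of Algorithm 1, StoFlip() and Polarize() from Section 3.2 would be applied to the binarized weights before the matrix multiplication, and BatchNorm()/StoQuantize() would replace the fixed 0.5 comparison.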

3.3. Optimization of Biasing Voltages

In this section, we present the methodology for optimizing the biasing voltages of the WLs and BLs of the SRAM, denoted V_WL and V_BL, respectively; the flow is shown in Figure 7. This methodology provides steps to find the biasing voltages that achieve the best balance between accuracy and power consumption. Note that V_WL and V_BL are critical factors influencing power consumption in SRAM-based CIM. Power consumption is not directly addressed within our variation-aware BNN framework as outlined in Algorithm 1; instead, the optimization methodology balances power consumption and accuracy by tuning these biasing voltages, as described in this section. The process begins by setting an initial configuration of V_WL and V_BL and running MC circuit simulations of the SRAM cell, as depicted in steps 1 and 2 of Figure 7. During these simulations, if the number of flips of i_cell_bl or i_cell_blb matches the total number of MC simulations, the V_WL and V_BL configuration is discarded to ensure reliable operation of the SRAM-based CIM and the accuracy of the BNNs. Otherwise, the mean and variance of the i_cell_bl and i_cell_blb distributions, along with the number of instances where both i_cell_bl and i_cell_blb are flipped (if any), are fed to the variation-aware BNN framework (Section 3.2), as outlined in steps 3, 4, and 5 of Figure 7. Following variation-aware training (step 6 in Figure 7) and variation-aware inference (step 7 in Figure 7), we collect the average accuracy and compare it across different V_WL and V_BL configurations. This comparison identifies the optimal configuration, in which accuracy remains acceptable while power consumption is minimized.
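Once the per-configuration accuracies have been collected, the final selection step of Figure 7 reduces to a simple filter. The sketch below uses the CONVNET accuracies reported in Section 4.2 and the flipped entry of Table 8 for V_WL = 0.9 V/V_BL = 0.1 V; the power numbers are illustrative placeholders.

```python
def select_bias_voltages(results, acc_min):
    """Pick the lowest-power (V_WL, V_BL) whose variation-aware accuracy is
    still acceptable; configurations marked 'flipped' are discarded."""
    best = None
    for (vwl, vbl), r in results.items():
        if r["flipped"]:                 # unreliable read: discard (Table 8)
            continue
        if r["accuracy"] < acc_min:      # accuracy not acceptable
            continue
        if best is None or r["power"] < best[1]["power"]:
            best = ((vwl, vbl), r)
    return best

# Accuracies from our CONVNET results; power values are illustrative only
results = {
    (0.4, 0.1): {"flipped": False, "accuracy": 0.9808, "power": 1.0},
    (0.9, 0.4): {"flipped": False, "accuracy": 0.9233, "power": 3.5},
    (0.9, 0.1): {"flipped": True,  "accuracy": None,   "power": 2.8},
}
print(select_bias_voltages(results, acc_min=0.95))   # -> ((0.4, 0.1), ...)
```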
Upon determining the optimal biasing voltages for each BNN model, we obtain the corresponding trained weights, which are represented as either ‘−1’ or ‘+1’. These trained weights can be reused after the training process is completed. However, for each new set of weights, the WL and BL voltages must be re-tuned to ensure optimal operation and accuracy. Therefore, for each BNN model with its optimal biasing voltages, the corresponding trained weights are stored in the SRAM cells, with the stored-weight scenarios ‘+1’ (Q = 1 and QB = 0) and ‘−1’ (Q = 0 and QB = 1), as shown in Figure 5.

3.4. Modeling of IR Drop

The resistance of the power lines in an SRAM array causes an IR drop, lowering the effective supply voltage. This effect is not considered in our experiments; however, it can easily be modeled by applying lower supply voltages in our MC simulations.

4. Validation of Our Framework

4.1. Experimental Setting

We assess the effectiveness of our proposed framework across various SRAM array sizes and different biasing voltages for WLs and BLs in a 6T-SRAM bit-cell configuration [9]. Notably, this framework can be extended to CIMs employing alternative memory technologies and SRAM bit-cell configurations. Using 10,000 MC simulations in 28 nm FD-SOI technology, we analyze the mean and variance of the i_cell_bl and i_cell_blb distributions, as well as the occurrence of simultaneous flips of i_cell_bl and i_cell_blb, as detailed in Table 8, for the stored-weight scenario of ‘+1’ (i.e., Q = 1 and QB = 0), illustrated in Figure 5. These simulations were performed using HSPICE. The results, including the mean and variance of the cell-current distributions and the flip instances, are integrated into our variation-aware BNN framework. We further estimate the average inference accuracy of the RESNET-18 and VGG-9 BNN models on the CIFAR-10 dataset, and the CONVNET BNN model on the MNIST dataset, both before and after applying variation-aware training. The framework is implemented using the TensorFlow deep learning library.
During training, the loss is minimized using the Adam optimization algorithm [31]. The initial learning rate is set to 0.01 for the VGG-9 and RESNET-18 BNN models and 0.001 for the CONVNET BNN model. The maximum number of training epochs is set to 100 for CONVNET, 150 for VGG-9, and 300 for RESNET-18. The learning rate is decayed by a factor of 0.31 when the validation accuracy shows insufficient improvement. To account for process variation, variation-aware inference is performed 100 times on 10,000 validation images, and the resulting accuracies are averaged. In Equation (10), the standard deviation is set to 10% of the threshold (thresh) value defined in Equations (1) and (9) for both training and inference.
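For reference, these hyperparameters map onto standard TensorFlow/Keras components roughly as sketched below; the patience value and the evaluate_with_sampled_variations helper are assumptions, not part of our released setup.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)   # 0.001 for CONVNET

# Decay the learning rate by a factor of 0.31 when validation accuracy stalls
# (the patience value below is an assumed placeholder)
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy", factor=0.31, patience=10, mode="max")

MAX_EPOCHS = {"CONVNET": 100, "VGG-9": 150, "RESNET-18": 300}

# Variation-aware inference: repeat the noisy forward pass 100 times on the
# 10,000 validation images and average the accuracies, e.g.
#   accs = [evaluate_with_sampled_variations(model, val_ds) for _ in range(100)]
#   mean_acc = sum(accs) / len(accs)
# (evaluate_with_sampled_variations is a hypothetical helper)
```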
For the baseline, we do not consider the effect of process variation, while the input splitting technique mentioned in Section 2.3.2 is used. For the RESNET-18 BNN model, we assume that the shortcuts have a full-precision data format. Many SOTA works [7] have employed this approach, since the accuracy of RESNETs is sensitive to the quantization errors of shortcuts; we follow it in this work.

4.2. Results and Discussion

The average inference accuracies before implementing variation-aware training for the CONVNET, RESNET-18, and VGG-9 BNN models, as shown in Figure 8, Figure 9 and Figure 10, indicate that in SRAM-based analog CIMs, the inference accuracies on the MNIST and CIFAR-10 datasets are significantly degraded by process variations. Specifically, with an SRAM array size of 128 × 128, for a word-line voltage (V_WL) of 0.9 V and a bit-line voltage (V_BL) of 0.4 V, the accuracies drop below 61%, 20%, and 50% for CONVNET, RESNET-18, and VGG-9, respectively.
Our proposed variation-aware training framework effectively mitigates this degradation. Figure 11 illustrates the inference accuracies of CONVNET after applying variation-aware training, where the effect of process variations is also considered during inference. The results demonstrate that our framework significantly improves accuracy under process variations. For instance, with an SRAM array size of 128 × 128, a word-line voltage (V_WL) of 0.9 V, and a bit-line voltage (V_BL) of 0.4 V, the accuracy is 60.24% under the process variations of 28 nm FD-SOI (Figure 8). Our variation-aware training framework improves the accuracy for this array size and voltage configuration to 92.33%. Among the various bit-line and word-line voltage configurations, an SRAM array size of 128 × 128 with V_WL = 0.4 V and V_BL = 0.1 V emerges as the optimal configuration for CONVNET, maintaining an acceptable accuracy of 98.08% (Figure 11) compared to the baseline of 98.92%, while also minimizing power consumption.
Similar results are observed for the RESNET-18 and VGG-9 BNN models. Our variation-aware training framework provides significant accuracy improvements under process variations. For instance, with an SRAM array size of 128 × 128, as shown in Figure 10, the accuracies for VGG-9 are 76.82% and 45.23% for the two biasing cases of V_WL = 0.4 V/V_BL = 0.1 V and V_WL = 0.9 V/V_BL = 0.4 V, respectively. In Figure 12, the accuracies for these two cases improve to 85.22% and 78.22%, respectively, validating the efficacy of our variation-aware training framework. Among the various word-line and bit-line voltage configurations, SRAM array sizes of 128 × 128 with V_WL = 0.7 V/V_BL = 0.4 V for RESNET-18 and V_WL = 0.6 V/V_BL = 0.4 V for VGG-9 are identified as the optimal setups. These configurations achieve accuracies of 77.07% for RESNET-18 (Figure 13) and 86.47% for VGG-9 (Figure 12), which are close to the baseline accuracies of 78.87% and 87.24%, respectively. Additionally, these setups effectively minimize power consumption.
As illustrated in Figure 11, Figure 12 and Figure 13, the accuracy under process variations improves with increasing V_WL and array size for two main reasons. Firstly, as shown in Table 4, Table 5 and Table 6, the number of groups required for splitting decreases with larger array sizes, resulting in higher accuracy, consistent with the trend observed in [26]. Secondly, a higher V_WL results in larger cell currents, providing better immunity to process variations. However, when V_WL exceeds a certain threshold, the SRAM cell currents i_cell_bl or i_cell_blb may flip due to a negative read static noise margin [32], as marked with “Flipped” in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12. This does not ensure reliable operation of SRAM-based CIM and accuracy for BNNs. Considering this factor, we determined the optimal biasing points of V_WL and V_BL to be V_WL = 0.4 V/V_BL = 0.1 V, V_WL = 0.7 V/V_BL = 0.4 V, and V_WL = 0.6 V/V_BL = 0.4 V for the CONVNET, RESNET-18, and VGG-9 BNN models, respectively. With an SRAM array size of 128 × 128, the accuracies of CONVNET on the MNIST dataset, and RESNET-18 and VGG-9 on the CIFAR-10 dataset, are 98.08%, 77.07%, and 86.47%, respectively.
The results of our experiments demonstrate that process variations significantly impact the inference accuracies of BNN models implemented in SRAM-based CIMs. For instance, prior to applying variation-aware training, the average inference accuracies for the CONVNET, RESNET-18, and VGG-9 BNN models on the MNIST and CIFAR-10 datasets were notably degraded, dropping to as low as 61%, 20%, and 50%, respectively, under voltage configurations of V_WL = 0.9 V and V_BL = 0.4 V in a 128 × 128 SRAM array.
The implementation of our variation-aware training framework effectively mitigated these losses. For example, applying variation-aware training improved the accuracy of CONVNET from 60.24% to 92.33% under the same voltage configuration. Additionally, our analysis identified the optimal biasing points of V_WL and V_BL for each model, which not only improve accuracy but also minimize power consumption. The optimal configurations for CONVNET, RESNET-18, and VGG-9 were determined to be V_WL = 0.4 V/V_BL = 0.1 V, V_WL = 0.7 V/V_BL = 0.4 V, and V_WL = 0.6 V/V_BL = 0.4 V, respectively. Overall, our variation-aware training framework provides a robust solution to enhance the accuracy and energy efficiency of BNNs on SRAM-based analog CIMs under process variations.
While our variation-aware framework effectively mitigates process variations arising from silicon (Si) manufacturing, we acknowledge that other sources of variation, such as aging, temperature fluctuations, and battery instability, may also impact the reliability of SRAM-based CIM architectures. These variations are particularly critical due to the sensitivity of WLs and BLs in analog computations. However, addressing these factors is beyond the scope of this work, and we leave this as future research to further enhance the robustness of CIM systems.

5. Conclusions

In this work, we address the challenge of process variation in SRAM-based BNN computation-in-memory (CIM) systems, aiming to balance accuracy and power consumption effectively. We developed mathematical models to capture the impact of process variations on analog computations in SRAM cells, validated through Monte Carlo simulations in 28 nm FD-SOI technology. Based on these models, we proposed a variation-aware BNN training framework that enhances accuracy despite process variations, as demonstrated through extensive simulations. Furthermore, we optimized the biasing of word lines and bit lines in SRAM to achieve an optimal trade-off between accuracy and power efficiency, making our approach particularly suitable for energy-efficient edge devices.

Author Contributions

Conceptualization, M.-S.L. and I.-J.C.; methodology, M.-S.L. and I.-J.C.; software, M.-S.L.; validation, M.-S.L.; formal analysis, M.-S.L.; investigation, M.-S.L. and T.-D.N.; writing—original draft preparation, M.-S.L., T.-N.P., T.-D.N. and I.-J.C.; writing—review and editing, M.-S.L., T.-N.P., T.-D.N. and I.-J.C.; visualization, M.-S.L.; supervision, I.-J.C.; project administration, I.-J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) under RS-2021-II210106 and RS-2020-II201294.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CIM: Computation in memory
DNNs: Deep neural networks
BA: Binary activation
BNNs: Binary neural networks
ADCs: Analog-to-digital converters
CSA: Current sense amplifier
MAC: Multiply-and-accumulation
eNVM: Emerging non-volatile memory
MC: Monte Carlo
BN: Batch normalization
FC: Fully connected
BLs: Bit lines
WLs: Word lines

References

  1. Wan, W.; Kubendran, R.; Schaefer, C.; Eryilmaz, S.B.; Zhang, W.; Wu, D.; Deiss, S.; Raina, P.; Qian, H.; Gao, B.; et al. A compute-in-memory chip based on resistive random-access memory. Nature 2022, 608, 504–512. [Google Scholar] [CrossRef] [PubMed]
  2. Wen, T.-H.; Hsu, H.-H.; Khwa, W.-S.; Huang, W.-H.; Ke, Z.-E.; Chin, Y.-H.; Wen, H.-J.; Chang, Y.-C.; Hsu, W.-T.; Lo, C.-C.; et al. 34.8 A 22nm 16Mb Floating-Point ReRAM Compute-in-Memory Macro with 31.2TFLOPS/W for AI Edge Devices. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 18–22 February 2024; pp. 580–582. [Google Scholar] [CrossRef]
  3. Antolini, A.; Lico, A.; Zavalloni, F.; Scarselli, E.F.; Gnudi, A.; Torres, M.L.; Canegallo, R.; Pasotti, M. A Readout Scheme for PCM-Based Analog In-Memory Computing With Drift Compensation Through Reference Conductance Tracking. IEEE Open J. Solid-State Circuits Soc. 2024, 4, 69–82. [Google Scholar] [CrossRef]
  4. Khaddam-Aljameh, R.; Stanisavljevic, M.; Mas, J.F.; Karunaratne, G.; Braendli, M.; Liu, F.; Singh, A.; Müller, S.M.; Egger, U.; Petropoulos, A.; et al. HERMES Core—A 14 nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing. In Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, 13–19 June 2021; pp. 1–2. [Google Scholar] [CrossRef]
  5. Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv 2016, arXiv:1602.02830. [Google Scholar]
  6. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. arXiv 2016, arXiv:1603.05279v4. [Google Scholar]
  7. Kim, H.; Kim, K.; Kim, J.; Kim, J.-J. BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 30 April 2020; Available online: https://openreview.net/forum?id=r1x0lxrFPS (accessed on 25 September 2024).
  8. Yin, S.; Jiang, Z.; Seo, J.-S.; Seok, M. XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks. IEEE J. Solid-State Circuits 2020, 6, 1733–1743. [Google Scholar] [CrossRef]
  9. Liu, R.; Peng, X.; Sun, X.; Khwa, W.-S.; Si, X.; Chen, J.-J.; Li, J.-F.; Chang, M.-F.; Yu, S. Parallelizing SRAM arrays with customized bit-cell for binary neural networks. In Proceedings of the 55th Annual Design Automation Conference (DAC), San Francisco, CA, USA, 24–28 June 2018. [Google Scholar] [CrossRef]
  10. Kim, H.; Oh, H.; Kim, J.-J. Energy-efficient XNOR-free in-memory BNN accelerator with input distribution regularization. In Proceedings of the 39th International Conference on Computer-Aided Design (ICCAD), Virtual Event, 2–5 November 2020. [Google Scholar] [CrossRef]
  11. Choi, W.H.; Chiu, P.-F.; Ma, W.; Hemink, G.; Hoang, T.T.; Lueker-Boden, M.; Bandic, Z. An In-Flash Binary Neural Network Accelerator with SLC NAND Flash Array. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020. [Google Scholar] [CrossRef]
  12. Angizi, S.; He, Z.; Awad, A.; Fan, D. MRIMA: An MRAM-Based In-Memory Accelerator. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (TCAD) 2020, 5, 1123–1136. [Google Scholar] [CrossRef]
  13. Saha, G.; Jiang, Z.; Parihar, S.; Xi, C.; Higman, J.; Karim, M.A.U. An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network. IEEE Access 2020, 8, 91405–91414. [Google Scholar] [CrossRef]
  14. Kim, J.; Koo, J.; Kim, T.; Kim, Y.; Kim, H.; Yoo, S.; Kim, J.-J. Area-Efficient and Variation-Tolerant In-Memory BNN Computing using 6T SRAM Array. In Proceedings of the Symposium on VLSI Circuits, Kyoto, Japan, 9–14 June 2019. [Google Scholar] [CrossRef]
  15. Oh, H.; Kim, H.; Ahn, D.; Park, J.; Kim, Y.; Lee, I.; Kim, J.-J. Energy-efficient charge sharing-based 8T2C SRAM in-memory accelerator for binary neural networks in 28nm CMOS. In Proceedings of the IEEE Asian Solid-State Circuits Conference (A-SSCC), Busan, Republic of Korea, 7–10 November 2021. [Google Scholar] [CrossRef]
  16. Bhunia, S.; Mukhopadhyay, S.; Roy, K. Process Variations and Process-Tolerant Design. In Proceedings of the 20th International Conference on VLSI Design Held Jointly with 6th International Conference on Embedded Systems (VLSID’07), Bangalore, India, 6–10 January 2007. [Google Scholar] [CrossRef]
  17. Yi, W.; Kim, Y.; Kim, J.-J. Effect of Device Variation on Mapping Binary Neural Network to Memristor Crossbar Array. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy, 25–29 March 2019. [Google Scholar] [CrossRef]
  18. Laborieux, A.; Bocquet, M.; Hirtzlin, T.; Klein, J.-O.; Nowak, E.; Vianello, E.; Portal, J.-M.; Querlioz, D. Implementation of Ternary Weights With Resistive RAM Using a Single Sense Operation Per Synapse. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 1, 138–147. [Google Scholar] [CrossRef]
  19. Sun, X.; Peng, X.; Chen, P.-Y.; Liu, R.; Seo, J.-s.; Yu, S. Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons. In Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Republic of Korea, 22–25 January 2018. [Google Scholar] [CrossRef]
  20. Sun, X.; Yin, S.; Peng, X.; Liu, R.; Seo, J.-s.; Yu, S. XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 19–23 March 2018. [Google Scholar] [CrossRef]
  21. Liu, B.; Li, H.; Chen, Y.; Li, X.; Wu, Q.; Huang, T. Vortex: Variation-aware training for memristor X-bar. In Proceedings of the 52nd Annual Design Automation Conference (DAC), San Francisco, CA, USA, 8–12 June 2015. [Google Scholar] [CrossRef]
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
  23. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556v6. [Google Scholar]
  24. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Technical Report. 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 25 September 2024).
  25. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  26. Kim, Y.; Kim, H.; Kim, J.-J. Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators. arXiv 2019, arXiv:1811.02187. [Google Scholar]
  27. Sari, E.; Belbahri, M.; Nia, V.P. How Does Batch Normalization Help Binary Training? arXiv 2020, arXiv:1909.09139v3. [Google Scholar]
  28. Kim, H.; Kim, Y.; Kim, J.-J. In-memory batch-normalization for resistive memory based binary neural network hardware. In Proceedings of the 24th Asia and South Pacific Design Automation Conference (ASPDAC), Tokyo, Japan, 21–24 January 2019. [Google Scholar] [CrossRef]
  29. Chen, T.; Gielen, G.G.E. A 14-bit 200-MHz Current-Steering DAC With Switching-Sequence Post-Adjustment Calibration. IEEE J. Solid-State Circuits 2007, 42, 2386–2394. [Google Scholar] [CrossRef]
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar] [CrossRef]
  31. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980v9. [Google Scholar]
  32. Arandilla, C.D.C.; Alvarez, A.B.; Roque, C.R.K. Static Noise Margin of 6T SRAM Cell in 90-nm CMOS. In Proceedings of the UkSim 13th International Conference on Computer Modelling and Simulation (UKSIM), Cambridge, UK, 30 March–1 April 2011. [Google Scholar] [CrossRef]
Figure 1. 6T-SRAM-based CIM architecture for a BNN and truth table of input neurons and weights.
Figure 2. Sense amplifier circuit.
Figure 3. SRAM-based CIM mapping [26].
Figure 4. Batch normalization merging in inference phase.
Figure 5. Cell current distributions.
Figure 6. Weight distribution under process variations.
Figure 7. Biasing voltages of SRAM optimization methodology.
Figure 8. Average inference accuracy before variation-aware training of CONVNET BNN model on MNIST dataset at the FS corner and 85 °C.
Figure 9. Average inference accuracy before variation-aware training of RESNET-18 (full-precision shortcut) BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Figure 10. Average inference accuracy before variation-aware training of VGG-9 BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Figure 11. Average inference accuracy after variation-aware training of CONVNET BNN model on MNIST dataset at the FS corner and 85 °C.
Figure 12. Average inference accuracy after variation-aware training of VGG-9 BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Figure 13. Average inference accuracy after variation-aware training of RESNET-18 (full-precision shortcut) BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Table 1. SRAM bit cells from SOTA works.
              | IEEE Access’20 [13] | VLSI’19 [14]  | ASSCC’21 [15] | DAC’18 [9]
SRAM bit cell | 10T + BEOL MOM cap  | 6T (split WL) | 8T2C          | 6T
Technology    | 22 nm               | 28 nm         | 28 nm         | 65 nm
Table 2. Summary and comparison with previous works on SRAM-based BNN CIM.
                                         | VLSI’19 [14]                                                                                  | ASSCC’21 [15]                            | This Work
Technique to mitigate process variations | In-memory calibration by using some biasing rows                                              | Charge-based computation                 | Software framework
Advantages                               | Robust to deterministic noise                                                                 | Robust to random cell variations         | No hardware penalty
Disadvantages                            | Difficulty in coping with random cell variations; area and power overhead of the biasing rows | Large cell area compared to 6T-SRAM cell | Many training processes
Table 3. Comparison of two binary activation cases: (+1/−1) and (1/0).
Network        | Dataset       | Full Precision | BNN (+1/−1) | BNN (1/0) | Split 128 (+1/−1) | Split 128 (1/0) | Split 256 (+1/−1) | Split 256 (1/0) | Split 512 (+1/−1) | Split 512 (1/0)
CONVNET [25]   | MNIST [25]    | 99.43          | 99.29       | 99.33     | 98.85             | 98.92           | 98.89             | 99.17           | 99.13             | 99.22
RESNET-18 [22] | CIFAR-10 [24] | 91.17          | 82.82       | 83.06     | 67.68             | 78.87           | 78.30             | 81.02           | 78.23             | 81.85
VGG-9 [23]     | CIFAR-10 [24] | 93.71          | 89.77       | 91.36     | 86.94             | 87.24           | 87.63             | 88.58           | 88.35             | 88.79
Table 4. Number of split groups for CONVNET BNN model on MNIST dataset.
Layer | Input Count per Output | Array Size 128 | Array Size 256 | Array Size 512
1     | 3 × 3 × 1              | -              | -              | -
2     | 3 × 3 × 32             | 3              | 2              | 1
3     | 3 × 3 × 32             | 3              | 2              | 1
4     | 3 × 3 × 32             | 3              | 2              | 1
5     | 1568                   | 14             | 7              | 4
6     | 512                    | -              | -              | -
Table 5. Number of split groups for RESNET-18 BNN model on CIFAR-10 dataset.
Layer | Input Count per Output | Array Size 128 | Array Size 256 | Array Size 512
1     | 3 × 3 × 3              | -              | -              | -
2→7   | 3 × 3 × 16             | 2              | 1              | 1
8→13  | 3 × 3 × 32             | 3              | 2              | 1
14→19 | 3 × 3 × 64             | 6              | 3              | 2
20    | 64                     | -              | -              | -
Table 6. Number of split groups for VGG-9 BNN model on CIFAR-10 dataset [26].
Layer | Input Count per Output | Array Size 128 | Array Size 256 | Array Size 512
1     | 3 × 3 × 3              | -              | -              | -
2     | 3 × 3 × 128            | 9              | 6              | 3
3     | 3 × 3 × 128            | 9              | 6              | 3
4     | 3 × 3 × 256            | 18             | 9              | 6
5     | 3 × 3 × 256            | 18             | 9              | 6
6     | 3 × 3 × 512            | 36             | 18             | 9
7     | 8192                   | 64             | 32             | 16
8     | 1024                   | 8              | 4              | 2
9     | 1024                   | -              | -              | -
Table 7. Software to hardware conversion of BnBinAct().
Software Implementation                          | Hardware Implementation
BnBinAct(X) = 1 if X ≥ X_th; 0 if X < X_th       | Output(Y) = 1 if Y ≥ X_th × I_M; 0 if Y < X_th × I_M, where Y = I_BL − I_BLB
Table 8. Number of instances where both i_cell_bl and i_cell_blb are flipped at the FS corner and 85 °C. Here, the term “Flipped” indicates that the number of flips of i_cell_bl or i_cell_blb matches the total number of MC simulations. Consequently, these V_WL and V_BL configurations are discarded to ensure the reliable operation of SRAM-based CIM and the accuracy of BNNs.
V_WL \ V_BL | 0.1     | 0.2     | 0.3     | 0.4
0.4         | 0       | 0       | 0       | 0
0.5         | 0       | 0       | 0       | 0
0.6         | 138     | 0       | 0       | 0
0.7         | Flipped | 311     | 0       | 0
0.8         | Flipped | Flipped | 514     | 0
0.9         | Flipped | Flipped | Flipped | 807
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
