TA-Quatro: Soft Error-Resilient and Power-Efficient SRAM Cell for ADC-Less Binary Weight and Ternary Activation In-Memory Computing

Nguyen, Thanh-Dat; Le, Minh-Son; Pham, Thi-Nhan; Chang, Ik-Joon

doi:10.3390/electronics13152904

Department of Electronic Engineering, Kyung Hee University, Yongin-si 17104, Republic of Korea

Electronics 2024, 13(15), 2904; https://doi.org/10.3390/electronics13152904

Submission received: 6 June 2024 / Revised: 18 July 2024 / Accepted: 19 July 2024 / Published: 23 July 2024

(This article belongs to the Special Issue Analog Circuits and Analog Computing)

Abstract

Some applications, such as satellites, require ultralow power and high radiation resilience. We developed a 12T soft error-resilient SRAM cell, TA-Quatro, to deliver in-memory computing (IMC) for those applications. Based on our TA-Quatro cell, we implemented an IMC circuit to support binary weights and ternary activations in a single SRAM cell. Our simulation under 28 nm FD-SOI technology demonstrates that the TA-Quatro IMC circuit maintains good IMC stability at a scaled supply of 0.7 V and achieves ternary activation without needing analog-to-digital converters. These advancements significantly enhance the power efficiency of the proposed IMC circuit compared to state-of-the-art works.

Keywords:

in-memory computing; ternary activation and binary weight network; deep neural network (DNN); radiation-hardened SRAM

1. Introduction

A CMOS image sensor (CIS) used in satellites requires extreme power efficiency because it operates on battery power. We can improve the power efficiency of the CIS by employing an always-on-block (AOB) unit, which is a tiny processing unit with low power dissipation. The AOB unit continues to operate, even in standby mode. When the AOB unit detects a targeted object, it wakes the main processing unit of the CIS. Such a scheme is known to enhance power efficiency significantly. Exploiting deep neural networks (DNNs) in the AOB unit can improve the detection accuracy, but doing so is challenging because processing DNNs dissipates considerable power. We can overcome the challenge by operating tiny DNNs with binary weights on an SRAM-based analog in-memory computing (IMC) circuit: the so-called analog SRAM IMC (ASI) circuit. Many state-of-the-art (SOTA) works [1,2,3,4] have shown that such a system delivers very high power efficiency in processing DNNs. One may be concerned about the ASI circuit’s drawback: process, voltage, and temperature (PVT) variations degrade its computing accuracy. In the AOB unit of the CIS, we can compensate for errors due to analog computing to a certain degree by repeating the object detection several times.

However, our target application is an AOB unit for a CIS on satellites operating in severe radiation environments. Under this circumstance, radiation-induced soft errors significantly threaten the reliability of the SRAM [5,6]. Conventionally, soft errors in the SRAM are mitigated using error correction codes (ECCs). However, applying conventional ECC techniques to ASI circuits is difficult, since they do not individually read words stored in the SRAM. When we operate ASIs in severe radiation environments, soft errors accumulate in the SRAM. As a result, the accuracy of the ASI circuit can be degraded.

We can use scrubbing techniques [7,8] to handle the problem. However, frequent scrubbing incurs considerable performance and power penalties. Repeated processing, as mentioned above, may not alleviate the issue, since the effect of soft errors persists across the repeated runs.

We address the problem by proposing a 12T soft error-resilient SRAM cell named TA-Quatro. This SRAM cell configuration provides cell-level ternary activation, thus efficiently delivering IMC operations of binary weight networks (BWNs) with ternary activation. Several works have shown that BWNs with ternary activation achieve good classification accuracy despite their extremely low precision [1,9]. DNNs with such a low precision achieve ultralow power processing, which is critical for our target applications.

Our further contributions can be summarized as follows:

We develop a TA-Quatro IMC circuit. Recently, TAIM [1] implemented a cell-level ternary activation IMC using a 6T SRAM. Although the cell area of our TA-Quatro is larger than the 6T SRAM used in TAIM, our TA-Quatro IMC circuit delivers several advantages over TAIM in addition to the soft error resilience, which are discussed in this work.
We aggressively scale down the supply voltage (VDD) of our TA-Quatro IMC circuit to 0.7 V, thus significantly enhancing the power efficiency. In the SRAM-based IMC architecture, where multiple wordlines are simultaneously activated, cell-to-cell interference causes data flipping, thus making it challenging to lower the VDD. In low-voltage operations, the variability of analog computing is also problematic. Our TA-Quatro IMC circuit manages these problems efficiently.
Our TA-Quatro IMC circuit achieves ternary activation outputs without analog-to-digital converters (ADCs). The ADC-less ternary activation output can be obtained due to the differential-end computing architecture of our TA-Quatro IMC circuit.

The remaining part of this paper is organized as follows. The motivation of our work and SOTA works are mentioned in Section 2. Section 3 describes the proposed TA-Quatro architecture. The power efficiency of our circuit is evaluated in Section 4. Then, we conclude our work in Section 5.

2. Motivation and Overview

2.1. Soft Error-Resilient SRAM Cells

Many soft error-resilient SRAM cells have been devised [10,11,12,13]. The 12T DICE [13] and 10T Quatro [11], as shown in Figure 1, are the most widely known ones. Some works have proven their good soft error resilience through actual tests. These two SRAM cells have dual-interlocked cell structures, where each memory node becomes the gate input of N-FETs and P-FETs that drive different memory nodes. Hence, a single-event upset (SEU) [5,6,14] does not directly cause cell data flipping. For example, consider the 10T Quatro shown in Figure 1a in the initial state (A = D = 1, B = C = 0), and suppose that a radiation particle strikes node “A”, temporarily reducing its potential and changing A from 1 to 0. This may weaken the pull down of N2 and N4. However, the voltages at nodes “B” and “C” remain largely unaffected (still 0), as this disturbance does not impact P1 and P4. Consequently, the voltage at “A” returns to its supply level after the radiation particle strike. As a result, the original cell data are preserved.

In scaled technologies, the cell layout of the SRAM should be the thin-cell type to qualify for more aggressive design rules, known as push rules [14,15]. In the thin-cell-type layout, we must draw poly and active layers only in horizontal and vertical directions. The dual-interlocked structures of 12T DICE and 10T Quatro require complex wiring of the cell transistors, thus making it difficult to implement their thin-cell-type layouts efficiently. An efficient thin-cell-type layout of 12T DICE has yet to be reported. The authors of [10] developed the thin-cell-type layout of a 10T Quatro and a 12T we-Quatro. They presented the 12T we-Quatro because the 10T Quatro suffers from weak writability. The writing of the 6T SRAM is triggered by pulling down a memory node. However, in the 10T Quatro, the pull up of an N-FET access transistor leads to the writing, thus causing weak writability [10]. The 12T we-Quatro overcame this by using four access transistors. Interestingly, the thin-cell-type layout of the 12T we-Quatro is the same as that of the 10T Quatro.

We present TA (Ternary Activation)-Quatro, a variation of the 12T we-Quatro (Figure 1b). The 12T we-Quatro has a single wordline (WL) and bitline (BL), whereas our TA-Quatro has two WLs and four BLs, since we split the WL and BL of the 12T we-Quatro. This split incurs cell area overhead to a certain degree, but it provides cell-level ternary activation inputs in the IMC circuit. Many works have shown that BWNs with ternary activation achieve higher accuracy than binary neural networks with binary weight and activation. One may implement ternary activation inputs by using the techniques that support multibit activations, such as serial coding or pulse width modulation [16,17,18]. However, such techniques incur considerable power and performance penalties. The cell-level ternary activation allows us to build the BWN IMC circuit with ternary activations efficiently.

2.2. Ternary Activation IMC Based on SRAM Cells

Recently, several techniques to support a cell-level ternary activation IMC have been introduced, including a 12T XNOR-SRAM cell [2] and a 6T-based TAIM [1], as illustrated in Figure 2a,b, respectively. The XNOR-SRAM cell incurs a considerable cell area penalty, thus limiting its practical applications. Furthermore, during IMC operations, this cell consumes power even when the input is ‘0’, thus resulting in large power dissipation.

TAIM achieves cell-level ternary activation by slightly modifying the conventional 6T SRAM. Unlike the conventional 6T SRAM, the TAIM SRAM cell has two WLs, WLL and WLR, as shown in Figure 2b. The scheme supports the ternary activation with ‘−1’ (WLL is high and WLR is low), ‘0’ (WLL is low and WLR is low), and ‘1’ (WLL is low and WLR is high). In the BWN, we have two weights: ‘−1’ (Q = ‘0’) and ‘1’ (Q = ‘1’). The product of the weight and activation becomes the current that flows through the two access transistors. The analog accumulation of the currents becomes the bitwise multiplication and accumulation (MAC) output.

TAIM delivers very high area efficiency. However, the 6T SRAM cell experiences a high soft error rate in a radiation-harsh environment such as space, which is our target environment, as mentioned above. Furthermore, cell-to-cell interference is problematic under the process variations of scaled technologies, thereby potentially causing cell data flipping. To address this, TAIM uses a scaled voltage for WLL and WLR while maintaining the cell VDD high. Under this circumstance, the analog MAC outputs are significantly affected by process variations due to the low gate voltage of the access transistors.

Note that our target is to develop an IMC technique for the AOB, which remains on even in standby mode. In such an operating scenario, the leakage power of the SRAM IMC circuit is a critical problem. As mentioned above, it is difficult to scale down the supply voltage in the TAIM configuration due to cell-to-cell interference, thus resulting in significant leakage current. Our TA-Quatro IMC circuit addresses these problems efficiently, as discussed in Section 4.2.2.

3. Our TA-Quatro IMC Operation

Figure 3a describes our TA-Quatro cell configuration with the BL switching schematic of Figure 3b. The circuit of Figure 3b is shared by all cells in a column, so its area overhead is insignificant. The thin-cell-type layout of our TA-Quatro under the logic rules of a 28 nm FD-SOI technology is shown in Figure 3d. Figure 3c defines the mapping between the stored weight and the memory node status in the proposed cell, which is referred to throughout this work. Table 1 summarizes the biasing conditions of our design during three operations: write, read, and IMC.

3.1. Write and Read Operations

Under the biasing of Table 1, we can explain the writing mechanism as follows. Firstly, let us consider the case that the data initially stored in our TA-Quatro cell are ‘+1’, where A = D = 0 and B = C = 1. To write ‘−1’, where A = D = 1 and B = C = 0, two split wordlines (WL1 and WL2) are activated, thus turning on N5, N6, N7, and N8. In the BL switching circuit (Figure 3b), only SW5 and SW6 are ON; BL1 and BL4 are applied to the VDD. Meanwhile, BL2 and BL3 are grounded. The BL1 and BL2 biasing pull up A while pulling down C. These pull down and pull up operations turn on P1, N2, and N3 and turn off N2, N3, and P1, thus leading to B = 0 and D = 1.

Table 1 also shows the biasing of the read operation, where only WL1 is activated. In the BL switching circuit (Figure 3b), SW1, SW3, SW5, and SW6 are OFF. Meanwhile, SW2 and SW4 are ON; then, the stored value in the SRAM cell is sensed at the end of BL_LEFT and BL_RIGHT by using a current sense amplifier (CSA) [19].

3.2. In-Memory Computing Operation

Figure 4 illustrates our ADC-less TA-Quatro architecture for a ternary activation IMC, where the TA-Quatro array directly computes the MAC operations in an analog fashion. The table in Figure 4 summarizes how our TA-Quatro cell executes the analog multiplication between the weight and activation. The input neuron value of ‘−1’, ‘0’, or ‘+1’ corresponds to the logic level of the two split wordlines (WL1 and WL2) as (0, 1), (0, 0), or (1, 0), respectively. Meanwhile, the binary weight value ‘−1’ is represented by the state of the four nodes A, B, C, and D as A = D = VDD and B = C = GND, and vice versa for the corresponding ‘+1’. Then, the multiplication result is expressed by ‘$i_{mult}$’ as the difference of two bitline currents (i.e., $i_{mult} = iBL_{LEFT} - iBL_{RIGHT}$). For instance, when the weight value is ‘+1’ (A = D = GND and B = C = VDD) and the input neuron value is ‘+1’ (i.e., WL1 and WL2 are (1, 0), respectively), BL_LEFT is pulled down, and BL_RIGHT is pulled up. Therefore, the multiplication result is represented by $i_{mult} = I_{PD} - I_{PU}$. When the input is 0, WL1 and WL2 are both inactive, so there is no current through the bitlines, thus representing a multiplication result of 0. Other cases of input and weight values are expressed in the table of Figure 4. At the array level, when all the WLs are activated at the same time, the ‘$iBL_{LEFT}$’ and ‘$iBL_{RIGHT}$’ of each cell on the column are accumulated through the column-shared BL_LEFT and BL_RIGHT, respectively. Under this circumstance, the MAC output is represented by the difference of the accumulated currents, which are denoted as ‘$I_{LEFT}$’ and ‘$I_{RIGHT}$’ in Figure 4. Such a differential-end IMC operation delivers ADC-less ternary activation outputs, as further discussed.
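The encoding described above can be summarized in a few lines of Python. This is only an idealized sketch: it uses the wordline mapping stated in the text, treats every active cell as contributing one signed unit to the differential current, and ignores the actual $I_{PD}$ and $I_{PU}$ magnitudes. The names `WL_ENCODING`, `cell_product`, and `column_mac` are introduced here for illustration.

```python
# Ternary input -> (WL1, WL2) logic levels, as stated in the text.
WL_ENCODING = {-1: (0, 1), 0: (0, 0), +1: (1, 0)}


def cell_product(weight: int, activation: int) -> int:
    """Ideal signed product of a binary weight and a ternary activation.

    A result of 0 corresponds to both wordlines being inactive, so the
    cell draws no bitline current in this simplified model.
    """
    if weight not in (-1, +1):
        raise ValueError("weight must be -1 or +1")
    if activation not in WL_ENCODING:
        raise ValueError("activation must be -1, 0, or +1")
    return weight * activation


def column_mac(weights, activations):
    """Accumulate per-cell products along one column (no variation modeled)."""
    return sum(cell_product(w, a) for w, a in zip(weights, activations))


if __name__ == "__main__":
    # Four cells on one column: products are +1, -1, 0, +1.
    print(column_mac([+1, -1, +1, -1], [+1, +1, 0, -1]))
```

In the real array the sum is formed by the difference of the accumulated bitline currents rather than by integer arithmetic, so this model only captures the intended sign and magnitude of the result.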

In the algorithm of the BWN with ternary activation, the MAC outputs are fed to their corresponding batch normalization (BN) layers. Many works have shown that removing BN layers results in a considerable accuracy drop in the DNNs with low-bit precision, such as the BWN [20,21,22]. Hence, we consider the scenario in which the MAC outputs computed in the analog domain are applied to their corresponding BN layers. Please note that the ternary activations are the outputs of an activation function with positive and negative thresholds. Consequently, even in the ternary activation scenario, the MAC outputs, the inputs of the BN layers, should have multibit precision. This is the reason why IMC circuits with ternary activations need ADCs to produce multibit digital outputs.

Some works have shown that the BN layers can be implemented as in-array analog computations [20,23]. Under the analog in-array BN scenario, we can remove ADCs by merging the effects of BN and the thresholds of a ternary activation function. In Figure 4, the merging is implemented as in-array analog computations with “Thresholds”. We use our TA-Quatro SRAM cells to store the information related to the “Thresholds”, as referred to in [20]. Such an effort enables us to obtain our ADC-less ternary activations.
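To illustrate why BN and the activation thresholds can be merged, the sketch below folds a BN layer into two thresholds applied directly to the raw MAC value. It assumes the common affine form of BN, y = gamma * (x - mean) / std + beta with gamma > 0, and a symmetric ternary activation with thresholds at -t and +t. Neither the BN form nor the parameter names come from this work, and the way the thresholds are stored in the array follows [20], which is not reproduced here.

```python
def merged_thresholds(gamma, beta, mean, std, t=0.5):
    """Map activation thresholds (-t, +t) on the BN output back to the MAC domain.

    Assumes y = gamma * (x - mean) / std + beta with gamma > 0 and std > 0,
    so the ordering of the two thresholds is preserved.
    """
    if gamma <= 0 or std <= 0:
        raise ValueError("this sketch assumes gamma > 0 and std > 0")
    low = mean + std * (-t - beta) / gamma
    high = mean + std * (t - beta) / gamma
    return low, high


def ternary_from_mac(x, low, high):
    """Ternary activation applied directly to the raw MAC value."""
    if x < low:
        return -1
    if x > high:
        return 1
    return 0


if __name__ == "__main__":
    low, high = merged_thresholds(gamma=2.0, beta=1.0, mean=10.0, std=4.0, t=1.0)
    print(low, high)
    print([ternary_from_mac(x, low, high) for x in (5, 8, 12)])
```

Once the two thresholds are expressed in the MAC domain, only two comparisons are needed per output, which is why a multibit digital MAC value is no longer required.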

Unlike our TA-Quatro IMC circuit, TAIM [1] has ADCs. We conclude that it is difficult to eliminate the ADCs in TAIM. We need a negative threshold to deliver the ternary activation with ‘−1’, ‘0’, and ‘1’, as mentioned above. However, TAIM has single-end IMC operations, since the two BLs of the 6T SRAM are electrically connected to accumulate the currents flowing through them. Supporting the negative threshold as an in-array analog computation is complicated in such a single-end IMC circuit. The other reason the ADC is necessary for analog IMC circuits is to generate partial sums. We can operate large networks on IMC circuits by splitting their MAC computations to fit the memory array size used in the IMC circuits. In such a situation, the split MAC results computed by the analog IMC circuits, called partial sums in TAIM, should be accumulated. The split MAC results should be converted to multibit digital signals to execute such an accumulation in the digital domain, which requires ADCs. However, a software technique known as knowledge distillation (KD) addresses this issue, as shown in Figure 5. The KD technique generates a student neural network whose MAC sizes fit the memory array, with nearly the same accuracy as the teacher network with large MAC sizes, which has already been demonstrated in many works [24,25]. Such a scenario is assumed in this work. So, we do not need ADCs to generate partial sums.

To conclude, our TA-Quatro circuit provides ADC-less ternary activations, since we have a differential-end IMC architecture. It is not easy to efficiently implement ADC-less ternary activation with a single-end IMC architecture, such as TAIM. Instead of ADCs, we use current sense amplifiers (CSAs) [19]. The CSAs produce a binary output of ‘0’ or ‘1’, while the output of the binary weight and ternary activation IMC is ternary, as shown in Figure 4. Therefore, in our TA-Quatro, we need two CSA operations, one in each of two sensing cycles. During the first cycle, the “Threshold Controller” activates the wordlines (WLs) associated with “Threshold 1”. Subsequently, during the second cycle, it activates the wordlines corresponding to “Threshold 2”. These operations yield 2-bit binary outputs of ‘00’, ‘01’, or ‘11’, thereby representing ‘−1’, ‘0’, or ‘1’, respectively. These values serve as the input activation for the next layer. The power efficiency of using two-cycle sensing instead of the ADC is discussed in Section 4.2.3.
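The two-cycle sensing can be pictured as two successive one-bit comparisons. The sketch below is a behavioral model only: it assumes that the first bit comes from the cycle that compares against the upper threshold and the second bit from the cycle that compares against the lower one, which is one assignment consistent with the ‘00’, ‘01’, ‘11’ codes in the text but is not stated explicitly there. The function and variable names are hypothetical.

```python
# Output codes and their ternary meaning, as given in the text.
CODE_TO_ACTIVATION = {"00": -1, "01": 0, "11": 1}


def two_cycle_sense(mac, upper, lower):
    """Return the 2-bit code produced by two one-bit comparisons.

    Assumption: cycle 1 compares against `upper`, cycle 2 against `lower`.
    Each comparison stands in for one CSA decision.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    bit_cycle1 = 1 if mac > upper else 0
    bit_cycle2 = 1 if mac > lower else 0
    return f"{bit_cycle1}{bit_cycle2}"


if __name__ == "__main__":
    for mac in (-5, 0, 5):
        code = two_cycle_sense(mac, upper=3, lower=-3)
        print(mac, code, CODE_TO_ACTIVATION[code])
```

Because the lower threshold never exceeds the upper one, the code ‘10’ cannot occur, which matches the three codes listed above.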

4. Evaluation

To thoroughly evaluate the circuit, we undertook a comprehensive post-layout simulation process. This began with creating the circuit layout from the schematic using the 28 nm FD-SOI technology PDK (Process Design Kit) on the Virtuoso software version 6.17. Once the layout was complete, we extracted the netlist, which served as the foundation for our simulations. Utilizing HSPICE simulation tools, we performed detailed simulations to analyze the performance of the circuit. The results of these simulations, including key performance metrics, are presented in this section. This approach ensured a robust evaluation of the operation and reliability of the circuit.

4.1. The Effects of Cell-to-Cell Interference in Our TA-Quatro IMC Circuit

As mentioned in Section 2.2, cell-to-cell interference in SRAM-based IMCs that employ multiple-WL activation can cause cell data flipping under the process variations of scaled technologies. We compared the resilience to cell-to-cell interference between TAIM and our TA-Quatro IMC. We executed 1000 Monte Carlo simulations of the two designs in their worst scenarios, as shown in Figure 6. In TAIM, the worst scenario is that all WLLs are activated, and the bottom cell has ‘+1’, while the others on the same column store ‘−1’. Then, the BL is quickly discharged to 0 V, thus pulling down ‘Q’ of the bottom cell. The worst scenario of our TA-Quatro IMC is similar to that of TAIM, as shown in Figure 6b. We assumed 128 cells per column, 100 °C, and the worst process corner (FS) concerning cell-to-cell interference for the simulations. We considered various pairs of cell supply voltage and WL voltage, which are expressed as (VDD, VWL).

The simulation results show that, at low supply voltages, TA-Quatro had far fewer flipped cells than the TAIM design, as depicted in Figure 7. For instance, TA-Quatro had no errors when the VDD was 0.7 V and the VWL was 0.5 V, whereas there were 946/1000 flipped cells in the TAIM design under the same simulation conditions. Lowering the VWL reduces cell-to-cell interference, thus allowing more aggressive scaling of the VDD. However, at a scaled VDD, parametric variations significantly affect the DNN accuracy, and the soft error resilience also worsens. In the measurement results of previous work [26], the 12T we-Quatro design demonstrated strong resilience to soft errors, even at a low voltage of 0.5 V in a 28 nm FD-SOI technology. Our TA-Quatro has a similar cell structure to the 12T we-Quatro, thus implying that our TA-Quatro has good soft error resilience at 0.7 V. Therefore, we consider (0.7 V, 0.5 V) to be the optimal condition of our TA-Quatro IMC. Further, our study shows that the (0.7 V, 0.5 V) pair of our TA-Quatro IMC enables almost comparable process variation tolerance to the (1 V, 0.45 V) pair of TAIM, as discussed in Section 4.2.

4.2. Power Efficiency of the Proposed TA-Quatro

4.2.1. The Optimization of Supply Voltage

The dependence of the BL current variability on the VDD is plotted in Figure 8, which shows that the variability of the proposed TA-Quatro was smaller than that of TAIM, even when the VDD of TAIM was 1 V and the VDD of TA-Quatro was 0.7 V. Specifically, as shown in Figure 8d,e, to ensure no data flipping, we investigated the BL current variability of TA-Quatro at the (VDD, VWL) pair of (0.7 V, 0.5 V) and that of TAIM at (1 V, 0.45 V). The variability ($\sigma/\mu$) in the BL current of TA-Quatro was 0.24. On the other hand, the BL current variability of TAIM was 0.28. This shows that our design delivers better variation resilience, even at the lower cell VDD, hence delivering very high power efficiency in processing the IMC (e.g., 3.78 μW in the TT corner, which is 16.2% less than TAIM operating at a VDD of 1 V).
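The variability metric quoted above is the ratio of the standard deviation to the mean of the sampled BL currents. A minimal way to compute it from Monte Carlo samples is sketched below; the sample values in the usage example are made up for illustration and are not simulation data from this work.

```python
import statistics


def current_variability(samples):
    """Return sigma/mu for a list of sampled bitline currents.

    Uses the sample standard deviation (n - 1 in the denominator).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mu = statistics.fmean(samples)
    if mu == 0:
        raise ValueError("mean current is zero; sigma/mu is undefined")
    sigma = statistics.stdev(samples)
    return sigma / mu


if __name__ == "__main__":
    # Hypothetical currents in arbitrary units.
    print(current_variability([8.0, 10.0, 12.0]))
```

A smaller ratio means the per-cell current is more predictable, which is what allows the accumulated column current to be compared against fixed thresholds.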

4.2.2. Leakage Power Consumption

As mentioned above, our target is to develop an IMC technique for the AOB, which remains on even in standby mode. Therefore, leakage power consumption significantly affects the power efficiency of the IMC circuit. Please note that lowering the supply voltage (VDD) in TAIM requires reducing the VWL (wordline voltage) to minimize cell-to-cell interference, as we mentioned in Section 4.1. However, reducing the VWL affects the DNN accuracy, as process variations significantly impact the analog MAC outputs due to the low gate voltage of the access transistors, as discussed in Section 4.2. Consequently, we compared the leakage power consumption of TAIM and TA-Quatro at their optimal VDD levels for IMC operation: 1 V for TAIM and 0.7 V for TA-Quatro. We investigated the leakage power consumption of TAIM and the proposed TA-Quatro at various process corners. Figure 9 shows that our TA-Quatro at VDD = 0.7 V had lower leakage power consumption than TAIM at VDD = 1 V (18.9% lower than TAIM in the TT process corner). The leakage current in transistors is exponentially dependent on the threshold voltage ($V_t$) [14]. A higher $V_t$ reduces the leakage current in transistors when they are in the off state. Since transistors in the slow–slow (SS) corner have higher $V_t$ values compared to other corners, such as the typical (TT) or fast (FF) corners, the SS corner yields the lowest leakage power. Even in the SS corner, the TA-Quatro at VDD = 0.7 V demonstrated lower leakage power consumption than TAIM at VDD = 1 V, showing a 14.3% improvement over TAIM. TA-Quatro exhibited lower leakage power consumption than TAIM, despite having more transistors, because it operates at a lower supply voltage.
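As a reminder of where this exponential dependence comes from, the usual first-order subthreshold model can be written as follows. It is a textbook approximation, not an equation taken from this work; $I_0$, the slope factor $n$, and the thermal voltage $V_T = kT/q$ are symbols introduced here for illustration.

$$I_{leak} \approx I_0 \exp\left(\frac{V_{GS} - V_t}{n V_T}\right)$$

For an off transistor ($V_{GS} = 0$), the leakage scales as $\exp(-V_t / (n V_T))$, so a higher $V_t$, as in the SS corner, lowers it.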

4.2.3. Power Consumption Comparison

In this subsection, we compare the power efficiency of our TA-Quatro IMC and TAIM [1]. We considered three scenarios: TA-Quatro IMC with ADCs, ADC-less TA-Quatro IMC with two-cycle sensing, and TAIM technique with ADCs. The array size used in the three scenarios was 256 × 128. As discussed in Section 4.2, we considered the (VDD, VWL) pair of (0.7 V, 0.5 V) as the optimal condition of our TA-Quatro IMC, and that of TAIM was (1 V, 0.45 V).

According to Table 2, our TA-Quatro with ADCs obtained a power efficiency of 1298.6 TOPS/W, which is higher than TAIM (1244.7 TOPS/W). When our TA-Quatro IMC used two-cycle sensing instead of ADCs, even though the operating time was more than twice that of other designs using ADCs, the power efficiency was significantly enhanced (41.02% higher than TAIM and 35.17% higher than TA-Quatro IMC with ADCs).
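The percentages above follow directly from the TOPS/W figures quoted in this work (1244.7 for TAIM, 1298.6 for TA-Quatro with ADCs, and 1755.3 for the ADC-less TA-Quatro). The short check below reproduces them; the helper name `relative_gain_percent` is introduced here for illustration.

```python
def relative_gain_percent(new, base):
    """Percentage by which `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0


# Power efficiency in TOPS/W, as quoted in the text.
TAIM = 1244.7
TA_QUATRO_ADC = 1298.6
TA_QUATRO_ADCLESS = 1755.3

if __name__ == "__main__":
    print(round(relative_gain_percent(TA_QUATRO_ADCLESS, TAIM), 2))
    print(round(relative_gain_percent(TA_QUATRO_ADCLESS, TA_QUATRO_ADC), 2))
    print(round(relative_gain_percent(TA_QUATRO_ADC, TAIM), 2))
```

The last line shows that moving from TAIM to TA-Quatro with ADCs alone accounts for only a few percent, so most of the gain comes from removing the ADCs.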

The results of Table 2 were simulated in a 28 nm FD-SOI technology, where only the cell array and ADC power were considered. We derived the ADC power model from the power breakdown reported in other works [1].

4.3. Comparison with Recent Works on Ternary Activation and Binary Weight for In-Memory SRAM Computing and Discussion

The effectiveness of the TA-Quatro IMC circuit and the impact of the process variations were evaluated using the variation-aware inference technique from [27]. The procedure for evaluating the accuracies of the TA-Quatro IMC Circuit on the MNIST and CIFAR-10 datasets is systematically detailed in Figure 10. The evaluation process comprises the following steps:

Step 1: HSPICE simulation of the TA-Quatro SRAM cell: Firstly, we perform a circuit simulation of the TA-Quatro SRAM cell, incorporating process variation analysis through 1000 Monte Carlo simulations. These simulations utilize statistical models derived from 28 nm FD-SOI technology.
Step 2: Obtaining statistical parameters from Step 1 results: The simulation results indicate that the currents of the TA-Quatro SRAM cells (‘$iBL_{LEFT}$’ and ‘$iBL_{RIGHT}$’) follow normal distributions, as shown in Figure 8d. The mean ($\mu$) and standard deviation ($\sigma$) values of these distributions are extracted for further steps.
Step 3: Input parameters from Step 2 for the variation-aware BTN framework: The obtained mean and standard deviation values are used as input parameters in a variation-aware inference framework for a binary weights and ternary activations neural network (BTN), as outlined in the forward propagation process in Algorithm 1 of the technique presented in [27]. Notably, we employ a ternary activation function instead of the binary activation function employed in [27]. Within this BTN framework, deep neural networks (DNNs) such as CONVNET and VGG-9 are mapped onto TA-Quatro SRAM-based IMC arrays, which are configured in a 256 × 128 array size.
Step 4: Conduct variation-aware inferences: This step involves conducting variation-aware inferences to determine the Top-1 accuracy of the models. The inference process is repeated 100 times to enhance the reliability of the evaluation process. Each time, the Top-1 accuracy metric is evaluated across 50,000 validation images on VGG-9 with the CIFAR-10 dataset and 10,000 validation images on CONVNET with the MNIST dataset.
Step 5: Averaging the results: The Top-1 accuracy results from the 100 inference runs are averaged to obtain a final accuracy value, thus ensuring a reliable and comprehensive evaluation.

Figure 10. Evaluation procedure for the accuracy of the TA-Quatro IMC circuit on MNIST and CIFAR-10 datasets.

The methodologies employed for mapping DNNs onto TA-Quatro SRAM-based IMC arrays and the detailed implementation of the variation-aware BTN framework have been elaborated in the referenced techniques [27]. This structured approach thoroughly and accurately evaluates our TA-Quatro IMC 256 × 128 array architecture on the MNIST and CIFAR-10 datasets.
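The core idea of Steps 2 to 5 can be sketched for a single column as follows. This is a simplified stand-in, not Algorithm 1 of [27]: it assumes every active cell contributes a current drawn from a normal distribution with the extracted mean and standard deviation, signed by the ideal weight-activation product, and it averages the result over repeated runs. The normalized mean of 1.0 and all function names are assumptions made for illustration; the ratio of 0.24 is the TA-Quatro variability reported in Section 4.2.1.

```python
import random


def noisy_column_mac(weights, activations, mu, sigma, rng):
    """Accumulate per-cell currents drawn from N(mu, sigma).

    Cells whose product is 0 contribute nothing, mirroring inactive wordlines.
    """
    total = 0.0
    for w, a in zip(weights, activations):
        product = w * a
        if product != 0:
            total += product * rng.gauss(mu, sigma)
    return total


def averaged_mac(weights, activations, mu, sigma, runs=100, seed=0):
    """Repeat the noisy MAC `runs` times and return the mean result."""
    rng = random.Random(seed)
    results = [
        noisy_column_mac(weights, activations, mu, sigma, rng)
        for _ in range(runs)
    ]
    return sum(results) / runs


if __name__ == "__main__":
    weights = [+1, -1, +1, +1]
    activations = [+1, +1, -1, 0]
    # Normalized mean current of 1.0 with sigma/mu = 0.24.
    print(averaged_mac(weights, activations, mu=1.0, sigma=0.24))
```

In the full evaluation the noisy MAC values are thresholded into ternary activations and propagated through the whole network before the Top-1 accuracy is computed, which this single-column sketch does not attempt.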

Table 3 summarizes the comparison with recent works on binary weight and ternary activation IMC architectures based on SRAM arrays. This table shows that our TA-Quatro IMC circuit achieved classification accuracies of 99.42% and 88.5% on the MNIST and CIFAR-10 datasets, respectively. These accuracies are equivalent to those achieved by the TAIM [1] and XNOR SRAM [2] designs. Both of the works in [1,2] need analog-to-digital converters (ADCs) and high supply voltage (VDD) to keep the classification accuracy acceptable, which costs more energy consumption and complexity in hardware design. In addition, as mentioned in Section 4.1, our TA-Quatro has strong resilience to soft errors, even at a low voltage of 0.7 V in a 28 nm FD-SOI technology. Our design, using conventional sense amplifiers instead of ADCs and working at a low supply voltage (0.7 V), showed a much better energy efficiency (1755.3 TOPS/W) with acceptable classification accuracies compared to other works.

However, as shown in Table 3, our design requires a 12T SRAM cell configuration, which remains disadvantageous in terms of area for deployment in large-scale applications. This will be a driving force for our improvements in the future. In addition, our work is based on simulation results. To bolster the validity of our work and ensure a comprehensive evaluation, it is necessary to develop and test an actual chip in future research. Creating a physical chip will enable a thorough assessment of the system under real-world conditions, thus addressing factors that simulations may not fully capture, such as manufacturing variations and environmental influences. A critical consideration for our future chip design is to ensure that our proposed TA-Quatro can operate effectively with a low-voltage supply. It is essential to maintain a stable voltage because digital switching can produce spikes that may disrupt normal operation. To address this concern, we plan to implement proven techniques, such as those outlined in [28,29,30], to stabilize the voltage and enhance system reliability. This approach will ensure that our design performs well in simulations and maintains reliability and efficiency in practical applications.

5. Conclusions

In this paper, we have presented a soft error-resilient and power-efficient 12T-SRAM cell, named TA-Quatro, to deliver in-memory computing (IMC) for ultralow power and high-radiation-resilience applications, such as satellites. Based on the proposed TA-Quatro configuration, we implemented an IMC circuit to support binary weights and ternary activations, thus significantly improving the DNN accuracy. Our TA-Quatro circuit shows good IMC stability at the scaled supply of 0.7 V. Furthermore, the proposed TA-Quatro IMC design achieves ternary computing outputs without analog-to-digital converters, thus further improving the power efficiency compared to other SOTA works related to cell-level ternary activation. Using 28 nm FD-SOI technology, our TA-Quatro-based IMC circuit delivered a power efficiency of 1755.3 TOPS/W, which is better than other state-of-the-art works.

Author Contributions

Conceptualization, T.-D.N. and I.-J.C.; methodology, T.-D.N. and I.-J.C.; software, T.-D.N.; validation, T.-D.N. and I.-J.C.; formal analysis, T.-D.N.; investigation T.-D.N. and M.-S.L.; writing—original draft preparation, T.-D.N., M.-S.L., T.-N.P. and I.-J.C.; writing—review and editing, T.-D.N., T.-N.P. and I.-J.C.; visualization, T.-D.N.; supervision, I.-J.C.; project administration, I.-J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under RS-2021-II210106 and RS-2020-II201294.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CIS	CMOS image sensor
AOB	Always-on-block
IMC	In-memory computing
DNNs	Deep neural networks
TA	Ternary activation
BWNs	Binary weight networks
ADCs	Analog-to-digital converters
CSA	Current sense amplifier

References

1. Kang, N.; Kim, H.; Oh, H.; Kim, J.-J. TAIM: Ternary Activation In-Memory Computing Hardware with 6T SRAM Array. In Proceedings of the 59th ACM/IEEE DAC Conference, San Francisco, CA, USA, 10–14 July 2022; pp. 1081–1086.
2. Yin, S.; Jiang, Z.; Seo, J.-S.; Seok, M. XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks. IEEE J. Solid-State Circuits 2020, 6, 1733–1743.
3. Song, S.; Kim, Y. Novel In-Memory Computing Adder Using 8+T SRAM. Electronics 2022, 11, 929.
4. Lee, S.; Kim, Y. Charge-Domain Static Random Access Memory-Based In-Memory Computing with Low-Cost Multiply-and-Accumulate Operation and Energy-Efficient 7-Bit Hybrid Analog-to-Digital Converter. Electronics 2024, 13, 666.
5. Chen, R.; Chen, L.; Han, J.; Wang, X.; Liang, Y.; Ma, Y.; Shangguan, S. Comparative Study on the “Soft Errors” Induced by Single-Event Effect and Space Electrostatic Discharge. Electronics 2021, 10, 802.
6. Marques, C.M.; Wrobel, F.; Aguiar, Y.Q.; Michez, A.; Saigné, F.; Boch, J.; Dilillo, L.; Alía, R.G. Evaluation of a Simplified Modeling Approach for SEE Cross-Section Prediction: A Case Study of SEU on 6T SRAM Cells. Electronics 2024, 13, 1954.
7. Saleh, A.M.; Serrano, J.J.; Patel, J.H. Reliability of scrubbing recovery-techniques for memory systems. IEEE Trans. Reliab. 1990, 39, 114–122.
8. Giordano, R.; Barbieri, D.; Perrella, S.; Catalano, R. Custom Scrubbing for Robust Configuration Hardening in Xilinx FPGAs. Instruments 2019, 3, 56.
9. Alnatsheh, N.; Kim, Y.; Cho, J.; Choi, K.K. A Novel 8T XNOR-SRAM: Computing-in-Memory Design for Binary/Ternary Deep Neural Networks. Electronics 2023, 12, 877.
10. Dang, L.D.T.; Kim, J.S.; Chang, I.J. We-Quatro: Radiation-Hardened SRAM Cell with Parametric Process Variation Tolerance. IEEE Trans. Nucl. Sci. 2017, 9, 2489–2496.
11. Jahinuzzaman, S.M.; Rennie, D.J.; Sachdev, M. A Soft Error Tolerant 10T SRAM Bit-Cell with Differential Read Capability. IEEE Trans. Nucl. Sci. 2009, 56, 3768–3773.
12. Yao, R.; Lv, H.; Zhang, Y.; Chen, X.; Zhang, Y.; Liu, X.; Bai, G. A High-Reliability 12T SRAM Radiation-Hardened Cell for Aerospace Applications. Micromachines 2023, 14, 1305.
13. Jahinuzzaman, S.M.; Rennie, D.J.; Sachdev, M. Upset Hardened Memory Design for Submicron CMOS Technology. IEEE Trans. Nucl. Sci. 1996, 43, 2874–2878.
14. Weste, N.; Harris, D. CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed.; Pearson Education: London, UK, 2011.
15. Ottati, F.; Turvani, G.; Masera, G.; Vacca, M. Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories. Electronics 2021, 10, 2291.
16. Wang, J.; Wang, X.; Ecker, C.; Subramaniyan, A.; Das, R.; Blaauw, D.; Sylvester, D. A 28-nm Compute SRAM with Bit-Serial Logic/Arithmetic Operations for Programmable In Memory Vector Computing. IEEE J. Solid-State Circuits 2020, 55, 76–86.
17. Kang, M.; Gonugondla, S.K.; Patil, A.; Shanbhag, N.R. A Multi-Functional In-Memory Inference Processor Using a Standard 6T SRAM Array. IEEE J. Solid-State Circuits 2018, 53, 642–655.
18. Jhang, C.-J.; Xue, C.-X.; Hung, J.-M.; Chang, F.-C.; Chang, M.-F. Challenges and Trends of SRAM-Based Computing-In-Memory for AI Edge Devices. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 1773–1786.
19. Nguyen, T.-D.; Le, M.-S.; Pham, T.-N.; Chang, I.-J. TRIO: A Novel 10T Ternary SRAM Cell for Area-Efficient In-memory Computing of Ternary Neural Networks. In Proceedings of the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hangzhou, China, 11–13 June 2023.
20. Kim, H.; Kim, Y.; Kim, J.-J. In memory batch-normalization for resistive memory based binary neural network hardware. In Proceedings of the 24th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 21–24 January 2019; pp. 645–650.
21. Sari, E.; Belbahri, M.; Nia, V.P. How Does Batch Normalization Help Binary Training? arXiv 2020, arXiv:1909.09139.
22. Qin, H.; Gong, R.; Liu, X.; Bai, X.; Song, J.; Sebe, N. Binary neural networks: A survey. Pattern Recognit. 2020, 105, 107281.
23. Oh, H.; Kim, H.; Ahn, D.; Park, J.; Kim, Y.; Lee, I.; Kim, J.-J. Energy-Efficient In-Memory Binary Neural Network Accelerator Design Based on 8T2C SRAM Cell. IEEE Solid-State Circuits Lett. 2022, 5, 70–73.
24. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
25. Beyer, L.; Zhai, X.; Markeeva, L.; Anil, R.; Kolesnikov, A. Knowledge distillation: A good teacher is patient and consistent. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 10915–10924.
26. Dang, L.D.T.; Linh, T.D.; Dat, N.T.; Min, C.; Kim, J.; Chang, I.-J.; Han, J.-w. Comparing Variation-tolerance and SEU/TID-Resilience of Three SRAM Cells in 28nm FD-SOI Technology: 6T, Quatro, and we-Quatro. In Proceedings of the IEEE International Reliability Physics Symposium, Dallas, TX, USA, 28 April–30 May 2020.
27. Le, M.-S.; Pham, T.-N.; Nguyen, T.-D.; Chang, I.-J. PR-CIM: A Variation-Aware Binary-Neural-Network Framework for Process-Resilient Computation-in-memory. arXiv 2021, arXiv:2110.09962.
28. Kok, C.L.; Kok, C.L. Designing a Twin Frequency Control DC-DC Buck Converter Using Accurate Load Current Sensing Technique. Electronics 2024, 13, 45.
29. Kok, C.L.; Tang, H.; Teo, T.H.; Koh, Y.Y. A DC-DC Converter with Switched-Capacitor Delay Deadtime Controller and Enhanced Unbalanced-Input Pair Zero-Current Detector to Boost Power Efficiency. Electronics 2024, 13, 1237.
30. Teo, B.C.T.; Lim, W.C.; Venkadasamy, N.; Lim, X.Y.; Kok, C.L.; Siek, L. A CMOS Rectifier with a Wide Dynamic Range Using Switchable Self-Bias Polarity for a Radio Frequency Harvester. Electronics 2024, 13, 1953.

Figure 1. The schematics of (a) 10T SRAM Quatro, (b) we-Quatro, (c) 12T SRAM DICE.

Figure 2. (a) The schematic of XNOR-SRAM cell [2] and (b) TAIM cell [1].

Figure 3. (a) The proposed TA-Quatro configuration, (b) the bitline switching schematic, (c) the relation between the stored weight value and the equivalent memory node status in the TA-Quatro cell, and (d) the thin-cell-type layout of the TA-Quatro cell in 28 nm FD-SOI technology.

Figure 4. Mapping binary weight and ternary activation of IMC to the TA-Quatro array architecture.

Figure 5. Schematic of the knowledge distillation setup. The teacher network is a network with large MAC sizes, and the student network is the network fitted to our IMC circuit.

Figure 6. Read disturb problem in SRAM IMC: (a) TAIM [1] and (b) TA-Quatro.

Figure 7. Number of flipped cells versus supply voltage and wordline voltage in 1000 Monte Carlo (MC) simulations of (a) TAIM and (b) our TA-Quatro.

Figure 8. (a–c) Variability of bitline currents and power consumption of the considered TA-Quatro and TAIM versus supply voltage in the SS, TT, and FF corners, respectively. (d,e) Normal distribution of $i_{BL}$ − $i_{BLB}$ in TA-Quatro and TAIM.

Figure 9. Leakage power for various process corners.

Table 1. The biasing condition of our TA-Quatro cell.

Operation	WL1	WL2	BL1	BL2	BL3	BL4	SW1	SW2	SW3	SW4	SW5	SW6
Write “+1”	VDD	VDD	GND	VDD	VDD	GND	OFF	OFF	OFF	OFF	ON	ON
Write “−1”	VDD	VDD	VDD	GND	GND	VDD	OFF	OFF	OFF	OFF	ON	ON
Read	VDD	GND	PRE	PRE	PRE	PRE	OFF	ON	OFF	ON	OFF	OFF
IMC	GND/VDDL	GND/VDDL	PRE	PRE	PRE	PRE	ON	ON	ON	ON	OFF	OFF
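
The biasing conditions in Table 1 can also be read as a lookup table. The following is a minimal Python sketch that encodes the table for quick inspection; the signal names and values are copied from Table 1, whereas the dictionary layout, the operation keys, and the helper names `bias_for` and `switches_on` are illustrative assumptions and not part of the proposed design.

```python
# Table 1 encoded as a lookup table.
# Signal names and values come from Table 1; the data layout, the operation
# keys, and the helper names below are hypothetical, chosen for illustration.
SIGNALS = ["WL1", "WL2", "BL1", "BL2", "BL3", "BL4",
           "SW1", "SW2", "SW3", "SW4", "SW5", "SW6"]

TABLE_1 = {
    "write+1": ["VDD", "VDD", "GND", "VDD", "VDD", "GND",
                "OFF", "OFF", "OFF", "OFF", "ON", "ON"],
    "write-1": ["VDD", "VDD", "VDD", "GND", "GND", "VDD",
                "OFF", "OFF", "OFF", "OFF", "ON", "ON"],
    "read":    ["VDD", "GND", "PRE", "PRE", "PRE", "PRE",
                "OFF", "ON", "OFF", "ON", "OFF", "OFF"],
    "imc":     ["GND/VDDL", "GND/VDDL", "PRE", "PRE", "PRE", "PRE",
                "ON", "ON", "ON", "ON", "OFF", "OFF"],
}


def bias_for(operation):
    """Return a {signal: value} mapping for one row of Table 1."""
    return dict(zip(SIGNALS, TABLE_1[operation]))


def switches_on(operation):
    """List the switches marked ON for the given operation."""
    return [name for name, value in bias_for(operation).items() if value == "ON"]


if __name__ == "__main__":
    for op in TABLE_1:
        print(op, "-> switches ON:", switches_on(op))
```

Read this way, the table shows that the two write rows differ only in the bitline polarities, and that SW5 and SW6 are ON only during writes, whereas SW1 to SW4 are all ON only in IMC mode.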

Table 2. Performance comparison of various designs implemented in 28 nm FD-SOI technology.

	TAIM with ADCs	TA-Quatro with ADCs	TA-Quatro with Two-Cycle CSA
Supply voltage	1.0 V	0.7 V	0.7 V
Input/weight precision	Ternary/Binary	Ternary/Binary	Ternary/Binary
Array size	256 × 128	256 × 128	256 × 128
Cell area (μm²)	0.3	0.785	0.785
Array leakage power consumption (μW)	12.15	9.83	9.83
TOPS/W	1244.7 *	1298.6 *	1755.3 *

* The values were derived from simulations conducted under 28 nm FD-SOI technology to ensure a fair comparison.
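
The relative gains implied by Table 2 follow from simple ratios. The short Python sketch below computes them from the tabulated values only; the dictionary keys and the helper name `ratio` are hypothetical names introduced for illustration.

```python
# Ratios derived from the values listed in Table 2.
# The key names and the helper `ratio` are hypothetical, for illustration only.
TABLE_2 = {
    "TAIM with ADCs": {
        "cell_area_um2": 0.3, "leakage_uw": 12.15, "tops_per_w": 1244.7},
    "TA-Quatro with ADCs": {
        "cell_area_um2": 0.785, "leakage_uw": 9.83, "tops_per_w": 1298.6},
    "TA-Quatro with two-cycle CSA": {
        "cell_area_um2": 0.785, "leakage_uw": 9.83, "tops_per_w": 1755.3},
}


def ratio(metric, design, baseline):
    """Return the value of `metric` for `design` divided by that of `baseline`."""
    return TABLE_2[design][metric] / TABLE_2[baseline][metric]


if __name__ == "__main__":
    csa = "TA-Quatro with two-cycle CSA"
    for baseline in ("TAIM with ADCs", "TA-Quatro with ADCs"):
        print(f"TOPS/W vs {baseline}: {ratio('tops_per_w', csa, baseline):.2f}x")
    print(f"Leakage vs TAIM: {ratio('leakage_uw', csa, 'TAIM with ADCs'):.2f}x")
    print(f"Cell area vs TAIM: {ratio('cell_area_um2', csa, 'TAIM with ADCs'):.2f}x")
```

With these numbers, the two-cycle CSA variant reaches about 1.41× the TOPS/W of TAIM with ADCs and about 1.35× that of TA-Quatro with ADCs, with roughly 19% lower array leakage than TAIM at about 2.6× the cell area.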

Table 3. Comparison with recent works on ternary activation and binary weight for in-memory SRAM computing.

	XNOR-SRAM [2]	TAIM [1]	This Work
Technology	65 nm CMOS	28 nm CMOS	28 nm FD-SOI
Number of cell transistors	12	6	12
Supply voltage	1.0 V	1.0 V	0.7 V
Column sensing	ADCs	ADCs	Sense Amplifiers
Input/weight precision	Ternary/Binary	Ternary/Binary	Ternary/Binary
Consume power when input is ‘0’	Yes	No	No
TOPS/W	403	1087	1755.3
MNIST accuracy	98.84%	98.24%	98.42%
CIFAR-10 accuracy	88.78%	NA	88.5%
SEU-resilient	Low	Low	High

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
