Article

A 1T2C FeCAP-Based In-Situ Bitwise X(N)OR Logic Operation with Two-Step Write-Back Circuit for Accelerating Compute-In-Memory

1 Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China
2 School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
3 School of Microelectronics, University of Science and Technology of China, Hefei 230026, China
4 Zhejiang Lab, Hangzhou 311121, China
* Author to whom correspondence should be addressed.
Micromachines 2021, 12(4), 385; https://doi.org/10.3390/mi12040385
Submission received: 19 March 2021 / Revised: 27 March 2021 / Accepted: 29 March 2021 / Published: 1 April 2021
(This article belongs to the Section E:Engineering and Technology)

Abstract

Ferroelectric capacitors (FeCAPs) with high process compatibility, high reliability, ultra-low programming current and fast operation speed are promising candidates to replace traditional volatile and nonvolatile memory. In addition, they have great potential in the fields of storage, computing, and memory logic. Nevertheless, effective methods to realize logic and memory in FeCAP devices are still lacking. This study proposes a 1T2C FeCAP-based in situ bitwise X(N)OR logic based on a charge-sharing function. First, using the 1T2C structure and a two-step write-back circuit, nondestructive reading is realized with less complexity than in previous work. Second, a two-line activation method is used during the X(N)OR operation. The verification results show that the speed, area and power consumption of the proposed 1T2C FeCAP-based bitwise logic operations are significantly improved.

1. Introduction

The von Neumann architecture, in which memory and computing are completely separated, is widely used in computer systems. However, memory access is much slower than processor execution [1,2], which creates a “memory wall” that limits the computer’s overall performance. Recently, processing-in-memory (PIM) architectures have been proposed because of their potential to overcome the “memory wall” problem [3,4,5]. PIM stores operands in a memory array and computes on them in memory, reducing the power consumed by memory access and data handling. Therefore, PIM has great potential in graph computing, speech processing, in-memory databases, and real-time analysis [6]. Based on the type of memory, mainstream PIM research falls into two categories: volatile memory-based PIM and nonvolatile memory-based PIM [7,8]. Characteristics of static random-access memory (SRAM), such as low storage density, high energy consumption, and the latency caused by the serial row-by-row access mode, make SRAM-based PIM unsuitable for large and complex computing tasks [9,10]. Dynamic random-access memory (DRAM)-based PIM has higher array efficiency, but DRAM reading is a destructive operation [11,12]. Moreover, the data stored in volatile memory are lost once the power is turned off. Traditional nonvolatile memory (flash) has high storage density and low cost and can achieve high-precision operations in volume production. These advantages make flash very suitable for PIM [13]. However, flash is programmed only in blocks, and hence its performance under advanced technology is poor [14,15]. Specifically, SRAM memory cells that support in-memory X(N)OR operations usually adopt a 6T or 8T structure, which leads to poor memory efficiency [16]. DRAM requires regular refresh operations to maintain data, which leads to high power consumption in DRAM-based in-memory X(N)OR logic.
To deal with these issues, emerging nonvolatile memories with high read and write speed, high density, low power consumption, and easy scaling have attracted more attention in recent years [17,18,19]. Among these, resistive random-access memory (RRAM) uses bipolar and unipolar memristors to realize the in-memory X(N)OR logic [20]. However, this method is accompanied by a complex manufacturing process, expensive preparation cost and extra peripheral circuits. Moreover, RRAM and magnetoresistive random-access memory (MRAM) usually adopt analog current summation to realize the in-memory X(N)OR logic operations [19,21,22], which leads to higher power consumption. In contrast, FeRAM uses the charge-sharing function of FeCAPs to realize the X(N)OR logic, which reduces power consumption. In addition, the preparation process of FeRAM memory cells (1T1C, 1T2C, or 2T2C) is simple and fully compatible with the logic process. Hence, FeCAPs with high process compatibility, high reliability, ultra-low programming current, and fast operation speed are promising candidates to replace traditional volatile and nonvolatile memories.
Recently, the 1T1C-FeCAP-based PIM structure has been studied [23]. Because reading a FeCAP is destructive, a complex write-back circuit is needed, which increases design complexity, power consumption, and cost. In this study, a 1T2C FeCAP-based in situ bitwise X(N)OR logic operation scheme is proposed. A two-line activation technique is also adopted during the X(N)OR process. Auxiliary circuits, including a sense amplifier (SA) and a control module, are designed to implement bit-by-bit X(N)OR operations on the same bit line (BL).
The 1T2C cell structure can overcome the destructive reading issue of the 1T1C FeCAP cell using a two-step write-back circuit. During the writing phase, the data are written in the two FeCAPs simultaneously. In the sensing phase, the charge of one FeCAP is read out, and the other is responsible for storing data and assisting in data rewriting.

2. Previous Related Studies

X(N)OR is an important logic operation with many applications, but it cannot be realized quickly and efficiently using traditional central processing unit (CPU)-based methods. Implementing X(N)OR effectively and cheaply has therefore become a hot research topic.
The in-memory X(N)OR operation is difficult to realize with a 6T SRAM cell. Hence, SRAM cells with an 8T structure [24] or even a 12T structure [25] have been proposed to realize X(N)OR logic operations in SRAM. The operation principle of an 8T SRAM cell is shown in Figure 1. The 8T cell has two pairs of switch transistors controlled by a word line (WL) and a word-line bar (WLb), respectively. The pair of switch transistors controlled by WL connects Q and Qb to the bitline (BL) and bitline-bar (BLb). The pair controlled by WLb connects Q and Qb to BLb and BL. The data stored in the Q and Qb nodes are the weight of the X(N)OR logic, and the voltages of WL and WLb are its inputs. The output of the X(N)OR logic is the multiplication result of weight and input, as shown in Figure 1. In this way, the X(N)OR operation is realized with an 8T SRAM cell. However, due to the additional transistors and metal routing [24], an 8T SRAM cell is much larger than a conventional 6T SRAM cell and other memory cells.
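The statement that the X(N)OR output equals the product of weight and input can be checked with a small sketch. It assumes the common binary-neural-network encoding in which bit 1 stands for +1 and bit 0 for −1; the function names are hypothetical and are not taken from [24].

```python
def to_signed(bit: int) -> int:
    """Map a stored bit to a signed value (assumed encoding: 1 -> +1, 0 -> -1)."""
    return 1 if bit else -1


def xnor(a: int, b: int) -> int:
    """Bitwise XNOR of two single bits."""
    return 1 - (a ^ b)


def signed_product_as_bit(weight_bit: int, input_bit: int) -> int:
    """Multiply the signed values and map the product back to a bit."""
    product = to_signed(weight_bit) * to_signed(input_bit)
    return 1 if product == 1 else 0


# XNOR and the signed product agree for all four weight/input combinations.
for w in (0, 1):
    for x in (0, 1):
        assert xnor(w, x) == signed_product_as_bit(w, x)
```

Under this encoding the two views are interchangeable, which is why an XNOR cell can act as a one-bit multiplier.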
In the work of Angizi and Fan [26], a DRAM-based in-memory XOR2 circuit was designed. The DRAM memory cell is the basic computing cell. To realize the XOR2 logic, they proposed a new reconfigurable SA, as shown in Figure 2. The XOR2 circuit consists of three inverters, having different switching voltages (Vs), and an AND gate. Then, the SA can distinguish “00”, “01” and “11” states. The Low-Vs (low switching voltage) inverter uses 1/4 VDD as the switching voltage to realize the NOR2 function. Simultaneously, the High-Vs (high switching voltage) inverter uses 3/4 VDD as the switching voltage to realize the NAND2 function. Finally, XOR2 logic can be realized after a CMOS AND gate in a single memory cycle. Moreover, to realize the XOR2 function, two capacitors connected to the same BL are read out simultaneously, and charge-sharing is implemented on the BL to implement logical operations. The main disadvantages of this method are the large area of the XOR2 circuit and the volatility of DRAM cells that need a periodic refresh.
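The threshold-based sensing described above can be illustrated with an idealized sketch. It assumes that the shared bitline settles at 0, VDD/2, or VDD for zero, one, or two stored ones (equal cell capacitances, no bitline parasitics) and that the third inverter turns NOR2 into OR2 before the AND gate; both points are assumptions of this sketch rather than details confirmed by [26], and all names are hypothetical.

```python
VDD = 1.0  # normalized supply voltage (assumption)


def shared_bitline_voltage(a: int, b: int, vdd: float = VDD) -> float:
    """Idealized charge sharing of two cells on one bitline."""
    return (a + b) / 2 * vdd


def inverter(v_in: float, v_switch: float) -> int:
    """Ideal inverter: output is 1 when the input is below the switching voltage."""
    return 1 if v_in < v_switch else 0


def xor2(a: int, b: int, vdd: float = VDD) -> int:
    v_bl = shared_bitline_voltage(a, b, vdd)
    nor2 = inverter(v_bl, 0.25 * vdd)   # Low-Vs inverter: high only for "00"
    nand2 = inverter(v_bl, 0.75 * vdd)  # High-Vs inverter: low only for "11"
    or2 = 1 - nor2                      # assumed role of the third inverter
    return nand2 & or2                  # AND gate


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor2(a, b))
```

The two switching voltages split the three bitline levels into the "00", "01/10", and "11" states, and the AND gate keeps only the middle one.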
Moreover, Sun et al. [20] and our previous study [27] proposed in-memory X(N)OR logic operations based on novel nonvolatile memory (NVM). In the work of Sun et al., the sequential X(N)OR-RRAM architecture was proposed, as shown in Figure 3. The calculating units U1 and U2 represent the weights “−1” and “+1”, respectively. For the input of the X(N)OR logic, the two WLs of a calculating unit are in a complementary state, where (0, 1) represents “−1” and (1, 0) represents “+1”. In this method, the current that flows through each calculating unit during readout depends on the multiplication result of its input and weight. However, RRAM usually adopts analog current summation to realize the in-memory X(N)OR logic, which leads to higher power consumption.
In our previous study, the X(N)OR operation based on the 1T1C FeCAP was proposed, as shown in Figure 4. Six MOS transistors were used in the circuit to achieve the X(N)OR operation. Two rows of data stored on the same BL are simultaneously read out, and then the charge-sharing function is realized. The signal of the transmission gate, which is triggered by the charge-sharing results of the BL voltage, is used to realize the X(N)OR operation.
Nevertheless, because reading the FeCAP is destructive, complex write-back circuits are needed to prevent data loss, which increases the design complexity, cost, and power consumption.
The two representative studies are an X(N)OR logic function based on DRAM that was proposed by Angizi and Fan [26] and an X(N)OR logic function based on the 1T1C FeCAP that was proposed by our previous work [27].

3. Proposed 1T2C FeCAP-Based X(N)OR Logic Operation

We propose a 1T2C FeCAP-based X(N)OR logic operation circuit that has lower power consumption and a smaller area than the work of Angizi and Fan [26]. To reduce the complexity of the write-back circuit, a two-step write-back circuit is designed that fully exploits the advantages of the 1T2C cell.

3.1. Operation of the Proposed 1T2C Cell

In the 1T2C cell, a transistor is used to select the two FeCAPs, as shown in Figure 5a. The write and read timings of the 1T2C cell are shown in Figure 5b,c, respectively.
During writing, the same voltage pulse is applied to the plate lines, PL1 and PL2, to polarize the two FeCAPs to the same state. When a 1T1C FeCAP cell is read, a read pulse is applied to PL, and BL is left floating [28,29]. Then, BL is charged by the polarization charge in the FeCAP. Finally, the BL voltage is compared with the reference voltage using the SA to read out the stored data. For 1T2C, if the FeCAP C1 were read in this way, the polarization state of C2 would be affected by the BL voltage. Hence, during the reading process of 1T2C, the same read pulse is applied to BL and PL2, and PL1 is left floating. Then, PL1 is charged by the polarization charge in the FeCAP C1. Finally, the PL1 voltage is compared with the reference voltage using the SA to read out the stored data.
The second FeCAP of the 1T2C cell has little effect on the area of the memory cell because the cell area is determined by the transistor, which is much larger than the FeCAP, as shown in Figure 6.

3.2. Dual-Row In-Memory X(N)OR Operation

X(N)OR and addition functions are prerequisites for accelerating various applications [30,31]. To realize the X(N)OR operation in the proposed 1T2C FeCAP, a mode-switchable SA circuit is designed, as shown in Figure 7.
The X(N)OR circuit consists of a latch SA, an inverter, and a transmission gate. The working mode of the SA circuit is changed by two transistors, NM3 and NM4. The proposed SA works as a traditional SA when NM4 is turned off and NM3 is turned on. Conversely, it works in the X(N)OR mode when NM3 is turned off and NM4 is turned on.
As shown in Figure 8, the X(N)OR operation is divided into three phases. In the precharging phase, the residual charge on PL1 and node b is released by the two precharge transistors NM5 and NM10. In the charge-sharing phase, the charges in C1 and C3 are flushed out, while the charge in C2 and C4 remains constant. PL1 is then charged by the charge sharing of the two FeCAPs, C1 and C3. In the X(N)OR phase, the on–off states of transistors PM7 and NM6 are determined by the voltage of PL1 (VPL1) to realize X(N)OR.
When the data in C1 and C3 is “00”, PL1 is charged to a lower voltage. Consequently, PM7 is turned on, and NM6 is turned off. Then the voltage of node b is increased to VDD. Finally, the output of the X(N)OR circuit is “0”. When the stored data in C1 and C3 is “01” or “10”, PL1 is charged to a medium voltage. Consequently, PM7 and NM6 are turned off. Then the voltage of node b remains “0”. Finally, the output of the X(N)OR circuit is “1”. When the stored data in C1 and C3 is “11”, PL1 is charged to a higher voltage. Consequently, NM6 is turned on, and PM7 is turned off. Then the voltage of node b is increased to VDD-Vth6, where Vth6 is the threshold voltage of transistor NM6. Finally, the output of the X(N)OR circuit is “0”. Through these operations, the X(N)OR logic operation is realized.
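The three cases can be summarized in a behavioral sketch. The VPL1 levels of about 400, 550, and 700 mV are the simulated values reported in Section 4.2; the two switching thresholds (475 mV and 625 mV, the midpoints between neighboring levels) and all names are assumptions made only for illustration, and the model ignores analog effects such as the VDD-Vth6 level on node b.

```python
# Approximate VPL1 after charge sharing, indexed by the number of stored ones
# in C1 and C3 (values from the simulation in Section 4.2).
VPL1_MV = {0: 400, 1: 550, 2: 700}

PM7_ON_BELOW_MV = 475  # hypothetical threshold: PM7 conducts for a low VPL1
NM6_ON_ABOVE_MV = 625  # hypothetical threshold: NM6 conducts for a high VPL1


def xor_output(c1: int, c3: int) -> int:
    """Behavioral model of the X(N)OR phase for the data stored in C1 and C3."""
    vpl1 = VPL1_MV[c1 + c3]
    pm7_on = vpl1 < PM7_ON_BELOW_MV
    nm6_on = vpl1 > NM6_ON_ABOVE_MV
    # Node b is precharged low and pulled high when either transistor conducts.
    node_b = 1 if (pm7_on or nm6_on) else 0
    # The inverter after node b produces the final output.
    return 1 - node_b


for c1 in (0, 1):
    for c3 in (0, 1):
        print(c1, c3, xor_output(c1, c3))
```

With these assumptions the model gives “0” for “00” and “11” and “1” for “01” and “10”, matching the behavior described above.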

3.3. Two-Step Write-Back Circuit

Since the reading of the FeCAP is destructive, the stored data needs to be rewritten after the read operation [32,33]. For the traditional SA mode, the output of the SA will be fed back to the memory cell. As shown in Figure 9, when the read pulse on BL is pulled down, the rewriting of the data in the FeCAP is completed with the assistance of the latch SA [34].
However, for the X(N)OR mode, two rows of data stored on the same BL (or PL) are simultaneously read out, and the data of the cells cannot be written back in time. To solve this issue, the write-back circuits are designed, as shown in Figure 10.
In our previous study [27], a write-back module for the 1T1C-type FeCAP was designed, as shown in Figure 11. The write-back circuit comprises a register, a judgment module, and a control module.
The write-back process of the 1T1C-type FeCAP is divided into four phases, as shown in Figure 11. In the first phase, the data in C1 is read out by the latch SA and saved in the register. In the second phase, the X(N)OR function is executed. In the third phase, the data initially stored in C2 is obtained through the judgment module, according to the output of X(N)OR and the data in the register. Finally, C1 and C2 are written back. In detail, when the output of X(N)OR is “0”, the data stored in C2 and C1 is written back simultaneously. When the output of X(N)OR is “1”, the data stored in C1 is written back first and then that in C2.
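The role of the judgment module can be sketched as follows. The sketch assumes the output has XOR polarity, as in Section 3.2 (“0” for equal data, “1” for different data), so that the data of C2 follows from the registered C1 value and the output; the function names and the list-of-steps representation are hypothetical.

```python
def recover_c2(c1_registered: int, xor_out: int) -> int:
    """Judgment step: C2 equals C1 when the output is 0, and its inverse otherwise."""
    return c1_registered ^ xor_out


def write_back_plan(c1_registered: int, xor_out: int) -> list:
    """Return the write-back steps as a list of {cell: value} dictionaries."""
    c2 = recover_c2(c1_registered, xor_out)
    if xor_out == 0:
        # Same data in both cells: one simultaneous write-back.
        return [{"C1": c1_registered, "C2": c2}]
    # Different data: C1 is written back first, then C2.
    return [{"C1": c1_registered}, {"C2": c2}]


print(write_back_plan(1, 0))
print(write_back_plan(1, 1))
```

The extra register and judgment logic shown here are what the 1T2C scheme aims to avoid.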
Compared with the 1T1C-type FeCAP, the write-back circuit of the 1T2C-type FeCAP is simpler, as shown in Figure 12a. In the X(N)OR process, data stored in FeCAPs C1 and C3 is sensed. Meanwhile, FeCAPs C2 and C4 keep the original data of C1 and C3.
The write-back process consists of two phases, as shown in Figure 12b. In the first phase, the data in C2 is read out by the latch SA, and the rewriting of the sensed data in C1 and C2 is realized with the assistance of the latch SA. In the second phase, C3 and C4 are written back the same as the write operation discussed in Section 3.1.
The register and judgment module are omitted in the proposed 1T2C-type FeCAP write-back circuit. The two-step write-back circuit can reuse the read-write circuit of the traditional FeCAP without extra circuits, which not only greatly saves area but also reduces the write-back latency.
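The two-step write-back can be pictured with a small state model. It is a simplified sketch: a sensed FeCAP is marked as `None` to represent lost data, and each phase is modeled as copying the retained value of the companion FeCAP back into the cell pair, which abstracts away the SA read and rewrite details. The function names and the dictionary representation are hypothetical.

```python
def sense_for_xor(cells: dict) -> dict:
    """Charge sharing flushes C1 and C3; C2 and C4 keep the original data."""
    after = dict(cells)
    after["C1"] = None
    after["C3"] = None
    return after


def two_step_write_back(cells: dict) -> dict:
    """Restore both cells from the data retained in C2 and C4."""
    restored = dict(cells)
    # Phase 1: C2 is read by the latch SA, and C1 and C2 are rewritten.
    restored["C1"] = restored["C2"]
    # Phase 2: C3 and C4 are written back like a normal write operation.
    restored["C3"] = restored["C4"]
    return restored


original = {"C1": 1, "C2": 1, "C3": 0, "C4": 0}
sensed = sense_for_xor(original)
print(sensed)
print(two_step_write_back(sensed) == original)
```

No register or judgment logic appears in this model because the second FeCAP of each cell already holds the data to be restored.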

4. Verification Results and Discussion

X(N)OR logic circuits and the two-step write-back method were simulated with a 28 nm CMOS logic process. The FeCAP model is fitted to the test data by the Landau-Khalatnikov (L-K) equation and then embedded into the simulation tool.

4.1. Device Fabrication, Performance, and Simulation Model

The FeCAP used in this study is fabricated using the back-end-of-line (BEOL) process and is fully compatible with the logic process. Figure 13a shows the cross-section transmission electron microscope (TEM) image of the fabricated FeCAP. In the BEOL process, 30-nm TiN was deposited as the bottom electrode (BE) by RF reactive sputtering. Then, 20-nm Hf0.5Zr0.5O2 was deposited on the BE by atomic layer deposition (ALD), where HfO2 and ZrO2 in Hf0.5Zr0.5O2 are configured in a stoichiometric ratio of 1:1. Finally, 30-nm TiN was deposited as the top electrode (TE) by RF reactive sputtering, and rapid thermal annealing was carried out. The fabrication process flows are shown in Figure 13b.
Previous studies have shown that HfO2-based films are very thin and have a wide bandgap, as shown in Table 1 [35]. Hence, a Hf0.5Zr0.5O2-based FeCAP maintains good ferroelectricity as the process scales down.
Moreover, multiple read-write operations are performed on the 1T2C-type FeCAP cells to realize the in-memory X(N)OR logic operation. Hence, the endurance of the FeCAP is critical to the function of the X(N)OR circuit. Figure 14 shows the endurance of the FeCAP used in this study. The measurement results show that the FeCAP withstands ~10^7 stress cycles with a 3 V/500 ns pulse at room temperature. In the X(N)OR mode, the FeCAP is biased below 3 V and the read–write pulse width is less than 500 ns, so it could endure many more cycles. Hence, the Hf0.5Zr0.5O2-based FeCAP has sufficient reliability to ensure the efficiency of the X(N)OR logic.
The FeCAP model was built from the L-K equation as proposed by Aziz et al. [36] and optimized based on the measured data. The schematic diagram of the model is shown in Figure 15. A fifth-order polynomial voltage-controlled voltage source (E0) is used to characterize the FeCAP. The polynomial coefficients of E0 are derived from the following L-K equation [36,37]:
$$E - \rho \frac{dP}{dt} = \alpha P + \beta P^{3} + \gamma P^{5}$$
Here, E is the electric field, P is the polarization, α, β, and γ are the static parameters of the FeCAP, and ρ is the kinetic coefficient [37]. Let QP be the polarization charge stored in the FeCAP, and let AFE and TFE be its area and thickness, respectively. Then, the voltage VFE across the FeCAP is:
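$$V_{FE} = \left( \rho \frac{T_{FE}}{A_{FE}} \frac{dQ_P}{dt} \right) + \left( T_{FE} \left\{ \alpha \frac{Q_P}{A_{FE}} + \beta \frac{Q_P^{3}}{A_{FE}^{3}} + \gamma \frac{Q_P^{5}}{A_{FE}^{5}} \right\} \right)$$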
The FeCAP is modeled as a nonlinear capacitor (CLK, simplified to E0) that is connected in series with a resistor (R0 = ρTFE/AFE), which is easy to implement in a circuit simulation tool [36], as shown in Figure 15. Table 2 shows the model parameters of the FeCAP in Figure 15, where C0 is the parasitic parameter of ferroelectric materials. The current flowing in R0 and E0 is captured through the current-controlled current source (F0). Ci is charged by the current of F0. The voltage on E0 is equal to the charge on the FeCAP when Ci is chosen as 1 F.
The measured P-V curve of the FeCAP is shown in Figure 16a. The coercive voltage of the FeCAP is about 1.3 V. Hence, when the voltage of BL and PL is 1.8 V, the FeCAP provides a sufficient margin for the proposed mode-switchable SA circuit. The simulated P-V curve is in good agreement with the test results, as shown in Figure 16b.
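The series decomposition into R0 and E0 can be made explicit with a short derivation. It assumes a uniform field and polarization across the film, so that E = VFE/TFE and P = QP/AFE; these substitutions, the current symbol i, and the label V_{E0} for the static polynomial term are notational assumptions of this sketch.

```latex
\begin{align}
E &= \rho \frac{dP}{dt} + \alpha P + \beta P^{3} + \gamma P^{5},
  \qquad E = \frac{V_{FE}}{T_{FE}}, \quad P = \frac{Q_P}{A_{FE}} \\
\frac{V_{FE}}{T_{FE}} &= \frac{\rho}{A_{FE}} \frac{dQ_P}{dt}
  + \alpha \frac{Q_P}{A_{FE}} + \beta \frac{Q_P^{3}}{A_{FE}^{3}} + \gamma \frac{Q_P^{5}}{A_{FE}^{5}} \\
V_{FE} &= \underbrace{\frac{\rho T_{FE}}{A_{FE}}}_{R_0} \, i
  + \underbrace{T_{FE} \left( \alpha \frac{Q_P}{A_{FE}} + \beta \frac{Q_P^{3}}{A_{FE}^{3}} + \gamma \frac{Q_P^{5}}{A_{FE}^{5}} \right)}_{V_{E0}(Q_P)},
  \qquad i = \frac{dQ_P}{dt}
\end{align}
```

The first term is the voltage drop across the resistor R0, and the second is the fifth-order polynomial in QP implemented by the controlled source E0.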

4.2. X(N)OR Logic Operation Simulation Results

The X(N)OR logic operation is simulated and verified using a 28 nm CMOS process. The enable transistors (NM3 and NM4), the precharge transistors (NM5 and NM10), and the inverter (NM9 and PM8) in the X(N)OR circuit operate at the core voltage of 0.9 V. The transmission gate (NM6 and PM7) and the selector transistors (NM1 and NM2) operate at the IO voltage of 1.8 V. The VDD in Figure 7 was set to 0.9 V during the simulation.
When the data in the two 1T2C cells are “01/10”, the output of X(N)OR remains constant at a high level. When the data in the two memory cells are “00”, the output of X(N)OR is pulled down rapidly. When the data in the two memory cells are “11”, the output of X(N)OR is slowly pulled down. Simulation results show that the X(N)OR circuit can work correctly within 100 ns, as shown in Figure 17.
Figure 18 shows the workflow in detail. During simulation, first, the FeCAPs (C1–C4) in the initial state are polarized; that is, data “0” or “1” is written into C1, C2 and C3, C4 in two steps according to the timing in Figure 5b. These two processes are shown in Figure 18a,b. Second, two word-lines (WLn and WLn+1) are activated simultaneously by an optimized row decoder (ORD), which enables the multiple-row activation required for bitwise in-memory X(N)OR operations between operands. Third, the charges in C1 and C3 are flushed out, and the voltage of PL1 (VPL1) is pulled up by the charge-sharing of the FeCAPs. These two processes are shown in Figure 18c. Finally, as shown in Figure 18d, the X(N)OR circuit outputs digital bit “0” or “1” with the help of the SA. As shown in Figure 17, when the data in the two FeCAPs (C1 and C3) are “00”, VPL1 rises to about 400 mV, and PM7 is turned on. The output of X(N)OR is “0”, consistent with the rapid pull-down described above. When the data in the two FeCAPs (C1 and C3) are “01” or “10”, VPL1 rises to about 550 mV, and PM7 and NM6 are turned off. The output of X(N)OR remains constant at “1”. When the data in the two FeCAPs (C1 and C3) are “11”, VPL1 rises to about 700 mV, and NM6 is turned on. The output of X(N)OR is “0”. Table 3 shows the truth table of the X(N)OR circuit.

4.3. Reliability of In-Memory X(N)OR Logic Operation

The reliability of the X(N)OR circuit is verified under different simulation conditions. Figure 19 and Figure 20 show the performance of the X(N)OR logic under different process corners. Figure 19 shows that the minimum margins between neighboring states across different process corners exceed 100 mV, which ensures that the circuit works correctly. Moreover, the reliability of the X(N)OR circuit is also verified using a 5000-sample Monte Carlo simulation at 125 °C and −20 °C, as shown in Figure 20a,b. The results show that the X(N)OR logic works even when process variations exist. All these results show that the proposed 1T2C FeCAP-based in-memory X(N)OR logic is robust.

5. Conclusions

This study proposed a 1T2C FeCAP-based in situ bitwise X(N)OR logic operation scheme. A two-line activation technique was used during the X(N)OR process. Auxiliary circuits, including an SA and a control module, were designed to implement bit-by-bit X(N)OR operations on the same BL. The 1T2C cell structure used in this work can overcome the destructive reading issue of the 1T1C FeCAP cell with a two-step write-back circuit. The circuit was verified in a 28 nm CMOS logic process with the FeCAP model derived from the L–K equation. Table 4 summarizes the key parameters and compares our work with previous studies. The proposed circuit has the advantages of low design complexity, small area, high memory efficiency, and nonvolatility.

Author Contributions

Q.W. conceived the idea and performed the simulation. Q.W., D.Z. and Y.Z. designed the circuit. C.L., Q.H., X.L. and J.Y. took part in the discussion and provided expertise. J.Y. and H.L. supervised the work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China under Grant No. 2019YFB2204800, in part by the Major Scientific Research Project of Zhejiang Lab (Grant No. 2019KC0AD02), in part by the National Natural Science Foundation of China under Grants 61904200, 62025406, and 61834009, and in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDB44000000.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chi, P.; Li, S.C.; Xu, C.; Zhang, T.; Zhao, J.; Liu, Y.P.; Wang, Y.; Xie, Y. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. In Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea, 18–22 June 2016; pp. 27–39. [Google Scholar]
  2. Yang, J.J.; Strukov, D.B.; Stewart, D.R. Memristive Devices for Computing. Nat. Nanotechnol. 2013, 8, 13–24. [Google Scholar] [CrossRef] [PubMed]
  3. Waldrop, M.M. The chips are down for Moore’s law. Nat. News 2016, 530, 145–147. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Horowitz, M. Computing’s energy problem (and what we can do about it). In Proceedings of the 2014 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 9–13 February 2014; Volume 57, pp. 10–14. [Google Scholar]
  5. Xia, Q.; Yang, J.J. Memristive crossbar arrays for brain-inspired computing. Nat. Mater. 2019, 18, 309–323. [Google Scholar] [PubMed]
  6. Indiveri, G.; Liu, S.C. Optimizing Weight Mapping and Data Flow for Convolutional Neural Networks on RRAM Based Processing-In-Memory Architecture. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2015; pp. 1–5. [Google Scholar]
  7. Angizi, S.; He, Z.Z.; Parveen, F.; Fan, D.L. RIMPA: A new reconfigurable dual-mode in-memory processing architecture with spin hall effect-driven domain wall motion device. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Bochum, Germany, 3–5 July 2017; pp. 45–50. [Google Scholar]
  8. Dai, G.; Huang, T.H.; Chi, Y.Z.; Zhao, J.S.; Sun, G.Y.; Liu, Y.P.; Wang, Y.; Xie, Y.; Yang, H.Z. Graphh: A processing-in-memory architecture for largescale graph processing. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2019, 38, 640–653. [Google Scholar] [CrossRef]
  9. Gauchi, R.; Kooli, M.; Vivet, P.; Noel, J.-P.; Beigné, E.; Mitra, S.; Charles, H.-P. Memory Sizing of a Scalable SRAM In-Memory Computing Tile Based Architecture. In Proceedings of the International Conference on Very Large Scale Integration (VLSI-SoC), Cuzco, Peru, 6–9 October 2019; pp. 166–171. [Google Scholar]
  10. Zhu, Q.L.; Akin, B.; Sumbul, H.E.; Sadi, F.; Hoe, J.C.; Pileggi, L.; Franchetti, F. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing. In Proceedings of the IEEE International 3D Systems Integration Conference (3DIC), San Francisco, CA, USA, 2–4 October 2013; pp. 1–7. [Google Scholar]
  11. Ma, Y.; Zheng, L.F.; Zhou, P.Q. CoDRAM: A Novel Near Memory Computing Framework with Computational DRAM. In Proceedings of the IEEE International Conference on ASIC (ASICON), Chongqing, China, 29 October–1 November 2019; pp. 1–4. [Google Scholar]
  12. Yang, Z.; Wei, L. Logic Circuit and Memory Design for In-Memory Computing Applications using Bipolar RRAMs. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–5. [Google Scholar]
  13. Hsu, P.K.; Du, P.Y.; Lo, C.R.; Lue, H.T.; Chen, W.C.; Hsu, T.H.; Yeh, T.H.; Hsieh, C.C.; Wei, M.L.; Wang, K.C.; et al. An Approach of 3D NAND Flash Based Nonvolatile Computing-In-Memory (nvCIM) Accelerator for Deep Neural Networks (DNNs) with Calibration and Read Disturb Analysis. In Proceedings of the IEEE International Memory Workshop (IMW), Dresden, Germany, 17–20 May 2020; pp. 1–4. [Google Scholar]
  14. Marotta, G.G.; Macerola, A.; D’Alessandro, A.; Torsi, A.; Cerafogli, C.; Lattaro, C.; Musilli, C.; Rivers, D.; Sirizotti, E.; Paolini, F.; et al. A 3bit/cell 32Gb NAND flash memory at 34nm with 6MB/s program throughput and with dynamic 2b/cell blocks configuration mode for a program throughput increase up to 13MB/s. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 7–11 February 2010; pp. 444–445. [Google Scholar]
  15. Naso, G.; Botticchio, L.; Castelli, M.; Cerafogli, C.; Cichocki, M.; Conenna, P.; D’Alessandro, A.; Santis, L.D.; Cicco, D.D.; Francesco, W.D.; et al. A 128Gb 3b/cell NAND flash design using 20 nm planar-cell technology. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 17–21 February 2013; pp. 218–219. [Google Scholar]
  16. Kim, H.; Oh, H.; Kim, J.J. Energy-efficient XNOR-free In-Memory BNN Accelerator with Input Distribution Regularization. In Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Diego, CA, USA, 2–5 November 2020; pp. 1–9. [Google Scholar]
  17. Vetter, J.S.; Mittal, S. Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing. Comput. Sci. Eng. 2015, 17, 73–82. [Google Scholar] [CrossRef]
  18. Wang, K.; Zhang, H.; Zhao, W.S. Spintronic Memories: From Memory to Computing-in-Memory. In Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), Qingdao, China, 17–19 July 2019; pp. 1–2. [Google Scholar]
  19. Lebdeh, M.A.; Abunahla, H.; Mohammad, B.; Al-Qutayri, M. An Efficient Heterogeneous Memristive xnor for In-Memory Computing. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64, 2427–2437. [Google Scholar] [CrossRef]
  20. Sun, X.; Yin, S.; Peng, X.; Liu, R.; Seo, J.; Yu, S. XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 19–23 March 2018; pp. 1423–1428. [Google Scholar]
  21. Natsui, M.; Chiba, T.; Hanyu, T. Design of an energy-efficient XNOR gate based on MTJ-based nonvolatile logic-in-memory architecture for binary neural network hardware. Jpn. J. Appl. Phys. 2019, 58, SBBB01.
  22. Chang, L.; Ma, X.; Wang, Z.; Zhang, Y.; Xie, Y.; Zhao, W. PXNOR-BNN: In/With Spin-Orbit Torque MRAM Preset-XNOR Operation-Based Binary Neural Networks. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2019, 27, 2668–2679.
  23. Slesazeck, S.; Ravsher, T.; Havel, V.; Breyer, E.T.; Mulaosmanovic, H.; Mikolajick, T. A 2TnC ferroelectric memory gain cell suitable for compute-in-memory and neuromorphic application. In Proceedings of the IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 7–11 December 2019; pp. 1–38.
  24. Liu, R.; Peng, X.; Sun, X.; Khwa, W.S.; Si, X.; Chen, J.J.; Li, J.F.; Chang, M.F.; Yu, S. Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks. In Proceedings of the IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 24–28 June 2018; pp. 24–28.
  25. Yin, S.; Jiang, Z.; Seo, J.S.; Seok, M. XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks. IEEE J. Solid State Circuits 2020, 55, 1733–1743.
  26. Angizi, S.; Fan, D. ReDRAM: A Reconfigurable Processing-in-DRAM Platform for Accelerating Bulk Bit-Wise Operations. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Westminster, CO, USA, 4–7 November 2019; pp. 1–8.
  27. Wang, Q.; Zhao, Y.; Yang, J.; Liu, C.; Jiang, P.; Ding, Q.; Gong, T.; Luo, Q.; Lv, H.; Liu, M. Non-volatile In Memory Dual-Row X(N)OR Operation with Write Back Circuit Based on 1T1C. In Proceedings of the IEEE International Conference on Solid-State & Integrated Circuit Technology (ICSICT), Kunming, China, 3–6 November 2020; pp. 1–3.
  28. Endoh, T.; Koike, H.; Ikeda, S.; Hanyu, T.; Ohno, H. An Overview of Nonvolatile Emerging Memories—Spintronics for Working Memories. IEEE J. Emerg. Sel. Top. Circuits Syst. 2016, 6, 109–119.
  29. Yamada, J.; Miwa, T.; Koike, H.; Toyoshima, H. A self-reference read scheme for a 1T/1C FeRAM. In Proceedings of the Symposium on VLSI Circuits. Digest of Technical Papers, Honolulu, HI, USA, 11–13 June 1998; pp. 238–241.
  30. Ali, M.F.; Jaiswal, A.; Roy, K. In-Memory Low-Cost Bit-Serial Addition Using Commodity DRAM Technology. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 155–165.
  31. Kimura, H.; Hanyu, T.; Kameyama, M. Multiple-valued logic-in-memory VLSI based on ferroelectric capacitor storage and charge addition. In Proceedings of the IEEE International Symposium on Multiple-Valued Logic (ISMVL), Boston, MA, USA, 15–18 May 2002; pp. 161–166.
  32. Elshamy, M.; Mostafa, H.; Said, M.S. New non-destructive Read/Write circuit for Memristor-based memories. In Proceedings of the International Conference on Engineering and Technology (ICET), Cairo, Egypt, 19–20 April 2014; pp. 1–5.
  33. Mulaosmanovic, H.; Dünkel, S.; Müller, J.; Trentzsch, M.; Beyer, S.; Breyer, E.T.; Mikolajick, T.; Slesazeck, S. Impact of Read Operation on the Performance of HfO2-Based Ferroelectric FETs. IEEE Electron Device Lett. 2020, 41, 1420–1423.
  34. Ogiwara, R.; Tanaka, S.; Itoh, Y.; Miyakawa, T.; Takeuchi, Y.; Doumae, S.M.; Takenaka, H.; Kunishima, I.; Shuto, S.; Hidaka, O.; et al. A 0.5 μm 3 V 1T1C 1 Mb FRAM with a variable reference bit-line voltage scheme using a fatigue-free reference capacitor. IEEE J. Solid State Circuits 2000, 35, 545–551.
  35. Kim, S.J.; Mohan, J.; Summerfelt, S.R. Ferroelectric Hf0.5Zr0.5O2 Thin Films: A Review of Recent Advances. JOM 2019, 71, 246–255.
  36. Aziz, A.; Ghosh, S.; Datta, S.; Gupta, S.K. Physics-Based Circuit-Compatible SPICE Model for Ferroelectric Transistors. IEEE Electron Device Lett. 2016, 37, 805–808.
  37. Milan, P.; Christopher, K.; Hoffmann, M.; Mulaosmanovic, H.; Stefan, M.; Evelyn, T.B.; Schroeder, U.; Kersch, A.; Mikolajick, T.; Slesazeck, S. A computational study of hafnia-based ferroelectric memories: From ab initio via physical modeling to circuit models of ferroelectric device. JCE 2017, 37, 1236–1256.
Figure 1. 8T static random-access memory (SRAM) cell design approaches for in-memory X(N)OR logic operation.
Figure 2. The reconfigurable sense amplifier is used to implement logic operations. Reprinted with permission from ref. [26]. Copyright © 2019 IEEE.
Figure 3. The sequential bit-cell design for X(N)OR implementation. Reprinted with permission from ref. [20]. Copyright © 2018 IEEE.
Figure 4. The X(N)OR circuit is based on the 1T1C-type ferroelectric capacitor (FeCAP) [27].
Figure 5. (a) A 2 × 1 array of the 1T2C cell. (b) Write timing. (c) Read timing.
Figure 6. Layouts of the 1T1C and 1T2C FeCAP arrays.
Figure 7. Schematic of the proposed mode switchable sense amplifier (SA).
Figure 8. Timing of X(N)OR.
Figure 9. Schematic and timing of the latch SA.
Figure 10. Schematic of the write-back module.
Figure 11. (a) The timing of latch data and X(N)OR. (b) The timing of write-back when the output of X(N)OR is “0”. (c) The timing of write-back when the output of X(N)OR is “1”.
Figure 12. (a) The two-step write-back circuit of the 1T2C-type FeCAP. (b) Timing of the two-step write-back circuit.
Figure 13. (a) Cross-sectional TEM images of a 20 nm thick HfO2 ferroelectric capacitor. (b) Fabrication process flow of a Hf0.5Zr0.5O2 ferroelectric capacitor.
Figure 14. The endurance of the FeCAP used in this work.
Figure 15. Ferroelectric capacitance model.
Figure 16. (a) Measured and (b) simulated P-V curves.
Figure 17. The simulation results of X(N)OR.
Figure 18. Dual-row activation to realize X(N)OR. (a) The writing status of C1 and C2. (b) The writing status of C3 and C4. (c) Precharging and charge-sharing process. (d) X(N)OR readout process.
Figure 19. Minimum margin of the neighboring states and maximum latency time under different process corners in the X(N)OR mode.
Figure 20. Monte Carlo simulation results of X(N)OR at (a) 125 °C and (b) −20 °C.
Table 1. Comparison of material properties and scalability between HfO2, PZT (lead zirconate titanate), and BTO (barium titanate).
| Characteristics | PZT | BTO | HfO2 |
|---|---|---|---|
| Thickness (nm) | >70 | >25 | 5–40 |
| Bandgap (eV) | 3–4 | ~3.1 | 5.3–5.6 |
| Dielectric constant | ~1300 | 150–250 | ~30 |
| CMOS compatibility | Pb and O2 diffusion | Bi and O2 diffusion | Stable |
| Remnant polarization (2Pr) (μC/cm²) | 20–40 | <10 | 1–40 |
Table 2. Model parameters of the FeCAP model.
Model parameterα (m/F)β (m5/F/C2)γ (m9/F/C4)R0 (Ω)C0 (pF)
Value−6.25 × 1094.88 × 10271.43 × 1047625288
Table 3. The truth table of the in-memory X(N)OR circuit.
| C1 | C3 | OUT |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
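As a quick sanity check on Table 3, the following minimal Python sketch reproduces the truth table in software. It models only the logical behavior (OUT as the XOR of the bits stored in C1 and C3, with XNOR as its complement), not the charge-sharing circuit itself. The function names `xor_bit` and `xnor_bit` are hypothetical and chosen for illustration only.

```python
def xor_bit(c1: int, c3: int) -> int:
    """Logical XOR of two stored bits (the OUT column of Table 3)."""
    return c1 ^ c3


def xnor_bit(c1: int, c3: int) -> int:
    """XNOR is the complement of XOR."""
    return 1 - xor_bit(c1, c3)


# Input combinations in the same row order as Table 3: (C1, C3).
INPUTS = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Each row: (C1, C3, XOR output, XNOR output).
truth_table = [(c1, c3, xor_bit(c1, c3), xnor_bit(c1, c3)) for c1, c3 in INPUTS]

if __name__ == "__main__":
    print("C1 C3 XOR XNOR")
    for c1, c3, x, xn in truth_table:
        print(f" {c1}  {c3}   {x}    {xn}")
```

The XOR column matches the OUT column of Table 3 row by row, and the XNOR column is its bitwise complement.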
Table 4. Performance comparison of several in-memory X(N)OR logic implementation plans.
| Architecture | Nonvolatile | Memory Cell | Technology | X(N)OR Area |
|---|---|---|---|---|
| SRAM-based [16] | No | 6T | 28 nm | SA and Ref |
| DRAM-based [26] | No | 1T1C | 45 nm | 10T |
| RRAM-based [20] | Yes | 1T1R | 65 nm | CSA and Ref |
| MRAM-based [21,22] | Yes | 2T1MTJ/1MTJ | 28 nm/40 nm | 12T/15T |
| 1T1C FeCAP-based [27] | Yes | 1T1C | 28 nm | 5T |
| 1T2C FeCAP-based | Yes | 1T2C | 28 nm | 5T |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wang, Q.; Zhang, D.; Zhao, Y.; Liu, C.; Hu, Q.; Liu, X.; Yang, J.; Lv, H. A 1T2C FeCAP-Based In-Situ Bitwise X(N)OR Logic Operation with Two-Step Write-Back Circuit for Accelerating Compute-In-Memory. Micromachines 2021, 12, 385. https://doi.org/10.3390/mi12040385