1. Introduction
Ferroelectric random-access memory (FeRAM) based on Hf0.5Zr0.5O2 film has attracted great attention because of its potential advantages of fast programming speed [1,2], low operating power [3,4], and good CMOS compatibility [5,6]. Hf0.5Zr0.5O2-based FeRAM is therefore widely considered a promising candidate for next-generation nonvolatile memory. In principle, Hf0.5Zr0.5O2-based FeRAM benefits from its small film thickness and its compatibility with advanced CMOS process nodes [7]. It can also meet the requirements of today’s integrated circuits for high-density storage applications. However, the scaling of FeRAM capacitors is still limited compared to that of transistors, leading to low area efficiency. For instance, the ferroelectric capacitor (FeCAP) area was 40 × 10³ nm² for the 28 nm node in Stefan et al.’s work [8], and the FeRAM area was 0.49 μm² for the 130 nm node in Zhao et al.’s work [9]. For these reasons, multi-level cell (MLC) FeRAM for high-density storage applications has been continuously explored in recent studies. For instance, K. Asari et al. used a three-pulse accessing scheme to achieve multi-level technology for FeRAM-embedded reconfigurable hardware [10]. Kai Ni et al. demonstrated one type of MLC non-volatile memory by fabricating three ferroelectric-insulator layer-based structures [11]. However, some problems still need to be solved before practical application, such as the small operation margin of MLC FeRAM and the large input offset of the readout sense amplifier (SA). These issues usually lead to misreading of memory states, thus restricting the application of MLC FeRAM to high-density storage. Circuit design is usually considered a critical step in connecting the study of a single device with the practical fabrication of microchips. It can help solve problems that cannot be overcome at the device level, and it allows trial-and-error exploration before chip fabrication, saving both cost and time. Therefore, it is necessary to address the small operation margin and the large input offset of MLC FeRAM through circuit design and optimization.
In this work, we propose a configuration of Hf0.5Zr0.5O2-based 3TnC MLC FeRAM with good area efficiency. The nondestructive readout approach is used, and a capacitorless offset-canceled SA is designed to solve the abovementioned issues, which leads to a wide operation margin and read reliability. The experimental electrical characteristics and a SPICE model of a Hf0.5Zr0.5O2-based FeRAM device are introduced first in this paper, which presents eight polarization states for MLC. Subsequently, the circuit structure and the operation of a 3TnC MLC FeRAM macro are presented in the following sections. Then, the capacitorless offset-canceled SA is proposed to minimize the mismatch of the readout transistor and the readout circuit. Finally, the layout of the 4 Mb 3TnC MLC FeRAM is provided with high area efficiency.
2. FeRAM Device Characteristics and SPICE Model
Figure 1a shows that the FeCAP cells are integrated between metal layers M5 and M6 in the GSMC 130 nm logic process. After the front-end-of-line (FEOL) process, the FeCAP device was fabricated using the back-end-of-line (BEOL) process [12,13], as shown in the bottom right inset of Figure 1a. First, a TiN film was deposited as the bottom electrode (BE) by radio frequency (RF) reactive sputtering. Subsequently, the Hf0.5Zr0.5O2 film with a thickness of 10 nm was deposited on the BE via atomic layer deposition (ALD), with a 1:1 stoichiometric ratio of Hf to Zr. Finally, TiN was deposited as the top electrode (TE) via RF reactive sputtering, followed by rapid thermal annealing. Using these steps, we fabricated the Hf0.5Zr0.5O2-based FeCAP devices, each 0.7 μm × 0.7 μm in size. The upper right inset of Figure 1a shows the transmission electron microscopy (TEM) image of a single Hf0.5Zr0.5O2-based FeCAP device, which shows its metal–ferroelectric–metal structure and confirms the 10 nm thickness of the Hf0.5Zr0.5O2 film.
Figure 1b shows the experimentally measured P–V hysteresis curves and the curves simulated using a SPICE model, in which sweep voltages of ±1.5 V, ±2.0 V, ±2.5 V, and ±3.0 V were used to obtain the multiple polarization states. The P–V hysteresis curves were measured using a ferroelectric tester (Precision Premier II, Radiant Technologies, Inc., Albuquerque, NM, USA). Taking the sweep voltage of ±1.5 V as an example, the remanent polarization (Pr) was estimated to be 13 μC/cm². The remanent polarization increased with the applied voltage, and these distinct remanent polarization states indicate the potential for multi-level storage. Figure 1b summarizes the eight positive and negative polarization states obtained with the different applied voltages, which are defined as the states from “111” to “000”. Thus, the eight polarization states correspond to three bits in a single MLC FeRAM device.
To support the subsequent circuit design of the MLC FeRAM, the simulation had to fit the experimental P–V curves well, so that the simulation results remain consistent with the behavior of real devices. To simulate the electrical characteristics of our MLC FeRAM, we used a physics-based, circuit-compatible SPICE model based on the single-domain approximation, following Aziz et al. [14], as shown in Figure 1c. This model specifically targets the efficient design and analysis of FeFET-based circuits. It is described by the time-dependent Landau–Khalatnikov equation [15] as follows:

$$E = \alpha P + \beta P^{3} + \gamma P^{5} + \rho \frac{dP}{dt}$$

where ρ is the kinetic coefficient; α, β, and γ are the static parameters of the ferroelectric layer; P is the polarization of the FeRAM device; and E is the electric field applied across the FeRAM device. Further, by setting $Q_P$, $T_{FE}$, and $A_{FE}$ as the polarization charge stored in the FeRAM, the thickness, and the area of the FeRAM device, respectively, so that $P = Q_P/A_{FE}$ and $E = V_{FE}/T_{FE}$, the time-dependent Landau–Khalatnikov equation can be described as follows:

$$V_{FE} = T_{FE}\left(\alpha \frac{Q_P}{A_{FE}} + \beta \frac{Q_P^{3}}{A_{FE}^{3}} + \gamma \frac{Q_P^{5}}{A_{FE}^{5}}\right) + \rho \frac{T_{FE}}{A_{FE}} \frac{dQ_P}{dt}$$
FeRAM is modeled as a nonlinear capacitor ($C_{LK}$) connected in series with a resistor ($R_{LK} = \rho \times T_{FE}/A_{FE}$), in which the nonlinear capacitor is implemented as a polynomial voltage-controlled voltage source (PVCVS). The current flowing through $R_{LK}$ and the PVCVS is captured by a current-controlled current source (CCCS). The capacitor $C_i$ of 1 F is then charged by the CCCS current, while the voltage across the CCCS (and hence across $C_i$) is numerically equal to $Q_P$ in the FeRAM. Therefore, the dashed block diagram implements the formula $T_{FE}(\alpha Q_P/A_{FE} + \beta Q_P^{3}/A_{FE}^{3} + \gamma Q_P^{5}/A_{FE}^{5})$. Finally, the voltage drop across the FeRAM equals the sum of the voltage drops across the nonlinear capacitor $C_{LK}$ and the resistor $R_{LK}$, which implements the Landau–Khalatnikov equation. The P–V curve can be simulated by calculating the polarization $P = Q_P/A_{FE}$ and monitoring the applied voltage $V_{FE}$ on the FeRAM.
Table 1 summarizes the parameters used in this model for MLC FeRAM, in which $C_{FE}$ is the parasitic parameter of FeRAM. By adjusting the PVCVS coefficients (α, β, γ) and the parasitic parameter, the P–V hysteresis curves of the MLC FeRAM device were simulated under sweep voltages of ±1.5 V, ±2.0 V, ±2.5 V, and ±3.0 V. As shown in Figure 1b, the simulated P–V curves fit the experimentally measured data of the FeRAM device well, which supports the use of the model in the subsequent circuit design. This model is used for the design and simulation of the 3TnC MLC FeRAM macro circuit discussed in later sections.
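To illustrate how a single-domain Landau–Khalatnikov model produces a hysteresis loop, the following minimal sketch integrates the equation in its commonly used form, ρ dP/dt = E − (αP + βP³ + γP⁵), with an explicit Euler step under a triangular field sweep. It is not the SPICE model used in this work: the parameter values are hypothetical and normalized (they are not the Table 1 values), and the function names are our own.

```python
def triangle(phase):
    """Unit-amplitude triangle wave; phase in [0, 1) starts at 0 and rises first."""
    if phase < 0.25:
        return 4.0 * phase
    if phase < 0.75:
        return 2.0 - 4.0 * phase
    return 4.0 * phase - 4.0


def simulate_lk(alpha, beta, gamma, rho, e_max, period,
                steps_per_cycle=8000, cycles=2):
    """Integrate rho * dP/dt = E - (alpha*P + beta*P^3 + gamma*P^5) with explicit Euler.

    Returns the (E, P) samples of the last cycle only, so that the start-up
    transient from P = 0 is discarded.
    """
    dt = period / steps_per_cycle
    p = 0.0
    trace = []
    for n in range(cycles * steps_per_cycle):
        phase = (n % steps_per_cycle) / steps_per_cycle
        e = e_max * triangle(phase)
        dp_dt = (e - (alpha * p + beta * p**3 + gamma * p**5)) / rho
        p += dt * dp_dt
        if n >= (cycles - 1) * steps_per_cycle:
            trace.append((e, p))
    return trace


def remanent_polarization(trace):
    """Return (P at E = 0 on the falling sweep, P at E = 0 on the rising sweep)."""
    half = len(trace) // 2          # phase 0.5: field crosses zero while falling
    return trace[half][1], trace[0][1]  # phase 0: field crosses zero while rising


# Hypothetical normalized parameters; alpha < 0 gives a double-well energy landscape.
trace = simulate_lk(alpha=-1.0, beta=1.0, gamma=0.0, rho=0.1,
                    e_max=1.0, period=40.0)
pr_falling, pr_rising = remanent_polarization(trace)
print(f"P(E=0) falling: {pr_falling:+.3f}, rising: {pr_rising:+.3f}")
```

The two values at E = 0 have opposite signs, which is the remanent polarization pair of the loop. Fitting measured curves would require replacing the normalized parameters with calibrated ones and converting between field and voltage through the film thickness.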
3. Circuit Structure and Operation of 3TnC MLC FeRAM Macro
Figure 2 shows the circuit structure of our 4 Mb 3TnC MLC FeRAM macro, which comprises one 4 Mb bank and the peripheral circuit. The 4 Mb bank consists of eight 512 Kb split banks, and each split bank contains 256 word-lines (WLs) or plate-lines (PLs) and 2048 bit-lines (BLs). One split bank includes four 128 Kb segments, where each segment contains 256 WLs or PLs and 512 BLs. Further, one segment includes sixteen 8 Kb blocks, where each block contains 256 WLs or PLs, 32 BLs, and 32 3TnC arrays. The term 3TnC means that a minimum memory unit comprises one pass transistor QPA, one reset transistor QR, one readout transistor QG, and one MLC FeCAP. The reset transistor QR and the readout transistor QG form a common read/write circuit shared by the 256 memory units in one array, whereas the pass transistor QPA serves only as a switch. Therefore, 3TnC also means that there are three types of transistors (QR, QG, and QPA) and 256 FeCAPs in one memory array. In the peripheral circuit, one 1/32 column mux corresponds to one block, while one split bank corresponds to a 16 × 4 column mux. Similarly, one split bank includes 16 × 4 sense amplifiers. The row driver and decoder perform addressing and decoding. The local timing control circuit drives the pulse sequences of the write operation and the nondestructive readout. Finally, by selecting one of eight split banks and one of four segments, a 16-bit output is obtained for the MLC FeRAM macro.
To expand the reading margin between adjacent storage levels in MLC FeRAM, we used a nondestructive readout scheme. In the traditional 1T1C array, the destructive readout scheme usually applies the power supply voltage VDD to read out the data and then write it back. In comparison, our nondestructive readout scheme uses a read voltage VRD below the coercive voltage, which does not disturb the remanent polarization of the stored level and thereby avoids misreading between adjacent levels. This scheme helps improve the read reliability of MLC FeRAM [16].
The operation sequence of the nondestructive readout scheme is shown in Figure 3. First, in the write phase, the pass transistor QPA and the reset transistor QR turn on, which corresponds to the WL and the reset line (RL) turning on. Then, either the write pulse VWR is applied to the PL for the data “111”, or the write pulse is applied to the source line for the data “000”. Second, during the reset phase, the pass transistor QPA turns off and the reset transistor QR turns on, which corresponds to the WL turning off and the RL turning on. This step removes the residual charge on the floating gate of the readout transistor QG. Finally, in the readout phase, the pass transistor QPA turns on and the reset transistor QR turns off, which means the WL is turned on and the RL is turned off. By applying the voltage VRD (below the coercive voltage) to the PL, the data stored in the FeRAM are read out to the BL through the readout transistor QG. Because the readout is nondestructive, the readout transistor QG, acting as a gain cell, can expand the reading margin of the FeRAM [17]. The sense margin reaches approximately 450 mV between two adjacent storage levels, which is large enough to distinguish the eight states from “000” to “111” in MLC FeRAM.
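The three phases described above can be summarized as a small lookup table. This is only a transcription of the WL/RL states given in the text into Python; the data structure and function name are our own, and the PL state during the reset phase is left unspecified because the text does not state it.

```python
# Control-line states per phase. True means the line is asserted.
# "pulse" names the voltage applied in that phase, as described in the text.
PHASES = {
    # VWR on the PL writes "111"; the write pulse on the source line writes "000".
    "write": {"WL": True, "RL": True, "pulse": "VWR"},
    # No pulse is described for the reset phase; None marks it as unspecified.
    "reset": {"WL": False, "RL": True, "pulse": None},
    # VRD (below the coercive voltage) is applied to the PL.
    "read": {"WL": True, "RL": False, "pulse": "VRD"},
}

SEQUENCE = ["write", "reset", "read"]


def transistor_states(phase):
    """Map line states to transistor states: WL gates QPA, RL gates QR."""
    lines = PHASES[phase]
    return {"QPA": lines["WL"], "QR": lines["RL"]}


for phase in SEQUENCE:
    print(phase, transistor_states(phase), PHASES[phase]["pulse"])
```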
Figure 4 shows the overall pulse sequence diagram of the write–verify scheme. Because the different remanent polarization states of MLC FeRAM are obtained by applying different voltages, the pulse sequence mode should be 2′b01 or 2′b11. To ensure that the data are written correctly, a verify operation, that is, a readout operation, is added after each write operation. If the read data match the expected data, the verification passes and the pulse sequence proceeds to write the next adjacent storage level of the MLC FeRAM. If the verification fails, the pulse sequence mode enters 2′b00 or 2′b10 until the verification passes. It should be emphasized that the polarization reversal of ferroelectric domains is a relaxation phenomenon. Thus, pulse sequences with different pulse widths are required to ensure that the data are written effectively in MLC FeRAM.
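The general idea of a write–verify loop can be sketched as follows. This is a generic illustration, not the pulse sequencer of this work: it does not model the 2′b mode encoding, the `ToyCell` class is a hypothetical stand-in for a cell whose write only succeeds once the pulse is wide enough, and the pulse widths and step size are made-up values.

```python
class ToyCell:
    """Hypothetical cell: a write only takes effect if the pulse is wide enough."""

    def __init__(self, required_width_ns):
        self.required_width_ns = required_width_ns
        self.level = 0

    def write(self, level, width_ns):
        if width_ns >= self.required_width_ns:
            self.level = level

    def read(self):
        return self.level


def write_verify(cell, target_level, start_width_ns=100, step_ns=50, max_attempts=5):
    """Write, read back, and retry with a wider pulse until the readback matches.

    Returns (number of attempts, final pulse width in ns).
    """
    width_ns = start_width_ns
    for attempt in range(1, max_attempts + 1):
        cell.write(target_level, width_ns)
        if cell.read() == target_level:   # verify step: a plain readout
            return attempt, width_ns
        width_ns += step_ns               # verification failed: widen the pulse
    raise RuntimeError("verification failed after all attempts")


cell = ToyCell(required_width_ns=200)
attempts, final_width = write_verify(cell, target_level=5)
print(attempts, final_width)  # 3 attempts: 100 ns and 150 ns fail, 200 ns passes
```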
Owing to the ideal electrical characteristics of the SPICE model of FeRAM, we adopt the 2′b01 mode to simulate the distribution of the readout voltage for each storage level of the MLC FeRAM. After 10,000 Monte Carlo simulations in the 16 Kb MLC array, the storage levels can be clearly distinguished, with no overlap between the readout voltage distributions, as shown in Figure 5. The reading margin between adjacent storage levels of the MLC FeRAM is nearly 450 mV.
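The following toy Monte Carlo sketch illustrates why a 450 mV spacing keeps eight levels separable. It is not the SPICE Monte Carlo run reported here: the Gaussian spread of 30 mV, the 0 mV base level, and the assignment of “000” to the lowest readout voltage are all assumptions made for illustration, and the function names are our own.

```python
import random

N_LEVELS = 8
MARGIN_MV = 450.0   # spacing between adjacent levels, from the text
BASE_MV = 0.0       # hypothetical readout voltage of level 0
SIGMA_MV = 30.0     # hypothetical Gaussian spread of the readout voltage


def nominal_mv(level):
    """Nominal readout voltage of a level under the assumed uniform spacing."""
    return BASE_MV + level * MARGIN_MV


def decode(v_mv):
    """Map a readout voltage to the nearest level and return its 3-bit code."""
    index = round((v_mv - BASE_MV) / MARGIN_MV)
    index = max(0, min(N_LEVELS - 1, index))
    return format(index, "03b")


def count_misreads(samples_per_level, seed=0):
    """Draw noisy readout voltages for every level and count decoding errors."""
    rng = random.Random(seed)
    errors = 0
    for level in range(N_LEVELS):
        expected = format(level, "03b")
        for _ in range(samples_per_level):
            v_mv = rng.gauss(nominal_mv(level), SIGMA_MV)
            if decode(v_mv) != expected:
                errors += 1
    return errors


print(decode(nominal_mv(5)))
print(count_misreads(10_000))
```

With these assumed numbers the decision threshold sits 225 mV from each nominal level, which is 7.5 standard deviations, so misreads are not expected in a run of this size.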
4. Capacitorless Offset-Canceled Sense Amplifier
Due to CMOS process fluctuations, there is usually mismatch in the readout transistor QG of the 3TnC cell array and in the readout circuit SA, resulting in a large input offset. To improve the reliability of reading stored data between adjacent storage levels in MLC FeRAM, we propose a capacitorless offset-canceled SA that minimizes the mismatch of the SA and the readout transistor. Moreover, compared with the single-capacitor offset-canceled SA [18], the capacitorless offset-canceled SA uses the parasitic capacitor of a transistor in place of the single metal/insulator/metal (MIM) capacitor, thus saving chip area.
The offset-cancellation principle of the capacitorless offset-canceled SA is explained below. First, in the offset cancellation phase, the outputs of the inverters are connected to their inputs in Figure 6a, which corresponds to closing the switches “pset_n”, “nset”, and “S1” in Figure 6b. The parasitic capacitor of transistor Q samples the trip voltages of the inverters, so that the two voltages VL and VR appear at the two sides of transistor Q. Second, in the precharge phase, one side of the parasitic capacitor of transistor Q is connected to ground, which corresponds to closing the switches “S2R” and “S1B” while keeping the other switches open. Therefore, the other side of the parasitic capacitor of transistor Q holds the voltage VR − VL, which is the difference between the two trip voltages VL and VR. Third, in the BL sampling phase, the switches “S3R” and “S1B” are closed, while the other switches are open. The reference voltage Vref, which differs for the different storage levels of MLC FeRAM, is added to the voltage VR − VL, which compensates for the mismatch between the two inverters and thus cancels the offset of the SA. Finally, in the evaluation phase, the switches “pset_n”, “nset”, and “S1B” are closed, while the other switches are open. Because the offset has been canceled, the SA can resolve the data quickly. Under the conditions of a TT process corner, 3.3 V, and 25 °C, Figure 6b shows the simulated output waveforms of “BL<0>” and “BL<1>” in the SA. Note that the offset cancellation and precharge phases can run concurrently with the reset operation of the 3TnC array, thus avoiding any timing penalty for the proposed method.
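The effect of adding the stored difference VR − VL to the reference can be illustrated with an idealized behavioral model. This is a toy sketch, not the transistor-level circuit: the latch is reduced to a comparison of how far each side sits from its inverter trip voltage, the assignment of the input to the left side and the reference to the right side is an assumption, and the voltages are made-up values. Real circuits leave a residual offset that this model ignores.

```python
def sa_decision(v_in, v_ref, v_trip_left, v_trip_right, cancel_offset):
    """Toy latch model: output 1 if the input side overdrives its inverter
    more than the reference side overdrives its own inverter.

    With cancel_offset=True, the stored trip-voltage difference (VR - VL)
    is added to the reference, as in the BL sampling phase.
    """
    stored = (v_trip_right - v_trip_left) if cancel_offset else 0.0
    left_overdrive = v_in - v_trip_left
    right_overdrive = (v_ref + stored) - v_trip_right
    return 1 if left_overdrive > right_overdrive else 0


# Hypothetical values: 60 mV trip-voltage mismatch, input 30 mV below the reference.
VL, VR, VREF = 1.62, 1.68, 1.00

print(sa_decision(0.97, VREF, VL, VR, cancel_offset=False))  # 1: mismatch flips the result
print(sa_decision(0.97, VREF, VL, VR, cancel_offset=True))   # 0: input is below the reference
print(sa_decision(1.03, VREF, VL, VR, cancel_offset=True))   # 1: input is above the reference
```

In this model the uncanceled decision threshold is shifted by VL − VR, while the canceled decision reduces to a direct comparison of the input with Vref.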
Figure 7 shows the relationship between the input offset voltage and transistor size for both the proposed SA (capacitorless SA) and the conventional SA (conv. SA). Generally, transistor mismatch in the SA decreases as transistor size increases, so the input offset of the SA decreases accordingly. Importantly, after 10,000 Monte Carlo simulations, the standard deviation of the input offset in the proposed SA is on average nearly 45% lower than that of the conv. SA, owing to its offset-cancellation principle. Meanwhile, compared to the single MIM-capacitor SA with the same offset voltage and CMOS process, the area of the capacitorless SA is 35% smaller.
5. The Layout of 4 Mb 3TnC MLC FeRAM and a Comparison with Other Memory Works
Figure 8 shows the layout of the 4 Mb 3TnC MLC FeRAM, with an area of 3052 × 4306 μm², consisting of the 3TnC cell array, the capacitorless SA, and the other peripheral circuits. The inset shows the layout of the single 3T1C cell array. “AA” and “GATE” denote the active area and the gate electrode of the transistor, respectively.
Table 2 compares the performance of our work with that of other memory works. The proposed 3TnC MLC FeRAM macro has the advantages of a high area efficiency of 12F² per bit, a large sense margin of 450 mV between adjacent storage levels, and a low offset of 20 mV, all of which are beneficial for high-density storage applications. Both the read and write times of the cell are 100 ns, while the maximum power consumption is 48.4 μW for a read and a write operation. It should be noted that the influence of temperature on FeRAM is relatively small, as reported in reference [19]: the variation in the readout voltage of FeRAM is about 50 mV, whereas the readout margin between adjacent polarization states of our FeRAM with the 3TnC architecture is 450 mV. Therefore, our 3TnC MLC FeRAM chip has good stability.
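The stability argument above can be made concrete with a simple margin budget. The three input numbers come from the text; the budgeting rule itself (subtracting the readout variation and the SA offset linearly from the sense margin, as a pessimistic worst case) is our assumption and is not stated in this work.

```python
SENSE_MARGIN_MV = 450        # margin between adjacent levels
READOUT_VARIATION_MV = 50    # readout voltage variation with temperature
SA_OFFSET_MV = 20            # input offset of the sense amplifier


def remaining_margin_mv(margin_mv, variation_mv, offset_mv):
    """Worst-case margin left after linear subtraction of the error terms (assumed rule)."""
    return margin_mv - variation_mv - offset_mv


left_mv = remaining_margin_mv(SENSE_MARGIN_MV, READOUT_VARIATION_MV, SA_OFFSET_MV)
print(left_mv, "mV remaining")  # 380 mV remaining
```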