A Low-Power Ternary Adder Using Ferroelectric Tunnel Junctions

Reuben, John; Fey, Dietmar; Lancaster, Suzanne; Slesazeck, Stefan

doi:10.3390/electronics12051163


¹ Chair of Computer Architecture, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91058 Erlangen, Germany

² NaMLab gGmbH, 01187 Dresden, Germany

* Author to whom correspondence should be addressed.

Electronics 2023, 12(5), 1163; https://doi.org/10.3390/electronics12051163

Submission received: 1 February 2023 / Revised: 23 February 2023 / Accepted: 24 February 2023 / Published: 28 February 2023

(This article belongs to the Section Circuit and Signal Processing)


Abstract

Computing systems are becoming increasingly power-constrained due to requirements such as edge computing, in-sensor computing, or simply limited battery capacity. Emerging Non-Volatile Memories are being explored for building low-power computing circuits, including adders. In this work, we propose a low-power adder using a Ferroelectric Tunnel Junction (FTJ). FTJs are two-terminal devices in which the data is stored in the polarization state of the device. An FTJ-based majority gate is proposed, which uses a current-mode sensing technique to evaluate the majority of the inputs. By conditionally selecting between the majority and its complement, an XOR operation is implemented, thereby achieving full-adder functionality. Since the FTJ-based majority operation is slow, a ternary adder architecture is used to compensate for the speed loss. The proposed ternary adder has two stages of full adders and requires O(1) time for n-bit addition. The proposed adder is verified by simulation in a 130 nm CMOS technology. A 32-bit addition can be achieved in 100 μs and consumes 0.78 pJ, corresponding to a very low average power of 7.8 nW. The proposed adder can be used in applications where power consumption is crucial and speed is not a strict requirement.

Keywords: memristor; Ferroelectric Tunnel Junctions (FTJs); majority logic; sense amplifier; low-power; full adder; ternary arithmetic; carry-free addition

1. Introduction

Computing systems are becoming increasingly power-constrained due to the need for computing capability on the edge (edge computing [1]), in battery-operated sensors (in-sensor processing [2]), and in similar applications. Consequently, there is a great need for low-power circuits. Emerging Non-Volatile Memories (NVMs) like Resistive RAM (ReRAM), Phase-Change Memory (PCM), Spin-Transfer Torque Magnetoresistive RAM (STT-MRAM), and ferroelectric memories are possible candidates for building low-power circuits. Among these emerging memories, the Ferroelectric Tunnel Junction (FTJ), which is a type of ferroelectric memory, has gained considerable attention in recent years. An FTJ is a two-terminal device with a thin ferroelectric layer between two electrodes. Under voltage stress, the polarization of the ferroelectric layer can be changed, resulting in two states: one with a high tunneling current and another with a low tunneling current. The data is stored as the polarization state of the device and is READ by sensing the tunneling current. The device can be written (polarization state changed) by a ±4.5 V pulse of 10 μs duration. Due to the field-driven nature of writing, the energy to write into an FTJ cell is ≈100–500 fJ, which is much less than for some other emerging non-volatile memories (ReRAM, PCM). Moreover, the READ current (tunneling current) of an FTJ is in the pA range (i.e., a current density of 1 pA/μm² [3]), and consequently, the energy leakage is very low. On the one hand, this pA current is advantageous for energy-efficient computing. On the other hand, the small current requires a long sensing time (10–100 μs) compared to some emerging memories. In this work, we explore the possibility of building a low-power hybrid FTJ-CMOS circuit by exploiting the low READ current of the FTJ.

Adders form the fundamental computing unit of any computing system. In the literature, many adders built from NVMs are available. These include adders based on ReRAM [4], STT-MRAM [5], spin-Hall-effect-assisted MRAM [6], and domain-wall MTJs [7]. However, in all these adders, the NVM device is switched during the computation. Switching the device consumes energy and also degrades its endurance. In fact, the limited endurance of these devices is a hurdle to their adoption for computing. In this work, we take a different approach to adder design using NVM. We have pursued, for the first time, a hybrid CMOS-FTJ adder in which the FTJ is not switched during the entire computation. This has enabled us to achieve not only a low-power adder but also an endurance-friendly design. It should be mentioned, however, that in some applications one input remains constant while the other changes (e.g., digital filtering). In such cases, the FTJ can store the constant input of the adder, and the other input can be applied as a voltage. This would exploit the full potential of the FTJ device for addition and provide reconfigurability, and it will be explored in our future work.

Although the FTJ is an energy-efficient device for computing, it requires a long READ time. The READ current for both states is in the pA range, and a reasonably long time (≈10–100 μs) must be allowed to reliably differentiate between the two states [8]. Since a slow READ would increase the latency, we pursued a ternary architecture for the adder. It is well known that ternary arithmetic (computing with three states instead of the conventional two) can enable “carry-free” addition, i.e., the rippling of the carry from the less significant bits can be avoided, achieving $O(1)$ latency for n-bit addition [9,10]. A ternary adder architecture was chosen and implemented using ReRAM technology in one of our earlier works [11]. In this paper, we investigate ferroelectric memories and once again pursue a ternary adder architecture, this time to compensate for the speed loss in reading from the FTJ. The results are promising, and the ternary adder achieves lower power than adders based on other non-volatile memories.

The rest of the paper is organized as follows. Section 2 presents the basic concept of ternary adders and our efficient implementation of ternary adders using full adders. Section 3 presents FTJ memory technology, its switching characteristics, and the modeling methodology. Section 4 presents the FTJ-based majority gate, elaborates its working principle, and describes how the majority gate is augmented with extra circuitry to build a full adder. Section 5 presents the simulation methodology and results and compares the performance of the proposed FTJ-based adder with other adders based on NVMs. Section 6 concludes our work.

2. Ternary Adder Built from Full Adders

In the literature, “ternary adder” is a broad term and could mean an adder built from ternary logic gates, i.e., logic gates that accept three states in their input and output signals [12]. In our work, by “ternary adder,” we mean an adder whose operands have a ternary representation, i.e., the digits of our operands are trits, having three states, and not bits, having two states. Furthermore, as in the binary system, the base (radix) for each trit in our ternary operands is still 2. Binary signed-digit (SD) number systems, in which each trit takes one of the values in {−1, 0, +1}, are an example of this. A good overview of SD adders is presented in [13]. It has long been known that computing with three states enables “carry-free addition,” in which two operands with a word length of n can be added in constant time, i.e., in $O(1)$ independent of n [9,10], whereas binary adders need $O(\log(n))$ steps if reasonable hardware resources are used.

As stated in the Introduction section, the ternary adder can compensate for the long READ time by enabling addition in $O(1)$ time. However, trits can be difficult to process using FTJs. As will be elaborated later, this work uses FTJ-based circuits to implement majority gates, which are in turn used to build the adders. The nature of an FTJ with a small window (the READ current for the two polarization states is 100 pA and 25 pA) makes a third state difficult to realize. Therefore, we pursued a ternary adder which is “ternary” in concept but “binary” in processing, i.e., each pair of trits to be added is represented as two bits and processed.

In the literature, different solutions for the realization of ternary adders can be found. The first solutions were based on a direct conversion of the corresponding function tables for one digit position and required up to three compute steps, or they were based on function tables with two neighboring digits as inputs and required two compute steps [14]. Later solutions used two full adders per digit, processed in two successive steps. Figure 1 depicts such an example for the addition of two 4-trit ternary numbers. Full adders were used because they are available as a highly optimized synthesizable solution in nearly every library of a semiconductor manufacturer. Therefore, implementing a carry-free adder costs only about double the area of a conventional serially working ripple-carry adder (RCA), while offering the advantage of addition in $O(1)$ compared to $O(n)$ in an RCA. The solution shown in Figure 1 is an optimized version, proposed by us, of the solution presented in [15], which uses an inverter on one of the input signals of the full adder. The inverters were avoided by using the unusual coding scheme we favor instead of the better-known plus-minus coding used in [15].
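To make the carry-free property concrete, the following is a minimal Python sketch of the textbook two-step radix-2 signed-digit addition (transfer digit plus interim sum). It is not the full-adder-based coding of Figure 1, whose bit-level encoding is not reproduced here; the function names `sd_value` and `carry_free_add` are illustrative only.

```python
from itertools import product

def sd_value(digits):
    """Value of a radix-2 signed-digit number, least-significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))

def carry_free_add(x, y):
    """Add two equal-length signed-digit numbers (digits in {-1, 0, 1}, LSD first).

    Output digit i depends only on input positions i, i-1 and i-2, so every
    position could be evaluated in parallel; the loop is only for readability.
    """
    n = len(x)
    p = [a + b for a, b in zip(x, y)]   # position sums, each in {-2, ..., 2}
    w = [0] * n                         # interim sum digits
    t = [0] * (n + 1)                   # transfer digits; t[i + 1] is produced at position i
    for i in range(n):
        lower_nonneg = (i == 0) or (p[i - 1] >= 0)
        if p[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif p[i] == 1:
            # Choose the split of 1 that cannot collide with the incoming transfer.
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p[i] == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        elif p[i] == -2:
            t[i + 1], w[i] = -1, 0
    # Second step: interim sum plus incoming transfer never leaves {-1, 0, 1}.
    return [w[i] + t[i] for i in range(n)] + [t[n]]

# 13 + (-5) = 8, written LSD first
print(carry_free_add([1, 0, 1, 1], [1, 1, 0, -1]))  # [0, 0, 0, 1, 0]
```

Because each result digit looks at a fixed window of three input positions, the logic depth does not grow with the word length, which is the property the two-stage full-adder arrangement exploits.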

3. Ferroelectric Tunnel Junctions: Basics

3.1. FTJ: Device Structure and Switching Operation

Before describing FTJ-based majority gates, a brief introduction to FTJ memory technology is needed because it is a new entry to emerging non-volatile memories. FTJs are basically two-terminal devices composed of a ferroelectric layer sandwiched between two metal electrodes [16]. The initial polarization of the ferroelectric material can be reversed by the application of a voltage pulse. The change in the polarization of the ferroelectric material exhibits itself as a polarization-dependent tunneling current through the ferroelectric layer. When this current is sensed, two different currents result corresponding to the two polarization states [17]. In other words, the internal polarization state gives rise to the non-volatility of the FTJ, and the state is sensed by reading out the polarization-dependent tunneling current.

In an FTJ, the data is stored in the polarization state of the device. Let us denote polarization UP as logic “1” and polarization DOWN as logic “0”. When the device is programmed to polarization UP, it has a higher tunneling current than when it is programmed to polarization DOWN (Figure 2). As depicted in Figure 2, the FTJ can be switched between the two states by applying ±4.5 V pulses of 10 μs duration when it is used as a memory device. This feature can be used to store ternary data in a non-volatile way directly inside the adder circuit. For example, in a neural network application, this would allow the storage of the weights directly inside the adder circuits without the need for additional memory elements. However, in this work, the FTJ device is not used to store the input and, consequently, is never switched during computation.

3.2. FTJ: Device Characteristics and Modeling

In this work, an FTJ model developed by NaMLab and fitted to a prototype device, based on Hf₀.₅Zr₀.₅O₂ (HZO), is used for circuit simulations [18]. In this section, we briefly discuss the prototype device, its electrical characteristics, and how it was modeled. The tunneling currents corresponding to polarization UP and DOWN states are 100 pA and 25 pA, respectively, assuming a device size of ~100 μm². The characterization presented here has been performed on large (~3000 μm²) bilayer FTJs composed of TiN (10 nm)/HZO (10 nm)/Al₂O₃ (2 nm)/TiAlN (17.5 nm) fabricated with physical vapor deposition (PVD) and atomic layer deposition (ALD), as described fully elsewhere [3]. It should be noted that the HZO and dielectric thicknesses give trade-offs in retention, switching voltage, and tunneling current density [19,20]. The energy consumption of our ternary adder can be somewhat tuned through these parameters, and in particular, since the main energy cost is in the read operation, a thicker ferroelectric layer may be preferable despite the higher write energy cost due to the improvement in retention.

The FTJ device characteristics were simulated using a Verilog-A model based on Preisach equations [21]. In the Preisach switching model, domains are non-interacting and can be described with independent hysteresis curves (hysterons), which vary in their remanent polarization, $P_{r}$, and coercive fields, $E_{c+}$ and $E_{c-}$ for positive and negative polarity, respectively. Full models of ferroelectric switching in FTJs must take into account additional effects; for example, charge injection is necessary during switching in order to compensate polarization charges when the ferroelectric is separated from the metallic electrodes [22]. In the compact Verilog-A model, such phenomena are accounted for by calibrating the input parameters based on experimental data in order to capture the switching behavior. The dielectric capacitive displacement originating from the background permittivity is modeled as an additional parallel-connected capacitor. Miller et al. [23] proposed the use of the hyperbolic tangent function to describe the switching properties of the ferroelectric as a function of the applied field, $P(E)$, and verified its validity experimentally. This leads to the approximation:

$$P(E, k, P_{off}) = k \cdot P_{sat} \cdot \tanh\left(\frac{E \pm E_{c}}{2\delta}\right) + P_{off} \tag{1}$$

where $k$ and $P_{off}$ are a scaling factor and an offset polarization, respectively, which are needed when describing non-saturated switching behavior [24]; $P_{sat}$ is the saturation polarization; and $\delta$ is a constant describing the steepness of the switching, given by:

$$\delta = E_{c} \left[\ln \frac{1 + P_{r}/P_{sat}}{1 - P_{r}/P_{sat}}\right]^{-1} \tag{2}$$

defined such that $P(0) = -P_{r}$, with $P_{sat} \geq P_{r}$.

Following Jiang et al. [24], asymmetry in the positive and negative branches of the switching loop can be accounted for by assuming different values for the positive and negative polarity, replacing $E_{c}$, $k$, and $P_{off}$ with $E_{c,\pm}$, $k_{\pm}$, and $P_{off,\pm}$ in Equations (1) and (2).
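As a quick numerical check of Equations (1) and (2), the sketch below evaluates the tanh branch in Python. The parameter values are placeholders chosen only for illustration (they are not the fitted values of the NaMLab model), and the sign convention (the rising branch uses $E - E_{c}$) is an assumption consistent with $P(0) = -P_{r}$. The helper requires $P_{r} < P_{sat}$ strictly, since the logarithm diverges at equality.

```python
import math

def steepness(e_c, p_r, p_sat):
    """Equation (2): delta = E_c / ln((1 + P_r/P_sat) / (1 - P_r/P_sat))."""
    ratio = p_r / p_sat
    return e_c / math.log((1 + ratio) / (1 - ratio))

def polarization(e, e_c, p_r, p_sat, k=1.0, p_off=0.0, rising=True):
    """Equation (1) for one branch of the hysteresis loop.

    Assumption: the rising branch uses (E - E_c), the falling branch (E + E_c).
    """
    delta = steepness(e_c, p_r, p_sat)
    shift = -e_c if rising else e_c
    return k * p_sat * math.tanh((e + shift) / (2 * delta)) + p_off

# Placeholder parameters (hypothetical, arbitrary but consistent units).
E_C, P_R, P_SAT = 1.0, 20.0, 25.0

print(polarization(0.0, E_C, P_R, P_SAT))                # approximately -P_R
print(polarization(E_C, E_C, P_R, P_SAT))                # zero crossing at E = E_c
print(polarization(0.0, E_C, P_R, P_SAT, rising=False))  # approximately +P_R
```

For the asymmetric case, the same functions can be called with separate `e_c`, `k`, and `p_off` values per branch.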

Note that the field $E$ is applied to the whole stack. The dielectric layer causes a field drop which reduces the effective field $E_{eff}$ seen by the ferroelectric layer, and this is considered by using a capacitive voltage divider model to calculate $E_{c}$ as the coercive field of the full stack, i.e., a ferroelectric layer in series with a dielectric tunneling barrier. The capacitance of the ferroelectric layer can further be used to capture the fraction of the ferroelectric and non-ferroelectric phases [25].

Finally, the leakage through the stack, $I_{leak}$, at a given voltage, $V$, and for a given polarization state, $P$, is approximated using an exponential function:

$$I_{leak} = R_{A,0} A_{tot} \left(\exp\left(\frac{V}{V_{PE}}\right) - 1\right) \tag{3}$$

where $R_{A,0}$ is a baseline resistance that depends on the material properties, $A_{tot}$ is the device area, and $V_{PE}$ is a function that takes into account the current polarization state of the device with respect to the full resistance range:

$$V_{PE} = V_{P0} - \Delta V_{P} \cdot \frac{P}{P_{r}} \tag{4}$$

$V_{P0}$, $\Delta V_{P}$, and $R_{A,0}$ are all device-dependent parameters that must be fitted from experimental data. Figure 3 shows measured data on polarization switching (Figure 3a) and FTJ ON/OFF leakage currents (Figure 3b) used to calibrate the compact model used for this experiment.
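The structure of Equations (3) and (4) can be sketched in a few lines of Python. All numerical values below are hypothetical placeholders rather than the fitted parameters, and mapping polarization UP to $P = +P_{r}$ is an assumption made for the example; the point is only to show how the polarization state modulates the read current.

```python
import math

def v_pe(p, p_r, v_p0, delta_v_p):
    """Equation (4): polarization-dependent voltage scale."""
    return v_p0 - delta_v_p * p / p_r

def leakage(v, p, p_r, r_a0, a_tot, v_p0, delta_v_p):
    """Equation (3): exponential leakage through the stack."""
    return r_a0 * a_tot * (math.exp(v / v_pe(p, p_r, v_p0, delta_v_p)) - 1.0)

# Hypothetical placeholder parameters (not fitted to any device).
params = dict(p_r=20.0, r_a0=1e-14, a_tot=100.0, v_p0=1.0, delta_v_p=0.2)

i_up = leakage(2.0, +20.0, **params)    # assumed: polarization UP is P = +P_r
i_down = leakage(2.0, -20.0, **params)  # assumed: polarization DOWN is P = -P_r
print(i_up > i_down)                    # True for these placeholder values
print(leakage(0.0, +20.0, **params))    # 0.0: no current at zero bias
```

With $\Delta V_{P} > 0$, a positive polarization lowers $V_{PE}$ and therefore raises the current at a given positive read voltage, which is the mechanism behind the two distinguishable read levels.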

Since FTJ devices have small read currents, they have the main advantage of being inherently low-power devices. Indeed, in an FTJ operation, the main energy cost comes during SET and RESET operations. Additionally, most HZO-based ferroelectric devices currently require wake-up cycling, which is a certain number of electric field cycles required before devices reach their full ferroelectric switching properties and/or memory window [26]. Since bilayer FTJs have a dielectric layer that reduces the voltage drop over the ferroelectric, they typically require large switching voltages (Figure 3a). However, the devices can be operated at ≤5 V [25], and both wake-up and SET operations have been investigated using a gradual switching approach [27], which allows the operating voltage to be further reduced to fit circuit requirements.

4. FTJ-Based Majority Gate

4.1. Operation of the FTJ-Based Majority Gate

As stated, in this work, we pursued a circuit in which the FTJ is not switched, in order to conserve the lifetime/endurance of the device. Figure 4 depicts the structure of the FTJ-based majority gate proposed by us. It consists of transistors N5 and N6 forming a differential pair and a sense amplifier (SA, formed by transistors N1, N2, P1–P4). Transistors N3 and N4 connect these two circuits to form an FTJ-based majority gate. The inputs of the majority gate are applied as voltages to the three FTJs as $A, B, C_{in}$ on the left. All three FTJs are pre-programmed to the polarization “UP” state (this is a one-time event performed during the fabrication of the majority gate). When the inputs are applied, the node $n_{1}$ gets charged through the FTJs to a voltage that depends on the inputs. In our circuit, logic “1” is 2.25 V, and logic “0” is 0 V. The FTJ on the right is used to charge the node $n_{2}$ to a fixed voltage and forms the reference during the sensing operation. The majority operation is achieved as a comparison operation. Node $n_{1}$, which is the gate of N5, and node $n_{2}$, which is the gate of N6, are the crucial nodes that decide the output.

The operation of the proposed gate is achieved in two phases: the charge accumulation phase followed by the sensing phase. Just before the charge accumulation phase, nodes $n_{1}$ and $n_{2}$ are pre-charged to 0.3 V through transmission gates TG1 and TG2, respectively. Then the inputs of the majority gate are applied to the FTJs (2.25 V for logic “1” and 0 V for logic “0”) during the charge accumulation phase. During this phase, node $n_{1}$ is charged through the three FTJs on the left, and similarly, node $n_{2}$ is charged through the FTJ on the right. If an input ($A$, $B$, or $C_{in}$) to the majority gate is HIGH, there is a high potential (2.25 V − 0.3 V $\approx$ 2 V) across that FTJ, and a higher tunneling current is supplied to charge $n_{1}$. Conversely, if the input is LOW, the tunneling current through the corresponding FTJ is negligible and does not affect $n_{1}$ significantly. Thus $n_{1}$ gets charged to a voltage corresponding to the inputs ($A, B, C_{in}$). Simultaneously, node $n_{2}$ gets charged through an FTJ whose area is much smaller than that of the FTJs on the left ($A_{FTJ}$ is the area of the FTJ charging $n_{1}$). Hence it conducts a sub-pA current and charges $n_{2}$ to 315 mV, forming a stable reference (a different reference voltage can be achieved by changing the area of the FTJ charging $n_{2}$). The charge accumulation phase is allotted 50 μs because the FTJ tunneling current is in the pA range, and 50 μs are needed to charge the nodes $n_{1}, n_{2}$ to a reasonable voltage that can be sensed by the SA.

After this, the sensing phase starts, in which the sense amplifier’s enable signal, EN, goes high. As shown in the waveforms, during the charge accumulation phase, EN is low, so transistors N3 and N4 are OFF, and there is no current flow in either the sense or the reference path from $V_{DD}$ to ground. When EN goes high, transistors N3 and N4 are turned ON, and this causes transistors N5 and N6 to conduct. However, N5 and N6 do not conduct equal currents; their currents are determined by the voltages at nodes $n_{1}$ and $n_{2}$. If the potential at $n_{1}$ is higher than at $n_{2}$, N5 (sense path) conducts a higher current than N6 (reference path), and vice versa.

The conceptual waveforms on the right depict the functioning of the SA for the case when $n_{1}$ is higher than $n_{2}$. When $n_{1} > n_{2}$, N5 conducts a larger current. It must be noted that nodes $SA_{out}$ and $\overline{SA_{out}}$ were pre-charged to $V_{DD}$ (through P1 and P4) when EN was low. Since the sense path conducts more current than the reference path, $\overline{SA_{out}}$ starts discharging faster than $SA_{out}$. However, $\overline{SA_{out}}$ is the input of the inverter formed by P3–N2. Hence the drop in voltage at $\overline{SA_{out}}$ leads to a high at node $SA_{out}$. This, in turn, reinforces the low at $\overline{SA_{out}}$ because $SA_{out}$ is the input of the inverter formed by the P2–N1 pair. In other words, when EN goes high, the faster discharge of the voltage at $\overline{SA_{out}}$ is reinforced by the positive feedback formed by the cross-coupled inverters. Hence, $SA_{out}$ goes high and $\overline{SA_{out}}$ goes low.

The FTJ-based majority gate of Figure 4 was designed in 130 nm CMOS technology. As stated, the reference voltage was achieved by charging node $n_{2}$ through a smaller FTJ. For a 50 μs charge accumulation phase, $n_{2}$ gets charged to 315 mV. However, the voltage to which node $n_{1}$ gets charged during the charge accumulation phase is not fixed but depends on the inputs $A, B, C_{in}$ of the majority gate. For the cases (000, 001, 010, 100), node $n_{1}$ gets charged to 310 mV or less. Hence for these cases, $n_{1} < n_{2}$ and consequently $SA_{out}$ = 0, resulting in the majority, i.e., logic “0”. For the cases (011, 101, 110, 111), node $n_{1}$ gets charged to 320 mV or more. Hence for these cases, $n_{1} > n_{2}$ and consequently $SA_{out}$ = 1, resulting in the majority, i.e., logic “1”. The voltages at nodes $n_{1}$ and $n_{2}$ for all eight cases are tabulated in Table 1.
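The comparison described above can be summarized with an idealized behavioral sketch in Python. It uses only the figures quoted in the text (a 315 mV reference, $n_{1}$ at most 310 mV for the minority cases, and at least 320 mV for the majority cases); the per-case voltages of Table 1 are not reproduced, and the comparator is ideal (no offset or noise), which is a simplifying assumption.

```python
from itertools import product

V_REF_MV = 315  # reference node n2 after the 50 us accumulation phase (from the text)

def sa_out(v_n1_mv, v_n2_mv=V_REF_MV):
    """Ideal sense amplifier: 1 if n1 is above the reference n2, else 0."""
    return 1 if v_n1_mv > v_n2_mv else 0

def majority(a, b, c_in):
    """Boolean majority of three bits, used here as the golden reference."""
    return 1 if a + b + c_in >= 2 else 0

def n1_bound_mv(a, b, c_in):
    """Bound on n1 quoted in the text: <= 310 mV (minority), >= 320 mV (majority)."""
    return 320 if a + b + c_in >= 2 else 310

for a, b, c_in in product((0, 1), repeat=3):
    sensed = sa_out(n1_bound_mv(a, b, c_in))
    print(a, b, c_in, sensed, sensed == majority(a, b, c_in))
```

Even at the quoted bounds, the sensing margin on either side of the reference is only 5 mV, which is why the reference level and the accumulation time have to be chosen together.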

4.2. Full Adder (FA)

Having designed a majority gate using FTJs, we need to design a full adder to implement the circuit depicted in Figure 1. A full adder can be implemented using majority gates. In the literature, a full adder has been implemented in three logic levels (using 3-input majority gates [28]) or in two logic levels (using a 3-input majority gate followed by a 5-input majority gate [29]). However, both these methods incur additional latency and are not advantageous for our implementation, since a single majority operation takes ≈50 μs. Therefore, we formulated a strategy to obtain the Sum ($A \oplus B \oplus C_{in}$) from the majority gate output itself. As depicted in Figure 5, in six out of eight cases, $XOR(A, B, C_{in})$ is equal to $\overline{MAJ}(A, B, C_{in})$. Hence, we can derive the XOR (Sum of the FA) from the MAJ output if we use a multiplexer to select between $MAJ(A, B, C_{in})$ and $\overline{MAJ}(A, B, C_{in})$. The “Control” signal of the multiplexer is high for the six cases, and thus $XOR(A, B, C_{in}) = \overline{MAJ}(A, B, C_{in})$. For the ($A, B, C_{in}$) = (000) and (111) cases, the “Control” signal goes low, and $XOR(A, B, C_{in}) = MAJ(A, B, C_{in})$.

The “Control” signal is generated from the inputs $A, B, C_{in}$ and $EN$, as depicted in Figure 5. Only for ($A, B, C_{in}$) = (000) or (111) is there a path between transistors P1 and N1. When $EN$ is low, the capacitance at node $c_{1}$ is charged to $V_{DD}$. When $EN$ goes high, transistor N1 is ON, and it discharges node $c_{1}$ if a path exists to ground. For the (000, 111) cases, a path does exist, node $c_{1}$ is discharged, and consequently, the “Control” signal goes low. Thus $XOR(A, B, C_{in}) = MAJ(A, B, C_{in})$. For the other six cases, there is no conducting path from $c_{1}$ to ground. Consequently, the “Control” signal remains HIGH, and thus $XOR(A, B, C_{in}) = \overline{MAJ}(A, B, C_{in})$. In this manner, we obtain the Sum output in the same step as the majority operation (with a few picoseconds of multiplexer delay), which helps in reducing the latency of the FA.
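The selection rule can be checked exhaustively with a small logic-level sketch in Python. It models only the Boolean behavior (majority, “Control”, and the 2:1 multiplexer), not the transistor-level circuit of Figure 5; the function names are illustrative.

```python
from itertools import product

def maj(a, b, c_in):
    """3-input majority, which is also the carry-out of a full adder."""
    return (a & b) | (b & c_in) | (a & c_in)

def control(a, b, c_in):
    """Low (0) only when all inputs agree, i.e., for 000 and 111; high otherwise."""
    return 0 if a == b == c_in else 1

def full_adder(a, b, c_in):
    """Sum is selected between MAJ and its complement by the Control signal."""
    m = maj(a, b, c_in)
    s = (1 - m) if control(a, b, c_in) else m
    return s, m  # (Sum, Cout)

for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    print(a, b, c_in, "->", s, c_out)
```

The multiplexer never waits on a second majority evaluation, so the Sum is available essentially as soon as the majority output is.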

5. Simulation Methodology and Results

5.1. Simulation of Ternary Adder

As stated in Section 4.1, the FTJ-based majority gate is designed in 130 nm CMOS technology. The full adder is constructed by augmenting the FTJ-based majority gate with a 2:1 multiplexer and the control circuitry to generate the “Control” signal for the multiplexer, as depicted in Figure 5. The FTJ model used in this circuit was fitted to NaMLab devices, which have a Tunneling Electro Resistance (TER) of ≈4. In the literature, devices with a higher TER of 30 [30] and 100 [31] have been reported. If FTJ devices with such a high TER are used for our adders, the majority gate latency (and hence the full adder latency) can be reduced significantly. Any n-trit adder can be constructed by cascading two stages of full adders (Figure 1). The 4-trit adder was constructed as depicted in Figure 6. It must be noted that the FTJ-based majority gate requires logic “1” to be 2.25 V at its input, as opposed to the operating $V_{DD}$ of 1.2 V. Hence the $Sum$ and $C_{out}$ of the first stage will be at 1.2 V if they are logic “1”, and they need to be boosted to 2.25 V for the second stage of full adders. Therefore, level shifters (LS) were introduced between the stages to shift the logic “1” from 1.2 V to 2.25 V. The circuit of Figure 6 was verified for functional correctness by simulation.

5.2. Energy, Latency and Power Consumption of Ternary Adder

We can analyze the energy for ternary addition by analyzing the energy required for 1-bit addition and scaling it proportionally. By simulating a 1-bit full adder in 130 nm CMOS, the average 1-bit addition energy was calculated to be 12.3 fJ (calculated by integrating the current drawn from $V_{DD}$, i.e., $V_{DD} \int I_{VDD} \cdot dt$). The low energy is due to the CMOS-like circuitry, except for the majority gate, which is built around a sense amplifier (SA). The SA consumes very low power since the cross-coupled latch is very energy efficient and does not conduct current for a long time during the sensing, i.e., the sense and reference paths conduct current from $V_{DD}$ to ground only for a duration of nanoseconds while either $SA_{out}$ or $\overline{SA_{out}}$ is discharged, and both paths are cut off after that (Figure 4), thus conserving energy. Since each trit position uses two full adders (one in each stage), the energy per trit is

$$Energy_{trit} = 2 \times 12.3\ \mathrm{fJ} = 24.6\ \mathrm{fJ} \tag{5}$$

Energy for 32-bit addition will be

$$Energy_{32\text{-}bit} = 32 \times 24.6\ \mathrm{fJ} = 787\ \mathrm{fJ} \tag{6}$$

The latency for 32-bit addition is straightforward since ternary addition is performed in $O(1)$ time, independent of the word length. Since a 1-bit FA requires 50 μs, the entire 32-bit addition is performed in 100 μs by cascading the two stages of FAs, as depicted in Figure 6. Having calculated the energy and latency, the power consumption is calculated as $\frac{Energy}{Latency}$ in watts (Table 2).
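The arithmetic behind Equations (5) and (6) and the resulting power figure can be reproduced with a short Python sketch. The two constants are the simulated values quoted in the text; the function name `adder_budget` and its interface are illustrative.

```python
E_FA_FJ = 12.3   # simulated energy of one full-adder operation, in fJ (from the text)
T_FA_US = 50.0   # latency of one full-adder stage, in microseconds (from the text)

def adder_budget(n_trits, stages=2):
    """Return (energy in fJ, latency in us, average power in nW) for an n-trit adder."""
    energy_trit_fj = stages * E_FA_FJ       # Equation (5): two full adders per trit
    energy_fj = n_trits * energy_trit_fj    # Equation (6): scales linearly with word length
    latency_us = stages * T_FA_US           # independent of the word length
    power_nw = (energy_fj * 1e-15) / (latency_us * 1e-6) * 1e9
    return energy_fj, latency_us, power_nw

energy_fj, latency_us, power_nw = adder_budget(32)
print(f"{energy_fj:.1f} fJ, {latency_us:.0f} us, {power_nw:.2f} nW")  # 787.2 fJ, 100 us, 7.87 nW
```

Because the latency stays fixed at two stages, the energy and hence the average power grow linearly with the word length in this simple model.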

5.3. Comparison with 32-bit Adders in Other Memory Technologies

In the literature, there are many reported adders built from emerging NVMs. These adders range from pure NVM-based (without any CMOS transistors) to hybrid structures (NVM and CMOS together). Furthermore, some NVM-based adders are “in-memory” adders, i.e., the adder is not constructed as a stand-alone circuit but is realized in the memory array. To enable a like-for-like comparison, we do not compare our results with such in-memory adders. Instead, we compare our adder with NVM adders proposed as dedicated adders. Table 2 lists the energy, latency, and power dissipation of NVM adders. The FTJ-based adder consumes very low energy compared to a pure CMOS adder or an SOT-DW-based adder. In terms of power consumption, the proposed FTJ-based adder consumes the lowest power of 7.8 nW. This is due to the CMOS-like functionality of the FTJ-based adder, where the FTJ is neither written nor switched during the entire computation (the majority operation and consequently the XOR operation are achieved as a READ operation of the FTJ device, Figure 4). This comes at the cost of speed, with our adder displaying a large latency (100 μs). This makes it more fitting for applications where reduced power consumption is the most stringent requirement and slow speeds can be tolerated. Such applications could include in-sensor processing where some processing is required but a latency in the μs range is acceptable, e.g., stress/strain sensors that regularly monitor the health of building structures, or battery-powered medical implants which monitor health but do not require real-time processing.

6. Conclusions

In this paper, a low-power adder is presented, which is constructed from FTJs and CMOS transistors. Since an FTJ has pA READ currents, any computing circuit built using these devices is slow. Therefore, a ternary architecture was used to limit the latency ($O(1)$ time complexity compared to $O(n)$ or $O(\log(n))$). The ternary adder uses two stages of full adders. The full adder is built from the majority gate and manipulates its output to implement the XOR operation. The majority operation is achieved using FTJ devices and CMOS transistors in an SA configuration. A 32-trit adder will consume 0.78 pJ, which is better than many other adders designed using emerging NVMs. However, the proposed adder cannot compete with other adders in terms of latency due to the unavoidably long time needed to read these devices (pA READ current). Even with this shortcoming, our adder consumes a power of 7.8 nW, which is the lowest among the compared adders designed using NVM devices, including a pure CMOS implementation. The proposed FTJ-based adder can be used in applications where low-power addition is the critical requirement and speed is not a strict requirement.

Author Contributions

Conceptualization, J.R., D.F., S.L. and S.S.; methodology, J.R.; software, S.L. and S.S.; validation, J.R.; formal analysis, Not applicable; investigation, J.R. and S.L.; resources, Not applicable; data curation, Not applicable; writing—original draft preparation, J.R. and S.L.; writing—review and editing, J.R., D.F., S.L. and S.S.; visualization, J.R. and S.L.; supervision, D.F. and S.S.; project administration, D.F. and S.S.; funding acquisition, D.F. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Deutsche Forschungsgemeinschaft (DFG)—Reconfigurable logic and Multi-bit in-memory processing with ferroelectric memristors—ReLoFeMris (Project number 441909639). The APC/cost of publication was supported by the Deutsche Forschungsgemeinschaft and Friedrich-Alexander-Universität Erlangen-Nürnberg within the funding program “Open Access Publication Funding”.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kim, Y.S.; Son, M.W.; Kim, K.M. Memristive Stateful Logic for Edge Boolean Computers. Adv. Intell. Syst. 2021, 3, 2000278.
2. Datta, G.; Kundu, S.; Yin, Z.; Lakkireddy, R.T.; Mathai, J.; Jacob, A.; Beerel, P.A.; Jaiswal, A.R. P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications. Sci. Rep. 2022, 12, 14396.
3. Lancaster, S.; Duong, Q.T.; Covi, E.; Mikolajick, T.; Slesazeck, S. Improvement of FTJ on-current by work function engineering for massive parallel neuromorphic computing. In Proceedings of the ESSCIRC 2022—IEEE 48th European Solid State Circuits Conference (ESSCIRC), Milan, Italy, 19–22 September 2022; pp. 137–140.
4. Reuben, J.; Pechmann, S. Accelerated Addition in Resistive RAM Array Using Parallel-Friendly Majority Gates. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 1108–1121.
5. Pan, Y.; Ouyang, P.; Zhao, Y.; Kang, W.; Yin, S.; Zhang, Y.; Zhao, W.; Wei, S. A Multilevel Cell STT-MRAM-Based Computing In-Memory Accelerator for Binary Convolutional Neural Network. IEEE Trans. Magn. 2018, 54, 9401305.
6. Barla, P.; Joshi, V.K.; Bhat, S. Fully non-volatile hybrid full adder based on SHE+STT-MTJ/CMOS LIM architecture. IEEE Trans. Magn. 2022, 58, 3401311.
7. Xiao, T.P.; Bennett, C.H.; Hu, X.; Feinberg, B.; Jacobs-Gedrim, R.; Agarwal, S.; Brunhaver, J.S.; Friedman, J.S.; Incorvia, J.A.C.; Marinella, M.J. Energy and Performance Benchmarking of a Domain Wall-Magnetic Tunnel Junction Multibit Adder. IEEE J. Explor.-Solid-State Comput. Devices Circuits 2019, 5, 188–196.
8. Slesazeck, S.; Havel, V.; Breyer, E.; Mulaosmanovic, H.; Hoffmann, M.; Max, B.; Duenkel, S.; Mikolajick, T. Uniting The Trinity of Ferroelectric HfO₂ Memory Devices in a Single Memory Cell. In Proceedings of the 2019 IEEE 11th International Memory Workshop (IMW), Monterey, CA, USA, 12–15 May 2019.
9. Metze, G.; Robertson, J.E. Elimination of carry propagation in digital computers. In Proceedings of the IFIP Congress, Paris, France, 15–20 June 1959.
10. Avizienis, A. Signed-Digit Number Representations for Fast Parallel Arithmetic. IRE Trans. Electron. Comput. 1961, EC-10, 389–400.
11. Reuben, J.; Fey, D. Carry-free Addition in Resistive RAM Array: n-bit Addition in 22 Memory Cycles. In Proceedings of the 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, 7–9 July 2021; pp. 157–163.
12. Rajashekhara, T.; Chen, I.S. A fast adder design using signed-digit numbers and ternary logic. In Proceedings of the IEEE Technical Conference on Southern Tier, Columbia, SC, USA, 17–21 February 1990; pp. 187–194.
13. Kornerup, P. Reviewing High-Radix Signed-Digit Adders. IEEE Trans. Comput. 2015, 64, 1502–1505.
14. Fey, D.; Reichenbach, M.; Söll, C.; Biglari, M.; Röber, J.; Weigel, R. Using Memristor Technology for Multi-Value Registers in Signed-Digit Arithmetic Circuits. In Proceedings of the Second International Symposium on Memory Systems, MEMSYS ’16, Alexandria, VA, USA, 3–6 October 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 442–454.
15. Ercegovac, M.D.; Lang, T. CHAPTER 2—Two-Operand Addition. In Digital Arithmetic; Ercegovac, M.D., Lang, T., Eds.; The Morgan Kaufmann Series in Computer Architecture and Design; Morgan Kaufmann: San Francisco, CA, USA, 2004; pp. 50–135.
16. Schenk, T.; Pešić, M.; Slesazeck, S.; Schroeder, U.; Mikolajick, T. Memory technology—A primer for material scientists. Rep. Prog. Phys. 2020, 83, 086501.
17. Cheema, S.S.; Shanker, N.; Hsu, C.H.; Datar, A.; Bae, J.; Kwon, D.; Salahuddin, S. One Nanometer HfO₂-Based Ferroelectric Tunnel Junctions on Silicon. Adv. Electron. Mater. 2022, 8, 2100499.
18. Covi, E.; Duong, Q.T.; Lancaster, S.; Havel, V.; Coignus, J.; Barbot, J.; Richter, O.; Klein, P.; Chicca, E.; Grenouillet, L.; et al. Ferroelectric Tunneling Junctions for Edge Computing. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021.
19. Max, B.; Mikolajick, T.; Hoffmann, M.; Slesazeck, S. Retention characteristics of Hf_0.5Zr_0.5O₂-based ferroelectric tunnel junctions. In Proceedings of the 2019 IEEE 11th International Memory Workshop (IMW), Monterey, CA, USA, 12–15 May 2019.
20. Huang, H.H.; Wu, T.Y.; Chu, Y.H.; Wu, M.H.; Hsu, C.H.; Lee, H.Y.; Sheu, S.S.; Lo, W.C.; Hou, T.H. A comprehensive modeling framework for ferroelectric tunnel junctions. In Proceedings of the 2019 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 7–11 December 2019.
21. Preisach, F. Über die magnetische Nachwirkung. Z. Phys. 1935, 94, 277–302.
22. Fontanini, R.; Barbot, J.; Segatto, M.; Lancaster, S.; Duong, Q.; Driussi, F.; Grenouillet, L.; Triozon, L.; Coignus, J.; Mikolajick, T.; et al. Interplay Between Charge Trapping and Polarization Switching in BEOL-Compatible Bilayer Ferroelectric Tunnel Junctions. IEEE J. Electron Devices Soc. 2022, 10, 593–599.
23. Miller, S.; Nasby, R.; Schwank, J.; Rodgers, M.; Dressendorfer, P. Device modeling of ferroelectric capacitors. J. Appl. Phys. 1990, 68, 6463–6471.
24. Jiang, B.; Zurcher; Jones; Gillespie; Lee. Computationally efficient ferroelectric capacitor model for circuit simulation. In Proceedings of the 1997 Symposium on VLSI Technology, Taipei, Taiwan, 3–5 June 1997; pp. 141–142.
25. Gibertini, P.; Fehlings, L.; Lancaster, S.; Duong, Q.T.; Mikolajick, T.; Dubourdieu, C.; Slesazeck, S.; Covi, E.; Deshpande, V. A Ferroelectric Tunnel Junction-based Integrate-and-Fire Neuron. In Proceedings of the 2022 IEEE International Conference on Electronics Circuits and Systems (ICECS), Glasgow, UK, 24–26 October 2022.
26. Lederer, M.; Olivo, R.; Lehninger, D.; Abdulazhanov, S.; Kämpfe, T.; Kirbach, S.; Mart, C.; Seidel, K.; Eng, L.M. On the Origin of Wake-Up and Antiferroelectric-Like Behavior in Ferroelectric Hafnium Oxide. Phys. Status Solidi (RRL)–Rapid Res. Lett. 2021, 15, 2100086.
27. Lancaster, S.; Mikolajick, T.; Slesazeck, S. A multi-pulse wakeup scheme for on-chip operation of devices based on ferroelectric doped HfO₂ thin films. Appl. Phys. Lett. 2022, 120, 022901.
28. Lakshmi, V.; Reuben, J.; Pudi, V. A Novel In-Memory Wallace Tree Multiplier Architecture Using Majority Logic. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 1148–1158.
29. Jiang, H.; Angizi, S.; Fan, D.; Han, J.; Liu, L. Non-Volatile Approximate Arithmetic Circuits Using Scalable Hybrid Spin-CMOS Majority Gates. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 1217–1230.
30. Kobayashi, M.; Tagawa, Y.; Mo, F.; Saraya, T.; Hiramoto, T. Ferroelectric HfO₂ Tunnel Junction Memory With High TER and Multi-Level Operation Featuring Metal Replacement Process. IEEE J. Electron Devices Soc. 2019, 7, 134–139.
31. Ma, Z.; Zhang, Q.; Valanoor, N. A perspective on electrode engineering in ultrathin ferroelectric heterostructures for enhanced tunneling electroresistance. Appl. Phys. Rev. 2020, 7, 041316.
32. Pan, C.; Naeemi, A. An Expanded Benchmarking of Beyond-CMOS Devices Based on Boolean and Neuromorphic Representative Circuits. IEEE J. Explor.-Solid-State Comput. Devices Circuits 2017, 3, 101–110.
33. Deng, E.; Zhang, Y.; Kang, W.; Dieny, B.; Klein, J.O.; Prenat, G.; Zhao, W. Synchronous 8-bit Non-Volatile Full-Adder based on Spin Transfer Torque Magnetic Tunnel Junction. IEEE Trans. Circuits Syst. I Regul. Pap. 2015, 62, 1757–1765.

Figure 1. Two 4-trit numbers $A_{3} A_{2} A_{1} A_{0}$ and $B_{3} B_{2} B_{1} B_{0}$ are added with two levels of full adders in constant time ($O(1)$), irrespective of the wordlength. Each input trit is encoded as two bits, a left bit and a right bit, using the coding depicted in the table at the top right. After being processed using two stages of binary full adders, each pair of sum bits ($S_{0}^{L}$ and $S_{0}^{R}$ represent the left and right bits) is again decoded as a single trit to get the 5-trit sum $A+B$. Note that, after addition, both (Left bit, Right bit) = (0,1) and (Left bit, Right bit) = (1,0) are decoded as trit “0”.
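
The two-stage scheme in the Figure 1 caption can be sketched in software. The following is a minimal sketch under stated assumptions, not the authors' circuit: it assumes radix-2 signed digits with trit value = left bit + right bit - 1 (so (0,0) is -1 and (1,1) is +1; only the decoding of (0,1) and (1,0) as "0" is stated in the caption), and the wiring between the stages, the constant `carry_in = 1` and the constant left bit at position 0 are hypothetical choices made so that the arithmetic works out. All function names are illustrative.

```python
def full_adder(a, b, c):
    """Binary full adder: returns (carry, sum); the carry is the 3-input majority."""
    total = a + b + c
    return total >> 1, total & 1

def encode(trit):
    """Assumed input coding: -1 -> (0,0), 0 -> (0,1), +1 -> (1,1)."""
    return {-1: (0, 0), 0: (0, 1), 1: (1, 1)}[trit]

def decode(left, right):
    """Both (0,1) and (1,0) decode to trit 0, as noted in the caption."""
    return left + right - 1

def sd_add(a_trits, b_trits):
    """Add two n-trit numbers (least significant trit first); returns n+1 trits."""
    n = len(a_trits)
    a = [encode(t) for t in a_trits]
    b = [encode(t) for t in b_trits]
    # Stage 1: one full adder per position, independent of all other positions.
    stage1 = [full_adder(a[i][0], a[i][1], b[i][0]) for i in range(n)]
    # Stage 2: each full adder needs only the stage-1 carry of the position below,
    # so no carry ripples through the word and the depth stays at two full adders.
    carry_in = 1  # assumed constant that absorbs the -1 offset of the coding
    stage2 = []
    for i in range(n):
        c_prev = stage1[i - 1][0] if i > 0 else carry_in
        stage2.append(full_adder(stage1[i][1], b[i][1], c_prev))
    left = [0] + [c for c, _ in stage2]               # stage-2 carries, shifted up by one
    right = [s for _, s in stage2] + [stage1[-1][0]]  # stage-2 sums plus the top stage-1 carry
    return [decode(l, r) for l, r in zip(left, right)]

def value(trits):
    """Integer value of a signed-digit word, least significant trit first."""
    return sum(t * 2**i for i, t in enumerate(trits))
```

For example, `value(sd_add([1, 0, -1, 1], [1, 1, 0, -1]))` equals `value([1, 0, -1, 1]) + value([1, 1, 0, -1])`. Because every output trit depends on at most two neighbouring positions, the delay does not grow with the wordlength in this sketch.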

Figure 2. FTJ: Device structure and switching.

Figure 3. (a) Polarization-voltage switching characteristics and (b) current-voltage leakage characteristics of a bilayer HZO/Al₂O₃ FTJ device with one TiAlN electrode. Key parameters for modelling the switching are labelled in (a).

Figure 4. FTJ-based majority gate proposed in this work: The inputs to the majority gate are applied to three FTJs on the left, and the output is available as a voltage at node $SA_{out}$.

Figure 5. Full adder implemented by MAJ gate and manipulating its output: Sum ($XOR$) is achieved by conditionally selecting between $MAJ$ and $\overline{MAJ}$.
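
The selection between $MAJ$ and its complement can be illustrated with a short truth-table sketch. The select condition used here (keep $MAJ$ when all three inputs agree, otherwise take the complement) is an assumption chosen because it reproduces the three-input XOR; the condition actually used in the circuit of Figure 5 may be derived differently. The function names are illustrative.

```python
def maj(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)

def full_adder_from_maj(a, b, c_in):
    """Full adder whose sum is obtained by selecting MAJ or its complement."""
    carry = maj(a, b, c_in)
    # Assumed select condition: the inputs are all 0 or all 1.
    all_equal = (a == b == c_in)
    # When all inputs agree, XOR equals MAJ; in every other case it is the complement.
    total = carry if all_equal else 1 - carry
    return carry, total
```

Checking all eight input combinations confirms that `2 * carry + total` equals `a + b + c_in`, so a single majority evaluation plus a selection step yields both outputs of the full adder.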

Figure 6. Simulated 4-trit ternary adder made of full adders: Since inputs to the FA require logic “1” to be 2.25 V, Level Shifters (LS) are used to shift 1.2 V ($V_{DD}$) to 2.25 V.

Table 1. Majority gate implemented using sense amplifier.

A	B	$C_{in}$	$MAJ (A, B, C_{in})$	$n_{1}$	$n_{2}$	${SA}_{out}$
0	0	0	0	300 mV	315 mV	0
0	0	1	0	310 mV	315 mV	0
0	1	0	0	310 mV	315 mV	0
0	1	1	1	320 mV	315 mV	1
1	0	0	0	310 mV	315 mV	0
1	0	1	1	320 mV	315 mV	1
1	1	0	1	320 mV	315 mV	1
1	1	1	1	330 mV	315 mV	1
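
The pattern in Table 1 can be captured in a small behavioural model. This is a sketch that only restates the tabulated values, not a circuit simulation: it assumes that the voltage at $n_{1}$ depends only on how many inputs are at logic 1 (300 mV plus 10 mV per input, as the rows suggest) and that the sense amplifier outputs 1 whenever $n_{1}$ exceeds the 315 mV reference at $n_{2}$. The names `n1_mv` and `sa_out` are illustrative.

```python
N2_REF_MV = 315  # reference voltage at node n2, taken from Table 1

def n1_mv(a, b, c_in):
    """Voltage at node n1 in mV, read off Table 1: 300 mV plus 10 mV per input at 1."""
    return 300 + 10 * (a + b + c_in)

def sa_out(a, b, c_in):
    """Sense-amplifier decision: 1 when n1 is above the n2 reference."""
    return int(n1_mv(a, b, c_in) > N2_REF_MV)
```

With the reference placed between the two-input level (320 mV) and the one-input level (310 mV), the comparison reproduces the majority column of the table with a 5 mV margin on either side.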

Table 2. Performance of 32-bit NVM-based adders.

NVM Device	Energy	Latency	Power	Ref.
ASL-PMA	3.68 pJ	29 ns	0.12 mW	[32]
STT-MRAM	4.2 pJ	0.6 ns	7 mW	[33]
SOT-DW	0.031 pJ	495 ns	62.6 nW	[7]
FTJ	0.78 pJ	100 μs	7.8 nW	This work
CMOS	0.1 pJ	0.4 ns	0.25 mW	[7]

ASL-PMA: All Spin Logic-Perpendicular Magnetic Anisotropy; SOT-DW: Spin-Orbit Torque-Domain Wall.
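
The power column of Table 2 is consistent with average power computed as energy divided by latency. The short check below is a sketch that reuses the table values, converted to joules and seconds; the dictionary layout and the function name are illustrative.

```python
# (energy in joules, latency in seconds) for each row of Table 2
ADDERS = {
    "ASL-PMA": (3.68e-12, 29e-9),
    "STT-MRAM": (4.2e-12, 0.6e-9),
    "SOT-DW": (0.031e-12, 495e-9),
    "FTJ": (0.78e-12, 100e-6),
    "CMOS": (0.1e-12, 0.4e-9),
}

def average_power_w(energy_j, latency_s):
    """Average power in watts over one addition."""
    return energy_j / latency_s

POWER_W = {name: average_power_w(e, t) for name, (e, t) in ADDERS.items()}
LOWEST = min(POWER_W, key=POWER_W.get)  # adder with the lowest average power
```

For the FTJ row this gives 0.78 pJ / 100 μs = 7.8 nW. The FTJ adder does not have the lowest energy per addition in the table; its low average power follows from spreading that energy over a much longer latency.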

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
