*Article* **Silicon-Compatible Memristive Devices Tailored by Laser and Thermal Treatments**

**Maria N. Koryazhkina 1,\*, Dmitry O. Filatov 1, Stanislav V. Tikhov 1, Alexey I. Belov 1, Dmitry S. Korolev 1, Alexander V. Kruglov 1, Ruslan N. Kryukov 1, Sergey Yu. Zubkov 1, Vladislav A. Vorontsov 1, Dmitry A. Pavlov 1, David I. Tetelbaum 1, Alexey N. Mikhaylov 1, Sergey A. Shchanikov 2, Sungjun Kim <sup>3</sup> and Bernardo Spagnolo 1,4**


**Abstract:** Nowadays, memristors are of considerable interest to researchers and engineers due to the promise they hold for the creation of power-efficient memristor-based information or computing systems. In particular, this refers to memristive devices based on the resistive switching phenomenon, which in most cases are fabricated in the form of metal–insulator–metal structures. At the same time, the demand for compatibility with the standard fabrication process of complementary metal–oxide semiconductors makes it relevant from a practical point of view to fabricate memristive devices directly on a silicon or SOI (silicon on insulator) substrate. Here we have investigated the electrical characteristics and resistive switching of SiO*x*- and SiN*x*-based memristors fabricated on SOI substrates and subjected to additional laser treatment and thermal treatment. The investigated memristors do not require electroforming and demonstrate a synaptic type of resistive switching. It is found that the parameters of resistive switching of SiO*x*- and SiN*x*-based memristors on SOI substrates are remarkably improved. In particular, the laser treatment gives rise to a significant increase in the hysteresis loop in *I*–*V* curves of SiN*x*-based memristors. Moreover, for SiO*x*-based memristors, the thermal treatment used after the laser treatment produces a notable decrease in the resistive switching voltage.

**Keywords:** memristor; silicon oxide; silicon nitride; SOI technology; resistive switching; electrical characteristics; laser treatment; thermal treatment

#### **1. Introduction**

A memristor is a two-terminal nanoelectronic element that changes and remembers its resistance depending on the applied voltage and the charge flowing through it. Its main difference from semiconductor memory elements, which implement a binary code and two stable states, is the multilevel, synaptic nature of the conduction switching [1]. It is believed that this will make it possible to create next-generation computers (with a non-von-Neumann architecture) and neuromorphic artificial intelligence systems on the basis of memristors [2–12]. The main disadvantages of memristors fabricated in the form of metal–insulator–metal (MIM) or metal–insulator–semiconductor (MIS) structures are the reproducibility of resistive switching (RS) parameters, which is insufficient for practical

**Citation:** Koryazhkina, M.N.; Filatov, D.O.; Tikhov, S.V.; Belov, A.I.; Korolev, D.S.; Kruglov, A.V.; Kryukov, R.N.; Zubkov, S.Y.; Vorontsov, V.A.; Pavlov, D.A.; et al. Silicon-Compatible Memristive Devices Tailored by Laser and Thermal Treatments. *J. Low Power Electron. Appl.* **2022**, *12*, 14. https://doi.org/10.3390/jlpea12010014

Academic Editors: Alex Serb and Adnan Mehonic

Received: 21 October 2021 Accepted: 28 February 2022 Published: 2 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


use (stochasticity), high values of RS voltages, and the complexity of integration into a standard complementary metal–oxide–semiconductor (CMOS) fabrication process. Currently, approaches to solving these problems are being developed: the use of new materials and various interfaces [13–16], the use of signals with a special shape for RS [17], the use of optical radiation [18] or noise [19–21] as parameters controlling the switching dynamics, programming the amplitudes and durations of switching pulses [22,23], etc. Indeed, the wider application of memristors is limited by their insufficient stability, the high variability of RS parameters, and a lack of understanding of the drift–diffusion processes responsible [24]. One of the fundamental origins of the instability of the memristor parameters is the essentially stochastic nature of RS [20]. Furthermore, the noise sources can induce new ordered dynamical structures and cause new phase transition phenomena [25,26]. Therefore, of all these approaches, the one based on the constructive role of both internal (thermal) and external noise sources is most promising due to the intrinsic stochastic nature of resistive switching in memristor devices [19–21].

Transition metal oxides (e.g., HfO*x* [27,28], TaO*x* [29,30], ZrO*x* [31,32], TiO*x* [33,34], and more complex compounds such as perovskites [35]), as well as SiO*x* and GeO*x*, are considered promising insulator materials for memristors. Recently, intensive research has also been carried out on memristive structures based on SiN*x* [36,37]. This is of practical interest due to their compatibility with the standard technology for creating modern integrated circuits. The use of SiO*x* and SiN*x* insulator films involves a number of practical advantages. For example, the authors of [38] carried out a comprehensive comparison of the RS parameters of memristive structures based on HfO*x* and SiO*x* and showed a lower variability in resistance in different states of SiO*x*-based memristors. In turn, the authors of [39] demonstrated the absence of changes in the value of currents in resistive states of SiN*x*-based memristors irradiated with As<sup>+</sup> during 10<sup>5</sup> cycles of RS and the minimum variability of switching voltages. It should be noted that SiO*x*- and SiN*x*-based memristors have a filamentary resistive switching mechanism [40,41].

The use of a semiconductor as one of the electrodes of a memristive structure is also important from the point of view of integrating memristors into a standard CMOS fabrication process [42,43]. One of the electrodes is silicon, which simplifies the technological process and allows the memory to be integrated monolithically on a single platform with a transistor [44,45]. In these articles, bulk silicon was used as an electrode. However, in the preparation of most semiconductor devices and microcircuits, preference is given to "silicon-on-insulator" (SOI) substrates due to their advantages over bulk silicon: lower power consumption and higher performance and density of elements [46]. Therefore, from a practical point of view, it is advisable to implement memristive structures on SOI substrates. Despite the significant number of published studies of memristive structures with a bulk silicon electrode, structures on SOI substrates must be studied independently due to the peculiarities of the morphology and structure of the latter. Thus, the development and investigation of memristive structures in which the SOI substrate acts as an electrode is of considerable theoretical and practical interest. However, such data are nearly absent in the literature, except for several separate reports on the use of SOI in memristive devices (see, for example, [47–50]).

Despite the practical advantages of using a semiconductor as an electrode in a memristive structure, one should not forget the presence of surface states (SS) at the insulator/semiconductor interface, which is undesirable from the point of view of creating memristive structures. These states make a significant contribution to the total serial resistance of the structure [51]. A decrease in their density leads to a redistribution of the external voltage, so that the electric field strength in the insulator increases, thereby stimulating resistive switching. Thermal treatment (TT) is a widely used method of dealing with such defects. Laser treatment (LT) can be used for the same purposes. In the latter case, the effect is achieved due to heating of the substrate because of the absorption of laser radiation in it. In addition, LT is used to modify the charge state of an insulator in a flash memory device, which is used to completely erase information in memory elements [52]. Therefore, LT and TT can be effectively used to change the electrical characteristics of memristive structures.

We propose a comprehensive approach to improving the parameters of RS: namely, increasing the resistance ratio in extreme resistive states and decreasing the RS voltages of memristive structures based on promising and accessible insulator layers—SiO*<sup>x</sup>* and SiN*x*, fabricated under industrial conditions on SOI substrates. This approach is based not only on the use of materials that are standard for the CMOS fabrication process, but also on the use of LT and TT, which are widely used in the microelectronic industry to control the electrical parameters of devices. In addition, the investigation of the frequency dependences of electrical characteristics of memristive structures carried out in this work makes it possible to obtain the necessary detailed information about the processes occurring in the insulator film and about the state of insulator/semiconductor interfaces in different resistive states [53]. Data in this paper are presented in the same order in which they were obtained, so that the reader can unambiguously determine the contribution of LT and TT to the change in resistive switching parameters.

To the best of our knowledge, such a comprehensive study of SiO*x*- and SiN*x*-based memristive structures fabricated on SOI substrates, including the influence of LT and TT on their electrical characteristics (RS parameters), has not been carried out previously.

#### **2. Materials and Methods**

SiO*x* and SiN*x* films (with a nominal thickness of 13 nm each) were deposited on commercial SOI substrates with a device layer thickness of 360 nm by plasma-enhanced chemical vapor deposition under the following conditions:


Top Au electrodes (20 nm) with a Zr sublayer (8 nm) and an area of *S* ~ 10<sup>−2</sup> cm<sup>2</sup> (used in this study) or 10<sup>−3</sup> cm<sup>2</sup> were deposited on the surface of insulators by magnetron sputtering at a temperature of 473 K. A schematic representation of the fabricated structures is shown in Figure 1. The devices were prepared in the form of a metal–insulator–semiconductor sandwich with a common bottom electrode (SOI) and local top (Au with a Zr sublayer) electrodes. Figure 2 is an optical image of a fragment of the device showing two top electrodes of a small area and one of a larger area. The optical image was obtained using a Leica DM 4000 M optical microscope (Wetzlar, Germany).

The electrical characteristics were measured using a semiconductor device parameter analyzer, Agilent B1500A (Santa Rosa, CA, USA). The sign of voltage across the structures corresponded to the potential of the top electrode relative to the potential of the bottom electrode. *I*–*V* curves and the small-signal *C*–*f*, *G*–*f*, and *R*–*f* characteristics of memristors were measured in parallel and series capacitor equivalent resistor–capacitor circuits (see Figure 1 for explanation) [54] in the frequency range 10<sup>3</sup>–2 × 10<sup>6</sup> Hz. The values of parallel capacitance (*Cp*), parallel conductance loss (*Gp*/*ω*), dielectric loss tangent (tg*δ*), parallel (*Rp*), and series (*Rs*) resistances were determined. The parameters of the parallel capacitor equivalent circuit are determined by the electronic phenomena in an insulator, while the parameters of the series capacitor equivalent circuit are determined by the resistance of the electrodes and that of the transition layer between the electrode and insulator film [54].
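The parallel and series equivalent circuits describe the same impedance at a given frequency, so their parameters are linked by a standard textbook conversion. The sketch below is illustrative only: the function name `parallel_to_series` and the sample values are hypothetical and are not taken from the measurements reported here.

```python
import math


def parallel_to_series(c_p, g_p, freq_hz):
    """Convert a parallel (Cp, Gp) reading at one frequency into
    tg(delta), Rp and the equivalent series pair (Cs, Rs).

    c_p in farads, g_p in siemens, freq_hz in hertz.
    """
    omega = 2.0 * math.pi * freq_hz
    tan_delta = g_p / (omega * c_p)      # tg(delta) = Gp / (omega * Cp)
    r_p = 1.0 / g_p                      # parallel resistance
    c_s = c_p * (1.0 + tan_delta ** 2)   # series capacitance
    r_s = r_p * tan_delta ** 2 / (1.0 + tan_delta ** 2)  # series resistance
    return {"tan_delta": tan_delta, "Rp": r_p, "Cs": c_s, "Rs": r_s}


# Hypothetical reading chosen so that tg(delta) = 1 at 1 kHz.
FREQ_HZ = 1.0e3
CP_F = 1.0e-9
GP_S = 2.0 * math.pi * FREQ_HZ * CP_F
converted = parallel_to_series(CP_F, GP_S, FREQ_HZ)
print(converted)
```

With tg*δ* = 1 the series capacitance is twice the parallel one and the series resistance is half the parallel one, which is a convenient sanity check for the conversion.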

The information on relaxation processes in the insulator was obtained by analyzing the Cole–Cole diagrams—the dependences of *Gp*/*ω* on *Cp*, which were obtained from corresponding frequency dependences [55]. As shown below, the obtained diagrams were either a circular arc or a semicircle. Thus, in the first case, the spectrum of SS at the insulator/semiconductor interface was continuous, while the second case indicates the presence of a mono-level of SS. An analysis of the Cole–Cole diagrams makes it possible to estimate the effective density of SS at the Fermi level (*Nss*). In the case of a continuous spectrum of SS, for such an estimate, one can use the following equation [56]:

$$N_{ss} = \frac{\left[G_p/\omega\right]_{\max}}{0.4q^2S},\tag{1}$$

where [*Gp*/*ω*]*max* is the maximum value of parallel conductance loss, *q* is the electron charge, and *S* is the structure area (i.e., the area of the top electrode). In the case of a mono-level of SS, one can use the following equation [56]:

$$N_{ss} = \frac{8kT\left[G_p/\omega\right]_{\max}}{q^2S},\tag{2}$$

where *k* is the Boltzmann constant and *T* is the temperature.
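As a minimal numerical sketch of Equations (1) and (2), the functions below evaluate both estimates in SI units. The function names and the sample value of [*Gp*/*ω*]*max* are hypothetical. The conversion of the result of Equation (1) from J<sup>−1</sup> to eV<sup>−1</sup> (multiplying by the numerical value of *q*) is an assumption about how the units are handled, since the text quotes *Nss* in cm<sup>−2</sup>eV<sup>−1</sup>.

```python
Q = 1.602176634e-19   # electron charge, C
K_B = 1.380649e-23    # Boltzmann constant, J/K


def nss_continuous(gp_over_omega_max, area_cm2):
    """Equation (1), continuous spectrum of SS.

    gp_over_omega_max in farads, area_cm2 in cm^2.
    Returns the density in cm^-2 eV^-1.
    """
    per_joule = gp_over_omega_max / (0.4 * Q ** 2 * area_cm2)  # cm^-2 J^-1
    return per_joule * Q  # 1 eV corresponds to Q joules


def nss_mono_level(gp_over_omega_max, area_cm2, temperature_k):
    """Equation (2), mono-level of SS. Returns the density in cm^-2."""
    return 8.0 * K_B * temperature_k * gp_over_omega_max / (Q ** 2 * area_cm2)


# Hypothetical peak loss of 1.2 nF on an electrode of 1e-2 cm^2.
GP_OVER_OMEGA_MAX = 1.2e-9
AREA_CM2 = 1.0e-2
nss_cont = nss_continuous(GP_OVER_OMEGA_MAX, AREA_CM2)
nss_mono = nss_mono_level(GP_OVER_OMEGA_MAX, AREA_CM2, 300.0)
print(f"continuous spectrum: {nss_cont:.2e} cm^-2 eV^-1")
print(f"mono-level at 300 K: {nss_mono:.2e} cm^-2")
```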

In addition, capacitance–voltage and conductance–voltage characteristics were measured in a parallel capacitor equivalent resistor–capacitor circuit at small test signal frequencies of 10 and 100 kHz.

It should be noted that the investigated memristive structures were initially in a conductive state. The electrical characteristics of the memristive structures were investigated in the initial state (IS), in the low-resistance state (LRS), and in the high-resistance state (HRS).

**Figure 1.** Schematic representation of SiO*x*- and SiN*x*-based memristors and the simplest capacitor equivalent resistor–capacitor circuits.

**Figure 2.** Optical image of a fragment of the device.

As mentioned above, LT can be used to change the electrical characteristics of memristive structures, which determine RS parameters. It is assumed that LT will make it possible to reduce the built-in charge in insulator films and to lower the density of SS at the insulator/semiconductor interface in memristive structures. Therefore, some memristive structures were subjected to LT. For this, a semiconductor laser with a power of 1.5 W and a wavelength of 460 nm, which corresponds to a photon energy of 2.7 eV, was used in the continuous mode. Irradiation was carried out through the top electrode for 10 min. It should be noted that the top Au electrodes with a thickness of 20 nm were semitransparent for the laser wavelength used [57]. Under the influence of laser radiation, the structure heats up to ≈473 K.
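The quoted photon energy follows from *E* = *hc*/*λ*. The short sketch below reproduces that arithmetic and also evaluates the optical energy emitted during the 10 min exposure. The latter is only an upper bound on what the structure absorbs, since reflection and transmission are not accounted for; the function name is hypothetical.

```python
H = 6.62607015e-34    # Planck constant, J*s
C = 299792458.0       # speed of light, m/s
Q = 1.602176634e-19   # electron charge, C (J per eV)


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a wavelength given in nanometres."""
    return H * C / (wavelength_nm * 1.0e-9) / Q


energy_ev = photon_energy_ev(460.0)
emitted_energy_j = 1.5 * 10 * 60   # laser power (W) times exposure time (s)
print(f"photon energy: {energy_ev:.2f} eV")
print(f"emitted optical energy: {emitted_energy_j:.0f} J")
```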

Thermal treatment is a widely used method for changing the density of SS at the insulator/semiconductor interface. Therefore, in order to improve the state of this interface, some of the SiO*x*-based memristive structures were subjected to TT. For this purpose, memristive structures were placed in a hermetically closed metal thermostat, which was slowly heated at a rate of 13.5 K/min using an electric heater or cooled with liquid nitrogen. Investigations of electrical characteristics were carried out in a temperature range of 77–600 K in an atmosphere dried with silica gel. The temperature was maintained with an accuracy of 1 K.

Structural investigations of the SiO*x* and SiN*x* films and memristive structures based on them were carried out by X-ray photoelectron spectroscopy (XPS) and transmission electron microscopy (TEM). The profiling of samples using the XPS method implies the use of ion etching. The question arises of the correct determination of the etching rate and the existence of an error in determining the depth. If the rate can be determined using calibration samples, then the error is determined for each sample separately. A large contribution to the error when determining the depth is made by irregularities on the surface of the sample, due to which shading occurs during the etching process [58]. For correct interpretation of the data, information on the roughness obtained by atomic force microscopy (AFM) was used.

#### **3. Results and Discussion**

#### *3.1. SiOx-Based Memristive Structures on SOI Substrates*

According to the AFM data (Figure 3a), the root mean square roughness of the SiO*x* film is 1.8 nm. Figure 3b presents XPS data for SiO*x* films before and after annealing at 550 K. The stoichiometry of the SiO*x* film is barely affected by annealing and corresponds to *x* ≈ 1.8. A transition layer with a thickness of ~15 nm is also visible at the SiO*x*/SOI interface.

**Figure 3.** (**a**) AFM image of SiO*x* film surface; (**b**) distribution of chemical elements over depth of SiO*x* film before and after annealing at 550 K. The origin of the coordinates along the abscissa coincides with the SiO*x*/SOI interface.

Figure 4 shows TEM images of a cross section of a SiO*x*-based memristive structure after LT and TT. According to Figure 4, the SiO*x* film has an amorphous structure. At the same time, Si (area 4), ZrO (areas 1 and 3) and ZrO<sub>2</sub> (area 2) nanocrystallites were found in the Zr sublayer and at the interface with the insulator. The structure of the observed nanocrystallites was determined by comparing the interplanar spacing in TEM images with the literature data. This indicates partial oxidation of the Zr electrode and reduction of the silicon oxide in contact with this electrode during the treatments.

**Figure 4.** High-resolution TEM images of two cross-sectional regions of a SiO*x*-based memristive structure after LT and TT. The inset shows scaled images of the nanocrystallites (parts that were used to determine the interplanar spacing are highlighted by yellow rectangles).

The SiO*x*-based memristive structures before LT and TT did not require electroforming [59], since initially they had a conductive state (Figure 5a, curve 1). When a voltage of −6 V was applied, the memristive structure switched from LRS to HRS (Figure 5a, curve 2). Subsequent application of a voltage of +6 V did not lead to switching of the structure (Figure 5a, curve 3). In the absence of switching (Figure 5a, curves 1 and 3), the values of the current through the device in the forward and reverse directions of the voltage sweep hardly differed.

**Figure 5.** *I*–*V* curves of SiO*x*-based memristive structure (**a**) before LT and TT, (**b**) after LT, (**c**) after TT and (**d**) after multiple RS. In the absence of switching (Figure 5a, curves 1 and 3), the values of the current through the device in the forward and reverse directions of the voltage sweep almost did not differ. The direction of the voltage sweep is shown by arrows.

The frequency dependences of the equivalent circuit parameters of memristive structures in IS (i.e., for curve 1 in Figure 5a) and HRS (i.e., for curve 2 in Figure 5a) are shown in Figure 6. The structure in IS is characterized by large ohmic losses at low frequencies (Figure 6, curves 2, 3) and a low parallel resistance *Rp* shunting the structure (Figure 6, curve 5). After switching into HRS, the losses decreased by three orders of magnitude (Figure 6, curves 7, 8), and the value of *Rp* correspondingly increased by three orders of magnitude (Figure 6, curve 10).

Note that the values of the relative permittivity of SiO*x* films calculated from the value of *Cp* by the equation for a parallel plate capacitor at a frequency of 1 kHz do not change with RS, while the value of tg*δ* changes by three orders of magnitude. This behavior of low-signal HF parameters indicates the filamentary mechanism of RS [60]. In this case, the active part of the film impedance changes locally, i.e., on a small (compared to the total electrode area) memristor area, while the resistance and dielectric losses remain almost unchanged for the rest of the film under the electrode.
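The parallel-plate estimate mentioned above is *ε* = *Cp*·*d*/(*ε*<sub>0</sub>·*S*). A minimal sketch follows, using the nominal film thickness (13 nm) and electrode area (10<sup>−2</sup> cm<sup>2</sup>) from Section 2. The capacitance value is hypothetical and serves only to show the arithmetic; the function name is likewise an assumption.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def relative_permittivity(c_p, thickness_m, area_m2):
    """Relative permittivity from the parallel-plate capacitor formula."""
    return c_p * thickness_m / (EPS0 * area_m2)


THICKNESS_M = 13.0e-9          # nominal film thickness, 13 nm
AREA_M2 = 1.0e-2 * 1.0e-4      # 1e-2 cm^2 converted to m^2
CP_HYPOTHETICAL_F = 2.5e-9     # hypothetical Cp at 1 kHz, not a measured value

eps_r = relative_permittivity(CP_HYPOTHETICAL_F, THICKNESS_M, AREA_M2)
print(f"relative permittivity: {eps_r:.2f}")
```

Because the formula uses the full electrode area, a local change in a small filament region leaves this estimate almost unchanged, which is consistent with the argument made above for a filamentary mechanism.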

**Figure 6.** Frequency dependences of (**a**) *Cp* (1, 6), *Gp*/*ω* (2, 7), tg*δ* (3, 8) and (**b**) *Rs* (4, 9), *Rp* (5, 10) obtained for memristive structure in IS (1–5) and HRS (6–10).

Dependences of *Cp* and *Gp*/*ω* on *V* (Figure 7) show that the semiconductor is *n*-type, since the capacitance at a frequency of 100 kHz (see inset in Figure 7b) has the form of a step that rises towards *V* > 0 [56]. The concentration of equilibrium electrons in a silicon electrode can be estimated using the following equation [61]:

$$N_D = \frac{2\left(2\varphi_0 - \frac{kT}{q}\right)}{\varepsilon_s \varepsilon_0 q} \cdot \left(\frac{\frac{C_{ox}}{C_{min}} - 1}{C_{ox}}\right)^{-2},\tag{3}$$

where *N<sub>D</sub>* is the donor concentration in the semiconductor, *ϕ*<sub>0</sub> is the height of the potential barrier at the insulator/semiconductor interface, *ε<sub>s</sub>* is the relative permittivity of the semiconductor, *ε*<sub>0</sub> is the vacuum permittivity, *C<sub>ox</sub>* is the oxide capacitance, equal to the maximum value of capacitance in Figure 7b in the dark, and *C<sub>min</sub>* is the minimum value of capacitance in Figure 7b in the dark. The obtained value varied in the range of ~3 × 10<sup>19</sup>–3 × 10<sup>20</sup> cm<sup>−3</sup>. This variation is associated with strong fluctuations in capacitance due to the nonuniform distribution of impurities over the thickness of the silicon electrode.
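Equation (3) can be evaluated numerically as sketched below. The sketch assumes that *Cox* and *Cmin* are capacitances per unit area (the measured capacitance divided by *S*), which the text does not state explicitly. The barrier height, temperature, silicon permittivity, and capacitance values are hypothetical inputs chosen only to illustrate the calculation.

```python
Q = 1.602176634e-19        # electron charge, C
K_B = 1.380649e-23         # Boltzmann constant, J/K
EPS0_CM = 8.8541878128e-14 # vacuum permittivity, F/cm


def donor_concentration(phi0_v, temperature_k, eps_s, c_ox, c_min):
    """Equation (3). c_ox and c_min in F/cm^2; returns N_D in cm^-3."""
    band_term = 2.0 * (2.0 * phi0_v - K_B * temperature_k / Q)
    prefactor = band_term / (eps_s * EPS0_CM * Q)
    geometry = ((c_ox / c_min - 1.0) / c_ox) ** -2
    return prefactor * geometry


# Hypothetical inputs: 0.5 V barrier, 300 K, eps_s = 11.7 for silicon,
# 2.5 nF and 2.0 nF measured on a 1e-2 cm^2 electrode.
n_d = donor_concentration(
    phi0_v=0.5,
    temperature_k=300.0,
    eps_s=11.7,
    c_ox=2.5e-9 / 1.0e-2,
    c_min=2.0e-9 / 1.0e-2,
)
print(f"N_D = {n_d:.2e} cm^-3")
```

Since *N<sub>D</sub>* scales with the inverse square of (*Cox*/*Cmin* − 1)/*Cox*, small fluctuations in the measured capacitance step translate into a wide spread in the estimate, in line with the range of values quoted above.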

**Figure 7.** Dependences of *Cp* (1, 2) and *Gp*/*ω* (3, 4) on *V* measured at a frequency of a small test signal (**a**) 10 and (**b**) 100 kHz and in the dark (1, 3) or under laser radiation (2, 4). Voltage sweep from −5 V to +5 V and vice versa.

The maxima in the dependences of *Gp*/*ω* on *V* (Figure 7a, curve 4 and Figure 7b, curves 3 and 4) in the theory of MIS structures are usually associated with SS at the insulator/semiconductor interface. If one assumes a quasicontinuous SS distribution, the *Nss* value can be estimated using Equation (1). Under laser radiation, the value of *Nss* is 3.6 × 10<sup>12</sup> cm<sup>−2</sup>eV<sup>−1</sup> (at a frequency of 10 kHz) and 1.1 × 10<sup>12</sup> cm<sup>−2</sup>eV<sup>−1</sup> (at a frequency of 100 kHz), and in the dark it is 1 × 10<sup>12</sup> cm<sup>−2</sup>eV<sup>−1</sup> (at a frequency of 100 kHz). Thus, the density of SS on the conductive Si electrode is large and increases with decreasing frequency and under laser radiation. The capture of carriers by these states should slow the response of memristors in the same way as a large series resistance.

Figure 5b shows the *I*–*V* curves of a memristive structure after LT. It can be seen that LT leads to a change in the polarity of RS: applying a negative voltage switches the structure into LRS, and applying a positive voltage switches the structure into HRS. Similar behavior was observed in AZO/CeO<sub>2</sub>/ITO/glass memory devices [62]. The effect can be explained in terms of a change in the active electrode of the structure, which plays the main role in the formation and oxidation of the filament; however, this requires additional investigation. The obtained *I*–*V* curves demonstrate a ratio of currents in LRS and HRS of more than 2 orders of magnitude.

The results of the effect of LT on electrical characteristics of memristive structure are shown in Figure 8. Frequency dependences of the parameters of equivalent circuit of the structure in LRS (i.e., after curve 1 in Figure 5b) and HRS (i.e., after curve 2 in Figure 5b) are shown. These data also indicate a change in the polarity of RS after LT and an almost unchanged value of the resistance of the bottom semiconductor electrode (~100 Ω). In addition, higher values of tg*δ* in HRS at a low frequency, as compared to structures before LT (Figure 6a), indicate incomplete oxidation of filaments.

**Figure 8.** Frequency dependences of (**a**) *Cp* (1, 6), *Gp*/*ω* (2, 7), tg*δ* (3, 8) and (**b**) *Rs* (4, 9), *Rp* (5, 10) obtained for memristive structure in LRS (1–5) and HRS (6–10). The data were obtained after LT.

TT in a dried atmosphere at 540 K in a hermetically closed metal thermostat also changes the electrical characteristics of SiO*x*-based memristive structures, as evidenced by the frequency dependences of the equivalent circuit parameters shown in Figure 9. The nonstandard behavior of the dielectric losses and the parallel resistance with an increase in temperature from 77 to 540 K is noteworthy. Usually, as the temperature increases, the concentration of free carriers in the insulator increases, so the values of tg*δ* increase [63] and those of *Rp* decrease. In this case, however, the values show the opposite tendency. The observed behavior is unusual for insulators and is probably associated with an irreversible change in the properties of the insulator because of TT. The polarity of RS after TT corresponds to the polarity after LT (Figure 5c). It should be noted that the RS voltage decreases after the TT of the structures.

**Figure 9.** Frequency dependences of (**a**) *Cp*, *Gp*/*ω*, tg*δ* and (**b**) *Rs*, *Rp* of memristive structure in HRS obtained at a temperature of 77 (dashed line) and 540 K (solid line).

Figure 10 shows the results of studying the stability of the equivalent circuit parameters and RS parameters of memristive structures after TT under multiple switching in the mode of the *I*–*V* curve shown in Figure 5c. The experiment was carried out as follows. After switching the structure into LRS (i.e., for curve 1 in Figure 5c), the frequency dependences of the equivalent circuit parameters were measured. Then, after switching the structure into HRS (i.e., for curve 2 in Figure 5c), the frequency dependences of the equivalent circuit parameters were measured again. In this way, the memristive structure was switched repeatedly (within ~2 h) from LRS to HRS and vice versa, with sequential measurement of the equivalent circuit parameters. The times over which the equivalent circuit parameters were measured were significantly shorter than the time intervals between switches. Therefore, changes in the parameters during the testing of structures could be neglected. Thus, the observed changes in parameters occur due either to the stochasticity of RS processes, or, less likely, to changes in structures in the intervals between switching.

Figure 10a,b shows that the parameters of equivalent circuit after switching into HRS are relatively reproducible in comparison with the parameters obtained after switching into LRS; this is indicated by the weak time dependence of *Cp*0, *Rp*0, and tg*δ*0 (Figure 10a,b, curves 2, 4, 6). When switching into LRS, the time dependences of *Cp*0, *Rp*0, and tg*δ*0 are characterized by non-monotonic behavior, which is reflected in significant (by more than two orders of magnitude for *Rp*0 and tg*δ*0) chaotic changes (Figure 10a,b, curves 1, 3, and 5). The last result can be interpreted as follows. The selected mode of switching into HRS destroys the active filament each time, and each switching into LRS leads to the formation of different (in terms of shape and location) filaments. It should be noted that, with multiple switching, regardless of the sign of the switching voltage and the state of the memristive structure, a monotonic decrease in the series resistance *Rs*∞ from ~650 Ω to ~160 Ω was observed (Figure 10b, curve 7). It should be recalled that the value of series resistance is determined by the resistance of the semiconductor electrode. The observed behavior indicates the occurrence of electrochemical reactions on the semiconductor electrode and the accumulation of a positive charge on its surface during the recharging of the memristive structure.

Figure 10c,d shows the results of a statistical study for 10 *I*–*V* curves of memristive structures after TT. It can be seen that the currents through the structure in LRS and HRS differ by at least one order of magnitude (Figure 10c), and the voltages for RESET (switching from LRS to HRS, *VRESET*) and SET (switching from HRS to LRS, *VSET*) processes have a value in the selected range (Figure 10d).

**Figure 10.** (**a**,**b**) The parameters of equivalent circuit of memristive structure after TT and switching into LRS (1, 3, 5) and HRS (2, 4, 6) obtained at a frequency of a small test signal of 1 kHz (*Cp*0, tg*δ*0, *Rp*0) and 2 MHz (*Rs*∞); (**c**) dependences of the currents (at a reading voltage of +0.5 V) of memristive structure in LRS (red) and HRS (blue) after TT on the number of RS cycles; (**d**) distribution of voltages of SET (red) and RESET (blue) processes of memristive structure after TT.

Figure 5d shows the *I*–*V* curves of memristive structures after multiple RS. An increase in the voltage values of RESET and SET processes is seen, which indicates a significant change in electrical characteristics of the structure under multiple RS.

Figure 11 shows the frequency dependences of the equivalent circuit parameters obtained for the memristive structure in HRS (i.e., for curve 2 in Figure 5d) after multiple switching. It should be emphasized that, in contrast to the dependences shown above, these data indicate the complete oxidation of filaments when the structure is switched into HRS. This fact is seen, in particular, from a comparison with the frequency dependence of tg*δ* of the structure in HRS in the low-frequency region. An increase in the tg*δ* and *Gp*/*ω* of the structures (Figure 11a) can be explained by the fact that there is no shunting of memristor by the value of *Rp* and the implementation of a series connection of the capacitance *Cp* and memristor electrodes. In this case, the losses at low frequencies are small, the parallel capacitance is equal to the series capacitance, and the series resistance is determined by the resistance of the semiconductor electrode (~170 Ω).

**Figure 11.** Frequency dependences of (**a**) *Cp*, *Gp*/*ω*, tg*δ* and (**b**) *Rs*, *Rp* of memristive structure in HRS after multiple RS.

For a quantitative comparison, Table 1 shows the values of *Cp*, *Gp/ω*, tg*δ*, *Rp*, and *Rs*, obtained at the frequencies of 1 and 100 kHz (indices 0 and ∞, respectively). Data were obtained for the SiO*x*-based memristor in different resistive states before LT and TT, after LT, during TT, and after multiple RS.


**Table 1.** SiO*x*-based memristor equivalent circuit parameters.

Figure 12 shows the Cole–Cole diagrams obtained for SiO*x*-based memristive structures in HRS. The data were obtained from the frequency dependences of the *Gp*/*ω* and *Cp* of memristive structure before LT and TT (see Figure 6a), after LT (see Figure 8a), and after TT and multiple switching (Figure 11a). It can be seen that all diagrams have a circular arc shape, i.e., the spectrum of SS at insulator/semiconductor interface is continuous in all cases. The values of *Nss* were estimated using Equation (1) and are 1.9 × 10<sup>12</sup>, 1.8 × 10<sup>12</sup>, and 1.5 × 10<sup>12</sup> cm<sup>−2</sup>eV<sup>−1</sup>, respectively.

Thus, LT alone is not sufficient for a significant change in the density of SS in the Au/Zr/SiO*x*/SOI memristive structures. Additional use of TT leads to a decrease in this value by a factor of ~1.3. Nevertheless, the combined effect of LT and TT on the Au/Zr/SiO*x*/SOI memristive structures results in an almost twofold decrease in RS voltages. The effect is probably associated with the annealing of SS, which, in turn, leads to a decrease in the resistance of the structure. According to the model [64], the appearance of SS is associated with the disordering of the silicon subsurface near the interface with the insulator. From this point of view, annealing promotes a decrease in the density of SS due to the relaxation of this disorder. However, one should also consider that annealing can lead to a change in the concentration of electrically active impurities in both the semiconductor and the insulator. As a result, the Fermi level at the insulator/semiconductor interface can shift towards a lower density of states.
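The relative changes in *Nss* can be checked directly from the three values estimated with Equation (1). The sketch below is plain arithmetic on those reported numbers; the variable names are hypothetical. Note that the factor of ~1.3 corresponds to the comparison with the untreated structure, while the ratio relative to the LT-only value comes out closer to 1.2.

```python
# N_ss values from the Cole-Cole analysis, in cm^-2 eV^-1.
NSS_BEFORE = 1.9e12          # before LT and TT
NSS_AFTER_LT = 1.8e12        # after LT
NSS_AFTER_TT = 1.5e12        # after TT and multiple switching

ratio_lt_only = NSS_BEFORE / NSS_AFTER_LT
ratio_tt_vs_initial = NSS_BEFORE / NSS_AFTER_TT
ratio_tt_vs_lt = NSS_AFTER_LT / NSS_AFTER_TT

print(f"LT only:             x{ratio_lt_only:.2f}")
print(f"LT + TT vs. initial: x{ratio_tt_vs_initial:.2f}")
print(f"LT + TT vs. LT only: x{ratio_tt_vs_lt:.2f}")
```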

**Figure 12.** The Cole–Cole diagrams obtained for SiO*x*-based memristive structures in HRS. The data were obtained before LT and TT, after LT, and after TT and multiple RS.

It should be noted that, along with the abovementioned influence of treatments on the density of SS, they can be responsible for the occurrence of RS. It can be assumed that the nanocrystallites observed by TEM are responsible for the initial conductive state of SiO*x*-based memristors. According to the estimates from TEM images, nanocrystallites reach diameters of ~7 nm. Note that TEM studies were carried out after LT and TT. Thus, probably, such treatments led to significant and irreversible oxidation of nanocrystallites and, before treatments, the sizes of nanocrystallites could be comparable to the thickness of the insulator film. The latter could lead to shunting the devices. This explanation is indirectly confirmed by the unusual behavior of dielectric losses and the value of parallel resistance, with an increase in temperature from 77 to 540 K.

#### *3.2. SiNx-Based Memristive Structures on SOI Substrates*

According to AFM data (Figure 13a), the root mean square roughness of the SiN*x* film is 1.9 nm. In Figure 13b, XPS data for SiN*x* film before and after annealing at 550 K are reported. It is shown that the stoichiometry of SiN*x* film before and after annealing hardly changes and *x* ≈ 1.25. One can also notice the presence of a transition layer at the SiN*x*/SOI interface, the thickness of which is ~18 nm.

In Figure 14, the TEM images of a cross section of SiN*x*-based memristive structures after LT are shown. According to Figure 14, the SiN*x* film has an amorphous structure. At the same time, the presence of ZrN (areas 1, 3, 5–7) and Si (area 4) nanocrystallites is confirmed inside amorphous SiN*x*. ZrO2 (area 2), ZrO (area 8), and ZrN (area 9) nanocrystallites are found in the Zr sublayer and at the interface with the insulator. The presence of Si3N4 nanocrystallites should also be noted (area 10). The structure of the observed nanocrystallites was determined by comparing the interplanar spacing in TEM images with the literature data. Like for the SiO*x*-based memristive structures, the SiN*x*-based structures considered in this section initially had a conductive state. It should be noted that SiN*x*-based memristive structures did not demonstrate RS before LT (Figure 15a, curve 1).

**Figure 14.** High-resolution TEM images of two cross-sectional regions of SiN*x*-based memristive structure after LT. The inset shows scaled images of the nanocrystallites (parts that were used to determine the interplanar spacing are highlighted by yellow rectangles).

In Figure 16, the frequency dependences of the parameters of the equivalent circuit of the memristive structure before LT are shown. The series resistance of the structure at a high frequency, which is determined by the resistance of the memristor electrodes, is ~110 Ω.

In the theory of MIS structures, using high-frequency *C*–*V*, it is possible to determine the type of dopant: as a DC sweep voltage is applied to the metal, a positive slope of 1/*C*<sup>2</sup> vs. *V* indicates acceptors and a negative slope indicates donors [65,66]. The 1/*C*<sup>2</sup> value increases with increasing absolute voltage value (Figure 17), which indicates the *n*-type conductivity of the semiconductor film. The nonlinearity of this dependence can be a consequence of inhomogeneous doping of the semiconductor film.

**Figure 15.** (**a**) *I*–*V* curves of SiN*x*-based memristive structure before (solid line) and after (dotted line) LT in semi-log plot. (**b**) Curves 3 and 4 in linear plot. The direction of the voltage sweep is shown by arrows.

**Figure 16.** Frequency dependences of (**a**) *Cp*, *Gp*/*ω*, tg*δ* and (**b**) *Rs*, *Rp* obtained for memristive structure before LT.

**Figure 17.** Dependence of nonequilibrium capacitance on voltage in coordinates 1/*C*<sup>2</sup>–*V* obtained for memristive structure before LT. Data were measured at a small test signal frequency of 100 kHz.

The donor concentration *ND* can be estimated using the slope of the straight line, which extrapolates the data in Figure 17, and the following equation [56]:

$$N_D = \frac{2}{\varepsilon_s \varepsilon_0 q S^2} \frac{\Delta V}{\Delta \left(\frac{1}{C^2}\right)}.\tag{4}$$

The *ND* value is ~5 × 10<sup>18</sup> cm<sup>−3</sup>. It should be noted that the obtained value is probably underestimated due to the presence of horizontal areas in the dependence.
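The estimate based on Equation (4) can be illustrated with a minimal sketch. It fits a straight line to 1/*C*<sup>2</sup> versus *V* and converts the slope to a donor concentration. The electrode area, the relative permittivity of silicon (a textbook value of 11.7), and all function names are assumptions made for illustration and are not taken from the measurements reported here.

```python
# Sketch: donor concentration from the slope of 1/C^2 vs V (Equation (4)).
# Area, permittivity, and function names are illustrative assumptions.

EPS0 = 8.854e-12   # vacuum permittivity, F/m
Q = 1.602e-19      # elementary charge, C
EPS_SI = 11.7      # relative permittivity of silicon (textbook value)


def fit_slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def donor_concentration_cm3(voltages, capacitances, area_m2, eps_s=EPS_SI):
    """N_D = 2 / (eps_s * eps0 * q * S^2) * |dV / d(1/C^2)|, in cm^-3."""
    inv_c2 = [1.0 / c ** 2 for c in capacitances]
    slope = fit_slope(voltages, inv_c2)      # d(1/C^2)/dV, in F^-2 V^-1
    n_d_m3 = 2.0 / (eps_s * EPS0 * Q * area_m2 ** 2 * abs(slope))
    return n_d_m3 * 1e-6                     # convert m^-3 to cm^-3
```

The absolute value of the slope is used because only its magnitude enters the concentration; the sign carries the information about the dopant type discussed above. Flat (horizontal) portions of the measured curve would reduce the fitted slope and therefore bias the result, so in practice the fit range has to be chosen with care.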

The frequency dependence of the parameters of the equivalent circuit shows almost no changes when measured in the dark and under short-term laser radiation. This indicates the presence of an electron-enriched layer at the insulator/semiconductor interface. Therefore, as for the SiO*x*-based memristive structures considered above, the structures based on SiN*x* were subjected to LT in air in order to change the charge state of the traps in SiN*x*. Figure 15a (curves 2–4) shows *I*–*V* curves demonstrating a significant increase in the hysteresis loop (change in the current by ~3 orders of magnitude) after LT of the structures. It should be noted that the structures also demonstrated a synaptic nature of switching (Figure 15b) [67,68]. After LT, memristive structures showed an increased value of the relative permittivity (4 before LT and 4.85 after LT). This value was calculated using the equation for a parallel plate capacitor at a frequency of 1 kHz. This behavior indicates the contribution of the space charge region in the semiconductor electrode to the capacitance of the capacitor before LT.
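The parallel-plate estimate of the relative permittivity mentioned above can be sketched as follows. The film thickness and electrode area used in any call are hypothetical placeholders, since the actual device geometry is not restated in this section, and the function names are chosen for illustration only.

```python
# Sketch: relative permittivity from a measured capacitance, assuming an
# ideal parallel-plate capacitor. Geometry values are hypothetical.

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def relative_permittivity(capacitance_f, thickness_m, area_m2):
    """eps_r = C * d / (eps0 * S) for an ideal parallel-plate capacitor."""
    return capacitance_f * thickness_m / (EPS0 * area_m2)


def plate_capacitance(eps_r, thickness_m, area_m2):
    """Inverse relation: C = eps_r * eps0 * S / d."""
    return eps_r * EPS0 * area_m2 / thickness_m
```

Because the geometry is fixed, the extracted permittivity scales linearly with the measured capacitance: an apparent increase from 4 to 4.85 corresponds to a capacitance that is about 1.21 times larger. A series space charge capacitance in the semiconductor lowers the measured value, which is why the apparent permittivity can be reduced before LT.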

The effect of LT on the electrical characteristics of the memristive structure is illustrated in Figure 18. Frequency dependences of the parameters of equivalent circuit of the structure in LRS (i.e., for curve 2 in Figure 15a) and in HRS (i.e., for curve 4 in Figure 15a) after LT are shown.

**Figure 18.** Frequency dependences of (**a**) *Cp* (1, 4), *Gp*/*ω* (2, 5), tg*δ* (3, 6) and (**b**) *Rs* (7, 9), *Rp* (8, 10) obtained for memristive structure in LRS (4, 5, 6, 9, 10) and HRS (1, 2, 3, 7, 8). The data were obtained after LT.

It is worth noting the presence of large losses in the structures after switching into LRS with a voltage of −5 V, which is probably due to the presence of filaments; at the same time, the losses in the structure were significantly reduced (at a low frequency, by up to two orders of magnitude) after switching into HRS with a voltage of +4 V. However, there was no complete destruction of filaments. This was indicated by the presence of losses at frequencies of <10<sup>4</sup> Hz, which are characteristic of losses due to leakage currents at low frequencies [55]. Also, in HRS at a low frequency, the parallel resistance that shunts the memristive structure increased (by up to 2 orders of magnitude). In this case, the resistance of the silicon electrode remained almost unchanged.

For a quantitative comparison, Table 2 shows the values of *Cp*, *Gp/ω*, tg*δ*, *Rp*, and *Rs* obtained at the frequencies of 1 and 100 kHz (indices 0 and ∞, respectively). Data were obtained for the SiN*x*-based memristor in different resistive states before and after LT.



Figure 19 shows the Cole–Cole diagrams obtained for SiN*x*-based memristive structures. The data were obtained from the frequency dependences of the *Gp*/*ω* and *Cp* of memristive structure before LT (see Figure 16a) and after LT (see Figure 18a). Note that the memristive structure did not demonstrate resistive switching before laser treatment; therefore, the diagram for this case was obtained in the initial highly conductive state of memristive structure. At the same time, after LT, the two resistive states of memristive structure became distinguishable; therefore, the diagram for the second case was obtained under the conditions of HRS of the memristive structure.

**Figure 19.** The Cole–Cole diagrams obtained for SiN*x*-based memristive structures. The data were obtained before and after LT. Inset: same Cole–Cole diagram as before LT, but at full scale.

In the first case (before LT), the diagram had a circular arc shape, which indicates a uniform spectrum of SS at the insulator/semiconductor interface. The value of *Nss* was estimated using Equation (1) and is equal to 1.6 × 10<sup>12</sup> cm<sup>−2</sup>eV<sup>−1</sup>. SS with such a high density value can reduce response times and contribute to the variability in RS voltage values. The sharp increase in *Gp*/*ω* at high values of *Cp* is due to the presence of conductive channels (see inset in Figure 19). In the second case (after LT), the shape of the diagram is close to a semicircle, which indicates the presence of a mono-level of SS. The value of *Nss*, estimated using Equation (2), was 1.5 × 10<sup>11</sup> cm<sup>−2</sup>, which is an order of magnitude lower than before LT.

Figure 20a,b shows the results of a statistical study for 10 *I*–*V* curves of memristive structures after LT. It can be seen that the currents through the structure in LRS and HRS differ by at least 8-fold (Figure 20a), and the voltages for RESET and SET processes have a value in the selected range (Figure 20b).

**Figure 20.** (**a**) Dependences of the currents (at a reading voltage of −0.5 V) of memristive structure in LRS (red) and HRS (blue) after LT on the number of RS cycles; (**b**) distribution of voltages of *VSET* (red) and *VRESET* (blue) processes of memristive structure after LT.

Thus, one can conclude that LT leads to a change in the spectrum of SS at the SiN*x*/SOI interface. This is probably due to the more significant, in comparison with SiO*x*-based structures, effect of LT on the charge state of traps in SiN*x*, which determine the conductivity with the optical activation energy (for SiN*x*<4/3, this value is equal to 2.6 eV [69]). It was reported in [70] that these traps can play a decisive role in the rupture and restoration of filaments during switching in SiN*x*-based memristors. Therefore, LT is an effective method for changing RS parameters in metal/SiN*x*/semiconductor memristive structures.

It should be noted that, along with the abovementioned influence of LT on the spectrum of SS, it can be responsible for the occurrence of RS. It can be assumed that the nanocrystallites observed by TEM are responsible for the initial conductive state of SiN*x*-based memristors. According to the estimates from TEM images, nanocrystallites reach diameters of ~5–10 nm. Note that TEM studies were carried out after LT. Thus, probably, such treatment led to significant and irreversible oxidation of nanocrystallites and, before LT, the sizes of nanocrystallites could be comparable to the thickness of the insulator film. The latter could lead to shunting the devices.

#### **4. Conclusions**

This work demonstrates the robustness of the memristive phenomenon in thin-film structures based on promising and accessible insulator layers (SiO*x* and SiN*x*) fabricated on SOI substrates and subjected to additional laser and thermal treatments. It was shown that laser treatment leads to a significant increase in the hysteresis loop in *I*–*V* curves of the Au/Zr/SiN*x*/SOI memristive structures. The effect was explained by the positive charging of traps in the insulator and a decrease in the density of surface states at the insulator/semiconductor interface (by an order of magnitude). In contrast, laser treatment of the Au/Zr/SiO*x*/SOI memristive structures was not sufficient to produce a significant change in the density of the surface states. Additional use of thermal treatment led to a decrease in this value by a factor of ~1.3. Furthermore, the combined effect of laser treatment followed by thermal treatment on the Au/Zr/SiO*x*/SOI memristive structures led to an almost twofold decrease in the resistive switching voltages. The effect was probably associated with the annealing of surface states, which, in turn, led to a decrease in the resistance of the structure.

The CMOS compatibility of memristive devices in our study was provided by two factors. First, it is a SOI substrate, which is used in the technology of integrated circuits, including radiation-resistant ones. Secondly, it is a switching layer material, which is also fabricated using industrial technology. In this sense, the top electrode is of no fundamental importance, since in the framework of BEOL (back-end-of-line) integration it does not affect the basic FEOL (front-end-of-line) process. We chose a composite Au/Zr electrode, since it had previously proven itself well in MIM devices based on SiO*x* [71] and is semitransparent, which is important for laser treatment. However, as part of the further optimization of these devices, other combinations of oxidizable and inert metals can be selected and tested.

It should be emphasized that the device layer of silicon in the SOI structure can differ greatly from bulk silicon in terms of structure and surface quality. The latter significantly affects the surface state, which can play an important role in the resistive switching mechanism. Therefore, the use of a SOI substrate in combination with specific switching insulators and additional treatment methods is of fundamental importance.

**Author Contributions:** Conceptualization, S.V.T. and A.N.M.; methodology, S.V.T. and D.S.K.; software, A.I.B.; validation, M.N.K., D.O.F., D.I.T., A.N.M., S.K. and B.S.; formal analysis, M.N.K., D.O.F. and S.V.T.; investigation, S.V.T., A.I.B., A.V.K., R.N.K., S.Y.Z., V.A.V. and D.A.P.; resources, A.N.M. and S.K.; data curation, M.N.K., D.O.F., S.V.T., A.I.B., D.I.T. and A.N.M.; writing—original draft preparation, M.N.K., D.O.F., S.V.T. and A.I.B.; writing—review and editing, D.I.T., A.N.M., S.A.S., S.K. and B.S.; visualization, M.N.K., S.V.T. and A.I.B.; supervision, D.O.F., D.I.T., A.N.M., S.A.S., S.K. and B.S.; project administration, A.N.M. and S.A.S.; funding acquisition, A.N.M. and S.A.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Ministry of Science and Higher Education of Russian Federation (Project No. 13.2251.21.0098) and a National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (2021K1A3A1A49098073). The studies were performed using the hardware resources of the shared use center: Research and Education Center "Physics of Solid State Nanostructures", Lobachevsky State University of Nizhny Novgorod.

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Design of In-Memory Parallel-Prefix Adders**

**John Reuben**

Chair of Computer Architecture, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91058 Erlangen, Germany; johnreuben.prabahar@fau.de

**Abstract:** Computational methods in memory array are being researched in many emerging memory technologies to conquer the 'von Neumann bottleneck'. Resistive RAM (ReRAM) is a non-volatile memory, which supports Boolean logic operation, and adders can be implemented as a sequence of Boolean operations in the memory. While many in-memory adders have recently been proposed, their latency is exorbitant for increasing bit-width (*O*(*n*)). Decades of research in computer arithmetic have proven parallel-prefix technique to be the fastest addition technique in conventional CMOS-based binary adders. This work endeavors to move parallel-prefix addition to the memory array to significantly minimize the latency of in-memory addition. Majority logic was chosen as the fundamental logic primitive and parallel-prefix adders synthesized in majority logic were mapped to the memory array using the proposed algorithm. The proposed algorithm can be used to map any parallel-prefix adder to a memory array and mapping is performed in such a way that the latency of addition is minimized. The proposed algorithm enables addition in *O*(*log*(*n*)) latency in the memory array.

**Keywords:** resistive RAM (ReRAM); non-volatile memory (NVM); majority logic; memristor; 1Transistor-1Resistor (1T–1R); in-memory computing; processing-in-memory; parallel-prefix adder; logic-in-memory; memristive logic

**1. Introduction**

Conventional computer architecture is facing an acute problem—the 'von Neumann bottleneck' or 'memory wall'. The shuffling of data between processing and memory units is energy-consuming and time-consuming and degrades the performance of contemporary computing systems [1,2]. In other words, the energy needed to move data (between memory and processing units) forms a significant portion of the computational energy. To overcome the memory wall, the processor and memory unit must be brought closer to each other. A 3D stacking of DRAM dies over logic die, often referred to as near-memory computing [3], was pursued earlier to reduce the latency and energy for data movement between processor and memory. The recent trend is to move computing to the location of the data, i.e., in-memory computing.

In in-memory computing, the data are processed at their location (i.e., in the memory array) and not moved out of the memory array to a separate processing unit. At present, diverse operations from arithmetic operations to cognitive tasks such as machine learning and pattern recognition are being explored in memory arrays [4]. This article focuses on arithmetic operations and how adders can be implemented in memory. It should be noted that in-memory computing is pursued in many memory technologies, both conventional (SRAM, DRAM) and emerging non-volatile memories (Resistive RAM, STT-MRAM, PCM, FeFET). However, in this article, we restrict our focus to Resistive RAM technology to achieve a greater focus on the design of parallel-prefix adders. A Resistive RAM device is a two-terminal Metal–Insulator–Metal structure in which data can be stored as resistance. A positive voltage across the structure forms a conductive filament (low resistance state) and a negative voltage ruptures the filament (high resistance state), leading to two stable resistances. Boolean gates can be implemented in the memory array by altering the structure of the memory array, the peripheral circuitry around the array, or both. Arithmetic circuits such as adders can be implemented as a chain of such Boolean operations.

**Citation:** Reuben, J. Design of In-Memory Parallel-Prefix Adders. *J. Low Power Electron. Appl.* **2021**, *11*, 45. https://doi.org/10.3390/jlpea11040045

Academic Editors: Alex Serb and Adnan Mehonic

Received: 14 October 2021 Accepted: 17 November 2021 Published: 24 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Although different in-memory adders have been proposed in the literature, the latency of in-memory adders is a severe disadvantage in in-memory computing, i.e., an addition operation needs a long sequence of Boolean operations. A poorly optimized in-memory adder may take longer to compute (add two *n*-bit numbers) than the combined time it takes to fetch data from memory and add them in a CMOS-based processor. In a computing system, adders constitute the basic computational unit. In-memory adders have not had their latency studied and optimized for an increasing bit-width (*n*-bit operand). In practice, 32-bit/64-bit in-memory adders require hundreds of cycles due to *O*(*n*) latency requirements. It was originally proposed that parallel-prefix (PP) adders could bring down the latency caused by the rippling of carry in CMOS-based adders. PP adders are the fastest adders in conventional CMOS technology [5,6]. To improve the latency of in-memory adders, it is necessary to learn lessons from the decades of research on CMOS adders and adopt them for in-memory addition. Therefore, parallel-prefix adders were pursued in this work to improve the latency of in-memory adders. More specifically, we propose a generic methodology to design any PP adder in memory. As an example, we consider the Ladner–Fischer type of PP adder and demonstrate how this can be implemented in-memory in *O*(*log*(*n*)) latency. The presented method requires no major modifications to the peripheral circuitry of the memory array and is also energy-efficient.

The rest of the paper is organised as follows. Section 2 reviews the state-of-the-art in-memory adders and classifies in-memory adders on the basis of statefulness, logic primitive and architecture. The review identifies the exorbitant latency of adders with increasing bit-width as a significant issue that needs attention. Section 3 presents PP adders as a solution to the long latency incurred by the rippling of carry. Section 4.1 reviews the in-memory majority gate, which is the fundamental logic gate used in this work to implement the PP adder in the memory array. Section 4.2 elaborates how PP adders can be synthesized using majority logic. Having synthesized PP adders in majority logic, Section 4.3.3 elaborates how they can be mapped to the memory array. We present the simulation methodology in Section 5.1. In Section 5.2, we analyse how the latency of the proposed adder grows with increasing bit-width. Sections 5.3 and 5.4 analyse how the energy and area of the proposed adder grow with increasing bit-width. In Section 5.5, we compare the proposed adder with other adders reported in the literature, followed by the Conclusion in Section 6.

#### **2. In-Memory Adders: A Brief Review**

Conventionally, adders were designed using logic gates built from CMOS transistors. In contrast, an in-memory adder is designed using a 'functionally complete' Boolean logic primitive. NOR, for example, is functionally complete, since any Boolean logic can be expressed using NOR gates. Therefore, if a NOR gate can be implemented in the memory array, any arithmetic circuit can be implemented in the memory array. NAND, IMPLY + FALSE [7] and Majority + NOT [8] are other functionally complete logic primitives. In the last 5 years, several in-memory adders have been proposed. They can be classified according to three characteristics: the state variable (stateful or non-stateful), the logic primitive, and the adder architecture (carry-propagation technique).


Stateful in-memory adders perform an addition by logic gates, where each gate is executed by manipulating the resistance of a memristor (i.e., the internal state) rather than by a mix of resistance and voltage [9]. If voltage is also used, in addition to resistance, the logic gate and the adder are said to be non-stateful (Figure 1a). A characteristic of in-memory adders is that, with certain modifications to conventional memory, one particular logic primitive can be realized, and all other logic primitives need to be expressed in terms of it. The NOR-based memristive logic family (MAGIC), for example, requires that all other gates (AND, OR, XOR) are expressed in terms of NOR gates and then used in the memory array. Similarly, in the NAND-based adder reported in [10], an XOR gate is implemented as NAND gates (one XOR requires four memory cycles). Figure 1b illustrates a one-bit full adder expressed solely as NOR/NAND/Majority gates (expressing a circuit using a single logic primitive is preferred for in-memory implementation). Finally, an issue that is often overlooked in this emerging area is the issue of carry-propagation. The manner in which carry is propagated from LSB to MSB decides the speed of the in-memory adder. In the in-memory community, different adder architectures, from ripple carry (slowest) to parallel-prefix (fastest) adders, have been proposed.

**Figure 1.** (**a**) An in-memory logic gate (adder) is stateful if its only state variable is resistance. Non-stateful logic gates (adder) also use voltage in conjunction with resistance. (**b**) In-memory adder implementation favors homogeneity of logic primitives; 1-bit full adder in terms of NOR gates [11], NAND gates [12] and majority gates [13]; (**c**) Different carry-propagation techniques result in different adder architectures.

Table 1 lists different in-memory adders that have been reported recently and their latency for 8-bit and *n*-bit. The adders are also classified based on the three characteristics we reviewed: state variable, logic primitive and adder architecture. A key observation is that logic primitive plays an important role in determining the latency. IMPLY is a weak logic primitive, and generally incurs more latency than all other logic primitives. XOR and the majority are generally stronger logic primitives than OR/AND/NOR. This is evident from the fact that XOR-based and majority-based ripple-carry adders are faster than NAND/NOR-based ripple-carry adders [13]. In other words, with the adder configuration being the same (ripple-carry), logic primitive plays an important role in determining the latency of in-memory addition. Another important finding is that the adder architecture plays a key role in deciding the latency for increasing bit-width. This is evident from the latency of OR + AND-based adders in Table 1. Both adders used the same logic primitive (OR + AND), but [14] uses ripple-carry architecture, achieving a latency of 6*n* + 1 while [15] uses a parallel-prefix configuration to achieve a latency of 8*log*<sub>2</sub>(*n*) + 13. As an example, a 32-bit adder based on OR + AND logic will require 193 cycles and 53 cycles for ripple-carry and parallel-prefix architectures, respectively. Hence, for larger bit-widths, architecture (carry propagation technique) plays an important role in latency. In summary, both adder architecture and logic primitive influence the latency of in-memory adder. Therefore, majority logic primitive and parallel-prefix adder architecture were chosen in this work to drastically minimize the latency of in-memory adders.
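The two latency expressions quoted above for the OR + AND-based adders can be evaluated for several bit-widths with a short sketch. The function names are chosen here for illustration; only the formulas 6*n* + 1 and 8*log*<sub>2</sub>(*n*) + 13 come from the text, and the bit-width is assumed to be a power of two.

```python
import math


def ripple_carry_latency(n):
    """Cycles for the OR + AND ripple-carry adder: 6n + 1."""
    return 6 * n + 1


def parallel_prefix_latency(n):
    """Cycles for the OR + AND parallel-prefix adder: 8*log2(n) + 13.

    Assumes n is a power of two so that log2(n) is an integer.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return 8 * int(math.log2(n)) + 13


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, ripple_carry_latency(n), parallel_prefix_latency(n))
```

For *n* = 32 this reproduces the 193 and 53 cycles stated above. Doubling the bit-width roughly doubles the ripple-carry latency, whereas it adds only 8 cycles to the parallel-prefix latency.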


**Table 1.** Latency of recently reported in-memory adders (8-bit and *n*-bit).

In a Complementary Resistive Switch (CRS) adder, RIMP/NIMP∗ denotes reverse implication and inverse implication.

#### **3. Parallel-Prefix Adders: A Solution for the Carry-Propagation Problem**

When two *n*-bit binary numbers A (*a*<sub>*n*−1</sub>*a*<sub>*n*−2</sub> ⋯ *a*<sub>0</sub>) and B (*b*<sub>*n*−1</sub>*b*<sub>*n*−2</sub> ⋯ *b*<sub>0</sub>) are added, the sum bit *S*<sub>*i*</sub> at the *i*th bit position is computed as

$$S_{i} = H_{i} \oplus C_{i-1} \tag{1}$$

where *H*<sub>*i*</sub> = *A*<sub>*i*</sub> ⊕ *B*<sub>*i*</sub> and *C*<sub>*i*−1</sub> is the carry computed in the previous bit position. To compute the sum bit of the next significant bit position, the incoming carry *C*<sub>*i*−1</sub> is propagated to the next position. This is accomplished using carry generate bits (*G*<sub>*i*</sub> = *A*<sub>*i*</sub> · *B*<sub>*i*</sub>) and carry propagate bits (*P*<sub>*i*</sub> = *A*<sub>*i*</sub> + *B*<sub>*i*</sub>). The carry-out (*C*<sub>*i*</sub>) of a particular bit position is always a function of the carry from the previous bit (*C*<sub>*i*−1</sub>), and it is expressed as follows:

$$C_{i} = G_{i} + P_{i} \cdot C_{i-1} \tag{2}$$

Thus, during the 8-bit addition of *a*<sub>7</sub>*a*<sub>6</sub>*a*<sub>5</sub>*a*<sub>4</sub>*a*<sub>3</sub>*a*<sub>2</sub>*a*<sub>1</sub>*a*<sub>0</sub> and *b*<sub>7</sub>*b*<sub>6</sub>*b*<sub>5</sub>*b*<sub>4</sub>*b*<sub>3</sub>*b*<sub>2</sub>*b*<sub>1</sub>*b*<sub>0</sub>, sum bit *S*<sub>6</sub> = *H*<sub>6</sub> ⊕ *C*<sub>5</sub> and *C*<sub>5</sub> is a function of *a*<sub>5</sub>, *b*<sub>5</sub>, *C*<sub>4</sub> according to Equations (1) and (2). In other words, *S*<sub>6</sub> cannot be computed until *C*<sub>5</sub> is computed, which recursively depends on the carry-out of the lower significant bit. This is the decades-old carry-propagation problem and significantly affects the speed of *n*-bit addition as *n* grows. Ripple-carry adders are extremely slow for 32-bit/64-bit addition due to this carry propagation. To improve this situation, carry-skip adders were proposed, which allowed for carries to skip across blocks of bits instead of rippling through them. This was followed by carry-lookahead adders, where carries were computed in parallel and achieved logarithmic logic depth [26]. Parallel-prefix (PP) adders improved on the carry-lookahead adder by expressing carry-propagation as a prefix computation [27]. They are the fastest family of adders [5,6] in conventional transistor-based implementations.
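The serial dependence described by Equations (1) and (2) can be made concrete with a minimal bit-level sketch of ripple-carry addition. The function name and the integer-based bit handling are illustrative choices, not part of any in-memory implementation.

```python
def ripple_carry_add(a, b, n):
    """Add two n-bit integers bit by bit using Equations (1) and (2)."""
    carry = 0                        # no carry into the least significant bit
    total = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        h_i = a_i ^ b_i              # H_i = A_i xor B_i
        g_i = a_i & b_i              # G_i = A_i . B_i (carry generate)
        p_i = a_i | b_i              # P_i = A_i + B_i (carry propagate)
        s_i = h_i ^ carry            # Equation (1): S_i = H_i xor C_{i-1}
        carry = g_i | (p_i & carry)  # Equation (2): C_i = G_i + P_i . C_{i-1}
        total |= s_i << i
    return total | (carry << n)      # the final carry-out becomes bit n
```

Each loop iteration needs the carry produced by the previous one, so the number of sequential steps grows linearly with *n*. This is the dependence that prefix computation removes.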

PP adders have a 'carry-generate block', followed by a 'sum-generate block' (Figure 2). Internally, the carry-generate block has a pre-processing stage, which computes *G*<sub>*i*</sub>, *P*<sub>*i*</sub>, *H*<sub>*i*</sub> for every bit. Using them, carry bits *C*<sub>out</sub>*C*<sub>*n*−1</sub> ⋯ *C*<sub>1</sub>*C*<sub>0</sub> are computed using the prefix computation technique. This is followed by the sum-generate block, where *S*<sub>*i*</sub> = *H*<sub>*i*</sub> ⊕ *C*<sub>*i*−1</sub> is computed. The reader is referred to [27,28] for a detailed explanation of the stages of a parallel-prefix adder. Kogge–Stone, Ladner–Fischer, Brent–Kung, Sklansky, Ling, etc., are examples of PP adders. According to the taxonomy of PP adders [29], these adders essentially form a compromise between logical depth, fan-out and wiring tracks. PP adders can reduce the logical depth to *O*(*log*(*n*)), for *n*-bit adders [30].
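To illustrate the prefix computation, the following sketch uses the Kogge–Stone arrangement, chosen here only because it is the simplest to write down; it is not the Ladner–Fischer network used later in this work. It combines (generate, propagate) pairs with the standard prefix operator (*G*, *P*) ∘ (*G*′, *P*′) = (*G* + *P* · *G*′, *P* · *P*′) and counts the number of prefix levels. Function and variable names are illustrative.

```python
def kogge_stone_add(a, b, n):
    """Add two n-bit integers with a Kogge-Stone prefix network.

    Returns (sum, levels), where levels is the number of prefix stages.
    """
    bits_a = [(a >> i) & 1 for i in range(n)]
    bits_b = [(b >> i) & 1 for i in range(n)]
    # Pre-processing stage: G_i, P_i, H_i for every bit.
    g = [x & y for x, y in zip(bits_a, bits_b)]
    p = [x | y for x, y in zip(bits_a, bits_b)]
    h = [x ^ y for x, y in zip(bits_a, bits_b)]

    levels = 0
    dist = 1
    while dist < n:
        new_g, new_p = g[:], p[:]
        for i in range(dist, n):
            # Prefix operator applied to spans ending at i and at i - dist.
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2
        levels += 1

    # After the last level, g[i] is the carry-out C_i of bit i.
    total = 0
    for i in range(n):
        c_in = g[i - 1] if i > 0 else 0
        total |= (h[i] ^ c_in) << i      # S_i = H_i xor C_{i-1}
    total |= g[n - 1] << n               # carry-out of the most significant bit
    return total, levels
```

All updates within one level are independent of each other, so they could be evaluated in parallel. The number of levels is ⌈*log*<sub>2</sub>(*n*)⌉, for example 3 for an 8-bit adder, which is the source of the logarithmic logical depth.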

**Figure 2.** Generic Structure of PP adders: A 'carry-generate block' calculates carry by prefix computation and is then followed by a 'sum-generate block' ([30]).

#### **4. In-Memory Implementation of Parallel-Prefix Adders**

#### *4.1. In-Memory Majority Gate*

Before their implementation in memory, the PP adders must first be synthesized in terms of logic gates which can be implemented in memory. As stated, different logic primitives require different modifications to the memory array or its peripheral circuitry (or both). Therefore, for the in-memory implementation of adders, it is important to minimize the different types of logic primitives used. Consequently, it is beneficial to express the adder using one logic primitive, rather than four different logic primitives. Recently, an in-memory majority gate was proposed in [31,32]. The three inputs to the majority gate are the three resistances of the memory cells, and the output majority is computed as a READ operation (Figure 3a). This majority gate does not necessitate any major modifications to the peripheral circuitry of a regular memory array, and is also energy-efficient (access transistor for each memory cell minimizes sneak currents, thus lowering energy consumption when compared to other adders implemented in 1S–1R configuration). As depicted in Figure 3b, multiple majority gates can be executed in array columns, which suits PP adders with a similar structure.
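The idea of reading a majority function from three cells selected in parallel can be sketched with a simple resistance model. The resistance values, the encoding of logic 1 as the low resistance state, and the single threshold comparison are all assumptions made for illustration; the actual sensing scheme in [31,32] may differ.

```python
# Sketch: majority of three bits read as the parallel resistance of three
# cells. All numeric values and the logic encoding are hypothetical.

R_LRS = 10e3    # ohm, assumed low resistance state (encodes logic 1)
R_HRS = 100e3   # ohm, assumed high resistance state (encodes logic 0)


def effective_resistance(bits):
    """Parallel combination of the cells that store the given bits."""
    conductance = sum(1.0 / (R_LRS if bit else R_HRS) for bit in bits)
    return 1.0 / conductance


def read_threshold():
    """Reference placed between the 'two cells in LRS' and
    'one cell in LRS' cases (geometric mean of the two resistances)."""
    two_lrs = effective_resistance((1, 1, 0))
    one_lrs = effective_resistance((1, 0, 0))
    return (two_lrs * one_lrs) ** 0.5


def majority_read(a, b, c):
    """Return 1 when the effective resistance is below the reference."""
    return 1 if effective_resistance((a, b, c)) < read_threshold() else 0
```

With these assumed values the effective resistance takes only four distinct levels, one per number of cells in the low resistance state, so a single reference between the two middle levels is enough to separate majority 1 from majority 0.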

#### *4.2. Homogeneous Synthesis of Parallel-Prefix Adders*

Conventionally, PP adders are synthesized in terms of AND, OR and XOR gates for CMOS implementation. Figure 4a depicts an eight-bit PP adder of the Ladner–Fischer type. Three different logic primitives are required: AND, OR and XOR. As stated, different logic primitives require different modifications to the memory array and its peripheral circuitry. If a particular Boolean logic gate cannot be implemented in the memory array, it has to be re-formulated in terms of a logic gate that can be implemented in the memory array. For example, in the NAND-based logic family reported in [10], the XOR gate cannot be implemented; therefore, it is expressed as four NAND gates. As depicted in Figure 4a, a single XOR becomes three levels of NAND logic, increasing its latency. In contrast, by expressing a PP adder purely in terms of MAJORITY+NOT gates, the PP adder can be efficiently implemented in the memory array. Furthermore, a majority-based PP adder achieves a marginal reduction in logical depth compared to the conventional AND-OR-XOR implementation (Figure 4). This is due to the majority being a stronger logic primitive than NAND/NOR/IMPLY [13]. To synthesize PP adders in terms of majority gates, logic synthesis tools can be used. A logic synthesis tool is proposed in [8], which takes any AND-OR-INVERT-based logic and synthesizes it purely in terms of majority and NOT gates. Boolean logic minimization techniques such as re-shaping, push-up, node merging, etc., are used to re-synthesize and optimize conventional AND-OR-INVERT logic in terms of MAJORITY-INVERT [33–38]. Since the majority is the fundamental logic primitive for many emerging nanotechnologies, there are also works which pioneered the synthesis of PP adders solely in terms of majority gates. The reader is referred to [5,30,39] for such works. Therefore, a variety of techniques can be used to transform PP adders in terms of majority and NOT gates. Figure 4b depicts an 8-bit PP adder, synthesized solely in terms of majority and NOT gates. In addition to achieving homogeneity, the majority-based PP adder achieves a reduction of one level in logical depth compared to the AND-OR-XOR-based PP adder.
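To make the idea of a homogeneous MAJORITY+NOT description concrete, the sketch below expresses AND, OR and a one-bit full adder using only a three-input majority function and an inverter. These are well-known majority-logic identities, shown purely for illustration; they are not the specific synthesis of Figure 4b, and the function names are hypothetical.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority of single-bit values."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """Single-bit NOT."""
    return a ^ 1


def and2(a, b):
    """AND as a majority gate with one input tied to 0."""
    return maj(a, b, 0)


def or2(a, b):
    """OR as a majority gate with one input tied to 1."""
    return maj(a, b, 1)


def full_adder(a, b, cin):
    """One-bit full adder built from three majority gates and two inverters."""
    cout = maj(a, b, cin)
    s = maj(inv(cout), cin, maj(a, b, inv(cin)))
    return s, cout


# Exhaustive table of (sum, carry) for every input combination.
adder_table = {bits: full_adder(*bits) for bits in product((0, 1), repeat=3)}
```

The carry output is a single majority gate, which is one reason majority logic is a natural fit for adder circuits.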

**Figure 3.** (**a**) In-memory Implementation of majority gate [31,32]: In a 1T-1R array, the resistances (*RA*, *RB*, *RC*) in the three rows will be parallel if three rows are selected at the same time. (Inputs of the majority gate *A*, *B*, *C* are represented as resistances *RA*, *RB*, *RC*). During READ, the effective resistance *Reff* can accurately be sensed to implement an in-memory majority gate. (**b**) NOT operation can be implemented by inverting the output of the SA. With a majority and NOT gate implemented as a READ operation, the array can be used to execute multiple levels of logic by writing back the data, simplifying computing to READ and WRITE operations.

**Figure 4.** (**a**) Eight-bit PP adder of Ladner–Fischer type expressed in terms of AND, OR and XOR gates. (**b**) Re-synthesized and optimized in terms of MAJORITY and NOT gates [5,30].

#### *4.3. Mapping Methodology*

Once the PP adder has been synthesized in terms of majority and NOT gates, it can be implemented in memory using the in-memory majority gate described in Section 4.1. The NOT gate can be implemented as a simple READ operation with the output inverted. The design of in-memory PP adders presented in this paper is generic and can be used to implement any PP adder. However, in this section, the Ladner–Fischer adder of Figure 4b is chosen and the in-memory implementation (mapping) steps are elaborated.

#### 4.3.1. In-Memory Mapping as an Optimization Problem: Objectives

The mapping of the majority-based PP adder to the memory array can be treated as an optimization problem. Any optimization problem has objectives or goals, which should be achieved in the presence of certain constraints. The objectives of in-memory mapping are as follows:


The aforementioned objectives are no different from the objectives of any VLSI circuit. Not all objectives can be met simultaneously in this mapping, and trade-offs must be made between the latency of addition (*O*1) and the area of the array that is used (*O*3). Any arithmetic circuit implemented in memory is bound to be very slow due to the high latency of in-memory adders. The latency of in-memory adders reported in the literature grows as *O*(*n*), and 32-bit/64-bit in-memory additions require hundreds of cycles [9]. Therefore, in this mapping, we focus on minimizing the latency. Minimizing the latency might result in the array area being compromised. However, latency is a more serious issue than array area in in-memory addition, for the following reasons:


A significant portion of the energy consumed during in-memory addition is dissipated in the memory array. This is predominantly due to sneak-currents in the 1S–1R array. In contrast, our proposed PP adder is implemented in a transistor-accessed memory array (1T–1R); therefore, the energy dissipation in the array is negligible. The major energy consumption is the energy consumed while the cells switch states (WRITE) and the majority operation (READ). Therefore, the energy consumed during addition is minimized if latency is minimized. In other words, latency (*O*1) is the most important objective to be minimized.

#### 4.3.2. In-Memory Mapping as an Optimization Problem: Constraints

The constraints are specific to this design methodology and can be summarized as follows:


*C*<sup>1</sup> must be satisfied during mapping because, during the majority operation, three rows must be selected simultaneously. In principle, the three selected rows need not be contiguous and can be in different locations in the memory array (e.g., rows 5, 8 and 15 of a 64 × 64 array). However, row decoding would then become complicated. For practical in-memory implementation, the mapping must be 'peripheral circuit friendly'. In [32], a triple-row decoder is proposed for triple-row activation during the majority operation. To implement this decoder, multiple single-row decoders were interleaved. Furthermore, the same row decoder must be able to perform both single-row decoding and triple-row decoding. This is because, during normal memory operation, a single row must be selected and, during the majority operation, three rows must be selected. To this end, an address translator circuit is used in the row decoder, which seamlessly switches between single-row activation and triple-row activation. The triple-row decoder [32] is designed in such a way that only three consecutive rows can be selected. Therefore, while mapping, the inputs of the majority gate (to be executed in memory in the next step) must be written in three consecutive rows.

Constraint *C*<sup>2</sup> is posed by a characteristic of non-volatile memories called endurance. A memory device's endurance refers to its ability to switch between two stable states while maintaining a sufficient resistance ratio. Experimentally reported endurances vary from 10<sup>6</sup> to 10<sup>12</sup>. Due to this limited endurance, the number of times a memory cell is switched during addition must be minimized.
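The two constraints can be illustrated with a toy array model. This is a minimal sketch under stated assumptions: the class `Array1T1R`, its method names and the example data are hypothetical, the sharing of sense amplifiers between columns is ignored, and the model only enforces that a majority READ uses three consecutive rows while counting how often each cell switches.

```python
class Array1T1R:
    """Toy bit-level model of a memory array (hypothetical, for illustration)."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        # Per-cell count of state changes, as a proxy for endurance wear.
        self.switch_count = [[0] * cols for _ in range(rows)]

    def write(self, row, first_col, bits):
        """WRITE bits into one row; only cells that change state are counted."""
        for offset, bit in enumerate(bits):
            col = first_col + offset
            if self.cells[row][col] != bit:
                self.cells[row][col] = bit
                self.switch_count[row][col] += 1

    def majority_read(self, first_row):
        """READ three consecutive rows together: one majority bit per column."""
        if first_row < 0 or first_row + 3 > len(self.cells):
            raise ValueError("majority READ needs three consecutive rows")
        r0, r1, r2 = self.cells[first_row:first_row + 3]
        return [(a & b) | (b & c) | (a & c) for a, b, c in zip(r0, r1, r2)]


mem = Array1T1R(rows=6, cols=4)
mem.write(0, 0, [1, 0, 1, 0])
mem.write(1, 0, [1, 1, 0, 0])
mem.write(2, 0, [0, 1, 0, 1])

# All four columns are evaluated in the same READ.
level_out = mem.majority_read(0)

# Write the result to a fresh row instead of overwriting the operands.
mem.write(3, 0, level_out)

total_switches = sum(sum(row) for row in mem.switch_count)
max_switches_per_cell = max(max(row) for row in mem.switch_count)
```

Writing results to unused rows keeps every cell at no more than one state change in this example, which is the behaviour the endurance constraint aims for.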

#### 4.3.3. Algorithm

Having identified the objectives and constraints, we formulate a generic methodology to map any PP adder to the memory array. As stated, if the PP adder is available in terms of AND-OR-XOR gates, it must be re-synthesized in terms of majority and NOT gates using logic synthesis techniques/tools. Given a majority-based PP adder, optimal in-memory implementation is an optimization problem: minimize *O*1, while meeting *C*<sup>1</sup> and *C*2.

The following steps implement the PP adder in the memory:


Figure 5 illustrates the mapping of an 8-bit PP adder to the memory array. Majority gates 1–8 of the first logic level are executed simultaneously in one memory cycle. Since we know that, at the next level, majority gates 9, 10, 11, 12, 13, 14 need to be executed, we write the outputs of the first logic level (*m*1, *m*2, *m*3, *m*4, *m*5, *m*6, *m*7, *m*8) to the exact location where they will be needed. When we write the output of the majority gates back to the array, they are written in consecutive rows (*C*1) and are not overwritten on existing data (*C*2). The in-memory steps are highlighted in yellow in Figure 5. The in-memory steps corresponding to logic levels 1 and 2 are:


In this manner, the seven logic levels of an 8-bit adder can be executed in memory in 18 cycles. A detailed mapping of all seven logic levels is presented in Appendix A.

**Figure 5.** Illustration of mapping of the first two logic levels to a memory array. Since each majority operation is executed as a READ operation, it can be written to the exact location it is needed at the next logic level while satisfying *C*<sup>1</sup> and *C*2. In the above mapping, eight columns share a sense amplifier.

#### **5. Performance of In-Memory Parallel-Prefix Adders**

#### *5.1. Simulation Methodology*

To verify the proposed in-memory adder through simulation, the 1T-1R memory array and its peripheral circuitry were designed in IHP's 130 nm CMOS process. The memory array was composed of 1T-1R cells in which the ReRAM is modelled using the Stanford-PKU model, with a 130 nm NMOS transistor as the access transistor. A time-based sense amplifier [9] was used to read from the array (majority operation) and an op-amp was used to simultaneously write multiple bits into the array. A triple-row decoder was designed by interleaving multiple single-row decoders. Detailed schematics of the peripheral circuitry are given in [9]. Simultaneous reading (majority operations) and writing across columns of the array were verified by simulation. As described in Section 4.3.3, the adder can be executed in memory as a sequence of READ (majority) and WRITE operations, which are orchestrated by the memory controller (the memory controller can be designed as a finite-state machine and was not designed in this work).

#### *5.2. Latency of In-Memory PP Adders with Increasing Bit-Width*

The latency of PP adders grows as *log* (*n*). From Figure 6, one can observe that, from 8-bit to 16-bit, the number of logic levels increased by only a single level, i.e., from seven levels to eight levels. The major advantage of the PP adder lies in this (*O*(*log*(*n*)) logic levels), and we aim to extend this advantage to our in-memory implementation. For the 16-bit version, we have to add an extra level of logic to the carry-generate block to calculate the carry (sum generate block remains at three logic levels; see Figure 6). In general, the number of logic levels, *l* is given by

$$l = \log_2 n + 4 \tag{3}$$

for the *n*-bit PP adder (Ladner–Fischer type) synthesized in majority logic [30]. When this 16-bit adder was mapped to the memory array following the procedure used for an 8-bit adder (Section 4.3.3), 22 cycles were incurred. As a result of the interconnections between logic levels, the number of in-memory cycles is always higher than the number of logic levels. For an 8-bit adder (Figure A1), a careful comparison of the in-memory cycles indicated that every logic level is translated into at least two cycles, i.e., 2*l* in-memory cycles. The first few logic levels of the carry–generate block required two more WRITE cycles in addition to the aforementioned WRITE cycles. This additional requirement applies for (*l* − 5) of the *l* levels. Consequently, the number of cycles required for *l* logic levels of an *n*-bit PP adder can be calculated as follows:

$$\begin{aligned} \text{Cycles}_{\text{in-memory}} &= 2l + 2(l-5) \\ &= 4l - 10 \\ &= 4(\log_2 n + 4) - 10 \\ &= 4 \log_2 n + 6 \end{aligned} \tag{4}$$

Therefore, any PP adder can be implemented in *O*(*log*<sub>2</sub>*n*) cycles, which is the fastest in-memory adder reported to date (a detailed comparison is given in Section 5.5).
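As a worked check of Equations (3) and (4), the bit-widths *n* = 32 and *n* = 64 (chosen here for illustration) give:

$$\begin{aligned} n = 32: \quad l &= \log_2 32 + 4 = 9, & \text{Cycles}_{\text{in-memory}} &= 2 \cdot 9 + 2(9 - 5) = 26 \\ n = 64: \quad l &= \log_2 64 + 4 = 10, & \text{Cycles}_{\text{in-memory}} &= 2 \cdot 10 + 2(10 - 5) = 30 \end{aligned}$$

Doubling the bit-width therefore adds one logic level and four in-memory cycles.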

#### *5.3. Energy of In-Memory PP Adders with Increasing Bit-Width*

The energy consumed during in-memory addition is composed of the actual energy consumed due to addition (switching ReRAM cells during writing; energy consumed in the SA during majority operation) and the array leakage energy. The array leakage energy is the inherent energy consumption due to sneak currents in transistor-less arrays (some works, such as [41], used a diode to suppress these sneak currents). However, the proposed adder is executed in a 1T–1R array where the sneak currents are negligible. Hence, array leakage energy can be neglected. The energy used to write into an ReRAM cell is *EWRITE* ≈ 12 pJ/bit for IHP's ReRAM. The energy used for majority operation is the energy consumed in the SA, and is given by *EMAJ* ≈ 0.63 pJ/majority operation. As can be seen in Figure A1, during eight-bit addition, there are 36 majority operations and 8 NOT operations, and 84 bits are written to the array. Neglecting the energy of a NOT operation (which is only 0.13 pJ/bit), the energy needed for eight-bit in-memory addition is

$$\text{Energy}_{8\text{-bit}} = 36 \times E_{MAJ} + 84 \times E_{WRITE} \tag{5}$$

Observing that *EWRITE* is approximately 20 × *EMAJ*, the in-memory addition energy is dominated by the energy that is needed to write into the array:

$$\text{Energy}_{8\text{-bit}} \approx 84 \times E_{WRITE} \tag{6}$$

where *EWRITE* is the energy that is needed to write to a single bit. Similarly, during 16-bit addition in memory, 180 cells are written [9]. In general, for *n*-bit addition, (2*n* − 2) × 6 cells are written, making the energy for *n*-bit addition,

$$\text{Energy}_{n\text{-bit}} \approx (2n-2) \times 6 \times E_{WRITE} \tag{7}$$

To summarize, the energy for the proposed in-memory adder grows as ≈ 12*n* times the WRITE energy/bit.
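The energy estimate can be reproduced numerically with a short sketch. The constants are the values quoted in this section; the function names are hypothetical, and the default operation counts are those of Equation (5).

```python
E_WRITE_PJ = 12.0   # pJ per written bit, as quoted in the text
E_MAJ_PJ = 0.63     # pJ per majority operation, as quoted in the text


def cells_written(n):
    """Number of cells written during n-bit addition, as in Equation (7)."""
    return (2 * n - 2) * 6


def write_energy_pj(n):
    """Approximate n-bit addition energy, counting WRITE energy only."""
    return cells_written(n) * E_WRITE_PJ


def energy_8bit_pj(majority_ops=36, written_bits=84):
    """Eight-bit addition energy including the majority READs, Equation (5)."""
    return majority_ops * E_MAJ_PJ + written_bits * E_WRITE_PJ


# Fraction of the eight-bit addition energy that is spent on writing.
write_share_8bit = write_energy_pj(8) / energy_8bit_pj()
```

With these numbers, writing accounts for roughly 98% of the eight-bit addition energy, which is why the majority term is dropped in the approximation.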

**Figure 6.** Eight-bit and 16-bit PP adder (Ladner–Fischer type) expressed in majority logic [5,30]. From 8-bit to 16-bit, the number of logic levels increased from 7 to 8, i.e., *O*(*log*(*n*)) latency in terms of logic levels, before mapping to the memory array.

#### *5.4. Area of In-Memory PP Adders with Increasing Bit-Width*

In all in-memory adders, the peripheral circuitry of the array is modified to support logic operations, resulting in an increase in the CMOS peripheral circuit area. This increase is a significant factor to consider, since the additional silicon area is used solely to make the array 'computable'. Therefore, a holistic comparison between in-memory adders should consider both the increase in the peripheral circuitry area and the array area (occupied during addition), with the former being the more significant factor. In this work, the triple-row decoder is the only change required, while all other parts of the peripheral circuitry do not change, since computation is performed using normal memory operations (READ and WRITE). The array area used during the addition is simple to calculate: only six rows are needed, independent of the adder size (see Figure A1). In the mapping of Figure A1, it is assumed that eight columns share a sense amplifier (this is the case when considering pitch-matching, although there are works which assume a sense amplifier for each column). For 8-bit addition, 80 columns are needed, and for *n*-bit addition, 8*n* + 16 columns are needed. Therefore, for *n*-bit addition, the required array area is 6 × (8*n* + 16) cells.
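As a worked example of this area expression, taking *n* = 8 and *n* = 32 (the latter chosen here for illustration):

$$\begin{aligned} n = 8: \quad & 8 \cdot 8 + 16 = 80 \text{ columns}, & 6 \times 80 &= 480 \text{ cells} \\ n = 32: \quad & 8 \cdot 32 + 16 = 272 \text{ columns}, & 6 \times 272 &= 1632 \text{ cells} \end{aligned}$$

The row count stays fixed at six, so the occupied array area grows linearly with the bit-width.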

#### *5.5. Comparison with Other In-Memory Adders*

In this section, we compare the presented in-memory PP adder design methodology with other adders and evaluate the latency with increasing bit-width. In Table 2, the latency of the proposed in-memory PP adder is compared with the latency of in-memory adders summarized in Table 1. With the exception of the two PP adders, the latency of all other adders is *O*(*n*). The Sklansky PP adder of [15] incurs a delay of 8*log*<sub>2</sub>(*n*) + 13, while the majority-based PP adder presented in this work incurs a latency of 4*log*<sub>2</sub>(*n*) + 6. With a PP architecture, majority logic-based implementation outperforms the OR/AND implementation of [15] in terms of latency. This proves that majority is a stronger logic primitive than OR/AND. The issue of latency becomes more evident when we observe the latency for increasing bit-width. For an 8-bit addition, the XOR-based ripple-carry adders [23,24] incur a latency of 18, which is the same latency as that incurred by the majority-based PP adder. A superficial observation may lead one to conclude that the logic primitive alone plays a key role, and that both XOR and MAJ are equally good, irrespective of the architecture used. However, the latency of XOR-based adders [23,24] grows as 2*n* + 2, while that of the majority-based PP adder grows as 4*log*<sub>2</sub>(*n*) + 6. In other words, for a 32-bit addition, the XOR-based adders incur 66 cycles, while the proposed majority-based PP adder will incur only 26 cycles. This disparity further increases for 64-bit additions. In Figure 7, the latency of in-memory adders is plotted for increasing bit-width to better visualize this trend. As plotted in Figure 7, the proposed adder is one of the in-memory adders with the lowest latency, since its latency depends logarithmically on *n*. This latency advantage is obtained with only a minor modification to the row decoder of a conventional memory. It must be noted that most other in-memory adders compared in Table 2 require significant modifications to the peripheral circuitry.

The energy consumption of the proposed in-memory adder is mainly due to the HRS ↔ LRS switching energy of the cells during addition. The leakage energy due to sneak-path currents (which constitutes a significant portion of the total addition energy in 1S–1R adders) is avoided by the access transistor.
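The latency expressions quoted in this section can be tabulated with a short sketch. The three formulas are taken from the text; the function names are hypothetical, and the bit-width is assumed to be a power of two.

```python
def log2_exact(n):
    """Base-2 logarithm of a power-of-two bit-width, as an integer."""
    if n < 1 or n & (n - 1):
        raise ValueError("bit-width must be a power of two")
    return n.bit_length() - 1


def cycles_xor_ripple(n):
    """XOR-based ripple-carry adders [23,24]: 2n + 2 cycles."""
    return 2 * n + 2


def cycles_sklansky(n):
    """Sklansky PP adder of [15]: 8*log2(n) + 13 cycles."""
    return 8 * log2_exact(n) + 13


def cycles_majority_pp(n):
    """Majority-based PP adder of this work: 4*log2(n) + 6 cycles."""
    return 4 * log2_exact(n) + 6


# Latency in cycles per bit-width: (XOR ripple-carry, Sklansky, majority PP).
latency_table = {
    n: (cycles_xor_ripple(n), cycles_sklansky(n), cycles_majority_pp(n))
    for n in (8, 16, 32, 64)
}
```

At 8 bits the ripple-carry and majority-based designs tie at 18 cycles, and the gap opens quickly as the bit-width doubles.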


**Table 2.** Latency comparison of in-memory adders (8-bit and *n*-bit).


**Figure 7.** Latency of in-memory adders with increasing bit-width, *n*. An adder with *O*(*log*(*n*)) latency is required for 32-bit/64-bit addition to harness the power of in-memory computation.

#### **6. Conclusions**

The latency of in-memory adders is a severe disadvantage in in-memory computing, since any adder is implemented in the memory array as a long sequence of Boolean operations. A poorly optimized in-memory adder may take longer to compute than the combined time it takes to fetch data from memory and compute in a CMOS processor. In-memory adders have not had their latency analyzed and optimized for higher bit-widths, and consequently incur *O*(*n*) latency for *n*-bit addition (32-bit/64-bit adders, typically used in microprocessors, will require hundreds of cycles). In this work, a design methodology is presented to tackle the exorbitant latency of in-memory adders. The strength of the majority logic primitive is coupled with the parallel-prefix (PP) adder architecture to achieve a latency of 4*log*<sub>2</sub>(*n*) + 6 for parallel-prefix additions in the memory array. The main contribution of this work is a generic mapping methodology, used to map a parallel-prefix adder circuit (synthesized in majority logic) to the memory array with minimum latency. Multiple majority operations can be performed simultaneously in the columns of the array, which makes it possible to achieve an *O*(*log*(*n*)) latency for any PP adder. Using the proposed design methodology, 32-bit and 64-bit adders (used in processors) can be implemented in 26 and 30 memory cycles, respectively. This can pave the way for arithmetic and similar computing tasks to be efficiently performed at the data location.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A. Mapping of 8-Bit Ladner-Fischer Adder to Memory Array**

**Figure A1.** Mapping of the eight-bit LF adder of Figure 5 to memory array. All the majority gates in a level are simultaneously executed (red boxes). During parallel-prefix addition, *mi* represents the output of the *i*th majority gate, and *ci* is the carry (denoted in green color, since it is read as a voltage before being written into the array). 3 WRITE denotes writing cycles to 3 different rows, where more than 1 bit may be written in each row.

#### **References**

