Design and Performance Analysis of a [8/8/8] Charge Domain Mixed-Signal Multiply-Accumulator

Matsuzawa, Akira; Martinez Alonso, Abdel; Miyahara, Masaya

doi:10.3390/electronics13010050

by Akira Matsuzawa *, Abdel Martinez Alonso * and Masaya Miyahara

Tech Idea Co., Ltd., Kawasaki 214-0021, Japan

* Authors to whom correspondence should be addressed.

Electronics 2024, 13(1), 50; https://doi.org/10.3390/electronics13010050

Submission received: 15 November 2023 / Revised: 18 December 2023 / Accepted: 18 December 2023 / Published: 21 December 2023

(This article belongs to the Special Issue Ultra-Low-Voltage and Ultra-Low-Power Integrated Circuits and Systems Evolution)


Abstract

This article describes the design and performance analysis of a charge domain mixed-signal multiply-accumulator (MAC) using RDAC, CDAC, and SAR-ADC with an 8-bit resolution for input, weight, and output. The arithmetic accuracy is mainly determined by the ADC, and the gain error has a significant impact. The mismatches and thermal noises of the RDAC and the CDAC are averaged by the number of multiply-accumulate units m connected to one ADC. As a result, if m is large enough, mismatches and thermal noises have a limited impact on the computation accuracy. Most of the computational energy is determined by the energy consumed by the SAR-ADC, and the computational energy per operation can be reduced by increasing m. This last metric is mainly determined by the charge and discharge energy of the CDAC for sufficiently large m values. Furthermore, since RDAC consumes energy unnecessarily, the turn-off timing of RDAC should be optimized. These MAC units have been designed and prototyped using 28 nm CMOS technology, integrating 12,288 arithmetic units while operating at 180 MHz, resulting in an arithmetic speed of 4.4 TOPS. The r-MVM accuracy is about 1% and a high energy efficiency of 240 TOPS/W as a MAC macro and 64.4 TOPS/W as a system has been achieved.

Keywords: charge domain; mixed-signal; MAC; SAR-ADC; energy efficiency; arithmetic accuracy

1. Introduction

A multiply-accumulator (MAC) is a basic functional unit of digital signal processors and AI processors. In AI processors, many attempts have been made to increase speed and energy efficiency by lowering the bit precision to the 1- to 4-bit range. However, recognition accuracy then degrades depending on the application and data set. In contrast, accuracy comparable to FP32 precision has been reported for processors using INT8 resolution [1]. Therefore, 8-bit resolution is considered sufficient for general-purpose AI processors and feasible even when analog technology is used for the MAC circuit. Charge domain mixed-signal MACs have been implemented in AI processors [2,3,4,5,6,7,8,9,10].

Figure 1 shows the principle of charge domain MAC operation using RDAC for input x, CDAC for weight w, and SAR-ADC for output y. Each CDAC has a total capacitance of C, with a weight of w for the input capacitance and a weight (1 − w) for the ground. In the reset phase, all charges in each capacitance are reset by closing the switch S₀ and selecting all CDACs to ground. Then, the switch S₀ is opened and the CDAC controls the capacitance connected to the RDAC or ground according to each weight w. Input x is converted to the corresponding analog voltage by the RDAC. In this state, the output voltage y_in is:

y_{i n} = \frac{1}{m} \sum_{i = 1}^{m} w_{i} x_{i}

(1)

where m is the number of MAC operations. Then, a voltage corresponding to the MAC operation of the input x and the weight w is generated at the output. The output voltage y_in is converted to a digital value y_out by the SAR-ADC as an output.
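As a rough illustration of Equation (1), the following Python sketch models the ideal charge sharing. The function names `ideal_mac` and `quantize`, and the normalization of x and w to the range [0, 1], are assumptions made for this example; parasitic capacitance, mismatch, and noise are ignored.

```python
def ideal_mac(x, w):
    """Ideal charge-domain MAC output of Equation (1): mean of w_i * x_i.

    x: inputs normalized to [0, 1] (RDAC output voltage / full scale).
    w: weights normalized to [0, 1] (fraction of each CDAC tied to the RDAC).
    """
    if len(x) != len(w) or not x:
        raise ValueError("x and w must be non-empty and of equal length")
    m = len(x)
    return sum(wi * xi for wi, xi in zip(w, x)) / m


def quantize(y, bits=8):
    """Map y in [0, 1] to an unsigned code, as an ideal ADC would."""
    levels = (1 << bits) - 1
    return round(min(max(y, 0.0), 1.0) * levels)


y_in = ideal_mac([1.0, 0.5, 0.0, 0.25], [0.5, 1.0, 1.0, 0.0])
y_out = quantize(y_in)
print(y_in, y_out)
```

Because the output is divided by m, the full-scale range of y_in does not grow with the number of parallel units, which is why a single 8-bit ADC can serve many CDACs.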

While this configuration is simple, it appears to be in the early stages of design and could benefit from further refinement. Additionally, a more thorough performance analysis is needed.

This paper describes the design of a charge domain mixed-signal MAC macro using RDAC, CDAC, and SAR-ADC with [8/8/8]-bit resolution for input, weight, and output, and analyzes its arithmetic accuracy and energy efficiency. The remainder of this article is structured as follows: Section 2 describes each component, including the RDAC, CDAC, and SAR-ADC, and the overall chip design. Section 3 presents the performance analysis, focusing on arithmetic accuracy and energy efficiency. Finally, Section 4 presents the chip fabrication and the measured results of the prototyped Proof-of-Concept (PoC) chip, fabricated using 28 nm CMOS technology.

2. Design of Each Part and Overall Chip

2.1. RDAC

Figure 2 shows the RDAC circuit and layout. The RDAC uses a segmented architecture with equal resistors for the upper four bits and a binary R-2R ladder for the lower four bits. This configuration increases accuracy and reduces current consumption. The average current consumption of the RDAC, I_AVE, can be expressed as follows, where V_DD is the power supply voltage, R_o is the output resistance, and M is the number of upper bits:

I_{A V E} = \frac{V_{D D}}{R_{o}} (\frac{1}{6} + \frac{1}{2^{M}})

(2)

To reduce current consumption, it is better to increase the number of upper bits M; however, this increases the area. In this design, M = 4 is used. The output resistance R_o of this RDAC is 1.5 kΩ, and it drives 64 CDACs. The size is 8.1 μm × 143 μm. The current flowing from the power supply can be shut off at a programmed time to reduce power dissipation.
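The trade-off in Equation (2) can be made concrete with a short Python sketch. The supply voltage of 0.8 V is an assumption taken from the operating point used in Section 3.2, and the function name is hypothetical.

```python
def rdac_average_current(v_dd, r_o, upper_bits):
    """Average RDAC supply current from Equation (2), in amperes."""
    return v_dd / r_o * (1.0 / 6.0 + 1.0 / 2 ** upper_bits)


V_DD = 0.8    # V, assumed from Section 3.2
R_O = 1.5e3   # ohm, RDAC output resistance

# Sweep the number of upper (segmented) bits M.
currents = {m: rdac_average_current(V_DD, R_O, m) for m in range(2, 7)}
for m, i_ave in currents.items():
    print(f"M = {m}: I_AVE = {i_ave * 1e6:.1f} uA")
```

Under these assumptions the M-dependent term shrinks by half with every added upper bit, while the 1/6 term sets a floor, so the benefit of increasing M beyond 4 is limited compared with the area cost.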

2.2. CDAC

Figure 3 shows the CDAC circuits. An 8-bit differential CDAC is used to express binary weight. The capacitance of the CDAC is 5 fF. The RDAC drives 64 differential CDACs.

Figure 4 shows the layout of the CDAC with the register file and the top view of the MOM capacitors. The area of the differential CDAC is 35.5 μm², of which the capacitance part occupies 13.7 μm². The register file, which provides the weight data to the CDAC, has an area of 29 μm². The capacitors are wiring MOM capacitors using six metal layers. The overlap length of the MOM capacitor is 1.02 μm and the total length of the 8-bit MOM capacitor is 3.08 μm with a unit MOM capacitor pitch of 0.28 μm.

Figure 5 illustrates the structure of the LSB capacitors and a side view of the MOM capacitance. The capacitance density is high, with C5 using four layers, C4 two, and C3, C2, and C1 one each. C2 and C1 are half and a quarter the length of the others, respectively. This design reduces the CDAC area and stray capacitance. Despite minor accuracy concerns, the 8-bit resolution requirement is met, as shown in Section 4.

2.3. SAR-ADC

As shown in Figure 6, a SAR-ADC is used because its occupied area is small and its energy consumption is low. In Figure 6, a 4-bit resolution is used for simplicity; however, the actual ADC uses an 8-bit resolution. A special circuit is added to cancel the common voltage fluctuation of the input voltage. The values W_p and W_n are the inputs to the differential CDAC. When the weight w takes both positive and negative values, −1 ≤ w ≤ 1, W_p and W_n are given by:

\begin{cases} W_{p} = \frac{1}{2} (1 + w) \\ W_{n} = \frac{1}{2} (1 - w) \end{cases}

(3)

Therefore, at the output voltage of the MAC operation, the input voltage V_{in_p} and V_{in_n} of the ADC are:

\begin{cases} V_{in_p} = \frac{V_{DD}}{2m} \left( \sum_{i=1}^{m} X_{i} + \sum_{i=1}^{m} X_{i} w_{i} \right) \\ V_{in_n} = \frac{V_{DD}}{2m} \left( \sum_{i=1}^{m} X_{i} - \sum_{i=1}^{m} X_{i} w_{i} \right) \end{cases}

(4)

Since the common voltage varies with input X, it is not desirable for the ADC operation. Therefore, as shown in Figure 6, a circuit to cancel this input-dependent common voltage is added to the SAR-ADC. Figure 7 shows how this circuit works.

Step 1: Sampling

The switches S₀ and S_bias are closed. The switches S_s and S_com select the inputs V_{in_p} and V_{in_n}, and the switch S_bias selects ground. In this state, the inputs V_{in_p} and V_{in_n} are applied to the capacitance C_cu and the total capacitance C_u of the DAC of the SAR-ADC.

Step 2: Equalizing

The switch S_com selects a short circuit between the two capacitances C_cu. At this time, each capacitance holds an equal charge Q_c.

Q_{c} = C_{c u} \frac{V_{D D}}{2 m} \sum_{i = 1}^{m} X_{i}

(5)

Step 3: Cancellation and bias

Finally, the switch S₀ is opened and the switch S_s selects V_ref. The switch S_com selects the CDAC capacitance of the SAR-ADC, and the switch S_bias selects V_DD. If C_cu = C_u, the input voltages V_{x_p} and V_{x_n} of the comparator are:

\begin{cases} V_{x_p} = \frac{V_{DD}}{2} - \frac{V_{DD}}{4m} \sum_{i=1}^{m} X_{i} w_{i} \\ V_{x_n} = \frac{V_{DD}}{2} + \frac{V_{DD}}{4m} \sum_{i=1}^{m} X_{i} w_{i} \end{cases}

(6)

The common voltage is set to V_DD/2, which has no signal dependence and is appropriate for the operation. After that, normal SAR-ADC conversion is performed. The simulation shows that the minimum conversion period is 5 ns and the conversion energy is 0.8 pJ. The ADC size is 9 μm × 43 μm.
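The effect of the cancellation circuit can be checked numerically by evaluating Equations (4) and (6) side by side. The Python sketch below is only a behavioral illustration: it assumes C_cu = C_u and V_DD = 0.8 V, uses hypothetical function names, and does not model the switch-level charge transfer.

```python
def adc_inputs(x, w, v_dd=0.8):
    """Equation (4): ADC input voltages for inputs x in [0, 1]
    and signed weights w in [-1, 1]."""
    m = len(x)
    sum_x = sum(x)
    sum_xw = sum(xi * wi for xi, wi in zip(x, w))
    v_in_p = v_dd / (2 * m) * (sum_x + sum_xw)
    v_in_n = v_dd / (2 * m) * (sum_x - sum_xw)
    return v_in_p, v_in_n


def comparator_inputs(x, w, v_dd=0.8):
    """Equation (6): comparator inputs after cancellation (C_cu = C_u)."""
    m = len(x)
    sum_xw = sum(xi * wi for xi, wi in zip(x, w))
    v_x_p = v_dd / 2 - v_dd / (4 * m) * sum_xw
    v_x_n = v_dd / 2 + v_dd / (4 * m) * sum_xw
    return v_x_p, v_x_n


w = [0.5, -1.0, 0.25]
for x in ([0.1, 0.2, 0.3], [0.9, 0.8, 1.0]):
    p, n = adc_inputs(x, w)
    xp, xn = comparator_inputs(x, w)
    print(f"input CM = {(p + n) / 2:.3f} V, "
          f"comparator CM = {(xp + xn) / 2:.3f} V")
```

The common mode at the ADC input follows the sum of the inputs X, whereas the common mode at the comparator stays at V_DD/2; the differential signal is preserved with half the amplitude and opposite polarity.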

2.4. Overall Configuration of the Chip

Figure 8 illustrates the overall configuration of the chip. At the center, there are 12,288 CDACs. Above and below these, 96 RDACs are arranged, generating a voltage proportional to the input X. The chip also includes 64 SAR-ADCs, each connected to 192 CDACs to produce the output Y. The weight values W are stored in 96 Kbit SRAM on the left and are read sequentially. Input X is stored in 48 Kbit SRAM on the upper and lower sides, and output Y is stored in 96 Kbit SRAM on the right side. The chip also integrates a Phase-Locked Loop (PLL) that generates a clock and an SPI control circuit that exchanges data with the outside and controls the internal circuit.

Figure 9 shows the timing of each part. First, 64 weight data W are sequentially written to a register file that provides input data to the CDAC. Thereafter, input data X are given to the RDAC. The top and bottom RDACs generate voltages corresponding to this input X, and the output voltage of the CDAC is sampled by the SAR-ADC. The RDAC OFF pulse then disconnects the RDAC from the power supply and turns it off. At the same time, the SAR-ADC operation is performed, and the conversion value is output. Input-to-output latency is only one clock.
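The array organization described above can be cross-checked with a few lines of arithmetic. In the sketch below, the 180 MHz clock is taken from the abstract, and counting two operations per MAC (one multiply and one accumulate) follows the convention stated in the note to Table 1; the variable names are chosen for this example only.

```python
NUM_ADCS = 64          # SAR-ADCs on the chip
CDACS_PER_ADC = 192    # parallel number m
WEIGHT_BITS = 8        # bits stored per CDAC weight
CLOCK_HZ = 180e6       # operating frequency quoted in the abstract
OPS_PER_MAC = 2        # one multiply plus one accumulate

num_cdacs = NUM_ADCS * CDACS_PER_ADC
weight_kbit = num_cdacs * WEIGHT_BITS / 1024
peak_tops = num_cdacs * OPS_PER_MAC * CLOCK_HZ / 1e12

print(f"CDACs: {num_cdacs}")
print(f"Weight storage: {weight_kbit:.0f} Kbit")
print(f"Peak throughput: {peak_tops:.2f} TOPS")
```

The results agree with the 12,288 CDACs, the 96 Kbit weight SRAM, and the arithmetic speed of about 4.4 TOPS reported for the chip.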

3. Performance Analysis

3.1. Arithmetic Accuracy

Figure 10 shows the factors that contribute to calculation errors. ε_R is the error of the RDAC, ε_C is the error of the CDAC, ε_CT is the thermal noise induced on the CDAC, G_MAC is the gain of the MAC operation, set primarily by the parasitic capacitance C_P, m is the number of MAC circuits connected in parallel to one ADC, ε_q is the ADC quantization noise, ε_AL is the linearity error of the ADC, ε_AT is the thermal noise of the ADC, ε_OFF is the ADC offset voltage, and G_ADC is the gain of the ADC, controlled by adjusting its reference voltage. The MAC output is the product of the voltage generated by the RDAC and the capacitance of the CDAC; for simplicity, we use the approximation that the error of a product of two variables, each including a small error, can be obtained by adding the individual errors.

The RDAC error ε_R is mainly caused by resistance mismatch. The upper 4 bits use a thermometer code with 15 equal-value resistors, and the resistor area is large enough that the error is only 0.02% in standard deviation. Furthermore, the effective error is reduced by the averaging effect over m units, so ε_R is:

ε_{R} = \frac{ε_{R 0}}{\sqrt{m}}

(7)

where ε_R0 is the RDAC error when m = 1. In this chip m = 192, so ε_R is only 0.0014%. The CDAC error ε_C is primarily caused by capacitance mismatch in the CDAC. If the i-th input is x_i, the i-th weight is w_i, and the i-th capacitance mismatch is δ_i, then the resulting error of the MAC operation, ε_C, is given by:

ε_{C}^{2} = \frac{1}{m^{2}} \sum_{i = 1}^{m} {(x_{i} \cdot w_{i} \cdot δ_{i})}^{2}

(8)

Assuming that the average value of x_i w_i is 1/4 and that every δ_i equals δ_C:

ε_{C}^{2} = \frac{δ_{C}^{2}}{16 m}

(9)

The capacitance mismatch δ_C is inversely proportional to the square root of the capacitance value, where A is the coefficient of proportionality [11,12]:

δ_{C} = \frac{A}{\sqrt{C_{u}}}

(10)

Therefore, from Equation (9):

ε_{C} = \frac{A}{4 \sqrt{m C_{u}}}

(11)

From the datasheet, A is 0.85% for a capacitance of 1 fF, and C_u is 5 fF; thus, ε_C is about 0.1% for m = 1 and about 0.007% for m = 192.
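The averaging effect in Equations (7) and (11) can be reproduced with the quoted parameters. The following Python sketch uses hypothetical function names and works directly in percent and femtofarads.

```python
import math


def rdac_error(eps_r0, m):
    """Equation (7): RDAC mismatch error averaged over m parallel units."""
    return eps_r0 / math.sqrt(m)


def cdac_error(a_coeff, c_u_ff, m):
    """Equation (11): CDAC mismatch error.

    a_coeff: mismatch (in %) of a 1 fF capacitor.
    c_u_ff:  CDAC capacitance in fF.
    """
    return a_coeff / (4.0 * math.sqrt(m * c_u_ff))


EPS_R0 = 0.02   # %, RDAC error for m = 1
A = 0.85        # %, mismatch coefficient at 1 fF
C_U = 5.0       # fF

for m in (1, 192):
    print(f"m = {m:3d}: eps_R = {rdac_error(EPS_R0, m):.4f} %, "
          f"eps_C = {cdac_error(A, C_U, m):.4f} %")
```

Both terms fall as 1/√m, so going from m = 1 to m = 192 reduces them by a factor of about 14.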

The thermal noise induced on the CDAC ε_CT is:

ε_{C T} = \frac{1}{V_{R}} \sqrt{\frac{k T}{m C_{u}}}

(12)

where V_R is the maximum input voltage, about 0.8 V.

At room temperature, ε_CT is 0.11% for m = 1 and 0.008% for m = 192.
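Equation (12) is the familiar kT/C noise expressed relative to the input range. A minimal Python sketch, assuming a room temperature of 300 K and using a hypothetical function name, is:

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_ratio(cap_farad, v_ref, m=1, temp_kelvin=300.0):
    """Equation (12): thermal noise on m parallel capacitors of value
    cap_farad, relative to the maximum input voltage v_ref."""
    return math.sqrt(K_BOLTZMANN * temp_kelvin / (m * cap_farad)) / v_ref


C_U = 5e-15   # F, CDAC capacitance
V_R = 0.8     # V, maximum input voltage

for m in (1, 192):
    print(f"m = {m:3d}: eps_CT = {ktc_noise_ratio(C_U, V_R, m) * 100:.4f} %")
```

Connecting m CDACs to one ADC increases the total sampling capacitance to m·C_u, which is what lowers the noise by √m.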

G_MAC is determined by the parasitic capacitance of each CDAC C_pu.

G_{M A C} = \frac{C_{u}}{C_{u} + C_{p u}}

(13)

This value is about 0.8 from the measurement results.

The ADC quantization noise ε_q is:

ε_{q} = \frac{1}{2 \sqrt{3} \times 2^{N}}

(14)

where N is the resolution of the ADC. Since N = 8, ε_q is 0.12%.

The linearity error of the ADC ε_AL is mainly caused by the capacitance mismatch of CDAC in a SAR-ADC. The capacitance of the CDAC is 10 fF and the ε_AL can be estimated at 0.27% by (10).

The thermal noise of the ADC ε_AT is:

ε_{A T} = \frac{1}{V_{R}} \sqrt{\frac{k T}{C_{A D C}}}

(15)

Since C_ADC is 10 fF, the value of ε_AT at room temperature is 0.08%.
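The three ADC-side terms from Equations (14), (10), and (15) can be evaluated together. The Python sketch below assumes a temperature of 300 K and uses hypothetical function names; it is meant as a cross-check of the orders of magnitude, not as a model of the actual ADC.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def quantization_noise(bits):
    """Equation (14): quantization noise relative to full scale."""
    return 1.0 / (2.0 * math.sqrt(3.0) * 2 ** bits)


def capacitor_mismatch(a_coeff, cap_ff):
    """Equation (10): mismatch (same unit as a_coeff) for cap_ff in fF."""
    return a_coeff / math.sqrt(cap_ff)


def adc_thermal_noise(cap_farad, v_ref, temp_kelvin=300.0):
    """Equation (15): kT/C noise of the ADC relative to v_ref."""
    return math.sqrt(K_BOLTZMANN * temp_kelvin / cap_farad) / v_ref


eps_q = quantization_noise(8) * 100                 # %
eps_al = capacitor_mismatch(0.85, 10.0)             # %
eps_at = adc_thermal_noise(10e-15, 0.8) * 100       # %
print(f"eps_q = {eps_q:.3f} %, eps_AL = {eps_al:.3f} %, eps_AT = {eps_at:.3f} %")
```

With these inputs, Equation (14) evaluates to roughly 0.11%, close to the rounded value quoted above, while the linearity and thermal terms come out at about 0.27% and 0.08%. Unlike the RDAC and CDAC terms, none of these scale with m.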

ADC offset voltage ε_OFF can be estimated to be about 0.2%. G_ADC should be set to compensate for the decrease in the gain in the MAC circuits, shown in (13). Figure 11 shows each calculation error for the parallel number m.

The RDAC error ε_R, the CDAC error ε_C, and the thermal noise ε_CT decrease rapidly through the averaging effect as the parallel number m increases, so their contribution to the calculation error can be neglected. The calculation error of the MAC unit consisting of the RDACs and the CDACs is therefore very small.

Even if the capacitance of the CDAC is reduced to about 5 fF, the calculation error is 0.01% due to the averaging effect. In contrast, an ADC does not have such an averaging function, and its performance is dominated by its linearity error of about 0.27%. According to our simulation, even if we add up all the errors, the total error is estimated to be about 0.35%. The main challenge seems to be improving the compensation accuracy for the gain decrease in the MAC operation.
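One common way to combine independent error terms is a root-sum-square (RSS). The Python sketch below applies it to the values quoted in this section for m = 192. Treating all terms, including the offset, as independent random errors is an assumption made for this example and is not necessarily how the simulation mentioned above combines them.

```python
import math

# Error terms at m = 192, in % of full scale, as quoted in Section 3.1.
errors_percent = {
    "rdac_mismatch": 0.0014,
    "cdac_mismatch": 0.007,
    "cdac_thermal": 0.008,
    "adc_quantization": 0.12,
    "adc_linearity": 0.27,
    "adc_thermal": 0.08,
    "adc_offset": 0.2,
}


def root_sum_square(values):
    """Combine independent error terms in quadrature."""
    return math.sqrt(sum(v * v for v in values))


total = root_sum_square(errors_percent.values())
adc_only = root_sum_square(
    v for name, v in errors_percent.items() if name.startswith("adc_")
)
print(f"total = {total:.3f} %, ADC terms only = {adc_only:.3f} %")
```

Under this assumption the total lands in the same range as the estimate of about 0.35%, and removing the RDAC and CDAC terms changes it by less than 0.001 percentage points, which illustrates why the ADC dominates the arithmetic accuracy.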

3.2. Energy Efficiency

The energy efficiency is determined by the MAC circuit consisting of the RDACs, the CDACs, and the SAR-ADCs. First, consider the MAC operation. The upper four bits of the RDAC use a thermometer code by connecting the same resistors in parallel, and the lower four bits use binary code in the R-2R configuration. The output resistor R_o is R_u/8 with the unit resistance R_u. Figure 12 shows an equivalent circuit as seen from the power supply V_DD when the load capacitance of the RDAC is C_L. G_D represents the output conductance of RDAC, w represents the weight, and G_R represents the conductance through which a current flows regardless of w.

In the step response of the voltage V_DD, the time constant τ is:

τ = \frac{C_{L}}{G_{D}}

(16)

The output voltage V_out is:

V_{o u t} = w V_{D D} (1 - e^{- \frac{t}{τ}})

(17)

The current I_DD flowing out of the power supply is given below, where G_R = G_D/2^M and M is the number of upper conversion bits:

I_{DD} = G_{D} V_{DD} \left\{ \frac{1}{2^{M}} + w (1 - w) + w^{2} e^{- \frac{t}{τ}} \right\}

(18)

Since it takes about 6τ for the output voltage to settle to within 1/2 LSB at 8-bit resolution, the energy E_D consumed during this time is obtained by integrating V_DD I_DD up to 6τ:

E_{D} = C_{L} V_{DD}^{2} \left\{ \frac{6}{2^{M}} + 6 w (1 - w) + w^{2} \right\}

(19)

Averaging over w from 0 to 1 gives:

E_{D} = C_{L} V_{DD}^{2} \left\{ \frac{6}{2^{M}} + \frac{4}{3} \right\}

(20)
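The step from Equation (18) to Equations (19) and (20) can be verified numerically. In the Python sketch below, time is measured in units of τ and energy in units of C_L·V_DD², so that G_D·τ = C_L drops out; the function names and the midpoint integration rule are choices made for this example.

```python
import math


def supply_energy_numeric(w, upper_bits, n_tau=6.0, steps=2000):
    """Integrate Equation (18) over n_tau time constants (midpoint rule).

    Returns the energy in units of C_L * V_DD**2.
    """
    dt = n_tau / steps
    static = 1.0 / 2 ** upper_bits + w * (1.0 - w)
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += (static + w * w * math.exp(-t)) * dt
    return total


def supply_energy_closed_form(w, upper_bits):
    """Equation (19), in units of C_L * V_DD**2."""
    return 6.0 / 2 ** upper_bits + 6.0 * w * (1.0 - w) + w * w


def average_over_w(fn, upper_bits, steps=1000):
    """Average fn(w, upper_bits) over w in [0, 1] (midpoint rule)."""
    return sum(fn((k + 0.5) / steps, upper_bits) for k in range(steps)) / steps


print(supply_energy_numeric(0.5, 4), supply_energy_closed_form(0.5, 4))
print(average_over_w(supply_energy_closed_form, 4), 6 / 2 ** 4 + 4 / 3)
```

The numerical integral differs from Equation (19) only by the neglected factor (1 − e⁻⁶) on the w² term, and the average over w reproduces the 4/3 in Equation (20), giving a coefficient of about 1.7 for M = 4.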

The differential CDAC that loads the RDAC has a differential configuration in which two weighted capacitances are connected in series. It also has a parasitic wiring capacitance C_pl. Thus, the load capacitance C_L is:

C_{L} = 2 w (1 - w) C_{u} + C_{p l}

(21)

Since w takes a value from 0 to 1, the average value can be found as:

C_{L} = \frac{C_{u}}{3} + C_{p l}

(22)

Therefore, substituting Equation (22) into Equation (20) with M = 4 gives:

E_{D} = 1.7 (\frac{C_{u}}{3} + C_{p l}) V_{D D}^{2}

(23)

ADC energy consumption is mainly determined by the SAR logic circuits. The circuit simulation shows that the conversion energy of the ADC E_conv is 0.8 pJ. Since m CDACs are connected to the input end of the ADC, the total energy consumption of the MAC, including ADC for one MAC operation, is given by the following Equation:

E_{M A C} = 1.7 (\frac{C_{u}}{3} + C_{p l}) V_{D D}^{2} + \frac{E_{c o n v}}{m}

(24)

Figure 13 shows how the energy consumption varies with the parallel number m. The energy consumed by the MAC operation consisting of the RDACs and the CDACs is constant at about 2.4 fJ, and the contribution of the ADC energy consumption to the MAC operation decreases as m increases. In this example, at m = 400, the energy consumption of the MAC and the ADC are equal. In this chip, m = 192, so the contribution of the ADC energy consumption is about 5 fJ, roughly twice the energy of the MAC operation itself.

Figure 14 shows how the estimated energy efficiency changes as a function of m when C_u = 5 fF, C_pl = 0.5 fF, V_DD = 0.8 V, and E_conv = 0.8 pJ. The larger the value of m, the higher the energy efficiency. Since this chip is set to m = 192, a high energy efficiency of about 300 TOPS/W can be expected. The energy consumption of the ADC when using 16 nm CMOS, E_conv is 0.4 pJ. At this time, an energy efficiency of about 450 TOPS/W can be expected at m = 192, if using 16 nm CMOS technology.
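Equation (24) and the resulting energy efficiency can be evaluated directly with the parameters listed above. The Python sketch below counts two operations per MAC, following the convention in the note to Table 1; the function names are hypothetical.

```python
def mac_energy_fj(m, c_u_ff=5.0, c_pl_ff=0.5, v_dd=0.8, e_conv_pj=0.8):
    """Equation (24): energy per MAC operation in fJ (fF * V^2 = fJ)."""
    e_dac = 1.7 * (c_u_ff / 3.0 + c_pl_ff) * v_dd ** 2   # RDAC + CDAC part
    e_adc = e_conv_pj * 1e3 / m                          # ADC share per MAC
    return e_dac + e_adc


def efficiency_tops_per_w(m, **kwargs):
    """Energy efficiency with 2 operations per MAC (1 op/fJ = 1000 TOPS/W)."""
    return 2.0 / mac_energy_fj(m, **kwargs) * 1e3


for m in (64, 192, 512):
    print(f"m = {m:3d}: E_MAC = {mac_energy_fj(m):.2f} fJ, "
          f"{efficiency_tops_per_w(m):.0f} TOPS/W")
print(f"E_conv = 0.4 pJ, m = 192: "
      f"{efficiency_tops_per_w(192, e_conv_pj=0.4):.0f} TOPS/W")
```

With these parameters the sketch gives roughly 300 TOPS/W at m = 192 and about 450 TOPS/W when E_conv is halved to 0.4 pJ. The per-MAC values computed from Equation (24) may differ slightly from the rounded figures read off Figure 13.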

Therefore, the charge domain mixed signal MAC using RDAC, CDAC, and the SAR-ADC is expected to further increase its energy efficiency by reducing its energy consumption through further technology scaling.

4. Chip Fabrication and Measurement Results

The chip was prototyped using 28 nm CMOS technology. Figure 15 shows the chip layout and the chip photo. The chip size is 2.675 mm × 2.675 mm, and the MAC Macro consisting of RDACs, CDACs, and SAR-ADCs is about 1.4 mm × 1.5 mm.

Figure 16 illustrates the setup of the measurement environment and the system implementation. To maintain a low-noise measurement environment, all controllers and the MS-MAC were battery-powered. Communication was established using USB/SPI interfaces, which are standard features in commercially available FPGA boards.

Figure 17 shows the three chips’ measured linearity of the MAC with the SAR-ADCs when the CDAC weight is maximized and the RDAC is controlled. All three chips have linearity errors less than +0.3 LSB/−0.4 LSB for the differential nonlinearity (DNL) and +0.5 LSB/−0.9 LSB for the integral nonlinearity (INL) in 8-bit resolution.

The r-MVM test, as outlined in [5,10], is a hardware-centric method designed to assess and quantify the precision of mixed-signal processors. This technique holds particular importance in mixed-signal processing, where the accuracy of data conversion between analog and digital formats is crucial. Figure 18 visually depicts the r-MVM test methodology, providing an overview of the test process and highlighting the various stages involved in executing the r-MVM test. The implementation of the r-MVM test methodology was facilitated using MATLAB/Simulink, which enabled more efficient manipulation and testing of data sets. The study also incorporated FPGA-based controllers, which processed 192 K points across three distinct chips, further enhancing the testing process.

Figure 19a presents the measured results of the random matrix-vector multiplication (r-MVM) test. When sufficient set-up time was provided, the standard deviation of the r-MVM error was approximately 1.1%. The error tends to increase when the input X and weight W are larger, suggesting the influence of gain error. Consequently, the error was normalized by the output value, as depicted in Figure 19b. This normalized error is nearly constant at about 0.3%, in good agreement with the estimation shown in Figure 11.

The r-MVM error therefore seems to be mainly due to the gain error, and finer gain compensation could reduce the calculation error.

Figure 20 shows how the ON-time of the RDAC affects the measured r-MVM error and the energy efficiency. The longer the ON-time of the RDAC, the lower the r-MVM error, but also the lower the energy efficiency because of the ineffective RDAC current. An RDAC ON-time of approximately 1.0 ns is the optimal operating point. The time constant τ is proportional to the number of CDACs n driven by one RDAC, the output resistance of the RDAC R_o, and the average capacitance of the CDAC C_L:

τ = n R_{o} C_{L}

(25)

Substituting n = 64, R_o = 1.5 kΩ, and C_L = 1.7 fF gives τ = 163 ps. Since about 6.1τ is required for 8-bit precision settling, the estimated settling time is about 1.0 ns, which is consistent with the measurement data.
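The settling estimate can be reproduced from Equation (25). The Python sketch below also computes the number of time constants a single-pole response needs to settle within 1/2 LSB; that single-pole model and the function names are assumptions made for this example.

```python
import math


def time_constant(n_cdacs, r_o, c_l):
    """Equation (25): RC time constant of one RDAC driving n CDACs."""
    return n_cdacs * r_o * c_l


def settling_time_constants(bits, lsb_fraction=0.5):
    """Time constants needed for exp(-t/tau) to fall below
    lsb_fraction of one LSB at the given resolution."""
    return math.log(2 ** bits / lsb_fraction)


tau = time_constant(64, 1.5e3, 1.7e-15)   # seconds
n_tau = settling_time_constants(8)

print(f"tau = {tau * 1e12:.0f} ps")
print(f"time constants for 1/2 LSB at 8 bits: {n_tau:.2f}")
print(f"6.1 * tau = {6.1 * tau * 1e9:.2f} ns")
```

The single-pole model gives ln(512) ≈ 6.2 time constants, in line with the factor of about 6 used in the analysis, and the resulting settling time is close to the 1.0 ns ON-time found to be optimal in the measurement.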

Figure 21 shows the measured maximum operating frequency f_max and the energy efficiency as functions of the supply voltage with a fixed RDAC ON-time of 1.0 ns. A higher power supply voltage V_DD raises the maximum operating frequency f_max but lowers the energy efficiency. A power supply voltage of 0.8 V therefore offers a good balance between the maximum operating frequency and the energy efficiency.

Figure 22 plots the measured energy efficiency and r-MVM error of the three chips. Almost all the chips achieve an r-MVM error of about 1% with an energy efficiency of up to about 240 TOPS/W. The measured energy efficiency of 240 TOPS/W is about 80% of the estimation shown in Figure 14, which indicates that the estimation is sufficiently accurate.

Table 1 shows a comparison of the performance of this chip and the MACs using other mixed-signal technologies. System-level energy efficiency takes into account the power consumption of SRAMs.

In terms of energy efficiency, the [4/4/4] configuration using 7 nm CMOS [3] has the highest energy efficiency of 351 TOPS/W; among [8/8/8] configurations, however, this chip shows world top-class energy efficiency despite using the least scaled technology node.

5. Conclusions

We have designed a charge domain mixed-signal MAC using RDAC, CDAC, and SAR-ADC with an 8-bit resolution for input, weight, and output and analyzed performance on arithmetic accuracy and energy efficiency.

The arithmetic accuracy is mainly determined by the ADC, and the gain error has a significant impact. The effect of mismatches and thermal noises of the RDAC and the CDAC can be suppressed by averaging. If the number m of MAC units connected in parallel to one ADC is large enough, these errors have little effect.

Most of the computational energy is determined by the energy consumed by the SAR-ADC. However, by increasing m, the computational energy per operation can be reduced. If m is sufficiently large, the charge and discharge energy of the CDAC becomes the deciding factor. It is important to note that the RDAC consumes energy unnecessarily, so optimizing the control of its turn-off timing is crucial for achieving high-energy efficiency with minimal calculation error. Furthermore, technology scaling proves effective in enhancing energy efficiency.

These MAC units have been designed and prototyped using 28 nm CMOS technology, integrating 12,288 arithmetic units and operating at 180 MHz, for an arithmetic speed of 4.4 TOPS. The r-MVM accuracy is about 1%, and a high energy efficiency of 240 TOPS/W as a MAC macro and 64.4 TOPS/W as a system has been achieved. This energy efficiency is world top-class, and higher energy efficiency is expected with more scaled technology.

This charge domain mixed-signal MAC using RDAC, CDAC, and SAR-ADC is very attractive for AI processors. The MS-MAC macro has a fully digital I/O interface and can be seamlessly used as a replacement for digital MAC circuits. Moreover, the energy consumption associated with data transfer, and consequently, the system-level energy efficiency, can be improved by combining the MS-MAC macro with non-volatile memory, such as MRAM, FeRAM, and ReRAM.

FIR filters [13], correlators in GPS receivers [14], and matched filters in wireless communications [15] are other candidate applications. Furthermore, one interesting candidate is the high-speed, low-power digital filter in ultra-high-speed DSPs for optical communication systems [16], since the data resolution there is 8 bits and the power consumption of current digital circuits is very large.

Author Contributions

Conceptualization, A.M.; Methodology, A.M.; Software, A.M.A.; Validation, A.M. and A.M.A.; Formal analysis, A.M., A.M.A. and M.M.; Investigation, A.M., A.M.A. and M.M.; Resources, A.M.; Data curation, A.M. and A.M.A.; Writing—original draft, A.M.; Writing—review & editing, A.M., A.M.A. and M.M.; Visualization, A.M. and A.M.A.; Supervision, A.M. and M.M.; Project administration, A.M.; Funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is based on results obtained from a project, JPNP18004, subsidized by the New Energy and Industrial Technology Development Organization (NEDO).

Data Availability Statement

Data is contained within the article.

Acknowledgments

We would like to express sincere thanks to Lilan Yu, Pham Nam Hai, and Masato Motomura for their useful discussions and encouragement.

Conflicts of Interest

Authors Akira Matsuzawa, Abdel Martinez Alonso and Masaya Miyahara were employed by the company Tech Idea Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Kawamoto, R.; Taichi, M.; Kabuto, M.; Watanabe, D.; Izumi, S.; Yoshimoto, M.; Kawaguchi, H.; Matsukawa, G.; Goto, T.; Kojima, M. A 1.15-TOPS 6.57-TOPS/W Neural Network Processor for Multi-Scale Object Detection with Reduced Convolutional Operations. IEEE J. Sel. Top. Signal Process. 2020, 14, 634–645.
2. Bankman, D.; Yang, L.; Moons, B.; Verhelst, M.; Murmann, B. An Always-On 3.8 μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28-nm CMOS. IEEE J. Solid-State Circuits 2019, 54, 158–172.
3. Sinangil, M.E.; Erbagci, B.; Naous, R.; Akarvardar, K.; Sun, D.; Khwa, W.-S.; Liao, H.-J.; Wang, Y.; Chang, J. A 7-nm Compute-in-Memory SRAM Macro Supporting Multi-Bit Input, Weight and Output and Achieving 351 TOPS/W and 372.4 GOPS. IEEE J. Solid-State Circuits 2021, 56, 188–198.
4. Xie, S.; Ni, C.; Sayal, A.; Jain, P.; Hamzaoglu, F.; Kulkarni, J.P. eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded-Dynamic-Memory Array Realizing Adaptive Data Converters and Charge-Domain Computing. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 248–249.
5. Wang, H.; Liu, R.; Dorrance, R.; Dasalukunte, D.; Liu, X.; Lake, D.; Carlton, B.; Wu, M. A 32.2 TOPS/W SRAM Compute-in Memory Macro Employing a Linear 8-bit C-2C Ladder for Charge Domain Computation in 22 nm for Edge Inference. In Proceedings of the 2022 Symposium on VLSI Technology & Circuits Digest of Technical Papers, Honolulu, HI, USA, 13–17 June 2022; pp. 36–37.
6. Hsieh, S.-E.; Wei, C.-H.; Xue, C.-X.; Lin, H.-W.; Tu, W.-H.; Chang, E.-J.; Yang, K.-T.; Chen, P.-H.; Liao, W.-N.; Low, L.L.; et al. A 70.85-86.27TOPS/W PVT-Insensitive 8b Word-Wise ACIM with Post-Processing Relaxation. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 136–137.
7. Chen, P.; Wu, M.; Zhao, W.; Cui, J.; Wang, Z.; Zhang, Y.; Wang, Q.; Ru, J.; Shen, L.; Jia, T.; et al. A 22 nm Delta-Sigma Computing-In-Memory (ΔΣCIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 140–141.
8. Jia, H.; Ozatay, M.; Tang, Y.; Valavi, H.; Pathak, R.; Lee, J.; Verma, N. A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 236–238.
9. Xue, C.-X.; Hung, J.-M.; Kao, H.-Y.; Huang, Y.-H.; Huang, S.-P.; Chang, F.-C.; Chen, P.; Liu, T.-W.; Jhang, C.-J.; Su, C.-I.; et al. A 22 nm 4 Mb 8b-Precision ReRAM Computing-in-Memory Macro with 11.91 to 195.7TOPS/W for Tiny AI Edge Devices. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 245–247.
10. Khaddam-Aljameh, R.; Stanisavljevic, M.; Mas, J.F.; Karunaratne, G.; Brandli, M.; Liu, F.; Singh, A.; Muller, S.M.; Egger, U.; Petropoulos, A.; et al. HERMES-Core—A 1.59-TOPS/mm² PCM on 14-nm CMOS In-Memory Compute Core Using 300-ps/LSB Linearized CCO-Based ADCs. IEEE J. Solid-State Circuits 2022, 57, 1027–1038.
11. Pelgrom, M. Analog-to-Digital Conversion, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2016.
12. Tripathi, V.; Murmann, B. Mismatch Characterization of Small Metal Fringe Capacitors. IEEE Trans. Circuits Syst. I Regul. Pap. 2014, 61, 2236–2242.
13. Duppils, M.; Eklund, J.-E.; Svenson, C. A Novel Mixed Analog/Digital MAC Unit Implemented with SC Technique Suitable for Fully Programmable Narrow-Band FIR filter Applications. In Proceedings of the ICECS’99. 6th IEEE International Conference on Electronics, Circuit and Systems, Pafos, Cyprus, 5–8 September 1999; pp. 1197–1200.
14. Li, J.; He, W.; Zhang, B.; Qi, L.; He, G.; Seok, M. CCSA: A 394TOPS/W Mixed-Signal GPS Accelerator with Charge-Based Correlation Computing for Signal Acquisition. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 430–431.
15. Yamasaki, T.; Nakayama, T.; Shibata, T. A Low-Power and Compact CDMA Matched Filter Based on Switched-Current Technology. IEEE J. Solid-State Circuits 2005, 40, 926–932.
16. Cao, J.; Cui, D.; Nazemi, A.; He, T.; Li, G.; Catli, B.; Khanpour, M.; Hu, K.; Ali, T.; Zhang, H.; et al. A Transmitter and Receiver for 100 Gb/s Coherent Networks with Integrated 4 × 64 GS/s 8b ADCs and DACs in 20 nm CMOS. In Proceedings of the IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 5–9 February 2017; pp. 484–485.

Figure 1. MAC configuration and calculation principle.

Figure 2. RDAC circuit and layout.

Figure 3. CDAC circuits.

Figure 4. CDAC with register file and top view of the MOM capacitors. (a) CDAC with register file; (b) Top view of the MOM capacitors.

Figure 5. Side view of the MOM capacitor and structure of LSB capacitors. (a) Side view of the MOM capacitors; (b) Structure of LSB capacitors.

Figure 6. SAR-ADC circuit and layout.

Figure 7. Canceling method to the input-dependent common voltage.

Figure 8. Configuration of the chip.

Figure 9. Timing diagram.

Figure 10. Factors that provide a calculation error.

Figure 11. Each calculation error vs. parallel number m.

Figure 12. Equivalent circuit of RDAC.

Figure 13. Energy consumption of MAC and ADC vs. m.

Figure 14. Energy efficiency vs. m.

Figure 15. Chip layout and photo.

Figure 16. Physical interface for measurement setup.

Figure 17. Measured linearity of the MACs with SAR-ADCs in 8-bit resolution.

Figure 18. r-MVM test method and measurement setup.

Figure 19. Measured and normalized r-MVM. (a) Measured data; (b) Normalized data.

Figure 20. Measured energy efficiency and r-MVM vs. ON-time of the RDAC.

Figure 21. f_max and energy efficiency vs. V_DD.

Figure 22. Energy efficiency and r-MVM of three chips.

Table 1. Comparison of the performance of MACs using other M/S technologies.

	[3]	[8]	[9]	[5]	[10]	This Work
Technology [nm]	7	16	22	22	14	28
Area [mm²]	0.0032 ⁷	25 ³	6 ²	0.25 ³	0.63 ⁵	1.1 ⁶/7.15 ³
X/W/Y [bit]	4/4/4	4/4/8	8/8/14	8/8/8	8/Analog/8	8/8/8
Operation Frequency [MHz]	181.8 ⁸	200	-	145–240	1000	180
ADC architecture	Flash	SAR	Sense-Amp	-	CCO-based	SAR
Peak Performance [GOPS]	372.4/186.2 ¹	11.8K/5.9K ¹	35	1000	1008 ¹⁰	4423 ⁹
Energy Efficiency [TOPS/W]	351/175.5 ¹	121/60.5 ¹	11.91	32.2	10.5 ¹⁰	241.1 ⁴ (Macro); 64.4 ⁴ (System), T_sRDAC = 1.0 ns
Area Efficiency [TOPS/mm²]	116.3/58.1 ¹	2.67/1.34 ¹	0.013	4	1.59 ¹⁰	4.02 ⁶
r-MVM error [%FS]	-	-	-	σ = 0.65	σ = 1.94	σ = 1.1

Note: ¹: Normalized to 8b input; ²: Including I/O and test mode; ³: Including I/O, test mode, digital Ctrl., CLK Gen.; ⁴: Peak performance with σ ≈ 1% at r-MVM test; ⁵: Calculated from area efficiency; ⁶: Active area of MS-MAC macro; ⁷: Including test and reconfigurable blocks; ⁸: Calculated as 1/(access time); ⁹: Calculated as 12,288 MAC × 2 OPS/MAC × 180 MHz; ¹⁰: Running MNIST.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
