Article

Design and Performance Analysis of a [8/8/8] Charge Domain Mixed-Signal Multiply-Accumulator

by Akira Matsuzawa *, Abdel Martinez Alonso * and Masaya Miyahara
Tech Idea Co., Ltd., Kawasaki 214-0021, Japan
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(1), 50; https://doi.org/10.3390/electronics13010050
Submission received: 15 November 2023 / Revised: 18 December 2023 / Accepted: 18 December 2023 / Published: 21 December 2023

Abstract

This article describes the design and performance analysis of a charge domain mixed-signal multiply-accumulator (MAC) using RDAC, CDAC, and SAR-ADC with an 8-bit resolution for input, weight, and output. The arithmetic accuracy is mainly determined by the ADC, and the gain error has a significant impact. The mismatches and thermal noises of the RDAC and the CDAC are averaged by the number of multiply-accumulate units m connected to one ADC. As a result, if m is large enough, mismatches and thermal noises have a limited impact on the computation accuracy. Most of the computational energy is determined by the energy consumed by the SAR-ADC, and the computational energy per operation can be reduced by increasing m. This last metric is mainly determined by the charge and discharge energy of the CDAC for sufficiently large m values. Furthermore, since RDAC consumes energy unnecessarily, the turn-off timing of RDAC should be optimized. These MAC units have been designed and prototyped using 28 nm CMOS technology, integrating 12,288 arithmetic units while operating at 180 MHz, resulting in an arithmetic speed of 4.4 TOPS. The r-MVM accuracy is about 1% and a high energy efficiency of 240 TOPS/W as a MAC macro and 64.4 TOPS/W as a system has been achieved.

1. Introduction

A multiply-accumulator (MAC) is a basic functional unit of digital signal processors and AI processors. In AI processors, many attempts have been made to increase speed and energy efficiency by lowering the bit precision to the 1 to 4 bit range. However, recognition accuracy degrades depending on the application and data set. In contrast, accuracy comparable to FP32 precision has been reported for processors using INT8 resolution [1]. Therefore, 8-bit resolution is sufficient and appropriate for general-purpose AI processors and is considered feasible even if analog technology is used for the MAC circuit. Charge domain mixed-signal MACs have been implemented in AI processors [2,3,4,5,6,7,8,9,10].
Figure 1 shows the principle of charge domain MAC operation using RDAC for input x, CDAC for weight w, and SAR-ADC for output y. Each CDAC has a total capacitance of C, with a weight of w for the input capacitance and a weight (1 − w) for the ground. In the reset phase, all charges in each capacitance are reset by closing the switch S0 and selecting all CDACs to ground. Then, the switch S0 is opened and the CDAC controls the capacitance connected to the RDAC or ground according to each weight w. Input x is converted to the corresponding analog voltage by the RDAC. In this state, the output voltage yin is:
$$ y_{in} = \frac{1}{m} \sum_{i=1}^{m} w_i x_i \tag{1} $$
where m is the number of MAC operations. Then, a voltage corresponding to the MAC operation of the input x and the weight w is generated at the output. The output voltage yin is converted to a digital value yout by the SAR-ADC as an output.
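The expression for yin above can be illustrated with a small numerical sketch. It assumes ideal components, inputs and weights given as unsigned 8-bit codes normalized by 2^8, and an ideal ADC that rounds down; the function names `mac_output` and `sar_adc_code` are hypothetical and are not taken from the article.

```python
def mac_output(x_codes, w_codes, bits=8):
    """Ideal normalized MAC output y_in = (1/m) * sum(w_i * x_i).

    Assumption: codes are unsigned integers scaled to [0, 1) by 2**bits.
    """
    if len(x_codes) != len(w_codes) or not x_codes:
        raise ValueError("x_codes and w_codes must be non-empty and equal in length")
    scale = 2 ** bits
    m = len(x_codes)
    return sum((x / scale) * (w / scale) for x, w in zip(x_codes, w_codes)) / m


def sar_adc_code(y_in, bits=8):
    """Quantize a normalized voltage to an unsigned code (ideal ADC, floor rounding)."""
    code = int(y_in * 2 ** bits)
    return max(0, min(code, 2 ** bits - 1))


if __name__ == "__main__":
    x = [128, 64, 255, 0]
    w = [128, 128, 16, 200]
    y_in = mac_output(x, w)
    print("normalized y_in:", y_in, "ADC code:", sar_adc_code(y_in))
```

Because the output is divided by m, the full-scale range of yin stays the same however many products are accumulated, which is why a single 8-bit ADC can serve many MAC units.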
While this configuration is simple, it appears to be in the early stages of design and could benefit from further refinement. Additionally, a more thorough performance analysis is needed.
This paper describes the design of a charge domain mixed-signal MAC macro using RDAC, CDAC, and SAR-ADC with [8/8/8]-bit resolution for input, weight, and output, and analyzes its arithmetic accuracy and energy efficiency. The remainder of this article is structured as follows: Section 2 describes each component, including the RDAC, CDAC, SAR-ADC, and the overall chip design. Section 3 discusses the performance analysis, focusing on arithmetic accuracy and energy efficiency. Finally, Section 4 presents the chip fabrication and the measured results of the prototyped Proof-of-Concept (PoC) chip, fabricated using 28 nm CMOS technology.

2. Design of Each Part and Overall Chip

2.1. RDAC

Figure 2 shows the RDAC circuit and layout. The RDAC uses a segmented configuration with equal resistors for the upper four bits and a binary configuration with R-2R resistors for the lower four bits. This configuration increases accuracy and reduces current consumption. The average current consumption of the RDAC, IAVE, can be expressed as follows, where VDD is the power supply voltage, Ro is the output resistance, and M is the number of upper bits:
$$ I_{AVE} = \frac{V_{DD}}{R_o} \left( \frac{1}{6} + \frac{1}{2^{M}} \right) \tag{2} $$
To reduce current consumption, it is better to increase the number of upper bits M; however, this increases the area. In this design, M = 4 is used. The output resistance Ro of this RDAC is 1.5 kΩ, and each RDAC drives 64 CDACs. The size is 8.1 μm × 143 μm. The current flowing from the power supply can be shut off at a programmed time to reduce power dissipation.
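The trade-off between the number of upper bits M and the supply current can be tabulated with a short sketch. It reads the average-current expression as (VDD/Ro)(1/6 + 1/2^M) and assumes VDD = 0.8 V (the value used later in the energy analysis) and Ro = 1.5 kΩ; the function name `rdac_average_current` is hypothetical.

```python
def rdac_average_current(vdd, r_out, upper_bits):
    """Average RDAC supply current, I_AVE = (VDD / Ro) * (1/6 + 1/2**M)."""
    return (vdd / r_out) * (1.0 / 6.0 + 1.0 / 2 ** upper_bits)


if __name__ == "__main__":
    # Assumed operating point: VDD = 0.8 V, Ro = 1.5 kOhm.
    for m_bits in range(2, 7):
        i_ave = rdac_average_current(vdd=0.8, r_out=1.5e3, upper_bits=m_bits)
        print(f"M = {m_bits}: I_AVE = {i_ave * 1e6:.1f} uA")
```

The 1/6 term does not depend on M, so the benefit of adding upper bits saturates quickly, which is consistent with stopping at M = 4 to limit area.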

2.2. CDAC

Figure 3 shows the CDAC circuits. An 8-bit differential CDAC is used to express binary weight. The capacitance of the CDAC is 5 fF. The RDAC drives 64 differential CDACs.
Figure 4 shows the layout of the CDAC with the register file and the top view of the MOM capacitors. The area of the differential CDAC is 35.5 μm2, of which the capacitance part is 13.7 μm2. The register files, which provide the weight data to the CDAC, have an area of 29 μm2. The capacitors are wiring MOM capacitors using 6 metal layers. The overlap length of the MOM capacitor is 1.02 μm and the total length of the 8-bit MOM capacitor is 3.08 μm with a unit MOM capacitor pitch of 0.28 μm.
Figure 5 illustrates the structure of the LSB capacitors and a side view of the MOM capacitance. The capacitance density is high with C5 using four layers, C4 two, and C3, C2, and C1 one each. C2 and C1 are half and a quarter the length of the others, respectively. This design reduces the CDAC area and stray capacitance. Despite minor accuracy concerns, the 8-bit resolution requirement is met, as shown in Section 4.

2.3. SAR-ADC

As shown in Figure 6, a SAR-ADC is used because its occupied area is small and its energy consumption is low. In Figure 6, a 4-bit resolution is drawn for simplicity; the actual ADC uses an 8-bit resolution. A special circuit is added to cancel the common voltage fluctuation of the input voltage. The values Wp and Wn are the inputs to the CDAC and follow (3) when the weight w takes both positive and negative values, −1 ≤ w ≤ 1:
$$ \begin{cases} W_p = \frac{1}{2}(1 + w) \\ W_n = \frac{1}{2}(1 - w) \end{cases} \tag{3} $$
Therefore, at the output of the MAC operation, the input voltages Vin_p and Vin_n of the ADC are:
$$ \begin{cases} V_{in\_p} = \dfrac{V_{DD}}{2m} \left( \sum_{i=1}^{m} X_i + \sum_{i=1}^{m} X_i w_i \right) \\ V_{in\_n} = \dfrac{V_{DD}}{2m} \left( \sum_{i=1}^{m} X_i - \sum_{i=1}^{m} X_i w_i \right) \end{cases} \tag{4} $$
Since the common voltage varies with input X, it is not desirable for the ADC operation. Therefore, as shown in Figure 6, a circuit to cancel this input-dependent common voltage is added to the SAR-ADC. Figure 7 shows how this circuit works.
  • Step 1: Sampling
The switches S0 and Sbias are closed. The switches Ss and Scom select inputs Vin_p and Vin_n, and the switch Sbias selects ground. In this state, the inputs Vin_p and Vin_n are applied to the capacitance Ccu and the total capacitance Cu of the DAC of the SAR-ADC.
  • Step 2: Equalizing
The switch Scom selects a short circuit between the two capacitances Ccu. At this time, each capacitance holds an equal charge Qc.
$$ Q_c = C_{cu} \frac{V_{DD}}{2m} \sum_{i=1}^{m} X_i \tag{5} $$
  • Step 3: Cancellation and bias
Finally, the switch S0 is opened and the switch Ss selects Vref. The switch Scom selects the CDAC capacitance of the SAR-ADC and the switch Sbias selects VDD. If Ccu = Cu, the input voltages Vx_p and Vx_n of the comparator are:
$$ \begin{cases} V_{x\_p} = \dfrac{V_{DD}}{2} - \dfrac{V_{DD}}{4m} \sum_{i=1}^{m} X_i w_i \\ V_{x\_n} = \dfrac{V_{DD}}{2} + \dfrac{V_{DD}}{4m} \sum_{i=1}^{m} X_i w_i \end{cases} \tag{6} $$
The common voltage is set to VDD/2, which has no signal dependence and is appropriate for the operation. After that, normal SAR-ADC conversion is performed. The simulation shows that the minimum conversion period is 5 ns and the conversion energy is 0.8 pJ. The ADC size is 9 μm × 43 μm.
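The effect of the three steps above can be checked numerically. The following sketch simply evaluates the stated expressions for the ADC inputs and the comparator inputs, assuming ideal switches, Ccu = Cu, inputs X normalized to [0, 1], weights w in [−1, 1], and VDD = 0.8 V; all function names are hypothetical.

```python
def adc_input_voltages(x, w, vdd=0.8):
    """Differential MAC outputs before cancellation (x in [0, 1], w in [-1, 1])."""
    m = len(x)
    sum_x = sum(x)
    sum_xw = sum(xi * wi for xi, wi in zip(x, w))
    vin_p = vdd / (2 * m) * (sum_x + sum_xw)
    vin_n = vdd / (2 * m) * (sum_x - sum_xw)
    return vin_p, vin_n


def comparator_input_voltages(x, w, vdd=0.8):
    """Comparator inputs after cancellation and bias, assuming Ccu = Cu."""
    m = len(x)
    sum_xw = sum(xi * wi for xi, wi in zip(x, w))
    vx_p = vdd / 2 - vdd / (4 * m) * sum_xw
    vx_n = vdd / 2 + vdd / (4 * m) * sum_xw
    return vx_p, vx_n


def common_mode(v_p, v_n):
    """Common-mode level of a differential pair."""
    return (v_p + v_n) / 2


if __name__ == "__main__":
    weights = [0.5, -0.5]
    for inputs in ([0.2, 0.4], [0.8, 1.0]):
        before = common_mode(*adc_input_voltages(inputs, weights))
        after = common_mode(*comparator_input_voltages(inputs, weights))
        print(f"X = {inputs}: common mode before = {before:.3f} V, after = {after:.3f} V")
```

Before cancellation the common-mode level follows the sum of the inputs; after cancellation it sits at VDD/2 for any input, while the differential signal still carries the sum of the products Xi·wi.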

2.4. Overall Configuration of the Chip

Figure 8 illustrates the overall configuration of the chip. At the center, there are 12,288 CDACs. Above and below these, 96 RDACs are arranged, generating a voltage proportional to the input X. The chip also includes 64 SAR-ADCs, each connected to 192 CDACs to produce the output Y. The weight values W are stored in 96 Kbit SRAM on the left and are read sequentially. Input X is stored in 48 Kbit SRAM on the upper and lower sides, and output Y is stored in 96 Kbit SRAM on the right side. The chip also integrates a Phase-Locked Loop (PLL) that generates a clock and an SPI control circuit that exchanges data with the outside and controls the internal circuit.
Figure 9 shows the timing of each part. First, 64 weight data W are sequentially written to a register file that provides input data to the CDAC. Thereafter, input data X are given to the RDAC. The top and bottom RDACs generate voltages corresponding to this input X, and the output voltage of the CDAC is sampled by the SAR-ADC. The RDAC OFF pulse then disconnects the RDAC from the power supply and turns it off. At the same time, the SAR-ADC operation is performed, and the conversion value is output. Input-to-output latency is only one clock.
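The unit count and peak throughput implied by this configuration follow from simple arithmetic. The sketch below counts each MAC as two operations, as in the note to Table 1; the constant and function names are hypothetical.

```python
NUM_ADCS = 64          # SAR-ADCs on the chip
CDACS_PER_ADC = 192    # parallel number m
CLOCK_HZ = 180e6       # operating frequency
OPS_PER_MAC = 2        # one multiply and one accumulate


def mac_units(num_adcs=NUM_ADCS, cdacs_per_adc=CDACS_PER_ADC):
    """Total number of arithmetic (MAC) units."""
    return num_adcs * cdacs_per_adc


def peak_tops(clock_hz=CLOCK_HZ, ops_per_mac=OPS_PER_MAC):
    """Peak throughput in TOPS, with one MAC per unit per clock."""
    return mac_units() * ops_per_mac * clock_hz / 1e12


if __name__ == "__main__":
    print(mac_units(), "MAC units,", round(peak_tops(), 2), "TOPS")
```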

3. Performance Analysis

3.1. Arithmetic Accuracy

Figure 10 shows the factors that contribute to calculation errors: εR is the error of the RDAC, εC is the error of the CDAC, εCT is the thermal noise induced on the CDAC, GMAC is the gain of the MAC operation, determined primarily by the parasitic capacitance CP, m is the number of MAC circuits connected in parallel to one ADC, εq is the ADC quantization noise, εAL is the linearity error of the ADC, εAT is the thermal noise of the ADC, εOFF is the ADC offset voltage, and GADC is the gain of the ADC, controlled by adjusting its reference voltage. The MAC output is the product of the voltage generated by the RDAC and the capacitance of the CDAC; for simplicity, we approximate the error of this product by the sum of the RDAC and CDAC errors, which is valid when each error is small enough.
The RDAC error εR is mainly caused by resistance mismatch. The upper 4 bits use a thermometer code with 15 equal-value resistors, and the resistor area is large enough that the error is only 0.02% in standard deviation. Furthermore, the error is reduced by the averaging effect over m, so εR is:
$$ \varepsilon_R = \frac{\varepsilon_{R0}}{\sqrt{m}} \tag{7} $$
where εR0 is the error of the RDAC when m = 1. In this design, m = 192 and εR is only 0.0014%. The CDAC error εC is primarily caused by capacitance mismatch. If the i-th input is xi, the i-th weight is wi, and the i-th capacitance mismatch is δi, then the error caused by the MAC operation, εC, is given by:
$$ \varepsilon_C^{2} = \frac{1}{m^{2}} \sum_{i=1}^{m} \left( x_i w_i \delta_i \right)^{2} \tag{8} $$
Assuming that the average value of xiwi is 1/4 and that every δi equals δC:
$$ \varepsilon_C^{2} = \frac{\delta_C^{2}}{16 m} \tag{9} $$
The capacitance mismatch δC is inversely proportional to the square root of the capacitance value, and A is the coefficient of proportionality [11,12].
$$ \delta_C = \frac{A}{\sqrt{C_u}} \tag{10} $$
Therefore, from Equation (9):
$$ \varepsilon_C = \frac{A}{4 \sqrt{m C_u}} \tag{11} $$
From the datasheet, A is 0.85% for a capacitance of 1 fF, and Cu is 5 fF; thus, εC is 0.1% when m = 1 and 0.007% when m = 192.
The thermal noise induced on the CDAC εCT is:
$$ \varepsilon_{CT} = \frac{1}{V_R} \sqrt{\frac{kT}{m C_u}} \tag{12} $$
where VR is the maximum input voltage, about 0.8 V. At room temperature, εCT is 0.11% for m = 1 and 0.008% for m = 192.
GMAC is determined by the parasitic capacitance of each CDAC Cpu.
$$ G_{MAC} = \frac{C_u}{C_u + C_{pu}} \tag{13} $$
This value is about 0.8 from the measurement results.
The ADC quantization noise εq is:
$$ \varepsilon_q = \frac{1}{2 \sqrt{3} \times 2^{N}} \tag{14} $$
where N is the resolution of the ADC. Since N = 8, εq is 0.12%.
The linearity error of the ADC εAL is mainly caused by the capacitance mismatch of CDAC in a SAR-ADC. The capacitance of the CDAC is 10 fF and the εAL can be estimated at 0.27% by (10).
The thermal noise of the ADC εAT is:
$$ \varepsilon_{AT} = \frac{1}{V_R} \sqrt{\frac{kT}{C_{ADC}}} \tag{15} $$
Since the CADC is 10 fF, the value of εAT at room temperature is 0.08%.
ADC offset voltage εOFF can be estimated to be about 0.2%. GADC should be set to compensate for the decrease in the gain in the MAC circuits, shown in (13). Figure 11 shows each calculation error for the parallel number m.
The RDAC error εR, CDAC error εC, and thermal noise εCT decrease rapidly by the averaging effect when the parallel number m increases and the effect of the calculation error can be neglected. The calculation error of the MAC unit consisting of the RDACs and the CDACs is very small.
Even if the capacitance of the CDAC is reduced to about 5 fF, the calculation error is 0.01% due to the averaging effect. In contrast, an ADC does not have such an averaging function, and its performance is dominated by its linearity error of about 0.27%. According to our simulation, even if we add up all the errors, the total error is estimated to be about 0.35%. The main challenge seems to be improving the compensation accuracy for the gain decrease in the MAC operation.
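The individual error terms in this section can be reproduced with a short script. This is a minimal sketch using the parameter values quoted in the text (εR0 = 0.02%, A = 0.85% at 1 fF, Cu = 5 fF, CADC = 10 fF, VR = 0.8 V, εOFF = 0.2%), with room temperature taken as 300 K. Combining the terms as a root sum of squares is an assumption made here for illustration (it treats the errors as independent) and is not necessarily how the authors' simulation combines them; all function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def rdac_error(m, eps_r0=0.02e-2):
    """RDAC mismatch error averaged over m units."""
    return eps_r0 / math.sqrt(m)


def cdac_mismatch_error(m, a_1ff=0.85e-2, cu_ff=5.0):
    """CDAC mismatch error, A / (4 * sqrt(m * Cu)), with Cu in fF."""
    return a_1ff / (4.0 * math.sqrt(m * cu_ff))


def ktc_noise(cap_f, m=1, v_r=0.8, temp_k=300.0):
    """Thermal (kT/C) noise relative to the maximum input voltage V_R."""
    return math.sqrt(K_BOLTZMANN * temp_k / (m * cap_f)) / v_r


def adc_quantization_noise(bits=8):
    """Quantization noise, 1 / (2 * sqrt(3) * 2**N)."""
    return 1.0 / (2.0 * math.sqrt(3.0) * 2 ** bits)


def adc_linearity_error(a_1ff=0.85e-2, c_adc_ff=10.0):
    """ADC linearity error estimated from capacitor mismatch, A / sqrt(C)."""
    return a_1ff / math.sqrt(c_adc_ff)


def total_error(m, eps_off=0.2e-2):
    """Root-sum-square of all terms (independence is an assumption)."""
    terms = [
        rdac_error(m),
        cdac_mismatch_error(m),
        ktc_noise(5e-15, m=m),   # CDAC thermal noise
        adc_quantization_noise(),
        adc_linearity_error(),
        ktc_noise(10e-15),       # ADC thermal noise, no averaging
        eps_off,
    ]
    return math.sqrt(sum(t * t for t in terms))


if __name__ == "__main__":
    for m in (1, 16, 192):
        print(f"m = {m:3d}: eps_R = {rdac_error(m):.2e}, "
              f"eps_C = {cdac_mismatch_error(m):.2e}, "
              f"eps_CT = {ktc_noise(5e-15, m=m):.2e}, "
              f"total = {total_error(m):.2e}")
```

Under these assumptions the three MAC-side terms shrink as 1/√m, while the ADC-side terms are unaffected by m and set the floor of the total error.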

3.2. Energy Efficiency

The energy efficiency is determined by the MAC circuit consisting of the RDACs, the CDACs, and the SAR-ADCs. First, consider the MAC operation. The upper four bits of the RDAC use a thermometer code by connecting the same resistors in parallel, and the lower four bits use binary code in the R-2R configuration. The output resistor Ro is Ru/8 with the unit resistance Ru. Figure 12 shows an equivalent circuit as seen from the power supply VDD when the load capacitance of the RDAC is CL. GD represents the output conductance of RDAC, w represents the weight, and GR represents the conductance through which a current flows regardless of w.
In the step response of the voltage VDD, the time constant τ is:
$$ \tau = \frac{C_L}{G_D} \tag{16} $$
The output voltage Vout is:
$$ V_{out} = w V_{DD} \left( 1 - e^{-t/\tau} \right) \tag{17} $$
Let GR = GD/2^M, where M is the number of upper conversion bits. The current IDD flowing out of the power supply is then:
$$ I_{DD} = G_D V_{DD} \left\{ \frac{1}{2^{M}} + w(1 - w) + w^{2} e^{-t/\tau} \right\} \tag{18} $$
Since it takes about 6τ for the output voltage to settle to within 1/2 LSB at 8-bit resolution, the energy ED consumed during this time is obtained by integrating VDD·IDD up to 6τ:
$$ E_D = C_L V_{DD}^{2} \left\{ \frac{6}{2^{M}} + 6w(1 - w) + w^{2} \right\} \tag{19} $$
Averaging over w from 0 to 1 gives:
$$ E_D = C_L V_{DD}^{2} \left\{ \frac{6}{2^{M}} + \frac{4}{3} \right\} \tag{20} $$
The differential CDAC that loads the RDAC consists of two weighted capacitances, wCu and (1 − w)Cu, connected in series, in a differential configuration. It also has a parasitic wiring capacitance Cpl. Thus, the load capacitance CL is:
$$ C_L = 2w(1 - w) C_u + C_{pl} \tag{21} $$
Since w takes a value from 0 to 1, the average value is:
$$ C_L = \frac{C_u}{3} + C_{pl} \tag{22} $$
Therefore, substituting Equation (22) into Equation (20) with M = 4 gives:
$$ E_D = 1.7 \left( \frac{C_u}{3} + C_{pl} \right) V_{DD}^{2} \tag{23} $$
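The averaging steps used above can be verified with exact rational arithmetic. This sketch assumes w is uniformly distributed over [0, 1], reads the constant term as 6/2^M, and represents each polynomial in w by its coefficient list; the helper name `mean_over_unit_interval` is hypothetical.

```python
from fractions import Fraction


def mean_over_unit_interval(coeffs):
    """Mean of sum(c_k * w**k) for w uniform on [0, 1], i.e. sum(c_k / (k + 1))."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))


# 6*w*(1 - w) + w**2 = 6*w - 5*w**2  ->  coefficients [0, 6, -5]
weight_term = mean_over_unit_interval([0, 6, -5])

# 2*w*(1 - w) = 2*w - 2*w**2  ->  coefficients [0, 2, -2]
load_term = mean_over_unit_interval([0, 2, -2])

# Energy prefactor for M = 4 upper bits: 6 / 2**M plus the weight-dependent mean.
rdac_factor = Fraction(6, 2 ** 4) + weight_term

if __name__ == "__main__":
    print("mean of 6w(1-w) + w^2:", weight_term)
    print("mean of 2w(1-w):", load_term)
    print("prefactor for M = 4:", rdac_factor, "=", float(rdac_factor))
```

The prefactor evaluates to 41/24, which rounds to the 1.7 used in the text.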
ADC energy consumption is mainly determined by the SAR logic circuits. The circuit simulation shows that the conversion energy of the ADC Econv is 0.8 pJ. Since m CDACs are connected to the input end of the ADC, the total energy consumption of the MAC, including ADC for one MAC operation, is given by the following Equation:
$$ E_{MAC} = 1.7 \left( \frac{C_u}{3} + C_{pl} \right) V_{DD}^{2} + \frac{E_{conv}}{m} \tag{24} $$
Figure 13 shows how the energy consumption varies with the parallel number m. The energy consumed by the MAC operation in the RDACs and the CDACs is constant at about 2.4 fJ, whereas the contribution of the ADC per MAC operation decreases as m increases. In this example, at m = 400, the energy consumption of the MAC and the ADC are equal. In this chip, m = 192, so the contribution of the ADC is about 5 fJ, roughly twice that of the MAC operation.
Figure 14 shows how the estimated energy efficiency changes as a function of m when Cu = 5 fF, Cpl = 0.5 fF, VDD = 0.8 V, and Econv = 0.8 pJ. The larger the value of m, the higher the energy efficiency. Since this chip is set to m = 192, a high energy efficiency of about 300 TOPS/W can be expected. When 16 nm CMOS is used, the ADC conversion energy Econv is 0.4 pJ, and an energy efficiency of about 450 TOPS/W can then be expected at m = 192.
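The estimates quoted for Figure 14 can be approximated with the energy model of this section. The sketch below uses the stated parameters (Cu = 5 fF, Cpl = 0.5 fF, VDD = 0.8 V, Econv = 0.8 pJ) and assumes that one MAC counts as two operations, as in the note to Table 1; the function names are hypothetical.

```python
def mac_energy_j(m, cu=5e-15, cpl=0.5e-15, vdd=0.8, e_conv=0.8e-12, rdac_factor=1.7):
    """Energy per MAC operation: DAC charging energy plus the ADC share E_conv / m."""
    dac_energy = rdac_factor * (cu / 3.0 + cpl) * vdd ** 2
    adc_share = e_conv / m
    return dac_energy + adc_share


def energy_efficiency_tops_per_w(m, ops_per_mac=2, **kwargs):
    """Energy efficiency in TOPS/W for parallel number m."""
    return ops_per_mac / mac_energy_j(m, **kwargs) / 1e12


if __name__ == "__main__":
    for m in (32, 64, 192, 400, 1000):
        print(f"m = {m:4d}: {energy_efficiency_tops_per_w(m):6.1f} TOPS/W (Econv = 0.8 pJ), "
              f"{energy_efficiency_tops_per_w(m, e_conv=0.4e-12):6.1f} TOPS/W (Econv = 0.4 pJ)")
```

With these inputs the model lands close to the 300 TOPS/W and 450 TOPS/W figures at m = 192, and the efficiency saturates at large m once the DAC charging energy dominates.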
Therefore, the charge domain mixed signal MAC using RDAC, CDAC, and the SAR-ADC is expected to further increase its energy efficiency by reducing its energy consumption through further technology scaling.

4. Chip Fabrication and Measurement Results

The chip was prototyped using 28 nm CMOS technology. Figure 15 shows the chip layout and the chip photo. The chip size is 2.675 mm × 2.675 mm, and the MAC Macro consisting of RDACs, CDACs, and SAR-ADCs is about 1.4 mm × 1.5 mm.
Figure 16 illustrates the setup of the measurement environment and the system implementation. To maintain a low-noise measurement environment, all controllers and the MS-MAC were battery-powered. Communication was established using USB/SPI interfaces, which are standard features in commercially available FPGA boards.
Figure 17 shows the three chips’ measured linearity of the MAC with the SAR-ADCs when the CDAC weight is maximized and the RDAC is controlled. All three chips have linearity errors less than +0.3 LSB/−0.4 LSB for the differential nonlinearity (DNL) and +0.5 LSB/−0.9 LSB for the integral nonlinearity (INL) in 8-bit resolution.
The r-MVM test, as outlined in [5,10], is a hardware-centric method designed to assess and quantify the precision of mixed-signal processors. This technique holds particular importance in mixed-signal processing, where the accuracy of data conversion between analog and digital formats is crucial. Figure 18 visually depicts the r-MVM test methodology, providing an overview of the test process and highlighting the various stages involved in executing the r-MVM test. The implementation of the r-MVM test methodology was facilitated using MATLAB/Simulink, which enabled more efficient manipulation and testing of data sets. The study also incorporated FPGA-based controllers, which processed 192 K points across three distinct chips, further enhancing the testing process.
Figure 19a presents the measured results of the random Matrix Vector Multiplication (r-MVM) test. When sufficient set-up time was provided, the standard deviation of the r-MVM was approximately 1.1%. The error tends to increase when the input X and weight W are larger, suggesting the influence of gain error. Consequently, the error normalized by the output value, as depicted in Figure 19b, was obtained. This normalized error is nearly constant at about 0.3%, demonstrating good consistency with the estimation shown in Figure 11.
The r-MVM error therefore seems to be mainly due to the gain error, and finer gain compensation could reduce the calculation error.
Figure 20 shows how the ON-time of the RDAC affects the measured r-MVM and the energy efficiency. The longer the ON-time of the RDAC, the lower the r-MVM error, but also the lower the energy efficiency because of the ineffective RDAC current. An RDAC ON-time of approximately 1.0 ns is the optimal operating point. The time constant τ is proportional to the number of CDACs n driven by one RDAC, the output resistance of the RDAC Ro, and the average capacitance of the CDAC CL.
$$ \tau = n R_o C_L \tag{25} $$
Substituting n = 64, Ro = 1.5 kΩ, and CL = 1.7 fF gives τ = 163 ps. Since 6.1τ is required for 8-bit precision settling, the estimated settling time is about 1.0 ns, which is consistent with the measurement data.
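The settling estimate can be reproduced as follows. The sketch uses the values quoted above and additionally assumes single-pole exponential settling, for which reaching 1/2 LSB at N bits takes ln(2^(N+1)) time constants; the function names are hypothetical.

```python
import math


def rdac_time_constant_s(n_cdacs=64, r_out_ohm=1.5e3, c_load_f=1.7e-15):
    """Time constant tau = n * Ro * C_L of one RDAC driving n CDACs."""
    return n_cdacs * r_out_ohm * c_load_f


def settling_time_s(tau_s, n_tau=6.1):
    """Settling time as a multiple of tau (6.1 tau is the value used in the text)."""
    return n_tau * tau_s


def taus_for_half_lsb(bits=8):
    """Time constants needed to settle within 1/2 LSB, assuming a single pole."""
    return math.log(2 ** (bits + 1))


if __name__ == "__main__":
    tau = rdac_time_constant_s()
    print(f"tau = {tau * 1e12:.1f} ps")
    print(f"settling time at 6.1 tau = {settling_time_s(tau) * 1e9:.3f} ns")
    print(f"single-pole estimate for 8 bits = {taus_for_half_lsb():.2f} tau")
```

The single-pole estimate of about 6.2τ is close to the 6.1τ used in the text, so both give a settling time of roughly 1 ns.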
Figure 21 shows the measured maximum operating frequency fmax and the energy efficiency as functions of the supply voltage with a fixed RDAC ON-time of 1.0 ns. A higher power supply voltage VDD raises the maximum operating frequency fmax but lowers the energy efficiency. A power supply voltage of 0.8 V gives a good balance between the two.
Figure 22 plots the measured energy efficiency and r-MVM of the three chips. Almost all the chips achieve an r-MVM of about 1% with an energy efficiency of up to about 240 TOPS/W. The measured energy efficiency of 240 TOPS/W is about 80% of the estimate shown in Figure 14, which indicates that the estimation is sufficiently accurate.
Table 1 shows a comparison of the performance of this chip and the MACs using other mixed-signal technologies. System-level energy efficiency takes into account the power consumption of SRAMs.
In terms of energy efficiency, the [4/4/4] configuration using 7 nm CMOS [3] has the highest value, 351 TOPS/W; among [8/8/8] configurations, however, this chip shows world-class energy efficiency despite using the least advanced process technology.

5. Conclusions

We have designed a charge domain mixed-signal MAC using RDAC, CDAC, and SAR-ADC with an 8-bit resolution for input, weight, and output and analyzed performance on arithmetic accuracy and energy efficiency.
The arithmetic accuracy is mainly determined by the ADC, and the gain error has a significant impact. The effect of mismatches and thermal noises of the RDAC and the CDAC can be suppressed by averaging. If the number m of MAC units connected in parallel to one ADC is large enough, they have almost no effect.
Most of the computational energy is determined by the energy consumed by the SAR-ADC. However, by increasing m, the computational energy per operation can be reduced. If m is sufficiently large, the charge and discharge energy of the CDAC becomes the deciding factor. It is important to note that the RDAC consumes energy unnecessarily, so optimizing the control of its turn-off timing is crucial for achieving high-energy efficiency with minimal calculation error. Furthermore, technology scaling proves effective in enhancing energy efficiency.
These MAC units were designed and prototyped using 28 nm CMOS technology, integrating 12,288 arithmetic units and operating at 180 MHz, for an arithmetic speed of 4.4 TOPS. The r-MVM accuracy is about 1%, and a high energy efficiency of 240 TOPS/W as a MAC macro and 64.4 TOPS/W as a system has been achieved. This energy efficiency is among the world's highest, and higher energy efficiency is expected with more scaled technology.
This charge domain mixed-signal MAC using RDAC, CDAC, and SAR-ADC is very attractive for AI processors. The MS-MAC macro has a fully digital I/O interface and can be seamlessly used as a replacement for digital MAC circuits. Moreover, the energy consumption associated with data transfer, and consequently, the system-level energy efficiency, can be improved by combining the MS-MAC macro with non-volatile memory, such as MRAM, FeRAM, and ReRAM.
FIR filters [13], correlators in GPS receivers [14], and matched filters in wireless communications [15] are other candidate applications. Another interesting candidate is the high-speed, low-power digital filter in ultra-high-speed DSPs for optical communication systems [16], since the data resolution is 8 bits and the power consumption of current digital circuits is very large.

Author Contributions

Conceptualization, A.M.; Methodology, A.M.; Software, A.M.A.; Validation, A.M. and A.M.A.; Formal analysis, A.M., A.M.A. and M.M.; Investigation, A.M., A.M.A. and M.M.; Resources, A.M.; Data curation, A.M. and A.M.A.; Writing—original draft, A.M.; Writing—review & editing, A.M., A.M.A. and M.M.; Visualization, A.M. and A.M.A.; Supervision, A.M. and M.M.; Project administration, A.M.; Funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is based on results obtained from a project, JPNP18004, subsidized by the New Energy and Industrial Technology Development Organization (NEDO).

Data Availability Statement

Data is contained within the article.

Acknowledgments

We would like to express sincere thanks to Lilan Yu, Pham Nam Hai, and Masato Motomura for their useful discussions and encouragement.

Conflicts of Interest

Authors Akira Matsuzawa, Abdel Martinez Alonso and Masaya Miyahara were employed by the company Tech Idea Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Kawamoto, R.; Taichi, M.; Kabuto, M.; Watanabe, D.; Izumi, S.; Yoshimoto, M.; Kawaguchi, H.; Matsukawa, G.; Goto, T.; Kojima, M. A 1.15-TOPS 6.57-TOPS/W Neural Network Processor for Multi-Scale Object Detection with Reduced Convolutional Operations. IEEE J. Sel. Top. Signal Process. 2020, 14, 634–645.
  2. Bankman, D.; Yang, L.; Moons, B.; Verhelst, M.; Murmann, B. An Always-On 3.8 mJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28-nm CMOS. IEEE J. Solid-State Circuits 2019, 54, 158–172.
  3. Sinangil, M.E.; Erbagci, B.; Naous, R.; Akarvardar, K.; Sun, D.; Khwa, W.-S.; Liao, H.-J.; Wang, Y.; Chang, J. A 7-nm Compute-in-Memory SRAM Macro Supporting Multi-Bit Input, Weight and Output and Achieving 351 TOPS/W and 372.4 GOPS. IEEE J. Solid-State Circuits 2021, 56, 188–198.
  4. Xie, S.; Ni, C.; Sayal, A.; Jain, P.; Hamzaoglu, F.; Kulkarni, J.P. eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded-Dynamic-Memory Array Realizing Adaptive Data Converters and Charge-Domain Computing. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 248–249.
  5. Wang, H.; Liu, R.; Dorrance, R.; Dasalukunte, D.; Liu, X.; Lake, D.; Carlton, B.; Wu, M. A 32.2 TOPS/W SRAM Compute-in Memory Macro Employing a Linear 8-bit C-2C Ladder for Charge Domain Computation in 22 nm for Edge Inference. In Proceedings of the 2022 Symposium on VLSI Technology & Circuits Digest of Technical Papers, Honolulu, HI, USA, 13–17 June 2022; pp. 36–37.
  6. Hsieh, S.-E.; Wei, C.-H.; Xue, C.-X.; Lin, H.-W.; Tu, W.-H.; Chang, E.-J.; Yang, K.-T.; Chen, P.-H.; Liao, W.-N.; Low, L.L.; et al. A 70.85-86.27TOPS/W PVT-Insensitive 8b Word-Wise ACIM with Post-Processing Relaxation. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 136–137.
  7. Chen, P.; Wu, M.; Zhao, W.; Cui, J.; Wang, Z.; Zhang, Y.; Wang, Q.; Ru, J.; Shen, L.; Jia, T.; et al. A 22 nm Delta-Sigma Computing-In-Memory (ΔΣCIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 140–141.
  8. Jia, H.; Ozatay, M.; Tang, Y.; Valavi, H.; Pathak, R.; Lee, J.; Verma, N. A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 236–238.
  9. Xue, C.-X.; Hung, J.-M.; Kao, H.-Y.; Huang, Y.-H.; Huang, S.-P.; Chang, F.-C.; Chen, P.; Liu, T.-W.; Jhang, C.-J.; Su, C.-I.; et al. A 22 nm 4 Mb 8b-Precision ReRAM Computing-in-Memory Macro with 11.91 to 195.7TOPS/W for Tiny AI Edge Devices. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 245–247.
  10. Khaddam-Aljameh, R.; Stanisavljevic, M.; Mas, J.F.; Karunaratne, G.; Brandli, M.; Liu, F.; Singh, A.; Muller, S.M.; Egger, U.; Petropoulos, A.; et al. HERMES-Core—A 1.59-TOPS/mm2 PCM on 14-nm CMOS In-Memory Compute Core Using 300-ps/LSB Linearized CCO-Based ADCs. IEEE J. Solid-State Circuits 2022, 57, 1027–1038.
  11. Pelgrom, M. Analog-to-Digital Conversion, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2016.
  12. Tripathi, V.; Murmann, B. Mismatch Characterization of Small Metal Fringe Capacitors. IEEE Trans. Circuits Syst. I Regul. Pap. 2014, 61, 2236–2242.
  13. Duppils, M.; Eklund, J.-E.; Svenson, C. A Novel Mixed Analog/Digital MAC Unit Implemented with SC Technique Suitable for Fully Programmable Narrow-Band FIR filter Applications. In Proceedings of the ICECS’99. 6th IEEE International Conference on Electronics, Circuit and Systems, Pafos, Cyprus, 5–8 September 1999; pp. 1197–1200.
  14. Li, J.; He, W.; Zhang, B.; Qi, L.; He, G.; Seok, M. CCSA: A 394TOPS/W Mixed-Signal GPS Accelerator with Charge-Based Correlation Computing for Signal Acquisition. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 430–431.
  15. Yamasaki, T.; Nakayama, T.; Shibata, T. A Low-Power and Compact CDMA Matched Filter Based on Switched-Current Technology. IEEE J. Solid-State Circuits 2005, 40, 926–932.
  16. Cao, J.; Cui, D.; Nazemi, A.; He, T.; Li, G.; Catli, B.; Khanpour, M.; Hu, K.; Ali, T.; Zhang, H.; et al. A Transmitter and Receiver for 100 Gb/s Coherent Networks with Integrated 4 × 64 GS/s 8b ADCs and DACs in 20 nm CMOS. In Proceedings of the IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 5–9 February 2017; pp. 484–485.
Figure 1. MAC configuration and calculation principle.
Figure 2. RDAC circuit and layout.
Figure 3. CDAC circuits.
Figure 4. CDAC with register file and top view of the MOM capacitors. (a) CDAC with register file; (b) Top view of the MOM capacitors.
Figure 5. Side view of the MOM capacitor and structure of LSB capacitors. (a) Side view of the MOM capacitors; (b) Structure of LSB capacitors.
Figure 6. SAR-ADC circuit and layout.
Figure 7. Canceling method to the input-dependent common voltage.
Figure 8. Configuration of the chip.
Figure 9. Timing diagram.
Figure 10. Factors that provide a calculation error.
Figure 11. Each calculation error vs. parallel number m.
Figure 12. Equivalent circuit of RDAC.
Figure 13. Energy consumption of MAC and ADC vs. m.
Figure 14. Energy efficiency vs. m.
Figure 15. Chip layout and photo.
Figure 16. Physical interface for measurement setup.
Figure 17. Measured linearity of the MACs with SAR-ADCs in 8-bit resolution.
Figure 18. r-MVM test method and measurement setup.
Figure 19. Measured and normalized r-MVM. (a) Measured data; (b) Normalized data.
Figure 20. Measured energy efficiency and r-MVM vs. ON-time of the RDAC.
Figure 21. fmax and energy efficiency vs. VDD.
Figure 22. Energy efficiency and r-MVM of three chips.
Table 1. Comparison of the performance of MACs using other M/S technologies.

| | [3] | [8] | [9] | [5] | [10] | This Work |
|---|---|---|---|---|---|---|
| Technology [nm] | 7 | 16 | 22 | 22 | 14 | 28 |
| Area [mm2] | 0.0032 ⁷ | 25 ³ | 6 ² | 0.25 ³ | 0.63 ⁵ | 1.1 ⁶ / 7.15 ³ |
| X/W/Y [bit] | 4/4/4 | 4/4/8 | 8/8/14 | 8/8/8 | 8/Analog/8 | 8/8/8 |
| Operation Frequency [MHz] | 181.8 ⁸ | 200 | - | 145–240 | 1000 | 180 |
| ADC architecture | Flash | SAR | Sense-Amp | - | CCO-based | SAR |
| Peak Performance [GOPS] | 372.4/186.2 ¹ | 11.8K/5.9K ¹ | 35 | 1000 | 1008 ¹⁰ | 4423 ⁹ |
| Energy Efficiency [TOPS/W] | 351/175.5 ¹ | 121/60.5 ¹ | 11.91 | 32.2 | 10.5 ¹⁰ | 241.1 ⁴ (Macro); 64.4 ⁴ (System); TsRDAC = 1.0 ns |
| Area Efficiency [TOPS/mm2] | 116.3/58.1 ¹ | 2.67/1.34 ¹ | 0.013 | 4 | 1.59 ¹⁰ | 4.02 ⁶ |
| r-MVM error [%FS] | - | - | - | σ = 0.65 | σ = 1.94 | σ = 1.1 |

Note: ¹ Normalized to 8b input. ² Including I/O and test mode. ³ Including I/O, test mode, digital Ctrl., CLK Gen. ⁴ Peak performance with σ ≈ 1% at r-MVM test. ⁵ Calculated from area efficiency. ⁶ Active area of MS-MAC macro. ⁷ Including test and reconfigurable blocks. ⁸ Calculated as 1/(access time). ⁹ Calculated as 12,288 MAC × 2 OPS/MAC × 180 MHz. ¹⁰ Running MNIST.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
