1. Introduction
The fast growth of the Internet of Things (IoT) and the upward trend of the mobile market are increasing the use of voice communication and speech recognition applications [1,2,3]. In some cases, an always-listening function is required to respond to the user's voice commands; consequently, low power consumption should be a key feature of such systems. In addition, low cost, small size, and easy integration are often the main requirements imposed on a microphone design. In recent years, to meet these basic needs, conventional Electret Condenser Microphones (ECM) have been replaced by Microelectromechanical systems (MEMS) sensors [4,5,6,7]. The high signal-to-noise ratio (SNR) and good sensitivity of MEMS sensors, together with the possibility of placing a large number of them in the same device, make this technology well suited for current audio applications [8,9].
The low bandwidth of audio signals enables the design of interfaces for MEMS acoustic sensors that usually employ oversampled Sigma-Delta (ΣΔ) data converters [10,11]. ΣΔ modulators based on switched-capacitor (SC) technology consume more power than continuous-time (CT) ΣΔs, but are more robust against circuit impairments such as modulator clock jitter [12]. Current deep-submicron CMOS technologies operate with reduced supply voltages, which makes the analog design of either SC-ΣΔs or CT-ΣΔs challenging. However, this technology scaling benefits the performance of digital circuits, especially in terms of area and power consumption.
Recently, time-to-digital conversion has become a good solution to overcome the challenges that low supply voltages pose to conventional analog circuit designs [13,14,15]. Oscillator-based analog-to-digital converters (ADCs) are an alternative to ΣΔ modulators. They are noise-shaped architectures in which the information is encoded into the frequency of a digital signal rather than into a voltage; therefore, the dynamic range (DR) is not limited by the power supply. In addition, they can be implemented mostly with digital circuits, taking advantage of the integration benefits of technology scaling and of lower power consumption.
As shown in Figure 1a, readout circuits for capacitive MEMS sensors can be built by directly connecting the sensing element as the load of an oscillator [16,17]. This way, the reactance change in the sensor modulates the oscillation frequency. With some digital circuitry, this frequency-modulated signal can be converted into a noise-shaped bit stream. Alternatively, in Figure 1b, the sensor can be connected via an analog interface that generates a voltage signal proportional to the MEMS acoustic stimulus, which is then converted to the digital domain by a combination of a voltage-controlled oscillator (VCO) and digital circuitry. Both approaches behave like a CT-ΣΔ modulator, showing first-order noise shaping in the output spectrum.
Typically, VCO-based analog-to-digital converters (VCO-ADCs) are implemented with ring oscillators (ROs) [18,19,20,21,22]. As in a conventional ADC, the VCO-ADC performance is limited by flicker and thermal noise. This circuit noise appears as phase noise in the RO, which is demodulated at the output, after sampling, as low-frequency noise that degrades the overall system SNR. Another factor that limits the DR of an open-loop VCO-ADC is the distortion resulting from the nonlinear relationship between the input voltage and the RO oscillation frequency for large input signals. Because of these limitations, the RO in the ADC must be designed carefully.
This paper introduces a pseudo-differential architecture for a MEMS microphone readout circuit based on a mostly digital ΣΔ ADC. The MEMS sensor is coupled to an impedance converter that modulates the frequency of an RO. The time-encoded output of the RO is sampled with a time-to-digital converter (TDC). Finally, a thermometer-to-binary (T2B) encoder and a binary adder generate a multibit noise-shaped digital signal, which can be easily transformed into any standard audio interface by means of digital signal processing. Unlike a multibit SC-ΣΔ, this approach requires neither a data-weighted averaging (DWA) technique nor linearity calibration of a feedback digital-to-analog converter (DAC). Since the proposed VCO-ADC is mainly implemented with digital circuitry, it is a scalable solution in terms of area and power consumption, in contrast to SC-ΣΔ-based readout circuits for MEMS microphones. Furthermore, the SNR performance of this VCO-ADC implemented in a 130 nm CMOS process can compete with that of SC-ΣΔ or CT-ΣΔ designs [11,23].
The paper is organized as follows. Section 2 presents the system-level architecture. Section 3 describes the circuit implementation at transistor level. Section 4 shows electrical and acoustical measurements obtained from the prototyped CMOS ASIC. Finally, Section 5 concludes the paper.
2. System Level Architecture
Figure 2 shows the main blocks of the proposed VCO-ADC at system level. It is composed of two single-ended channels which, combined in a pseudo-differential architecture, allow full integration with dual-backplate (DBP) MEMS microphones. This kind of transducer is built by adding a second backplate to the conventional single-backplate (SBP) MEMS microphone, resulting in a differential capacitive sensor with even-order harmonic cancellation [8]. In the presented architecture, the positive (P) and negative (N) channels are identical. As a requirement of the target audio application, the output of the ADC is a multibit sequence. Nevertheless, if a single-bit signal is preferred, the multibit output can be processed with a noise-shaping coder.
The output of the capacitive MEMS sensor is coupled into the ADC input via an impedance converter, which generates the voltage signal that sets the frequency of the VCO. The oscillator output, after a level shifter, is divided by a factor of four in order to adjust the oscillation frequency and make it compatible with the implemented demodulation circuit. The frequency-modulated signal obtained at the output of the divider is passed through a 31-stage delay line, which produces a delayed copy of that signal at each tap. The 31 output signals of the delay chain are sampled and demodulated by applying the first-order difference ($1 - z^{-1}$). By means of a T2B converter, the 31 demodulated signals form the multibit output signal of each channel. The digital output of the converter is given by the two's-complement subtraction of both single-ended branches.
The signal level applied to the microphone is expressed on a logarithmic scale as sound pressure level (SPL), where 0 dB SPL corresponds to the human hearing threshold of 20 µPa. The sensitivity of the DBP MEMS to be used with the proposed ADC is 12 mV rms at 94 dB SPL (1 Pa).
As mentioned above, the voltage signal at the output of the impedance converter controls the frequency of the time-encoded signal, so that the frequency variation in the VCO is proportional to the magnitude sensed by the transducer connected to the ADC. For power consumption and noise reasons, the target rest frequency of the frequency-modulated signal in Figure 1 is $f_o$ = 4 MHz and the oscillator gain should be $K_{VCO}$ = 12 MHz/V, which gives a relative frequency deviation $K_{VCO}/f_o$ = 3 V$^{-1}$. Given that the instantaneous oscillation frequency for a control voltage $v(t)$ is

$$f_{osc}(t) = f_o + K_{VCO}\,v(t), \qquad (1)$$

these VCO design parameters set a limit for the single-ended peak amplitude of $v(t)$ close to ±333 mV (≈126 dB SPL), the level at which the oscillation frequency would drop to zero. This is only an ideal bound. In practice, other effects such as distortion components also appear as a frequency variation caused by the input signal.
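The ±333 mV and ≈126 dB SPL figures can be checked with a few lines of arithmetic. The sketch below assumes an ideal linear VCO whose frequency reaches zero at the input limit, and assumes that each single-ended branch sees half of the 16.97 mV differential amplitude that the text associates with 1 Pa (94 dB SPL). The function names are illustrative only.

```python
import math

F_O = 4e6               # target rest frequency (Hz), from the text
K_VCO = 12e6            # target oscillator gain (Hz/V), from the text
A_1PA_DIFF = 16.97e-3   # differential peak amplitude at 1 Pa / 94 dB SPL (V), from the text


def max_single_ended_amplitude(f_o=F_O, k_vco=K_VCO):
    """Peak input (V) at which an ideal linear VCO would reach zero frequency."""
    return f_o / k_vco


def amplitude_to_db_spl(v_peak_single_ended, a_1pa_diff=A_1PA_DIFF):
    """Convert a single-ended peak amplitude to dB SPL.

    Assumption: each branch carries half of the differential 1 Pa amplitude.
    """
    return 94.0 + 20.0 * math.log10(v_peak_single_ended / (a_1pa_diff / 2.0))


v_max = max_single_ended_amplitude()
print(f"{v_max * 1e3:.0f} mV -> {amplitude_to_db_spl(v_max):.1f} dB SPL")
```

Under these assumptions the limit is simply the ratio of the rest frequency to the gain, so any change in either target moves the usable input range directly.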
The current use of low-order modulators is driven by the development of new audio interfaces such as MIPI SoundWire®, which supports multiple data rates on the order of tens of megahertz. These rates are higher than the standard sampling rates of ADCs, reducing the need for high-order modulators. In this design, the sampling frequency ($f_s$) is 20 MHz. However, depending on the VCO quantization architecture, this sampling rate might not be enough to achieve the resolution required for audio applications.
For example, using a single reset counter that counts the edges of the time-encoded signal in each sampling clock period, the theoretical signal-to-quantization-noise ratio (SQNR) that could be achieved in Figure 1 is given by Equation (2), where A is the amplitude of the input signal and OSR is the oversampling ratio, equal to $f_s$/(2BW) [19]. Applying Equation (2) and assuming a differential input signal with A = 16.97 mV, which corresponds to 1 Pa (94 dB SPL) of sound pressure in the target MEMS microphone, together with the VCO parameters given above, the estimated SQNR of this configuration is limited to 47.74 dB over the audio bandwidth (BW = 20 kHz). A valid alternative is to use a TDC for the quantization of the time-encoded signal, which emulates a much higher sampling rate than the actual sampling clock frequency. With this solution, we obtain a multibit sequence with an enhanced SQNR without increasing the system clock frequency, and with an excellent trade-off between the VCO oscillation frequency, the number of VCO phases, and the number of stages in the TDC.
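The quantities used in this estimate are easy to reproduce. The sketch below computes the oversampling ratio and the 1 Pa amplitude, assuming that the 12 mV sensitivity is an rms value (which is consistent with the 16.97 mV peak quoted above). The "equivalent sampling rate" of a 31-step TDC is an illustrative interpretation, not a figure stated in the text.

```python
import math

F_S = 20e6      # sampling frequency (Hz), from the text
BW = 20e3       # audio bandwidth (Hz), from the text
N_STEPS = 31    # TDC delay-line stages, from the text

# Oversampling ratio as defined in the text: f_s / (2 * BW)
osr = F_S / (2.0 * BW)

# Peak amplitude at 1 Pa, assuming the 12 mV sensitivity is an rms value
a_peak = 12e-3 * math.sqrt(2.0)

# Illustrative assumption: a time resolution of T_s / 31 behaves like
# sampling the oscillator 31 times faster than the clock.
equivalent_rate = N_STEPS * F_S

print(f"OSR = {osr:.0f}")
print(f"A = {a_peak * 1e3:.2f} mV")
print(f"equivalent rate = {equivalent_rate / 1e6:.0f} MHz")
```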
In this paper, we propose a different approach based on a high-rate sampler that interpolates samples between two clock edges and implements an analog finite-impulse-response (FIR) decimator. This system can be better explained using the pulse frequency modulation (PFM) approach introduced in [24,25]. In Figure 3a, the input signal is passed through a PFM modulator composed of a VCO, an edge detector block, and a pulse-shaping filter. According to [25], the PFM modulator behaves like a signal coder whose output spectrum reproduces the input signal together with some modulation components (Figure 3c). For band-limited signals, these modulation components lie at frequencies much higher than the input signal.
By virtue of the pulse-shaping filter, the spectrum of the PFM modulator output also has nulls at the multiples of $f_s$ (Figure 3c). These nulls provide a first-order spectral shaping, after sampling, of the modulation components and their aliases (Figure 3d). To further improve the SQNR, we add a low-pass FIR filter implemented with continuous-time delays, as shown in Figure 3b. The FIR filter reduces the level of the modulation components prior to sampling. As a consequence, the signal at the output of the PFM modulator can be converted into a multibit signal. Intuitively, the power of the input signal is now multiplied by the number of stages of the delay line, while the modulation components are attenuated by this extra low-pass FIR filter, as evidenced in Figure 3b. This results in an enhancement of the ADC resolution.
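The low-pass behavior of a filter built from continuous-time delays can be illustrated numerically. The sketch below assumes that the FIR filter is an equal-weight sum of 31 taps spaced by one thirty-first of the 50 ns sampling period. This tap weighting is an assumption made for illustration and is not stated in the text. Under that assumption the low-frequency gain equals the number of taps and the response has nulls at the multiples of the 20 MHz sampling frequency.

```python
import cmath
import math

F_S = 20e6                    # sampling frequency (Hz), from the text
N_TAPS = 31                   # delay-line stages, from the text
T_D = 1.0 / (F_S * N_TAPS)    # ideal unit delay (s): sampling period / 31


def fir_response(f):
    """Frequency response of an equal-weight sum of N_TAPS delayed copies."""
    return sum(cmath.exp(-2j * math.pi * f * k * T_D) for k in range(1, N_TAPS + 1))


for f in (0.0, 1e3, 20e3, F_S, 2 * F_S):
    print(f"{f / 1e6:10.3f} MHz -> |H| = {abs(fir_response(f)):.4f}")
```

In-band signals add coherently across the taps, whereas components near the multiples of the sampling frequency cancel, which matches the intuitive description above.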
The first component of the TDC is the 31-stage delay line, shown in Figure 4a. The ideal delay of the entire chain corresponds to the sampling period of the system, $T_s = 1/f_s$. Ideally, the delay of every single element, $t_d$, equals $T_s$/31. With this number of basic delay units, the resolution achieved by the proposed first-order VCO-ADC is sufficient for an audio application, as the behavioral simulations presented below show. In addition, the required value of $t_d$ in every delay unit can be implemented in the selected CMOS process with basic digital buffers.
The output taps of the delay line are registered with the system clock and the first-order difference is applied, as shown in Figure 4b. Then, using the T2B encoder, the 31 demodulated single-bit sequences are combined into a 5-bit signal. This is the output of the single-ended channel, and its power spectrum shows first-order noise shaping. The pseudo-differential output of the VCO-ADC is the 6-bit sequence obtained as the difference between the outputs of the two single-ended channels which, unlike the single-ended configuration, cancels the even harmonic distortion components.
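The word lengths follow from the ranges involved: each channel output lies between 0 and 31 (5 bits), so their difference lies between -31 and +31 and needs 6 bits in two's complement. The sketch below illustrates this. The subtraction order (P minus N) and the function names are assumptions made for the example.

```python
def pseudo_differential(y_p, y_n, bits=6):
    """Subtract two 5-bit channel outputs and return (value, two's-complement code).

    Assumption: the output is formed as P minus N.
    """
    if not (0 <= y_p <= 31 and 0 <= y_n <= 31):
        raise ValueError("channel outputs must fit in 5 bits (0..31)")
    diff = y_p - y_n
    code = diff & ((1 << bits) - 1)   # wrap the signed value into `bits` bits
    return diff, code


def decode(code, bits=6):
    """Recover the signed value from a two's-complement code."""
    return code - (1 << bits) if code & (1 << (bits - 1)) else code


for y_p, y_n in [(31, 0), (12, 13), (0, 31)]:
    value, code = pseudo_differential(y_p, y_n)
    print(f"P={y_p:2d} N={y_n:2d} -> {value:3d} = 0b{code:06b}")
```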
Figure 5 depicts the behavior of the quantization method applied in both single-ended channels. The time-encoded signal is delayed by the unit delay $t_d$ in each of the delay elements of the TDC, so that the last tap carries a total delay equivalent to the sampling period $T_s$. The states of the 31 tap signals are sampled at the rising edges of the sampling clock. The first difference of these sampled data is computed, generating 31 discrete single-bit sequences. Finally, the multibit output signal is given by the sum of the values of these discrete sequences at every sampling period.
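The quantization steps described above can be sketched as a small behavioral model of one single-ended channel. It is a simplified illustration, not the simulation used for the reported results: the VCO is ideal and linear, the delays are ideal, the first difference is an XOR of consecutive samples, and the initial phase `PHASE0` is an arbitrary value chosen so that no oscillator edge coincides exactly with a sampling instant.

```python
import math

F_O = 4e6        # rest frequency of the time-encoded signal (Hz), from the text
K_VCO = 12e6     # oscillator gain (Hz/V), from the text
F_S = 20e6       # sampling frequency (Hz), from the text
N_TAPS = 31      # delay-line stages, from the text
PHASE0 = 0.123   # arbitrary initial phase in cycles (assumption)


def vco_cycles(t, amp, f_in):
    """Cycles accumulated by an ideal linear VCO driven by amp*sin(2*pi*f_in*t)."""
    modulation = K_VCO * amp * (1.0 - math.cos(2.0 * math.pi * f_in * t)) / (2.0 * math.pi * f_in)
    return PHASE0 + F_O * t + modulation


def tap_level(t, amp, f_in):
    """Logic level of the 50% duty-cycle oscillator output at time t."""
    return 1 if vco_cycles(t, amp, f_in) % 1.0 < 0.5 else 0


def channel_output(n_samples, amp=0.0, f_in=1e3):
    """Multibit output: sum over all taps of the XOR of consecutive samples."""
    t_s = 1.0 / F_S
    t_d = t_s / N_TAPS

    def sample_taps(n):
        # Tap k sees the oscillator output delayed by k unit delays.
        return [tap_level(n * t_s - k * t_d, amp, f_in) for k in range(1, N_TAPS + 1)]

    previous = sample_taps(0)
    output = []
    for n in range(1, n_samples + 1):
        current = sample_taps(n)
        output.append(sum(c ^ p for c, p in zip(current, previous)))
        previous = current
    return output


idle = channel_output(1000)
print(sum(idle) / len(idle))   # average output code with no input signal
```

With no input, every tap detects two edges per oscillator period, so the average output code is 31 × 2 × 4 MHz / 20 MHz = 12.4, and the code always stays within the 5-bit range 0 to 31.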
Figure 6 shows the result of a behavioral simulation of the VCO-ADC architecture presented in Figure 2 without the impedance converters. The input-referred spectra have been calculated using an input tone corresponding to 94 dB SPL at 1 kHz. The SQNR obtained in the audio bandwidth under these conditions is 80.9 dB, or 86.1 dB-A if an A-weighting filter is applied (Figure 6b). Note that the simulated spectra show first-order noise shaping. The A-weighting curve is commonly used in audio measurements to mimic the sound pressure perceived by the human ear, which is less sensitive to low audio frequencies.
Analysis of Nonidealities
As mentioned in Section 1, phase noise and distortion may affect the performance of the VCO implementation. The effect of phase noise in oscillators has been extensively studied in [26,27,28], which conclude that phase noise is influenced by factors such as the topology of the oscillator, the oscillation frequency, the size of the transistors, and the power consumption. In this ADC, the VCO is optimized in terms of phase noise and distortion by applying the method described in [29]. Furthermore, the distortion is mitigated by the pseudo-differential configuration.
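A short derivation shows why a pseudo-differential configuration mitigates distortion. As a sketch, assume that the oscillation frequency of each channel is a third-order polynomial of its input, with hypothetical coefficients $k_1$, $k_2$, $k_3$ that are identical in both channels, and that the N channel is driven by the inverted input $-v$:

```latex
\begin{aligned}
f_P(v) &= f_o + k_1 v + k_2 v^2 + k_3 v^3,\\
f_N(v) &= f_o - k_1 v + k_2 v^2 - k_3 v^3,\\
f_P(v) - f_N(v) &= 2 k_1 v + 2 k_3 v^3 .
\end{aligned}
```

Under these assumptions the rest frequency and the even-order term cancel in the difference, while the odd-order terms remain. Any mismatch between the two channels would leave a residual even-order component.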
However, jitter in the sampling clock may have a negative impact on the performance of the proposed VCO-ADC. Jitter can be seen as a deviation of the sampling period from its ideal value and, in real implementations, it may be unavoidable [30]. Some approaches for reducing the jitter sensitivity of CT-ΣΔs have been published, for example in [31,32]. Figure 7a shows the simulated SQNR of the proposed system for different values of clock jitter. The SQNR remains practically unchanged up to an rms period jitter of 1% of the sampling period ($T_s$ = 50 ns). The performance of the system degrades significantly if the sampling clock has an rms jitter above 1%.
Another nonideality that may adversely impact the performance of the system is the mismatch of the digital delay line. In a CMOS prototype implementation, due to Process-Voltage-Temperature (PVT) variations, the real delay of the delay elements could differ from the nominal value. Figure 7b shows the simulated SQNR for different values of delay mismatch, expressed as a percentage of the nominal unit delay. A loss of 2 dB in the SQNR can be observed within a margin of ±5% of mismatch in every element of the delay line, which is a permissible variation according to the specifications of the target audio application.
3. Circuit Design
Figure 8 shows the simplified schematic of the analog core for the single-ended channel configuration of the proposed VCO-ADC. The MEMS sensor is biased by a high-ohmic biasing circuit [8] and an NMOS transistor (M0) in the common-drain amplifier configuration. M0 acts as a voltage buffer stage that converts the high-impedance input signal into a low-impedance output signal. The DC operating point of the buffer is established by a bias resistor on the order of gigaohms, which is implemented by two asymmetric branches of stacked PMOS diodes. The dimensions of M0 have been chosen to minimize its noise contribution to the overall ADC SNR and to keep its gain close to unity.
Their low phase noise, ease of implementation, and good sensitivity make ROs excellent candidates for VCO-ADCs [18,19,20,21,22]. In addition, one of the most important advantages of ROs is the possibility of using their multiphase output. This allows a multibit quantization approach with an enhanced SQNR, but at the cost of more complex digital circuits. In this VCO-ADC, a single RO output phase has been connected to the 31-stage delay line in order to minimize the area and complexity of the digital circuitry. Even so, the implemented TDC solution achieves a sufficient SQNR in the 20 kHz audio bandwidth, as the behavioral simulations of the previous section have shown.
The oscillator implemented in this converter is a 5-stage inverter-based RO built with two stacked rings, which shows better phase noise than the conventional single-ring architecture [33]. In Figure 8, both stacked rings, connected through the NMOS devices of the upper chain of inverters to the PMOS transistors of the lower one, oscillate like a conventional RO at the same oscillation frequency, but with a phase difference of 180°. Given that only one RO output phase has to be connected to the TDC, the stacked signals at
and
with amplitude
VSS–
and
–
, respectively, are combined into a single signal of amplitude
VSS–
by means of M5 and M6. A buffer is employed to square the RO output oscillation. The signal amplitude after the buffer is still variable and depends on the level of the VCO control voltage. Therefore, the level shifter of Figure 8 is needed to keep constant logic levels at the input of the digital circuitry, regardless of the VCO-ADC input signal. This way, the level of the time-encoded signal is always compatible with the digital supply voltage, which is 1.8 V in this case. The PMOS transistors of the level shifter (M7–M8) have W/L = 1 µm/400 nm and the NMOS transistors (M9–M10) have W/L = 4 µm/400 nm, whereas the transistors of the inverters (M11 and M12) have W/L = 1.1 µm/400 nm and W/L = 740 nm/400 nm, respectively.
The RO design process followed the methodology proposed in [29] to find an oscillator optimized in terms of phase noise, sensitivity, and distortion with a reduced simulation time. This methodology is based on periodic steady-state (PSS) sweep simulations to estimate the oscillation frequency, the gain, and the distortion of the RO. Additionally, the periodic noise (pnoise) analysis is used to compute the phase noise, which can be referred to the ADC input to predict the SNR. As shown in [29], such analyses achieve very good accuracy with a significant speed-up in simulation time, which allows an interactive optimization of the RO design instead of relying on slow conventional transient simulations.
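After applying this optimization process, the selected RO has W/L = 288 µm/900 nm for the M1–M3 devices and W/L = 288 µm/1.9 µm for the M2–M4 devices. This RO achieves an oscillation frequency of 16.2 MHz (4.05 MHz after the divide-by-four stage) and a relative frequency deviation of 3.04 V$^{-1}$, which are very close to the target values mentioned in Section 2.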
Figure 9 shows the predicted DR for the differential oscillator configuration, which has been estimated from the equations presented in [29]. To obtain these estimations, the RO simulation setup included the noise contribution of the impedance converter, with a supply voltage of 1.8 V. The predicted peak signal-to-noise-and-distortion ratio (SNDR) is 80 dB-A. Note that the TDC quantization noise is not considered here, although it also has an impact on the overall VCO-ADC SNDR, as will be shown in Section 4.
Figure 10 illustrates some of the digital blocks of the proposed VCO-ADC. The frequency division of the RO output is performed by the two D-type flip-flops of Figure 10a. Here, the inverted output terminal of each flip-flop is connected to its data input terminal, resulting in a division by four of the clock signal of the first flip-flop. The composition of the 31-stage delay line is shown in Figure 10b. Every delay unit is formed by two digital buffers, each one implemented with four inverters, where W/L(M1) = 750 nm/400 nm, W/L(M2) = 500 nm/400 nm, W/L(M3–M5) = 500 nm/500 nm, W/L(M4–M6) = 500 nm/1 µm, W/L(M7) = 1.42 µm/400 nm, and W/L(M8) = 920 nm/400 nm. Post-layout simulations show a total delay of the entire chain equal to 47 ns, which is very close to the specified ADC sampling period of 50 ns. This leads to an individual delay of 1.52 ns in each unit. Note that achieving a total delay closer to the sampling period requires considerable effort, since the delay of each unit is highly dependent on its layout implementation. Monte Carlo simulation results show that the standard deviation of the unit delay is within the 5% margin, so the delay cell mismatch does not degrade the system performance, as shown in Section 2.
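The divide-by-four behavior and the delay figures quoted above can be checked with a short sketch. It models each flip-flop as a toggle stage (data input fed by its own inverted output) and assumes that the second stage is clocked by the rising edge of the first stage's output. That clocking choice and the function names are assumptions made for illustration.

```python
def divide_by_four(clock_levels):
    """Two cascaded toggle stages: each flips on the rising edge of its clock."""
    q1 = q2 = 0
    prev_clk = 0
    out = []
    for clk in clock_levels:
        if prev_clk == 0 and clk == 1:       # rising edge of the input clock
            prev_q1 = q1
            q1 ^= 1                           # first flip-flop toggles
            if prev_q1 == 0 and q1 == 1:      # rising edge of Q1 (assumed clock of stage 2)
                q2 ^= 1                       # second flip-flop toggles
        prev_clk = clk
        out.append(q2)
    return out


def rising_edges(levels):
    """Count 0-to-1 transitions in a sequence of logic levels."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a == 0 and b == 1)


clock = [0, 1] * 16                           # 16 input clock periods
divided = divide_by_four(clock)
print(rising_edges(clock), "->", rising_edges(divided))

unit_delay_ns = 47.0 / 31                     # post-layout chain delay / number of stages
divided_frequency_mhz = 16.2 / 4              # RO frequency after the divider
print(f"{unit_delay_ns:.2f} ns, {divided_frequency_mhz:.2f} MHz")
```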
In the proposed VCO-ADC, the specified sampling frequency is higher than twice the maximum oscillation frequency after the divider. This allows the use of the XOR-based demodulation circuit of Figure 10c to compute the first-order difference of the delayed copies of the RO output [19,34]. This circuit accounts for both the rising and falling edges of the oscillation, as already described in Figure 5. The remaining blocks of the digital core, which process the 31 demodulated single-bit sequences, are the T2B encoder and the differential two's-complement subtractor. Both are implemented with an array of full adders to generate the 6-bit ADC output, without carry-propagate functions.
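The condition on the sampling frequency can be illustrated with a small experiment. The XOR of two consecutive samples detects an edge only when an odd number of edges falls between them, so every rising and falling edge is counted only if consecutive edges are more than one sampling period apart. The sketch below compares the XOR count with the true number of edges for an ideal 50% duty-cycle square wave. The initial phase `phase0` is an arbitrary assumption that keeps edges away from the sampling instants.

```python
import math


def count_edges_xor(f_osc, f_s, n_samples, phase0=0.123):
    """Edges detected by XOR-ing consecutive samples of a 50% duty-cycle square wave."""
    def level(n):
        return 1 if (phase0 + f_osc * n / f_s) % 1.0 < 0.5 else 0
    return sum(level(n) ^ level(n - 1) for n in range(1, n_samples + 1))


def true_edges(f_osc, f_s, n_samples, phase0=0.123):
    """Actual number of rising plus falling edges in the same time window."""
    start = 2.0 * phase0
    end = 2.0 * (phase0 + f_osc * n_samples / f_s)
    return math.floor(end) - math.floor(start)


F_S = 20e6
for f_osc in (4e6, 12e6):
    detected = count_edges_xor(f_osc, F_S, 1000)
    actual = true_edges(f_osc, F_S, 1000)
    print(f"{f_osc / 1e6:.0f} MHz: detected {detected}, actual {actual}")
```

At 4 MHz the XOR count matches the true edge count, whereas at 12 MHz, above half the sampling frequency, some edges are missed.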