5.1. Sensing Interfaces
The sensing interfaces are defined by the target application, as discussed in Section 3. Based on the power profiles available to the system, the designer must choose between using a commercial sensor or developing one. Commercial sensors have either analog or digital interfaces. Sensors with an analog interface produce a voltage that requires an on-chip ADC to measure. Sensors with a digital interface provide their output over a standard interface to SoCs. Alternatively, the designer can develop the sensor, either integrating it within the SoC as part of the analog sensing interfaces or building it as a stand-alone sensor with a digital interface to the SoC.
Two common digital SoC interfaces are SPI and I2C. Both enable serial communication through a simple protocol. The main differences between these standards that affect the design of ultra-low power systems are the pin count and energy efficiency. I2C requires fewer pins than SPI but uses open-drain pins with on-board pull-up resistors instead of CMOS pins. As a result, a static current flows through the pull-up resistor whenever the SoC or sensor pulls an I2C line low, needlessly draining the storage element. This becomes especially problematic when I2C is operated at low frequencies, since the lines spend more time being held low. SPI, on the other hand, draws no static current, enabling energy efficient communication at low frequencies.
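To make the I2C static loss concrete, the sketch below estimates the energy dissipated in a single pull-up resistor while one byte is transferred at a low clock rate; the supply voltage, pull-up value, frame length, and duty factor are illustrative assumptions, not values from the text:

```python
# Rough, illustrative estimate of static I2C pull-up loss at a low clock rate.
# All values are assumptions for illustration: a 1.8 V rail, 4.7 kOhm
# pull-ups, a 10 kHz I2C clock, and a 9-bit (8 data + ACK) byte frame.

V_DD = 1.8          # supply voltage (V), assumed
R_PULLUP = 4.7e3    # pull-up resistance (ohm), assumed
F_I2C = 10e3        # I2C clock (Hz), a "low frequency" case, assumed
BITS_PER_BYTE = 9   # 8 data bits + ACK

def i2c_byte_energy(duty_low=0.5):
    """Energy burned in one pull-up resistor while transferring one byte.

    duty_low: assumed fraction of the bit time the line is held low, during
    which V_DD**2 / R_PULLUP of static power is dissipated in the pull-up.
    """
    t_byte = BITS_PER_BYTE / F_I2C
    p_static = V_DD ** 2 / R_PULLUP
    return p_static * duty_low * t_byte

print(f"static energy per byte: {i2c_byte_energy() * 1e9:.0f} nJ")
```

At a slower I2C clock, `t_byte` grows while the static power term is unchanged, which is why the loss worsens at low frequencies.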
As the system power reduces, I/O remains a dominant component in the power budget [6]. Thus, if a stand-alone sensor is being developed, its digital interface can be customized to reduce power. Reducing the operating voltage of the digital interface between chips on a board reduces the power and energy dramatically at the cost of throughput [15]. This voltage can be tuned for the required sensor throughput, enabling the application to reach its minimum energy and power consumption. If the sensor and SoC must be physically separated over longer distances, the transmission can be made differential to improve reliability [16]. One such application is a wearable heart rate monitor, where the sensor is placed within a shirt close to the chest, while the SoC is placed on the sleeve to allow maximum exposure to skin and light. With these optimizations, improving off-chip communication to sensing interfaces, as shown in reference [6], can reduce their contribution to the system budget by over 94% for the proposed shipping integrity application, reducing the average system power to below 260 nW.
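The benefit of lowering the off-chip interface voltage can be sketched with the usual dynamic-energy relation E ≈ αCV² per transmitted bit; the pad capacitance, activity factor, and both voltage levels below are assumed round numbers, not values from the cited work:

```python
# Illustrative dynamic-energy argument for lowering the off-chip I/O voltage:
# each pad transition costs roughly C_PAD * V_IO**2. The capacitance and
# voltages below are assumed placeholders, not measured values.

C_PAD = 5e-12  # pad + trace capacitance per line (F), assumed

def io_energy_per_bit(v_io, alpha=0.5):
    """Average energy per transmitted bit; alpha is the switching activity."""
    return alpha * C_PAD * v_io ** 2

e_nominal = io_energy_per_bit(1.8)  # nominal-rail I/O (assumed 1.8 V)
e_scaled = io_energy_per_bit(0.4)   # reduced-swing I/O (assumed 0.4 V)
print(f"1.8 V: {e_nominal * 1e12:.2f} pJ/bit, "
      f"0.4 V: {e_scaled * 1e12:.2f} pJ/bit, "
      f"{e_nominal / e_scaled:.1f}x reduction")
```

The quadratic dependence on voltage is what makes this optimization so effective, with the reduced swing paid for in noise margin and throughput.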
5.2. Digital Sub-System
The sensing interfaces communicate with the rest of the SoC through the system bus. Different bus architectures are available in the literature, the most common being Wishbone and the Advanced Microcontroller Bus Architecture (AMBA). The Wishbone bus is an open source architecture that enables flexible interfacing between the controllers and different components and is highly customizable to meet different application needs. AMBA, on the other hand, includes different targeted bus protocols. The two protocols most commonly used in low power systems are the Advanced High-performance Bus (AHB) and the Advanced Peripheral Bus (APB). The AHB is a pipelined bus architecture used to provide a high speed interface between the controller(s) and the different memories in the system. The APB, on the other hand, is a simple, non-pipelined bus with a single master, designed to interface the controller with low speed peripherals. Generally, digital sensing interfaces are placed on the APB. The choice of bus architecture impacts the rate at which the system can run, as we will discuss later in this section.
Next, the main controller must be designed or selected. Many low power SoCs [2,3,7,17] use ARM's low power Cortex-M0/M0+ due to its reduced gate count, its flexible instruction set, and its software and tool support. However, the Cortex-M0/M0+ follows the von Neumann architecture, which limits the performance of the system by using the system bus for instruction fetches. When running the system at a low clock rate, the cycles consumed by instruction fetches become especially limiting. Thus, battery-less SoCs, such as those described in references [5,6], use a custom core with a dedicated memory bus to decouple instruction fetches from data transfers. However, designing a custom core with a custom ISA is a challenge. Recently, RISC-V has gained momentum by providing an open source instruction set architecture (ISA) that is easily extensible and enjoys software and tool support [18].
Once the core, bus, and sensing interface have been chosen, the user can start developing the software necessary to read information out of the sensor and process it. This is important for two main reasons: (1) it helps to determine the minimum size of the instruction and data memories needed, and (2) it provides the opportunity to optimize the architecture for energy efficiency by highlighting the functions/operations that take the longest time and/or the largest amount of code to execute. Memories are one of the always-on components within IoT SoCs and contribute significantly to the power budget. Thus, reducing the amount of on-chip memory is one way to reduce a system's power consumption. However, the choice of memory size can only be made if the target application's software is developed before the chip is designed. The program code helps to determine the size of the instruction memory. To determine the size of the data memory, the designer must consider whether the raw data from the sensor must be communicated or whether a processed version is enough. For the former, the designer must account for the size of the raw data needed as well as any data that must be processed and saved. The program code will also help to determine other potential data that must be saved/held in the data memory.
In IoT applications, data transfer and processing are two main functions that the system performs. Data processing can be optimized by implementing hardware accelerators to process the data efficiently and to reduce the load on the core. A traditional way to off-load data transfer from the core is to include a direct memory access (DMA) engine on the bus, as was done in reference [5]. However, the core must still configure the DMA before data transfer can start, and the DMA still relies on the bus for data transfer. Thus, the authors in reference [6] proposed a data flow architecture where data is transferred between different components in the system through a dedicated data-flow path that bypasses the bus. Data moves directly from the sensing interface into the accelerator that processes it through this dedicated path. Such an architecture is only possible if the application space of the SoC is determined prior to design time. However, it can offer a significant improvement in processor idle time, reductions in the instruction and data memory sizes, and improvements in energy efficiency.
Affecting all of the choices in the digital system architecture is the required sensing rate of the application. The required sensing rate imposes a lower bound on the system's operating frequency, and the operating frequency linearly impacts the power consumption of the system. In the time between samples, the system must read the output of the sensor, move it to a storage location or process it through an accelerator or the main core, and potentially react to the outcome. Thus, the system frequency must be fast enough to handle all of these operations in the time between samples to avoid losing important data. The time required for each of these operations depends on the sensing interface used and the system architecture. If an integrated ADC is used, the sensing rate, the number of cycles required to perform reads and writes to peripherals on the bus, and the data processing time play important roles in determining the minimum clock rate. Equation (1) illustrates the lower bound on the clock rate imposed by these factors:
TS ≥ [NC × NBW + NB × (NBR + NBW) + NP] × TCLK,(1)

where TCLK is the clock period, TS is the time between samples, NC is the number of configuration words needed to configure the ADC, NB is the number of data words expected from the ADC, NBW is the number of cycles needed to write a register on the system bus, NBR is the number of cycles needed to read a register on the system bus, and NP is the number of cycles needed by the software to manage the data transfer and process the data. To capture a sample from the ADC, the core must first configure (NC) the ADC and then read its output (NB), process it, and move it to the data memory.
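As a quick sanity check, the cycle budget behind Equation (1) can be evaluated numerically; all of the parameter values below are illustrative assumptions, not figures from the text:

```python
# Illustrative evaluation of the Equation (1) cycle budget for an on-chip ADC.
# Every parameter value here is an assumption chosen for illustration.

T_S = 1 / 100        # time between samples: 100 samples/s (assumed)
N_C = 4              # configuration words for the ADC (assumed)
N_B = 1              # data words per sample (assumed)
N_BW = 2             # cycles per bus write (assumed)
N_BR = 2             # cycles per bus read (assumed)
N_P = 50             # software/processing cycles per sample (assumed)

# Configure the ADC, read each data word, write it to memory, then process.
cycles_per_sample = N_C * N_BW + N_B * (N_BR + N_BW) + N_P
f_clk_min = cycles_per_sample / T_S   # minimum clock rate (Hz)

print(f"{cycles_per_sample} cycles/sample -> f_CLK >= {f_clk_min:.0f} Hz")
```

Even with modest per-sample processing, the bus read/write overhead is a visible fraction of the budget, which motivates the architecture choices discussed below.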
On the other hand, if a digital sensing interface is used, the overhead of configuring and reading the sensor serially can often become the bottleneck in ultra-low power and self-powered systems. Many sensors dictate a communication rate for a given sensing rate, forcing the designer to run the digital sensing interface at that rate. For example, the ADXL345 accelerometer recommends a minimum SPI clock frequency of 100 kHz when the output sensing rate is 200 Hz. Thus, the SPI master interface on the SoC must be designed to run at a minimum clock rate of 200 kHz to interface with this sensor. Here, the designer can choose to run the system clock at the same rate as the SPI master interface, or to decouple the system clock from the SPI master clock. If the designer chooses the former, the system clock rate must respect Equation (2):
TCLK ≤ TSPI/2,(2)

where TCLK is the clock period, and TSPI is the SPI clock period. Otherwise, the designer must have a clock source that is capable of running at the minimum speed imposed by Equation (2) to feed the sensing interface. This clock source can be gated when the sensing interface is not in use. A second clock is also needed to drive the rest of the system. Equation (3) below shows the relationship between this clock and the sensing rate, assuming an SPI sensing interface is used:
TS ≥ (NC + NSC) × NBW × TCLK + (NSC + NSR) × 8 × TSPI + (NBW + NBR) × NSR × TCLK + NP × TCLK,(3)

where TCLK is the clock period, TSPI is the SPI clock period, TS is the time between samples, NC is the number of configuration bytes needed to start and configure the SPI master on the SoC, NSC is the number of command bytes needed to start a read operation from the sensor, NSR is the number of read bytes expected from the sensor, NBW is the number of cycles needed to write a register on the system bus, NBR is the number of cycles needed to read a register on the system bus, and NP is the number of cycles needed by the software to manage the data transfer and process the data.
To start the data transfer between the SoC and sensor through SPI, the core must first configure (NC) the SPI interface and then load the commands (NSC) needed to start a read operation from the sensor. To do this, the core uses the bus to write the data to the SPI master peripheral. This phase requires (NC + NSC) × NBW × TCLK. Next, the SPI master begins the transfer of command (NSC) and data (NSR) bytes. The SPI master transfers a single bit every TSPI; thus, completing the transfer requires (NSC + NSR) × 8 × TSPI. Assuming the SPI core does not have a buffer to hold the received data, the core must transfer the data from the SPI core to the data memory or to the accelerator processing the data. To do this operation, the core performs a bus read, followed by a bus write operation. Thus, moving all the bytes out of the SPI core requires (NBW + NBR) × NSR × TCLK. Finally, the time required to process the data depends on the application and requires NP × TCLK. Summing all these times gives Equation (3), which relates the clock rate to the sensing and communication rates. Equation (3) assumes a Harvard architecture where instruction fetches are decoupled from data transfers.
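Plugging the ADXL345 numbers from above into Equation (3) gives a feel for the resulting clock constraint. Only the 200 Hz sensing rate and 100 kHz SPI clock come from the text; the byte and cycle counts (NC, NSC, NSR, NBW, NBR, NP) are illustrative assumptions:

```python
# Illustrative evaluation of Equation (3) for an SPI accelerometer such as
# the ADXL345 sampled at 200 Hz over a 100 kHz SPI clock. The byte and cycle
# counts below are assumptions chosen for illustration.

T_S = 1 / 200        # time between samples at the 200 Hz output rate
T_SPI = 1 / 100e3    # SPI bit period at the recommended 100 kHz clock
N_C = 3              # SPI master configuration bytes (assumed)
N_SC = 1             # command bytes to start a sensor read (assumed)
N_SR = 6             # read bytes per sample, e.g. 3 axes x 2 bytes (assumed)
N_BW = 2             # cycles per bus write (assumed)
N_BR = 2             # cycles per bus read (assumed)
N_P = 100            # software management/processing cycles (assumed)

# T_S >= [(N_C + N_SC)*N_BW + (N_BW + N_BR)*N_SR + N_P]*T_CLK
#        + (N_SC + N_SR)*8*T_SPI
cycles = (N_C + N_SC) * N_BW + (N_BW + N_BR) * N_SR + N_P
t_clk_max = (T_S - (N_SC + N_SR) * 8 * T_SPI) / cycles
print(f"f_CLK >= {1 / t_clk_max / 1e3:.1f} kHz")
```

Note that the serial transfer term, (NSC + NSR) × 8 × TSPI, consumes a fixed slice of the sample period regardless of the system clock, so the remaining cycles must fit in whatever time is left.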
As shown in Equation (3), bus transfers play an important role in determining the minimum clock rate. For applications with very slow sensing rates, the overhead of bus transfers could be an acceptable trade-off for a simplified design architecture. However, for applications with faster sensing rates, the overhead might not be acceptable. Thus, reducing the number of bus transfers becomes a must. Designers can choose to implement a Harvard architecture to overcome the von Neumann bottleneck. A pipelined bus, such as the AHB, is another way to reduce the number of cycles required to perform consecutive reads/writes on the bus. The data flow architecture presented in reference [6] is a third way, as it completely eliminates the required bus transfers. Hardware accelerators can also help by processing the data in an efficient manner.
Once the architecture and clock rate are chosen, the designer must estimate the power consumption of the system to ensure the design remains within the power budget. Synthesis tools provide an initial estimate of the power consumption of the design based on the RTL description and the characterized standard cell libraries. These tools also allow the designer to explore the benefits of different low power features, such as sub-threshold design, multi-threshold design, voltage scaling, power gating, and clock gating, to reduce the power consumption further. With the introduction of a new test methodology for sub-threshold design [19], these low power features become feasible in battery-less SoC products. The designer can also build a digital power manager to control the different low power features depending on the current harvesting environment. The digital power manager can combine insights from the energy harvesting unit with models of the harvesting profile of the system's environment to anticipate changes to the harvesting conditions and react accordingly.
When the harvesting conditions are poor, the battery-less system can lose power completely. To aid in the recovery of critical data, designers can choose to implement a non-volatile processor (NVP) with non-volatile memory (NVM) or a processor with off-chip NVM for backup and recovery. Recently, non-volatile processing research has resulted in lower power NVPs designed for use in ambient harvesting. These processors offer complete recovery from loss of power by saving the state of the entire processor before power loss [20]. The trade-off between an NVP and a low power processor with partial backup NVM [6] is a function of the active and leakage power of each processor, the backup and recovery costs, and the probability of failure for a given harvesting mechanism. NVPs generally have a higher active power than low power processors with off-chip NVM, but shorter backup and recovery times. NVPs can also be limited by the manufacturing technology, since not all process technologies support non-volatile elements. NVPs also require more design time compared to the off-chip NVM approach, especially when a commercial NVM is used. However, designers can implement low power NVM features to reduce their power consumption compared to commercial NVMs. To assist in the design choice, Equation (4) breaks down the power budget of the system into the active (PACT), leakage (PLEAK), backup (PBACKUP), and recovery power (PRECOVERY). The active power is only consumed during the active duty cycle (DACT), which is impacted by the sampling scheme. Backup and recovery power are only incurred when the system loses power. The probability of power failure (PrFAIL) can be derived from the energy harvesting profile described in Section 4. The NVP usually has higher PACT and PLEAK than low power processors with off-chip NVM that are completely shut off in normal mode. However, the backup and recovery times of the latter might exceed those of the NVP due to the serial interface required to move data from the processor to the off-chip NVM. Thus, the designer can use Equation (4) to compare the two architectures and choose the optimal solution for the target application:

PSYS = DACT × PACT + PLEAK + PrFAIL × (PBACKUP + PRECOVERY),(4)
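The Equation (4) comparison can be scripted as below; every power number, duty cycle, and failure probability is an assumed placeholder chosen only to show the trade-off, not a measured value for any real processor:

```python
# Illustrative Equation (4) comparison of a non-volatile processor (NVP)
# against a low power processor with off-chip NVM backup. Every number
# below is an assumed placeholder, not a measured value from the text.

def avg_power(p_act, p_leak, p_backup, p_recovery, d_act, pr_fail):
    """Average system power: duty-cycled active power, always-on leakage,
    and backup/recovery power weighted by the probability of power failure."""
    return d_act * p_act + p_leak + pr_fail * (p_backup + p_recovery)

# NVP: higher active/leakage power, cheap backup and recovery (assumed).
nvp = avg_power(p_act=5e-6, p_leak=100e-9, p_backup=1e-9, p_recovery=1e-9,
                d_act=0.01, pr_fail=0.1)

# Off-chip NVM: lower active/leakage, costlier serial backup (assumed).
ext = avg_power(p_act=2e-6, p_leak=20e-9, p_backup=50e-9, p_recovery=50e-9,
                d_act=0.01, pr_fail=0.1)

print(f"NVP: {nvp * 1e9:.1f} nW, off-chip NVM: {ext * 1e9:.1f} nW")
```

Re-running the comparison across the failure probabilities predicted by the harvesting profile shows where the crossover lies: as PrFAIL grows, the NVP's cheaper backup and recovery increasingly offset its higher active and leakage power.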
5.3. Energy Harvesting Power Management Unit
The application's environment dictates the type of harvester to use and the energy harvesting unit to implement. To harvest energy from heat or temperature fluctuations, a thermoelectric generator (TEG) or pyroelectric device [21] can be used. Photovoltaic (PV) cells produce energy from light. Piezoelectric harvesters or triboelectric nanogenerators [22] produce energy from vibrations. Each of these harvesters imposes different constraints on the energy harvesting unit. A TEG produces very low DC voltages, as low as 10 mV. A PV cell also produces a DC voltage output but in a higher range. A piezoelectric harvester, on the other hand, produces an AC voltage that must be converted to DC before it can be stored.
A number of energy harvesting units that extract power from these harvesters have been presented in the literature [4,5,6,7,23]. Most energy harvesting units have a Maximum Power Point Tracking (MPPT) circuit designed to extract the maximum power from the harvester and deliver it to the system. The maximum power point of a TEG occurs at 50% of its open circuit voltage, whereas that of a PV cell occurs at 76% of its open circuit voltage [5]. In addition to the MPPT circuit, the energy harvesting unit generally requires a boost converter with an off-chip inductor to boost the input voltage with high efficiency. This is especially true for TEG harvesting, since the input voltage could be as low as 10 mV. A PV cell, on the other hand, produces a much higher input voltage (starting from ~600 mV) and thus could benefit from a switched capacitor voltage doubler circuit [24] to boost its input voltage without the need for an off-chip inductor.
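The fractional open-circuit-voltage rules above translate directly into the target voltage an MPPT circuit should track; the open circuit voltages in the example below are assumed values:

```python
# Fractional open-circuit-voltage MPPT rule from the text: a TEG's maximum
# power point sits at 50% of its open circuit voltage, a PV cell's at 76%.
# The example open circuit voltages are assumed for illustration.

MPP_FRACTION = {"teg": 0.50, "pv": 0.76}

def mpp_voltage(harvester, v_oc):
    """Target operating voltage the MPPT circuit should hold the harvester at."""
    return MPP_FRACTION[harvester] * v_oc

print(f"TEG at V_OC = 50 mV  -> track {mpp_voltage('teg', 0.050) * 1e3:.0f} mV")
print(f"PV  at V_OC = 600 mV -> track {mpp_voltage('pv', 0.600) * 1e3:.0f} mV")
```

In practice the MPPT circuit periodically samples the open circuit voltage and regulates the harvester's operating point to this fraction of it.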
Due to the low input voltage of the TEG harvester, a cold start circuit is required to kickstart the system. Some systems rely on Radio Frequency (RF) power [4], and thus use an RF harvester as a cold start circuit that rectifies the input and stores the extracted energy on the super-capacitor. Another approach relies on a Ring Oscillator (RO) with a clock doubler circuit that can start the system at an input voltage of 220 mV [23]. The choice of cold start circuit depends on the environment in which the sensor is deployed. If an RF transmitter is unavailable to kickstart the system, the RO cold start circuit can be used instead.
In addition to the energy harvesting unit, regulation circuits are needed to produce a stable supply for the rest of the system. The designers must carefully choose the supply voltages and output powers of each of the rails produced by the regulation unit. The sensing interface required by the application determines the characteristics of at least one of the rails. If an off-chip sensor with a digital interface is to be used with the system, its minimum supply voltage and expected current draw must be taken into account in the design of the regulation circuit. Next, the power analysis from the synthesis tools (refer to Section 5.2) can be used to determine the supply voltage and expected current drawn from the digital power rail. Once the characteristics of the power rails have been determined, the designer can explore different regulation schemes to determine the scheme that has the highest power conversion efficiency for the target load. Here, the designer must investigate the available power from the harvesting source (information from the harvesting profile can be used here) and the power consumption of the load circuits (the rest of the SoC and any off-chip components). These two numbers impose a minimum power conversion efficiency that the power management unit must meet in order for the system to operate reliably. Another factor that could play a role in the choice of regulation scheme is the form factor of the system. A single-inductor multiple-output (SIMO) regulator could be used to produce the different power rails of the system at the cost of an additional inductor in the system [23]. To eliminate the extra inductor, the authors in reference [25] relied on a switched capacitor regulator to produce the different power rails, while the authors in reference [24] developed a scheme to share an inductor between the regulation and harvesting units. Choosing the optimal harvesting unit and regulators for a system's application ensures that the system takes full advantage of the available ambient sources to improve its lifetime and reliability.