1. Introduction
Cardiovascular diseases (CVDs), including myocardial infarction, cardiac arrhythmia, cardiomyopathy, and myocarditis, account for 31% of global mortality, as reported by the World Health Organization (WHO) [1, 2]. Among the various chronic CVDs, cardiac arrhythmia stands out as the most prevalent, resulting in the highest number of fatalities attributed to cardiac arrests. One of the most effective tools for diagnosing heart diseases is the electrocardiogram (ECG), a noninvasive technique that records fluctuations in the heart’s bioelectrical activity.
However, existing wearable ECG devices rely heavily on cloud platforms for disease diagnosis, which requires transmitting raw ECG data from the device to a remote platform via mobile phones or the Internet [3]. This causes high power consumption and delayed diagnosis, hindering prompt treatment. Consequently, there is an urgent need for wearable healthcare devices that can provide high-precision medical diagnoses on the device itself.
The classification of ECG signals typically involves a combination of pre-processing, feature extraction, and classification algorithms [4]. The raw ECG signal may be contaminated by various sources of noise. Therefore, pre-processing techniques such as hardware digital filters [5, 6, 7] are used to denoise the ECG. In terms of classification, conventional machine learning methods such as Support Vector Machine (SVM) [8, 9, 10] and K-Nearest Neighbors (KNN) [11, 12] classify the time- and frequency-domain features that the feature extraction step derives from the input signals. However, the classification accuracy relies largely on the quality of the manually selected features, which is limited by the experience of the designer. Furthermore, feature engineering requires significant design effort and makes it harder to reduce resource consumption in hardware implementations.
Unlike conventional machine learning methods, neural networks, specifically convolutional neural networks (CNNs) [13, 14, 15, 16], do not require the design of complex feature extraction algorithms, as they automatically extract high-level features from the input and perform classification simultaneously. As a result, CNN-based methods have demonstrated strong classification and prediction capabilities across a wide range of databases.
Kiranyaz Serkan et al. [17] proposed an adaptive implementation of 1-D CNNs for real-time classification of patient-specific ECG signals. By utilizing 1-D CNNs, the two main components of traditional ECG classification, feature extraction and classification, are fused into a single step, removing the need for manually crafted features. Tesfai Huruy et al. [18] presented a lightweight 1-D CNN that applies channel shuffle over group and depth-wise convolutions and takes 2-s ECG signal segments as input. The proposed model outperformed traditional CNNs with 9× fewer trainable parameters and improved the F1-score by 2%.
For ECG classification using 2-D CNNs, a 2-D transformation must be applied to make the time series suitable for the image-like 2-D input. Amin Ullah et al. [19] transformed the 1-D ECG recordings into 2-D gray-scale images with a size of 512 × 512, which were then fed to a 2-D CNN for the detection of five types of arrhythmias. Gyu-Ho Choi et al. [20] proposed a user recognition system that performs 2-D feature extraction by converting the P waves, QRS complexes, and T waves of a single period of a 1-D ECG signal into a spectrogram, resizing the spectrogram into a 2-D image using bi-cubic interpolation, and then applying classification for user recognition. Wenhan Liu et al. [21] proposed a multilead-CNN model that includes sub 2-D convolutional layers and lead asymmetric pooling layers and takes multilead ECG as input to detect myocardial infarction.
In general, 1-D CNNs are often preferred over 2-D CNNs for real-time ECG classification due to their simpler operation, higher computation speed, and fewer learnable parameters. These properties also make 1-D CNNs easier to implement in hardware.
However, an end-to-end neural network incurs high computational complexity because its multi-layered architecture contains a large number of neurons (computing elements), resulting in significant latency and energy consumption. Hence, application-specific hardware optimization techniques are required for ECG classification. Additionally, in contrast to 2-D CNNs, the convolutional kernels in 1-D CNNs only need to slide in one direction along the input features. Consequently, existing 2-D CNN accelerators are not as effective when running 1-D CNNs.
Furthermore, the reduced control complexity presents an opportunity to explore more efficient data reuse and computational parallelism in 1-D CNN accelerator hardware design. Currently, there are ECG classification processors based on 1-D CNNs. However, these architectures are specifically tailored to a particular 1-D CNN structure, limiting their flexibility and hindering subsequent algorithm upgrades.
Hence, this work focuses on building a 1-D CNN accelerator hardware architecture for arrhythmia detection and quick diagnosis while maintaining enough flexibility to support a broader range of applications. This work aims to optimize the hardware in three key aspects, namely computation time, memory access time, and memory footprint, to achieve a better trade-off between area, accuracy, energy, latency, and computational complexity.
The contributions of this work are as follows:
An instruction-driven 1-D CNN accelerator that allows flexible configuration of the CNN architecture and eliminates intermediate data transfer through memory between the convolution layer and the activation layer, as well as between the activation layer and the pooling layer.
A processing element (PE) array with a 1-D arrangement that exploits inter-kernel and intra-kernel parallelism and data reuse in a row-stationary (RS) dataflow.
A CORDIC-based module, using the FP16 data format, that computes the Tanh and Sigmoid activation functions.
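To illustrate how a CORDIC-style unit can produce both activation functions, the following is a minimal Python sketch of hyperbolic CORDIC in rotation mode. It is not the circuit proposed in this work: it uses Python floats rather than FP16, a plain division for sinh/cosh, and argument halving with the double-angle identity for range extension. All function names and the iteration count are assumptions made for illustration only.

```python
import math

def _cordic_tanh_core(z, n_iter=16):
    """Hyperbolic CORDIC (rotation mode); converges for |z| up to about 1.118."""
    # Shift indices start at 1; indices 4, 13, 40, ... are repeated to guarantee convergence.
    indices, i, repeat = [], 1, 4
    while len(indices) < n_iter:
        indices.append(i)
        if i == repeat and len(indices) < n_iter:
            indices.append(i)
            repeat = 3 * repeat + 1
        i += 1
    x, y = 1.0, 0.0  # x tracks cosh(z) and y tracks sinh(z), both scaled by the same gain
    for i in indices:
        d = 1.0 if z >= 0 else -1.0
        x, y = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i  # shift-and-add micro-rotation
        z -= d * math.atanh(2.0 ** -i)                        # table lookup in hardware
    return y / x  # the CORDIC gain cancels in the ratio sinh/cosh

def cordic_tanh(x, n_iter=16):
    # Assumed range extension: halve the argument until it is inside the
    # convergence range, then undo with tanh(2a) = 2*tanh(a) / (1 + tanh(a)^2).
    halvings = 0
    while abs(x) > 1.0:
        x *= 0.5
        halvings += 1
    t = _cordic_tanh_core(x, n_iter)
    for _ in range(halvings):
        t = 2.0 * t / (1.0 + t * t)
    return t

def cordic_sigmoid(x, n_iter=16):
    # Identity: sigmoid(x) = 0.5 * (1 + tanh(x / 2)), so one tanh datapath serves both functions.
    return 0.5 * (1.0 + cordic_tanh(0.5 * x, n_iter))
```

Because Sigmoid is derived from Tanh through a shift and an add, a single CORDIC datapath can serve both activations, which is consistent with the low hardware complexity that motivates this kind of module.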
The rest of this paper is organized as follows. Section 2 presents an overview of the characteristics of ECG signals, provides essential information about the dataset used, and introduces the background knowledge on CNNs. Section 3 presents the overall architecture of the processor, including the detailed architecture and design considerations of each module. Section 4 shows the results of algorithm testing and chip measurements, along with a comparison to previous works. Finally, Section 5 concludes the study.
2. Background
2.1. ECG Signals and Database
ECG is a widely used diagnostic tool in the field of cardiology, allowing healthcare professionals to assess the electrical activity of the heart. The ECG waveform consists of recognizable components, as shown in Figure 1, including the P wave, QRS complex, and T wave, each representing a specific phase of the cardiac cycle. Changes in the amplitude, duration, or morphology of these components can indicate underlying cardiac abnormalities.
The original ECG signals used in this paper are provided by the MIT-BIH database [22]. This database contains 48 records, each approximately 30 min long, from 47 different patients, sampled at 360 Hz with 11-bit resolution. Each record comprises two ECG leads. The primary lead is a modified lead II, which serves as the experimental data in this paper. The secondary lead is typically a modified lead V1, but in some cases it may be V2, V4, or V5.
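As a rough sense of scale for the raw data that a cloud-based device would have to transmit, the short sketch below works out the number of samples and the raw size of one lead of one record. The sampling rate and resolution come from the text; treating the record length as exactly 30 min and assuming tightly packed 11-bit samples are simplifications, and the function names are hypothetical.

```python
SAMPLING_RATE_HZ = 360   # from the text
RESOLUTION_BITS = 11     # from the text
RECORD_MINUTES = 30      # "approximately 30 min" in the text, taken as exact here

def samples_per_lead(minutes=RECORD_MINUTES, rate_hz=SAMPLING_RATE_HZ):
    """Number of samples in one lead of one record."""
    return minutes * 60 * rate_hz

def raw_bytes_per_lead(minutes=RECORD_MINUTES):
    """Raw size in bytes, assuming packed 11-bit samples with no header or padding."""
    return samples_per_lead(minutes) * RESOLUTION_BITS / 8

print(samples_per_lead())    # 648000
print(raw_bytes_per_lead())  # 891000.0
```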
According to the standard developed by the Association for the Advancement of Medical Instrumentation (AAMI) [23], the heartbeat types that exist in the MIT-BIH database are grouped into five different classes: Normal (N), Supraventricular ectopic beat (SVEB), Ventricular ectopic beat (VEB), Fusion beat (F), and Unknown beat (Q). Together, these classes comprise 17 subcategories, as shown in Table 1.
In this paper, we developed two 1-D CNN models and the corresponding instructions, one for two-class (i.e., normal/abnormal) classification and one for five-class classification following the AAMI standard, to demonstrate the flexibility of the processor.
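The relation between the two labeling schemes can be sketched as follows. The five class names are those listed above; the rule that only N counts as normal and every other class counts as abnormal is an assumption made for illustration, as are the function names and the integer encodings.

```python
# The five AAMI classes named in the text, in the order listed there.
AAMI_CLASSES = ("N", "SVEB", "VEB", "F", "Q")

def to_five_class_index(label):
    """Integer target for the five-class model (hypothetical encoding)."""
    return AAMI_CLASSES.index(label)

def to_two_class_index(label):
    """Integer target for the two-class model.

    Assumption: only N is treated as normal (0); all other classes are abnormal (1).
    """
    if label not in AAMI_CLASSES:
        raise ValueError(f"unknown AAMI class: {label!r}")
    return 0 if label == "N" else 1

print([to_two_class_index(c) for c in AAMI_CLASSES])  # [0, 1, 1, 1, 1]
```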
2.2. Convolutional Neural Networks
An artificial neural network is a computational model inspired by the structure and functioning of the human brain. It comprises interconnected artificial neurons, analogous to the interconnected nerve cells in the human nervous system that communicate through synapses. Within the realm of artificial neural networks, CNNs are a specific type that has demonstrated remarkable effectiveness in various tasks related to image and signal processing.
CNNs generally consist of four types of layers: convolution layer, pooling layer, activation layer, and fully connected (FC) layer. Among them, the convolution layer and the FC layer have the highest computational complexity. Qiu et al. [24] analyzed the computational and memory requirements of convolution and FC layers. Their findings revealed that convolution layers are computation-intensive, while FC layers place a significant demand on memory resources.
2.3. Running Example: 1-D Convolution Layer Optimization Overview
Figure 2 presents the code implementation of a 1-D convolution layer, which employs four nested loops. In 1-D convolution, the filter moves in one direction only, and both the input feature map and the convolution operation are one-dimensional. The outermost loop iterates through all the output feature maps (fmap_out[]). Each output feature map is computed as the sum of the convolutions between each input feature map and its corresponding convolution kernel. Depending on the size of each layer, evaluating the four nested loops can be computationally expensive. Consequently, parallelizing the computation of the output feature maps is crucial to reduce the computational burden.
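The four-loop structure described above can be sketched in Python as follows. This is a minimal illustration and not a reproduction of Figure 2: it assumes stride 1 by default, no padding, and no bias, and every name other than fmap_out is hypothetical.

```python
def conv1d_layer(fmap_in, weights, stride=1):
    """Naive 1-D convolution layer.

    fmap_in : list of input feature maps, each a list of samples
    weights : weights[m][c] is the kernel linking input map c to output map m
    """
    n_out, n_in = len(weights), len(fmap_in)
    k_len = len(weights[0][0])
    out_len = (len(fmap_in[0]) - k_len) // stride + 1
    fmap_out = [[0.0] * out_len for _ in range(n_out)]
    for m in range(n_out):                # loop 1: output feature maps
        for o in range(out_len):          # loop 2: positions along the output map
            for c in range(n_in):         # loop 3: input feature maps
                for k in range(k_len):    # loop 4: kernel taps
                    fmap_out[m][o] += weights[m][c][k] * fmap_in[c][o * stride + k]
    return fmap_out

fmap_in = [[1, 2, 3, 4], [0, 1, 0, 1]]   # two input maps of length 4
weights = [[[1, 0, -1], [1, 1, 1]]]      # one output map, kernel length 3
print(conv1d_layer(fmap_in, weights))    # [[-1.0, 0.0]]
```

The number of multiply-accumulate operations is the product of the four loop bounds (here 1 × 2 × 2 × 3 = 12), which is why the cost grows quickly with layer size.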
Parallelization can be achieved in two ways: by mapping the partitions generated by the outermost four loops to different processing elements (PEs), or by assigning the convolution of each partition (the innermost four loops) to different PEs. An essential design decision is the choice of the values b1 … b4, which determines the data dependencies among the PEs, the data communication requirements, and the achievable level of parallelism. Different choices, made under storage and communication constraints, lead to different parallelization strategies. Three kinds of parallelism are commonly distinguished: inter-output, inter-kernel, and intra-kernel.
In this paper, we employ a MAC group design in which each MAC group consists of multiple MAC units that compute multiply-accumulate operations simultaneously, providing intra-kernel parallelism. Additionally, four MAC groups simultaneously compute different elements within the same output feature map, providing inter-kernel parallelism.
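The combination of the two kinds of parallelism can be modeled in software as below. This is a behavioral sketch only, not the hardware described in this work: it handles a single input channel, assumes each MAC group has as many MAC units as there are kernel taps, and counts one "cycle" per batch of four output elements. The function and variable names are hypothetical.

```python
def conv1d_mac_groups(signal, kernel, n_groups=4):
    """Behavioral model: n_groups MAC groups, each computing one output element per cycle."""
    out_len = len(signal) - len(kernel) + 1
    out = [0.0] * out_len
    cycles = 0
    for base in range(0, out_len, n_groups):
        # Inter-kernel parallelism: the groups work on adjacent output elements.
        for g in range(n_groups):
            o = base + g
            if o < out_len:
                # Intra-kernel parallelism: all taps of the kernel are multiplied
                # at once inside the group and then summed.
                out[o] = sum(w * signal[o + k] for k, w in enumerate(kernel))
        cycles += 1  # all groups finish together in this simplified model
    return out, cycles

out, cycles = conv1d_mac_groups([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, -1])
print(out, cycles)  # [-2, -2, -2, -2, -2, -2] 2
```

In this model, six output elements need two cycles with four groups, compared with six cycles for a single group.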
Furthermore, the parallelization approach determines the data access pattern, so different parallelization approaches result in different dataflows. To minimize memory access time, we adopt the row-stationary (RS) dataflow, which reuses all types of data (weights, activations, and partial sums) by leveraging on-chip data and weight buffers, together with direct data transfer between the convolution and activation layers and between the activation and pooling layers.
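The idea of passing results directly from convolution to activation to pooling, without writing intermediate feature maps back to memory, can be mimicked in software with a streaming pipeline. The sketch below is an analogy under stated assumptions, not the proposed datapath: it uses a single channel, Tanh as the activation, and non-overlapping max pooling with a window of 2. All names are hypothetical.

```python
import math

def conv_stream(signal, kernel):
    """Yield convolution outputs one at a time instead of storing the whole map."""
    for o in range(len(signal) - len(kernel) + 1):
        yield sum(w * signal[o + k] for k, w in enumerate(kernel))

def tanh_stream(values):
    """Apply the activation to each value as soon as it is produced."""
    for v in values:
        yield math.tanh(v)

def maxpool_stream(values, size=2):
    """Non-overlapping max pooling; only `size` values are ever held at once."""
    window = []
    for v in values:
        window.append(v)
        if len(window) == size:
            yield max(window)
            window = []
    # An incomplete trailing window is dropped (assumption).

def fused_layer(signal, kernel, pool=2):
    # The three stages are chained, so no full intermediate feature map is materialized.
    return list(maxpool_stream(tanh_stream(conv_stream(signal, kernel)), pool))

print(fused_layer([0, 1, 0, -1, 0, 1], [1, 1]))
```

Only the pooled outputs are stored at the end, which mirrors the goal of avoiding memory round trips between the convolution, activation, and pooling stages.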
5. Conclusions
This work focuses on designing a reconfigurable AI processor for ECG-based applications, aiming to achieve low power consumption and high accuracy. The proposed processor leverages several architectural and circuit techniques to optimize the trade-off between area, accuracy, energy, latency, and computational complexity. These techniques include an instruction-driven architecture that supports versatile 1-D CNN processing, a PE array design that considers both parallelism and data reuse to maximize computational throughput and reduce memory access time, and an activation unit based on the CORDIC algorithm that supports both Tanh and Sigmoid computations with low hardware complexity.
The processor is implemented using 110 nm CMOS process technology with FP16 data format for easy quantization. Its performance is demonstrated on two typical ECG classification applications, achieving 97.8% and 93.5% accuracy for two-class and five-class classification, respectively. The design is validated on the MIT-BIH database, but further research is needed to address real-world variations in patient data. Future work will focus on developing an adaptive learning engine to improve classification accuracy and personalize the system for individual users.
Despite the limitations of the current implementation, the underlying conceptual framework of the proposed AI processor remains valuable. Its flexibility allows the instructions to be adjusted to execute new CNN algorithms tailored to specific applications and datasets. This adaptability, coupled with the efficient instruction-driven architecture, makes the proposed AI processor a practical solution for real-time ECG classification on wearable devices and a basis for further advances in AI-powered healthcare technologies.