Research on Approximate Computation of Signal Processing Algorithms for AIoT Processors Based on Deep Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper investigates the utilization of Artificial Intelligence of Things (AIoT) processors for approximate computing. The detailed comments are listed below.
- The literature review is incomplete. Recent works in the technical literature should be added to strengthen the introduction section.
- What are the contributions of this paper?
- The authors should explain if there are some technical difficulties about approximate computation of signal processing algorithms for AIoT processors based on deep learning.
- In practical applications, which of the four functions FFT, DCT, FIR and IIR performs the best?
- The authors need to check the manuscript to correct the writing issues and typos carefully.
Author Response
Comments 1: [The literature review is incomplete. Recent works in the technical literature should be added to strengthen the introduction section.]
Response 1: Dear reviewer: Thank you for this suggestion. New references published in 2024 and 2025 have been added: older references, including [6], [12], [13], [14], and [22], have been replaced, and new references [17] and [24] have been added.
Comments 2: [What are the contributions of this paper?]
Response 2: Dear reviewer: Thank you for pointing this out. The main contributions of this paper are as follows. A literature review (using "AIoT" and "approximate" as keywords) found that approximate computing is primarily achieved using ASICs, FPGAs, CPUs, and MCUs, with little to no reported work on implementing approximate computing on AIoT processors. At the same time, AIoT devices are widely employed yet constrained by limited computational capacity. We therefore hypothesized that the AI accelerators within AIoT processors could be used to perform high-performance approximate computing, matching or exceeding the computational efficacy of their MCUs, and set out to explore the feasibility and methodology of approximate signal processing on AIoT. In pursuit of this hypothesis, we conducted the following research activities:
(1) Within the framework of a Neural Architecture Search (NAS) algorithm, by defining the model and search strategy and establishing cost and evaluation functions, we identified the optimal network structure. This supports the deployment of approximate computing of signal processing functions on AI accelerators.
(2) Leveraging the hardware characteristics of the AI accelerator on the MAX78000 and incorporating quantization-aware training, we deployed the model onto the development board, achieving deep learning-based approximate computation of signal processing functions.
(3) Using the ARM Cortex-M4 processor and the CMSIS-DSP library on the MAX78000, we implemented the same signal processing functions, demonstrating that approximate computation on the AI accelerator can enhance the computational efficiency of functions such as FFT, DCT, and FIR.
Through this article, developers working with AIoT devices may consider AI accelerator-based approximate computation as a viable alternative when the MCU falls short of real-time requirements. The implementation process detailed herein also serves as a technical reference for such endeavors.
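To make the search procedure in contribution (1) concrete, the following minimal Python sketch shows how a weighted cost function and an accuracy constraint can drive the selection of a network structure. All names, weights, the 0.99 threshold, the toy search space, and the stand-in evaluator below are hypothetical illustrations, not the actual model, search strategy, or Equation (1) used in the paper.

```python
import itertools

ALPHA, BETA = 0.7, 0.3        # assumed tradeoff weights (not from the paper)
ACC_THRESHOLD = 0.99          # assumed minimum acceptable accuracy

def cost(accuracy, norm_ops):
    """Weighted cost: penalize error (alpha) and computational load (beta)."""
    return ALPHA * (1.0 - accuracy) + BETA * norm_ops

def search(candidates, evaluate):
    """Return the feasible candidate architecture with the lowest cost."""
    best, best_cost = None, float("inf")
    for arch in candidates:
        accuracy, norm_ops = evaluate(arch)
        if accuracy < ACC_THRESHOLD:      # reject insufficiently accurate networks
            continue
        c = cost(accuracy, norm_ops)
        if c < best_cost:
            best, best_cost = arch, c
    return best

# Toy usage: three hidden-layer widths, each capped at 64 neurons,
# echoing the per-layer constraint mentioned in the responses below.
space = [(a, b, c) for a, b, c in itertools.product([16, 32, 64], repeat=3)]
fake_eval = lambda arch: (0.995, sum(arch) / 192.0)  # stand-in evaluator
print(search(space, fake_eval))                      # -> (16, 16, 16)
```

With the stand-in evaluator, every candidate meets the accuracy constraint, so the search simply prefers the smallest network; a real evaluator would trade accuracy against size.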
Comments 3: [The authors should explain if there are some technical difficulties about approximate computation of signal processing algorithms for AIoT processors based on deep learning.]
Response 3: Dear reviewer: We consider the main technical difficulties of this work to be as follows:
(1) Unlike other AI accelerators, the AI accelerator on an AIoT processor has relatively low computational power. If the optimal network structure is slightly too large, it cannot be deployed on the AIoT processor; conversely, if it is too small, its performance may not surpass that of the processor's embedded DSP. Finding an appropriate network structure therefore requires iterative experimentation, together with a degree of experience and skill.
(2) Deploying the optimal network structure onto the development board requires quantization and retraining, which can degrade the performance of the initially obtained network. If the degradation is significant, an alternative structure must be selected from the best candidates and quantized and trained again; these steps are repeated until deployment succeeds.
(3) The DSP on AIoT devices has an excellent architecture and strong performance, and its built-in digital signal processing functions employ various optimization algorithms. It is therefore not easy for approximate computing on the AI accelerator to surpass its performance; as the experimental results in this article show, among the four functions, the IIR results are inferior to those of the DSP implementation.
Comments 4: [In practical applications, which of the four functions FFT, DCT, FIR and IIR performs the best?]
Response 4: Dear reviewer: In practical applications of signal processing, the requirements for the corresponding functions (such as accuracy, computational speed, and energy consumption) vary across scenarios. Considering speed alone, according to the conclusions of this article, the FFT approximate computation implemented on AIoT exhibits the largest speed improvement. It is therefore best suited to situations where real-time performance is of paramount importance.
Comments 5: [The authors need to check the manuscript to correct the writing issues and typos carefully.]
Response 5: Dear reviewer: Thank you for this suggestion. We have thoroughly proofread the grammar and spelling throughout the manuscript, and the relevant revisions have been highlighted in blue. We sincerely hope that the language quality has been significantly improved.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
While the topic of approximate computation is timely, the novelty of the approach is questionable. The NAS approach and AI accelerator deployment are relatively established methods, and the contributions made by this paper beyond the existing literature are not clearly highlighted.
1- Clarify what new knowledge this paper brings compared to other work in AIoT processing and approximate computing.
2- Include a more comprehensive set of performance metrics, especially those relevant to real-time applications in AIoT devices (e.g., energy efficiency, latency).
3- Include a comparison of the MAX78000 with other AIoT platforms in terms of both performance and usability.
4- Expand the discussion of the tradeoff between accuracy and computational efficiency and its potential impact on application-specific tasks.
5- Provide more detailed explanations of the results presented in figures and tables, particularly in terms of their implications for the field.
6- I couldn't find a comprehensive literature review of the previous related works, especially in approximate computing. Please provide more related references. You can start by using the following work:
[a] DOI: 10.1109/TFUZZ.2024.3371026
Author Response
Comments 1: [Clarify what new knowledge this paper brings compared to other work in AIoT processing and approximate computing.]
Response 1: Dear reviewer: Through a review of the relevant literature, it was found that approximate computing is primarily implemented using ASICs, FPGAs, CPUs, and MCUs, while almost no reports (using "AIoT" and "approximate" as keywords) were found on the implementation of approximate computing on AIoT devices. Additionally, although AIoT devices with limited computational power have been widely adopted, their capabilities often fall short of high-performance demands. Based on this observation, we proposed a hypothesis: can the AI accelerators on AIoT devices be utilized to achieve high-performance approximate computing, with performance matching or even surpassing that of their own MCUs? To explore this hypothesis, we conducted an in-depth investigation into the feasibility and implementation methods of approximate computing for signal processing on AIoT, and the results confirmed the viability of the hypothesis. This study therefore offers developers of AIoT devices with constrained computational resources a novel approach: when the MCU cannot meet real-time requirements, approximate computing based on AI accelerators may serve as an effective alternative. Moreover, the implementation process detailed in this paper provides practical insights for related technologies.
Comments 2: [Include a more comprehensive set of performance metrics, especially those relevant to real-time applications in AIoT devices (e.g., energy efficiency, latency).]
Response 2: Dear reviewer: For performance comparison, we used computation time as the primary metric. Since the comparison involved different computing units (DSP and AI accelerator), we also incorporated computational cost as a supplementary metric. However, for energy efficiency evaluation, most development board tools do not support measuring the energy efficiency of internal processor units (e.g., AI accelerator units, DSP units), and real-time energy data is typically not a standard feature of such tools (for example, the manual of the development board tool we used did not include methods for obtaining energy efficiency data). Therefore, energy-related constraints were not included in Equation (1) during the initial model design, and in the subsequent evaluation phase, due to the lack of effective measurement methods, energy efficiency could not be adequately assessed.
Comments 3: [Include a comparison of the MAX78000 with other AIoT platforms in terms of both performance and usability.]
Response 3: Dear reviewer: Thank you for this suggestion. We conducted a renewed literature review (using "AIoT" and "approximate" as keywords) but found no reports on the implementation of approximate computing using AI accelerators on AIoT platforms (although some IoT devices have achieved approximate computing by adding an FPGA). Furthermore, if we were to train and deploy the same network architecture on other AIoT platforms, the significance of the comparative results would be limited.
Comments 4: [Expand the discussion of the tradeoff between accuracy and computational efficiency and its potential impact on application-specific tasks.]
Response 4: Dear reviewer: Thank you for this suggestion. We have supplemented the relevant content in lines 162–168 on page 4 of the manuscript. The specific details can also be referenced here. [Adjusting the values of α and β can guide the search process toward different optimization objectives. A higher α value directs the search toward higher computational accuracy, while a higher β value prioritizes faster computation speeds. The threshold of computing accuracy also impacts computational precision and efficiency. By appropriately relaxing the threshold value according to the data precision requirements of different applications, both computational accuracy and efficiency can be improved.]
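As a toy illustration of this tradeoff (with entirely hypothetical error and latency numbers, not values from the paper or Equation (1)), the following Python snippet shows how shifting weight from α to β flips the selection between an accurate-but-slow candidate and a fast-but-less-accurate one:

```python
# Hypothetical weighted cost: alpha penalizes error, beta penalizes latency.
def cost(alpha, beta, err, latency):
    return alpha * err + beta * latency

# Candidate A: more accurate but slower; candidate B: faster but less accurate.
# These numbers are illustrative only.
A = dict(err=0.01, latency=0.8)
B = dict(err=0.20, latency=0.1)

for alpha, beta in [(0.9, 0.1), (0.1, 0.9)]:
    pick = "A" if cost(alpha, beta, **A) < cost(alpha, beta, **B) else "B"
    print(f"alpha={alpha}, beta={beta} -> choose {pick}")
```

With accuracy weighted heavily (α = 0.9), candidate A wins; with speed weighted heavily (β = 0.9), candidate B wins, mirroring how the weights steer the search described above.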
Comments 5: [Provide more detailed explanations of the results presented in figures and tables, particularly in terms of their implications for the field.]
Response 5: Dear reviewer: Thank you for this suggestion. As per your suggestion, we have added a discussion of the content in Table 1 on page 5, lines 187 to 199 of the manuscript. The specific details can also be referenced here. [As shown in Table 1, the network structures of these four functions all exhibit high computational accuracy. Among them, the FFT function achieves the highest accuracy, while the DCT function has the lowest. According to the definition of accuracy, the 96.9% accuracy of the DCT function implies that only 3.1% of the test data results are imprecise, with the judgment threshold for these imprecise data strictly set at 0.01. In terms of network structure, the FFT function has the largest structure size, with the third hidden layer containing 64 neurons. This is likely due to the inherent computational complexity of the FFT and the involvement of complex number operations during its computation. The other three network structures are relatively smaller, and even under the constraint of a maximum of 64 neurons per layer, they utilize fewer neurons in each layer.]
For Table 3, we conducted a further analysis of the runtime results by incorporating the network structure from Table 1. The specific modifications can be found in lines 232 to 242 on page 6. For more details, please refer to this section. [From the analysis of Table 3, it can be observed that FFT, due to its relatively complex network structure, reaches the maximum number of neurons (64) in its third layer, requiring more computational time and resources. In contrast, DCT demonstrates higher efficiency in terms of both computational load and time, indicating superior performance. Although the computation time for FIR and IIR is relatively long, it remains within an acceptable range. In the network structures of both FIR and IIR, the number of neurons in each layer is the same; however, there is a significant difference in computational load as shown in Table 3. This discrepancy may be attributed to the fact that IIR calculations involve feedback, resulting in lower utilization of the AI acceleration unit's computational modules, with the majority of resources being consumed by data transmission and waiting.]
For Table 4, further discussion has been added to the original text, specifically in lines 267 to 275 on page 7. For more details, please refer to this section. [A comprehensive analysis reveals that when using the AI accelerator for approximate computations, although the computational load increases for some functions due to network design and other factors, the computation time for FFT, DCT, and FIR functions is significantly reduced, especially for the computationally intensive FFT operations. This demonstrates that the computational efficiency of the AI acceleration unit under approximate computation is considerably improved compared to that of the MCU. When developing AIoT applications with high real-time requirements, approximate computation based on the AI accelerator can serve as a viable alternative for completing data-intensive operations with high real-time demands.]
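The accuracy figure discussed for Table 1 can be sketched in Python as follows. This assumes the metric is the fraction of outputs whose absolute deviation from the reference value is within the 0.01 threshold, which is our reading of the text; the paper's exact definition (e.g., relative vs. absolute error) may differ.

```python
def approx_accuracy(approx, reference, threshold=0.01):
    """Fraction of approximate outputs within `threshold` of the reference.

    An output counts as "precise" if |approx - reference| <= threshold;
    accuracy is the precise fraction over the whole test set.
    """
    hits = sum(abs(a - r) <= threshold for a, r in zip(approx, reference))
    return hits / len(reference)

# Toy example: one of four samples deviates by more than 0.01.
ref = [0.0, 1.0, 2.0, 3.0]
out = [0.005, 1.02, 2.0, 3.001]
print(approx_accuracy(out, ref))   # 3 of 4 within threshold -> 0.75
```

Under this reading, the 96.9% DCT accuracy in Table 1 would mean 3.1% of test outputs deviated from the reference by more than 0.01.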
Comments 6: [I couldn't find a comprehensive literature review of the previous related works, especially in approximate computing. Please provide more related references. You can start by using the following work.]
Response 6: Dear reviewer: Thank you for this suggestion. Based on your suggestions, we have added several new references, including the suggested article, to strengthen the paper. Specifically, we replaced the older references [6], [12], [13], [14], and [22], and added newer references [17] and [24].
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I have no further comments.
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for addressing my comments.
Comments on the Quality of English Language
Thank you for polishing the paper.