Open Access Communication

Peer-Review Record

Real-Time Vehicle Sound Detection System Based on Depthwise Separable Convolution Neural Network and Spectrogram Augmentation

Remote Sens. 2022, 14(19), 4848; https://doi.org/10.3390/rs14194848

by Chaoyi Wang^*,†, Yaozhe Song^†, Haolong Liu^†, Huawei Liu^†, Jianpo Liu^†, Baoqing Li^† and Xiaobing Yuan^†

Reviewer 1: Christine Dewi

Reviewer 2: Anonymous

Submission received: 24 August 2022 / Revised: 21 September 2022 / Accepted: 26 September 2022 / Published: 28 September 2022

(This article belongs to the Special Issue Advanced Machine Learning and Deep Learning Approaches for Remote Sensing)

Round 1

Reviewer 1 Report

Real-time Vehicle Sound Detection System Based on Depthwise Separable Convolution Neural Network and Spectrogram Augmentation

This paper proposes a light-weighted model combined with data augmentation for vehicle detection in an intelligent sensor system.

- In my opinion, this paper lacks contribution and novelty.

- In academic work, comparing the obtained results to some related/recently published works under the same conditions (i.e., databases + protocols of evaluation) is necessary. The objective is to show the superiority of the presented work against the existing ones.

- The author needs to explain more about the proposed method.

- Some of the tables and figures are not explained properly.

- The quality of the figure is also low, too small, and not clear.

- Future work needs to be discussed in the conclusion.

- What are the benefits and limitations of your proposed method?

- What is the real implementation of this research?

- Add more up-to-date references (2021-2022).

- The explanation of Figures 2 to 5 is insufficient.

Author Response

Dear reviewer,

Thank you for your valuable and constructive suggestions.

The following are my responses to your comments.

Point 1:

- In my opinion, this paper lacks contribution and novelty.

Response 1:

This paper aims to solve a practical problem and provides a feasible plan, from the hardware structure to the implementation of the proposed algorithm, which makes it valuable as a practical case. We previously used a traditional signal processing method for vehicle detection in this project, and this paper proposes a more up-to-date solution. The paper shows that the solution is not only sound at a theoretical level but also feasible, and that it performs well in both accuracy and computational cost.

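- In academic work, comparing the obtained results to some related/recently published works under the same conditions (i.e., databases + protocols of evaluation) is necessary. The objective is to show the superiority of the presented work against the existing ones.

Response 2: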

Our existing solution is described in “Guo, F.; Huang, J.; Zhang, X.; Cheng, Y.; Liu, H.; Li, B. A two-stage detection method for moving targets in the wild based on microphone array. IEEE Sensors Journal 2015, 15, 5795-5803.”

In the revised manuscript, this method is described in Section 2.6, and its results are shown for comparison in the Results section.

- The author needs to explain more about the proposed method.

Response 3:

Thank you for the suggestion. I expanded the explanation of the proposed method as follows. In Section 2 (Materials and Methods), I added diagrams of the system hardware and the system circuit layout in Section 2.1 to explain the hardware structure of the system. In Section 2.2, I describe the composition of the dataset and the recording scene in more detail. In Section 2.3, I added a description of the feature extraction process, with a diagram. In Section 2.4, I mathematically formalized the data augmentation method for a clearer description. In Section 2.7, I added more explanation of the convolutional neural network parameters. In Section 3.1, I added results on the models' ability to detect different types of vehicles, and I added a new Section 3.2, Complexity Calculation, to describe the computational cost.

- Some of the tables and figures are not explained properly.

Response 4:

In the revised version, the tables and figures, together with their descriptions, have been checked to make sure they are explained properly.

- The quality of the figure is also low, too small, and not clear.

Response 5:

In the revised version, all the figures are saved at 1200 dpi.

- Future work needs to be discussed in the conclusion.

Response 6:

I added “In the future, we intend to discover some practical signal processing methods, including filtering and deep learning based signal denoising methods, to make the system more robust to wind noise and enhance the SNR.”

- What are the benefits and limitations of your proposed method?

Response 7:

Benefits: High accuracy compared with the previous method and low computational cost.

Limitations: Accuracy degrades when the sensors are placed too far from the road, so experiments to determine the critical distance will be conducted. The detection result is also affected when the SNR is low, especially for small wheeled vehicles.

- What is the real implementation of this research?

Response 8:

The real implementation has been added and is described in Section 2.1.

- Add more up-to-date references (2021-2022).

Response 9:

In the revised version, 13 additional references have been added. Four of them are recent references:

Allegro, G.; Fascista, A.; Coluccia, A. Acoustic Dual-Function Communication and Echo-Location in Inaudible Band. Sensors 2022, 22, 1284.

Dawton, B.; Ishida, S.; Arakawa, Y. C-AVDI: Compressive Measurement-Based Acoustic Vehicle Detection and Identification. IEEE Access 2021, 9, 159457-159474.

Wang, X. Vehicle Image Detection Method Using Deep Learning in UAV Video. Computational Intelligence and Neuroscience 2022, 2022.

Kumari, S.; Agrawal, D. A Review on Video Based Vehicle Detection and Tracking using Image Processing. Journal homepage: www.ijrpr.com, ISSN 2582-7421.

- The explanation of Figures 2 to 5 is insufficient.

Response 10:

In the revised manuscript, additional descriptions are added for each figure.

Reviewer 2 Report

1) In the Introduction, the authors should provide a more complete overview of the audio techniques that have been proposed in the literature to locate/classify targets. Before delving into the specific class of deep neural networks at line 25, I suggest adding a preliminary discussion where some pointers to different techniques used in the literature to solve similar problems (e.g., object detection) are given. For instance, to familiarize the reader with the topic, the following interesting and quite recent references, found through a quick search on Google Scholar, can be added:

- "Acoustic Dual-Function Communication and Echo-Location in Inaudible Band", Sensors, 2022;

- "C-AVDI: Compressive Measurement-Based Acoustic Vehicle Detection and Identification", IEEE Access, 2021.

2) Page 2, line 50: "augmentaion" should be "augmentation".

3) To improve the clarity, the different steps performed for data augmentation in Sec. 2.3 should be mathematically formalized (as it was done for Sec. 2.4).

4) Page 3, line 115, "Depthwise separable convolutions is..." should be "Depthwise separable convolutions are...".

5) For completeness, the results in Sec. 3 should be enriched with a complexity analysis showing quantitatively (in terms of the reduction provided in eq. (4)) to which extent the adopted depthwise separable convolution actually reduces the computational costs.

Author Response

Dear reviewer,

Thank you for your valuable and constructive suggestions.

The following are my responses to your comments.

- "Acoustic Dual-Function Communication and Echo-Location in Inaudible Band", Sensors, 2022;

- "C-AVDI: Compressive Measurement-Based Acoustic Vehicle Detection and Identification", IEEE Access, 2021.

Response 1:

To give a more complete overview, I added 13 references, including the two above. The Introduction now opens with the demand for vehicle detection, mentions frequently used object detection techniques such as image-based detection, and then explains why we chose acoustic sensors for detection.

Page 2, line 50: "augmentaion" should be "augmentation".

Response 2:

Thank you for the correction; the error has been fixed.

To improve the clarity, the different steps performed for data augmentation in Sec. 2.3 should be mathematically formalized (as it was done for Sec. 2.4).

Response 3:

The steps have been mathematically formalized as you suggested.

Page 3, line 115, "Depthwise separable convolutions is..." should be "Depthwise separable convolutions are...".

Response 4:

Thank you for the correction; the error has been fixed.

For completeness, the results in Sec. 3 should be enriched with a complexity analysis showing quantitatively (in terms of the reduction provided in eq. (4)) to which extent the adopted depthwise separable convolution actually reduces the computational costs.

Response 5:

I added a new Section 3.2, Complexity Calculation, to describe the computational cost; the results are listed in tables in that section.
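For readers without the manuscript at hand, the kind of reduction the reviewer asks about can be illustrated with the widely used multiplication counts for a standard convolution versus a depthwise separable one. The sketch below is not taken from the paper: eq. (4) is not reproduced in this record, so the formulas are the commonly cited ones (ratio 1/N + 1/K^2), and the function names and layer dimensions are hypothetical choices for illustration.

```python
def standard_conv_mults(k, m, n, f):
    """Multiplications for a standard k x k convolution.

    k: kernel side, m: input channels, n: output channels,
    f: side of the (square) output feature map.
    """
    return k * k * m * n * f * f


def depthwise_separable_mults(k, m, n, f):
    """Multiplications for a depthwise k x k step plus a 1 x 1 pointwise step."""
    depthwise = k * k * m * f * f  # one k x k filter per input channel
    pointwise = m * n * f * f      # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


def cost_ratio(k, m, n, f):
    """Separable cost divided by standard cost; equals 1/n + 1/k**2."""
    return depthwise_separable_mults(k, m, n, f) / standard_conv_mults(k, m, n, f)


if __name__ == "__main__":
    # Hypothetical layer dimensions, not values from the manuscript.
    k, m, n, f = 3, 32, 64, 40
    print(f"standard:  {standard_conv_mults(k, m, n, f)}")
    print(f"separable: {depthwise_separable_mults(k, m, n, f)}")
    print(f"ratio:     {cost_ratio(k, m, n, f):.4f}")
```

Under these assumptions the ratio depends only on the kernel size and the number of output channels, so a 3 x 3 kernel with many output channels costs roughly one eighth to one ninth of the standard convolution.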
