*Article* **The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition**

#### **Yu-Kai Lin 1, Mu-Chun Su 1,\* and Yi-Zeng Hsieh 2,3,4,\***


Received: 28 July 2020; Accepted: 21 August 2020; Published: 28 August 2020

**Abstract:** Neural networks have achieved great results in sound recognition, and many different kinds of acoustic features have been tried as training input for the network. However, it remains unclear whether a neural network can efficiently extract features from a raw audio signal. This study improved on the raw-signal-input networks of prior research by using deeper network architectures, which allowed the raw signals to be analyzed more effectively. We also discuss several network settings; with a spectrogram-like conversion, our network reached an accuracy of 73.55% on the open audio dataset "Dataset for Environmental Sound Classification 50" (ESC50). This study also proposed a network architecture that can combine different kinds of networks fed with different features. With the help of global pooling, a flexible fusion scheme was integrated into the network. Our experiment successfully combined two different networks with different audio feature inputs (a raw audio signal and the log-mel spectrum). With these settings, the proposed ParallelNet reached an accuracy of 81.55% on ESC50, which is comparable to the recognition level of human beings.

**Keywords:** deep neural network; convolutional neural network; environmental sound recognition; feature combination
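
The fusion idea summarized in the abstract (two branches with different input features, joined through global pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' ParallelNet: the feature-map shapes, the choice of average pooling, the single linear classifier, and all function names are assumptions made for illustration only. The point it shows is that global pooling reduces each branch to one value per channel, so branches whose feature maps differ in size can still be concatenated.

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse every non-channel axis: (channels, ...) -> (channels,)."""
    channels = feature_map.shape[0]
    return feature_map.reshape(channels, -1).mean(axis=1)

def fuse_branches(raw_features, logmel_features):
    """Pool each branch globally, then concatenate the channel vectors."""
    pooled_raw = global_average_pool(raw_features)
    pooled_mel = global_average_pool(logmel_features)
    return np.concatenate([pooled_raw, pooled_mel])

def classify(fused, weights, bias):
    """Linear layer followed by a numerically stable softmax."""
    logits = weights @ fused + bias
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()

rng = np.random.default_rng(0)

# Hypothetical branch outputs with deliberately mismatched shapes:
# a raw-signal branch and a log-mel branch (channels, height, width).
raw_features = rng.standard_normal((64, 40, 150))
logmel_features = rng.standard_normal((128, 8, 19))

fused = fuse_branches(raw_features, logmel_features)

# Untrained, randomly initialized classifier over the 50 ESC50 classes.
num_classes = 50
weights = rng.standard_normal((num_classes, fused.shape[0])) * 0.01
bias = np.zeros(num_classes)
probs = classify(fused, weights, bias)
```

Because pooling removes the spatial and temporal axes, the fused vector length depends only on the channel counts of the two branches (64 + 128 here), not on the input duration or the feature resolution of either branch.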
