1. Introduction
Sight is an important tool for interaction between humans and their environment, but understanding the environment’s acoustic signals is also crucial for survival. Using acoustic signals, one can often anticipate a scene or event without actually observing it. With the advancement of processors and AI systems, the growing ability of machine learning (ML) algorithms to approximate human perception, and the emergence of the smart city concept, urban smart monitoring systems based on audio and video have become increasingly popular. In a smart city, the automatic processing of audio and video data from urban areas enables city authorities to respond to incidents and quickly improve service quality. In congested urban areas, image-based event detection systems have long been used to monitor vehicle traffic or automatically detect events. These systems have the advantage of never getting tired, rarely making mistakes, and providing comprehensive documentation of crimes and violations. They are also occasionally used in security systems. In several circumstances, however, cameras cannot cover a scene; adding an acoustic signal can then complement the monitoring system and enhance its accuracy and efficiency. So far, numerous methods have been proposed for processing and classifying urban events using the acoustic signal spectrogram, each with strengths and weaknesses. One aspect overlooked in spectrogram-based acoustic event detection (AED) systems is the analysis of acoustic events based on the similarities and differences of the spectrogram bands associated with each event. With such an analysis, the trained system can focus on the spectrogram bands that differ between events to enhance its efficiency.
Due to the lack of comprehensive research investigating the similarities and differences of spectrogram bands for different events, this article introduces a method to identify similar and dissimilar spectrogram bands for urban events from a mathematical and probabilistic perspective. The proposed method can also be used to select effective features in the design and implementation of ML algorithms. Its main advantages are as follows:
By analyzing the potential similarities and differences in the spectrogram bands of each urban event, one can focus on the similarities and dissimilarities within and outside the group of events, thus aiding in event classification.
Since the proposed method uses a probabilistic model and confidence interval to evaluate the similarity and difference of spectrogram bands across different events, there is no need for heuristic methods. Also, mathematical analysis is more reliable than heuristic methods in ensuring accurate results.
The proposed method can be used to identify irrelevant features in AED systems, and in cases where there is a high degree of similarity between two or more events, a secondary classifier can be designed using the outcomes of the proposed method to minimize errors.
The proposed method can be used to identify useless bands in acoustic-based systems that use the spectrogram as a feature. Normality should be checked in each case to ensure the accuracy of the result.
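The confidence-interval idea in the list above can be illustrated with a short sketch. It is not the method developed in Section 3. It assumes that each event is summarized by a list of energy samples per spectrogram band, that a large-sample (normal-quantile) interval for the difference of band means is acceptable, and that the Jarque-Bera statistic is an adequate stand-in for the normality check. The function names and the toy data are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist, mean, variance


def band_difference_ci(a, b, confidence=0.95):
    """Large-sample confidence interval for mean(a) - mean(b)."""
    # Standard error of the difference, without assuming equal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Normal quantile (assumption); a t quantile suits small samples better.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se


def dissimilar_bands(event_a, event_b, confidence=0.95):
    """Indices of bands whose confidence interval excludes zero.

    event_a[k] and event_b[k] hold the energies observed in band k.
    """
    selected = []
    for k, (a, b) in enumerate(zip(event_a, event_b)):
        low, high = band_difference_ci(a, b, confidence)
        if low > 0 or high < 0:
            selected.append(k)
    return selected


def jarque_bera_pvalue(x):
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(x)
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    return exp(-jb / 2)


# Toy data (hypothetical): band 0 is alike in both events, band 1 differs.
siren = [[1.0, 1.2, 0.9, 1.1, 1.0, 0.8], [5.0, 5.2, 4.9, 5.1, 5.3, 4.8]]
horn = [[1.1, 0.9, 1.0, 1.2, 0.9, 1.0], [2.0, 2.1, 1.9, 2.2, 1.8, 2.0]]

print(dissimilar_bands(siren, horn))  # [1]
print(jarque_bera_pvalue(siren[1]))   # small values suggest non-normal data
```

A band whose interval contains zero is treated here as similar between the two events. With only a handful of samples per band, as in the toy data, both the normal quantile and the asymptotic Jarque-Bera p-value are rough approximations.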
The article is structured as follows: Section 2 presents recent methods for AED and feature selection based on mathematical models. Section 3 describes the proposed method. Section 4 is devoted to the results obtained by the proposed method and its comparison against other existing methods. Finally, a summary of the study is presented in the conclusion.
2. Literature Review
In recent years, multiresolution analyses, such as spectrograms, mel frequency cepstral coefficients (MFCCs), and wavelets, have been widely used in signal analysis and AED because of their suitability for finding patterns in time-varying signals. Hajihashemi et al. [1,2] used MFCC and wavelets for sound analysis in AED and acoustic scene classification. The authors also used wavelet scattering as another spectral feature in [1]. Roy et al. [3] used the spectrogram as a time-frequency expression of arterial Doppler signals to predict blood clots and microemboli. Several features were extracted from the spectrogram, such as the root mean of the local power spectrum and the modal frequency. Ibs-von Seht [4] aimed to provide an overview of volcanic activity using the spectrogram of seismic signals. Hafez et al. [5] predicted the timing of an earthquake using the spectrogram of signals obtained from the ground. Broussard and Givens [6] analyzed the oscillations of the posterior parietal cortex in rats and the impact of different acoustic signals on them using the spectrogram.
Liu et al. [7] employed the spectrogram in conjunction with the Hilbert–Huang Transform (HHT) for sleep apnea detection. Dennis et al. [8] used spectrogram and Hough transform features to detect acoustic events. Towsey et al. [9] used features extracted from the spectrogram to estimate the number of birds in a natural environment. Vales et al. [10] predicted earthquakes by analyzing data collected from the spectrogram of low-frequency terms of terrestrial signals. Oliveira et al. [11] proposed an efficient method to detect bird activity using a spectrogram-based filter. The spectrogram separated the background sound from the bird’s voice in this method.
Ghosh et al. [12] applied the spectrogram and Wigner–Ville transform of the vibration signal for vehicle detection. Xie et al. [13] introduced an AED system that used features extracted from spectrograms. Additionally, Xie et al. [14] used spectrograms, linear predictive coding, and MFCCs to estimate the number of frogs based on ambient sound. Using ambient sound, Sánchez-Gendriz and Padovese [15] analyzed biological choruses. In this study, features were extracted using the spectrogram, and an effective graphical expression for biological choruses was presented using the amplitude of the spectrogram in some frequency bands. Zhao et al. [16] separated the sounds of different bird species using MFCC, spectrogram, and an autoregressive model. Shervegar et al. [17] proposed a phonocardiogram spectrogram-based system for heart disease classification.
Nobre et al. [18] measured the biological parameters of caged domestic animals using an electric field. This research relied on the spectrogram to determine the frequency characteristics of long recordings. Ye et al. [19] used a combination of local and global features, including spectrogram entropy, to detect urban events. Goenka et al. [20] proposed a method for detecting seizures using quantitative electroencephalogram spectrograms. Hoyos-Barceló et al. [21] used local features of the acoustic signal spectrogram to detect coughs. The proposed method was implemented in a smartphone application and showed promising results. Waldman et al. [22] detected high-frequency oscillations within the human skull using electroencephalographic (EEG) signal spectroscopy. Yan et al. [23] used spectrograms to diagnose seizures based on a convolutional neural network (CNN) classifier whose input was spectral images.
Oliveira et al. [24] relied on the capabilities of spectrograms and ML methods, such as neural networks (NN) and support vector machines (SVM), to classify EEG signals and diagnose epilepsy. In addition to the spectrogram, cross-correlation and discrete Fourier transform were used in this study. Zhang et al. [25] employed acoustic sensors and a phase-sensitive optical time-domain reflectometer to distinguish five different types of acoustic signals. The authors extracted features using the spectrogram. Sahai et al. [26] considered spectrogram-related features for music source separation. This method applied the spectrogram image as the input to the VGG network. Lin et al. [27] applied spectrogram features as the input to a deep neural network (DNN) and trained a semi-supervised CNN as an AED system in an urban area.
Spadini et al. [28] evaluated several acoustic features in detecting urban events, and the spectrogram was among them. Su et al. [29] used a two-stage CNN network to classify environmental acoustics considering features such as a log-mel spectrogram and MFCC-based features. Gloaguen et al. [30] proposed spectrogram features and non-negative matrix factorization to estimate road traffic levels. Satar et al. [31] proposed an AED method based on the spectrogram of data collected by the hydrophone. The continuous wavelet transform and spectrogram were used by Lapins et al. [32] to analyze seismic signals caused by volcanic activity. The audio spectrogram was among the acoustic features suggested by Vafeiadis et al. [33] for a smart home AED system. For environmental monitoring and counting of low-detectability species, Znidersic et al. [34] used the spectrogram of an acoustic signal. Robinet et al. [35] used the spectrogram to extract transient noise characteristics in gravitational wave detectors.
Azab and Khasawneh [36] used the spectrogram to detect malware files. Kacha et al. [37] used the spectrogram of voice signals to analyze and interpret the different states of dysarthric speech, a speech disorder related to muscle weakness. Zeng et al. [38] extracted the spectrogram of arm movements and used this feature to classify the movements.
Franzoni et al. [39] proposed an emotion recognition system using a human voice spectrogram and a CNN-based classifier. Sinha et al. [40] extracted the audio spectrogram and converted it to an image that was fed into a CNN for audio classification. Luz et al. [41] relied on different acoustic features, such as the spectrogram, to detect events in an urban space based on a CNN-based classifier. In analyzing the heart’s electrocardiogram (ECG) signals, Gupta et al. [42] used the spectrogram. Manhertz and Bereczky [43] used the short-time Fourier transform (STFT) spectrogram to analyze vibration in a rotating electric machine and identify faults in early stages. In a study by Lara et al. [44], seismic and volcanic events were detected using a spectrogram and deep learning. Pham et al. [45] used a spectrogram-based method to classify scenes based on a CNN-DNN architecture.
Liu et al. [46] combined convolutional recurrent neural networks and mel spectrogram, delta, and delta-delta features for underwater target recognition. Kadyan and Bawa [47] proposed a two-level augmentation scheme via the spectrogram of speech signals using transfer learning techniques for an automatic speech recognition system. Pahuja and Avijeet [48] proposed a bird sound-based recognition system for classifying eight species of birds using the STFT spectrogram for feature extraction. Zhang et al. [49] used a CNN with gradient-weighted class activation mapping and the mel spectrogram as a feature for acoustic scene classification. Cheng et al. [50] suggested a spectrogram-based sound recognition system using AlexNet, which identified passing vehicles with modified loud exhausts. Wang et al. [51] used the spectrogram of underwater signals for time-frequency tracking and enhancement of whistle signals, which are used in research on cetaceans. You et al. [52] used audio spectrogram transformers and a CNN to generate embeddings for the few-shot learning of bioacoustic AED. Bhangale et al. [53] combined mel spectrogram and other acoustic features and used them as input for a parallel emotion network for speech emotion recognition. Özseven [54] discussed the effectiveness of the spectrogram as a time-frequency domain image in urban sound classification. Latif et al. [55], Shafik et al. [56] and Mushtaq et al. [57] used the spectrogram as an effective acoustic feature in speech emotion recognition, speaker identification, and environmental sound classification, respectively. All of these approaches were based on deep learning.
The spectrogram has also been used in many medical applications, such as cough detection [58], detection of cardiovascular disease and epilepsy using ECG and EEG signals [24,59], and detection of sleep spindles [60] and scalp peak ripples [61] using EEG signals. It has also been used for sleep apnea–hypopnea syndrome diagnosis based on the nasal airflow signal [62] and in a voice-based snoring detection system [63]. In industrial applications, the spectrogram has been used for fault detection using vibration signals [64], fault detection in gearboxes based on sound [65], and fault detection in rotary systems using data from various sensors [66]. Verification of bird diversity [67] and the detection of gunshots in forests [68] using ambient sound, seismo-acoustic event prediction using vibration signals and ground waves [69], and an AED system [70] are other state-of-the-art applications that use the spectrogram as a feature extraction method.
In most of the studies mentioned above, the classifier employed was a deep learning (DL) network. The reviewed studies support the following observations:
In some methods, the spectrogram image was used as the input to a two-dimensional (2D) DNN;
In others, various features extracted from the spectrogram were used as the input;
In some cases, experts analyzed spectrogram images to distinguish between different conditions;
To the best of our knowledge, no quantitative methods have been proposed to determine which frequency bands of the spectrogram are most effective for the application under study.
In the current study, an efficient method based on statistical tests was developed to separate the useful bands of the spectrogram from the useless ones, regardless of the classifier used. The proposed method can be used to determine the similarities and differences in the frequency bands of the spectrogram across different classes. Using its results, noise is reduced by removing the frequency bands that are similar across classes from the spectrogram, which increases accuracy and learning speed. Therefore, in an AED system, the proposed method can provide insights into how the different events relate to the spectrogram frequency bands relative to the background and can increase the system’s accuracy and speed.
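The band-removal step can be sketched as follows. This is a hedged illustration only, not the procedure of Section 3: it assumes Welch's two-sample t statistic as the statistical test, a fixed hypothetical cut-off on its absolute value, and a spectrogram stored as a list of rows with one row per frequency band. All names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for two lists of band energies."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def useful_bands(classes, threshold=2.0):
    """Indices of bands that differ for at least one pair of classes.

    classes: dict mapping a class name to a list of per-band sample lists.
    threshold: hypothetical cut-off on |t|; in practice it would come from
    the t distribution and the chosen significance level.
    """
    n_bands = len(next(iter(classes.values())))
    keep = []
    for k in range(n_bands):
        pairs = combinations(classes.values(), 2)
        if any(abs(welch_t(x[k], y[k])) > threshold for x, y in pairs):
            keep.append(k)
    return keep


def drop_bands(spectrogram, keep):
    """Keep only the selected rows (frequency bands) of a spectrogram."""
    return [spectrogram[k] for k in keep]


# Toy data (hypothetical): only band 1 separates the event from background.
classes = {
    "background": [[1.0, 1.1, 0.9, 1.0], [2.0, 2.1, 1.9, 2.0], [0.5, 0.6, 0.4, 0.5]],
    "siren": [[1.0, 0.9, 1.1, 1.0], [6.0, 6.2, 5.8, 6.1], [0.5, 0.4, 0.6, 0.5]],
}
keep = useful_bands(classes)

# One row per band, one column per time frame.
spectrogram = [[0.9, 1.0, 1.1], [5.9, 6.0, 6.1], [0.5, 0.5, 0.4]]
reduced = drop_bands(spectrogram, keep)
print(keep, reduced)
```

Because the selection depends only on the band statistics, the reduced spectrogram can be passed to any classifier, which matches the classifier-independent framing of the paragraph above.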