3.2.2. Spectral Features

Spectral features of this study include timbral features that have been successful in music recognition [61]. Timbral features define the quality of a sound [62] and they are a complete opposite of most general features like pitch and intensity. It has been revealed that a strong relationship exists between voice quality and emotional content in a speech [58]. Siedenburg et al. [63] considered spectral features as significant in distinguishing between classes of speech and music. They claimed that the temporal evolution of the spectrum of audio signals mainly accounts for the timbral perception. Timbral features of a sound help in differentiating between sounds that have the same pitch and loudness and they assist in the classification of audio samples with similar timbral features into unique classes [61]. The application of spectral timbral features like spectral centroid, spectral roll-off point, spectral flux, time domain zero crossings and root mean squared energy amongs<sup>t</sup> others has been demonstrated for speech analysis [64].

The authors have included eight spectral features of MFCC, spectral roll-o ff point, spectral flux, spectral centroid, spectral compactness, fast Fourier transforms, spectral variability and LPCC. MFCC is one of the most widely used feature extraction methods for speech analysis because of its computational simplicity, superior ability of distinction and high robustness to noise [42]. It has been successfully applied to discriminate phonemes and can determine what is said and how it is said [35]. The method is based on the human hearing system that provides a natural and real reference for speech recognition [46]. In addition, it is based on the sense that human ears perceive sound waves of di fferent frequencies in a non-linear mode [42]. Spectral roll-o ff point defines a frequency below which at least 85% of the total spectral energy are present and provides a measure of spectral shape [65]. Spectral flux characterizes the dynamic variation of spectral information, is related to perception of music rhythm and it captures spectrum di fference between two adjacent frames [66].

Spectral centroid describes the center of gravity of the magnitude spectrum of short-time Fourier transform (STFT). It is a spectral moment that is helpful in modeling sharpness or brightness of sound [67]. Spectral compactness is obtained by comparing the components in the magnitude spectrum of a frame and magnitude spectrum of neighboring frames [68]. Fast Fourier transform (FFT) is a useful method of analyzing the frequency spectrum of a speech signal and features based on the FTT algorithm have the strongest frequency component in Hertz [69,70]. Spectral variability is a measure of variance of the magnitude spectrum of a signal and is attributed to di fferent sources, including phonetic content, speaker, channel, coarticulation and context [71]. Spectral compactness is an audio feature that measures the noisiness of a speech signal and is closely related to the spectral smoothness of speech signal [72]. LPCCs are spectral features obtained from the envelope of LPC and are computed from sample points of a speech waveform. They have low vulnerability to noise, yielded a lower error rate in comparison to LPC features and can discriminate between di fferent emotions [73]. LPCC features are robust, but they are not based on an auditory perceptual frequency scale like MFCC [74].
