3.2.1. Prosodic Features

Prosodic features are acoustic features widely used in emotion recognition and speech signal processing because they carry essential paralinguistic information. This information complements the linguistic message by conveying the speaker's intention, attitude or emotion [35,56]. In addition, prosodic features are regarded as suprasegmental information because they help to define and structure the flow of speech [35]. Continuous prosodic features such as pitch and energy convey much of the emotional content of speech [57] and are important for delivering the emotional cues of the speaker. These features include formant, timing and articulation features, and they characterize the perceptual properties of speech that human beings typically use to perform different speech tasks [57].

The present authors have included three important prosodic features: energy, fundamental frequency and zero crossing rate (ZCR). Signal energy models the intensity, volume or loudness of the voice and reflects the pauses and rises of the voice signal. It is often associated with the human respiratory system and is one of the most important characteristics of human aural perception. The logarithm of the energy is often used to capture minor changes in energy, because the energy of an audio signal is influenced by the recording conditions. Fundamental frequency provides the tonal and rhythmic characteristics of speech and carries useful information about the speaker. ZCR measures the number of times a signal waveform crosses the zero-amplitude line, that is, changes sign from positive to negative or from negative to positive, in a given time. It is suitable for detecting voice activity, end points, voiced segments, unvoiced segments and silent segments, and for approximating the noisiness of speech [58]. ZCR is an acoustic feature that has been classified as a prosodic feature [49,59,60]. In particular, energy and pitch have been described as prosodic features in the low-frequency domain, whereas ZCR and formants are high-frequency features [60].
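
As an illustration only, the three features described above can be sketched in a few lines of NumPy. This is not the extraction pipeline used in this work: the frame length (25 ms), hop (10 ms), the 75 to 400 Hz pitch search range, the autocorrelation-based pitch estimator and all function names are assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_energy(frames, eps=1e-10):
    """Logarithm of the short-time energy of each frame."""
    return np.log(np.sum(frames ** 2, axis=1) + eps)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def f0_autocorr(frame, sr, fmin=75.0, fmax=400.0):
    """Rough F0 estimate from the autocorrelation peak (assumed method)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)  # shortest period considered
    lag_max = int(sr / fmin)  # longest period considered
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / lag

# Synthetic test signal: a 200 Hz sine sampled at 16 kHz (assumed values).
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200.0 * t)

frames = frame_signal(x, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
energy = log_energy(frames)
zcr = zero_crossing_rate(frames)
f0 = np.array([f0_autocorr(f, sr) for f in frames])
```

On real speech the pitch estimate is only meaningful for voiced frames, so a practical system would typically combine it with a voicing decision, for which the energy and ZCR values computed here are common cues.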
