3.1.1. RAVDESS

RAVDESS is a gender-balanced, validated set of emotional speech and song recordings in which 24 professional actors speak similar statements in a North American accent. It is a multiclass database covering eight emotions (angry, calm, disgust, fear, happy, neutral, sad and surprise) and comprising 1432 American English utterances. The recordings of each of the 24 actors are provided in three formats: audio-only (16-bit, 48 kHz, .wav), audio-video (720p H.264, AAC 48 kHz, .mp4) and video-only (no sound). Only the audio-only files were used, across all eight emotions, because this study concerns speech emotion recognition. Figure 1 shows that the angry, calm, disgust, fear, happy and sad classes each contain 192 audio files. The surprise class contains 184 files, and the neutral class is the smallest, with 96 files.

**Figure 1.** The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS).
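
The class distribution described above can be tallied directly from the audio file names. The following is a minimal sketch rather than the procedure used in this study. It assumes the publicly documented RAVDESS naming scheme, in which the third hyphen-separated field of each file name encodes the emotion. The helper names `emotion_of` and `class_distribution` are hypothetical, and the labels are mapped to the emotion names used in this section.

```python
from collections import Counter
from pathlib import Path

# Assumption: emotion codes follow the public RAVDESS file-naming scheme
# (third field of the name); labels use this section's wording.
EMOTION_CODES = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fear", "07": "disgust", "08": "surprise",
}

# Class sizes exactly as reported in the text above.
REPORTED_COUNTS = {
    "angry": 192, "calm": 192, "disgust": 192, "fear": 192,
    "happy": 192, "sad": 192, "surprise": 184, "neutral": 96,
}


def emotion_of(filename):
    """Return the emotion label encoded in a RAVDESS-style file name."""
    fields = Path(filename).stem.split("-")
    return EMOTION_CODES[fields[2]]


def class_distribution(filenames):
    """Count how many files belong to each emotion class."""
    return Counter(emotion_of(name) for name in filenames)


# The reported class sizes sum to the stated number of utterances.
assert sum(REPORTED_COUNTS.values()) == 1432
```

Given a local copy of the audio-only files, `class_distribution` could be applied to the list of `.wav` paths and compared against `REPORTED_COUNTS` to confirm that the subset matches the distribution in Figure 1.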
