Recent developments in interactive and robotic systems have motivated researchers to recognize human emotions from speech. The present study aimed to classify emotional speech signals using a two-stage classifier based on the arousal-valence emotion model. In this method, samples are first classified by arousal level using conventional prosodic and spectral features. Valence-related emotions are then classified using the proposed non-linear dynamics features (NLDs). NLDs are extracted from the geometrical properties of the reconstructed phase space of the speech signal. For this purpose, four descriptor contours are employed to represent the geometrical properties of the reconstructed phase space. The discrete cosine transform (DCT) is then used to compress the information of these contours into a set of low-order coefficients, and the significant DCT coefficients of the descriptor contours form the proposed NLDs. The classification accuracy of the proposed system was evaluated with the 10-fold cross-validation technique on the Berlin database. Average recognition rates of 96.35% and 87.18% were achieved for female and male speakers, respectively. Considering the total number of male and female samples, the proposed speech emotion recognition system achieves an overall recognition rate of 92.34%.
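The NLD pipeline described above (phase-space reconstruction, a geometric descriptor contour, and DCT compression) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the time delay, the two-dimensional embedding, the centroid-distance descriptor (one plausible choice among the four contours), and the number of retained coefficients are all illustrative assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def phase_space_features(signal, delay=5, n_coeffs=10):
    """Sketch of one NLD feature vector: time-delay embedding of a speech
    frame into a 2-D reconstructed phase space, one descriptor contour
    (distance of each phase-space point from the attractor centroid),
    and DCT compression into a few low-order coefficients.
    All parameter values here are assumptions for illustration."""
    x = np.asarray(signal, dtype=float)
    # 2-D time-delay embedding: phase-space points (x[t], x[t + delay])
    pts = np.column_stack((x[:-delay], x[delay:]))
    # Descriptor contour: Euclidean distance of each point from the centroid
    contour = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    # Compress the contour with the DCT and keep the low-order coefficients
    coeffs = dct(contour, type=2, norm='ortho')
    return coeffs[:n_coeffs]

# Usage on a synthetic 1024-sample frame (220 Hz tone at 16 kHz)
frame = np.sin(2 * np.pi * 220 * np.arange(1024) / 16000)
feats = phase_space_features(frame)
print(feats.shape)  # (10,)
```

In the full system, such low-order coefficients from several descriptor contours would be concatenated to form the NLD vector used by the valence-stage classifier.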