Investigating HuBERT-based Speech Emotion Recognition Generalisation Capability

Letian Li, Cornelius Glackin, Nigel Cannings, Vito Veneziano, Jack Barker, Olakunle Oduola, Chris Woodruff, Thea Laird, James Laird, Yi Sun

Research output: Contribution to conference › Paper › peer-review


Abstract

Transformer-based architectures have made significant progress in speech emotion recognition (SER). However, most published SER research trains and tests models on data from the same corpus, so the resulting models generalise poorly to unseen data collected from other corpora. To address this, we applied the HuBERT model to a combined training set consisting of five publicly available datasets (IEMOCAP, RAVDESS, TESS, CREMA-D, and 80% of CMU-MOSEI) and conducted cross-corpus testing on the Strong Emotion (StrEmo) Dataset (a natural dataset collected by the authors) and two publicly available datasets (SAVEE and the remaining 20% of CMU-MOSEI). Our best result achieved an F1 score of 0.78 over the three test sets, with an F1 score of 0.86 on StrEmo specifically. We also release a spreadsheet of key information on the StrEmo dataset as supplementary material to the conference.
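
The data protocol described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_cross_corpus_split`, the utterance-id lists, and the seeded random 80/20 shuffle are all assumptions, since the abstract does not say how the CMU-MOSEI split was drawn (for example random, stratified, or speaker-independent).

```python
import random

# Corpus roles as listed in the abstract.
TRAIN_CORPORA = ["IEMOCAP", "RAVDESS", "TESS", "CREMA-D"]
TEST_CORPORA = ["StrEmo", "SAVEE"]
SPLIT_CORPUS = "CMU-MOSEI"  # 80% goes to training, 20% to testing


def build_cross_corpus_split(corpora, train_fraction=0.8, seed=0):
    """Build one combined training set and per-corpus test sets.

    corpora: dict mapping corpus name -> list of utterance ids (hypothetical).
    The random shuffle is an assumption; the abstract does not specify it.
    """
    shared = list(corpora[SPLIT_CORPUS])
    random.Random(seed).shuffle(shared)
    cut = int(len(shared) * train_fraction)

    # Combined training set: four full corpora plus the CMU-MOSEI training part.
    train = [(name, utt) for name in TRAIN_CORPORA for utt in corpora[name]]
    train += [(SPLIT_CORPUS, utt) for utt in shared[:cut]]

    # Test sets are kept separate so each can be scored on its own.
    test = {name: [(name, utt) for utt in corpora[name]] for name in TEST_CORPORA}
    test[SPLIT_CORPUS] = [(SPLIT_CORPUS, utt) for utt in shared[cut:]]
    return train, test


# Toy data: ten placeholder utterance ids per corpus.
all_names = TRAIN_CORPORA + TEST_CORPORA + [SPLIT_CORPUS]
toy = {name: [f"{name}_{i}" for i in range(10)] for name in all_names}
train, test = build_cross_corpus_split(toy)
```

Keeping the test sets in a dictionary keyed by corpus makes it straightforward to report both a per-corpus score and a score over all three test sets, as the abstract does.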
Original language: English
Publication status: Published - 2024
Event: The 23rd International Conference on Artificial Intelligence and Soft Computing 2024 - Zakopane, Poland
Duration: 16 Jun 2024 to 20 Jun 2024
https://icaisc.eu/

Conference

Conference: The 23rd International Conference on Artificial Intelligence and Soft Computing 2024
Abbreviated title: ICAISC 2024
Country/Territory: Poland
City: Zakopane
Period: 16/06/24 to 20/06/24
Internet address: https://icaisc.eu/
