Abstract
Transformer-based architectures have made significant progress in speech emotion recognition (SER). However, most published SER research trains and tests models on data from the same corpus, so the resulting models generalise poorly to unseen data collected from different corpora. To address this, we applied the HuBERT model to a combined training set consisting of five publicly available datasets (IEMOCAP, RAVDESS, TESS, CREMA-D, and 80% of CMU-MOSEI) and conducted cross-corpus testing on the Strong Emotion (StrEmo) Dataset (a natural dataset collected by the authors) and two publicly available datasets (SAVEE and 20% of CMU-MOSEI). Our best result was an F1 score of 0.78 across the three test sets, with an F1 score of 0.86 on StrEmo specifically. Additionally, we release a spreadsheet of key information on the StrEmo dataset as supplementary material to the conference.
Original language | English |
---|---|
Publication status | Published - 20 Jun 2024 |
Event | The 23rd International Conference on Artificial Intelligence and Soft Computing 2024, Zakopane, Poland. Duration: 16 Jun 2024 → 20 Jun 2024. https://icaisc.eu/ |
Conference
Conference | The 23rd International Conference on Artificial Intelligence and Soft Computing 2024 |
---|---|
Abbreviated title | ICAISC 2024 |
Country/Territory | Poland |
City | Zakopane |
Period | 16/06/24 → 20/06/24 |
Internet address | https://icaisc.eu/ |