TY - JOUR
T1 - Leveraging Large Language Model ChatGPT for enhanced understanding of end-user emotions in social media feedbacks
AU - Khan, Nek Dil
AU - Khan, Javed Ali
AU - Li, Jianqiang
AU - Ullah, Tahir
AU - Zhao, Qing
N1 - © 2024 Elsevier Ltd.
PY - 2024/10/10
Y1 - 2024/10/10
N2 - For software evolution, user feedback has become a meaningful way to improve applications. Recent studies show a significant increase in analyzing end-user feedback from various social media platforms for software evolution. However, less attention has been given to the end-user feedback for low-rating software applications. Also, such approaches are developed mainly on the understanding of human annotators who might have subconsciously tried for a second guess, questioning the validity of the methods. For this purpose, we proposed an approach that analyzes end-user feedback for low-rating applications to identify the end-user opinion types associated with negative reviews (an issue or bug). Also, we utilized Generative Artificial Intelligence (AI), i.e., ChatGPT, as an annotator and negotiator when preparing a truth set for the deep learning (DL) classifiers to identify end-user emotion. For the proposed approach, we first used the ChatGPT Application Programming Interface (API) to identify negative end-user feedback by processing 71853 reviews collected from 45 apps in the Amazon store. Next, a novel grounded theory is developed by manually processing end-user negative feedback to identify frequently associated emotion types, including anger, confusion, disgust, distrust, disappointment, fear, frustration, and sadness. Next, two datasets were developed, one with human annotators using a content analysis approach and the other using ChatGPT API with the identified emotion types. Next, another round is conducted with ChatGPT to negotiate over the conflicts with the human-annotated dataset, resulting in a conflict-free emotion detection dataset. Finally, various DL classifiers, including LSTM, BILSTM, CNN, RNN, GRU, BiGRU and BiRNN, are employed to identify their efficacy in detecting end-users emotions by preprocessing the input data, applying feature engineering, balancing the data set, and then training and testing them using a cross-validation approach. We obtained an average accuracy of 94%, 94%, 93%, 92%, 91%, 91%, and 85%, with LSTM, BILSTM, RNN, CNN, GRU, BiGRU and BiRNN, respectively, showing improved results with the truth set curated with human and ChatGPT. Using ChatGPT as an annotator and negotiator can help automate and validate the annotation process, resulting in better DL performances.
AB - For software evolution, user feedback has become a meaningful way to improve applications. Recent studies show a significant increase in analyzing end-user feedback from various social media platforms for software evolution. However, less attention has been given to the end-user feedback for low-rating software applications. Also, such approaches are developed mainly on the understanding of human annotators who might have subconsciously tried for a second guess, questioning the validity of the methods. For this purpose, we proposed an approach that analyzes end-user feedback for low-rating applications to identify the end-user opinion types associated with negative reviews (an issue or bug). Also, we utilized Generative Artificial Intelligence (AI), i.e., ChatGPT, as an annotator and negotiator when preparing a truth set for the deep learning (DL) classifiers to identify end-user emotion. For the proposed approach, we first used the ChatGPT Application Programming Interface (API) to identify negative end-user feedback by processing 71853 reviews collected from 45 apps in the Amazon store. Next, a novel grounded theory is developed by manually processing end-user negative feedback to identify frequently associated emotion types, including anger, confusion, disgust, distrust, disappointment, fear, frustration, and sadness. Next, two datasets were developed, one with human annotators using a content analysis approach and the other using ChatGPT API with the identified emotion types. Next, another round is conducted with ChatGPT to negotiate over the conflicts with the human-annotated dataset, resulting in a conflict-free emotion detection dataset. Finally, various DL classifiers, including LSTM, BILSTM, CNN, RNN, GRU, BiGRU and BiRNN, are employed to identify their efficacy in detecting end-users emotions by preprocessing the input data, applying feature engineering, balancing the data set, and then training and testing them using a cross-validation approach. We obtained an average accuracy of 94%, 94%, 93%, 92%, 91%, 91%, and 85%, with LSTM, BILSTM, RNN, CNN, GRU, BiGRU and BiRNN, respectively, showing improved results with the truth set curated with human and ChatGPT. Using ChatGPT as an annotator and negotiator can help automate and validate the annotation process, resulting in better DL performances.
KW - ChatGPT
KW - Data-driven requirements engineering
KW - Generative Artificial Intelligence
KW - Large Language Models
KW - Sentiment Analysis
KW - Software evolution
UR - http://www.scopus.com/inward/record.url?scp=85206451604&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.125524
DO - 10.1016/j.eswa.2024.125524
M3 - Article
AN - SCOPUS:85206451604
SN - 0957-4174
VL - 261
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 125524
ER -