TY - JOUR
T1 - An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers
AU - Fatima, Eman
AU - Kanwal, Hira
AU - Khan, Javed Ali
AU - Khan, Nek Dil
N1 - © 2024 The Author(s), under exclusive licence to Springer Science Business Media, LLC, part of Springer Nature.
PY - 2024/8/27
Y1 - 2024/8/27
N2 - App stores enable users to provide insightful feedback on apps, which developers can use for future software application enhancement and evolution. However, finding user reviews that are valuable and relevant for quality improvement and app enhancement is challenging because of increasing end-user feedback. Also, to date, according to our knowledge, the existing sentiment analysis approaches lack in considering sarcasm and its types when identifying sentiments of end-user reviews for requirements decision-making. Moreover, no work has been reported on detecting sarcasm by analyzing app reviews. This paper proposes an automated approach by detecting sarcasm and its types in end-user reviews and identifying valuable requirements-related information using natural language processing (NLP) and deep learning (DL) algorithms to help software engineers better understand end-user sentiments. For this purpose, we crawled 55,000 end-user comments on seven software apps in the Play Store. Then, a novel sarcasm coding guideline is developed by critically analyzing end-user reviews and recovering frequently used sarcastic types such as Irony, Humor, Flattery, Self-Deprecation, and Passive Aggression. Next, using coding guidelines and the content analysis approach, we annotated the 10,000 user comments and made them parsable for the state-of-the-art DL algorithms. We conducted a survey at two different universities in Pakistan to identify participants’ accuracy in manually identifying sarcasm in the end-user reviews. We developed a ground truth to compare the results of DL algorithms. We then applied various fine-tuned DL classifiers to first detect sarcasm in the end-user feedback and then further classified the sarcastic reviews into more fine-grained sarcastic types. For this, end-user comments are first pre-processed and balanced with the instances in the dataset. Then, feature engineering is applied to fine-tune the DL classifiers. We obtain an average accuracy of 97%, 96%, 96%, 96%, 96%, 86%, and 90% with binary classification and 90%, 91%, 92%, 91%, 91%, 75%, and 89% with CNN, LSTM, BiLSTM, GRU, BiGRU, RNN, and BiRNN classifiers, respectively. Such information would help improve the performance of sentiment analysis approaches to understand better the associated sentiments with the identified new features or issues.
AB - App stores enable users to provide insightful feedback on apps, which developers can use for future software application enhancement and evolution. However, finding user reviews that are valuable and relevant for quality improvement and app enhancement is challenging because of increasing end-user feedback. Also, to date, according to our knowledge, the existing sentiment analysis approaches lack in considering sarcasm and its types when identifying sentiments of end-user reviews for requirements decision-making. Moreover, no work has been reported on detecting sarcasm by analyzing app reviews. This paper proposes an automated approach by detecting sarcasm and its types in end-user reviews and identifying valuable requirements-related information using natural language processing (NLP) and deep learning (DL) algorithms to help software engineers better understand end-user sentiments. For this purpose, we crawled 55,000 end-user comments on seven software apps in the Play Store. Then, a novel sarcasm coding guideline is developed by critically analyzing end-user reviews and recovering frequently used sarcastic types such as Irony, Humor, Flattery, Self-Deprecation, and Passive Aggression. Next, using coding guidelines and the content analysis approach, we annotated the 10,000 user comments and made them parsable for the state-of-the-art DL algorithms. We conducted a survey at two different universities in Pakistan to identify participants’ accuracy in manually identifying sarcasm in the end-user reviews. We developed a ground truth to compare the results of DL algorithms. We then applied various fine-tuned DL classifiers to first detect sarcasm in the end-user feedback and then further classified the sarcastic reviews into more fine-grained sarcastic types. For this, end-user comments are first pre-processed and balanced with the instances in the dataset. Then, feature engineering is applied to fine-tune the DL classifiers. We obtain an average accuracy of 97%, 96%, 96%, 96%, 96%, 86%, and 90% with binary classification and 90%, 91%, 92%, 91%, 91%, 75%, and 89% with CNN, LSTM, BiLSTM, GRU, BiGRU, RNN, and BiRNN classifiers, respectively. Such information would help improve the performance of sentiment analysis approaches to understand better the associated sentiments with the identified new features or issues.
KW - Apps reviews
KW - CrowdRE
KW - Deep learning
KW - Sarcasm detection & classification
KW - Sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85202164270&partnerID=8YFLogxK
U2 - 10.1007/s10515-024-00468-3
DO - 10.1007/s10515-024-00468-3
M3 - Article
AN - SCOPUS:85202164270
SN - 0928-8910
VL - 31
JO - Automated Software Engineering
JF - Automated Software Engineering
IS - 2
M1 - 69
ER -