Abstract
Language is an essential medium for human communication. It allows us to convey information, express our ideas, and give instructions to others. The rise of sarcasm can be attributed to the increasing number of negative comments and expressions posted on social networks such as Twitter and Facebook and published in newspapers. Because sarcastic comments often use positive vocabulary, sarcasm in news reports is hard to detect. Sarcasm is used intentionally in news reports to grab readers' attention. Unfortunately, many people find it hard to identify the ironic tone of headlines and may pass on incorrect information. This work focuses on detecting sarcasm in newspaper headlines and investigates the performance of four machine learning algorithms (Logistic Regression, Naive Bayes, Decision Tree, and Random Forest) and one deep learning model, BiLSTM (Bidirectional Long Short-Term Memory). We demonstrate that, regardless of the machine learning model, the choice of vectorization technique, i.e., BoW (Bag of Words) or TF–IDF (Term Frequency–Inverse Document Frequency), has minimal influence on the ability to detect sarcasm in news headlines. We also show that the performance of three of the machine learning algorithms (Logistic Regression, Random Forest, and Decision Tree) remains stable across the two tokenization techniques (unigram and bigram); the exception is Naive Bayes, which achieved higher precision with unigram analysis. We further found BiLSTM to be the preferred model for sarcasm detection in news headlines.
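To make the two vectorization techniques and the two tokenization techniques named in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' pipeline: the three headlines are invented for illustration, the helper names (`ngrams`, `bow`, `tfidf`) are hypothetical, and the TF–IDF weighting is the plain textbook form (relative term frequency times the natural log of inverse document frequency), which may differ from whatever variant the paper used.

```python
import math
from collections import Counter

# Invented toy headlines for illustration only; not from the paper's dataset.
headlines = [
    "area man wins argument with himself",
    "city council approves new budget",
    "area man approves of own decision",
]

def ngrams(text, n):
    """Split on whitespace and return word n-grams (n=1 unigram, n=2 bigram)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bow(docs, n=1):
    """Bag of Words: raw n-gram counts per document."""
    return [Counter(ngrams(d, n)) for d in docs]

def tfidf(docs, n=1):
    """Textbook TF-IDF (an assumed variant): (count / doc length) * ln(N / df)."""
    counts = bow(docs, n)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (f / total) * math.log(n_docs / df[t]) for t, f in c.items()}
        )
    return vectors

unigram_bow = bow(headlines, n=1)
bigram_bow = bow(headlines, n=2)
unigram_tfidf = tfidf(headlines, n=1)

print(unigram_bow[0])
print(bigram_bow[0])
print(unigram_tfidf[0])
```

In this sketch BoW weights every term by its raw count, whereas TF–IDF down-weights terms such as "area" that occur in several headlines relative to terms such as "wins" that occur in only one. Either representation could then be fed to a classifier such as Logistic Regression or Naive Bayes.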
Original language | English |
---|---|
Title of host publication | Data Science and Emerging Technologies |
Subtitle of host publication | Proceedings of DaSET 2023 |
Editors | Yap Bee Wah, Dhiya Al-Jumeily OBE, Michael W. Berry |
Publisher | Springer Nature Link |
Pages | 237-250 |
Number of pages | 14 |
ISBN (Electronic) | 978-981-97-0293-0 |
ISBN (Print) | 978-981-97-0292-3 |
DOIs | |
Publication status | E-pub ahead of print - 27 Apr 2024 |
Event | The International Conference on Data Science and Emerging Technologies DaSET 2023 - Virtual conference at UNITAR International University, Malaysia; Duration: 4 Dec 2023 → 5 Dec 2023; Conference number: 2; https://icdaset.com/daset2023/ |
Publication series
Name | Lecture Notes on Data Engineering and Communications Technologies |
---|---|
Publisher | Springer |
Volume | 191 |
ISSN (Print) | 2367-4512 |
ISSN (Electronic) | 2367-4520 |
Conference
Conference | The International Conference on Data Science and Emerging Technologies DaSET 2023 |
---|---|
Abbreviated title | DaSET 2023 |
Country/Territory | Malaysia |
Period | 4/12/23 → 5/12/23 |
Internet address | https://icdaset.com/daset2023/ |
Keywords
- Deep learning
- Machine learning
- Natural language processing
- Newspaper headlines
- Sarcasm detection