Abstract
In the age of information, we are overwhelmed with large amounts of data. The quest to know more in less time has increased the need for efficient text summarization models that condense information into precise summaries without overlooking essential details. Recently, GPT-3.5 has demonstrated impressive performance in text completion, generation, and question answering. However, its effectiveness in generating concise and coherent summaries of scientific articles and news reports remains under-explored. This work evaluates the performance of GPT-3.5 in summarizing scientific research articles and news data. Scientific articles were collected from the arXiv STEM dataset, whereas news articles were sampled from the CNN/DailyMail dataset. Using the GPT-3.5 OpenAI API, the pre-trained model is prompted to generate summaries of the scientific and news articles. Next, the ROUGE score of each generated summary is computed against its reference summary to analyse the performance of the model. Our results show that GPT-3.5 performs slightly better at summarizing scientific articles than news articles, with average ROUGE scores of 0.35 and 0.31, respectively. Moreover, in agreement with the literature, we show that ROUGE is not the best measure for evaluating text similarity, as it relies heavily on shared vocabulary rather than semantics.
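The abstract's closing point, that ROUGE rewards shared vocabulary rather than shared meaning, can be illustrated with a minimal sketch. The abstract does not state which ROUGE variant or implementation was used, so the code below is only a simplified ROUGE-1 (unigram overlap) written from scratch; the function names `tokenize` and `rouge_1` and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) from clipped unigram overlap.

    Assumption: a simplified ROUGE-1 without stemming or stopword
    handling; not necessarily the variant used in the paper.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sentences: the paraphrase keeps the meaning but shares no words.
reference = "The model produces short summaries of long articles."
verbatim = "The model produces short summaries of long articles."
paraphrase = "This system writes brief abstracts for lengthy papers."

print(rouge_1(verbatim, reference))
print(rouge_1(paraphrase, reference))
```

In this sketch the verbatim copy gets a perfect score while the paraphrase scores zero on every component, even though both convey the same content, which is the kind of vocabulary dependence the abstract describes.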
Original language | English |
---|---|
Title of host publication | Data Science and Emerging Technologies |
Subtitle of host publication | Proceedings of DaSET 2023 |
Editors | Yap Bee Wah, Dhiya Al-Jumeily OBE, Michael W. Berry |
Place of Publication | Singapore |
Publisher | Springer Nature Link |
Pages | 49-61 |
Number of pages | 13 |
ISBN (Electronic) | 978-981-97-0293-0 |
ISBN (Print) | 978-981-97-0292-3 |
Publication status | E-pub ahead of print - 27 Apr 2024 |
Event | The International Conference on Data Science and Emerging Technologies DaSET 2023 - Virtual conference at UNITAR International University, Malaysia. Duration: 4 Dec 2023 → 5 Dec 2023. Conference number: 2. https://icdaset.com/daset2023/ |
Publication series
Name | Lecture Notes on Data Engineering and Communications Technologies |
---|---|
Publisher | Springer |
Volume | 191 |
ISSN (Print) | 2367-4512 |
ISSN (Electronic) | 2367-4520 |
Conference
Conference | The International Conference on Data Science and Emerging Technologies DaSET 2023 |
---|---|
Abbreviated title | DaSET 2023 |
Country/Territory | Malaysia |
Period | 4/12/23 → 5/12/23 |
Internet address | https://icdaset.com/daset2023/ |
Keywords
- ChatGPT
- Large language model
- Natural language processing
- Scientific papers
- Text summarization