Abstract
Software is designed for humans and must account for their values. However, current research and practice focus on a narrow range of well-explored values, e.g. security, overlooking a more comprehensive perspective. Those who explore a broader array of values rely on manual identification, which is labour-intensive and prone to human bias. Moreover, existing methods offer limited reliability because they fail to explain their findings. In this paper, we propose leveraging the reasoning capabilities of Large Language Models (LLMs) for automated inference about values. This allows not only detecting values but also explaining how they are expressed in the software. We aim to examine the effectiveness of LLMs, specifically ChatGPT (Chat Generative Pre-Trained Transformer), in the automated detection and explanation of values in software artifacts. Using ChatGPT, we investigate how mobile APIs align with human values based on their documentation. Human evaluation of ChatGPT's findings shows a reciprocal shift in understanding values, with both ChatGPT and experts adjusting their assessments through dialogue. While experts recognise ChatGPT's potential for revealing values, they emphasise that human involvement remains necessary to improve the accuracy of the findings by detecting and discarding convincing but inaccurate explanations that the language model may produce through hallucination or confabulation.
| Original language | English |
|---|---|
| Article number | 2478278 |
| Pages (from-to) | 1-37 |
| Number of pages | 37 |
| Journal | Behaviour and Information Technology |
| Early online date | 3 May 2025 |
| DOIs | |
| Publication status | E-pub ahead of print - 3 May 2025 |
Keywords
- ChatGPT
- Human values
- LLMs
- Software