Abstract
This paper proposes a novel, scalable agentic AI architecture designed to enhance human activity recognition (HAR) across data modalities by embedding memory-driven reasoning and context awareness. The architecture integrates multimodal sensing, deliberative reasoning through supervised learning and context-aware language models, and memory mechanisms, including short-term memory for tracking immediate activity transitions and long-term memory for embedding experiential knowledge. Evaluation of the proposed model on two major datasets, RHM (6.7K video clips of 14 known activities) and Toyota Smart Home (16K video clips of 31 unknown activities), demonstrates significant improvements: 60% accuracy when combining contextual information with supervised model output, compared with 40% accuracy using context alone and 35% using supervised models on unseen data. By overcoming the limitations of traditional HAR approaches, this research advances the development of responsive and intelligent robotic systems, facilitating more natural and effective human-robot collaboration.
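The paper itself does not include code; the sketch below is a minimal, hypothetical illustration of the kind of fusion the abstract describes, where a supervised classifier's class probabilities are combined with context scores informed by short-term memory (recent activity transitions) and long-term memory (accumulated transition statistics). All names, the label set, and the fusion weight are illustrative assumptions, and the simple transition model stands in for the context-aware language-model reasoning described by the authors.

```python
# Hypothetical sketch: memory-augmented fusion of supervised HAR
# predictions with context scores (not the authors' implementation).
from collections import deque, defaultdict

ACTIVITIES = ["cook", "eat", "watch_tv"]  # placeholder label set


class MemoryAugmentedHAR:
    def __init__(self, fusion_weight=0.5, stm_size=5):
        self.fusion_weight = fusion_weight            # weight on supervised scores
        self.short_term = deque(maxlen=stm_size)      # recent activity labels
        self.long_term = defaultdict(lambda: defaultdict(int))  # transition counts

    def context_scores(self):
        """Score activities from remembered transitions (stand-in for the
        paper's context-aware language-model reasoning)."""
        if not self.short_term:
            return {a: 1.0 / len(ACTIVITIES) for a in ACTIVITIES}
        last = self.short_term[-1]
        counts = self.long_term[last]
        total = sum(counts.values())
        # Laplace-smoothed transition probabilities from long-term memory
        return {a: (counts[a] + 1) / (total + len(ACTIVITIES)) for a in ACTIVITIES}

    def fuse(self, supervised_probs):
        """Combine supervised model output with context scores and update memory."""
        ctx = self.context_scores()
        w = self.fusion_weight
        fused = {a: w * supervised_probs.get(a, 0.0) + (1 - w) * ctx[a]
                 for a in ACTIVITIES}
        prediction = max(fused, key=fused.get)
        if self.short_term:
            self.long_term[self.short_term[-1]][prediction] += 1
        self.short_term.append(prediction)
        return prediction, fused


if __name__ == "__main__":
    har = MemoryAugmentedHAR()
    print(har.fuse({"cook": 0.6, "eat": 0.3, "watch_tv": 0.1}))
    print(har.fuse({"cook": 0.2, "eat": 0.5, "watch_tv": 0.3}))
```

In this sketch the fusion weight plays the role of balancing the supervised model against contextual reasoning; the abstract's reported gain (60% fused vs. 35-40% for either source alone) is what motivates combining the two signals rather than relying on one.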
Original language | English |
---|---|
Title of host publication | International Conference on Social Robotics + AI 2025 |
Pages | 1-14 |
Number of pages | 14 |
Publication status | Accepted/In press - 16 Jun 2025 |