Autonomous mobile service robots, such as lawnmowers or cleaning robots, operating in human-populated environments need to reason about human-human interactions to support safe and socially aware navigation. For such systems, interaction understanding is not primarily a fine-grained recognition problem, but a perception problem under limited sensing quality and computational resources. Many existing approaches focus on holistic group activity recognition, often relying on complex and computationally expensive models that are not well suited for mobile robotic platforms. In this work, we argue that pairwise human interactions constitute a minimal yet sufficient perceptual unit for robot-centric social understanding. We study the problem of identifying interacting person pairs and classifying coarse-grained interaction behaviors sufficient for downstream group-level reasoning and robot decision-making. To this end, we adopt a two-stage framework in which candidate interacting pairs are first identified using lightweight geometric and motion cues, and interaction types are subsequently classified using a relation network. We evaluate the proposed approach on the JRDB dataset, where it achieves competitive performance with reduced computational cost and model size compared to appearance-based methods. Additional experiments on the Collective Activity Dataset (CAD) and zero-shot evaluation on a lawnmower-collected dataset further demonstrate the generalizability of the proposed framework. These results suggest that simple geometric and motion cues provide a practical and efficient basis for interaction-aware perception in mobile service robots. Code is released.