Every year, 10 million pets enter shelters, separated from their families. Despite desperate searches by both guardians and lost animals, 70% are never reunited, not because matches do not exist, but because current systems rely on appearance alone, whereas animals recognize one another through sound. We ask: why does computer vision treat vocalizing species as silent visual objects? Drawing on five decades of cognitive science showing that animals perceive quantity approximately and communicate identity acoustically, we present the first multimodal reunification system that integrates visual and acoustic biometrics. Our species-adaptive architecture processes vocalizations ranging from 10 Hz elephant rumbles to 4 kHz puppy whines and pairs them with probabilistic visual matching that tolerates stress-induced appearance changes. This work demonstrates that AI grounded in biological communication principles can serve vulnerable populations that lack human language.