An Individual Identity-Driven Framework for Animal Re-Identification

Reliable re-identification of individuals within large wildlife populations is crucial for biological studies, ecological research, and wildlife conservation. Classic computer vision techniques offer a promising direction for Animal Re-identification (Animal ReID), but the closed-set nature of their backbones limits their applicability and generalizability. Although vision-language models such as CLIP have proven effective for re-identifying persons and vehicles, their application to Animal ReID remains limited because animals pose unique challenges, such as the diverse visual appearance of animals, including variations in pose and body form. To address these limitations, we leverage CLIP's cross-modal capabilities to introduce a two-stage framework, the \textbf{Indiv}idual \textbf{A}nimal \textbf{ID}entity-Driven (IndivAID) framework, designed specifically for Animal ReID. In the first stage, IndivAID trains a text description generator that extracts individual semantic information from each image and produces both image-specific and individual-specific textual descriptions, which together comprehensively capture the diverse visual concepts of each individual across animal images. In the second stage, IndivAID refines its learning of visual concepts by dynamically incorporating the individual-specific textual descriptions through an integrated attention module, which further highlights the discriminative features of each individual for Animal ReID. Evaluation against state-of-the-art methods on eight benchmark datasets and a real-world Stoat dataset demonstrates IndivAID's effectiveness and applicability. Code is available at \url{https://github.com/ywu840/IndivAID}.
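As a rough illustration of the idea behind the second stage, the sketch below shows one generic way a text feature can guide attention over image features. It is a minimal sketch under stated assumptions, not the IndivAID implementation. It assumes that an image is represented by a list of patch feature vectors and that an individual-specific description has already been encoded into a single text feature of the same dimension. The function names and the scaled dot-product weighting are hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def text_guided_pooling(patch_feats, text_feat):
    """Hypothetical text-guided attention pooling (illustrative assumption only).

    Each image patch is scored by its scaled dot-product similarity to the
    text feature. The softmax-normalised scores then weight the patches, so
    patches that agree with the individual-specific description contribute
    more to the pooled image feature.
    """
    d = len(text_feat)
    scores = [dot(p, text_feat) / math.sqrt(d) for p in patch_feats]
    weights = softmax(scores)
    pooled = [sum(w * p[i] for w, p in zip(weights, patch_feats)) for i in range(d)]
    return pooled, weights


# Toy usage: three 2-D patch features and a text feature aligned with the first axis.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [1.0, 0.0]
pooled, weights = text_guided_pooling(patches, text)
```

In this toy example, the two patches whose first component matches the text feature receive equal and larger weights than the patch orthogonal to it. This mirrors, in a very simplified form, how textual guidance could emphasise discriminative regions.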

翻译：对大规模野生动物种群中的个体进行可靠重识别对于生物学研究、生态学研究和野生动物保护至关重要。经典的计算机视觉技术为动物重识别提供了一个有前景的方向，但其骨干网络的封闭集特性限制了其适用性和泛化能力。尽管像CLIP这样的视觉-语言模型在行人和车辆重识别中已展现出有效性，但由于动物独特的视觉表征（如姿态和形态的多样性）等挑战，其在动物重识别中的应用仍然有限。为应对这些限制，我们利用CLIP的跨模态能力，提出了一个专为动物重识别设计的两阶段框架——**个体动物身份驱动**框架。在第一阶段，IndivAID通过从每张图像中提取个体语义信息来训练文本描述生成器，生成既能捕捉图像特异性又能捕捉个体特异性的文本描述，从而全面表征动物图像中每个个体的多样化视觉概念。在第二阶段，IndivAID通过集成注意力模块动态融合个体特异性文本描述，进一步突出个体的判别性特征，从而精细化学习视觉概念。在八个基准数据集和一个真实世界的白鼬数据集上，与最先进方法的对比评估验证了IndivAID的有效性和适用性。代码发布于 \url{https://github.com/ywu840/IndivAID}。