Bioacoustics has traditionally relied on spectrograms and continuous, per-frame audio representations to analyze animal sounds, and these representations also serve as input to machine learning models. Meanwhile, the International Phonetic Alphabet (IPA) has provided an interpretable, language-independent method for transcribing human speech sounds. In this paper, we introduce ISPA (Inter-Species Phonetic Alphabet), a precise, concise, and interpretable system for transcribing animal sounds into text. We compare acoustics-based and feature-based methods for transcribing and classifying animal sounds, and show that they perform comparably to baseline methods that use continuous, dense audio representations. By representing animal sounds as text, we effectively treat them as a "foreign language," and we show that established human-language ML paradigms and models, such as language models, can be successfully applied to improve performance.