Raga identification in Indian Art Music (IAM) remains challenging because many Ragas are rarely performed and are therefore not represented in available training datasets. Traditional classification models struggle in this setting: they assume a closed set of known categories and so fail to recognize or meaningfully group previously unseen Ragas. Recent work has attempted to categorize unseen Ragas but suffers from catastrophic forgetting, in which knowledge of previously seen Ragas degrades. To address this problem, we adopt a unified learning framework that leverages both labeled and unlabeled audio, enabling the model to discover coherent categories corresponding to unseen Ragas while retaining knowledge of known ones. We evaluate our model on benchmark Raga identification datasets and report its performance in categorizing previously seen, unseen, and all Raga classes. The proposed approach surpasses the previous NCD-based pipeline even in discovering unseen Raga categories, offering new insights into representation learning for IAM tasks.