Raga Identification is a popular research problem in Music Information Retrieval. The few studies that have explored this task employ a variety of approaches, such as signal processing, Machine Learning (ML) methods, and, more recently, Deep Learning (DL) based methods. However, a key question remains unanswered in all of these works: do these ML/DL methods learn and interpret Ragas in a manner similar to human experts? Moreover, a significant roadblock in this research is the scarcity of rich, labeled datasets, which such ML/DL based methods depend on. In this paper, we introduce "Prasarbharti Indian Music" version-1 (PIM-v1), a novel dataset comprising 191 hours of meticulously labeled Hindustani Classical Music (HCM) recordings, which, to the best of our knowledge, is the largest labeled dataset of HCM recordings. Our approach involves conducting ablation studies to identify a benchmark classification model for Automatic Raga Identification (ARI) using the PIM-v1 dataset. We achieve a chunk-wise f1-score of 0.89 on a subset of 12 Raga classes. Subsequently, we employ model explainability techniques to evaluate the classifier's predictions, aiming to ascertain whether they align with human understanding of Ragas or are driven by arbitrary patterns. We validate the correctness of the model's predictions by comparing the explanations produced by two ExAI models against human expert annotations. Finally, we analyze explanations for individual test examples to understand how the regions highlighted by the explanations contribute to the correct or incorrect predictions made by the model.