Access to a rich music dataset is important for a wide variety of applications. Currently available datasets mostly focus on storing vocal or instrumental recordings and ignore the need for visual representation and retrieval of the music. This paper attempts to build an XML-based public dataset, called SANGEET, that stores comprehensive information on Hindustani Sangeet (North Indian Classical Music) compositions written by the famous musicologist Pt. Vishnu Narayan Bhatkhande. SANGEET preserves all the required information for any given composition, including metadata and structural, notational, rhythmic, and melodic information, in a standardized way for easy and efficient storage and extraction of musical information. The dataset is intended to provide ground truth information for music information research tasks, thereby supporting several data-driven analyses from a machine learning perspective. We demonstrate the usefulness of the dataset through its application to music information retrieval using XQuery and to visualization through the Omenad rendering system. Finally, we propose approaches to transform the dataset for statistical and machine learning tasks, toward a better understanding of Hindustani Sangeet. The dataset can be found at https://github.com/cmisra/Sangeet.
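To illustrate the kind of retrieval an XML-encoded composition makes possible, the following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree` rather than XQuery. The element and attribute names (`composition`, `metadata`, `raga`, `tala`, `notation`, `swara`, `id`) and the sample records are hypothetical and chosen only for illustration; they are not taken from the actual SANGEET schema, which should be consulted in the repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical composition records. Element and attribute names are
# illustrative assumptions, NOT the actual SANGEET schema.
SAMPLE_XML = """
<compositions>
  <composition id="c1">
    <metadata><raga>Yaman</raga><tala>Teental</tala></metadata>
    <notation><swara>Ni</swara><swara>Re</swara><swara>Ga</swara></notation>
  </composition>
  <composition id="c2">
    <metadata><raga>Bhairav</raga><tala>Ektal</tala></metadata>
    <notation><swara>Sa</swara><swara>Re</swara></notation>
  </composition>
</compositions>
"""


def compositions_in_raga(xml_text, raga):
    """Return the ids of compositions whose metadata names the given raga."""
    root = ET.fromstring(xml_text)
    return [
        comp.get("id")
        for comp in root.findall("composition")
        if comp.findtext("metadata/raga") == raga
    ]


def swara_counts(xml_text, comp_id):
    """Count how often each swara occurs in one composition's notation."""
    root = ET.fromstring(xml_text)
    comp = root.find(f"composition[@id='{comp_id}']")
    if comp is None:
        return {}
    counts = {}
    for swara in comp.findall("notation/swara"):
        counts[swara.text] = counts.get(swara.text, 0) + 1
    return counts


if __name__ == "__main__":
    print(compositions_in_raga(SAMPLE_XML, "Yaman"))
    print(swara_counts(SAMPLE_XML, "c2"))
```

Per-composition counts of this kind are one plausible way to flatten the XML into tabular features for statistical or machine learning work, under the same schema assumptions.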