Beat and downbeat tracking models have improved significantly in recent years with the introduction of deep learning methods. Despite these improvements, several challenges remain. In particular, adapting available models to music traditions that are underrepresented in music information retrieval (MIR) usually means collecting and annotating large amounts of data, which is impractical and time-consuming. Transfer learning, data augmentation, and fine-tuning have been used successfully in related tasks and are known to alleviate this bottleneck. Furthermore, when studying these music traditions, models are not required to generalize across multiple mainstream music genres but to perform well in more constrained, homogeneous conditions. In this work, we investigate simple yet effective strategies to adapt beat and downbeat tracking models to two different Latin American music traditions, and we analyze the feasibility of these adaptations in real-world applications in terms of their data and computational requirements. Contrary to common belief, our findings show that it is possible to achieve good performance by spending just a few minutes annotating a portion of the data and training a model on a standard CPU machine, with the precise amount of resources needed depending on the task and the complexity of the dataset.