We propose a system for tracking beats and downbeats with two objectives: generality across a diverse range of music, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which introduces constraints on the meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and it performs worse on continuity metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.
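To make the shift-tolerant loss idea concrete, here is a minimal numpy sketch of one common way to build such tolerance into a frame-wise binary cross-entropy: max-pool the predictions around each annotated beat frame (so a peak a few frames off still counts as a hit), penalize predictions only at frames far from any annotation, and ignore the in-between frames. This is an illustration of the general technique under our own simplifying assumptions, not necessarily the exact formulation used in the paper; the function names and the `tolerance` parameter are ours.

```python
import numpy as np

def sliding_max(x, tolerance):
    """Sliding maximum over a window of 2*tolerance+1 frames
    (1-D max-pooling with stride 1 and 'same'-style boundaries)."""
    n = len(x)
    return np.array([x[max(0, i - tolerance):min(n, i + tolerance + 1)].max()
                     for i in range(n)])

def shift_tolerant_bce(pred, target, tolerance=3, eps=1e-7):
    """Binary cross-entropy that tolerates predictions within
    `tolerance` frames of an annotated beat (illustrative sketch)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1 - eps)
    target = np.asarray(target, dtype=float)
    pooled = sliding_max(pred, tolerance)     # best prediction near each frame
    widened = sliding_max(target, tolerance)  # frames near any annotation
    pos = target == 1                         # annotated beat frames
    neg = widened == 0                        # frames far from every beat
    # Positive term: the best prediction within the tolerance window of a
    # beat should be high. Negative term: predictions far from all beats
    # should be low. Frames near (but not at) a beat are ignored, so a
    # slightly shifted peak is neither rewarded twice nor punished.
    losses = np.concatenate([-np.log(pooled[pos]), -np.log(1 - pred[neg])])
    return float(losses.mean())
```

With a tolerance of a few frames, a prediction peak one frame away from the annotation incurs almost no loss, whereas the same peak under a plain frame-wise cross-entropy would be counted both as a missed positive and as a false positive.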