In recent years, the task of Automatic Music Transcription (AMT), in which various attributes of musical notes are estimated from audio, has received increasing attention. At the same time, the related task of Multi-Pitch Estimation (MPE) remains a challenging but necessary component of almost all AMT approaches, even if only implicitly. In the context of AMT, pitch information is typically quantized to the nominal pitches of the Western music scale. Even in more general contexts, MPE systems typically produce pitch predictions with some degree of quantization. In certain applications of AMT, such as Guitar Tablature Transcription (GTT), it is more meaningful to estimate continuous-valued pitch contours. Guitar tablature can represent various playing techniques, some of which involve pitch modulation. Contemporary approaches to AMT do not adequately address pitch modulation, and reduce quantization only at the expense of greater model complexity. In this paper, we present a GTT formulation that estimates continuous-valued pitch contours, grouping them according to their string and fret of origin. We demonstrate that for this task, the proposed method significantly improves the resolution of MPE and simultaneously yields tablature estimation results competitive with baseline models.
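To make the distinction between quantized and continuous-valued pitch concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes the common equal-temperament convention (A4 = 440 Hz, MIDI note 69); the helper names `hz_to_midi` and `quantize` and the sample contour are hypothetical and chosen only for illustration.

```python
import math

# Assumed tuning convention (not stated in the text): A4 = 440 Hz = MIDI 69.
A4_HZ = 440.0
A4_MIDI = 69


def hz_to_midi(freq_hz):
    """Map a frequency in Hz to a continuous-valued (fractional) MIDI pitch."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def quantize(freq_hz):
    """Split a frequency into a nominal pitch and its deviation in semitones.

    The nominal pitch is what a quantized estimate would report; the
    deviation is the information that quantization discards.
    """
    midi = hz_to_midi(freq_hz)
    nominal = round(midi)
    return nominal, midi - nominal


# Hypothetical pitch contour of a string bend rising from E4 toward F4 (Hz).
contour_hz = [329.63, 334.0, 340.0, 345.0, 349.23]

for freq in contour_hz:
    nominal, deviation = quantize(freq)
    print(f"{freq:7.2f} Hz -> nominal MIDI {nominal}, deviation {deviation:+.2f} semitones")
```

A quantized estimate would keep only the nominal pitch of each frame, so the gradual rise of the bend collapses into a single jump between two notes. A continuous-valued contour retains the fractional deviation as well.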