Instrument playing technique (IPT) is a key element of musical presentation. However, most existing work on IPT detection considers only monophonic music signals; little has been done to detect IPTs in polyphonic instrumental solo pieces, where IPTs overlap or are mixed. In this paper, we formulate IPT detection as a frame-level multi-label classification problem and apply this formulation to the Guzheng, a Chinese plucked string instrument. We create a new dataset, Guzheng\_Tech99, containing Guzheng recordings together with onset, offset, pitch, and IPT annotations for each note. Because IPTs vary considerably in length, we propose a new method that combines a multi-scale network with self-attention. The multi-scale network extracts features at different scales, and the self-attention mechanism, applied to the feature maps at the coarsest scale, further enhances long-range feature extraction. Our approach outperforms existing works by a large margin, indicating its effectiveness in IPT detection.
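To make the frame-level multi-label formulation concrete, the following minimal sketch converts note-level annotations (onset, offset, IPT) into a multi-hot target matrix with one row per frame and one column per technique. It is an illustration under stated assumptions, not the paper's preprocessing: the function name `frame_labels`, the three technique names, and the 0.25 s hop are all hypothetical, chosen only so the arithmetic is easy to follow. Frames covered by two overlapping notes receive two active labels, which is what distinguishes the multi-label setting from single-label classification.

```python
import math

# Hypothetical label set for illustration only; not the dataset's actual IPT classes.
TECHNIQUES = ["vibrato", "tremolo", "glissando"]


def frame_labels(notes, duration, hop):
    """Build a frames x techniques multi-hot matrix from note-level annotations.

    notes:    list of (onset_sec, offset_sec, technique_name) tuples
    duration: clip length in seconds
    hop:      frame hop in seconds (an assumed value, not taken from the paper)
    """
    n_frames = math.ceil(duration / hop)
    labels = [[0] * len(TECHNIQUES) for _ in range(n_frames)]
    for onset, offset, technique in notes:
        col = TECHNIQUES.index(technique)
        first = int(onset / hop)                       # first frame touched by the note
        last = min(n_frames, math.ceil(offset / hop))  # one past the last frame touched
        for frame in range(first, last):
            labels[frame][col] = 1                     # overlapping notes set several columns
    return labels


# Two overlapping notes: frames covering 0.5 s to 1.0 s carry both labels.
example = frame_labels(
    [(0.0, 1.0, "vibrato"), (0.5, 1.5, "tremolo")],
    duration=2.0,
    hop=0.25,
)
for index, row in enumerate(example):
    print(index, row)
```

A per-frame sigmoid output with a binary cross-entropy loss is the usual way to train against targets of this shape, although the abstract does not state which loss is used.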