Evaluation of continuous piano pedal depth estimation remains incomplete when it relies only on conventional frame-level metrics, which overlook musically important features such as direction-change boundaries and pedal-curve contours. To provide more interpretable and musically meaningful insights, we propose an evaluation framework that augments standard frame-level metrics with an action-level assessment, which measures direction and timing over segments of press/hold/release states, and a gesture-level analysis, which evaluates the contour similarity of each press-release cycle. We apply this framework to compare an audio-only baseline with two variants: one incorporating symbolic information from MIDI, and another trained in a binary-valued setting, all within a unified architecture. Results show that the MIDI-informed model significantly outperforms the others at the action and gesture levels despite only modest frame-level gains. These findings demonstrate that our framework captures musically relevant improvements that traditional metrics cannot discern, offering a more practical and effective approach to evaluating pedal depth estimation models.
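The two proposed levels of evaluation can be illustrated with a minimal sketch. The abstract does not specify how states are segmented or how contour similarity is computed, so the following assumes a simple derivative-sign rule with a dead-zone threshold `eps` for press/hold/release labeling, and Pearson correlation after resampling as the contour-similarity measure; both choices (and the function names) are hypothetical, not the paper's actual method.

```python
import numpy as np

def segment_actions(depth, eps=0.02):
    """Label each frame press/hold/release from the sign of the first
    difference of the pedal-depth curve; eps is an assumed dead-zone
    threshold that suppresses small jitter as 'hold'."""
    d = np.diff(depth, prepend=depth[0])
    return np.where(d > eps, "press", np.where(d < -eps, "release", "hold"))

def contour_similarity(pred, ref, n=64):
    """Pearson correlation between a predicted and a reference
    press-release cycle, after resampling both to a common length n
    (an assumed normalization so cycles of different durations compare)."""
    p = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(pred)), pred)
    r = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(ref)), ref)
    if p.std() == 0 or r.std() == 0:  # flat cycle: correlation undefined
        return 0.0
    return float(np.corrcoef(p, r)[0, 1])
```

For example, a curve that ramps up, plateaus, and ramps back down yields a press/hold/release labeling, and two cycles with the same shape but different durations score near 1.0 under the resampled correlation.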