The advancement of machine learning (ML), Large Audio Language Models (LALMs), and autonomous AI agents in Music Information Retrieval (MIR) necessitates a shift from static tagging to rich, human-aligned representation learning. However, the scarcity of open-source infrastructure capable of capturing the subjective nuances of audio annotation remains a critical bottleneck. This paper introduces \textbf{LabelBuddy}, an open-source, collaborative audio annotation tool with auto-tagging, designed to bridge the gap between human intent and machine understanding. Unlike static tools, it decouples the interface from inference via containerized backends, allowing users to plug in custom models for AI-assisted pre-annotation. We describe the system architecture, which supports multi-user consensus and containerized model isolation, and outline a roadmap for extension to agents and LALMs. Code is available at https://github.com/GiannisProkopiou/gsoc2022-Label-buddy.