Personalizing Automatic Speech Recognition (ASR) for non-normative speech remains challenging because data collection is labor-intensive and model training is technically complex. To address these limitations, we propose Adapt4Me, a web-based decentralized environment that operationalizes Bayesian active learning to enable end-to-end personalization without expert supervision. The app exposes data selection, adaptation, and validation to lay users through a three-stage human-in-the-loop workflow: (1) rapid profiling via greedy phoneme sampling to capture speaker-specific acoustics; (2) backend personalization using Variational Inference Low-Rank Adaptation (VI-LoRA) to enable fast, incremental updates; and (3) continuous improvement, where users guide model refinement by resolving visualized model uncertainty via low-friction top-k corrections. By making epistemic uncertainty explicit, Adapt4Me reframes data efficiency as an interactive design feature rather than a purely algorithmic concern. We show how this enables users to personalize robust ASR models, transforming them from passive data sources into active authors of their own assistive technology.
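To make stage (1) concrete, the following is a minimal sketch of one way greedy phoneme sampling could work. It assumes the stage can be treated as a greedy set-cover problem over a pool of candidate prompts. The abstract does not specify the selection rule, so the function name `greedy_phoneme_prompts`, the `budget` parameter, and the toy ARPAbet-style phoneme sets are illustrative assumptions rather than the Adapt4Me implementation.

```python
from typing import Dict, List, Set


def greedy_phoneme_prompts(
    prompt_phonemes: Dict[str, Set[str]], budget: int
) -> List[str]:
    """Pick up to `budget` prompts, each time taking the prompt that
    adds the most phonemes not yet covered (greedy set cover).

    Hypothetical helper: the selection rule is an assumption, not the
    method described in the paper.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(prompt_phonemes)  # copy so the caller's dict is untouched

    while remaining and len(chosen) < budget:
        # Iterate in sorted order so ties resolve deterministically
        # (max returns the first maximal element it sees).
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no prompt adds new phonemes; stop early
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen


# Toy prompt pool with made-up phoneme annotations (illustrative only).
PROMPTS = {
    "see the cat": {"S", "IY", "DH", "AH", "K", "AE", "T"},
    "the cat": {"DH", "AH", "K", "AE", "T"},
    "go home": {"G", "OW", "HH", "M"},
}

selected = greedy_phoneme_prompts(PROMPTS, budget=3)
print(selected)
```

In this toy pool, "the cat" is never selected because every phoneme it contains is already covered by "see the cat". This is the property that would keep the profiling session short for the speaker.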