LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification

Deep learning-based machine listening is broadening the scope of industrial acoustic analysis, yet its adoption on live shop floors is hindered by the need for a large annotated dataset for every new task. Emerging general-purpose sound foundation models aim to reduce this data dependency, but they face two problems in practice: they are computationally expensive, and they perform poorly in industrial scenarios characterized by tonal harmonics, broadband noise, and transient fault events. Both problems make immediate on-site deployment impractical, so a practical end-to-end system for deploying a sound foundation model on a live shop floor has remained elusive. To address this gap, this study introduces LISTEN (Lightweight Industrial Sound-representable Transformer for Edge Notification), the first lightweight foundation model specialized for industrial sound. LISTEN is built through Knowledge Distillation (KD) from the large-scale teacher model IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer) and is optimized for resource-constrained edge environments. With the backbone frozen and only a shallow head trained on minimal target-process data, rather than full fine-tuning or retraining, LISTEN achieves nearly identical performance to IMPACT across diverse manufacturing processes. This study further demonstrates a complete system for real-time machine monitoring, encompassing data acquisition with Industrial Internet of Things (IIoT) devices, rapid model adaptation using minimal annotated data, and real-time inference on a low-cost edge device. By validating the entire system on a live CNC machine, this work establishes the first feasible end-to-end system for deploying a lightweight industrial sound foundation model in an active industrial environment.
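
The adaptation step described above (a frozen backbone with only a shallow head trained on a few labeled clips) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the LISTEN implementation: frozen_backbone is a hypothetical stand-in that returns two hand-crafted features (RMS level and zero-crossing rate) instead of learned transformer embeddings, the clips are synthetic sine waves, and the head is a plain logistic-regression layer trained by stochastic gradient descent.

```python
import math

def make_clip(amplitude, cycles, n=64):
    """Synthetic stand-in for a recorded clip (assumption: a sine with a small phase offset)."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / n + 0.1) for t in range(n)]

def frozen_backbone(clip):
    """Hypothetical frozen encoder: it has no trainable state and is never updated.
    It returns [RMS level, zero-crossing rate] in place of learned embeddings."""
    rms = math.sqrt(sum(x * x for x in clip) / len(clip))
    crossings = sum(1 for a, b in zip(clip, clip[1:]) if a * b < 0)
    return [rms, crossings / (len(clip) - 1)]

def logit(head, z):
    """Linear score of the shallow head for one embedding z."""
    weights, bias = head
    return sum(w * x for w, x in zip(weights, z)) + bias

def train_head(embeddings, labels, epochs=500, lr=0.5):
    """Train only the head (logistic regression) on precomputed embeddings."""
    weights, bias = [0.0] * len(embeddings[0]), 0.0
    for _ in range(epochs):
        for z, y in zip(embeddings, labels):
            p = 1.0 / (1.0 + math.exp(-logit((weights, bias), z)))
            error = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, z)]
            bias -= lr * error
    return weights, bias

def predict(head, clip):
    """Return 1 (faulty) or 0 (normal) for a raw clip."""
    return 1 if logit(head, frozen_backbone(clip)) > 0 else 0

# A handful of labeled clips from the target process: 0 = normal, 1 = faulty.
normal = [make_clip(a, c) for a, c in [(0.20, 2), (0.25, 3), (0.30, 4), (0.35, 5)]]
faulty = [make_clip(a, c) for a, c in [(1.0, 10), (1.1, 11), (1.2, 12), (1.3, 13)]]
labels = [0] * len(normal) + [1] * len(faulty)

# Embeddings are computed once; only the head's weights and bias change afterwards.
embeddings = [frozen_backbone(clip) for clip in normal + faulty]
head = train_head(embeddings, labels)
```

Because the backbone is fixed, the embeddings can be cached and the head retrained in well under a second on a small device, which is the property that makes rapid per-process adaptation plausible. In a real deployment the stand-in encoder would be replaced by the distilled model, and the head and its training schedule would follow whatever the full paper specifies.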
