In this report, we describe the technical details of our submission to the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023 by Team "AcieLee" (username: Yuqi\_Li). The task is to classify audio produced by interactions between objects or by events involving the camera wearer. Through extensive experiments, we found that learning rate step decay, freezing the backbone, label smoothing, and focal loss contribute most to the performance improvement. After training, we combined multiple models from different training stages into a single model by assigning fusion weights. This method placed 3rd in the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge at the CVPR 2023 workshop.
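The abstract names these training components but does not give their formulas or hyperparameters. The NumPy sketch below is a minimal illustration of what they typically compute, not the submission's implementation. Several details are assumptions: the smoothing factor `eps`, the focusing parameter `gamma`, and the step schedule values. Applying focal loss to label-smoothed targets is one common formulation among several. Fusion is modeled here as a normalized weighted average of per-model class probabilities. Backbone freezing depends on the training framework and is omitted.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def smooth_labels(labels, num_classes, eps=0.1):
    """Label smoothing: 1 - eps + eps/K on the true class, eps/K on every other class.
    eps=0.1 is an assumed default, not a value from the report."""
    targets = np.full((len(labels), num_classes), eps / num_classes)
    targets[np.arange(len(labels)), labels] += 1.0 - eps
    return targets


def focal_loss(logits, labels, gamma=2.0, eps=0.1):
    """Mean multi-class focal loss against label-smoothed targets (assumed formulation).
    Each class term is down-weighted by (1 - p_c) ** gamma, so confidently
    predicted classes contribute less; gamma=0, eps=0 reduces to cross-entropy."""
    probs = softmax(logits)
    targets = smooth_labels(labels, logits.shape[1], eps)
    log_probs = np.log(np.clip(probs, 1e-12, 1.0))
    per_sample = -(targets * (1.0 - probs) ** gamma * log_probs).sum(axis=1)
    return per_sample.mean()


def step_decay_lr(base_lr, epoch, step_size=10, decay=0.1):
    """Step decay: multiply the learning rate by `decay` every `step_size` epochs.
    step_size and decay are illustrative values, not the report's schedule."""
    return base_lr * decay ** (epoch // step_size)


def fuse_predictions(prob_list, weights):
    """Weighted late fusion (assumed form): normalized weighted average of class
    probabilities from several models, each of shape (num_samples, num_classes)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_list)  # shape (num_models, num_samples, num_classes)
    return np.tensordot(w, stacked, axes=1)  # shape (num_samples, num_classes)


if __name__ == "__main__":
    logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
    labels = np.array([0, 1])
    print("focal loss:", focal_loss(logits, labels, gamma=2.0, eps=0.1))
    print("lr at epoch 25:", step_decay_lr(0.01, 25))
    fused = fuse_predictions([softmax(logits), softmax(2.0 * logits)], [0.6, 0.4])
    print("fused predictions:", fused.argmax(axis=1))
```

In this sketch the fusion weights are fixed by hand. In practice, such weights are often tuned on a validation split, though the abstract does not say how the submission chose them.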