Spoken Language Understanding (SLU) is a core component of conversational systems, enabling machines to interpret user utterances. Despite its importance, developing effective SLU systems remains challenging due to the scarcity of labeled training data and the computational burden of deploying Large Language Models (LLMs) in real-world applications. To alleviate these issues, we propose AFD-SLU, an Adaptive Feature Distillation framework that transfers rich semantic representations from a General Text Embeddings (GTE)-based teacher model to a lightweight student model. Our method introduces a dynamic adapter equipped with a Residual Projection Neural Network (RPNN) to align heterogeneous feature spaces, and a Dynamic Distillation Coefficient (DDC) that adaptively modulates distillation strength based on real-time feedback from intent and slot prediction performance. Experiments on the Chinese profile-based ProSLU benchmark demonstrate that AFD-SLU achieves state-of-the-art results: 95.67% intent accuracy, 92.02% slot F1, and 85.50% overall accuracy.
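The two mechanisms named in the abstract, the RPNN adapter and the DDC, can be sketched as follows. This is a minimal illustration under assumed feature shapes and an assumed coefficient schedule; the function names, the ReLU hidden layer, and the linear performance-based schedule are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def rpnn_project(h_student, W_skip, W1, W2):
    """Residual projection sketch: map student features into the teacher's
    (higher-dimensional) feature space via a linear skip path plus a
    nonlinear path. Both paths and the ReLU are assumptions."""
    hidden = np.maximum(0.0, h_student @ W1)   # nonlinear path
    return h_student @ W_skip + hidden @ W2    # residual skip + projection

def ddc(intent_acc, slot_f1, base=1.0):
    """Dynamic distillation coefficient sketch: weaken distillation as task
    performance improves (one plausible feedback schedule, an assumption)."""
    task_score = 0.5 * (intent_acc + slot_f1)
    return base * max(0.0, 1.0 - task_score)

def distill_loss(student_proj, teacher_feat, coeff):
    """Feature-distillation term: coefficient-weighted MSE between the
    projected student features and the teacher features."""
    return coeff * np.mean((student_proj - teacher_feat) ** 2)
```

Under this sketch, a training step would project the student's hidden states with `rpnn_project`, compute `coeff = ddc(...)` from the current intent/slot metrics, and add `distill_loss` to the task losses; higher task scores shrink the distillation term, letting the student rely more on supervised signals late in training.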