This paper approaches humor detection through a linguistic lens, prioritizing syntactic, semantic, and contextual features over purely computational methods in Natural Language Processing. Features are organized along these three dimensions and include lexicons, structural statistics, Word2Vec, WordNet, and phonetic style. Our proposed model, ColBERT, uses BERT sentence embeddings fed into parallel hidden layers to capture sentence congruity, and is trained for humor detection on the combined syntactic, semantic, and contextual feature set. Feature engineering examines the most informative syntactic and semantic features alongside the BERT embeddings, while SHAP interpretations and decision trees identify the most influential ones. The results show that this holistic approach improves humor detection accuracy on unseen data: integrating linguistic cues across dimensions helps the model capture the complexity of humor beyond what traditional computational methods achieve.
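The architecture described above can be sketched in miniature: each sentence's embedding passes through its own hidden layer, the parallel outputs are concatenated with handcrafted linguistic features, and a final layer scores humor. All sizes, weights, and inputs below are illustrative stand-ins (real BERT embeddings are 768-dimensional), not the actual ColBERT implementation:

```python
import math
import random

random.seed(0)

def dense(x, w, b):
    """One fully connected layer with ReLU: max(0, Wx + b)."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def rand_matrix(rows, cols, scale=0.05):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 3 sentences per text, 8-dim stand-in embeddings,
# 4 handcrafted linguistic features, 5 hidden units per parallel path.
N_SENT, D_EMB, D_FEAT, D_HID = 3, 8, 4, 5

# Stand-ins for real BERT sentence embeddings and engineered features.
sent_embs = [[random.gauss(0.0, 1.0) for _ in range(D_EMB)] for _ in range(N_SENT)]
ling_feats = [random.gauss(0.0, 1.0) for _ in range(D_FEAT)]

# Parallel hidden layers: a separate weight set per sentence path.
paths = [(rand_matrix(D_HID, D_EMB), [0.0] * D_HID) for _ in range(N_SENT)]
hidden = [h for (w, b), emb in zip(paths, sent_embs) for h in dense(emb, w, b)]

# Concatenate the parallel sentence paths with the linguistic features,
# then score with a final linear layer and a sigmoid for the binary label.
combined = hidden + ling_feats
w_out = [random.gauss(0.0, 0.05) for _ in combined]
logit = sum(wi * xi for wi, xi in zip(w_out, combined))
prob_humor = 1.0 / (1.0 + math.exp(-logit))
```

The key design choice mirrored here is that sentences are processed in parallel rather than as one flat sequence, so the final layer can compare per-sentence representations when judging congruity.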
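The feature-ranking step can likewise be illustrated with a single-split decision stump, a deliberately simpler proxy for the SHAP and decision-tree analysis mentioned above. The toy data, the five feature columns, and the "informative" feature are all fabricated for illustration:

```python
import random

random.seed(0)

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def stump_impurity(xs, ys, thresh=0.0):
    """Weighted Gini impurity after splitting one feature column at a threshold."""
    left = [y for x, y in zip(xs, ys) if x <= thresh]
    right = [y for x, y in zip(xs, ys) if x > thresh]
    n = len(ys)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Toy data: 5 hypothetical linguistic features per text; the label is
# constructed to depend only on feature 2 (imagine a phonetic-style score).
X = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [1 if row[2] > 0 else 0 for row in X]

# Rank features by how cleanly a single split separates humorous texts:
# the lowest post-split impurity marks the most influential feature.
scores = [stump_impurity([row[f] for row in X], y) for f in range(5)]
best_feature = min(range(5), key=lambda f: scores[f])
```

On this fabricated data the stump recovers feature 2 as the most influential, which is the same kind of attribution, at much coarser granularity, that SHAP values provide for the full model.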