Humour-aware information retrieval poses challenges beyond standard semantic retrieval, as systems must account not only for topical relevance but also for humour-specific linguistic phenomena such as wordplay, phonetic ambiguity, and polysemy. In this paper, Team DUTH studies multilingual humour-aware information retrieval using the CLEF 2025 JOKER Task 1 benchmark, which evaluates humour retrieval in English and Portuguese. Our approach combines multilingual XLM-RoBERTa-based dense retrieval with additional system variants, including neural re-ranking, to assess the extent to which general-purpose Transformer models can capture humour-specific relevance. The results reveal substantial cross-lingual variation. While the Portuguese runs perform comparatively well on MAP, MRR, and early-precision metrics, the English runs perform substantially worse, with relevant humorous documents frequently appearing at lower ranks. These findings highlight the limitations of purely semantic dense representations for humour retrieval, particularly when humour depends on surface-level cues that multilingual encoders do not explicitly model. We further analyse factors contributing to this discrepancy, including dataset characteristics, query-document alignment, and variation in humour mechanisms. Overall, the Team DUTH experiments establish multilingual dense-retrieval and re-ranking baselines and provide insights into the challenges of modelling humour-aware relevance within the JOKER framework.