Reverse Dictionary (RD) is the task of obtaining the most relevant word or set of words given a textual description or dictionary definition. Effective RD methods have applications in accessibility, translation, and writing-support systems. Moreover, RD is used in NLP research to benchmark text encoders at various granularities, as it often requires word, definition, and sentence embeddings. In this paper, we propose a simple approach to RD that leverages LLMs in combination with embedding models. Despite its simplicity, this approach outperforms supervised baselines on well-studied RD datasets, while also showing less overfitting. We also conduct a number of experiments on different dictionaries and analyze how different styles, registers, and target audiences impact the quality of RD systems. We conclude that, on average, untuned embeddings alone fall well below an LLM-only baseline (although they are competitive on highly technical dictionaries), but are crucial for boosting performance in combined methods.
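The combined approach described above can be sketched as a two-stage pipeline: an LLM proposes candidate words for the query description, and an embedding model re-ranks them by similarity between the query and each candidate's dictionary definition. The sketch below is illustrative only, with both the LLM call and the embedding model replaced by self-contained stand-ins (a hard-coded candidate list and a bag-of-words embedding); it is not the paper's actual implementation.

```python
# Hypothetical sketch of an LLM + embedding reverse-dictionary pipeline.
# llm_candidates() and embed() are stand-ins, not the paper's components.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words token counts. A real system
    # would use a sentence-embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def llm_candidates(description: str) -> list[str]:
    # Stand-in for an LLM prompted with something like:
    # "List words matching this definition: {description}".
    return ["telescope", "microscope", "binoculars"]


def reverse_dictionary(description: str, definitions: dict[str, str]) -> list[str]:
    # Stage 1: the LLM proposes candidates for the description.
    # Stage 2: embeddings re-rank candidates by similarity between
    # the query and each candidate's dictionary definition.
    query_vec = embed(description)
    candidates = [w for w in llm_candidates(description) if w in definitions]
    return sorted(
        candidates,
        key=lambda w: cosine(query_vec, embed(definitions[w])),
        reverse=True,
    )


defs = {
    "telescope": "optical instrument for viewing distant objects",
    "microscope": "optical instrument for viewing very small objects",
    "binoculars": "optical instrument with a lens for each eye",
}
print(reverse_dictionary("an instrument used to see objects that are far away", defs))
```

In this toy setting, the embedding stage breaks ties among the LLM's candidates using their definitions, which mirrors the paper's finding that embeddings alone are weak but boost a combined system.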