Crosswords are a form of word puzzle that requires the solver to demonstrate a high degree of proficiency in natural language understanding, wordplay, reasoning, and world knowledge, while adhering to character and length constraints. In this paper, we tackle the challenge of solving crosswords with Large Language Models (LLMs). We demonstrate that the current generation of state-of-the-art (SoTA) language models shows significant competence at deciphering cryptic crossword clues, outperforming previously reported SoTA results by a factor of 2-3 on relevant benchmarks. We also develop a search algorithm that builds on this performance to tackle, for the first time, the problem of solving full crossword grids with LLMs, achieving an accuracy of 93% on New York Times crossword puzzles. Contrary to previous work in this area, which concluded that LLMs lag significantly behind human expert performance, our research suggests that this gap is considerably narrower.