The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

Linguistics olympiad problems (LOPs) are self-contained puzzles, each consisting of a scaled-down corpus representative of certain linguistic phenomena, from which the solver must deduce a basic set of rules of the language and then translate a new set of elements. The linguistics olympiads (LOs) have become a worldwide phenomenon, with 43 different territories taking part in the International Linguistics Olympiad (IOL) 2025. While the typology and solving strategies of LOPs have been analysed, their scientific facet and connections to academic linguistics have yet to be explored. LOPs are directly connected to many linguistic fields, e.g., linguistic typology, linguistic relativity, and linguistic fieldwork. Recently, LOPs have become a research focus as benchmarks for large language models, which highlights their usefulness in computational linguistics. Nevertheless, they have not yet been integrated into mainstream linguistics research. This paper attempts to open new directions for including this particular type of puzzle in academic research by offering a structured evaluation of LOPs as linguistic data sources and proposing criteria for their responsible use. Starting from a set of over 1800 LOPs, this study critically examines the potential of LOPs as a novel corpus for linguistics research by discussing their strengths and limitations as tools, as well as the areas of linguistics into which these problems could fit. This work forms the foundation for a broader initiative aimed at bridging the gap between LOs and academic linguistics by establishing a robust theoretical framework for LOPs.