A bounded rationality account of dependency length minimization in Hindi

The principle of DEPENDENCY LENGTH MINIMIZATION, which seeks to keep syntactically related words close together in a sentence, is thought to universally shape the structure of human languages for effective communication. However, the extent to which dependency length minimization is applied in human language systems is not yet fully understood. Placing long constituents before short ones preverbally, and short constituents before long ones postverbally, is known to minimize the overall dependency length of a sentence. In this study, we test the hypothesis that word order preferences in Hindi (an SOV language) are explained by placing only the shortest preverbal constituent next to the main verb, rather than by global minimization of dependency length. We characterize this approach as a least-effort strategy because it is a cost-effective way to shorten all dependencies between the verb and its preverbal dependents. As such, this approach is consistent with the bounded-rationality perspective, according to which decision making is governed by "fast but frugal" heuristics rather than by a search for optimal solutions. Consistent with this idea, our results indicate that sentences in the Hindi-Urdu Treebank corpus are better explained by the least-effort strategy than by global minimization of dependency lengths. Additionally, for the task of distinguishing corpus sentences from counterfactual variants, we find that the dependency length and the constituent length of the constituent closest to the main verb are much better predictors of whether a sentence appeared in the corpus than total dependency length. Overall, our findings suggest that cognitive resource constraints play a crucial role in shaping natural languages.
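To make the contrast between the two strategies concrete, the following is a minimal illustrative sketch, not the procedure used in the study. It rests on simplifying assumptions that are ours: each preverbal constituent is represented only by its length in words, the head of each constituent is its final word, the verb is sentence-final, and only dependencies between the verb and its preverbal dependents are counted. The function names are hypothetical.

```python
from itertools import permutations


def total_dependency_length(lengths):
    """Sum of verb-dependent distances for a sentence-final verb.

    `lengths` lists the preverbal constituents in linear order, each
    given as a word count. Assumption: the head of each constituent is
    its last word, so its distance to the verb is the number of words
    in all later constituents plus one.
    """
    total = 0
    for i in range(len(lengths)):
        total += sum(lengths[i + 1:]) + 1
    return total


def globally_minimal_order(lengths):
    """Exhaustive search for the order with the smallest total length."""
    return list(min(permutations(lengths), key=total_dependency_length))


def least_effort_order(lengths):
    """Move only the shortest constituent next to the verb.

    All other constituents keep their original relative order.
    """
    rest = list(lengths)
    shortest = min(rest)
    rest.remove(shortest)  # removes the first occurrence only
    return rest + [shortest]


if __name__ == "__main__":
    original = [1, 3, 5]  # short-before-long, the worst case preverbally
    for name, order in [
        ("original", original),
        ("least effort", least_effort_order(original)),
        ("global minimum", globally_minimal_order(original)),
    ]:
        print(name, order, total_dependency_length(order))
```

Under these assumptions, three constituents of lengths 1, 3, and 5 in that order give a total of 16. Moving only the shortest constituent next to the verb gives 10, while the fully sorted long-before-short order gives 8. In this toy case the single move recovers most of the available reduction without searching over all orders.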
