Recent years have witnessed a steep increase in the number of linguistic databases capturing syntactic variation. We survey and describe 21 publicly available morpho-syntactic databases, focusing on properties such as data structure, user interface, documentation, formats, and overall user-friendliness. We demonstrate that all the surveyed databases can be fruitfully categorized along two dimensions: the unit of description and the design principle. The unit of description is the type of data the database represents (languages, constructions, or expressions). The design principle captures the internal logic of the database. We identify three primary design principles, which vary in their descriptive power, granularity, and complexity: monocategorization, multicategorization, and structural decomposition. We describe how these design principles are implemented in concrete databases and discuss their advantages and limitations. Finally, we outline essential desiderata for future linguistic databases.