The frequency of the preferred order for a noun phrase formed by demonstrative, numeral, adjective and noun has received significant attention over the last two decades. We investigate the actual distribution of the 24 possible orders. There is no consensus on whether this distribution is better fitted by an exponential or a power-law distribution. We find that an exponential distribution is a much better model. This finding, together with other cases where an exponential-like distribution is found, challenges the view that power-law distributions, e.g., Zipf's law for word frequencies, are inevitable. We also investigate which of two exponential distributions gives a better fit: an exponential model where all 24 orders have non-zero probability (a geometric distribution truncated at rank 24) or an exponential model where the number of orders that can have non-zero probability is variable (a right-truncated geometric distribution). When consistency and generalizability are prioritized, we find higher support for the exponential model where all 24 orders have non-zero probability. These findings strongly suggest that there is no hard constraint on word order variation and that unattested orders merely result from undersampling, consistent with Cysouw's view.
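To make the comparison above concrete, the following is a minimal sketch of how rank-frequency counts could be scored under a truncated geometric model and a truncated power-law model. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the parametrization (`q`, `alpha`, `r_max`), the grid search, and the synthetic counts are all hypothetical. Setting `r_max = 24` corresponds to the model in which every order has non-zero probability, while a smaller `r_max` plays the role of the right-truncated variant.

```python
import math

N_ORDERS = 24  # possible orders of demonstrative, numeral, adjective and noun


def truncated_geometric_pmf(q, r_max):
    """p(r) proportional to q * (1 - q)**(r - 1) for ranks 1..r_max, renormalized."""
    weights = [q * (1.0 - q) ** (r - 1) for r in range(1, r_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def truncated_power_law_pmf(alpha, r_max):
    """p(r) proportional to r**(-alpha) for ranks 1..r_max, renormalized."""
    weights = [r ** (-alpha) for r in range(1, r_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(counts, pmf):
    """Log-likelihood (up to an additive constant) of rank counts under pmf.

    counts[i] is the frequency of the order with rank i + 1. A rank beyond
    len(pmf) has probability zero, so a non-zero count there yields -inf.
    """
    ll = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue
        if i >= len(pmf):
            return float("-inf")
        ll += c * math.log(pmf[i])
    return ll


def fit_by_grid(counts, pmf_family, r_max, grid):
    """Pick the grid value with the highest log-likelihood (a crude stand-in for MLE)."""
    best = max(grid, key=lambda t: log_likelihood(counts, pmf_family(t, r_max)))
    return best, log_likelihood(counts, pmf_family(best, r_max))


def synthetic_counts():
    """Hypothetical, exponentially decaying counts; NOT data from the paper."""
    return [round(500 * 0.75 ** r) for r in range(N_ORDERS)]


if __name__ == "__main__":
    counts = synthetic_counts()
    q_grid = [i / 100 for i in range(1, 100)]
    alpha_grid = [i / 10 for i in range(1, 51)]
    q_hat, ll_geo = fit_by_grid(counts, truncated_geometric_pmf, N_ORDERS, q_grid)
    a_hat, ll_pow = fit_by_grid(counts, truncated_power_law_pmf, N_ORDERS, alpha_grid)
    print("geometric:", q_hat, ll_geo)
    print("power law:", a_hat, ll_pow)
```

On counts that decay exponentially by construction, the geometric model attains the higher log-likelihood, as expected. A real comparison would use attested frequencies and an information criterion that penalizes the extra `r_max` parameter of the right-truncated variant.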