Rhythm of the Deep: A Computational-Linguistic Test of Duality of Patterning in Sperm Whale Codas

Human language is often described as structured at two levels: lower-level units combine into larger units, which in turn combine into longer sequences. We test for this design feature, duality of patterning, in sperm whale codas using 1,483 codas from the Dominica Sperm Whale Project. Because acoustic similarity can mimic symbolic structure, we treat the problem as computational-linguistic structure discovery from continuous audio rather than as a direct claim about language or meaning. We use a consensus of frozen audio encoders, held-out structural tests, per-statistic nulls, and acoustic-null recoverability gates. The evidence supports a narrow two-tier architecture. At the lower tier, clicks compose into codas not through a stable ordering rule but through which clicks are present, together with their inter-click rhythm. At the upper tier, coda tokens show bout-level sequential dependence, with a second-order transfer-entropy lift of 0.132 bits under the Nemenman-Shafee-Bialek (NSB) estimator (p = 0.002). Under tempo scaling, encoder-derived click identity is strongly rate-bound, whereas coda identity remains substantially more stable, yielding a measurable abstraction gradient across the click-to-coda step. Rhythm-only baselines recover substantial lower-tier structure but fail to reproduce the upper-tier sequential-dependence signal. We do not claim language, semantics, perception, or human-like phonemes. Instead, we report representation-level evidence for a duality-of-patterning-like architecture whose lower tier is rhythmic rather than segmental, and we provide a portable, null-controlled framework for testing combinatorial structure in induced acoustic token systems.
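
To make the upper-tier test concrete, the following is a minimal sketch of one plausible reading of a second-order dependence statistic with a shuffle null. It is an illustration, not the pipeline used here. It assumes that the "lift" is the conditional mutual information I(X_t; X_{t-2} | X_{t-1}) over coda tokens within a bout, it swaps in a simple plug-in (maximum-likelihood) estimator in place of the NSB estimator, and it uses a within-bout shuffle as the null. The function names and the toy token sequences are hypothetical.

```python
import math
import random
from collections import Counter


def second_order_lift(bouts):
    """Plug-in estimate (bits) of I(X_t; X_{t-2} | X_{t-1}), pooled over bouts.

    Assumption: this plug-in estimator stands in for the NSB estimator.
    """
    triples = Counter()
    for bout in bouts:
        # Consecutive token triples (x_{t-2}, x_{t-1}, x_t) inside one bout.
        for a, b, c in zip(bout, bout[1:], bout[2:]):
            triples[(a, b, c)] += 1
    n = sum(triples.values())
    if n == 0:
        return 0.0
    # Marginal counts needed for the conditional mutual information.
    left, right, middle = Counter(), Counter(), Counter()
    for (a, b, c), k in triples.items():
        left[(a, b)] += k
        right[(b, c)] += k
        middle[b] += k
    lift = 0.0
    for (a, b, c), k in triples.items():
        ratio = k * middle[b] / (left[(a, b)] * right[(b, c)])
        lift += (k / n) * math.log2(ratio)
    return lift


def shuffle_null_p_value(bouts, n_perm=199, seed=0):
    """One-sided permutation p-value against a within-bout shuffle null.

    Assumption: shuffling tokens inside each bout keeps token counts and
    destroys order; it is a simpler null than a first-order surrogate.
    """
    rng = random.Random(seed)
    observed = second_order_lift(bouts)
    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for bout in bouts:
            tokens = list(bout)
            rng.shuffle(tokens)
            shuffled.append(tokens)
        if second_order_lift(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy bouts in which each token is the opposite of the
    # token two steps back, so the previous token alone is uninformative.
    toy_bouts = [list("AABB" * 5) for _ in range(3)]
    lift, p = shuffle_null_p_value(toy_bouts)
    print(f"lift = {lift:.3f} bits, p = {p:.3f}")
```

Because the plug-in estimator is biased upward on small samples, the comparison against the shuffled distribution matters more than the raw value. A bias-corrected estimator such as NSB changes the magnitude of the statistic but not the logic of the permutation test.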
