We present vega-mir, an open-source Python library that bundles nine information-theoretic and statistical metrics for the analysis of symbolic music corpora behind a small, tested, citable API, and we demonstrate two of them at corpus scale in case studies not addressed by the upstream Cygnus paper. Of the nine metrics, three (Shannon entropy, Kullback-Leibler divergence, Zipfian fits) were deployed in the companion Cygnus arXiv preprint; two (network analysis of chord-transition graphs and spectral analysis of rubato curves) are deployed in full case studies here; the remaining four (multi-dimensional Gini coefficient, chi-squared stationarity test, Higuchi fractal dimension, interval distribution) are validated against analytic anchors and exercised as sanity checks on a bundled 8-composer dataset. The two case studies yield two main observations. First, on the fourteen MAESTRO composers with N >= 10 pieces, the PageRank value of the gravity-centre node correlates with the marginal Kullback-Leibler divergence at rho = 0.61 (Spearman; composer-level jackknife, N = 14), whereas the categorical identity of the gravity-centre node, which takes five distinct values across the corpus, is not itself correlated with marginal KL (rho = 0.13, p = 0.21). Second, on the 247-piece Bach multi-performer corpus (Schiff, Gould, Richter), Gould has the highest periodicity ratio of the three performers, not the lowest, which inverts the cliché that low scalar rubato reads as "metronomic": Gould's rubato is small in amplitude but structured in time, with a median dominant period of 66 beats against 102 for Schiff and 104 for Richter.
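The first case study combines a graph statistic with a distributional one. The sketch below illustrates the two ingredients with plain NumPy rather than vega-mir's own API, which is not shown here. It rests on several assumptions that are not stated in the abstract: the chord-transition graph is a weighted directed graph of first-order transitions, the "gravity centre" is taken to be the node with the highest PageRank, and "marginal KL" is taken to be KL(composer chord marginal || reference marginal). The toy progression and the uniform reference marginal are made up for illustration.

```python
import numpy as np


def transition_counts(chords, labels):
    """Count first-order transitions between consecutive chord labels."""
    index = {c: i for i, c in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for a, b in zip(chords[:-1], chords[1:]):
        counts[index[a], index[b]] += 1.0
    return counts


def pagerank(counts, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a weighted directed graph (row = source node)."""
    n = counts.shape[0]
    out = counts.sum(axis=1, keepdims=True)
    # Row-normalise edge weights; nodes with no outgoing edges jump uniformly.
    P = np.where(out > 0, counts / np.where(out > 0, out, 1.0), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = (1.0 - damping) / n + damping * (r @ P)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats between two count or probability vectors."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))


# Toy progression for one hypothetical composer.
chords = ["I", "IV", "V", "I", "vi", "IV", "V", "I"]
labels = sorted(set(chords))
ranks = pagerank(transition_counts(chords, labels))

# Assumption: the gravity centre is the node with the highest PageRank.
g = int(np.argmax(ranks))
gravity_centre, gravity_pagerank = labels[g], float(ranks[g])

# Assumption: "marginal KL" compares the composer's chord marginal against
# a corpus-wide reference marginal (here a made-up uniform one).
composer_marginal = np.array([chords.count(c) for c in labels])
reference_marginal = np.ones(len(labels))
marginal_kl = kl_divergence(composer_marginal, reference_marginal)
```

At corpus scale one would collect one `(gravity_pagerank, marginal_kl)` pair per composer and rank-correlate them, for example with `scipy.stats.spearmanr`; a composer-level jackknife would recompute rho with each composer left out in turn.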
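The second case study separates rubato amplitude from rubato structure. The sketch below shows one standard way to extract a dominant period from a rubato curve (a real FFT with the mean removed) using NumPy only, not vega-mir's API. The abstract does not define the periodicity ratio, so the ratio computed here (the share of non-zero-frequency spectral power in the peak bin) is a hypothetical stand-in, and the assumption of one rubato sample per beat is likewise illustrative.

```python
import numpy as np


def dominant_period(rubato, beat_spacing=1.0):
    """Dominant period (in beats) of a rubato curve, plus a hypothetical periodicity ratio."""
    x = np.asarray(rubato, dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency bin carries no power
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=beat_spacing)  # cycles per beat
    power, freqs = power[1:], freqs[1:]  # drop the zero-frequency bin
    k = int(np.argmax(power))
    period = 1.0 / freqs[k]
    # Hypothetical definition: share of non-DC spectral power in the peak bin.
    ratio = float(power[k] / power.sum())
    return period, ratio


rng = np.random.default_rng(0)
t = np.arange(512)  # one rubato sample per beat (assumption)
# Small-amplitude but periodic tempo deviation, plus a little noise.
structured = 0.05 * np.sin(2 * np.pi * t / 64) + rng.normal(0.0, 0.01, t.size)
# Larger-amplitude but unstructured deviation.
unstructured = rng.normal(0.0, 0.2, t.size)

p_s, r_s = dominant_period(structured)
p_u, r_u = dominant_period(unstructured)
```

Because the ratio is a share of total power, it is invariant to the amplitude of the curve: a small but periodic deviation can score far higher than a larger, noisy one, which is the pattern the Gould result describes. A per-performer median dominant period would be the median of `p_s`-style values over that performer's pieces.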