Digital-humanities work on semantic shift often alternates between handcrafted close readings and opaque embedding machinery. We present a reproducible, expert-system-style pipeline that quantifies lexical drift and its instability in the Old Bailey Corpus (1674–1913), coupling interpretable trajectories with legally meaningful axes. We bin proceedings by decade, dynamically merging low-resource slices; train skip-gram embeddings; align the resulting spaces to a 1900s anchor with orthogonal Procrustes; and measure both geometric displacement and neighborhood turnover. Split-half baselines and seed-sensitivity checks separate within-bin instability from genuine temporal change. Three visual-analytics outputs (drift magnitudes, semantic trajectories, and movement along a mercy-versus-retribution axis) expose how justice, crime, poverty, and insanity evolve alongside penal reforms, transportation debates, and Victorian moral politics. The pipeline is implemented as auditable scripts so that the analysis can be reproduced on other historical corpora.
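The three measurements named above (geometric displacement, neighborhood turnover, and position on a mercy-versus-retribution axis) can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names, the toy two-dimensional vectors, the use of cosine distance for displacement, Jaccard distance over the k nearest neighbors for turnover, and the choice of "mercy" and "punishment" as axis poles are all assumptions made for illustration. The sketch also assumes the two decade spaces have already been aligned (for example by orthogonal Procrustes).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def displacement(word, early, late):
    """Cosine distance between a word's vectors in two aligned spaces."""
    return 1.0 - cosine(early[word], late[word])

def neighbors(word, space, k):
    """The k words most similar to `word` within one space."""
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(others[:k])

def turnover(word, early, late, k=2):
    """Jaccard distance between the word's k-nearest-neighbor sets."""
    a, b = neighbors(word, early, k), neighbors(word, late, k)
    return 1.0 - len(a & b) / len(a | b)

def axis_score(word, space, pole_pos, pole_neg):
    """Cosine of the word with the axis running from pole_neg to pole_pos."""
    axis = [p - n for p, n in zip(space[pole_pos], space[pole_neg])]
    return cosine(space[word], axis)

# Hypothetical toy vectors, not corpus results: only "justice" moves.
shared = {
    "mercy": [1.0, 0.0],
    "pardon": [0.9, 0.2],
    "punishment": [-1.0, 0.0],
    "gallows": [-0.9, 0.2],
}
early = dict(shared, justice=[-0.6, 0.8])
late = dict(shared, justice=[0.6, 0.8])

drift = displacement("justice", early, late)
churn = turnover("justice", early, late, k=2)
early_axis = axis_score("justice", early, "mercy", "punishment")
late_axis = axis_score("justice", late, "mercy", "punishment")
print(drift, churn, early_axis, late_axis)
```

In this toy setup "justice" starts on the retribution side of the axis and ends on the mercy side, and its two nearest neighbors are replaced entirely, while a word whose vector does not change, such as "mercy", has zero displacement. On real embeddings the same quantities would be compared against the split-half and seed-sensitivity baselines before being read as temporal change.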