Many European languages have rich histories of biblical translation, yet existing corpora, which prioritize linguistic breadth, rarely capture this depth. To address this gap, we introduce a multilingual corpus of 657 New Testament translations, 352 of which are unique, with unprecedented depth in five languages: English (208 unique versions out of 396 total), French (41 of 78), Italian (18 of 33), Polish (30 of 48), and Spanish (55 of 102). The translations are aggregated from 12 online biblical libraries and one preexisting corpus, and each is manually annotated with metadata that maps the text to a standardized identifier for the work, its specific edition, and its year of revision. This canonicalization lets researchers define "uniqueness" to suit their own needs: they can conduct micro-level analyses of translation families, such as the KJV lineage, or macro-level studies that deduplicate closely related texts. As the first resource designed for such flexible, multilevel analysis, our corpus establishes a new benchmark for the quantitative study of translation history.
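To make the idea of user-defined "uniqueness" concrete, the following is a minimal sketch of how the three metadata levels (work, edition, year of revision) could drive grouping and deduplication. It is not the corpus's actual schema or tooling: the field names `source`, `work_id`, `edition_id`, and `revision_year`, the `group_by` helper, and the four toy records are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    # All field names are hypothetical; the real metadata schema may differ.
    source: str          # library the text was collected from
    work_id: str         # standardized identifier for the work
    edition_id: str      # identifier for the specific edition
    revision_year: int   # year of revision


def group_by(records, level):
    """Group records by the chosen notion of uniqueness.

    level is "work" (coarsest), "edition", or "revision" (finest).
    """
    keys = {
        "work": lambda r: (r.work_id,),
        "edition": lambda r: (r.work_id, r.edition_id),
        "revision": lambda r: (r.work_id, r.edition_id, r.revision_year),
    }
    key = keys[level]
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


# Toy records for illustration only, not drawn from the corpus.
records = [
    Translation("library_a", "KJV", "KJV-1611", 1611),
    Translation("library_b", "KJV", "KJV-1769", 1769),
    Translation("library_c", "KJV", "KJV-1769", 1769),
    Translation("library_a", "ASV", "ASV-1901", 1901),
]

# Macro-level view: collapse each work to a single group.
by_work = group_by(records, "work")

# Finest view: only copies with identical work, edition, and year are merged.
by_revision = group_by(records, "revision")

# Micro-level view: every collected member of one translation family.
kjv_family = by_work[("KJV",)]
```

Under these assumptions, a macro-level study would keep one representative per key in `by_work`, while a micro-level study of a translation family would iterate over all members of a single group such as `kjv_family`.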