Identifying parallel passages in biblical Hebrew is foundational to biblical scholarship for uncovering intertextual relationships. Traditional methods rely on manual comparison, which is labor-intensive and prone to human error. This study evaluates the potential of pre-trained transformer-based language models, including E5, AlephBERT, MPNet, and LaBSE, for detecting textual parallels in the Hebrew Bible. Focusing on known parallels between the books of Samuel/Kings and Chronicles, I assessed each model's capability to generate word embeddings that distinguish parallel from non-parallel passages. Using cosine similarity and Wasserstein distance measures, I found that E5 and AlephBERT show significant promise, with E5 excelling in parallel detection and AlephBERT demonstrating stronger non-parallel differentiation. These findings indicate that pre-trained models can enhance the efficiency and accuracy of detecting intertextual parallels in ancient texts, suggesting broader applications for ancient language studies.
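To illustrate the kind of comparison described above, the following is a minimal sketch (not the study's actual pipeline) of embedding two passages and scoring them with cosine similarity and a Wasserstein distance. The model checkpoint `intfloat/multilingual-e5-base`, the placeholder passage strings, and the choice to compute the Wasserstein distance over the two embedding vectors' value distributions are all illustrative assumptions; the study's exact formulation may differ.

```python
# Sketch: comparing two passages with cosine similarity and Wasserstein distance.
# Assumes sentence-transformers, scikit-learn, and scipy are installed.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import wasserstein_distance

# Hypothetical passage pair (placeholders, not actual verse text).
passage_a = "..."  # e.g., a passage from Samuel/Kings
passage_b = "..."  # e.g., the candidate parallel from Chronicles

# An assumed E5 checkpoint, used here purely for illustration.
model = SentenceTransformer("intfloat/multilingual-e5-base")
emb_a = model.encode(passage_a)  # 1-D numpy vector
emb_b = model.encode(passage_b)

# Cosine similarity between the two passage embeddings.
cos_sim = cosine_similarity(emb_a.reshape(1, -1), emb_b.reshape(1, -1))[0, 0]

# Wasserstein distance between the value distributions of the two embeddings
# (one plausible reading of the distance measure named in the abstract).
w_dist = wasserstein_distance(emb_a, emb_b)

print(f"cosine similarity: {cos_sim:.3f}, Wasserstein distance: {w_dist:.3f}")
```

In such a setup, higher cosine similarity and lower Wasserstein distance would be taken as evidence that two passages are parallel, with thresholds calibrated on known Samuel/Kings and Chronicles pairs.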