Identifying parallel passages in Biblical Hebrew (BH) is central to biblical scholarship, underpinning the study of intertextual relationships. Traditional methods rely on manual comparison, a labor-intensive process prone to human error. This study evaluates the potential of pre-trained transformer-based language models, including E5, AlephBERT, MPNet, and LaBSE, for detecting textual parallels in the Hebrew Bible. Focusing on known parallels between Samuel/Kings and Chronicles, I assessed each model's ability to generate word embeddings that distinguish parallel from non-parallel passages. Using cosine similarity and Wasserstein distance as measures, I found that E5 and AlephBERT show promise: E5 excels at detecting parallels, while AlephBERT better differentiates non-parallel passages. These findings indicate that pre-trained models can improve the efficiency and accuracy of detecting intertextual parallels in ancient texts, suggesting broader applications for ancient-language studies.
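The following is a minimal sketch of the two similarity measures named above, assuming a standard sentence-transformers pipeline. The model checkpoint, the sample verse pair (2 Sam 7:1 // 1 Chr 17:1, given in illustrative unpointed transcription), and the choice to apply the 1-D Wasserstein distance to the component values of the two embedding vectors are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sentence_transformers import SentenceTransformer

# One publicly available E5 variant; the paper's exact checkpoint is assumed.
model = SentenceTransformer("intfloat/multilingual-e5-base")

# Hypothetical known parallel pair; E5 models expect a "passage: " prefix.
passages = [
    "passage: ויהי כי ישב המלך בביתו ויהוה הניח לו מסביב מכל איביו",  # 2 Sam 7:1
    "passage: ויהי כאשר ישב דויד בביתו ויאמר דויד אל נתן הנביא",      # 1 Chr 17:1
]
emb = model.encode(passages, normalize_embeddings=True)  # shape (2, dim)

# Cosine similarity: with unit-normalized embeddings this is a dot product.
cos_sim = float(np.dot(emb[0], emb[1]))

# 1-D Wasserstein distance between the two embeddings' component values,
# treating each vector as an empirical distribution (one plausible reading
# of the distance measure used in the study).
wd = wasserstein_distance(emb[0], emb[1])

print(f"cosine similarity: {cos_sim:.4f}, Wasserstein distance: {wd:.4f}")
```

Under this setup, a parallel pair should yield a higher cosine similarity (and typically a lower Wasserstein distance) than a non-parallel pair drawn from the same corpora.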