Midrash collections are complex rabbinic works, comprising text in multiple languages, that evolved through long and unstable processes of oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often disputed among scholars, yet it is essential for understanding the passage and its relationship to other texts in the rabbinic corpus. To address this problem, we propose a system for classifying rabbinic literature by style, leveraging recent advances in natural language processing (NLP) for Hebrew texts. We further demonstrate how this method can be applied to uncover lost material from a specific midrash genre, Tanḥuma-Yelammedenu, that has been preserved in later anthologies.
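The abstract does not describe the classifier itself, so the following is only a minimal sketch of the general idea of style-based source attribution. It swaps in a deliberately simple technique (character-trigram profiles, with each passage assigned to the source whose summed profile is most similar by cosine similarity) and is not the Hebrew NLP approach the method leverages. The function names, labels, and four-passage toy corpus are hypothetical and invented for illustration; they are not real data or results.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after collapsing whitespace runs."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Sum the n-gram profiles of all passages attributed to each source."""
    centroids: dict[str, Counter] = {}
    for label, passages in labelled.items():
        profile: Counter = Counter()
        for passage in passages:
            profile.update(char_ngrams(passage, n))
        centroids[label] = profile
    return centroids


def classify(passage: str, centroids: dict[str, Counter], n: int = 3) -> str:
    """Return the source whose stylistic profile is most similar to the passage."""
    vec = char_ngrams(passage, n)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus: short formulaic openings, not real training data.
toy_corpus = {
    "tanhuma_yelammedenu": ["ילמדנו רבנו אדם מישראל", "ילמדנו רבנו מהו"],
    "other": ["אמר רבי יוחנן", "תנו רבנן"],
}
centroids = train_centroids(toy_corpus)
print(classify("ילמדנו רבנו מי", centroids))  # prints: tanhuma_yelammedenu
```

Character n-grams are used here only because they capture orthographic and formulaic habits without requiring a Hebrew tokenizer; a realistic system would likely need richer features and far more labelled text.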