Artificial intelligence approaches are being adapted to many research areas, including digital humanities. We developed a methodology for large-scale analysis in folkloristics. Using machine learning and natural language processing, we automatically detected motifs in a large collection of Cinderella variants and analysed their similarities and differences with clustering and dimensionality reduction. The results show that large language models can detect complex interactions in tales, enabling computational analysis of extensive text collections and facilitating cross-lingual comparisons.
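As a rough illustration of the comparison step described above, the following minimal sketch groups tale variants by the motifs they share. It is not the method used in the study: the motif labels and variant names are invented toy data, Jaccard similarity over motif sets is an assumed similarity measure, and simple threshold-based single-linkage grouping stands in for the clustering step. Dimensionality reduction is omitted.

```python
from itertools import combinations

# Hypothetical toy data: these variant names and motif labels are invented
# for illustration and do not come from the study's collection.
variants = {
    "variant_a": {"persecuted_heroine", "magical_helper", "lost_slipper"},
    "variant_b": {"persecuted_heroine", "magical_helper", "lost_slipper", "ball"},
    "variant_c": {"persecuted_heroine", "helpful_animal", "ring_token"},
    "variant_d": {"helpful_animal", "ring_token", "flight_from_home"},
}


def jaccard(a, b):
    """Jaccard similarity: shared motifs divided by all motifs in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_by_similarity(items, threshold):
    """Single-linkage grouping: variants end up in the same group when a
    chain of pairwise similarities at or above the threshold connects them."""
    parent = {name: name for name in items}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if jaccard(items[a], items[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in items:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


clusters = group_by_similarity(variants, threshold=0.5)
print(clusters)
```

With this toy data and a threshold of 0.5, variants a and b fall into one group and variants c and d into another. The threshold is an arbitrary choice for the example; a real analysis would need to justify both it and the similarity measure.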