We introduce a graph-aware autoencoder ensemble framework, with associated formalisms and tooling, designed to facilitate deep learning for scholarship in the humanities. By composing sub-architectures into a model isomorphic to a humanistic domain, we maintain interpretability while providing a function signature for each sub-architectural choice, allowing traditional and computational researchers to collaborate without disrupting established practices. We illustrate a practical application of our approach to a historical study of the American post-Atlantic slave trade, and make several specific technical contributions: a novel hybrid graph-convolutional autoencoder mechanism, batching policies for common graph topologies, and masking techniques for particular use cases. The framework's effectiveness in broadening participation across diverse domains is demonstrated by a growing suite of two dozen studies, comprising both collaborations with humanists and established tasks from the machine learning literature, and spanning a variety of fields and data modalities. We compare the performance of several architectural choices and conclude with an ambitious list of imminent next steps for this research.
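The abstract names two ingredients without specifying them: graph convolution inside an autoencoder, and masking. The sketch below is only a generic illustration of those two ideas, not the authors' hybrid mechanism. It uses the standard symmetric-normalized graph-convolution rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), and a reconstruction loss computed only over masked entries, as is common in masked-autoencoder training. All function names are hypothetical, and plain Python lists stand in for tensors so the example stays self-contained.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    # Add self-loops, then normalize symmetrically: D^{-1/2} (A + I) D^{-1/2}.
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    # One propagation step: each node mixes its neighbours' features, then ReLU.
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]


def masked_mse(pred: Matrix, target: Matrix, mask: List[List[bool]]) -> float:
    # Mean squared error over entries where mask is True (e.g. fields hidden
    # from the encoder); unmasked entries do not contribute to the loss.
    errs = [(p - t) ** 2
            for prow, trow, mrow in zip(pred, target, mask)
            for p, t, m in zip(prow, trow, mrow) if m]
    return sum(errs) / len(errs) if errs else 0.0


# Two connected nodes with one feature each; identity weights.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
hidden = gcn_layer(adj, features, [[1.0]])
loss = masked_mse(hidden, features, [[True], [False]])
```

Here `hidden` is `[[2.0], [2.0]]`, since each node averages itself with its neighbour, and `loss` is `1.0`, because only the first node's feature is masked and scored. In a full autoencoder, a decoder would map `hidden` back to feature space before the loss is taken.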