This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite that helps Digital Humanities (DH) scholars explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering pipeline with human-in-the-loop refinement. We show how users first steer this pipeline by defining analytical lenses through document rewriting prompts and instruction-based embeddings, and then align it further with their intent using cluster refinement tools and mechanisms for fine-tuning the embedding model. The demonstration walks through a typical workflow, illustrating how DH researchers can use Perspectives' interactive document map to uncover topics, sentiments, or other relevant categories, gaining insights and preparing their data for subsequent in-depth analysis.