Qualitative thematic exploration of data by hand does not scale, and researchers create and update a personalized point of view as they explore. As a result, machine learning (ML) approaches that might help with exploration are challenging to apply. We developed Teleoscope, a web-based system that supports interactive exploration of large corpora (100K-1M documents) of short documents (1-3 paragraphs each). Teleoscope provides visual programming workflows that have semantic and computational meaning, helping researchers retrace, share, and recompute their sense-making process. Aiming to create qualitative "themes" rather than "topics," our NLP approach tunes an ML model to "think like you" without significant retraining. Here, we present our two-year design process and validation of Teleoscope, including a multi-week study with qualitative researchers (N = 5), a six-month field deployment with a qualitative research group, and an ongoing public release.