Dimensionality reduction is an important tool for unraveling the complexities of high-dimensional datasets in many fields of science, such as cell biology, chemical informatics, and physics. Visualizations of dimensionally reduced data let scientists examine the intrinsic structure of their datasets and relate it to established hypotheses. Visualization researchers have accordingly proposed many dimensionality reduction methods and interactive systems designed to uncover latent structure. At the same time, individual scientific domains have formulated their own guidelines or common workflows for applying dimensionality reduction techniques and visualizations. In this work, we present a critical analysis of how dimensionality reduction is used in scientific domains outside of computer science. First, we conduct a bibliometric analysis of 21,249 academic publications that use dimensionality reduction to compare how frequently each technique is used across fields. Next, we survey a sample of 71 papers from four fields: biology, chemistry, physics, and business. Through this survey, we uncover common workflows, processes, and usage patterns, including a mixed pattern in which confirmatory data analysis is first used to validate a dataset and projection method, and exploratory data analysis is then used to generate further hypotheses. We also find that misinterpretation and inappropriate usage are common, particularly in the visual interpretation of the resulting dimensionally reduced view. Lastly, we compare our observations with recent work in the visualization community to identify where research within our community could have impact outside it.