This paper develops techniques for reconstructing a high-dimensional dataset from the complete set of its bivariate projections, as would be found in a scatterplot matrix. A graph-based solution built on clique-finding is introduced; it yields a set of candidate rows that might make up the original dataset. Complications are discussed, including phantom cliques (cliques that do not correspond to any row of the original dataset) and cases where an exact solution is impossible. Additional methods are presented: some deduce rows with certainty, while others rank the remaining candidates so that some possibilities are judged more likely than others. Results show that these methods are highly successful in recreating a significant portion of the original dataset in many cases, for both randomly generated and real-world datasets. The factors associated with a greater rate of failure are lower dimension, higher n, and lower interval.
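To make the clique-finding idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each scatterplot panel is available as a set of exact (x, y) points, so the multiplicity of repeated points is lost, and it treats each (column, value) pair as a graph node with an edge wherever a point appears in the corresponding panel. A candidate row is then a clique with exactly one node per column. The function names `bivariate_projections` and `candidate_rows` are hypothetical, chosen for illustration.

```python
from itertools import combinations


def bivariate_projections(rows):
    """Return {(i, j): set of (x_i, x_j)} for every pair of columns i < j."""
    d = len(rows[0])
    return {
        (i, j): {(r[i], r[j]) for r in rows}
        for i, j in combinations(range(d), 2)
    }


def candidate_rows(projections, d):
    """Enumerate every row consistent with all bivariate projections.

    Each candidate corresponds to a d-clique with one (column, value) node
    per column, in the graph whose edges are the observed 2-D points.
    Assumes d >= 2 so that the (0, 1) panel exists.
    """
    # Start from the values seen in column 0.
    partial = [(a,) for a in sorted({a for a, _ in projections[(0, 1)]})]
    for j in range(1, d):
        # Values seen in column j, read off the (0, j) panel.
        values_j = sorted({b for _, b in projections[(0, j)]})
        partial = [
            row + (v,)
            for row in partial
            for v in values_j
            # Keep the extension only if it has an edge to every earlier column.
            if all((row[i], v) in projections[(i, j)] for i in range(j))
        ]
    return partial


# Toy example: three rows in three dimensions.
original = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
candidates = candidate_rows(bivariate_projections(original), d=3)

# Candidates that are not rows of the original dataset are phantom cliques.
phantoms = [row for row in candidates if row not in original]
```

In this toy case every original row is recovered, but the row (0, 0, 0) also passes every pairwise check and so appears as a phantom candidate. This is the kind of ambiguity that the additional deduction and ranking methods mentioned in the abstract are meant to resolve.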