This study proposes median consensus embedding (MCE) to address the variability in low-dimensional embeddings caused by random initialization in dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE). MCE is defined as the geometric median of multiple embeddings. Treating the embeddings as independent and identically distributed random samples and applying large deviation theory, we prove that MCE is consistent and that it converges at an exponential rate. Furthermore, we develop a practical algorithm for computing MCE by constructing a distance function between embeddings based on the Frobenius norm of the pairwise distance matrices of the data points. Application to real-world data demonstrates that MCE converges rapidly and significantly reduces instability. These results confirm that MCE effectively mitigates the effects of random initialization in embedding methods.
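The construction described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the distance between two embeddings is taken to be the Frobenius norm of the difference between their pairwise Euclidean distance matrices, and the geometric median is approximated in distance-matrix space with the standard Weiszfeld iteration, which may differ from the algorithm developed in the paper. The function names (`pairwise_distances`, `embedding_distance`, `median_distance_matrix`) and the parameters `n_iter`, `tol`, and `eps` are hypothetical.

```python
import numpy as np


def pairwise_distances(Y):
    """Euclidean distance matrix between the rows of Y (n_points x dim)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def embedding_distance(Y1, Y2):
    """Assumed distance between embeddings: Frobenius norm of the
    difference between their pairwise distance matrices."""
    return np.linalg.norm(pairwise_distances(Y1) - pairwise_distances(Y2), ord="fro")


def median_distance_matrix(embeddings, n_iter=100, tol=1e-9, eps=1e-12):
    """Approximate the geometric median of the embeddings' distance
    matrices with Weiszfeld's iteration (an assumed solver)."""
    D = np.stack([pairwise_distances(Y) for Y in embeddings])  # (m, n, n)
    M = D.mean(axis=0)  # start from the arithmetic mean
    for _ in range(n_iter):
        # Frobenius distance from the current estimate to each input matrix
        dist = np.linalg.norm(D - M, axis=(1, 2))
        # Inverse-distance weights, clipped to avoid division by zero
        w = 1.0 / np.maximum(dist, eps)
        M_new = np.tensordot(w, D, axes=1) / w.sum()
        converged = np.linalg.norm(M_new - M) < tol
        M = M_new
        if converged:
            break
    return M
```

Because the distance depends only on pairwise distances, it is unchanged when an embedding is rotated, reflected, or translated, which is why embeddings from different runs can be compared without alignment. The sketch returns a median distance matrix rather than coordinates; recovering a low-dimensional embedding from it (for example, by selecting the input embedding closest to the median) is a separate step not shown here.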