We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers popular classical DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data, and we apply our method to the visualization of image datasets.
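For reference, the standard semi-relaxed Gromov-Wasserstein problem is written out below. The notation is assumed for illustration and is not taken from the text above. $\mathbf{C} \in \mathbb{R}^{N \times N}$ and $\bar{\mathbf{C}} \in \mathbb{R}^{n \times n}$ are pairwise similarity matrices for the $N$ input samples and the $n$ embedding samples. $\mathbf{h}$ is the vector of input sample weights, and $L$ is a pairwise loss, for example the squared difference.

$$
\mathrm{srGW}_L(\mathbf{C}, \mathbf{h}, \bar{\mathbf{C}}) = \min_{\mathbf{T} \in \mathcal{U}_n(\mathbf{h})} \sum_{i,j=1}^{N} \sum_{k,l=1}^{n} L\big(C_{ij}, \bar{C}_{kl}\big)\, T_{ik}\, T_{jl},
\qquad
\mathcal{U}_n(\mathbf{h}) = \big\{ \mathbf{T} \in \mathbb{R}_+^{N \times n} : \mathbf{T}\mathbf{1}_n = \mathbf{h} \big\}.
$$

Only the row marginal of $\mathbf{T}$ is constrained, so the embedding weights $\mathbf{T}^\top \mathbf{1}_N$ are left free. A plan with a single nonzero entry per row assigns each input sample to exactly one embedding sample, which is why such a plan can be read as a hard clustering.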