Accessing information in learned representations is critical for annotation, discovery, and data filtering in disciplines where high-dimensional datasets are common. We introduce What We Don't C, a novel approach based on latent flow matching that disentangles latent subspaces by explicitly removing information contained in the conditional guidance, yielding meaningful residual representations. This makes factors of variation not already captured by the conditioning more readily accessible. We show that guidance along the flow path necessarily suppresses the information carried by the conditioning variables. Our results highlight this approach as a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models to explore what we don't capture, consider, or catalog.