We present $\texttt{LAMINAR}$, a novel unsupervised machine learning pipeline designed to enhance the representation of structure within data by producing a more informative distance metric. Analysis methods in the physical sciences often rely on standard metrics to define geometric relationships in data, and these metrics may fail to capture the underlying structure of complex data sets. $\texttt{LAMINAR}$ addresses this by using a continuous normalising flow and inverse transform sampling to define a Riemannian manifold in the data space, without the need for the user to specify a metric over the data a priori. The result is a locally adaptive metric that produces structurally informative, density-based distances. We demonstrate the utility of $\texttt{LAMINAR}$ by comparing its output to the Euclidean metric for structured data sets.
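To make the idea of a density-based distance concrete, the following is a minimal illustrative sketch and not the $\texttt{LAMINAR}$ implementation. It assumes two stand-ins that do not come from the abstract: a simple Gaussian kernel density estimate replaces the continuous normalising flow, and shortest paths on a k-nearest-neighbour graph replace geodesics on the learned manifold. All function names, the bandwidth, and the toy data are hypothetical choices made for illustration only.

```python
import heapq
import math


def kde_density(points, bandwidth):
    """Unnormalised Gaussian kernel density at each point.

    Assumption: this is a crude stand-in for a density learned by a flow.
    """
    n = len(points)
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points) / n
        for p in points
    ]


def density_weighted_graph(points, dens, k):
    """Symmetric kNN graph; edge cost = Euclidean length / mean endpoint density."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        others = (j for j in range(n) if j != i)
        nearest = sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            cost = math.dist(points[i], points[j]) / (0.5 * (dens[i] + dens[j]))
            graph[i][j] = cost
            graph[j][i] = cost
    return graph


def shortest_path_cost(graph, source, target):
    """Dijkstra's algorithm; returns the cheapest path cost, or inf if unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return math.inf


# Toy data: two dense clusters on a line joined by three sparse points.
cluster_a = [(i / 20, 0.0) for i in range(21)]        # x in [0, 1], indices 0..20
bridge = [(1.5, 0.0), (2.0, 0.0), (2.5, 0.0)]         # indices 21..23
cluster_b = [(3 + i / 20, 0.0) for i in range(21)]    # x in [3, 4], indices 24..44
points = cluster_a + bridge + cluster_b

dens = kde_density(points, bandwidth=0.2)
graph = density_weighted_graph(points, dens, k=3)

# Both pairs are the same Euclidean distance apart (1.0) ...
euclid_within = math.dist(points[0], points[20])
euclid_cross = math.dist(points[20], points[22])

# ... but the density-weighted path costs differ.
within = shortest_path_cost(graph, 0, 20)   # stays inside the dense cluster
cross = shortest_path_cost(graph, 20, 22)   # leaves the cluster into the sparse gap

print("Euclidean:", euclid_within, euclid_cross)
print("Density-weighted:", within, cross)
```

In this toy setting the Euclidean metric treats the two pairs identically, whereas the density-weighted cost is larger for the pair that crosses the low-density gap, which is the qualitative behaviour a locally adaptive, density-based metric is meant to provide.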