In this paper, we are concerned with estimating the joint probability density of random variables $X$ and $Y$, given $N$ independent observation blocks $(\boldsymbol{x}^i,\boldsymbol{y}^i)$, $i=1,\ldots,N$, each consisting of $M$ samples $(\boldsymbol{x}^i,\boldsymbol{y}^i) = \bigl((x^i_j, y^i_{\sigma^i(j)}) \bigr)_{j=1}^M$, where $\sigma^i$ denotes an unknown permutation of i.i.d. sampled pairs $(x^i_j,y_j^i)$, $j=1,\ldots,M$. That is, the internal ordering of the $M$ samples within an observation block is not known. We derive a maximum-likelihood inference functional, propose a computationally tractable approximation, and analyze the properties of both. In particular, we prove a $\Gamma$-convergence result showing that we can recover the true density from empirical approximations as the number $N$ of blocks goes to infinity. Using entropic optimal transport kernels, we model a class of hypothesis spaces of density functions over which the inference functional can be minimized. This hypothesis class is particularly suited for approximate inference of transfer operators from data. We solve the resulting discrete minimization problem with a modification of the EMML algorithm that takes additional transition probability constraints into account, and we prove that this algorithm converges. Proof-of-concept examples demonstrate the potential of our method.
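To make the observation model concrete, the following minimal sketch evaluates the likelihood of a single block when the pairing is unknown. It assumes that the permutation $\sigma^i$ is uniformly distributed over all $M!$ orderings and independent of the samples, so that the block likelihood is the average of $\prod_j p(x_j, y_{\sigma(j)})$ over all permutations. The function name `block_likelihood` and the callable `p` are hypothetical and serve illustration only; this is not necessarily the inference functional derived in the paper.

```python
import itertools
import math


def block_likelihood(p, xs, ys):
    """Likelihood of one block (xs, ys) whose pairing is unknown.

    Assumption (not taken from the paper): the permutation is uniform
    over all M! orderings, so we average the product of joint densities
    p(x_j, y_sigma(j)) over every permutation sigma.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length M")
    M = len(xs)
    total = 0.0
    for sigma in itertools.permutations(range(M)):
        prod = 1.0
        for j in range(M):
            prod *= p(xs[j], ys[sigma[j]])
        total += prod
    return total / math.factorial(M)
```

By construction the value does not depend on the order in which the $y$ samples are listed, which reflects the lost pairing information. The brute-force enumeration costs $M!$ products and is therefore feasible only for very small blocks.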