Minimum-entropy coupling (MEC), the process of finding a joint distribution with minimum entropy for given marginals, has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec.
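To make the coupling problem in the abstract concrete, the following is a minimal Python sketch of a well-known greedy heuristic for two marginals: repeatedly pair the largest remaining mass in each marginal and move the smaller of the two into the joint distribution. This is an illustration only. It is not ARIMEC or any IMEC variant, and it does not use the API of the linked codebase; the names `greedy_coupling` and `entropy` and the example marginals are assumptions introduced here.

```python
import math


def greedy_coupling(p, q, tol=1e-12):
    """Greedy low-entropy coupling of two discrete marginals (a heuristic sketch).

    Returns a dict mapping (i, j) to the joint mass placed on that cell.
    """
    p, q = list(p), list(q)  # copy so the caller's marginals are untouched
    joint = {}
    while True:
        # Index of the largest remaining mass in each marginal.
        i = max(range(len(p)), key=p.__getitem__)
        j = max(range(len(q)), key=q.__getitem__)
        m = min(p[i], q[j])
        if m <= tol:  # all mass has been assigned
            break
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        p[i] -= m
        q[j] -= m
    return joint


def entropy(masses):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(m * math.log2(m) for m in masses if m > 0)


# Hypothetical example marginals.
p = [0.5, 0.5]
q = [0.5, 0.25, 0.25]
joint = greedy_coupling(p, q)
print(entropy(joint.values()))  # 1.5 bits, versus 2.5 bits for the independent coupling
```

Each iteration zeroes at least one marginal entry, so the loop runs at most `len(p) + len(q)` times. The heuristic yields a valid coupling but is not guaranteed to reach the true minimum entropy in general.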