Estimating categorical distributions under marginal constraints that summarize a sample from a population, and doing so in the most generalizable way, is key to many machine-learning and data-driven approaches. We provide a parameter-agnostic theoretical framework for this task, which ensures that (i) a categorical distribution of Maximum Entropy under marginal constraints always exists and (ii) it is unique. Iterative proportional fitting (IPF) naturally estimates that distribution from any consistent set of marginal constraints directly in the space of probabilities, thus deductively identifying a least-biased characterization of the population. Together, the theoretical framework and IPF yield a holistic workflow for modeling any class of categorical distributions using solely the phenomenological information provided.
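To make the IPF step concrete, the following is a minimal sketch in Python with NumPy and is not the authors' implementation. The function name `ipf`, the `(axes, target)` format for constraints, and the stopping rule are illustrative assumptions. It starts from the uniform distribution and rescales the joint table until each requested marginal matches its target, assuming the constraints are mutually consistent and each `axes` tuple is sorted in ascending order.

```python
import numpy as np

def ipf(shape, constraints, n_iter=200, tol=1e-12):
    """Hypothetical helper: fit a joint table to marginal constraints by IPF.

    shape       -- number of categories per variable, e.g. (2, 3)
    constraints -- list of (axes, target) pairs; `axes` is an ascending tuple
                   of the variables the marginal is over, `target` the
                   marginal probabilities over those variables
    """
    all_axes = tuple(range(len(shape)))
    # Start from the uniform distribution over all joint categories.
    p = np.full(shape, 1.0 / np.prod(shape))
    for _ in range(n_iter):
        max_err = 0.0
        for axes, target in constraints:
            # Current marginal over `axes`, kept broadcastable against p.
            summed = tuple(a for a in all_axes if a not in axes)
            current = p.sum(axis=summed, keepdims=True)
            bshape = [shape[a] if a in axes else 1 for a in all_axes]
            tgt = np.asarray(target, dtype=float).reshape(bshape)
            max_err = max(max_err, np.abs(current - tgt).max())
            # Proportional update; cells with zero current mass stay at zero.
            ratio = np.divide(tgt, current,
                              out=np.zeros_like(tgt), where=current > 0)
            p = p * ratio
        if max_err < tol:
            break
    return p

# Usage: two variables constrained only by their one-way marginals.
row = np.array([0.2, 0.8])
col = np.array([0.5, 0.3, 0.2])
p = ipf((2, 3), [((0,), row), ((1,), col)])
print(p)
```

With only one-way marginals as constraints, the fitted table in this toy case is the product of the two marginals, which is the least-biased choice because no dependence between the variables was specified. Higher-order constraints, such as pairwise marginals of three variables, can be passed in the same `(axes, target)` form.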