We commonly encounter the problem of identifying an optimally weight-adjusted version of the empirical distribution of observed data, subject to predefined constraints on the weights. Such constraints often take the form of restrictions on the moments, tail behaviour, shape, number of modes, etc., of the resulting weight-adjusted empirical distribution. In this article, we substantially enhance the flexibility of this methodology by introducing nonparametrically imbued distributional constraints on the weights and by developing a general framework that leverages the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight-adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric, while allowing for subtle departures. We demonstrate the versatility of the framework in three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning.
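The abstract does not spell out an algorithm, so the following is only a minimal sketch of the key idea, not the authors' method: among candidate weight vectors, keep those whose weighted empirical distribution lies within a tolerance eps of a reference distribution Q in Wasserstein-1 (optimal transport) distance, and return the one with the highest entropy. To stay short and dependency-free it assumes (i) one-dimensional data, (ii) Q represented by an equally weighted sample `y`, and (iii) a search restricted to the exponential-tilting family w_i ∝ exp(λ x_i) over a grid of λ values. The full problem optimizes over every weight vector on the simplex. All function names and toy data here are hypothetical.

```python
import math
from bisect import bisect_right


def tilted_weights(x, lam):
    """Exponentially tilted weights w_i proportional to exp(lam * x_i), summing to 1."""
    m = max(lam * xi for xi in x)  # shift exponents for numerical stability
    e = [math.exp(lam * xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]


def entropy(w):
    """Shannon entropy -sum w_i log w_i (terms with w_i == 0 contribute 0)."""
    return -sum(wi * math.log(wi) for wi in w if wi > 0)


def wasserstein1(x, w, y):
    """1-D W1 distance between sum_i w_i delta_{x_i} and the uniform empirical law of y.

    Uses W1 = integral of |F_P(t) - F_Q(t)| dt; both CDFs are step functions,
    so the integral is a sum over the gaps between consecutive support points.
    """
    pts = sorted(set(x) | set(y))
    atoms = sorted(zip(x, w))
    ys = sorted(y)
    total, cdf_p, k = 0.0, 0.0, 0
    for a, b in zip(pts, pts[1:]):
        # Accumulate the weighted mass of all atoms at or below a.
        while k < len(atoms) and atoms[k][0] <= a:
            cdf_p += atoms[k][1]
            k += 1
        cdf_q = bisect_right(ys, a) / len(ys)
        total += abs(cdf_p - cdf_q) * (b - a)
    return total


def restricted_max_entropy(x, y, eps, lams):
    """Highest-entropy tilted weights whose W1 distance to y is at most eps.

    Assumption: only the one-parameter family tilted_weights(x, lam), lam in lams,
    is searched. Returns (lam, weights), or None if no candidate is feasible.
    """
    best = None
    for lam in lams:
        w = tilted_weights(x, lam)
        if wasserstein1(x, w, y) <= eps:
            if best is None or entropy(w) > entropy(best[1]):
                best = (lam, w)
    return best


x = [0.0, 1.0, 2.0, 3.0, 4.0]  # observed data (toy values)
y = [2.0, 3.0, 4.0]  # sample standing in for the reference distribution Q
lams = [i / 10 for i in range(-30, 31)]
result = restricted_max_entropy(x, y, eps=0.8, lams=lams)
if result is not None:
    lam, w = result
    print(lam, [round(wi, 3) for wi in w], round(wasserstein1(x, w, y), 3))
```

Uniform weights (λ = 0) have maximal entropy but sit at W1 distance 1 from this `y`, so the search has to tilt mass toward larger values before the OT constraint is met. Restricting to a tilting family reduces the search to a one-parameter scan. Optimizing over the whole simplex is still a convex problem, because the entropy is concave and the W1 constraint can be expressed through linear constraints on a transport coupling.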