We commonly encounter the problem of identifying an optimally weight-adjusted version of the empirical distribution of observed data, subject to predefined constraints on the weights. Such constraints often take the form of restrictions on the moments, tail behaviour, shape, number of modes, etc., of the resulting weight-adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights and developing a general framework that leverages the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight-adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric, while allowing for subtle departures. We demonstrate the versatility of the framework in three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning.
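The abstract does not give an algorithm. The sketch below illustrates the key idea under several simplifying assumptions that are ours, not the paper's formulation. The data are one-dimensional. The optimal transport metric is the Wasserstein-1 distance. The pre-specified distribution is represented by a target sample `ys`. The "close to" requirement is handled as a penalty, giving the objective $\sum_i w_i \log w_i + \lambda\, W_1\big(\sum_i w_i \delta_{x_i}, Q\big)$, which is minimized over the probability simplex by exponentiated-gradient (mirror) descent. The function names and the parameters `lam`, `eta`, and `iters` are hypothetical.

```python
import math


def w1_and_subgrad(xs, w, ys):
    """Return W1(P_w, Q) and a subgradient with respect to w, where
    P_w = sum_i w[i] * delta(xs[i]) and Q is the uniform empirical
    distribution of ys. In 1D, W1 = integral of |F_w(t) - G(t)| dt."""
    grid = sorted(set(xs) | set(ys))
    value = 0.0
    grad = [0.0] * len(xs)
    for a, b in zip(grid, grid[1:]):
        # Both CDFs are constant on [a, b); evaluate them at a.
        f = sum(wi for xi, wi in zip(xs, w) if xi <= a)
        g = sum(1 for y in ys if y <= a) / len(ys)
        diff = f - g
        value += abs(diff) * (b - a)
        sign = (diff > 0) - (diff < 0)
        for i, xi in enumerate(xs):
            if xi <= a:  # w[i] contributes to F_w on this interval
                grad[i] += sign * (b - a)
    return value, grad


def max_entropy_ot_weights(xs, ys, lam=5.0, eta=0.1, iters=500):
    """Hypothetical sketch: minimize sum_i w_i log w_i + lam * W1(P_w, Q)
    over the probability simplex by exponentiated-gradient (mirror)
    descent, keeping the best iterate because the W1 term is nonsmooth."""
    n = len(xs)
    w = [1.0 / n] * n  # start from the maximum entropy (uniform) weights
    best_w, best_obj = w[:], math.inf
    for _ in range(iters + 1):
        w1, sub = w1_and_subgrad(xs, w, ys)
        neg_entropy = sum(wi * math.log(wi) for wi in w if wi > 0)
        obj = neg_entropy + lam * w1
        if obj < best_obj:
            best_w, best_obj = w[:], obj
        # Gradient of the negative entropy is log(w_i) + 1.
        grad = [math.log(max(wi, 1e-300)) + 1.0 + lam * si
                for wi, si in zip(w, sub)]
        # Multiplicative update followed by renormalization onto the simplex.
        w = [wi * math.exp(-eta * gi) for wi, gi in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w]
    return best_w
```

Take observed data `xs = [0, 1, 2, 3, 4]` and target sample `ys = [3, 3.5, 4]`. Under uniform weights, the W1 distance is 1.5. The returned weights shift mass toward the larger observations and reduce that distance. The penalty weight `lam` trades entropy against closeness to the target. With `lam = 0` the sketch returns the uniform weights. A hard constraint $W_1 \le \delta$ could be approximated by tuning `lam`, but this is only a heuristic stand-in for the framework described above.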