We study the problems of data compression, gambling and prediction of a sequence $x^n=x_1x_2...x_n$ from an alphabet ${\cal X}$, in terms of regret and expected regret (redundancy) with respect to various smooth families of probability distributions. We evaluate the regret of Bayes mixture distributions compared to maximum likelihood, under the condition that the maximum likelihood estimate is in the interior of the parameter space. For general exponential families (including the non-i.i.d.\ case) the asymptotically minimax value is achieved when variants of the prior of Jeffreys are used. Interestingly, we also obtain a modification of Jeffreys prior which has measure outside the given family of densities, to achieve minimax regret with respect to non-exponential type families. This modification enlarges the family using local exponential tilting (a fiber bundle). Our conditions are confirmed for certain non-exponential families, including curved families and mixture families (where either the mixture components or their weights of combination are parameterized) as well as contamination models. Furthermore, for mixture families we show how to deal with the full simplex of parameters. These results also provide a characterization of Rissanen's stochastic complexity.
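As a concrete illustration of regret relative to maximum likelihood, the following minimal sketch treats the simplest case only: the i.i.d. Bernoulli family with the unmodified Jeffreys prior, Beta(1/2, 1/2). The choice of this family, the function names, and the use of nats are assumptions made for illustration and are not taken from the paper. For this family the Fisher information is $1/(\theta(1-\theta))$, its square root integrates to $\pi$ over $(0,1)$, and the standard asymptotic value for a one-parameter family is $\frac{1}{2}\log\frac{n}{2\pi} + \log\pi$.

```python
import math


def jeffreys_mixture_log_prob(k: int, n: int) -> float:
    """Log-probability (nats) that the Beta(1/2, 1/2) mixture assigns to a
    particular binary sequence of length n containing k ones.
    Equals log[ B(k + 1/2, n - k + 1/2) / B(1/2, 1/2) ]."""
    return (math.lgamma(k + 0.5) + math.lgamma(n - k + 0.5)
            - math.lgamma(n + 1.0) - math.log(math.pi))


def max_log_likelihood(k: int, n: int) -> float:
    """Log-likelihood at the maximum likelihood estimate k/n,
    using the convention 0 * log 0 = 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(k / n)
    if k < n:
        total += (n - k) * math.log((n - k) / n)
    return total


def regret(k: int, n: int) -> float:
    """Pointwise regret of the Jeffreys mixture against maximum likelihood."""
    return max_log_likelihood(k, n) - jeffreys_mixture_log_prob(k, n)


def asymptotic_value(n: int) -> float:
    """(1/2) log(n / (2 pi)) + log(pi): the asymptotic value for this
    one-parameter family (illustrative assumption, Bernoulli only)."""
    return 0.5 * math.log(n / (2.0 * math.pi)) + math.log(math.pi)


if __name__ == "__main__":
    n = 1000
    for k in (0, 1, 100, 500):
        gap = regret(k, n) - asymptotic_value(n)
        print(f"k={k:4d}  regret={regret(k, n):.4f}  gap={gap:+.4f}")
```

In this sketch the gap is close to zero when the maximum likelihood estimate $k/n$ is well inside $(0,1)$, and approaches $\frac{1}{2}\log 2$ at the boundary $k=0$. That boundary excess is one way to see why the interior condition matters and why variants of the Jeffreys prior are considered.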