Squared families: Searching beyond regular probability models

We introduce squared families, which are families of probability densities obtained by squaring a linear transformation of a statistic. Squared families are singular; however, their singularity can easily be handled so that they form regular models. After handling the singularity, squared families possess many convenient properties. Their Fisher information is a conformal transformation of the Hessian metric induced from a Bregman generator. The Bregman generator is the normalising constant, which yields a statistical divergence on the family. The normalising constant admits a helpful parameter-integral factorisation: unlike in exponential families, a single parameter-independent integral suffices to compute every normalising constant in the family. Finally, the squared family kernel is the only integral that needs to be computed for the Fisher information, statistical divergence and normalising constant. We then describe how squared families are special in the broader class of $g$-families, which are obtained by applying a sufficiently regular function $g$ to a linear transformation of a statistic. After removing special singularities, positively homogeneous families and exponential families are the only $g$-families for which the Fisher information is a conformal transformation of the Hessian metric, where the generator depends on the parameter only through the normalising constant. Even-order monomial families also admit parameter-integral factorisations, unlike exponential families. We study parameter estimation and density estimation in squared families, in both the well-specified and misspecified settings. We use a universal approximation property to show that squared families can learn sufficiently well-behaved target densities at a rate of $\mathcal{O}(N^{-1/2})+C n^{-1/4}$, where $N$ is the number of datapoints, $n$ is the number of parameters, and $C$ is a constant.
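
The kernel, normaliser, divergence and Fisher information described above can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes the simplest scalar case $p(x \mid \theta) = (\theta^\top t(x))^2 / z(\theta)$ with respect to Lebesgue measure on $[0, 1]$, and a hypothetical statistic $t(x) = (1, x, x^2)$. The squared family kernel $K = \int t(x) t(x)^\top \, dx$ is computed once by Gauss-Legendre quadrature, after which every normaliser is $z(\theta) = \theta^\top K \theta$. The same $K$ gives the Bregman divergence generated by $z$, which reduces to $(\theta_1 - \theta_2)^\top K (\theta_1 - \theta_2)$, and the Fisher information $4K/z - 4K\theta\theta^\top K / z^2$, which we derived from the score of this parameterisation rather than taking it from the paper. Because $\theta$ and $c\theta$ define the same density for any $c \neq 0$, $\theta$ lies in the null space of that matrix; the sketch shows this singularity but does not reproduce how the paper removes it.

```python
import numpy as np

# Hypothetical statistic t(x) = (1, x, x^2); the base measure is assumed
# to be Lebesgue measure on [0, 1]. Neither choice comes from the paper.
def statistic(x):
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def gauss_legendre_01(num_nodes):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(num_nodes)
    return 0.5 * (nodes + 1.0), 0.5 * weights

def squared_family_kernel(num_nodes=20):
    """K = int t(x) t(x)^T dx, the single parameter-independent integral.

    With 20 nodes the rule is exact for polynomial integrands up to degree 39.
    """
    x, w = gauss_legendre_01(num_nodes)
    T = statistic(x)                      # shape (num_nodes, 3)
    return T.T @ (w[:, None] * T)         # shape (3, 3)

K = squared_family_kernel()  # computed once, reused for every theta

def normaliser(theta):
    """z(theta) = int (theta^T t(x))^2 dx = theta^T K theta."""
    return theta @ K @ theta

def density(x, theta):
    """p(x | theta) = (theta^T t(x))^2 / z(theta)."""
    return (statistic(x) @ theta) ** 2 / normaliser(theta)

def bregman_divergence(theta1, theta2):
    """Bregman divergence generated by z; equals (theta1 - theta2)^T K (theta1 - theta2)."""
    grad_z = 2.0 * K @ theta2
    return normaliser(theta1) - normaliser(theta2) - grad_z @ (theta1 - theta2)

def fisher_information(theta):
    """E[score score^T] for this parameterisation: 4K/z - 4 (K theta)(K theta)^T / z^2."""
    z = normaliser(theta)
    k_theta = K @ theta
    return 4.0 * K / z - 4.0 * np.outer(k_theta, k_theta) / z**2

theta = np.array([1.0, -2.0, 3.0])
# theta and c * theta give the same density, so theta is a null direction of G.
print(np.allclose(fisher_information(theta) @ theta, 0.0))  # True
```

Only `K` is ever integrated; changing `theta` costs a few matrix-vector products. For this statistic and base measure, `K` is the $3 \times 3$ Hilbert matrix.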
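
The parameter-integral factorisation for even-order monomial families can be illustrated the same way. A minimal sketch for $g(u) = u^4$, again assuming the hypothetical statistic $t(x) = (1, x, x^2)$ and Lebesgue measure on $[0, 1]$: the fourth-order moment tensor $M = \int t(x)^{\otimes 4} \, dx$ is computed once, and then $z(\theta) = \sum_{ijkl} M_{ijkl} \theta_i \theta_j \theta_k \theta_l$ for every $\theta$. For an exponential family, $z(\theta) = \int \exp(\theta^\top t(x)) \, dx$ admits no finite split of this kind, so a fresh integral is needed for each $\theta$.

```python
import numpy as np

# Hypothetical statistic t(x) = (1, x, x^2) on [0, 1] (Lebesgue base measure assumed).
def statistic(x):
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def gauss_legendre_01(num_nodes):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(num_nodes)
    return 0.5 * (nodes + 1.0), 0.5 * weights

def moment_tensor(num_nodes=20):
    """Parameter-independent integral M_ijkl = int t_i t_j t_k t_l dx, computed once."""
    x, w = gauss_legendre_01(num_nodes)
    T = statistic(x)
    return np.einsum('n,ni,nj,nk,nl->ijkl', w, T, T, T, T)

M = moment_tensor()

def quartic_normaliser(theta):
    """z(theta) = int (theta^T t(x))^4 dx, by contracting M with theta four times."""
    return np.einsum('ijkl,i,j,k,l->', M, theta, theta, theta, theta)

print(quartic_normaliser(np.array([0.0, 1.0, 0.0])))  # int_0^1 x^4 dx = 0.2 (up to rounding)
```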
