Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be "thinned" into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^K X^{(k)}$. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
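As a concrete illustration of the additive case described above, the following is a minimal sketch for a Poisson random variable. It relies on a standard fact rather than on any code from the paper: if $X \sim \mathrm{Poisson}(\lambda)$ and, conditional on $X$, the vector $(X^{(1)}, \ldots, X^{(K)})$ is drawn from a multinomial distribution with $X$ trials and probabilities $\epsilon_1, \ldots, \epsilon_K$, then the $X^{(k)}$ are independent $\mathrm{Poisson}(\epsilon_k \lambda)$ variables that sum to $X$. The function name `poisson_thin`, the argument `eps`, and the simulation settings are hypothetical choices made for this example.

```python
import numpy as np


def poisson_thin(x, eps, rng):
    """Split Poisson counts x into len(eps) parts that sum back to x.

    Hypothetical helper for illustration only; eps holds the fractions
    assigned to each part and must sum to 1.
    """
    x = np.asarray(x)
    eps = np.asarray(eps, dtype=float)
    if not np.isclose(eps.sum(), 1.0):
        raise ValueError("eps must sum to 1")
    # Conditional on X = x, the parts follow Multinomial(x, eps).
    # The result has shape x.shape + (len(eps),).
    return rng.multinomial(x, eps)


rng = np.random.default_rng(0)
lam = 10.0                       # assumed rate, unknown in practice
x = rng.poisson(lam, size=100_000)
parts = poisson_thin(x, [0.5, 0.5], rng)

# The known function (here, the sum) reconstructs X exactly.
assert np.array_equal(parts.sum(axis=1), x)

# Empirical check: the two parts should be nearly uncorrelated,
# each with mean close to 0.5 * lam.
print(np.corrcoef(parts[:, 0], parts[:, 1])[0, 1])
print(parts.mean(axis=0))
```

Note that the thinning step never uses `lam`: the conditional distribution of the parts given $X$ does not depend on the unknown parameter, which is the role sufficiency plays in the generalization described above.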