The multivariate inverse hypergeometric (MIH) distribution extends the negative multinomial (NM) model to account for sampling without replacement from a finite population. Although most studies of longitudinal count data with a specified number of 'failures' take place in a finite-population setting, the NM model is typically chosen over the more accurate MIH model. This raises the question: how much information is lost when inference is based on the approximate NM model instead of the true MIH model? In statistics, this loss is quantified by a measure called the deficiency. In this paper, asymptotic bounds are derived for the deficiencies between the MIH and NM experiments, as well as between the MIH experiment and the corresponding multivariate normal experiments with the same mean-covariance structure. The findings are supported by a local approximation for the log-ratio of the MIH and NM probability mass functions, and by Hellinger distance bounds.
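For readers unfamiliar with the term, the deficiency can be sketched in Le Cam's standard formulation. The notation here is an assumption made for illustration and may differ from the paper's: $\mathcal{P} = \{P_\theta\}_{\theta \in \Theta}$ and $\mathcal{Q} = \{Q_\theta\}_{\theta \in \Theta}$ stand for two experiments indexed by the same parameter set $\Theta$ (for instance the MIH and NM experiments), $T$ ranges over Markov kernels between the two sample spaces, and the normalization of the Hellinger distance $H$ is one common choice among several.

```latex
% Deficiency of P with respect to Q, and the symmetric Le Cam distance
% (standard definitions; symbols are illustrative, not taken from the paper).
\[
  \delta(\mathcal{P}, \mathcal{Q})
    = \inf_{T} \, \sup_{\theta \in \Theta}
      \bigl\| T P_\theta - Q_\theta \bigr\|_{\mathrm{TV}},
  \qquad
  \Delta(\mathcal{P}, \mathcal{Q})
    = \max\bigl\{ \delta(\mathcal{P}, \mathcal{Q}),\,
                  \delta(\mathcal{Q}, \mathcal{P}) \bigr\}.
\]
% For discrete laws with mass functions p and q, using the normalization
%   TV  = (1/2) \sum_x |p(x) - q(x)|,
%   H^2 = (1/2) \sum_x ( \sqrt{p(x)} - \sqrt{q(x)} )^2,
% the total variation distance is controlled by the Hellinger distance:
\[
  H^2(P, Q) \;\le\; \| P - Q \|_{\mathrm{TV}} \;\le\; \sqrt{2}\, H(P, Q).
\]
```

Under these conventions, an upper bound on the Hellinger distance between two laws yields an upper bound on their total variation distance, which is the kind of quantity that enters the definition of $\delta$. This is one way to see why Hellinger distance bounds are a natural tool for bounding deficiencies.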