We propose a goodness-of-fit measure for probability densities that model observations of varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. A KSD is defined by its Stein operator, and the operators currently used in testing apply only to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and we propose a novel KSD goodness-of-fit test. As with previous variants, the proposed KSD does not require the density to be normalized, which allows a large class of models to be evaluated. Our test is shown to perform well in practice on discrete sequential data benchmarks.