We propose a goodness-of-fit measure for probability densities that model observations of varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator, and the operators currently used in testing apply only to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and we propose a novel KSD goodness-of-fit test. As with previous variants, the proposed KSD does not require the density to be normalized, which allows a large class of models to be evaluated. In experiments on discrete sequential data benchmarks, the test performs well in practice.
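For readers unfamiliar with the KSD, the following minimal sketch illustrates why only an unnormalized density is needed. It is not the variable-dimension operator proposed here; it assumes the standard fixed-dimensional Langevin Stein operator with a Gaussian (RBF) kernel, and the names `ksd_v_statistic`, `score_fn`, and `bandwidth` are hypothetical. The model enters only through its score function, the gradient of the log density, in which the normalizing constant cancels.

```python
import numpy as np


def ksd_v_statistic(samples, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared KSD (fixed-dimensional case).

    Assumptions for this sketch: Langevin Stein operator, RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 h^2)), and score_fn returning the
    gradient of the (possibly unnormalized) log density at each row.
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score_fn(x)                          # scores, shape (n, d)
    diff = x[:, None, :] - x[None, :, :]     # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)      # squared pairwise distances
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))         # kernel matrix, shape (n, n)

    # s(x_i)^T s(x_j) k(x_i, x_j)
    term1 = (s @ s.T) * k
    # s(x_i)^T grad_y k(x_i, x_j), with grad_y k = (x - y) / h^2 * k
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k
    # s(x_j)^T grad_x k(x_i, x_j), with grad_x k = -(x - y) / h^2 * k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k
    # trace of the mixed second derivative of the kernel
    term4 = (d / h2 - sqdist / h2 ** 2) * k

    return float(np.mean(term1 + term2 + term3 + term4))


def standard_normal_score(x):
    """Score of exp(-||x||^2 / 2); the normalizing constant is never used."""
    return -x


rng = np.random.default_rng(0)
matched = rng.standard_normal((200, 2))      # drawn from the model
shifted = matched + 1.5                      # drawn from a different density

ksd_matched = ksd_v_statistic(matched, standard_normal_score)
ksd_shifted = ksd_v_statistic(shifted, standard_normal_score)
print(ksd_matched, ksd_shifted)
```

In this toy setup the estimate for the shifted sample should be clearly larger than for the matched sample. An actual test would additionally need a null distribution for the statistic, for example via a bootstrap, which this sketch omits.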