Urn models for innovation capture fundamental empirical laws shared by several real-world processes. The so-called urn model with triggering includes, as particular cases, the urn representations of the two-parameter Poisson-Dirichlet process and of the Dirichlet process, both seminal in Bayesian non-parametric inference. In this work, we leverage this connection to introduce a general approach for quantifying closeness between symbolic sequences, and we test it within the framework of the authorship attribution problem. Across different scenarios, the method achieves high accuracy compared with related methods, with a substantial gain in computational efficiency and theoretical transparency. Beyond its practical convenience, this work demonstrates how the recently established connection between urn models and Bayesian non-parametric inference can pave the way for designing more efficient inference methods. In particular, the hybrid approach that we propose allows us to relax the exchangeability hypothesis, which can be particularly relevant for systems exhibiting complex correlation patterns and non-stationary dynamics.
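For readers unfamiliar with the urn model with triggering, the following is a minimal simulation sketch, not the method proposed in this work. It assumes the commonly described formulation: a drawn ball is returned together with `rho` extra copies of its color, and the first draw of any color adds `nu + 1` balls of brand-new colors. The function name, parameter names, and default values are illustrative assumptions.

```python
import random


def urn_with_triggering(steps, rho=4, nu=3, n0=1, seed=0):
    """Simulate an urn with triggering and return the drawn color sequence.

    Assumed (hypothetical) parameter names:
      rho -- extra copies of the drawn color added after every draw
      nu  -- a first-time draw of a color adds nu + 1 brand-new colors
      n0  -- number of distinct colors initially in the urn
    """
    rng = random.Random(seed)
    urn = list(range(n0))   # one ball per initial color
    next_color = n0         # smallest color label not yet in the urn
    seen = set()            # colors that have been drawn at least once
    sequence = []
    for _ in range(steps):
        color = rng.choice(urn)      # uniform draw over balls, with replacement
        sequence.append(color)
        urn.extend([color] * rho)    # reinforcement of the drawn color
        if color not in seen:        # novelty: trigger nu + 1 new colors
            seen.add(color)
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1
    return sequence


if __name__ == "__main__":
    seq = urn_with_triggering(10_000)
    print("draws:", len(seq), "distinct colors drawn:", len(set(seq)))
```

The resulting sequence of colors can be read as a symbolic sequence, with each first occurrence of a color playing the role of a novelty. The run is reproducible for a fixed `seed`.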