Urn models for innovation have been shown to capture fundamental empirical laws shared by several real-world processes. The so-called urn model with triggering includes, as particular cases, urn representations of the two-parameter Poisson-Dirichlet process and of the Dirichlet process, both seminal in Bayesian non-parametric inference. In this work, we leverage this connection to introduce a novel approach for quantifying closeness between symbolic sequences and test it within the framework of the authorship attribution problem. The method achieves high accuracy relative to other state-of-the-art methods in different scenarios, with a substantial gain in computational efficiency and theoretical transparency. Beyond this practical convenience, the work demonstrates how the recently established connection between urn models and Bayesian non-parametric inference can pave the way for designing more efficient inference methods. In particular, the hybrid approach that we propose allows us to relax the exchangeability hypothesis, which can be particularly relevant for systems exhibiting complex correlation patterns and non-stationary dynamics.
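As a concrete illustration of the two-parameter Poisson-Dirichlet process mentioned above, the following is a minimal sketch of its standard sequential sampling rule (often called the two-parameter Chinese restaurant process). It is not the attribution method proposed in this work. The function name `sample_two_parameter_sequence` and the integer labelling of symbols are assumptions made for illustration only. After `t` draws containing `k` distinct symbols, a novel symbol appears with probability `(theta + alpha * k) / (theta + t)`, and an existing symbol seen `c` times is repeated with probability `(c - alpha) / (theta + t)`. Setting `alpha = 0` recovers the Dirichlet process.

```python
import random


def sample_two_parameter_sequence(n, alpha, theta, seed=None):
    """Draw n symbols from the two-parameter Chinese restaurant process.

    Hypothetical helper for illustration; symbols are labelled 0, 1, 2, ...
    in order of first appearance.
    """
    if not (0 <= alpha < 1) or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    counts = []    # counts[j] = number of times symbol j has appeared
    sequence = []
    for t in range(n):
        k = len(counts)
        # The first draw is always novel; afterwards use the predictive rule.
        p_new = 1.0 if t == 0 else (theta + alpha * k) / (theta + t)
        if rng.random() < p_new:
            counts.append(1)
            sequence.append(k)
        else:
            # Repeat an existing symbol j with weight counts[j] - alpha.
            weights = [c - alpha for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
            sequence.append(j)
    return sequence
```

For example, `sample_two_parameter_sequence(1000, alpha=0.5, theta=1.0, seed=0)` returns a list of 1000 integer labels in which new symbols keep appearing as the sequence grows, at a rate controlled by `alpha` and `theta`.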