Bayesian nonparametric modeling of multivariate count data with an unknown number of traits

Feature and trait allocation models are fundamental objects in Bayesian nonparametrics and play a prominent role in several applications. Existing approaches, however, typically assume full exchangeability of the data, which may be restrictive in settings characterized by heterogeneous but related groups. In this paper, we introduce a general and tractable class of Bayesian nonparametric priors for partially exchangeable trait allocation models, built on completely random vectors. We provide a comprehensive theoretical analysis, including closed-form expressions for the marginal and posterior distributions, and illustrate the tractability of our framework in the cases of binary and Poisson-distributed traits. A distinctive aspect of our approach is that the number of traits is a random quantity, which allows us to model and estimate unobserved traits. Building on these results, we also develop a novel mixture model that infers the group partition structure from the data, effectively clustering trait allocations. This extension generalizes Bayesian nonparametric latent class models and avoids the systematic overclustering that arises when the number of traits is assumed to be fixed. We demonstrate the practical usefulness of our methodology through an application to the 'Ndrangheta criminal network from the Operazione Infinito investigation, where our model provides insights into the organization of illicit activities.
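
The idea that the number of traits is itself random, so that some traits exist in the model but never appear in the sample, can be illustrated with a small simulation. The sketch below is a hypothetical toy model, not the completely-random-vector prior developed in the paper. It assumes a Poisson-distributed total number of traits, Gamma-distributed trait levels shared across groups, Gamma group-specific multipliers (so that related groups share traits but weight them differently), and Poisson counts per individual. The function names `sample_poisson` and `simulate_trait_counts` and all parameter values are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson(rate) variate via Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_trait_counts(group_sizes, mass=10.0, shape=0.5, seed=0):
    """Toy partially exchangeable Poisson trait allocation (hypothetical, NOT the paper's prior).

    Assumed generative steps:
      1. Total number of traits K ~ Poisson(mass), so K is random rather than fixed.
      2. Each trait j has a level s_j ~ Gamma(shape, 1) shared by all groups.
      3. Each group g rescales it by m_gj ~ Gamma(2, 0.5) (mean 1), so groups
         share the same traits but weight them differently.
      4. Individual i in group g displays trait j a_ij ~ Poisson(s_j * m_gj) times.
    Returns K, the indices of traits seen at least once, and the count arrays per group.
    """
    rng = random.Random(seed)
    n_traits = sample_poisson(mass, rng)
    levels = [rng.gammavariate(shape, 1.0) for _ in range(n_traits)]
    counts = []
    for size in group_sizes:
        # Group-specific rates: shared trait level times a group-specific multiplier.
        rates = [s * rng.gammavariate(2.0, 0.5) for s in levels]
        counts.append([[sample_poisson(r, rng) for r in rates] for _ in range(size)])
    # A trait enters the observed data only if someone in some group displays it.
    observed = [
        j for j in range(n_traits)
        if any(row[j] > 0 for group in counts for row in group)
    ]
    return n_traits, observed, counts


if __name__ == "__main__":
    k, seen, _ = simulate_trait_counts([5, 8, 3], seed=1)
    print(f"{k} traits in the model, {len(seen)} observed, {k - len(seen)} never displayed")
```

Raising `mass` adds traits to the model, while lowering `shape` makes trait levels smaller and therefore leaves more of them undisplayed; the never-displayed traits correspond to what the abstract calls unobserved traits.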
