Understanding the dependence structure between response variables is an important component in the analysis of correlated multivariate data. This article focuses on modeling dependence structures in multivariate binary data, motivated by a study that aims to understand how patterns in U.S. senators' votes are determined by similarities (or lack thereof) in their attributes, e.g., political parties and social network profiles. To address this research question, we propose a new Ising similarity regression model, which regresses the pairwise interaction coefficients of the Ising model on a set of similarity measures that are available from, or constructed from, covariates. We further develop model selection procedures that regularize the pseudo-likelihood function with an adaptive lasso penalty, enabling the selection of relevant similarity measures. We establish estimation and selection consistency of the proposed estimator in a general setting where the numbers of similarity measures and responses both tend to infinity. Simulation studies demonstrate the strong finite-sample performance of the proposed estimator, particularly in estimating the matrix of pairwise interaction coefficients, when compared with several existing Ising model estimators. Applying the Ising similarity regression model to roll call voting records of 100 U.S. senators, we quantify how similarities in senators' parties, occupations as businessmen, and social network profiles drive their voting associations.
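The abstract does not give the model equations, so the following is only a minimal sketch under stated assumptions, not the authors' estimator: responses are coded 0/1, each node has its own intercept, and a hypothetical linear link θ_jk = Σ_l β_l w_{jk,l} maps similarity matrices to interaction coefficients. It evaluates the negative log pseudo-likelihood (the sum of node-wise logistic conditional log-likelihoods) plus an adaptive lasso penalty whose weights 1/|β̃_l|^γ come from an initial estimate β̃. All function names are illustrative, and no optimizer is included.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def interaction_matrix(beta: Sequence[float], sims: Sequence[Matrix]) -> Matrix:
    """Hypothetical linear link: theta[j][k] = sum_l beta[l] * sims[l][j][k].

    Each sims[l] is assumed to be a symmetric p x p similarity matrix;
    the diagonal of theta is set to zero (no self-interaction).
    """
    p = len(sims[0])
    theta = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            if j != k:
                theta[j][k] = sum(b * s[j][k] for b, s in zip(beta, sims))
    return theta


def neg_log_pseudo_likelihood(alpha, beta, sims, Y) -> float:
    """Negative log pseudo-likelihood of a 0/1 Ising model (assumed coding).

    Y holds one binary vector of length p per observation (e.g., a roll call).
    Node j contributes log P(y_j | y_-j), a logistic term with linear predictor
    eta = alpha[j] + sum_{k != j} theta[j][k] * y[k].
    """
    theta = interaction_matrix(beta, sims)
    p = len(alpha)
    total = 0.0
    for y in Y:
        for j in range(p):
            eta = alpha[j] + sum(theta[j][k] * y[k] for k in range(p) if k != j)
            # log P(y_j | y_-j) = y_j * eta - log(1 + exp(eta))
            total -= y[j] * eta - softplus(eta)
    return total


def adaptive_lasso_penalty(beta, beta_init, lam, gamma=1.0) -> float:
    """lam * sum_l |beta_l| / |beta_init_l|**gamma.

    beta_init is a hypothetical initial (e.g., unpenalized) estimate whose
    entries must be nonzero.
    """
    return lam * sum(abs(b) / abs(b0) ** gamma for b, b0 in zip(beta, beta_init))


def objective(alpha, beta, sims, Y, beta_init, lam, gamma=1.0) -> float:
    """Penalized objective to be minimized over (alpha, beta)."""
    return neg_log_pseudo_likelihood(alpha, beta, sims, Y) + adaptive_lasso_penalty(
        beta, beta_init, lam, gamma
    )


if __name__ == "__main__":
    # Toy data: 3 voters, one similarity measure (1 if same party, else 0).
    same_party = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    Y = [[1, 1, 0], [0, 0, 1], [1, 1, 0]]
    alpha = [0.0, 0.0, 0.0]
    for b in (0.0, 1.0):
        print(b, neg_log_pseudo_likelihood(alpha, [b], [same_party], Y))
```

In the toy data, voters 0 and 1 share a party and always vote together, so a positive coefficient on the same-party similarity lowers the negative log pseudo-likelihood relative to β = 0. An actual fit would minimize `objective` with a solver suited to the nonsmooth penalty, such as coordinate descent or proximal gradient, which this sketch leaves out.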