Understanding the dependence structure between response variables is an important component in the analysis of correlated multivariate data. This article focuses on modeling dependence structures in multivariate binary data. It is motivated by a study that aims to understand how patterns in U.S. senators' votes are determined by similarities (or lack thereof) in their attributes, e.g., political party and social network profile. To address this research question, we propose a new Ising similarity regression model, which regresses the pairwise interaction coefficients of the Ising model on a set of similarity measures that are either available directly or constructed from covariates. We further develop model selection approaches by regularizing the pseudo-likelihood function with an adaptive lasso penalty, which enables the selection of relevant similarity measures. We establish estimation and selection consistency of the proposed estimator in a general setting where the numbers of similarity measures and responses tend to infinity. A simulation study demonstrates the strong finite-sample performance of the proposed estimator in terms of both parameter estimation and similarity selection. Applying the Ising similarity regression model to a dataset of roll call voting records of 100 U.S. senators, we quantify how similarities in senators' parties, occupations as businessmen, and social network profiles drive their voting associations.
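To make the model structure concrete, the following is a minimal sketch of the penalized objective described above, written under several assumptions that are not taken from the article. Responses are assumed to be coded as 0/1. Each interaction coefficient is assumed to be a linear combination of similarity measures, theta_jk = sum_m beta_m * S_m[j, k], with no intercept. Adaptive lasso weights are assumed to take the usual form 1 / |beta_init|^gamma, computed from some initial estimate. The function names (`interaction_matrix`, `neg_log_pseudo_likelihood`, `adaptive_lasso_objective`) are hypothetical. The code only evaluates the objective; it is not the authors' estimation procedure.

```python
import numpy as np


def interaction_matrix(beta, S):
    """Build Theta with theta_jk = sum_m beta[m] * S[m, j, k] (assumed linear link).

    beta: shape (M,) similarity coefficients.
    S:    shape (M, p, p) symmetric similarity matrices.
    """
    Theta = np.tensordot(beta, S, axes=1)  # contract over the M similarity measures
    np.fill_diagonal(Theta, 0.0)  # no self-interactions in the Ising model
    return Theta


def neg_log_pseudo_likelihood(alpha, beta, S, Y):
    """Negative log pseudo-likelihood of an Ising model with 0/1 responses.

    Each conditional is logistic: P(y_ij = 1 | y_i,-j) = sigmoid(eta_ij),
    where eta_ij = alpha_j + sum_{k != j} theta_jk * y_ik.
    Y: shape (n, p) binary data; alpha: shape (p,) main effects.
    """
    Theta = interaction_matrix(beta, S)
    eta = alpha[None, :] + Y @ Theta  # Theta is symmetric with zero diagonal
    # log(1 + exp(eta)) computed stably via logaddexp
    return -np.sum(Y * eta - np.logaddexp(0.0, eta))


def adaptive_lasso_objective(alpha, beta, S, Y, lam, beta_init, gamma=1.0):
    """Pseudo-likelihood loss plus an adaptive lasso penalty on beta.

    Weights 1 / |beta_init|^gamma (hypothetical choice of initial estimate)
    penalize similarity measures with small initial coefficients more heavily.
    """
    weights = 1.0 / np.abs(beta_init) ** gamma
    penalty = lam * np.sum(weights * np.abs(beta))
    return neg_log_pseudo_likelihood(alpha, beta, S, Y) + penalty
```

Minimizing this objective over `alpha` and `beta` requires a method that handles the non-smooth penalty, such as proximal gradient or coordinate descent. Coefficients shrunk exactly to zero correspond to similarity measures that are deselected.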