Motivated by the empirical observation that the credits (e.g., "likes") received by viral social media posts follow power-law distributions, we introduce a high-dimensional tail index regression model and propose methods for estimating its parameters and conducting inference on them. First, we propose a regularized estimator, establish its consistency, and derive its convergence rate. Second, we debias the regularized estimator to facilitate inference and prove that the debiased estimator is asymptotically normal. Simulation studies corroborate our theoretical findings. We apply these methods to the text analysis of viral posts on X (formerly Twitter).