Motivated by the empirical power-law distributions of credits (e.g., the number of "likes") received by viral posts on social media, we introduce high-dimensional tail index regression, together with methods for estimating its parameters and conducting inference on them. We propose a regularized estimator, establish its consistency, and derive its convergence rate. For inference, we debias the regularized estimate and establish the asymptotic normality of the debiased estimator. Simulation studies support our theory. We apply these methods to text analyses of viral posts on X (formerly Twitter) concerning LGBTQ+ topics.