Empirical research in economics increasingly relies on restricted-access data held by multiple firms or agencies, which makes it impossible to construct the estimator of interest on the pooled sample. At the same time, heavy-tailed distributions are pervasive in economic and financial outcomes such as prices, expenditures and loan sizes. We study sparse, robust estimation in this restricted-access setting. The infeasible pooled benchmark is convoluted rank regression (CRR), a smooth rank-based estimator designed for heavy-tailed outcomes. Because the CRR criterion is a non-additive U-statistic, existing communication-efficient methods built for additive empirical losses do not directly apply. We propose distributed convoluted rank regression (DCRR), a surrogate criterion built from a single local CRR loss and an aggregated gradient correction, and show that it has the same population minimizer as the pooled CRR objective. Building on this surrogate, we develop a two-stage sparse procedure: an iterative $\ell_1$-penalized stage followed by a folded-concave refinement. For the resulting estimator, we establish non-asymptotic error bounds, a distributed strong oracle property, and a distributed criterion for consistent model selection. Simulations and an application to used-car prices show that DCRR closely approximates pooled CRR and improves on naive divide-and-conquer, particularly under heavy-tailed errors.
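The following is a minimal, illustrative sketch of the kind of gradient-corrected surrogate described above. It is not the authors' implementation, and several choices are assumptions made only for this example:

- The CRR loss uses a Gaussian kernel, so the smoothed absolute value is $\ell_h(u) = E|u + hZ|$ with $Z \sim N(0,1)$.
- The "aggregated gradient" is taken to be the average of the machines' local CRR gradients. Each local gradient is computed from within-machine pairs only.
- The $\ell_1$ stage is solved by plain proximal gradient descent.
- The folded-concave refinement is omitted.

All function names (`crr_loss_and_grad`, `dcrr_l1_round`, `dcrr_l1`, `soft_threshold`) and all tuning values (`h`, `lam`, `step`, `iters`, `rounds`) are hypothetical.

```python
import math
import numpy as np

# Vectorised error function (math.erf is scalar-only); keeps the sketch NumPy-only.
_erf = np.vectorize(math.erf, otypes=[float])


def crr_loss_and_grad(beta, X, y, h):
    """Gaussian-kernel convoluted rank loss and its gradient (illustrative assumption).

    Averages l_h(r_i - r_j) over ordered pairs i != j of residuals r = y - X @ beta,
    where l_h(u) = E|u + h Z|, Z ~ N(0, 1), i.e. |u| smoothed by the kernel.
    """
    n = len(y)
    r = y - X @ beta
    d = r[:, None] - r[None, :]                # pairwise residual differences r_i - r_j
    off = ~np.eye(n, dtype=bool)               # exclude the i == j terms
    e = _erf(d / (h * math.sqrt(2.0)))         # l_h'(u) = 2 Phi(u / h) - 1 = erf(u / (h sqrt 2))
    ell = d * e + h * math.sqrt(2.0 / math.pi) * np.exp(-d**2 / (2.0 * h**2))
    loss = ell[off].mean()
    w = np.where(off, e, 0.0)
    # d(r_i - r_j)/d beta = -(x_i - x_j); collect the pairwise terms by rows and columns.
    grad = -(w.sum(axis=1) @ X - w.sum(axis=0) @ X) / (n * (n - 1))
    return loss, grad


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def dcrr_l1_round(machines, beta_bar, h, lam, step=0.3, iters=150):
    """One l1-penalised round on the surrogate
    L_1(beta) - <grad L_1(beta_bar) - g_bar, beta>,
    where L_1 is the CRR loss on machines[0] and g_bar (ASSUMPTION) is the
    average of the local CRR gradients evaluated at beta_bar.
    """
    local_grads = [crr_loss_and_grad(beta_bar, X, y, h)[1] for X, y in machines]
    g_bar = np.mean(local_grads, axis=0)       # one p-vector communicated per machine
    shift = local_grads[0] - g_bar             # gradient correction for the local loss
    X1, y1 = machines[0]
    beta = beta_bar.copy()
    for _ in range(iters):                     # proximal gradient descent on the convex surrogate
        _, g = crr_loss_and_grad(beta, X1, y1, h)
        beta = soft_threshold(beta - step * (g - shift), step * lam)
    return beta


def dcrr_l1(machines, h=1.0, lam=0.05, rounds=2):
    """Iterate the surrogate round, re-centring the correction at each new estimate."""
    beta = np.zeros(machines[0][0].shape[1])
    for _ in range(rounds):
        beta = dcrr_l1_round(machines, beta, h, lam)
    return beta


# Usage: four hypothetical "machines" with heavy-tailed (Student t, 2 df) errors.
rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
machines = []
for _ in range(4):
    X = rng.standard_normal((100, 5))
    y = X @ beta_true + rng.standard_t(df=2, size=100)
    machines.append((X, y))
beta_hat = dcrr_l1(machines)
print(np.round(beta_hat, 2))
```

By construction, the surrogate's gradient at `beta_bar` equals the aggregated gradient `g_bar`. Each round therefore needs only one p-dimensional vector from each machine, and no raw data has to be pooled.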