We study permuted (shuffled) linear regression, a noisy linear observation model in which responses and covariates are mismatched by an unknown permutation; the permutation is a discrete parameter ranging over a factorially large space. This unknown permutation is a key component of the data-generating process, yet statistical inference on it remains challenging because of its discrete nature. We develop a general framework for statistical inference on both the permutation and the regression coefficients. First, building on recent advances in the repro samples method, we introduce a localization step that reduces the permutation space to a small candidate set whose miscoverage decays polynomially with the number of Monte Carlo samples. Then, based on this localized set, we provide two inference procedures: a conditional Monte Carlo test of permutation structures with valid finite-sample Type-I error control, and coefficient inference that remains valid under uncertainty about the alignment. For computation, we formulate a linear assignment problem that is solvable in polynomial time and show that its solution converges asymptotically to that of the conventional, computationally expensive least squares problem. Extensions to partially permuted designs and ridge regularization are also discussed. Extensive simulations and an application to Beijing air-quality data corroborate finite-sample validity, strong power to detect mismatches, and practical scalability.
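To make the computational contrast concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not specify. It assumes the coefficient vector is held fixed, so that aligning responses to covariates reduces to an assignment problem with cost (y_i - x_j^T beta)^2. For this particular squared-error cost, pairing sorted responses with sorted fitted values is optimal by the rearrangement inequality, and the sketch checks that against enumeration of all n! permutations. The function names, the simulated data, and the chosen values of n, beta, and the noise level are all hypothetical.

```python
import itertools
import random


def fitted_values(X, beta):
    """Return x_j^T beta for every covariate row x_j."""
    return [sum(x_k * b_k for x_k, b_k in zip(row, beta)) for row in X]


def alignment_cost(y, fitted, perm):
    """Squared-error cost of pairing response i with covariate row perm[i]."""
    return sum((y[i] - fitted[perm[i]]) ** 2 for i in range(len(y)))


def align_by_sorting(y, fitted):
    """Minimize alignment_cost over permutations in O(n log n).

    Valid only for this squared-error cost with beta fixed (an assumption of
    this sketch): the k-th smallest response is paired with the k-th smallest
    fitted value.
    """
    order_y = sorted(range(len(y)), key=lambda i: y[i])
    order_f = sorted(range(len(fitted)), key=lambda j: fitted[j])
    perm = [0] * len(y)
    for i, j in zip(order_y, order_f):
        perm[i] = j
    return perm


def align_by_brute_force(y, fitted):
    """Enumerate all n! permutations; feasible only for very small n."""
    return min(
        itertools.permutations(range(len(y))),
        key=lambda perm: alignment_cost(y, fitted, perm),
    )


def simulate(n, beta, noise_sd, seed):
    """Draw covariates, shuffle the fitted values, and add Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(n)]
    true_perm = list(range(n))
    rng.shuffle(true_perm)
    fitted = fitted_values(X, beta)
    y = [fitted[true_perm[i]] + rng.gauss(0, noise_sd) for i in range(n)]
    return X, y, true_perm


# Hypothetical settings chosen so that brute force (7! = 5040 cases) stays cheap.
beta = [2.0, -1.0]
X, y, true_perm = simulate(n=7, beta=beta, noise_sd=0.05, seed=0)
fitted = fitted_values(X, beta)
fast = align_by_sorting(y, fitted)
slow = align_by_brute_force(y, fitted)
print(alignment_cost(y, fitted, fast), alignment_cost(y, fitted, slow))
```

The two printed costs should agree up to floating-point rounding. The sorting shortcut applies only because the coefficients are fixed and the fitted values are scalar; a general cost matrix would need a dedicated assignment solver, and the joint problem over the permutation and the coefficients is the expensive least squares problem the abstract refers to.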