We study a linear observation model with an unknown permutation, known as \textit{permuted/shuffled linear regression}, in which responses and covariates are mismatched and the permutation is a discrete parameter ranging over a factorial-size set. The permutation is a key component of the data-generating process, yet statistical inference on it remains challenging because of its discrete nature. We develop a general framework for statistical inference on both the permutation and the regression coefficients. First, building on recent advances in the repro samples method, we introduce a localization step that reduces the permutation space to a small candidate set whose miscoverage probability decays polynomially in the number of Monte Carlo samples. Based on this localized set, we then provide inference procedures: a conditional Monte Carlo test of permutation structures with finite-sample Type-I error control, and inference on the coefficients that remains valid under uncertainty about the permutation alignment. For computation, we formulate a linear assignment problem that is solvable in polynomial time and show that, with high probability, its solution coincides with that of the conventional, computationally expensive least-squares estimator. We also discuss extensions to partially permuted designs and ridge regularization. Extensive simulations and an application to air-quality data corroborate finite-sample validity, strong power to detect mismatches, and practical scalability.
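To illustrate why an assignment formulation can be cheap, the following is a minimal sketch and not the procedure developed in the paper. It assumes a fixed candidate coefficient vector and a squared-error cost, in which case matching each response to a fitted value is a linear assignment problem on scalars that sorting solves exactly. All names (`fitted_values`, `match_by_sorting`, `squared_cost`, `brute_force_cost`) and the toy data are hypothetical.

```python
from itertools import permutations


def fitted_values(X, beta):
    """Compute x_i^T beta for each row of X (plain lists, no external libraries)."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]


def match_by_sorting(y, fitted):
    """Pair y[i] with fitted[perm[i]] so that the sum of squared residuals is minimal.

    For scalar values and squared loss, pairing the k-th smallest response with
    the k-th smallest fitted value is optimal (rearrangement inequality).
    """
    y_order = sorted(range(len(y)), key=lambda i: y[i])
    f_order = sorted(range(len(fitted)), key=lambda j: fitted[j])
    perm = [0] * len(y)
    for i, j in zip(y_order, f_order):
        perm[i] = j
    return perm


def squared_cost(y, fitted, perm):
    """Sum of squared residuals under the pairing y[i] <-> fitted[perm[i]]."""
    return sum((y[i] - fitted[perm[i]]) ** 2 for i in range(len(y)))


def brute_force_cost(y, fitted):
    """Minimum cost over all n! pairings; only feasible for very small n."""
    return min(squared_cost(y, fitted, p) for p in permutations(range(len(y))))


# Toy data (assumed for illustration): four observations, two covariates.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
beta = [2.0, -1.0]                 # candidate coefficient vector, assumed known here
y = [4.9, 2.2, -0.8, 1.1]          # shuffled, noisy responses

fitted = fitted_values(X, beta)
perm = match_by_sorting(y, fitted)
print("estimated pairing:", perm)
print("sorting cost:", squared_cost(y, fitted, perm))
print("brute-force cost:", brute_force_cost(y, fitted))
```

The sorting step costs O(n log n), whereas the brute-force search enumerates all n! pairings. With a general cost matrix, a Hungarian-type solver would play the role of the sort; jointly estimating the coefficients and the permutation is a harder problem that this sketch does not address.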