If the assumed model does not accurately capture the underlying structure of the data, a statistical method is likely to yield suboptimal results, so model selection is a crucial step in any statistical analysis. For massive datasets, however, selecting an appropriate model from a large pool of candidates is computationally challenging, and little research has addressed data selection for the purpose of model selection. In this study, we select subdata based on the A-optimality criterion, which allows model selection to be performed on a smaller subset of the data. We evaluate our approach through simulation experiments and two real data applications, in terms of both the probability of selecting the best model and estimation efficiency.
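To make the A-optimality criterion concrete, the following is a minimal sketch for the linear regression case, where the criterion is the trace of the inverse information matrix, tr((X_s^T X_s)^{-1}), computed on the selected rows X_s. The greedy forward selection shown here is an illustrative heuristic and is not claimed to be the procedure used in this study; the function names `a_criterion` and `greedy_a_optimal_subdata`, the ridge constant, and the random starting rows are all assumptions made for the example.

```python
import numpy as np


def a_criterion(X):
    """A-optimality value tr((X^T X)^{-1}); smaller is better."""
    return float(np.trace(np.linalg.inv(X.T @ X)))


def greedy_a_optimal_subdata(X, k, seed=0):
    """Greedily pick k row indices of X that reduce tr((X_s^T X_s)^{-1}).

    Hypothetical heuristic for illustration only: start from p random rows,
    then repeatedly add the row giving the largest drop in the trace.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    chosen = [int(i) for i in rng.choice(n, size=p, replace=False)]
    # A tiny ridge term (an assumption) guards against a singular start.
    M_inv = np.linalg.inv(X[chosen].T @ X[chosen] + 1e-8 * np.eye(p))
    available = np.ones(n, dtype=bool)
    available[chosen] = False

    while len(chosen) < k:
        # Sherman-Morrison: adding row x lowers the trace by
        # (x^T M^{-2} x) / (1 + x^T M^{-1} x).
        Z = X @ M_inv                       # row i holds x_i^T M^{-1}
        num = np.sum(Z * Z, axis=1)         # x_i^T M^{-2} x_i
        den = 1.0 + np.sum(Z * X, axis=1)   # 1 + x_i^T M^{-1} x_i
        gain = np.where(available, num / den, -np.inf)
        j = int(np.argmax(gain))
        # Rank-one update of the inverse after adding row j.
        M_inv -= np.outer(Z[j], Z[j]) / den[j]
        chosen.append(j)
        available[j] = False

    return np.array(chosen)
```

Once a subset has been chosen, each candidate model could be fitted on `X[idx]` and the corresponding responses only, and compared with an information criterion such as BIC, instead of being fitted on the full data.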