Mixture-of-Experts (MoE) models are commonly used when the data contain distinct clusters, each with a different relationship between the independent and dependent variables. Fitting such models to large datasets, however, is computationally almost infeasible. An attractive alternative is to fit the model on subdata selected by ``maximizing'' the Fisher information matrix. A major challenge is that no closed-form expression for the Fisher information matrix is available for such models. Focusing on clusterwise linear regression models, a subclass of MoE models, we develop a framework that overcomes this challenge. We prove that the proposed subdata selection approach is asymptotically optimal, i.e., no other method is statistically more efficient than the proposed one when the full data size is large.
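For concreteness, a clusterwise linear regression model is commonly written as a finite mixture of linear regressions. The display below is a sketch in assumed notation (the number of clusters $K$, mixing proportions $\pi_k$, regression coefficients $\beta_k$, and error variances $\sigma_k^2$ are illustrative symbols, not taken from the text above), where $\phi(\cdot\,; \mu, \sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$:

\[
  f(y \mid x; \theta) \;=\; \sum_{k=1}^{K} \pi_k \, \phi\!\left(y;\; x^{\top}\beta_k,\; \sigma_k^2\right),
  \qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
\]

Under this form, the log-likelihood of a single observation is the logarithm of a sum over clusters, so the expectation that defines the Fisher information, $I(\theta) = \mathrm{E}\!\left[ \nabla_\theta \log f(y \mid x; \theta) \, \nabla_\theta \log f(y \mid x; \theta)^{\top} \right]$, generally cannot be evaluated in closed form.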