We revisit the replica method for analyzing inference and learning in parametric models, focusing on situations where the data-generating distribution is unknown or analytically intractable. Instead of assuming idealized distributions so that quenched averages can be carried out analytically, we apply a variational Gaussian approximation to the replicated system in the grand canonical formalism. In this formulation the data average can be deferred and replaced by empirical averages, which yields stationarity conditions that adaptively determine the parameters of the trial Hamiltonian for each dataset. This approach clarifies how fluctuations affect information extraction and connects directly with results from mathematical statistics and learning theory, such as information criteria. As a concrete application, we analyze linear regression and derive learning curves, including for real-world datasets where exact replica calculations are not feasible.
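The link to information criteria can be illustrated with a minimal sketch. It is hypothetical and is not the replica or variational Gaussian calculation described here. It uses synthetic data, under the assumption of a Gaussian design and Gaussian noise, with illustrative choices p = 5 and σ = 1 and hypothetical helper names (`ols`, `learning_curve`). For each sample size n, the sketch fits ordinary least squares and then averages three quantities over trials: the empirical training error, the in-sample test error (fresh noise at the same inputs), and a Mallows-Cp/AIC-style corrected training error. The correction term is computed from the same dataset.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations X^T X w = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def mse(X, y, w):
    """Mean squared residual of the linear predictor w on (X, y)."""
    return sum((yi - sum(a * b for a, b in zip(row, w))) ** 2
               for row, yi in zip(X, y)) / len(y)

def learning_curve(ns, p=5, sigma=1.0, trials=200, seed=0):
    """Hypothetical synthetic experiment; every n in ns must exceed p."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(p)]  # assumed teacher weights
    curve = []
    for n in ns:
        train = test = corrected = 0.0
        for _ in range(trials):
            X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
            mean = [sum(a * b for a, b in zip(row, w_true)) for row in X]
            y = [m + rng.gauss(0, sigma) for m in mean]
            y_new = [m + rng.gauss(0, sigma) for m in mean]  # fresh noise, same X
            w = ols(X, y)
            e_train = mse(X, y, w)
            sigma2_hat = e_train * n / (n - p)  # unbiased estimate RSS / (n - p)
            train += e_train
            test += mse(X, y_new, w)
            corrected += e_train + 2 * sigma2_hat * p / n  # Cp/AIC-style penalty
        curve.append((n, train / trials, test / trials, corrected / trials))
    return curve

for n, tr, te, co in learning_curve([10, 20, 40, 80]):
    print(f"n={n:3d}  train={tr:.3f}  test={te:.3f}  corrected={co:.3f}")
```

For least squares with a fixed design, the expectations are exact: E[train] = σ²(n - p)/n and E[test] = σ²(n + p)/n. The training error is therefore optimistic by 2σ²p/n. Adding the penalty 2σ̂²p/n, with the unbiased estimate σ̂² = RSS/(n - p), closes this gap on average. As n grows, both curves approach σ².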