Researchers often represent relations in multivariate correlational data using Gaussian graphical models, which require regularization to yield sparse networks. Because researchers often go on to study the modular structure of the inferred network, we suggest integrating that structure into the cross-validation of the regularization strength to balance under- and overfitting. Using synthetic and real data, we show that this approach recovers and infers modular structure in noisy data better than the graphical lasso, a standard approach that uses the Gaussian log-likelihood when cross-validating the regularization strength.
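For reference, the baseline named above (graphical lasso with the regularization strength chosen by cross-validated Gaussian log-likelihood) might be sketched as follows. This is a minimal illustration, assuming scikit-learn's `GraphicalLasso` estimator and its `score` method, which returns the held-out Gaussian log-likelihood. The synthetic two-block data, the grid of `alpha` values, and the helper names `make_block_data` and `cv_log_likelihood` are assumptions made for illustration and do not come from the paper. The modular-structure criterion proposed here is not shown, since the abstract does not specify its form.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.model_selection import KFold


def make_block_data(n_samples=200, block_size=4, n_blocks=2, seed=0):
    """Sample from a Gaussian whose precision matrix has a block (modular) structure.

    Hypothetical toy generator, not the synthetic data used in the paper.
    """
    rng = np.random.default_rng(seed)
    p = block_size * n_blocks
    precision = np.eye(p)
    off_diag = np.ones((block_size, block_size)) - np.eye(block_size)
    for b in range(n_blocks):
        idx = slice(b * block_size, (b + 1) * block_size)
        # Couple only the variables that belong to the same block.
        precision[idx, idx] += 0.3 * off_diag
    covariance = np.linalg.inv(precision)
    return rng.multivariate_normal(np.zeros(p), covariance, size=n_samples)


def cv_log_likelihood(X, alphas, n_splits=5, seed=0):
    """Mean held-out Gaussian log-likelihood for each regularization strength."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mean_scores = []
    for alpha in alphas:
        fold_scores = []
        for train_idx, test_idx in folds.split(X):
            model = GraphicalLasso(alpha=alpha, max_iter=200).fit(X[train_idx])
            # score() evaluates the fitted model's log-likelihood on unseen samples.
            fold_scores.append(model.score(X[test_idx]))
        mean_scores.append(float(np.mean(fold_scores)))
    return mean_scores


X = make_block_data()
alphas = [0.01, 0.03, 0.1, 0.3]  # assumed grid, chosen only for the example
scores = cv_log_likelihood(X, alphas)
best_alpha = alphas[int(np.argmax(scores))]

# Refit on all samples and count the nonzero off-diagonal precision entries (edges).
final_model = GraphicalLasso(alpha=best_alpha, max_iter=200).fit(X)
n_edges = int((np.abs(np.triu(final_model.precision_, k=1)) > 1e-8).sum())
print("selected alpha:", best_alpha, "edges:", n_edges)
```

Under the approach proposed here, the held-out log-likelihood computed in `cv_log_likelihood` would be replaced by a score that accounts for the modular structure of the inferred network, while the surrounding cross-validation loop would stay the same.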