This paper studies the problem of simultaneously predicting multiple binary responses from a shared set of covariates. We take a machine-learning approach to binary classification that makes no assumptions about the underlying observations. Instead, we consider a class of predictors and aim to identify the one that minimizes the prediction error. Unlike previous studies, which primarily address estimation error, we analyze the prediction error of our method directly using PAC-Bayesian bounds. We introduce a pseudo-Bayesian approach that can handle incomplete response data and implement it efficiently with the Langevin Monte Carlo method. In simulation studies and an application to real data, the proposed method performs comparably to, and sometimes better than, the current state-of-the-art method.
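To make the combination of a pseudo-posterior, incomplete responses, and Langevin Monte Carlo concrete, the following is a minimal sketch and not the paper's implementation. It assumes responses coded as ±1 in an n × q matrix `Y`, a 0/1 matrix `mask` marking which responses are observed, a linear score `X @ M`, a logistic surrogate loss, and a Gaussian prior on `M`. The loss, the prior, the function names, and all tuning values (`lam`, `tau`, `step`) are illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np


def grad_potential(M, X, Y, mask, lam, tau):
    """Gradient of the negative log pseudo-posterior (all choices are assumptions):

    U(M) = lam * sum over observed (i, j) of log(1 + exp(-Y_ij * (X M)_ij))
           + ||M||_F^2 / (2 * tau^2)
    """
    margins = Y * (X @ M)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), written with tanh for stability
    dloss = -0.5 * (1.0 - np.tanh(0.5 * margins))
    # Unobserved responses are zeroed out by the mask and contribute nothing
    return lam * (X.T @ (mask * Y * dloss)) + M / tau**2


def langevin_mc(X, Y, mask, lam=1.0, tau=1.0, step=1e-3,
                n_iter=2000, burn_in=1000, seed=0):
    """Unadjusted Langevin iterations; returns the average of post-burn-in draws."""
    rng = np.random.default_rng(seed)
    p, q = X.shape[1], Y.shape[1]
    M = np.zeros((p, q))
    mean = np.zeros((p, q))
    kept = 0
    for k in range(n_iter):
        noise = rng.standard_normal((p, q))
        # Gradient step on U plus Gaussian noise scaled by sqrt(2 * step)
        M = M - step * grad_potential(M, X, Y, mask, lam, tau) \
            + np.sqrt(2.0 * step) * noise
        if k >= burn_in:
            kept += 1
            mean += (M - mean) / kept  # running mean of the retained draws
    return mean
```

Under these assumptions, calling `M_hat = langevin_mc(X, Y, mask)` and then `np.sign(X_new @ M_hat)` would give ±1 predictions for all q responses at once. The step size has to be small relative to the curvature of the potential, so it would typically need tuning for a given dataset.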