Environmental, Social, and Governance (ESG) datasets frequently contain substantial data gaps, and the use of varying imputation methods to fill them leads to inconsistent ESG ratings. This paper applies established machine learning techniques to impute missing data in a real-world ESG dataset, with an emphasis on quantifying uncertainty through prediction intervals. Using multiple imputation strategies, the study assesses the robustness of the imputation methods and quantifies the uncertainty attributable to missing data. The findings highlight the value of probabilistic machine learning models for a better understanding of ESG scores, thereby mitigating the risk of incorrect ratings caused by incomplete data. The approach improves imputation practice and, in turn, the reliability of ESG ratings.
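As an illustration only, the combination of multiple imputation and prediction intervals can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: a bootstrapped linear regression stands in for the machine learning models, each missing value is drawn many times from its predictive distribution (prediction plus residual noise), and an empirical interval is read off those draws. The function names `multiple_imputation` and `prediction_interval` and the toy data are hypothetical.

```python
import math
import random
import statistics
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = statistics.stdev(resid) if len(resid) >= 2 else 0.0
    return a, b, sd


def multiple_imputation(
    x: List[float], y: List[Optional[float]], m: int = 200, seed: int = 0
) -> List[List[float]]:
    """Return m completed copies of y (hypothetical helper).

    Observed entries are kept; each missing entry is a stochastic draw.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    completed = []
    for _ in range(m):
        # Bootstrap the observed rows so the fitted parameters vary across imputations
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Missing value = model prediction + Gaussian residual noise
        completed.append([
            yi if yi is not None else a + b * xi + rng.gauss(0.0, sd)
            for xi, yi in zip(x, y)
        ])
    return completed


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linearly interpolated percentile of already-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def prediction_interval(draws: List[float], level: float = 0.9) -> Tuple[float, float]:
    """Empirical central interval covering `level` of the imputed draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return percentile(s, tail), percentile(s, 1.0 - tail)


if __name__ == "__main__":
    # Hypothetical toy data: x is a fully observed covariate, y an ESG score with gaps
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    y = [52.0, 55.0, None, 61.0, 63.5, None, 70.0, 72.5]
    imputations = multiple_imputation(x, y, m=500, seed=42)
    for i, yi in enumerate(y):
        if yi is None:
            draws = [c[i] for c in imputations]
            lo, hi = prediction_interval(draws, level=0.9)
            print(f"row {i}: mean={statistics.fmean(draws):.1f}, "
                  f"90% interval=({lo:.1f}, {hi:.1f})")
```

The width of each interval reflects both parameter uncertainty (through the bootstrap) and residual noise, which is the kind of missing-data uncertainty that a single point imputation hides. In practice, a probabilistic model such as quantile regression or a Gaussian process could replace the linear fit.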