Effective IT change management is critical for businesses that depend on software and services, particularly in highly regulated sectors such as finance, where operational reliability, auditability, and explainability are essential. A significant portion of IT incidents is caused by changes, so identifying high-risk changes before deployment is important. This study presents a predictive incident risk scoring approach at a large international bank. The approach supports engineers during the assessment and planning phases of change deployments by predicting the likelihood that a change will induce an incident. To satisfy regulatory constraints, we built the model with auditability and explainability in mind, applying SHAP values to provide feature-level insights so that decisions are traceable and transparent. Using a one-year real-world dataset, we compare the existing rule-based process with three machine learning models: HGBC, LightGBM, and XGBoost. LightGBM achieved the best performance, particularly when enriched with aggregated team metrics that capture organisational context. Our results show that data-driven, interpretable models can outperform rule-based approaches while meeting compliance needs, enabling proactive risk mitigation and more reliable IT operations.