Due to various sources of uncertainty, emergent behavior, and ongoing change, the reliability of many socio-technical systems depends on an iterative and collaborative process in which organizations (1) analyze and learn from system failures, and then (2) co-evolve both the technical and human parts of their systems based on what they learn. Many organizations have defined processes for learning from failure, often involving postmortem analyses conducted after any system failure that is judged to be sufficiently severe. Despite these established processes and tool support, our preliminary research and professional experience suggest that it is not straightforward to take what was learned from a failure and use it to improve the reliability of the socio-technical system. To better understand this collaborative process and its associated challenges, we are conducting a study of how teams learn from failure. We are gathering incident reports from multiple organizations and interviewing engineers and managers with relevant experience. Our analytic interest is in what teams learn as they reflect on failures, the learning processes involved, and how teams use what they learn. Our data collection and analysis are not yet complete, but we have so far analyzed 13 incident reports and seven interviews. In this short paper, we (1) present our preliminary findings and (2) outline our broader research plans.