Counterfactual Harm: A Counter-argument

As AI systems are increasingly used to guide decisions, it is essential that they follow ethical principles. A core principle in medicine is non-maleficence, often equated with "do no harm". A formal definition of harm based on counterfactual reasoning has been proposed and popularized. This notion of harm has been promoted in simple settings with binary treatments and outcomes. Here, we highlight a problem with this definition in settings involving multiple treatment options. Using an example with three tuberculosis treatments (say, A, B, and C), we demonstrate that the counterfactual definition of harm can produce intransitive results when treatments are compared pairwise: B is less harmful than A, C is less harmful than B, yet C is more harmful than A. This intransitivity is a challenge because it may lead to practical (clinical) decisions that are difficult to justify or defend. In contrast, an interventionist definition of harm based on expected utility forgoes counterfactual comparisons and ensures transitive treatment rankings.
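
The contrast between pairwise counterfactual comparisons and an expected-utility ranking can be illustrated with a toy calculation. The sketch below is not the tuberculosis example referred to above. It rests on several assumptions made purely for illustration: the utility values in `OUTCOMES` are hypothetical, each treatment's potential outcome is taken to be uniformly distributed over its three values, the potential outcomes are assumed independent across treatments, and counterfactual harm of x relative to y is formalized as the probability P(Y_x < Y_y), which is only one possible reading of the counterfactual definition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical utilities (higher is better) for three treatments.
# Assumption: each value is equally likely, and potential outcomes
# under different treatments are independent of one another.
OUTCOMES = {
    "A": [0, 6, 8],
    "B": [2, 4, 10],
    "C": [3, 5, 7],
}


def harm_probability(x, y):
    """P(Y_x < Y_y): probability that x leaves the patient worse off
    than y would have (the assumed counterfactual harm of x vs. y)."""
    pairs = list(product(OUTCOMES[x], OUTCOMES[y]))
    worse = sum(1 for y_x, y_y in pairs if y_x < y_y)
    return Fraction(worse, len(pairs))


def less_harmful(x, y):
    """Pairwise counterfactual comparison: x harms less often than y."""
    return harm_probability(x, y) < harm_probability(y, x)


def expected_utility(x):
    """Interventionist quantity: depends only on the marginal of Y_x."""
    return Fraction(sum(OUTCOMES[x]), len(OUTCOMES[x]))


def rank_by_expected_utility():
    """Treatments sorted from highest to lowest expected utility."""
    return sorted(OUTCOMES, key=expected_utility, reverse=True)


if __name__ == "__main__":
    for x, y in [("B", "A"), ("C", "B"), ("C", "A")]:
        verdict = "less" if less_harmful(x, y) else "more"
        print(f"{x} is {verdict} harmful than {y}: "
              f"P(Y_{x} < Y_{y}) = {harm_probability(x, y)}, "
              f"P(Y_{y} < Y_{x}) = {harm_probability(y, x)}")
    print("Expected-utility ranking:", rank_by_expected_utility())
```

Under these assumed numbers, each pairwise comparison comes out 4/9 against 5/9, so B is less harmful than A and C is less harmful than B, yet C is more harmful than A. The expected utilities are 14/3, 16/3, and 5 for A, B, and C, so the expected-utility ranking is B, C, A. Because that ranking orders treatments by a single real number each, it cannot cycle.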
