Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

Algorithmic and data-driven decisions and recommendations are commonly used in high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are common but difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
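
To make the risk constraint concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are ours, not the paper's. There is a binary action, a finite set of subgroups with known population shares, and a deterministic baseline policy. For each subgroup we have posterior draws of its conditional average treatment effect tau (action 1 versus action 0). The conditional risk of a subgroup is the posterior probability that switching away from the baseline action lowers its expected outcome: P(tau < 0) when switching 0 to 1, and P(tau > 0) when switching 1 to 0. ACRisk averages these conditional risks over the subgroup shares. The names `conditional_risk`, `acrisk`, `expected_gain`, and `learn_policy` and the toy draws are hypothetical. In this simplified setting, both the objective and the risk constraint are linear in the action indicators, so the problem is a linear program with binary variables. The sketch solves it by exhaustive search, which is only viable for a handful of subgroups. A realistic instance over decision tables would instead be passed to an LP or MILP solver.

```python
from itertools import product

# Hypothetical toy inputs (not data from the paper): four subgroups with equal
# population shares, a deterministic baseline policy, and four posterior draws
# of each subgroup's conditional average treatment effect tau (action 1 vs 0).
WEIGHTS = (0.25, 0.25, 0.25, 0.25)
BASELINE = (0, 1, 0, 1)
DRAWS = (
    (0.5, 1.0, 1.5, -0.5),
    (-1.0, -2.0, -0.5, 0.5),
    (2.0, 1.0, 0.5, 0.5),
    (1.0, 0.5, 1.5, 1.0),
)


def conditional_risk(draws, new_action, base_action):
    """Posterior probability that the new action does worse than the baseline
    action for one subgroup, estimated from posterior draws of tau."""
    if new_action == base_action:
        return 0.0  # identical actions cannot yield a worse outcome
    if new_action == 1:
        worse = sum(t < 0 for t in draws)  # switching 0 -> 1 hurts if tau < 0
    else:
        worse = sum(t > 0 for t in draws)  # switching 1 -> 0 hurts if tau > 0
    return worse / len(draws)


def acrisk(policy, baseline, draws_by_group, weights):
    """Average the subgroup conditional risks over the subgroup distribution."""
    return sum(
        w * conditional_risk(d, a, b)
        for w, d, a, b in zip(weights, draws_by_group, policy, baseline)
    )


def expected_gain(policy, baseline, draws_by_group, weights):
    """Posterior expected value of `policy` minus that of `baseline`.
    The baseline value is constant, so maximizing this maximizes value."""
    return sum(
        w * (a - b) * (sum(d) / len(d))
        for w, d, a, b in zip(weights, draws_by_group, policy, baseline)
    )


def learn_policy(baseline, draws_by_group, weights, alpha):
    """Exhaustively search deterministic policies for the highest expected
    gain subject to ACRisk <= alpha. Cost grows as 2**K for K subgroups."""
    best, best_gain = tuple(baseline), 0.0  # the baseline itself has zero risk
    for policy in product((0, 1), repeat=len(baseline)):
        if acrisk(policy, baseline, draws_by_group, weights) > alpha:
            continue  # violates the risk constraint
        gain = expected_gain(policy, baseline, draws_by_group, weights)
        if gain > best_gain:
            best, best_gain = policy, gain
    return best, best_gain


if __name__ == "__main__":
    for alpha in (0.0, 0.0625, 0.125):
        print(alpha, learn_policy(BASELINE, DRAWS, WEIGHTS, alpha))
```

With these toy draws, raising the risk budget alpha from 0 to 0.0625 to 0.125 lets the search move one, then two, then three subgroups away from the baseline. The fourth subgroup is never switched because every posterior draw favors its baseline action.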
