Decision-making problems often feature uncertainty stemming from heterogeneous, context-dependent human preferences. To address this, we propose a sequential learning-and-optimization pipeline that learns preference distributions and leverages them in downstream problems, such as risk-averse formulations. We focus on human-choice settings that can be formulated as (integer) linear programs. In such settings, existing inverse-optimization and choice-modelling methods infer preferences from observed choices but typically produce point estimates or fail to capture contextual shifts, making them unsuitable for risk-averse decision-making. Using a bounded-variance score-function gradient estimator, we train a predictive model that maps contextual features to a rich class of parameterizable distributions, yielding a maximum likelihood estimate. In the subsequent optimization phase, the model generates scenarios for unseen contexts. In a synthetic ridesharing environment, our approach reduces average post-decision surprise by up to 114$\times$ compared with a risk-neutral approach with perfect predictions, and by up to 25$\times$ compared with leading risk-averse baselines.
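To make the estimator concrete, the following is a minimal sketch of a score-function (REINFORCE-style) gradient estimator with a mean baseline, a standard variance-reduction device; the objective `f`, the Gaussian parameterization, and all hyperparameters here are illustrative assumptions, not the paper's actual model or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative downstream objective whose expectation we differentiate.
    return (x - 2.0) ** 2

def score_grad(mu, sigma=1.0, n=5000):
    """Score-function estimate of d/dmu E_{x ~ N(mu, sigma)}[f(x)].

    Uses the identity  grad E[f(x)] = E[f(x) * grad log p(x; mu)]
    and subtracts a mean baseline to reduce the estimator's variance
    (one common way to keep the variance bounded).
    """
    x = rng.normal(mu, sigma, size=n)
    fx = f(x)
    baseline = fx.mean()                  # control variate
    score = (x - mu) / sigma**2           # d/dmu log N(x; mu, sigma)
    return np.mean((fx - baseline) * score)

# Gradient descent on mu; the minimizer of E[(x - 2)^2] is mu = 2.
mu = 0.0
for _ in range(200):
    mu -= 0.05 * score_grad(mu)
```

In the pipeline described above, the distribution's parameters would instead be the output of a predictive model conditioned on contextual features, and the sampled scenarios would feed the downstream risk-averse optimization.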