Model-based optimization (MBO) is increasingly applied to design problems in science and engineering. A common scenario uses a fixed training set to train a model, with the goal of designing new samples that outperform those in the training data. A major challenge in this setting is distribution shift, in which the training and design samples come from different distributions. Some shift is expected, since the goal is to create better designs, but it can degrade model accuracy and, in turn, design quality. Despite the prevalence of this problem, addressing it requires deep domain knowledge and skillful application. To tackle this issue, we propose a straightforward method that lets design practitioners detect distribution shift. The method uses knowledge of the unlabeled design distribution to train a binary classifier that separates the training data from the design data. The classifier's logit scores then serve as a proxy measure of distribution shift. We validate the method in a real-world application by running offline MBO and evaluating the effect of distribution shift on design quality. We find that the magnitude of the shift in the design distribution varies with the number of steps taken by the optimization algorithm, and that our simple approach can identify these shifts. This enables users to constrain their search to regions where the model's predictions are reliable, thereby improving design quality.
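The classifier-based shift score described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes fixed-length numeric feature vectors, uses a plain logistic-regression classifier trained by gradient descent in NumPy as a stand-in for whatever classifier the method actually uses, and the function names (`fit_domain_classifier`, `shift_scores`) and synthetic Gaussian data are hypothetical.

```python
import numpy as np


def fit_domain_classifier(x_train, x_design, lr=0.1, steps=2000):
    """Fit a logistic-regression classifier: training samples -> 0, design samples -> 1.

    Assumption: a linear classifier trained by batch gradient descent stands in
    for whatever binary classifier the method actually uses.
    """
    x = np.vstack([x_train, x_design])
    y = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_design))])
    # Standardize with pooled statistics so gradient descent is well conditioned.
    mu = x.mean(axis=0)
    sigma = x.std(axis=0) + 1e-8
    z = (x - mu) / sigma
    w = np.zeros(z.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = z @ w + b
        # Numerically stable sigmoid: exp(-log(1 + exp(-logits))).
        p = np.exp(-np.logaddexp(0.0, -logits))
        err = p - y  # per-sample gradient of the log-loss w.r.t. the logit
        w -= lr * (z.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b, mu, sigma


def shift_scores(x, w, b, mu, sigma):
    """Classifier logits; larger values mean a sample looks more like the design set."""
    return ((x - mu) / sigma) @ w + b


# Hypothetical demo: 2-D Gaussian "training" data and two candidate design sets.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(500, 2))
x_mild = rng.normal(0.2, 1.0, size=(500, 2))    # small mean shift
x_strong = rng.normal(2.0, 1.0, size=(500, 2))  # large mean shift

mild_params = fit_domain_classifier(x_train, x_mild)
strong_params = fit_domain_classifier(x_train, x_strong)

# Mean logit over each design set serves as a scalar proxy for shift magnitude.
mild_mean = shift_scores(x_mild, *mild_params).mean()
strong_mean = shift_scores(x_strong, *strong_params).mean()
print(f"mean logit, mild shift:   {mild_mean:.2f}")
print(f"mean logit, strong shift: {strong_mean:.2f}")
```

Under these assumptions, a design set the classifier separates easily from the training data receives large positive logits, while a set close to the training distribution stays near zero. A practitioner could, for instance, discard candidate designs whose individual score exceeds some chosen threshold; the threshold here would be a hypothetical tuning parameter, not something the abstract specifies.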