Fairness- and uncertainty-aware data generation for data-driven design

The design dataset is the backbone of data-driven design. Ideally, the dataset should be fairly distributed in both the shape and property spaces so that the relationship between them can be explored efficiently. However, classical experimental design focuses on shape diversity and therefore yields biased exploration of the property space. Recently developed methods either perform subset selection on a large existing dataset or rely on restrictive assumptions. In this paper, fairness- and uncertainty-aware data generation (FairGen) is proposed to actively detect and generate missing properties, starting from a small dataset. At each iteration, its coverage module computes the data coverage to guide the selection of target properties, and its uncertainty module ensures that the generative model can make confident, and therefore accurate, shape predictions. Integrating the two modules, Bayesian optimization determines the target properties, which are then fed into the generative model to predict the associated shapes. The new designs, whose properties are evaluated by simulation, are added to the design dataset. A case study on an S-slot design dataset demonstrates the efficiency of FairGen in auxetic structural design. Compared with grid and randomized sampling, FairGen increased the coverage score twice as fast and significantly expanded the sampled region in the property space. As a result, generative models trained on FairGen-generated datasets showed consistent and significant reductions in mean absolute error.
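
The iterative loop described above (measure coverage, pick a target property that balances coverage gain against model uncertainty, generate and simulate a design, append it) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a two-dimensional property space normalized to [0, 1]^2, a grid-based coverage score (fraction of occupied cells), and the distance to the nearest existing sample as a hypothetical proxy for the generative model's uncertainty. The names `coverage_score`, `select_target`, `run_loop` and the parameters `n_bins`, `lam` and the noise level are all illustrative assumptions. Bayesian optimization is swapped for an exhaustive search over cell centers, and the generative model plus simulation are swapped for adding small Gaussian noise to the target.

```python
import numpy as np


def occupied_cells(props, n_bins):
    """Set of grid cells (i, j) in the unit property square that hold at least one sample."""
    idx = np.clip((props * n_bins).astype(int), 0, n_bins - 1)
    return {tuple(int(v) for v in row) for row in idx}


def coverage_score(props, n_bins=10):
    """Assumed coverage metric: fraction of the n_bins x n_bins cells that are occupied."""
    return len(occupied_cells(props, n_bins)) / n_bins**2


def uncertainty_proxy(candidates, props):
    """Hypothetical stand-in for generative-model uncertainty:
    distance from each candidate target to its nearest existing sample."""
    d = np.linalg.norm(candidates[:, None, :] - props[None, :, :], axis=-1)
    return d.min(axis=1)


def select_target(props, n_bins=10, lam=2.0):
    """Choose the next target property by trading coverage gain against uncertainty.
    Exhaustive search over cell centers replaces Bayesian optimization here."""
    grid = np.meshgrid(np.arange(n_bins), np.arange(n_bins), indexing="ij")
    cells = np.stack(grid, axis=-1).reshape(-1, 2)
    centers = (cells + 0.5) / n_bins
    occupied = occupied_cells(props, n_bins)
    # Coverage term: 1 if the candidate would fill an empty cell, else 0
    gain = np.array([0.0 if (int(i), int(j)) in occupied else 1.0 for i, j in cells])
    # Uncertainty term penalizes targets far from the data already seen
    score = gain - lam * uncertainty_proxy(centers, props)
    return centers[np.argmax(score)]


def run_loop(props, n_iter=30, n_bins=10, seed=0):
    """Toy active-generation loop: select a target, 'generate + simulate', append."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        target = select_target(props, n_bins)
        # Hypothetical stand-in for generative model + simulation:
        # the realized property lands near, not exactly on, the target.
        achieved = np.clip(target + rng.normal(0.0, 0.02, size=2), 0.0, 1.0)
        props = np.vstack([props, achieved])
    return props


if __name__ == "__main__":
    # Small starting dataset clustered in one corner of the property space
    start = np.array([[0.05, 0.05], [0.10, 0.12], [0.15, 0.08]])
    final = run_loop(start)
    print(coverage_score(start), coverage_score(final))
```

In this sketch the weight `lam` sets the trade-off: a larger value keeps new targets close to existing data (more certain predictions), while a smaller value lets the loop jump to distant empty cells where the stand-in model would be less reliable.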