This paper introduces Perturbation-Assisted Inference (PAI), a novel framework for uncertainty quantification that draws on synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework targets complex data scenarios, particularly unstructured data analyzed with deep learning models. PASS uses a generative model to create synthetic data that closely mirrors the raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy. By incorporating knowledge transfer from large pre-trained generative models, PASS improves estimation accuracy and yields refined estimates of the distributions of various statistics via Monte Carlo experiments. PAI, in turn, offers statistically guaranteed validity. In pivotal inference, it enables precise conclusions even without prior knowledge of the distribution of the pivotal. In non-pivotal situations, we enhance the reliability of synthetic data generation by training the generator on an independent holdout sample. We demonstrate the effectiveness of PAI for uncertainty quantification in complex, data-driven tasks by applying it to image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
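To make the pivotal case concrete, the following is a minimal sketch of how Monte Carlo experiments on synthetic samples can approximate the unknown distribution of a pivotal and turn it into a confidence interval. It is an illustration under stated assumptions, not the PASS or PAI procedure itself: `toy_generator` is a hypothetical stand-in for a trained generative model, the studentized mean is chosen only as a familiar example of an approximately pivotal statistic, and the sketch includes neither rank-preserving perturbation nor knowledge transfer from pre-trained models.

```python
import math
import random
import statistics


def studentized_mean(sample, mu):
    """Example statistic: T = sqrt(n) * (mean - mu) / sd."""
    n = len(sample)
    return math.sqrt(n) * (statistics.fmean(sample) - mu) / statistics.stdev(sample)


def toy_generator(rng, n):
    """Hypothetical stand-in for a trained generator: Exp(1) draws, mean 1."""
    return [rng.expovariate(1.0) for _ in range(n)]


def empirical_quantile(sorted_values, q):
    """Nearest-rank empirical quantile of values that are already sorted."""
    idx = math.ceil(q * len(sorted_values)) - 1
    idx = min(len(sorted_values) - 1, max(0, idx))
    return sorted_values[idx]


def monte_carlo_pivot(generator, generator_mean, n, num_draws=2000, seed=0):
    """Simulate the statistic on synthetic samples; return the sorted draws.

    generator_mean is the mean of the generator's distribution, assumed
    known here because the generator is under our control.
    """
    rng = random.Random(seed)
    draws = [
        studentized_mean(generator(rng, n), generator_mean)
        for _ in range(num_draws)
    ]
    return sorted(draws)


def pivot_interval(raw, sorted_draws, alpha=0.05):
    """Invert the simulated distribution of T into an interval for the mean."""
    lo = empirical_quantile(sorted_draws, alpha / 2)
    hi = empirical_quantile(sorted_draws, 1 - alpha / 2)
    center = statistics.fmean(raw)
    se = statistics.stdev(raw) / math.sqrt(len(raw))
    # lo <= T <= hi  is equivalent to  center - hi*se <= mu <= center - lo*se
    return center - hi * se, center - lo * se


# Usage: the "raw" sample is itself simulated here, purely for illustration.
raw = toy_generator(random.Random(1), 30)
draws = monte_carlo_pivot(toy_generator, generator_mean=1.0, n=len(raw))
lower, upper = pivot_interval(raw, draws)
print(f"95% interval for the mean: [{lower:.3f}, {upper:.3f}]")
```

Because the critical values come from simulation rather than from a reference table, the interval can be asymmetric around the sample mean when the statistic is skewed, as it is for this small exponential sample. How closely the simulated distribution tracks the true one depends on how well the generator matches the data-generating distribution.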