With increasingly volatile market conditions and rapid product innovation, operational decision-making for large-scale systems entails solving thousands of problems, each with limited data. Data aggregation combines data across problems to improve on the decisions obtained by solving each problem individually. We propose a novel cluster-based Shrunken-SAA approach that exploits the cluster structure among problems when aggregating data. We prove that, as the number of problems grows, leveraging a given cluster structure yields additional benefits over data aggregation approaches that neglect it. When the cluster structure is unknown, we show that uncovering it, even at the cost of a few data points, can be beneficial, especially when the distance between clusters of problems is substantial. The proposed approach extends to general cost functions under mild conditions. When the number of problems is large, its optimality gap decreases exponentially in the distance between the clusters. We evaluate the approach through numerical experiments on managing newsvendor systems. Using synthetic data, we investigate how the choice of distance metric between problem instances affects the performance of the cluster-based Shrunken-SAA approach. We further validate the approach with real data and highlight the advantages of cluster-based data aggregation over existing approaches, especially in the small-data, large-scale regime.
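To make the idea of cluster-based aggregation concrete, the following is a minimal sketch for the newsvendor setting, not the authors' implementation. It assumes a shrinkage form in which each problem's empirical demand distribution is mixed with the pooled empirical distribution of its own cluster, with weight `alpha / (n_k + alpha)` on the cluster anchor, and that the order quantity is the critical-ratio quantile of the resulting mixture. The function names, the fixed `alpha`, and the given cluster labels are all illustrative assumptions; the abstract does not specify the estimator or how the shrinkage level is chosen.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight share reaches q."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cdf, q, side="left")
    return v[min(idx, len(v) - 1)]


def cluster_shrunken_orders(demands, labels, alpha, critical_ratio):
    """Hypothetical cluster-based shrinkage for many newsvendor problems.

    demands:        list of 1-D arrays of observed demand, one per problem
                    (each assumed to contain at least one observation)
    labels:         cluster id of each problem (assumed known here)
    alpha:          shrinkage level (assumed fixed, alpha >= 0)
    critical_ratio: underage / (underage + overage) cost, in (0, 1)
    """
    labels = np.asarray(labels)
    orders = np.empty(len(demands))
    for k, d_k in enumerate(demands):
        d_k = np.asarray(d_k, dtype=float)
        n_k = len(d_k)
        # Anchor: pooled observations of every problem in the same cluster.
        members = np.flatnonzero(labels == labels[k])
        pooled = np.concatenate(
            [np.asarray(demands[j], dtype=float) for j in members]
        )
        # Own data carries total mass n_k / (n_k + alpha);
        # the cluster anchor carries total mass alpha / (n_k + alpha).
        values = np.concatenate([d_k, pooled])
        weights = np.concatenate([
            np.full(n_k, 1.0 / (n_k + alpha)),
            np.full(len(pooled), alpha / ((n_k + alpha) * len(pooled))),
        ])
        orders[k] = weighted_quantile(values, weights, critical_ratio)
    return orders


# Toy data: two similar low-demand problems and one high-demand problem.
demands = [[1, 2, 3], [2, 3, 4], [100, 110, 120]]
labels = [0, 0, 1]

no_pooling = cluster_shrunken_orders(demands, labels, alpha=0.0, critical_ratio=0.6)
full_pooling = cluster_shrunken_orders(demands, labels, alpha=1e9, critical_ratio=0.6)
```

With `alpha=0` each problem is solved on its own data, and as `alpha` grows each decision moves toward its cluster's pooled quantile. Because the anchor is built per cluster, the high-demand problem is never mixed with the low-demand ones, which is the kind of benefit a single global anchor would forgo when clusters are far apart.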