The growing availability of large, highly granular data sets on many kinds of objects creates a need for methods that condense this information into a representative and intelligible map. Financial regulation exemplifies this need: regulators require diverse and often highly granular data from financial institutions to monitor and assess their activities. Processing and analyzing such data is difficult, however, particularly when values are missing and when clusters must be identified on the basis of specific features. To address these challenges, we propose a variant of Lloyd's algorithm that operates on probability distributions and uses generalized Wasserstein barycenters to construct a metric space representing the given data on various objects in condensed form. Applying the method in the financial regulation context, we demonstrate its usefulness for the specific challenges regulators face in this domain. We believe the approach also applies more generally to other fields in which large and complex data sets need to be represented concisely.
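To make the idea of a Lloyd-type iteration over probability distributions concrete, the following is a minimal sketch under simplifying assumptions, not the method proposed here. It treats only one-dimensional empirical distributions, where the 2-Wasserstein distance reduces to an L2 distance between quantile functions and the ordinary Wasserstein barycenter is the pointwise average of quantile functions. The generalized barycenters used in this work, and the handling of missing values, are not covered. The function names `quantile_repr` and `lloyd_wasserstein` and all parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def quantile_repr(samples, n_quantiles=50):
    """Summarize a 1D sample by its quantiles on a fixed midpoint grid."""
    grid = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(np.asarray(samples, dtype=float), grid)


def lloyd_wasserstein(distributions, k, n_iter=20, seed=0):
    """Lloyd-style clustering of 1D distributions (illustrative assumption).

    Each distribution is a 1D array of samples. Distances are approximate
    2-Wasserstein distances computed from quantile vectors; each cluster
    center is the 1D Wasserstein barycenter of its members, i.e. the mean
    of their quantile vectors.
    """
    rng = np.random.default_rng(seed)
    Q = np.stack([quantile_repr(s) for s in distributions])

    # Initialize centers with k distinct input distributions.
    centers = Q[rng.choice(len(Q), size=k, replace=False)].copy()
    labels = np.full(len(Q), -1)

    for _ in range(n_iter):
        # Assignment step: nearest center in (approximate) W2 distance.
        diffs = Q[:, None, :] - centers[None, :, :]
        dists = np.sqrt((diffs ** 2).mean(axis=2))
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the iteration has converged
        labels = new_labels

        # Update step: barycenter of each non-empty cluster.
        for j in range(k):
            members = Q[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

    return labels, centers
```

Calling `lloyd_wasserstein(data, k=2)` on a list of sample arrays returns one cluster label per distribution together with the quantile vectors of the cluster barycenters. Working in quantile space keeps the sketch short, but the shortcut is specific to one dimension; higher-dimensional data would need a general optimal transport solver.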