GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems

From arXiv: the paper introduces latent manifold frontiers for benchmarking complex production systems, with applications to national rail operators, wind farms, and macroeconomic productivity.

Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and a local certification radius, derived from the decoder Jacobian and a Lipschitz bound, quantifies the geometric robustness of efficiency scores. We validate GeMA on synthetic data with non-convex frontiers, heterogeneous technologies and scale bias, and on four real-world case studies: global urban rail systems (COMET), British rail operators (ORR), national economies (Penn World Table) and a high-frequency wind-farm dataset. Across these domains GeMA behaves comparably to established methods when classical assumptions hold, and provides additional insight in settings with pronounced heterogeneity, non-convexity or size-related bias.
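The abstract states only that the local certification radius is derived from the decoder Jacobian and a Lipschitz bound; it does not give the formula. The sketch below shows one generic way such a radius could be computed and should not be read as the paper's construction. It assumes a toy decoder, a known Lipschitz constant `lipschitz` for the Jacobian, and a tolerance `eps` on how far the decoded point may move. All function names and values are hypothetical. It uses the standard Taylor bound ||g(z + d) - g(z)|| <= s * ||d|| + (L / 2) * ||d||^2, where s bounds the Jacobian norm at z, and solves for the largest ||d|| that keeps the right-hand side within `eps`.

```python
import math

def decoder(z):
    """Toy stand-in for a trained decoder (hypothetical): latent (z0, z1) -> 3 observed values."""
    z0, z1 = z
    return [z0, z1, z0 * z1]

def jacobian(f, z, h=1e-6):
    """Central-difference Jacobian of f at z, returned as a list of rows."""
    m = len(f(z))
    cols = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        fp, fm = f(zp), f(zm)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(m)])
    # Transpose the column list into row-major form.
    return [[cols[j][i] for j in range(len(z))] for i in range(m)]

def frobenius_norm(J):
    """Frobenius norm; an upper bound on the spectral norm of J."""
    return math.sqrt(sum(v * v for row in J for v in row))

def certification_radius(f, z, lipschitz, eps):
    """Largest r with s*r + (lipschitz/2)*r**2 <= eps, where s bounds ||J(z)||."""
    s = frobenius_norm(jacobian(f, z))
    if lipschitz == 0:
        return eps / s  # purely linear case
    return (-s + math.sqrt(s * s + 2 * lipschitz * eps)) / lipschitz

# For this toy decoder the Jacobian is 1-Lipschitz, so lipschitz=1.0 is exact here.
radius = certification_radius(decoder, [1.0, 2.0], lipschitz=1.0, eps=0.1)
print(f"{radius:.4f}")  # prints 0.0375
```

Under these assumptions, any latent perturbation of norm at most `radius` moves the decoded point by at most `eps`. The Frobenius norm is used instead of the spectral norm to keep the sketch dependency-free; it makes the radius slightly more conservative.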
