Public transportation (PT) agencies generate vast amounts of heterogeneous data from automatic fare collection (AFC), automatic passenger counting (APC), automatic vehicle location and computer-aided dispatch (AVL/CAD), schedule and real-time feeds (GTFS/GTFS-RT), and proprietary platforms. These datasets create opportunities for data-driven planning, operations, and passenger services, but their potential is constrained by fragmentation, inconsistent update frequencies, and the lack of reproducible, interoperable pipelines. Although contemporary data platform patterns and architectural styles from enterprise computing address analogous challenges in other sectors, their adaptation to the PT domain remains largely underexplored. Transit systems present distinctive conditions: the convergence of Information Technology (IT) and Operational Technology (OT), long asset lifecycles, rigorous security requirements, the need for multi-agency coordination, and the need to work on live systems, which precludes controlled experimentation.