Each year the American Statistical Association (ASA) hosts the Annual Data Challenge Expo, which tasks participants with analyzing a given dataset and presenting their work at the Joint Statistical Meetings (JSM). The 2025 Data Challenge Expo tasked participants with analyzing over 35 years of commercial flight data from the United States Bureau of Transportation Statistics (BTS). These data provide extensive geographic coverage and operational detail for the U.S. domestic aviation market: for millions of past flights, they record the date, origin, destination, carrier, plane, departure, and arrival. In this article, we present our analysis for the 2025 JSM Data Challenge Expo. We chose to explore patterns in the daily scheduling of departures and arrivals across airlines, airports, and time. In doing so, we observed distinct scheduling ``waves'', or periodic structures, at major airline hubs as well as at large Federal Aviation Administration (FAA) hubs. In the remainder of this article, we detail how we visualize periodicity in flight scheduling and how we quantify it by calculating Shannon entropy. An additional element of the 2025 Data Challenge Expo is the incorporation of a second dataset, chosen by the participants. We detail the use of a BTS dataset with passenger enplanement (boarding) information to determine FAA hub classification (as opposed to airline-specific hub designation). Finally, we discuss results from this visual and quantitative analysis, highlighting noticeable differences in scheduling periodicity and entropy across airports for the ``big four'', the four largest carriers in U.S. aviation: American Airlines, Delta Air Lines, United Airlines, and Southwest Airlines.
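To make the entropy idea concrete, the following is a minimal sketch of how Shannon entropy could summarize how concentrated a day's departures are. It is not the authors' implementation: the function name `schedule_entropy`, the hourly binning (`bin_minutes=60`), the base-2 logarithm, and the toy schedules are all assumptions made for illustration, and the abstract does not state how times are binned in the actual analysis.

```python
import math
from collections import Counter


def schedule_entropy(departure_minutes, bin_minutes=60):
    """Shannon entropy (in bits) of departure times binned over a 24-hour day.

    departure_minutes: scheduled departures as minutes after midnight.
    bin_minutes: bin width; hourly bins are an assumption, not the paper's choice.
    """
    if not departure_minutes:
        raise ValueError("need at least one departure time")
    # Assign each departure to a time-of-day bin.
    counts = Counter((m % 1440) // bin_minutes for m in departure_minutes)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over occupied bins; empty bins contribute nothing.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy schedules (not BTS data).
# "banked": departures clustered in three waves during the day.
banked = [8 * 60 + 5] * 10 + [12 * 60 + 10] * 10 + [17 * 60 + 30] * 10
# "spread": one departure in each hour from 06:00 through 21:00.
spread = [h * 60 + 15 for h in range(6, 22)]

print(schedule_entropy(banked))
print(schedule_entropy(spread))
```

Under these assumptions, a schedule concentrated in a few waves has lower entropy than one spread evenly over the day, which is the sense in which entropy can quantify the periodic structure described above.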