The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (the Public Use Microdata) and in 2017 (the Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In this study, we used the 2017 Commodity Flow Survey Public Use File to build a high-performance freight mode choice model, with three main improvements: (1) constructing local models for each commodity/industry category; (2) extracting useful geographical features, particularly the derived distance of each freight mode between origin and destination zones; and (3) applying ensemble learning methods such as stacking or voting to combine results from the local and unified models. The proposed method achieved over 92% accuracy without incorporating external information, an improvement of more than 19% over directly fitting Random Forest models over 10,000 samples. Furthermore, SHAP (SHapley Additive exPlanations) values were computed to explain the model's outputs and the major patterns it captures. The model framework could enhance the performance and interpretability of existing freight mode choice models.
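To make the combination of local and unified models concrete, the following is a minimal sketch of soft voting, not the implementation used in the study. It assumes simple count-based probability tables in place of the Random Forest base learners, a single hypothetical feature (`distance_bin`), three illustrative modes, and a handful of made-up shipments rather than Commodity Flow Survey records. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative mode labels; assumed for this sketch, not the survey's categories.
MODES = ("truck", "rail", "parcel")


def fit_frequency_model(rows):
    """Count modes per distance bin from (distance_bin, mode) pairs."""
    counts = defaultdict(Counter)
    for distance_bin, mode in rows:
        counts[distance_bin][mode] += 1
    return counts


def predict_proba(model, distance_bin):
    """Return P(mode | distance_bin); uniform if the bin was never seen."""
    seen = model.get(distance_bin)
    if not seen:
        return {m: 1.0 / len(MODES) for m in MODES}
    total = sum(seen.values())
    return {m: seen[m] / total for m in MODES}


def fit_local_and_unified(shipments):
    """Fit one local model per commodity and one unified model on all rows.

    `shipments` is a list of (commodity, distance_bin, mode) tuples.
    """
    by_commodity = defaultdict(list)
    for commodity, distance_bin, mode in shipments:
        by_commodity[commodity].append((distance_bin, mode))
    local = {c: fit_frequency_model(rows) for c, rows in by_commodity.items()}
    unified = fit_frequency_model([(d, m) for _, d, m in shipments])
    return local, unified


def soft_vote(local, unified, commodity, distance_bin):
    """Average local and unified probabilities and pick the most likely mode.

    Falls back to the unified model when the commodity has no local model.
    """
    p_unified = predict_proba(unified, distance_bin)
    local_model = local.get(commodity)
    if local_model is None:
        return max(MODES, key=p_unified.get), p_unified
    p_local = predict_proba(local_model, distance_bin)
    combined = {m: 0.5 * (p_local[m] + p_unified[m]) for m in MODES}
    return max(MODES, key=combined.get), combined


# Made-up toy shipments, purely for illustration.
shipments = [
    ("coal", "long", "rail"), ("coal", "long", "rail"),
    ("coal", "long", "rail"), ("coal", "short", "truck"),
    ("electronics", "long", "truck"), ("electronics", "long", "parcel"),
    ("electronics", "short", "parcel"), ("electronics", "short", "parcel"),
    ("food", "long", "truck"), ("food", "long", "truck"),
    ("food", "long", "truck"), ("food", "short", "truck"),
]

local, unified = fit_local_and_unified(shipments)
mode, probabilities = soft_vote(local, unified, "coal", "long")
print(mode, probabilities)
```

In this toy data the unified model alone favors truck for long-distance shipments, while the coal-specific local model shifts the combined vote toward rail. This is the kind of commodity-specific pattern that local models are intended to capture. A stacking variant would instead train a second-stage model on the base models' predicted probabilities rather than averaging them with fixed weights.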