This study explores the usefulness of machine learning classifiers for modeling freight mode choice. We investigate eight commonly used machine learning classifiers, namely Naive Bayes, Support Vector Machine, Artificial Neural Network, K-Nearest Neighbors, Classification and Regression Tree, Random Forest, Boosting, and Bagging, along with the classical Multinomial Logit model. The 2012 US Commodity Flow Survey serves as the primary data source, augmented with spatial attributes from secondary data sources. The classifiers are compared on prediction accuracy. We also examine how sample size and the training-testing split ratio affect the predictive ability of each approach. In addition, variable importance is estimated to determine how the variables influence freight mode choice. The results show that the tree-based ensemble classifiers perform best: Random Forest produces the most accurate predictions, closely followed by Boosting and Bagging. With regard to variable importance, shipment characteristics, such as shipment distance, industry classification of the shipper, and shipment size, are the most important factors in freight mode choice decisions.