Many real-world prediction tasks have outcome variables with characteristically heavy-tailed distributions. Examples include the number of copies of a book sold, auction prices of art pieces, and demand for commodities in warehouses. A model that learns the heavy-tailed distribution can predict "big and rare" instances (e.g., best-sellers) accurately. Most existing approaches are not designed to learn heavy-tailed distributions; consequently, they severely under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distributions. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.
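As a rough illustration of the two phases described above, the following Python sketch pairs a logistic-regression preference classifier, trained on feature differences, with a simple placement rule. The rule sorts the training instances by outcome, picks the insertion position that disagrees with the fewest pairwise votes, and assigns the mean of the two neighbouring outcomes. The classifier choice, the placement rule, and the names `train_pairwise` and `place` are our assumptions for illustration, not necessarily the exact L2P procedure.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pairwise(X: Sequence[Sequence[float]], y: Sequence[float],
                   epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Training phase (assumed form): learn w so that
    sigmoid(w . (x_a - x_b)) approximates P(y_a > y_b)."""
    w = [0.0] * len(X[0])
    # All ordered pairs with distinct outcomes; label 1 if a > b.
    pairs = [(a, b, 1.0 if y[a] > y[b] else 0.0)
             for a in range(len(X)) for b in range(len(X))
             if a != b and y[a] != y[b]]
    for _ in range(epochs):
        for a, b, label in pairs:
            diff = [xa - xb for xa, xb in zip(X[a], X[b])]
            p = _sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # Stochastic gradient step on the logistic (cross-entropy) loss.
            w = [wi + lr * (label - p) * di for wi, di in zip(w, diff)]
    return w


def place(x_new: Sequence[float], X: Sequence[Sequence[float]],
          y: Sequence[float], w: Sequence[float]) -> Tuple[int, float]:
    """Placing phase (hypothetical rule): return (position, predicted value)."""
    order = sorted(range(len(X)), key=lambda i: y[i])
    # votes[k] is True if the classifier says x_new > k-th smallest instance.
    votes = []
    for i in order:
        diff = [a - b for a, b in zip(x_new, X[i])]
        votes.append(_sigmoid(sum(wi * di for wi, di in zip(w, diff))) > 0.5)
    n = len(order)
    # Position p: above the first p instances, below the remaining n - p.
    best_p, best_cost = 0, None
    for p in range(n + 1):
        cost = (sum(1 for k in range(p) if not votes[k])
                + sum(1 for k in range(p, n) if votes[k]))
        if best_cost is None or cost < best_cost:
            best_p, best_cost = p, cost
    ys = [y[i] for i in order]
    if best_p == 0:
        value = ys[0]          # below every known instance
    elif best_p == n:
        value = ys[-1]         # above every known instance
    else:
        value = (ys[best_p - 1] + ys[best_p]) / 2.0
    return best_p, value


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [1.0, 3.0, 7.0, 20.0, 55.0, 150.0]
    w = train_pairwise(X, y)
    print(place([3.5], X, y, w))  # (4, 37.5)
```

Because every prediction in this sketch is either an observed outcome or the mean of two adjacent ones, a new instance that outranks most of the training set lands near the top of the observed range rather than being pulled toward the mean. The flip side is that this particular rule cannot extrapolate beyond the largest outcome seen in training.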