We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers. Examples include the number of up-votes on a social media post and the number of bicycles available at a public rental station. While it is possible to model such labels as continuous values and apply traditional regression, doing so changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which raises the question of whether such integer labels can be modeled directly by a discrete distribution whose parameters are predicted from the features of a given instance. We focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous, so that backpropagation and gradient descent can be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction, and image generation. We find that, overall, the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each bit, and a discrete analogue of the Laplace distribution, which has exponentially decaying tails around a continuous mean.
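To make the two best-performing distributions concrete, here is a minimal Python sketch of their log-probabilities. It illustrates the general idea only and should not be read as the paper's exact parameterization: the function names, the least-significant-bit-first ordering, the use of logits for the Bernoulli parameters, and the specific form p(k) ∝ exp(-|k - mu| / scale) over all integers are assumptions made for this example.

```python
import math


def _log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -softplus(-z).
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))


def bitwise_log_prob(k, logits):
    """Log-probability of integer k under independent Bernoullis on its bits.

    Assumption: logits[i] is the logit that bit i (least significant first)
    equals 1, so the support is the range 0 .. 2**len(logits) - 1.
    """
    n = len(logits)
    if not 0 <= k < 2 ** n:
        raise ValueError("k is outside the range representable in n bits")
    total = 0.0
    for i, logit in enumerate(logits):
        bit = (k >> i) & 1
        # P(bit = 1) = sigmoid(logit); P(bit = 0) = sigmoid(-logit).
        total += _log_sigmoid(logit if bit else -logit)
    return total


def discrete_laplace_log_prob(k, mu, scale):
    """Log-probability of integer k under p(k) proportional to exp(-|k - mu| / scale).

    Assumption: the support is all integers, mu is a continuous location
    and scale > 0. The normalizer is the sum of two geometric series,
    one on each side of mu.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    f = mu - math.floor(mu)          # fractional part of mu, in [0, 1)
    a = -f / scale                   # log-weight of the integer just below mu
    b = -(1.0 - f) / scale           # log-weight of the integer just above mu
    m = max(a, b)
    log_numer = m + math.log(math.exp(a - m) + math.exp(b - m))
    # 1 - exp(-1/scale), computed via expm1 for accuracy at large scale.
    log_denom = math.log(-math.expm1(-1.0 / scale))
    log_norm = log_numer - log_denom
    return -abs(k - mu) / scale - log_norm
```

A quick sanity check is to exponentiate and sum the log-probabilities over the support (all 2^n values for Bitwise, a wide window of integers for the discrete Laplace) and confirm that the total is close to 1. Both functions are continuous in their real-valued parameters, which is the property the abstract requires for gradient-based training; in practice these would be written in an automatic-differentiation framework rather than with the standard library.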