Communication is defined as ``Who says what to whom with what effect.'' A message from a communicator generates downstream effects on the receiver, also known as behavior. Because receiver behavior is a downstream effect of the message, it carries rich signals about that message. Despite this, behavior data is often ignored when training large language models (LLMs). We show that training LLMs on receiver behavior improves their content-understanding abilities. Specifically, training LLMs to predict receiver behavior in the form of likes and comments improves their performance on a wide variety of downstream content-understanding tasks. We observe this improvement on 40 video and image understanding tasks across 23 benchmark datasets, in both zero-shot and fine-tuning settings, outperforming many supervised baselines. Moreover, since receiver behavior such as likes and comments is collected by default on the internet and needs no human annotation to be useful, the performance improvement from training on this data is essentially a free lunch. We release the cleaned receiver behavior data (comments and likes) for 750k images and videos collected from multiple platforms, along with our instruction-tuning data.