Communication is defined as "Who says what to whom with what effect". A message from a communicator generates downstream effects on its receivers, known as receiver behavior. Being a downstream effect of the message, receiver behavior carries rich signals about it. Despite carrying these signals, behavior data is largely ignored when training large language models. We show that training LLMs on receiver behavior can in fact improve their content-understanding abilities. Specifically, we show that training LLMs to predict receiver behavior in the form of likes and comments improves their performance on a wide variety of downstream content-understanding tasks. We demonstrate this performance gain on 46 video- and image-understanding tasks across 26 benchmark datasets, in both zero-shot and fine-tuning settings, outperforming many supervised baselines. Moreover, since receiver behavior such as likes and comments is collected by default on the internet and requires no human annotation to be useful, the performance improvement we obtain from training on this data comes essentially as a free lunch. We release the cleaned receiver-behavior data (comments and likes) for 750k images and videos collected from multiple platforms, along with our instruction-tuning data.
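To make the training signal concrete, below is a minimal sketch of how receiver-behavior records could be converted into instruction-tuning examples for behavior prediction. This is not the authors' released pipeline; the record schema (`media_url`, `likes`, `comments`) and prompt phrasings are hypothetical, chosen only to illustrate the idea of treating likes and comments as prediction targets.

```python
# Minimal sketch (not the paper's released pipeline): turn a receiver-behavior
# record into (instruction, response) pairs for behavior-prediction training.
# Field names "media_url", "likes", and "comments" are hypothetical.
import json


def to_instruction_examples(record: dict) -> list[dict]:
    """Build instruction-tuning pairs asking the model to predict
    receiver behavior (likes and comments) for a piece of content."""
    examples = []
    # Like prediction: the engagement count becomes the target response.
    examples.append({
        "instruction": f"Content: {record['media_url']}\n"
                       "Predict how many likes this post will receive.",
        "response": str(record["likes"]),
    })
    # Comment prediction: each cleaned comment becomes a target response.
    for comment in record["comments"]:
        examples.append({
            "instruction": f"Content: {record['media_url']}\n"
                           "Write a comment a viewer might leave on this post.",
            "response": comment,
        })
    return examples


if __name__ == "__main__":
    record = {
        "media_url": "https://example.com/video/123",
        "likes": 4200,
        "comments": ["Loved the editing!", "The ending surprised me."],
    }
    print(json.dumps(to_instruction_examples(record), indent=2))
```

Because such records accrue automatically on content platforms, this kind of supervision can be assembled at scale without any human annotation, which is what makes the resulting gains effectively free.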