Third-party annotation is the status quo for labeling text, but egocentric information such as sentiment and belief can at best be approximated by a third-person proxy. We introduce author labeling, an annotation technique in which the writer of a document annotates the data at the moment of creation. We collaborate with a commercial chatbot with over 20,000 users to deploy an author labeling annotation system. This system identifies task-relevant queries, generates labeling questions on the fly, and records authors' answers in real time. Using the author-labeled data, we train and deploy an online-learning model architecture for product recommendation. The model is trained to minimize the prediction error on questions generated for a set of predetermined subjective beliefs, using the author-labeled responses. Our model achieves a 537% improvement in click-through rate compared to an industry advertising baseline running concurrently. We then compare the quality and practicality of author labeling to three traditional annotation approaches for sentiment analysis and find author labeling to be higher in quality, faster to acquire, and cheaper. These findings reinforce existing literature showing that annotations, especially for egocentric and subjective beliefs, are of significantly higher quality when produced by the author rather than a third party. To facilitate broader scientific adoption, we release an author labeling service for the research community at https://academic.echogroup.ai.
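To make the training objective concrete, the following is a minimal sketch of an online learner that updates on each author-labeled answer as it arrives. It is an illustration under stated assumptions, not the architecture deployed in this work: the logistic-regression model, the log-loss objective, the bag-of-tokens features, and every name in the block (`OnlineBeliefModel`, `predict`, `update`, the toy stream) are hypothetical.

```python
import math
from typing import Dict, List


class OnlineBeliefModel:
    """Hypothetical online logistic regression; not the paper's architecture."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[str, float] = {}  # one weight per token feature
        self.bias = 0.0

    def predict(self, features: List[str]) -> float:
        """Probability that the author answers 'yes' to a labeling question."""
        score = self.bias + sum(self.weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features: List[str], author_label: int) -> float:
        """Take one SGD step on log loss; return the loss before the step."""
        p = self.predict(features)
        eps = 1e-12  # guards against log(0)
        loss = -(author_label * math.log(p + eps)
                 + (1 - author_label) * math.log(1.0 - p + eps))
        gradient = p - author_label  # d(loss)/d(score) for logistic regression
        self.bias -= self.learning_rate * gradient
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * gradient
        return loss


# Toy stream of (query tokens, author's yes/no answer); assumed data for illustration.
stream = [(["likes", "spicy"], 1), (["dislikes", "spicy"], 0)] * 50

model = OnlineBeliefModel()
losses = [model.update(tokens, label) for tokens, label in stream]
print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Each call to `update` consumes a single author-labeled response, so the model can keep learning while the annotation system is live, without retraining on a stored batch.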