The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration on open-ended tasks possible. Despite this, there have been no efforts to explore such collaborative writing. We take the next step beyond human-LLM collaboration and explore this multi-LLM scenario by generating CollabStory, the first exclusively LLM-generated dataset of collaborative stories. We cover single-author ($N=1$) to multi-author (up to $N=5$) scenarios, in which multiple LLMs co-author stories. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks, which have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks to multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines are not able to handle this emerging scenario. Thus, CollabStory is a resource that could help advance both the understanding and the development of techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks, since LLM-LLM collaboration could exacerbate ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at \texttt{\url{https://github.com/saranya-venkatraman/multi_llm_story_writing}}.