In recent years, chatbots have gained widespread adoption thanks to their ability to assist users at any time and across diverse domains. However, the lack of large-scale curated datasets limits research on their quality and reliability. This paper presents TOFU-D, a snapshot of 1,788 Dialogflow chatbots collected from GitHub, and COD, a curated subset of TOFU-D comprising 185 validated chatbots. The two datasets capture a wide range of domains, languages, and implementation patterns, offering a sound basis for empirical studies on chatbot quality and security. A preliminary assessment using the Botium testing framework and the Bandit static analyzer revealed gaps in test coverage and frequent security vulnerabilities in several chatbots, highlighting the need for systematic, multi-platform research on chatbot quality and security.