We present Sig-Networks, an open-source, pip-installable toolkit that is the first of its kind for longitudinal language modelling. A central focus is the incorporation of signature-based neural network models, which have recently shown success in temporal tasks. We apply and extend published research to provide a full suite of signature-based models, whose components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless preprocessing for sequential data, parameter flexibility, and automated tuning across a range of models. We examine signature networks on three NLP tasks of varying temporal granularity: counselling conversations, rumour stance switch, and mood changes in social media threads. We show state-of-the-art performance on all three and provide guidance for future tasks. We release the toolkit as a PyTorch package, together with an introductory video and Git repositories for preprocessing and modelling that include sample notebooks on the modelled NLP tasks.