We present Sig-Networks, an open-source, pip-installable toolkit and the first of its kind for longitudinal language modelling. A central focus is the incorporation of signature-based neural network models, which have recently shown success in temporal tasks. We apply and extend published research to provide a full suite of signature-based models, whose components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless preprocessing for sequential data, parameter flexibility, and automated tuning across a range of models. We examine signature networks on three NLP tasks of varying temporal granularity: counselling conversations, rumour stance switch, and mood changes in social media threads. We show state-of-the-art performance on all three and provide guidance for future tasks. We release the toolkit as a PyTorch package with an introductory video and Git repositories for preprocessing and modelling, including sample notebooks on the modelled NLP tasks.
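For readers unfamiliar with path signatures, the following is a minimal sketch of the quantity that signature-based models build on: the depth-2 signature of a piecewise-linear path, accumulated segment by segment with Chen's identity. It is written in dependency-free Python for illustration only and is not the Sig-Networks API; the function name `signature_depth2` and the toy path are assumptions introduced here.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path.

    `path` is a sequence of d-dimensional points (tuples or lists).
    Returns (level1, level2), where level1[i] is the total increment in
    channel i and level2[i][j] is the iterated integral of channel i
    against channel j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        dx = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 (x) dx + (dx (x) dx) / 2, using S1 *before* the update.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            level1[i] += dx[i]
    return level1, level2


# Toy 2-D path: one step right, then one step up.
level1, level2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = 0.5 * (level2[0][1] - level2[1][0])
print(level1, level2, levy_area)
```

The off-diagonal terms of the second level depend on the order in which the channels move, which is why signature features can summarise how a sequence evolved over time and not only where it ended up.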