We introduce the new task of clinically meaningful summarisation of social media user timelines, appropriate for mental health monitoring. We develop a novel approach for unsupervised abstractive summarisation that produces a two-layer summary consisting of both high-level information, covering aspects useful to clinical experts, and accompanying time-sensitive evidence from a user's social media timeline. A key methodological novelty is the timeline summarisation component, TH-VAE, a version of the hierarchical variational autoencoder (VAE) adapted to represent long texts and guided by LLM-annotated key phrases. The resulting timeline summary is input into an LLM (LLaMA-2) to produce the final summary, which contains both the high-level information, obtained through instruction prompting, and the corresponding evidence from the user's timeline. We assess the summaries generated by our novel architecture via automatic evaluation against expert-written summaries and via human evaluation with clinical experts, showing that timeline summarisation by TH-VAE results in logically coherent summaries that are rich in clinical utility and superior to LLM-only approaches in capturing changes over time.