Generating synthetic variants of a document is often posed as a text-to-text transformation. We propose an alternative LLM-based method that first decomposes a document into semantic frames and then generates text from this intermediate sparse format. The frames are modeled as a hypergraph, which allows their contents to be perturbed in a principled manner. Specifically, new hyperedges are mined through topological analysis, and complex polyadic relationships, including hierarchy and temporal dynamics, are accommodated. We show that our solution generates documents that are diverse and coherent, and that vary in style, sentiment, format, composition, and facts.
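To make the frame-as-hypergraph idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each semantic frame is a hyperedge over (role, filler) nodes. The names `Frame`, `make_frame`, `shared_node_candidates`, and `swap_filler` are hypothetical, and the shared-node merge is a deliberately simple stand-in for the topological analysis mentioned in the abstract, whose details are not given here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Frame:
    """One semantic frame, stored as a hyperedge over (role, filler) nodes."""
    name: str
    nodes: frozenset  # set of (role, filler) tuples


def make_frame(name, **roles):
    # Hypothetical helper: build a frame from role=filler keyword arguments.
    return Frame(name, frozenset(roles.items()))


def shared_node_candidates(frames):
    """Propose new hyperedges by merging frames that share at least one node.

    Assumption: this heuristic only illustrates hyperedge mining; it is not
    the topological analysis used in the paper.
    """
    candidates = []
    for a, b in combinations(frames, 2):
        if a.nodes & b.nodes:
            candidates.append(Frame(f"{a.name}+{b.name}", a.nodes | b.nodes))
    return candidates


def swap_filler(frame, role, new_filler):
    """Perturb a frame by replacing the filler attached to one role."""
    nodes = frozenset(
        (r, new_filler if r == role else f) for r, f in frame.nodes
    )
    return Frame(frame.name, nodes)


# Toy frames, invented for illustration only.
frames = [
    make_frame("purchase", buyer="Alice", item="laptop", time="Monday"),
    make_frame("delivery", recipient="Alice", item="laptop", time="Friday"),
    make_frame("review", author="Bob", item="phone"),
]

candidates = shared_node_candidates(frames)
perturbed = swap_filler(frames[0], "time", "Tuesday")
```

In this toy setup, the purchase and delivery frames share the node `("item", "laptop")`, so they yield one merged candidate hyperedge, while the review frame shares nothing and is left alone. A perturbed frame such as `perturbed` would then be handed to the LLM as the sparse input for text generation.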