Text-to-image (T2I) models such as gpt-image-2 can now generate publication-grade academic figures from a short prompt, but the output is a flat raster: a user who wants to change one arrow, one label, or one icon has to regenerate the whole image, which also disturbs the parts they wanted to keep. We present sketch-plot, an interactive system that closes this controllability gap with a three-layer progressive editing pipeline: a generated PNG, an addressable puzzle of editable pieces, and a per-piece SVG. The user stops at the layer that gives enough control for the change at hand, so the cost of decomposition and vectorisation is paid only for the pieces that need it. Realising this pipeline is not trivial. General-purpose segmentation models lack the semantic discriminability to decompose a research figure cleanly, and end-to-end image vectorisation produces incomplete shapes and loses semantic structure. We therefore route both stages through a human-in-the-loop interface that lets the user accept, refine, or reject decomposition and vectorisation decisions piece by piece. We validate the design with an expert user study, in which participants found sketch-plot effective for making targeted edits to AI-generated academic figures and preferred it over regenerating the whole image. A demonstration video is available at https://paper-plot.dev/sketch.
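To make the three-layer idea concrete, the following is a minimal sketch of how a figure, its pieces, and the accept/refine/reject decisions might be represented. It is an illustration under stated assumptions and not the sketch-plot implementation: the names `Layer`, `Decision`, `Piece`, `Figure`, `review`, and `vectorise`, as well as the bounding-box representation of a piece, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Layer(Enum):
    """The three editing layers named in the abstract (numbering is assumed)."""
    PNG = 1      # the generated raster, treated as a whole
    PUZZLE = 2   # an addressable piece cut out of the raster
    SVG = 3      # a piece that has additionally been vectorised


class Decision(Enum):
    """Hypothetical encoding of the user's human-in-the-loop choices."""
    ACCEPT = "accept"
    REFINE = "refine"
    REJECT = "reject"


@dataclass
class Piece:
    piece_id: str
    bbox: tuple[int, int, int, int]   # assumed form: (x, y, width, height) in pixels
    svg: Optional[str] = None         # set only if this piece is vectorised

    @property
    def layer(self) -> Layer:
        return Layer.SVG if self.svg is not None else Layer.PUZZLE


@dataclass
class Figure:
    png_path: str
    pieces: dict[str, Piece] = field(default_factory=dict)

    def review(self, proposal: Piece, decision: Decision,
               refined_bbox: Optional[tuple[int, int, int, int]] = None) -> Optional[Piece]:
        """Apply the user's decision to one proposed piece."""
        if decision is Decision.REJECT:
            return None                      # the region stays part of the flat raster
        if decision is Decision.REFINE:
            if refined_bbox is None:
                raise ValueError("REFINE needs a corrected bounding box")
            proposal = Piece(proposal.piece_id, refined_bbox)
        self.pieces[proposal.piece_id] = proposal
        return proposal

    def vectorise(self, piece_id: str, svg: str) -> None:
        """Attach vector markup to one piece; other pieces are left untouched."""
        self.pieces[piece_id].svg = svg

    def layer_of(self, piece_id: str) -> Layer:
        """Regions never promoted to a piece remain at the PNG layer."""
        piece = self.pieces.get(piece_id)
        return piece.layer if piece is not None else Layer.PNG
```

In this sketch, a user who only wants to restyle one arrow would accept that arrow's proposed piece, call `vectorise` on it alone, and leave every other region at the PNG or puzzle layer, which mirrors the claim that decomposition and vectorisation costs are paid only where they are needed.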