Text-to-image (T2I) models such as gpt-image-2 can now generate publication-grade academic figures from a short prompt, but the output is a flat raster: a user who wants to change one arrow, one label, or one icon has to regenerate the whole image, which also disturbs the parts they wanted to keep. We present sketch-plot, an interactive system that closes this controllability gap with a three-layer progressive editing pipeline: a generated PNG, an addressable puzzle of editable pieces, and a per-piece SVG. The user stops at the first layer that gives enough control for the change at hand, so the cost of decomposition and vectorisation is paid only for the pieces that need it. Realising this pipeline is not trivial. General-purpose segmentation models lack the semantic discrimination needed to decompose a research figure cleanly, and end-to-end image vectorisation produces incomplete shapes and loses semantic structure. We therefore route both stages through a human-in-the-loop interface that lets the user accept, refine, or reject each decomposition and vectorisation decision piece by piece. We validate the design with an expert user study, in which participants found sketch-plot effective for making targeted edits to AI-generated academic figures and preferred it to regenerating the whole image. A demonstration video is available at https://anonymous.4open.science/r/SketchPlotVideo/.