Illustration is a fundamental mode of human expression and communication, and certain motions that accompany speech can serve this illustrative purpose. While Augmented and Virtual Reality (AR/VR) technologies have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate highly noisy hand-tracking images into clean, aesthetically pleasing sketches while preserving the essential visual cues of the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and to AR/VR in general.
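The augmentation-based self-supervision described above can be sketched in a few lines: starting from a clean sketch, synthetic tracking noise produces a (noisy, clean) training pair without any human labels. The stroke representation, noise model, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def jitter_stroke(stroke, sigma=0.02, drift=0.05, rng=None):
    """Simulate hand-tracking noise on one clean stroke.

    stroke: (N, 2) array of normalized (x, y) points.
    Adds per-point Gaussian jitter plus a cumulative low-frequency
    drift, mimicking the wobble of unsupported air drawing.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(stroke)
    jitter = rng.normal(0.0, sigma, size=(n, 2))
    # Cumulative noise scaled by 1/sqrt(n) gives a slow drift path.
    drift_path = np.cumsum(
        rng.normal(0.0, drift / np.sqrt(n), size=(n, 2)), axis=0
    )
    return stroke + jitter + drift_path

def make_training_pair(clean_strokes, rng=None):
    """Return (noisy, clean): a self-supervised pair in which the
    corrupted strokes are the model input and the originals the target."""
    noisy = [jitter_stroke(s, rng=rng) for s in clean_strokes]
    return noisy, clean_strokes
```

Rasterizing each pair yields the noisy conditioning image and the clean target sketch that a controllable diffusion model (e.g. a ControlNet-style conditioning branch) can be trained on.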