Language-Guided Multimodal Texture Authoring via Generative Models

Authoring realistic haptic textures typically requires low-level parameter tuning and repeated trial and error, which limits speed, transparency, and creative reach. We present a language-driven authoring system that turns natural-language prompts into multimodal textures comprising two coordinated haptic channels, namely sliding vibrations from force- and speed-conditioned autoregressive (AR) models and tapping transients, together with a text-prompted visual preview from a diffusion model. A shared, language-aligned latent space links the modalities so that a single prompt yields semantically consistent haptic and visual signals; designers can write goals (e.g., "gritty but cushioned surface," "smooth and hard metal surface") and immediately see and feel the result through a 3D haptic device. To verify that the learned latent space encodes perceptually meaningful structure, we conduct an anchor-referenced, attribute-wise evaluation of roughness, slipperiness, and hardness. Participant ratings are projected onto the interpretable line between two real-material references, revealing consistent trends: asperity effects in roughness, compliance in hardness, and surface-film influence in slipperiness. A human-subject study further indicates a coherent cross-modal experience and low effort for prompt-based iteration. The results show that language can serve as a practical control modality for texture authoring: prompts reliably steer material semantics across haptic and visual channels, enabling a prompt-first, designer-oriented workflow that replaces manual parameter tuning with interpretable, text-guided refinement.
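
The anchor-referenced projection described above can be illustrated with a minimal sketch. It assumes that each rating is a numeric vector and that the two real-material references are points in the same space; the function name `project_onto_anchor_line` and the choice of orthogonal projection are illustrative assumptions, not the paper's confirmed procedure.

```python
from math import sqrt


def project_onto_anchor_line(rating, anchor_a, anchor_b):
    """Orthogonally project `rating` onto the line through two anchors.

    Hypothetical helper (an assumption, not taken from the paper).
    Returns (t, residual): t is the normalized position along the line,
    with t = 0 at anchor_a and t = 1 at anchor_b, and residual is the
    distance from the rating to the line.
    """
    direction = [b - a for a, b in zip(anchor_a, anchor_b)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("anchor references must be distinct")
    offset = [r - a for r, a in zip(rating, anchor_a)]
    # Scalar projection of the offset onto the anchor direction.
    t = sum(o * d for o, d in zip(offset, direction)) / denom
    # Closest point on the line, used to measure the off-line distance.
    foot = [a + t * d for a, d in zip(anchor_a, direction)]
    residual = sqrt(sum((r - f) ** 2 for r, f in zip(rating, foot)))
    return t, residual


# Made-up example values: a rating halfway between the two anchors.
t, residual = project_onto_anchor_line([2.0, 1.0], [0.0, 0.0], [4.0, 0.0])
```

Under this reading, a value of `t` near 0 or 1 means the generated texture was rated close to one of the two reference materials, and a large `residual` would indicate that the rating is poorly described by the anchor line.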
