People increasingly use multiple Multimodal Large Language Models (MLLMs) concurrently, selecting each based on its perceived strengths. This cross-platform practice creates coordination challenges: adapting prompts to different interfaces, calibrating trust against inconsistent behaviors, and navigating separate conversation histories. Prior HCI research has focused on single-agent interactions, leaving multi-MLLM orchestration underexplored. Through a diary study and semi-structured interviews (N=10), we examine how individuals organize work across competing AI systems. Our findings reveal that users construct primary-secondary hierarchies among models that shift with usage context. Users also develop personalized switching patterns, triggered by task aggregation, to balance effort, latency, and output credibility. These insights inform design opportunities for future tools that support users in coordinating multi-MLLM workflows.