MindMirror: A Local-First Multimodal State-Aware Support System for Digital Workers

Digital workers often experience fatigue, anxiety, reduced attention, and task blockage during prolonged computer-based work. Existing productivity tools mainly focus on task completion, while general-purpose AI chatbots require users to formulate clear prompts before they can provide useful help. This paper presents MindMirror, a local-first multimodal state-aware support system for digital workers. MindMirror integrates camera-based facial expression cues, text input, optional speech interaction, structured blockage reflection, local large language model (LLM)-based response generation, and daily/weekly review reports. The system forms a closed workflow of state checking, manual correction, structured articulation, suggestion generation, and state review. The current prototype follows a local-first design, although the optional speech services may rely on third-party APIs when enabled. It is implemented with a Web frontend, a Flask backend, an emotion recognition model, an Ollama-hosted Qwen model, Chart.js visualization, and local JSON/LocalStorage records. We evaluate the emotion recognition module on an independent seven-class image-level facial expression benchmark containing 6,767 images. The fine-tuned Hugging Face model reaches 94.49% accuracy, compared with 59.66% for a non-fine-tuned checkpoint baseline, an absolute gain of 34.83 percentage points. We further validate the prototype through endpoint-level reliability tests, voice-interaction latency tests, and a small formative user feedback study with six digital workers. Results suggest that users value the local-first design, the manual correction mechanism, and the structured reflection workflow. MindMirror is not intended for psychological diagnosis; instead, it serves as a lightweight, user-controllable tool for state reflection and supportive interaction.
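As a rough illustration of how the manual correction step might feed local response generation, the sketch below builds a request body in the shape accepted by Ollama's `POST /api/generate` endpoint. It is a minimal sketch under stated assumptions, not the authors' implementation: the `StateCheck` class, its field names, the prompt wording, and the `qwen2.5` model tag are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Optional
import json


@dataclass
class StateCheck:
    """One pass through the workflow; all field names are hypothetical."""
    detected_emotion: str                    # label from the expression model
    corrected_emotion: Optional[str] = None  # user's manual correction, if any
    blockage_note: str = ""                  # structured reflection text

    @property
    def effective_emotion(self) -> str:
        # A manual correction always overrides the model's label.
        return self.corrected_emotion or self.detected_emotion


def build_generate_payload(check: StateCheck, model: str = "qwen2.5") -> dict:
    """Build a JSON body for a non-streaming Ollama /api/generate request.

    The model tag is an assumption; the source only says "Qwen".
    """
    prompt = (
        f"The user reports feeling {check.effective_emotion}. "
        f"Current blockage: {check.blockage_note or 'not specified'}. "
        "Offer one short, supportive, non-diagnostic suggestion."
    )
    return {"model": model, "prompt": prompt, "stream": False}


if __name__ == "__main__":
    check = StateCheck(
        detected_emotion="neutral",
        corrected_emotion="anxious",
        blockage_note="unclear next step on a report",
    )
    print(json.dumps(build_generate_payload(check), indent=2))
```

In this sketch the user's correction takes precedence over the detected label, which reflects the user-controllable emphasis of the design. A backend could send the resulting dictionary as JSON to a locally running Ollama server, so the text prompt itself would not need to leave the machine.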

