Digital workers often experience fatigue, anxiety, reduced attention, and task blockage during prolonged computer-based work. Existing productivity tools focus mainly on task completion, while general-purpose AI chatbots require users to formulate clear prompts before they return useful help. This paper presents MindMirror, a local-first, multimodal, state-aware support system for digital workers. MindMirror integrates camera-based facial expression cues, text input, optional speech interaction, structured blockage reflection, response generation by a local large language model (LLM), and daily and weekly review reports. The system forms a closed workflow of state checking, manual correction, structured articulation, suggestion generation, and state review. The prototype is local-first by default; optional speech services may rely on third-party APIs when enabled. It is implemented with a web frontend, a Flask backend, an emotion recognition model, an Ollama-hosted Qwen model, Chart.js visualization, and local JSON/LocalStorage records. We evaluate the emotion recognition module on an independent seven-class, image-level facial expression benchmark containing 6,767 images. The fine-tuned Hugging Face model raises accuracy from 59.66% for a non-fine-tuned checkpoint baseline to 94.49%, an absolute gain of 34.83 percentage points. We further validate the prototype through endpoint-level reliability tests, voice-interaction latency tests, and a small formative user feedback study with six digital workers. The results suggest that users value the local-first design, the manual correction mechanism, and the structured reflection workflow. MindMirror is not intended for psychological diagnosis; instead, it serves as a lightweight, user-controllable tool for state reflection and supportive interaction.
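To make the manual correction step and the reported accuracy gain concrete, the following is a minimal sketch rather than the prototype's actual code. The `StateRecord` layout, its field names, and the helpers `to_json` and `absolute_gain` are all hypothetical; the abstract does not specify the JSON schema or the internal API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class StateRecord:
    """Hypothetical record layout; the real JSON schema is not given in the paper."""
    detected_state: str                     # label suggested by the emotion model
    corrected_state: Optional[str] = None   # user's manual correction, if any
    blockage_note: str = ""                 # free-text blockage reflection

    @property
    def effective_state(self) -> str:
        # Assumption: a user correction always overrides the model's label.
        return self.corrected_state or self.detected_state


def to_json(record: StateRecord) -> str:
    """Serialize a record for local storage, including the effective state."""
    payload = asdict(record)
    payload["effective_state"] = record.effective_state
    return json.dumps(payload, ensure_ascii=False)


def absolute_gain(baseline_pct: float, tuned_pct: float) -> float:
    """Absolute accuracy gain in percentage points, rounded to two decimals."""
    return round(tuned_pct - baseline_pct, 2)
```

For example, `StateRecord("sad", corrected_state="tired").effective_state` yields the user's label `"tired"`, and `absolute_gain(59.66, 94.49)` reproduces the 34.83 percentage-point gain stated above.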