uTalk: Bridging the Gap Between Humans and AI

Large Language Models (LLMs) have transformed many industries by improving productivity and supporting learning across different fields. One intriguing application combines LLMs with visual models to create a novel approach to Human-Computer Interaction. The core idea of this system is a user-friendly platform that lets people use ChatGPT's features in their everyday lives. uTalk combines Whisper, ChatGPT, Microsoft Speech Services, and the state-of-the-art (SOTA) talking-head system SadTalker. Users can hold a human-like conversation with a digital twin and receive answers to their questions. uTalk can also generate content from a submitted image and an input (text or audio). The system is hosted on Streamlit, where users are prompted to provide an image that serves as their AI assistant. Once the input (text or audio) is provided, a sequence of operations produces a video of the avatar delivering the response. This paper outlines how SadTalker's run-time has been optimized by 27.69% based on videos generated at 25 frames per second (FPS) and by 38.38% compared to our videos generated at 20 FPS. Furthermore, the integration and parallelization of SadTalker and Streamlit have resulted in a 9.8% improvement over the initial performance of the system.
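The flow described in the abstract (image plus text or audio in, avatar video out) can be sketched as a chain of four stages. This is a minimal illustration, not the authors' implementation: the class `AvatarPipeline`, its field names, and the stub lambdas are all hypothetical, and the real Whisper, ChatGPT, Microsoft Speech Services, and SadTalker calls are deliberately left out and replaced by injectable callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AvatarPipeline:
    """Hypothetical wiring of the four stages named in the abstract."""
    transcribe: Callable[[bytes], str]        # speech-to-text (Whisper's role in uTalk)
    answer: Callable[[str], str]              # LLM reply (ChatGPT's role in uTalk)
    synthesize: Callable[[str], bytes]        # text-to-speech (Microsoft Speech Services' role)
    animate: Callable[[bytes, bytes], bytes]  # talking head (SadTalker's role)

    def run(self, image: bytes, text: Optional[str] = None,
            audio: Optional[bytes] = None) -> bytes:
        """Turn an image plus a text or audio prompt into avatar video bytes."""
        if text is None and audio is None:
            raise ValueError("provide text or audio input")
        # Audio input is transcribed first; text input skips that stage.
        prompt = text if text is not None else self.transcribe(audio)
        reply = self.answer(prompt)
        speech = self.synthesize(reply)
        return self.animate(image, speech)


# Stub stages (assumptions for illustration only) so the sketch runs offline.
pipeline = AvatarPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    answer=lambda prompt: f"echo: {prompt}",
    synthesize=lambda reply: reply.encode("utf-8"),
    animate=lambda image, speech: image + b"|" + speech,
)

video = pipeline.run(image=b"face", text="hello")
```

Passing the stages in as callables keeps the sketch independent of any particular service client, so each stub could be swapped for a real call without changing `run`.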

