Assistive technologies for people with visual impairments (PVI) have made significant advancements, particularly with the integration of artificial intelligence (AI) and real-time sensor technologies. However, current solutions often require PVI to switch between multiple apps and tools for tasks like image recognition, navigation, and obstacle detection, which can hinder a seamless and efficient user experience. In this paper, we present NaviGPT, a high-fidelity prototype that integrates LiDAR-based obstacle detection, vibration feedback, and large language model (LLM) responses to provide a comprehensive, real-time navigation aid for PVI. Unlike existing applications such as Be My AI and Seeing AI, NaviGPT combines image recognition and contextual navigation guidance into a single system, offering continuous feedback on the user's surroundings without the need for app-switching. In addition, NaviGPT compensates for the response delays of the LLM by using location and sensor data, aiming to provide practical and efficient navigation support for PVI in dynamic environments.
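The latency-compensation idea in the abstract can be illustrated with a minimal sketch: a fast local loop driven by LiDAR readings delivers immediate vibration feedback while the slower LLM request completes in the background. All function names here (`lidar_distance_m`, `vibrate`, `slow_llm_describe`) are hypothetical stand-ins, not NaviGPT's actual API.

```python
import queue
import threading
import time

# Hypothetical sketch of combining fast sensor feedback with a slow LLM
# response. The sensor, haptics, and LLM functions below are simulated
# stand-ins; the real system would bind them to device APIs.

feedback_log = []          # records what the user would perceive, in order
llm_result = queue.Queue() # hands the LLM answer back to the main loop


def lidar_distance_m(t):
    # Stand-in for a LiDAR reading: obstacle distance in meters,
    # shrinking over time to simulate an approaching obstacle.
    return 3.0 - t


def vibrate(intensity):
    # Stand-in for haptic feedback; here we just record the intensity.
    feedback_log.append(("vibrate", round(intensity, 2)))


def slow_llm_describe(scene):
    # Stand-in for an LLM call with noticeable latency.
    time.sleep(0.2)
    return f"LLM: {scene} contains an obstacle ahead"


# Kick off the slow LLM request in the background.
threading.Thread(
    target=lambda: llm_result.put(slow_llm_describe("sidewalk")),
    daemon=True,
).start()

# Fast local loop: sensor-driven feedback fills the LLM's latency gap,
# so the user is never left without guidance while the model responds.
for step in range(3):
    d = lidar_distance_m(step)
    if d < 2.5:
        vibrate(intensity=1.0 / max(d, 0.1))  # closer obstacle -> stronger pulse

# The richer LLM description is spoken once it finally arrives.
feedback_log.append(("speak", llm_result.get()))
print(feedback_log)
```

The key design point is that the vibration channel never blocks on the LLM: obstacle warnings are computed locally at sensor rate, and the contextual description is merged in whenever it becomes available.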