Organizing and finding images in a user's directory has become increasingly challenging as the number of images captured on personal devices grows rapidly. This paper presents a solution that uses zero-shot learning to build image queries from user-provided text descriptions alone. The primary contribution is an algorithm that uses pre-trained models to extract features from images. The algorithm uses OWL to check for the presence of bounding boxes and sorts images by cosine similarity score. Its output is a list of images sorted in descending order of similarity, which helps users locate specific images more efficiently. The experiments used a custom dataset that simulates a user's image directory, and they evaluated the accuracy, inference time, and size of the models. The results showed that the VGG model achieved the highest accuracy, while the ResNet50 and InceptionV3 models had the lowest inference time and size. The proposed algorithm provides an effective and efficient solution for organizing and finding images in a user's local directory. Its performance and flexibility make it suitable for various applications, including personal image organization and search engines. Code and dataset for zero-search are available at: https://github.com/NainaniJatinZ/zero-search
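The ranking step described in the abstract, sorting images in descending order of cosine similarity, can be illustrated with a minimal sketch. It is not taken from the zero-search repository: the function names `cosine_similarity` and `rank_images`, the file names, and the toy feature vectors are all hypothetical. The vectors stand in for whatever features the pre-trained models produce, and the OWL bounding-box check is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    query_features: Sequence[float],
    image_features: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every image against the query and sort by descending similarity."""
    scored = [
        (name, cosine_similarity(query_features, features))
        for name, features in image_features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-D vectors (assumed for illustration; real features are much larger).
query = [1.0, 0.0]
images = {
    "beach.jpg": [0.0, 1.0],
    "dog.jpg": [1.0, 0.0],
    "park.jpg": [1.0, 1.0],
}
ranked = rank_images(query, images)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `dog.jpg` ranks first because its vector points in the same direction as the query, and `beach.jpg` ranks last because its vector is orthogonal to it.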