In the rapidly advancing field of artificial intelligence, machine perception has become central to achieving high performance. Image classification systems are increasingly integral to applications ranging from medical diagnostics to image generation; however, these systems often exhibit harmful biases that can lead to unfair and discriminatory outcomes. Machine learning systems that rely on a single data modality, i.e. only images or only text, can amplify hidden biases in the training data unless that data is carefully balanced and filtered. Even carefully curated models can still harm underrepresented populations when deployed in improper contexts, as when government agencies reinforce racial bias through predictive policing. This thesis explores the intersection of technology and ethics in the development of fair image classification models. Specifically, I focus on improving fairness and on using multiple modalities to combat harmful demographic bias. By integrating multimodal approaches, which combine visual data with additional modalities such as text and metadata, this work enhances both the fairness and the accuracy of image classification systems. The study critically examines existing biases in image datasets and classification algorithms, proposes methods for mitigating those biases, and evaluates the ethical implications of deploying such systems in real-world scenarios. Through comprehensive experimentation and analysis, the thesis demonstrates how multimodal techniques can contribute to more equitable and ethical AI solutions, ultimately advocating for responsible AI practices that prioritize fairness.