The opaque nature of transformer-based models, particularly in applications susceptible to unethical practices such as dark patterns in user interfaces, calls for models that integrate uncertainty quantification to enhance trust in their predictions. This study focuses on the detection of dark patterns: deceptive design choices that manipulate user decisions, undermining autonomy and consent. We propose a differential fine-tuning approach that applies uncertainty quantification at the final classification head of transformer-based pre-trained models. Using a dense neural network (DNN) head architecture as a baseline, we examine two methods capable of quantifying uncertainty: Spectral-normalized Neural Gaussian Processes (SNGPs) and Bayesian Neural Networks (BNNs). These methods are evaluated on a set of open-source foundation models along multiple dimensions: model performance, variance in the certainty of predictions, and environmental impact during the training and inference phases. Results demonstrate that integrating uncertainty quantification maintains performance while offering insight into which instances the models find challenging. Moreover, the study reveals that environmental impact does not uniformly increase with the incorporation of uncertainty quantification techniques. The findings demonstrate that uncertainty quantification enhances transparency and provides measurable confidence in predictions, improving the explainability of black-box models. This facilitates informed decision-making and mitigates the influence of dark patterns in user interfaces. These results highlight the importance of incorporating uncertainty quantification techniques when developing machine learning models, particularly in domains where interpretability and trustworthiness are critical.
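To make the setup concrete, the sketch below illustrates one plausible way to attach an uncertainty-aware classification head to a frozen pre-trained transformer. It is a minimal illustration under stated assumptions, not the authors' implementation: the spectral-normalized layer reflects only one ingredient of SNGP (the full method adds a random-feature Gaussian-process output layer with a Laplace covariance approximation), the BNN head is approximated here with Monte Carlo dropout rather than variational weight posteriors, and the backbone name, hyperparameters, and the `mc_predict` helper are hypothetical.

```python
# Minimal sketch (PyTorch + Hugging Face transformers), not the paper's code:
# a frozen pre-trained backbone with an uncertainty-aware classification head.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

BACKBONE = "bert-base-uncased"  # stand-in for the paper's open-source models


class UncertaintyHead(nn.Module):
    """Head with a spectral-normalized hidden layer (an SNGP ingredient) and
    dropout kept stochastic at inference for Monte Carlo (BNN-style) sampling."""

    def __init__(self, hidden_size: int, num_classes: int = 2, p_drop: float = 0.1):
        super().__init__()
        # Spectral normalization bounds the layer's Lipschitz constant, the part
        # of the SNGP recipe that keeps representations distance-aware.
        self.dense = nn.utils.spectral_norm(nn.Linear(hidden_size, hidden_size))
        self.act = nn.GELU()
        self.p_drop = p_drop
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.dense(x))
        # training=True keeps dropout active even in eval mode, so repeated
        # forward passes yield a distribution over predictions.
        h = nn.functional.dropout(h, p=self.p_drop, training=True)
        return self.out(h)


@torch.no_grad()
def mc_predict(backbone, head, enc, n_samples: int = 20):
    """Predictive mean and per-class variance over Monte Carlo samples."""
    cls = backbone(**enc).last_hidden_state[:, 0]  # [CLS]-position embedding
    probs = torch.stack([head(cls).softmax(-1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)


tokenizer = AutoTokenizer.from_pretrained(BACKBONE)
backbone = AutoModel.from_pretrained(BACKBONE).eval()
for param in backbone.parameters():
    param.requires_grad = False  # differential fine-tuning: train the head only

head = UncertaintyHead(hidden_size=backbone.config.hidden_size)
enc = tokenizer(["Only 2 left in stock -- order now!"], return_tensors="pt")
mean_prob, var_prob = mc_predict(backbone, head, enc)
# High variance flags instances the model finds challenging.
print(mean_prob, var_prob)
```

In this sketch the variance across samples plays the role of the "variance in the certainty of predictions" evaluated in the study: low variance signals a confident prediction, high variance a challenging instance worth human review.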