Modern machine learning systems deployed in safety-critical domains require visibility not only into aggregate performance but also into how training dynamics affect subgroup fairness over time. Existing training dashboards primarily support single-metric monitoring and provide few tools for examining relationships between heterogeneous metrics or diagnosing subgroup disparities during training. We present InsightBoard, an interactive TensorBoard plugin that integrates synchronized multi-metric visualization with slice-based fairness diagnostics in a unified interface. InsightBoard enables practitioners to jointly inspect training dynamics, performance metrics, and subgroup disparities through linked multi-view plots, correlation analysis, and standard group fairness indicators computed over user-defined slices. Through case studies with YOLOX on the BDD100k dataset, we demonstrate that models achieving strong aggregate performance can still exhibit substantial demographic and environmental disparities that remain hidden under conventional monitoring. By making fairness diagnostics available during training, InsightBoard supports earlier, more informed model inspection without modifying existing training pipelines or introducing additional data stores.