Collocating deep learning training tasks improves GPU utilization but risks resource contention, severe slowdowns, and out-of-memory (OOM) failures. Accurate memory estimation is essential for robust collocation, and GPU utilization estimation -- a key proxy for contention -- enables interference-aware scheduling. Existing GPU memory estimators span three paradigms -- analytical models, CPU-side libraries, and ML-based estimators -- each with distinct limitations: dependence on detailed model specifications, intrusive integration, poor generalization, and varying latency overhead. GPU heterogeneity further complicates estimation, as identical tasks can exhibit different memory footprints across hardware generations. GPU utilization estimation remains comparatively understudied and is further complicated by non-additive utilization metrics and GPU heterogeneity. We conduct a systematic analysis of representative estimators from each paradigm -- Horus, PyTorch FakeTensor, and our lightweight ML-based estimator -- evaluating accuracy, generalizability, and overhead. We construct a synthetic dataset spanning MLPs, CNNs, and Transformers with controlled architectural variations, train MLP- and Transformer-based estimators for memory prediction, and experiment with utilization estimation. Our evaluation reveals key tradeoffs and validates the estimators against unseen real-world models. Significant challenges remain: analytical models generalize poorly and cannot easily be extended to new GPU architectures or accurately reflect savings from memory optimizations; CPU-side libraries impose intrusive integration overhead; and both analytical and ML-based estimators rely on model specifications or computation graphs, limiting generalization across diverse architectures and hardware variants. We release all datasets, tools, and artifacts to support further research.
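To illustrate the CPU-side library paradigm the abstract refers to, the sketch below traces a model on PyTorch FakeTensors, which carry shape and dtype metadata without allocating real GPU storage. It is a minimal sketch only: it assumes a recent PyTorch build that exposes `FakeTensorMode` under the private `torch._subclasses` namespace, and the byte accounting (parameters plus the final activation) is illustrative rather than the estimation logic evaluated in this work.

```python
# Minimal sketch: CPU-side memory estimation with PyTorch FakeTensor.
# Assumes torch._subclasses.fake_tensor.FakeTensorMode is available (private API);
# the byte accounting below is a rough illustrative bound, not the paper's estimator.
import torch
from torch._subclasses.fake_tensor import FakeTensorMode


def estimate_forward_bytes(build_model, input_shape):
    """Trace a model on fake tensors (no real allocations) and sum the bytes
    of its parameters plus the final activation as a coarse footprint proxy."""
    with FakeTensorMode():
        model = build_model()            # weights materialize as fake tensors
        x = torch.randn(*input_shape)    # fake input: metadata only
        y = model(x)                     # forward pass propagates shapes/dtypes
        param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
        act_bytes = y.numel() * y.element_size()
    return param_bytes + act_bytes


# Example: a small MLP with a batch of 32 (hypothetical architecture for illustration).
mlp = lambda: torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 10)
)
print(estimate_forward_bytes(mlp, (32, 1024)))
```

Even this trivial sketch hints at the tradeoff the abstract names: the approach needs no GPU and no detailed model specification, but it requires intrusive access to the model's code path and does not, by itself, capture allocator overheads, optimizer state, or savings from memory optimizations.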