Serverless GPU Architecture for Enterprise HR Analytics: A Production-Scale BDaaS Implementation

Guilin Zhang, Wulan Guo, Ziqi Tan, Srinivas Vippagunta, Suchitra Raman, Shreeshankar Chatterjee, Ju Lin, Shang Liu, Mary Schladenhauffen, Jeffrey Luo, Hailong Jiang

arXiv: 10 pages, 7 figures, 4 tables. Accepted to IEEE BigData 2025.

Industrial and government organizations increasingly depend on data-driven analytics for workforce, finance, and regulated decision processes, where timeliness, cost efficiency, and compliance are critical. Distributed frameworks such as Spark and Flink remain effective for massive-scale batch or streaming analytics but introduce coordination complexity and auditing overheads that misalign with moderate-scale, latency-sensitive inference. Meanwhile, cloud providers now offer serverless GPUs, and models such as TabNet enable interpretable tabular ML, motivating new deployment blueprints for regulated environments. In this paper, we present a production-oriented Big Data as a Service (BDaaS) blueprint that integrates a single-node serverless GPU runtime with TabNet. The design leverages GPU acceleration for throughput, serverless elasticity for cost reduction, and feature-mask interpretability for IL4/FIPS compliance. We conduct benchmarks on the HR, Adult, and BLS datasets, comparing our approach against Spark and CPU baselines. Our results show that GPU pipelines achieve up to 4.5x higher throughput, 98x lower latency, and 90% lower cost per 1K inferences compared to Spark baselines, while compliance mechanisms add only ~5.7 ms latency with p99 < 22 ms. Interpretability remains stable under peak load, ensuring reliable auditability. Taken together, these findings provide a compliance-aware benchmark, a reproducible Helm-packaged blueprint, and a decision framework that demonstrate the practicality of secure, interpretable, and cost-efficient serverless GPU analytics for regulated enterprise and government settings.
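The headline metrics in the abstract (cost per 1K inferences and p99 latency) can be made concrete with a small sketch. This is not the authors' benchmarking code: the function names `cost_per_1k` and `percentile`, the nearest-rank percentile definition, and every input value are assumptions chosen for illustration, and none of the numbers are measurements from the paper.

```python
import math


def cost_per_1k(hourly_price_usd: float, throughput_per_s: float) -> float:
    """USD needed to serve 1,000 inferences at a sustained throughput.

    Assumes billing is proportional to wall-clock time (hypothetical model).
    """
    seconds_per_1k = 1000.0 / throughput_per_s
    return hourly_price_usd / 3600.0 * seconds_per_1k


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


# Hypothetical inputs for illustration only, not data from the paper.
latencies_ms = [float(i) for i in range(1, 101)]
p99_ms = percentile(latencies_ms, 99)
usd_per_1k = cost_per_1k(hourly_price_usd=3.6, throughput_per_s=1000.0)
print(f"p99 latency: {p99_ms} ms, cost per 1K inferences: ${usd_per_1k:.6f}")
```

Under this assumed cost model, a higher sustained throughput lowers cost per 1K inferences proportionally at a fixed hourly price, which is one way the throughput and cost figures quoted above could be related.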