Industrial and government organizations increasingly depend on data-driven analytics for workforce, finance, and regulated decision processes, where timeliness, cost efficiency, and compliance are critical. Distributed frameworks such as Spark and Flink remain effective for massive-scale batch and streaming analytics, but their coordination complexity and auditing overhead are poorly suited to moderate-scale, latency-sensitive inference. Meanwhile, cloud providers now offer serverless GPUs, and models such as TabNet enable interpretable machine learning on tabular data, motivating new deployment blueprints for regulated environments. In this paper, we present a production-oriented Big Data as a Service (BDaaS) blueprint that integrates a single-node serverless GPU runtime with TabNet. The design uses GPU acceleration to raise throughput, serverless elasticity to reduce cost, and TabNet's feature-mask interpretability to support IL4/FIPS compliance. We benchmark the approach on the HR, Adult, and BLS datasets against Spark and CPU baselines. Compared with the Spark baselines, the GPU pipelines achieve up to 4.5x higher throughput, 98x lower latency, and 90% lower cost per 1K inferences, while the compliance mechanisms add only about 5.7 ms of latency and keep p99 latency below 22 ms. Interpretability remains stable under peak load, supporting reliable auditability. Taken together, this work contributes a compliance-aware benchmark, a reproducible Helm-packaged blueprint, and a decision framework, which together demonstrate the practicality of secure, interpretable, and cost-efficient serverless GPU analytics for regulated enterprise and government settings.
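The headline metrics (p99 latency, cost per 1K inferences, and relative cost reduction) can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions: per-request latencies are given in milliseconds, throughput is a sustained rate in inferences per second, and the resource is billed at a flat hourly price. The function names, the nearest-rank percentile definition, and the billing model are hypothetical choices for illustration, not the paper's measurement harness.

```python
import math  # used by callers for float comparisons; kept for self-containment


def p99_latency_ms(latencies_ms):
    """Nearest-rank 99th percentile of per-request latencies (ms).

    Assumption: nearest-rank definition; other percentile definitions
    (e.g. linear interpolation) give slightly different values.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n), 1-based rank
    return ordered[rank - 1]


def cost_per_1k(hourly_price_usd, throughput_per_s):
    """Cost of 1,000 inferences at a sustained throughput.

    Assumption (hypothetical billing model): a flat hourly price while the
    resource is busy; serverless scale-to-zero idle time is not modeled.
    """
    if throughput_per_s <= 0:
        raise ValueError("throughput must be positive")
    inferences_per_hour = throughput_per_s * 3600
    return hourly_price_usd / inferences_per_hour * 1000


def relative_reduction(baseline, candidate):
    """Fractional reduction of `candidate` versus `baseline` (0.9 means 90% lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1 - candidate / baseline
```

For example, with hypothetical inputs, a resource priced at 3.6 USD per hour sustaining 1,000 inferences per second costs `cost_per_1k(3.6, 1000)`, about 0.001 USD per 1K inferences, and a pipeline whose cost per 1K inferences is one tenth of a baseline's yields `relative_reduction(...)` of 0.9, i.e. a 90% reduction. Computing cost from throughput rather than from wall-clock runtime keeps the metric comparable across pipelines with different batch sizes.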