Distributed software systems are increasingly being developed and deployed. Like other software, they need strong quality assurance support. Distributed software is often large and complex, has distributed components, and lacks a global clock. These characteristics make it challenging to analyze the information flow of such systems in support of software quality assurance. One challenge is that existing dynamic analysis techniques hardly scale to large, real-world distributed software systems. Developing cost-effective dynamic analysis approaches is also challenging. In addition, dynamic analysis algorithms and applications for distributed software face applicability and portability challenges. My dissertation addresses these challenges via three novel approaches to data flow analysis for distributed software. The first approach measures interprocess communications to understand distributed software behaviors and predict distributed software quality. The second approach pinpoints sensitive information via multi-staged, refinement-based dynamic information flow analysis for distributed software. Finally, I explored dynamic dependence analysis for distributed systems, using reinforcement learning to automatically adjust analysis configurations for scalability and better cost-effectiveness tradeoffs.