How to Do Statistical Evaluations in ECE/CS Papers: A Practical Playbook for Defensible Results

Strong experimental papers in electrical and computer engineering and computer science (ECE/CS), especially in systems, networking, and applied machine learning, rest on more than a single impressive number. They rest on a chain of design, measurement, analysis, and validation choices that, taken together, make a result believable. This tutorial is a compact, example-driven guide to that chain for beginning researchers. We organize it as an evaluation workflow: claim, hypothesis, unit of analysis, baseline, regime sweep, uncertainty estimate, validation check, and reporting. Within that workflow we cover the classical statistical foundations (descriptive statistics, the central limit theorem, normal- and $t$-based confidence intervals, Student's $t$-test, ANOVA, the chi-squared test, Pearson correlation, and linear regression) alongside the modern, distribution-free techniques (the bootstrap, the Wilcoxon and Mann-Whitney tests, and Cliff's delta) that are usually preferred for ECE/CS data. We also discuss factorial design, randomization and blocking, multiple-comparison correction, latency-specific pitfalls, simulation verification and validation, equivalence-style claims, and reproducibility. A running example, a comparison of two job-scheduling algorithms on simulated workloads with truncated heavy-tailed job sizes, threads through the tutorial, with Python snippets the reader can paste and adapt. The paper closes with a pre-submission checklist; companion student-facing material (project-type translation tables, an evaluation-plan worksheet, exercises, and a worked "bad evaluation autopsy") is collected in a separate workbook released alongside this paper.
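To give a flavor of the distribution-free techniques the abstract names, here is a minimal Python sketch of a percentile bootstrap confidence interval and Cliff's delta, applied to a toy version of the scheduling comparison. It is not the paper's code. Everything in it is an assumption made for illustration: the function names (`truncated_pareto`, `mean_completion_time`, `bootstrap_ci_mean`, `cliffs_delta`), the Pareto parameters, and the single-server batch model in which all jobs are present at time zero and are served either in arrival order (FCFS) or shortest-first (SJF).

```python
import numpy as np

def truncated_pareto(rng, n, alpha=1.5, lo=1.0, hi=1000.0):
    """Draw n job sizes from a Pareto(alpha) truncated to [lo, hi] (inverse-CDF method)."""
    u = rng.random(n)
    tail = 1.0 - (lo / hi) ** alpha          # probability mass kept after truncation
    return lo / (1.0 - u * tail) ** (1.0 / alpha)

def mean_completion_time(sizes):
    """Mean completion time when jobs run back-to-back in the given order."""
    return np.cumsum(sizes).mean()

def bootstrap_ci_mean(values, rng, n_boot=2000, level=0.95):
    """Percentile bootstrap CI for the mean of `values` (resampling with replacement)."""
    values = np.asarray(values)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    a = (1.0 - level) / 2.0
    return np.quantile(boot_means, a), np.quantile(boot_means, 1.0 - a)

def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y), estimated over all pairs."""
    x, y = np.asarray(x), np.asarray(y)
    return np.sign(x[:, None] - y[None, :]).mean()

# Hypothetical experiment: 30 independent replications, 200 jobs each.
# The unit of analysis is one replication, and both policies see the same workload.
rng = np.random.default_rng(0)
fcfs, sjf = [], []
for _ in range(30):
    sizes = truncated_pareto(rng, 200)
    fcfs.append(mean_completion_time(sizes))            # arrival order
    sjf.append(mean_completion_time(np.sort(sizes)))    # shortest job first
fcfs, sjf = np.array(fcfs), np.array(sjf)

paired_diff = fcfs - sjf                                 # per-replication improvement
ci_lo, ci_hi = bootstrap_ci_mean(paired_diff, rng)
delta = cliffs_delta(sjf, fcfs)                          # negative means SJF tends to be lower

print(f"mean improvement: {paired_diff.mean():.1f}, 95% CI [{ci_lo:.1f}, {ci_hi:.1f}]")
print(f"Cliff's delta (SJF vs FCFS): {delta:.2f}")
```

The sketch resamples whole replications rather than individual jobs, because completion times within one run are not independent. Pairing the two policies on the same generated workload is a simple form of blocking. Cliff's delta is computed here on the two unpaired samples of per-run means, so it only summarizes how strongly the two distributions overlap.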

