ARM-based and x86-64 laptop processors differ not only in instruction-set design but also in memory hierarchy, core organization, system integration, and power-management mechanisms. This study presents a combined architectural and experimental comparison of an Apple M3 system and an AMD Ryzen 7 3750H system. The architectural analysis contrasts AArch64's fixed-width load-store design with the variable-length, memory-operand-rich x86-64 instruction model, and discusses how register organization, calling conventions, heterogeneous core organization, memory behavior, and low-power mechanisms shape the observed performance and energy characteristics. The experimental part uses two native assembly benchmarks: a recursive Fibonacci workload and an integer matrix-multiplication workload. The analysis combines repeated timing measurements, processor-energy measurements, and cross-platform microarchitectural counter measurements from matched portable-C profiling runs. The Ryzen platform is decisively faster on the branch-heavy Fibonacci benchmark, whereas matrix multiplication shows no meaningful timing advantage for either platform in the present measurements. The Apple platform, in contrast, is markedly more energy-efficient, reducing energy-to-solution by approximately 5.82$\times$ on Fibonacci and 6.38$\times$ on matrix multiplication. These results are interpreted as platform-level findings rather than as pure ISA effects: they reflect differences in implementation, system integration, and measurement methodology in addition to instruction-set structure.