
Computer Architecture

System Evaluation Metrics

Cost Metrics

The cost of a chip includes:

  • Design cost: non-recurring engineering (NRE) cost, which amortizes well at high volume;
  • Manufacturing cost: depends on area;
    • Manufacturing Semiconductor Chips: Ingot → Wafer → Die (unpackaged chip) → Chip
    • To measure the production efficiency of semiconductor manufacturing, we use the yield: the fraction of dies on a wafer that work correctly.
  • Testing cost: depends on yield and test time;
  • Packaging cost: depends on die size, number of pins, power delivery, ...
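
As a rough illustration of how volume and yield enter the cost of a chip, the sketch below spreads the wafer cost over the good dies and the NRE cost over the production volume. The function names and all numbers are hypothetical, and testing and packaging costs are ignored.

```python
def cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    good_dies = dies_per_wafer * yield_fraction
    return wafer_cost / good_dies


def cost_per_chip(nre_cost, volume, wafer_cost, dies_per_wafer, yield_fraction):
    """Amortized design (NRE) cost plus manufacturing cost per good die."""
    manufacturing = cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction)
    return nre_cost / volume + manufacturing


# Hypothetical numbers, chosen only to keep the arithmetic easy to follow.
for volume in (10_000, 1_000_000):
    cost = cost_per_chip(
        nre_cost=1_000_000,
        volume=volume,
        wafer_cost=10_000,
        dies_per_wafer=500,
        yield_fraction=0.8,
    )
    print(f"volume={volume:>9,d}  cost per chip = {cost:.2f}")
```

With these made-up inputs the manufacturing cost per good die stays the same, while the NRE share per chip shrinks as volume grows.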

The cost of a system includes:

  • Power cost;
  • Cooling cost;
  • Total Cost of Ownership (TCO) of datacenters:
    • Capital expenses (CAPEX): facilities, assembly & installation, compute, storage,
      networking, software, …
    • Operational expenses (OPEX): energy, rent, maintenance, employee salaries, …
  • System availability: Downtime is expensive and results in a direct loss of revenue. Redundancy (adding backup components) improves availability but also increases the initial capital cost.

Performance Metrics

Performance metrics:

  • Latency: time to complete a task;
  • Throughput: tasks completed per unit time;

Reducing latency often improves throughput, but not vice versa. For example, inter-task parallelization improves throughput but not the latency of an individual task, while intra-task parallelization improves both.

Buffering, queuing, and batching improve throughput but may hurt latency, which leads to a tradeoff between latency and throughput.
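
The batching tradeoff can be seen in a toy cost model, sketched below. It assumes each batch pays a fixed overhead plus a constant per-item time, and that every item in a batch completes when the whole batch does; the function name and the numbers are hypothetical, and the time spent waiting for a batch to fill is ignored.

```python
def batch_metrics(batch_size, overhead_s, per_item_s):
    """Return (latency, throughput) for one batch under the toy cost model."""
    batch_time = overhead_s + batch_size * per_item_s  # every item waits for the batch
    throughput = batch_size / batch_time               # items completed per second
    return batch_time, throughput


# Hypothetical costs: 1 ms fixed overhead per batch, 0.1 ms of work per item.
for b in (1, 8, 64):
    latency, throughput = batch_metrics(b, overhead_s=1e-3, per_item_s=1e-4)
    print(f"batch={b:3d}  latency={latency * 1e3:5.1f} ms  throughput={throughput:7.0f} items/s")
```

Larger batches amortize the fixed overhead over more items, so throughput rises, but each item also waits for more work to finish, so latency rises as well.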

Digital systems (e.g., processors) operate using a constant-rate clock:

  • Clock cycle time (CCT): duration of a clock cycle;
  • Clock frequency (rate): cycles per second.

To compute the execution time of a program, we first count the number of instructions it executes (IC), which is fixed for a given compiled program and input. We then take the average number of cycles per instruction (CPI), which depends on the system architecture and implementation. Altogether, we have

\[\text{Execution Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}= \text{IC} \times \text{CPI} \times \text{CCT}. \]

Roughly speaking, software (the algorithm and compiler) together with the ISA determines IC, the ISA and microarchitecture determine CPI, and the microarchitecture and circuit technology determine CCT.
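
As a small worked example of the execution time equation, the sketch below derives an average CPI from an instruction mix and then multiplies the three factors. The instruction classes, their cycle counts, and the clock rate are hypothetical.

```python
def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles_per_instruction) pairs."""
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


def execution_time(ic, cpi, clock_hz):
    """Execution time = IC x CPI x CCT, with CCT = 1 / clock frequency."""
    cct = 1.0 / clock_hz
    return ic * cpi * cct


# Hypothetical mix: 50% ALU ops (1 cycle), 30% loads/stores (2), 20% branches (4).
mix = [(0.5, 1), (0.3, 2), (0.2, 4)]
cpi = average_cpi(mix)
time_s = execution_time(ic=1e9, cpi=cpi, clock_hz=2e9)
print(f"CPI = {cpi:.2f}, execution time = {time_s:.3f} s")
```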

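So far we have only discussed processor performance. What about memory? Its effect can be folded into CPI, but a simpler view treats the processor and the memory system as two limits on runtime. Assuming computation and memory transfers overlap perfectly, we have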

\[\text{Runtime}=\max(\text{#ops}/\text{processor throughput},\text{#bytes}/\text{memory bandwidth}). \]

Defining the operational intensity (OI) as \(\frac{\text{#ops}}{\text{#bytes}}\), we have

\[\begin{align*} \text{Perf}=&\ \text{#ops}/\text{Runtime}\\ =&\ \min(\text{processor throughput},\text{memory bandwidth}\times\text{operational intensity}). \end{align*} \]

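Plotting performance against operational intensity for a given system gives the roofline model: at low operational intensity, performance is memory-bound and rises with a slope equal to the memory bandwidth; at high operational intensity, it is compute-bound and flat at the processor throughput.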
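
The roofline can be evaluated directly from the formula above. The sketch below does so for a hypothetical machine (1 Tops/s peak, 100 GB/s memory bandwidth) and also reports the ridge point, the operational intensity at which the two limits meet; the function names are illustrative.

```python
def roofline(peak_ops_per_s, mem_bw_bytes_per_s, oi_ops_per_byte):
    """Attainable performance = min(processor throughput, bandwidth x OI)."""
    return min(peak_ops_per_s, mem_bw_bytes_per_s * oi_ops_per_byte)


def ridge_point(peak_ops_per_s, mem_bw_bytes_per_s):
    """OI at which a kernel moves from memory-bound to compute-bound."""
    return peak_ops_per_s / mem_bw_bytes_per_s


# Hypothetical machine parameters, not taken from any real system.
PEAK = 1e12  # ops per second
BW = 1e11    # bytes per second

print(f"ridge point: {ridge_point(PEAK, BW):.1f} ops/byte")
for oi in (0.25, 1, 10, 100):
    perf = roofline(PEAK, BW, oi)
    bound = "memory-bound" if oi < ridge_point(PEAK, BW) else "compute-bound"
    print(f"OI={oi:6.2f} ops/byte  perf={perf:.2e} ops/s  ({bound})")
```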

Power and Energy Metrics

Dynamic/active power: \(C\times V_{dd}^2\times f_{0\to 1}=\alpha C V_{dd}^2 f\), where \(C\) is the capacitance being switched, \(V_{dd}\) is the supply voltage, \(f_{0\to 1}\) is the frequency of 0-to-1 transitions, \(\alpha\) is the activity factor (the fraction of capacitance being switched), and \(f\) is the clock frequency.

Static/leakage power: \(V_{dd}I_{leak}\), where \(I_{leak}\) is the leakage current.

Therefore, total power is

\[\text{Power}=\alpha C V_{dd}^2 f + V_{dd} I_{leak}. \]

And

\[\text{Energy}=\text{Power}\times \text{Time}. \]
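
To make the formulas concrete, the sketch below computes power and energy for a fixed amount of work at two operating points. All component values are hypothetical. It also assumes that runtime is inversely proportional to clock frequency, which ignores memory stalls.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Dynamic power = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * f_hz


def static_power(vdd, i_leak_amps):
    """Static (leakage) power = Vdd * I_leak."""
    return vdd * i_leak_amps


def total_energy(alpha, c_farads, vdd, f_hz, i_leak_amps, cycles):
    """Energy = Power x Time, where Time = cycles / f (an assumption)."""
    power = dynamic_power(alpha, c_farads, vdd, f_hz) + static_power(vdd, i_leak_amps)
    return power * (cycles / f_hz)


# Hypothetical chip: alpha = 0.1, C = 1 nF, I_leak = 20 mA, workload of 1e9 cycles.
for vdd, f_hz in ((1.0, 1e9), (0.8, 0.5e9)):
    p_dyn = dynamic_power(0.1, 1e-9, vdd, f_hz)
    p_stat = static_power(vdd, 0.02)
    e = total_energy(0.1, 1e-9, vdd, f_hz, 0.02, cycles=1e9)
    print(f"Vdd={vdd:.1f} V f={f_hz / 1e9:.1f} GHz  "
          f"P_dyn={p_dyn:.3f} W  P_static={p_stat:.3f} W  E={e:.3f} J")
```

In this model, lowering frequency alone cuts dynamic power but not dynamic energy, because the same number of cycles still switch; lowering \(V_{dd}\) is what reduces the dynamic energy per cycle, while the longer runtime increases leakage energy.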

Limiting factors of power, energy, and power density:

  • Power is limited by infrastructure, e.g., power supply;
  • Power density is limited by thermal dissipation, e.g., fans, liquid cooling;
  • Energy is limited by battery capacity or the electricity bill.

Power scaling:

  • Dennard scaling (1974-2005): If the feature size scales by \(1/S\), the supply voltage and current can also scale by \(1/S\), so power density stays roughly constant;
  • Post-Dennard scaling (2006-now): Power limits performance scaling (power wall), so we need to slow down frequency scaling or reduce chip utilization.
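
A short derivation shows why power density was stable under Dennard scaling and why it rises once the supply voltage stops scaling. Beyond what is stated above, it assumes the classical Dennard conditions that the capacitance per transistor scales by \(1/S\), the clock frequency by \(S\), and the transistor area by \(1/S^2\); leakage power is ignored.

\[\begin{align*}
\text{Dennard:}\quad & P_{\text{transistor}} \propto C V_{dd}^2 f \;\to\; \frac{C}{S}\cdot\frac{V_{dd}^2}{S^2}\cdot S f = \frac{C V_{dd}^2 f}{S^2}, & \frac{P_{\text{transistor}}}{\text{Area}} &\;\to\; \frac{1/S^2}{1/S^2} = 1;\\
\text{Post-Dennard } (V_{dd} \text{ fixed}):\quad & P_{\text{transistor}} \;\to\; \frac{C}{S}\cdot V_{dd}^2\cdot S f = C V_{dd}^2 f, & \frac{P_{\text{transistor}}}{\text{Area}} &\;\to\; \frac{1}{1/S^2} = S^2.
\end{align*}\]

Under these assumptions, power density grows by \(S^2\) per generation unless frequency scaling slows or part of the chip is left idle.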

Normalize performance to power:

\[\text{Energy Efficiency}=\frac{\text{Performance}}{\text{Power}}=\frac{\text{Operations}/\text{Time}}{\text{Energy}/\text{Time}}=1/\frac{\text{Energy}}{\text{Operations}}. \]

For a given task, choose the design that best trades off performance against energy.

Scalability

Scalability measures the speedup achieved by using \(N\) processors compared to using just \(1\) processor.

Two settings to evaluate scalability:

  • Strong scaling: speedup on \(N\) processors with fixed total workload size
  • Weak scaling: speedup on \(N\) processors with fixed per-processor workload size
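
The two settings are usually summarized by a speedup and a parallel efficiency computed from measured runtimes. The sketch below follows one common convention, which is an assumption here rather than something the notes specify: for weak scaling, the total work grows by \(N\), so the scaled speedup is \(N \times T_1 / T_N\). The timings are hypothetical.

```python
def strong_scaling(t1, tn, n):
    """Fixed total workload: returns (speedup, efficiency)."""
    speedup = t1 / tn
    return speedup, speedup / n


def weak_scaling(t1, tn, n):
    """Fixed per-processor workload: returns (scaled speedup, efficiency).

    t1 is the time for one unit of work on 1 processor,
    tn is the time for n units of work on n processors.
    """
    efficiency = t1 / tn
    return n * efficiency, efficiency


# Hypothetical measurements on 8 processors.
s, e = strong_scaling(t1=100.0, tn=25.0, n=8)
print(f"strong scaling: speedup={s:.2f}, efficiency={e:.0%}")
s, e = weak_scaling(t1=10.0, tn=12.5, n=8)
print(f"weak scaling:   scaled speedup={s:.2f}, efficiency={e:.0%}")
```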

How to balance the workload?

  • Static load balancing: partition the input as evenly as possible before execution
  • Dynamic load balancing, e.g., work dispatch, work stealing
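
The difference between the two approaches shows up when task costs are uneven. The sketch below is a simplified simulation, not a real scheduler: static balancing gives each worker a contiguous chunk with an equal number of tasks, while dynamic balancing is modeled as central work dispatch that hands the next task to whichever worker becomes free first. Task durations are hypothetical.

```python
import heapq


def static_makespan(task_times, n_workers):
    """Equal-count contiguous chunks, one per worker; returns the finish time."""
    chunk = -(-len(task_times) // n_workers)  # ceiling division
    return max(
        sum(task_times[i:i + chunk]) for i in range(0, len(task_times), chunk)
    )


def dynamic_makespan(task_times, n_workers):
    """Work dispatch: each task goes to the worker that frees up first."""
    free_at = [0] * n_workers  # min-heap of times at which each worker is idle
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)


# Skewed workload: the expensive tasks are clustered at the front.
tasks = [8, 7, 6, 5, 1, 1, 1, 1]
print(static_makespan(tasks, 2))   # 26: one worker gets 8+7+6+5, the other 1+1+1+1
print(dynamic_makespan(tasks, 2))  # 15: both workers stay busy until the end
```

Dynamic schemes pay for this flexibility with scheduling overhead, which the simulation does not model.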

Suppose that an optimization accelerates a fraction \(f\) of a program by a factor of \(S\). The overall speedup is then given by Amdahl's Law:

\[\text{Speedup}=\frac{1}{(1-f)+\frac{f}{S}}. \]
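
A quick numerical check of Amdahl's Law is sketched below. It shows that the unaccelerated fraction bounds the overall speedup at \(1/(1-f)\), however large \(S\) becomes. The function name and the sample values are illustrative.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the program is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# With 90% of the program accelerated, the speedup saturates near 10x.
for s in (2, 10, 100, float("inf")):
    print(f"f=0.9, S={s}: speedup = {amdahl_speedup(0.9, s):.2f}")
```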

Benchmark

A benchmark is a carefully selected program used to measure performance, and a benchmark suite is a collection of benchmarks.

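To report the average performance on a benchmark suite, we may use three types of means: arithmetic (for absolute values such as execution times), harmonic (for rates such as operations per second), and geometric (for ratios such as speedups normalized to a reference machine).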
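
The sketch below applies each mean to the kind of data it is conventionally used for, using Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The measurements are hypothetical.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical measurements.
times_s = [2.0, 4.0, 6.0]   # absolute execution times of three benchmarks
rates = [10.0, 40.0]        # ops/s achieved on two workloads of equal size
speedups = [2.0, 0.5, 4.0]  # per-benchmark ratios relative to a reference machine

# Arithmetic mean: proportional to the total time of the suite.
print(f"arithmetic mean of times:   {mean(times_s):.2f} s")

# Harmonic mean: equals total work / total time when each workload has equal
# work. Here 2W ops take W/10 + W/40 = W/8 seconds, which is 16 ops/s.
print(f"harmonic mean of rates:     {harmonic_mean(rates):.2f} ops/s")

# Geometric mean: the ranking does not depend on which machine is the
# reference, since inverting every ratio simply inverts the mean.
inverse = [1.0 / x for x in speedups]
print(f"geometric mean of speedups: {geometric_mean(speedups):.3f}")
print(f"product with inverse mean:  {geometric_mean(speedups) * geometric_mean(inverse):.3f}")
```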

http://www.jsqmd.com/news/2461/
