Computer Architecture

news 2026/3/26 19:42:14

System Evaluation Metrics

Cost Metrics

The cost of a chip includes:

Design cost: non-recurring engineering (NRE), can be amortized well if there is high volume;
Manufacturing cost: depends on area;
- Manufacturing Semiconductor Chips: Ingot → Wafer → Die (unpackaged chip) → Chip
- Production efficiency is measured by yield: the fraction of dies on a wafer that work correctly.
Testing cost: depends on yield and test time;
Packaging cost: depends on die size, number of pins, power delivery, ...
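
As a rough illustration of how volume and yield enter the cost of a chip, the sketch below amortizes NRE over the production volume and spreads the wafer cost over good dies only. The function names, the flat per-chip test and packaging cost, and all dollar figures are assumptions made for this example, not data from a real process.

```python
def cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Manufacturing cost of one good die: the wafer cost is spread over good dies only."""
    good_dies = dies_per_wafer * yield_fraction
    return wafer_cost / good_dies


def cost_per_chip(nre_cost, volume, wafer_cost, dies_per_wafer, yield_fraction,
                  test_and_package_cost):
    """Per-chip cost: amortized NRE plus recurring die, test, and packaging cost."""
    amortized_nre = nre_cost / volume
    die_cost = cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction)
    return amortized_nre + die_cost + test_and_package_cost


# Hypothetical figures: $10M NRE, $5,000 per wafer, 200 dies per wafer, 80% yield,
# and a flat $2 per chip for test and packaging (a simplification).
low_volume = cost_per_chip(10e6, 10_000, 5_000, 200, 0.8, 2.0)
high_volume = cost_per_chip(10e6, 10_000_000, 5_000, 200, 0.8, 2.0)
print(f"10k units: ${low_volume:.2f} per chip")
print(f"10M units: ${high_volume:.2f} per chip")
```

With these made-up numbers, NRE dominates at low volume and becomes negligible at high volume, where the die cost (and therefore area and yield) dominates.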

The cost of a system includes:

Power cost;
Cooling cost;
Total Cost of Ownership (TCO) of datacenters:
- Capital expenses (CAPEX): facilities, assembly & installation, compute, storage,
  networking, software, …
- Operational expenses (OPEX): energy, rent, maintenance, employee salaries, …
System availability: Downtime is expensive and results in a direct loss of revenue. Redundancy (adding backup components) improves availability but also increases the initial capital cost.

Performance Metrics

Performance metrics:

Latency: time to complete a task;
Throughput: tasks completed per unit time;

Improving latency often improves throughput, but not vice versa. For example, inter-task parallelization (running independent tasks in parallel) improves throughput but not the latency of each task, while intra-task parallelization (parallelizing the work within a task) improves both.

Buffering, queuing, and batching improve throughput but may hurt latency, which leads to a tradeoff between latency and throughput.
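
The batching tradeoff can be sketched with a toy cost model. The model below (a fixed overhead per batch plus a per-item cost, with results available only when the whole batch finishes) and its parameter values are assumptions for illustration.

```python
def batch_metrics(batch_size, fixed_overhead_s, per_item_s):
    """Return (latency, throughput) when items are processed in batches.

    Assumed cost model: each batch pays a fixed overhead once, plus a
    per-item cost, and an item's result is available when its batch finishes.
    """
    latency = fixed_overhead_s + batch_size * per_item_s   # seconds per batch
    throughput = batch_size / latency                      # items per second
    return latency, throughput


# Hypothetical costs: 1 ms overhead per batch, 0.1 ms per item.
for batch_size in (1, 8, 64):
    latency, throughput = batch_metrics(batch_size, fixed_overhead_s=1e-3, per_item_s=1e-4)
    print(f"batch={batch_size:3d}  latency={latency * 1e3:.2f} ms  "
          f"throughput={throughput:.0f} items/s")
```

Larger batches amortize the fixed overhead over more items, so throughput rises, while every item waits for a longer batch, so latency rises too.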

Digital systems (e.g., processors) operate using a constant-rate clock:

Clock cycle time (CCT): duration of a clock cycle;
Clock frequency (rate): cycles per second.

To compute the execution time of a program, we first compute the number of instructions (IC), which is fixed for a given program. Then we compute the average number of cycles per instruction (CPI), which depends on the system architecture and implementation. All together, we have

\[\text{Execution Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}= \text{IC} \times \text{CPI} \times \text{CCT}. \]

Roughly speaking, software (together with the ISA) determines IC, the ISA and microarchitecture determine CPI, and the microarchitecture and circuit technology determine CCT.
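
A minimal sketch of the execution-time equation follows. The instruction count, CPI values, and clock rates are hypothetical and chosen only to show that a lower CPI can outweigh a slower clock.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: IC x CPI x CCT, with CCT = 1 / clock frequency."""
    cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * cycle_time


# Hypothetical program: 2 billion instructions on a 2 GHz clock with CPI 1.5.
baseline = execution_time(2e9, cpi=1.5, clock_hz=2e9)
# Hypothetical redesign: lower CPI, but a slower 1.6 GHz clock.
redesign = execution_time(2e9, cpi=1.0, clock_hz=1.6e9)
print(f"baseline: {baseline:.3f} s, redesign: {redesign:.3f} s")
```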

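So far we have only discussed processor performance. What about memory? Memory stalls can be reflected in CPI. Alternatively, if computation and memory transfers overlap perfectly, the slower of the two bounds the runtime: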

\[\text{Runtime}=\max(\text{#ops}/\text{processor throughput},\text{#bytes}/\text{memory bandwidth}). \]

Denoting the operational intensity (OI) by \(\frac{\text{#ops}}{\text{#bytes}}\), we have

\[\begin{align*} \text{Perf}=&\ \text{#ops}/\text{Runtime}\\ =&\ \min(\text{processor throughput},\text{memory bandwidth}\times\text{operational intensity}). \end{align*} \]

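Plotting performance against operational intensity for a given system gives the roofline model: a sloped line (memory bandwidth × operational intensity) that flattens into a horizontal line at the processor throughput.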
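
The bound above is easy to evaluate numerically. In the sketch below, the machine parameters are hypothetical, and `ridge_point` is a helper name introduced here for the operational intensity at which the two terms of the minimum are equal.

```python
def attainable_performance(peak_ops_per_s, mem_bandwidth_bytes_per_s, operational_intensity):
    """Roofline bound: min(processor throughput, memory bandwidth x OI)."""
    return min(peak_ops_per_s, mem_bandwidth_bytes_per_s * operational_intensity)


def ridge_point(peak_ops_per_s, mem_bandwidth_bytes_per_s):
    """OI (ops/byte) at which the memory-bound and compute-bound limits meet."""
    return peak_ops_per_s / mem_bandwidth_bytes_per_s


# Hypothetical machine: 1 Tops/s peak throughput, 100 GB/s memory bandwidth.
PEAK = 1e12
BANDWIDTH = 100e9
for oi in (0.5, 2, 10, 50):
    perf = attainable_performance(PEAK, BANDWIDTH, oi)
    bound = "memory-bound" if oi < ridge_point(PEAK, BANDWIDTH) else "compute-bound"
    print(f"OI={oi:5.1f} ops/byte -> {perf / 1e9:7.1f} Gops/s ({bound})")
```

Kernels whose operational intensity lies below the ridge point are limited by memory bandwidth; above it they are limited by processor throughput.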

Power and Energy Metrics

Dynamic/active power: \(C\times V_{dd}^2\times f_{0\to 1}=\alpha C V_{dd}^2 f\), where \(C\) is the capacitance being switched, \(V_{dd}\) is the supply voltage, \(f_{0\to 1}\) is the frequency of 0-to-1 transitions, \(\alpha\) is the activity factor (the average fraction of the capacitance switched per clock cycle, so that \(f_{0\to 1}=\alpha f\)), and \(f\) is the clock frequency.

Static/leakage power: \(V_{dd}I_{leak}\), where \(I_{leak}\) is the leakage current.

Therefore, total power is

\[\text{Power}=\alpha C V_{dd}^2 f + V_{dd} I_{leak}. \]

And

\[\text{Energy}=\text{Power}\times \text{Time}. \]
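
The power and energy equations can be sketched as follows. All component values are hypothetical, and the leakage current is held constant across operating points, which is a simplification (in practice it depends on voltage and temperature).

```python
def dynamic_power(alpha, capacitance_f, vdd, freq_hz):
    """Dynamic power in watts: alpha * C * Vdd^2 * f."""
    return alpha * capacitance_f * vdd ** 2 * freq_hz


def static_power(vdd, leakage_a):
    """Static (leakage) power in watts: Vdd * I_leak."""
    return vdd * leakage_a


def energy(alpha, capacitance_f, vdd, freq_hz, leakage_a, cycles):
    """Energy in joules for a task that takes a fixed number of clock cycles."""
    power = dynamic_power(alpha, capacitance_f, vdd, freq_hz) + static_power(vdd, leakage_a)
    time_s = cycles / freq_hz
    return power * time_s


# Hypothetical chip: alpha = 0.1, C = 10 nF, I_leak = 1 A, task of 1e9 cycles.
fast = energy(0.1, 10e-9, vdd=1.0, freq_hz=2e9, leakage_a=1.0, cycles=1e9)
slow = energy(0.1, 10e-9, vdd=0.8, freq_hz=1e9, leakage_a=1.0, cycles=1e9)
print(f"1.0 V @ 2 GHz: {fast:.2f} J   0.8 V @ 1 GHz: {slow:.2f} J")
```

In this made-up example, lowering voltage and frequency cuts power by about half, but energy drops only slightly because the task runs twice as long and leakage accumulates over that time.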

Limiting factors of power, energy, and power density:

Power is limited by infrastructure, e.g., power supply;
Power density is limited by thermal dissipation, e.g., fans, liquid cooling;
Energy is limited by battery capacity or the electricity bill.

Power scaling:

Dennard scaling (1974-2005): If the feature size scales by \(1/S\), the supply voltage and current can also scale by \(1/S\), so power density stays constant;
Post-Dennard scaling (2006-now): Power limits performance scaling (power wall), so we need to slow down frequency scaling or reduce chip utilization.
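
To see why voltage scaling matters, the sketch below applies the dynamic power formula under the classical Dennard assumptions, which go slightly beyond what is stated above: capacitance scales by \(1/S\), frequency by \(S\), and transistor density by \(S^2\), with leakage ignored. The function name and the `voltage_scales` flag are introduced here for illustration.

```python
def power_density_after_scaling(s, voltage_scales=True):
    """Relative dynamic power density after scaling the feature size by 1/s.

    Assumes capacitance scales by 1/s, frequency by s, and transistor
    density by s^2; leakage power is ignored.
    """
    capacitance = 1 / s
    voltage = 1 / s if voltage_scales else 1.0
    frequency = s
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = s ** 2
    return power_per_transistor * transistors_per_area


print(power_density_after_scaling(2, voltage_scales=True))    # Dennard: stays constant
print(power_density_after_scaling(2, voltage_scales=False))   # fixed voltage: grows as s^2
```

Once the supply voltage can no longer scale, power density grows with each generation unless frequency scaling slows down or part of the chip is kept idle.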

Normalize performance to power:

\[\text{Energy Efficiency}=\frac{\text{Performance}}{\text{Power}}=\frac{\text{Operations}/\text{Time}}{\text{Energy}/\text{Time}}=1/\frac{\text{Energy}}{\text{Operations}}. \]

For a given task, choose the "optimal" design, i.e., the one that best trades off performance against energy.
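
One possible reading of "optimal" is the most energy-efficient design that still meets a performance target. The sketch below uses that criterion; the `Design` class, the selection rule, and the three design points are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    ops_per_s: float   # performance
    watts: float       # power

    @property
    def ops_per_joule(self):
        """Energy efficiency = performance / power = operations per joule."""
        return self.ops_per_s / self.watts


def pick_design(designs, min_ops_per_s):
    """Most energy-efficient design among those meeting the performance target."""
    feasible = [d for d in designs if d.ops_per_s >= min_ops_per_s]
    return max(feasible, key=lambda d: d.ops_per_joule) if feasible else None


# Hypothetical design points.
designs = [
    Design("small", ops_per_s=1e9, watts=0.5),
    Design("medium", ops_per_s=4e9, watts=4.0),
    Design("large", ops_per_s=8e9, watts=16.0),
]
print(pick_design(designs, min_ops_per_s=1e9).name)
print(pick_design(designs, min_ops_per_s=3e9).name)
```

Other criteria are equally plausible; the point is only that the choice depends on the task's performance requirement as well as on efficiency.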

Scalability

Scalability measures the speedup achieved by using \(N\) processors compared to using just \(1\) processor.

Two settings to evaluate scalability:

Strong scaling: speedup on \(N\) processors with fixed total workload size
Weak scaling: speedup on \(N\) processors with fixed per-processor workload size
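
Both settings can be evaluated from measured runtimes. The sketch below assumes \(T(1)\) and \(T(N)\) are wall-clock times on 1 and \(N\) processors; the "efficiency" and "scaled speedup" definitions are common conventions rather than something the text above prescribes, and the timings are hypothetical.

```python
def strong_scaling(t1, tn, n):
    """Fixed total workload: speedup = T(1) / T(N); efficiency = speedup / N."""
    speedup = t1 / tn
    return speedup, speedup / n


def weak_scaling(t1, tn, n):
    """Fixed per-processor workload (total work grows with N).

    Efficiency = T(1) / T(N), ideally 1; scaled speedup = N * efficiency.
    """
    efficiency = t1 / tn
    return n * efficiency, efficiency


# Hypothetical timings in seconds on 1 and 8 processors.
print(strong_scaling(t1=100.0, tn=20.0, n=8))   # same total problem
print(weak_scaling(t1=100.0, tn=125.0, n=8))    # problem is 8x larger on 8 processors
```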

How to balance the workload?

Static load balancing: partition the input as evenly as possible before execution
Dynamic load balancing, e.g., work dispatch, work stealing
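
The difference between the two approaches can be illustrated with a small simulation. The sketch below models static balancing as equal-count contiguous chunks and dynamic balancing as simple work dispatch from a shared queue; work stealing is not modeled, and the task durations are hypothetical.

```python
import heapq


def static_makespan(task_times, n_workers):
    """Static balancing: split the task list into contiguous, equal-count chunks."""
    chunk = -(-len(task_times) // n_workers)  # ceiling division
    loads = [sum(task_times[i:i + chunk]) for i in range(0, len(task_times), chunk)]
    return max(loads)


def dynamic_makespan(task_times, n_workers):
    """Dynamic balancing: whichever worker becomes free first takes the next task."""
    free_at = [0.0] * n_workers          # min-heap of times at which workers become idle
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)   # earliest-available worker
        heapq.heappush(free_at, start + t)
    return max(free_at)


# Hypothetical task durations: equal task counts per worker, but very unequal work.
tasks = [8, 7, 6, 5, 1, 1, 1, 1]
print(static_makespan(tasks, 2), dynamic_makespan(tasks, 2))
```

When task durations are unknown or uneven, an equal-count split can leave one worker with most of the work, while dispatching tasks on demand keeps the finishing times close together.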

If an optimization accelerates a fraction \(f\) of a program's execution time by a factor of \(S\), then the overall speedup is given by Amdahl's Law:

\[\text{Speedup}=\frac{1}{(1-f)+\frac{f}{S}}. \]
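
A short sketch of Amdahl's Law shows how the unaccelerated fraction caps the overall speedup. The fraction 0.9 and the acceleration factors are hypothetical.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the runtime is accelerated by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical program: 90% of the runtime can be accelerated.
for s in (2, 10, 100, 1e9):
    print(f"S={s:g}: overall speedup = {amdahl_speedup(0.9, s):.2f}")
```

As \(S\) grows, the speedup approaches \(1/(1-f)\), which is 10 for \(f=0.9\), no matter how large \(S\) becomes.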

Benchmark

A benchmark is a carefully selected program used to measure performance, and a benchmark suite is a collection of benchmarks.

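To report the average performance on a benchmark suite, we may use three types of means: arithmetic (for absolute values such as execution times), harmonic (for rates such as operations per second), and geometric (for ratios such as speedups normalized to a reference machine).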
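
The three means are available in Python's standard `statistics` module. In the sketch below the benchmark results are hypothetical; the arithmetic mean is applied to absolute times, the harmonic mean to rates, and the geometric mean to ratios.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical results for three benchmarks.
times_s = [2.0, 4.0, 6.0]      # absolute execution times
rates = [10.0, 20.0, 40.0]     # e.g., operations per second, equal work per benchmark
speedups = [0.5, 2.0, 4.0]     # ratios relative to a reference machine

print("arithmetic mean of times:", mean(times_s))
print("harmonic mean of rates:  ", harmonic_mean(rates))
print("geometric mean of ratios:", geometric_mean(speedups))
```

The harmonic mean of rates equals total work divided by total time when every benchmark performs the same amount of work, and the geometric mean of ratios gives the same relative ranking regardless of which machine is used as the reference.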

http://www.jsqmd.com/news/2461/