All performance metrics (measures of performance) are based on system behavior over time. There are three major classes of metrics that can be observed by a user or any other entity outside a system: latency (response time), throughput, and availability.
A fourth class of metrics, utilization metrics, can only be observed inside a system. Utilization information is vital for understanding and predicting system performance. The remainder of this page discusses each class of metrics in more detail.
Latency or response time metrics are measured in units of (elapsed) time. The definition of a latency metric must specify both a start and stop event: when to begin measuring the delay, and when to stop. Some examples of latency metrics:
In many cases, latency is reported or specified as a statistical distribution. For example, a cell phone base station might be required to set up 99.5% of all calls within one second.
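A requirement of this kind can be checked against measured data. The following is a minimal sketch, assuming the latencies are available as a list of seconds; the function names and the sample values are hypothetical and only illustrate the arithmetic.

```python
def fraction_within(latencies_s, limit_s):
    """Return the fraction of measured latencies at or below limit_s."""
    if not latencies_s:
        raise ValueError("need at least one measurement")
    within = sum(1 for t in latencies_s if t <= limit_s)
    return within / len(latencies_s)


def meets_requirement(latencies_s, limit_s, required_fraction):
    """True if at least required_fraction of the samples finish within limit_s."""
    return fraction_within(latencies_s, limit_s) >= required_fraction


# Hypothetical call setup times in seconds (made-up sample data).
setup_times = [0.21, 0.35, 0.40, 0.18, 1.70, 0.52, 0.33, 0.29, 0.47, 0.25]

# Nine of the ten samples are within one second.
print(fraction_within(setup_times, 1.0))
# Check against the 99.5% target used in the example above.
print(meets_requirement(setup_times, 1.0, 0.995))
```

With only ten samples, a single slow call already violates a 99.5% target, which is why distribution requirements like this need a large number of measurements to verify.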
Throughput metrics are measured in units of inverse time. For example:
The term bandwidth is often used to describe the theoretical maximum throughput of a data link or other device. For example, a 32-bit wide data bus running at 100 MHz has a bandwidth of 3.2 billion bits/second. Since all devices impose some overhead in terms of packet headers, gaps between data blocks, or control protocols, the throughput of usable data is always less than the bandwidth. Efficiency is defined as the ratio of usable throughput to the bandwidth. For computer networks, where packets may be lost or damaged, the term goodput is sometimes used for the arrival rate of undamaged packets.
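The bus arithmetic and the efficiency ratio can be sketched as follows. A 32-bit bus clocked at 100 MHz works out to 32 × 100 million = 3.2 billion bits/second, assuming one full-width transfer per clock cycle. The usable throughput figure below is a hypothetical value chosen for illustration, not a measurement.

```python
def bus_bandwidth_bits_per_s(width_bits, clock_hz):
    """Theoretical maximum, assuming one transfer of width_bits per clock cycle."""
    return width_bits * clock_hz


def efficiency(usable_throughput, bandwidth):
    """Ratio of usable throughput to bandwidth (both in the same units)."""
    return usable_throughput / bandwidth


bandwidth = bus_bandwidth_bits_per_s(32, 100_000_000)  # 32 bits at 100 MHz

# Hypothetical usable data rate after overhead (made-up value).
usable = 2_400_000_000

print(bandwidth)                      # bits/second
print(efficiency(usable, bandwidth))  # fraction of bandwidth carrying usable data
```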
For some applications, throughput metrics may be normalized over some other system characteristic, such as cost or power consumption. It is also a common practice to specify throughput with a latency constraint, or vice versa.
For example, the Transaction Processing Performance Council's TPC-C online transaction processing benchmark reports, as "tpmC", the throughput of a specific mix of transactions, with the requirement that transactions be completed within fixed time limits. A second metric, "price/tpmC", reports the total cost of the system divided by its tpmC rating.
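The idea of a throughput figure that only counts when a latency constraint is met, and of normalizing that figure by cost, can be sketched as below. This is a simplified illustration and does not reproduce the actual TPC-C rules; the function names, the 90% threshold, the time limit, and all sample values are assumptions made for the example.

```python
def throughput_under_constraint(response_times_s, interval_min,
                                limit_s, required_fraction):
    """Return transactions per minute, or None if the latency constraint fails.

    The constraint (assumed form): at least required_fraction of the
    transactions must complete within limit_s seconds.
    """
    if not response_times_s:
        return None
    on_time = sum(1 for t in response_times_s if t <= limit_s)
    if on_time / len(response_times_s) < required_fraction:
        return None
    return len(response_times_s) / interval_min


def price_per_throughput(system_cost, throughput_per_min):
    """System cost normalized by throughput (cost per transaction/minute)."""
    return system_cost / throughput_per_min


# Hypothetical response times (seconds) observed over a 2-minute interval.
times = [0.8, 1.2, 0.5, 2.1, 0.9, 1.1, 0.7, 4.9, 1.0, 0.6]

rate = throughput_under_constraint(times, 2, limit_s=5.0, required_fraction=0.9)
print(rate)                                # transactions per minute
print(price_per_throughput(10_000, rate))  # hypothetical system cost of 10,000
```

Tightening the limit to one second makes the same run invalid, because only six of the ten transactions finish in time.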
The term availability describes the fraction of time a system is available. For example, if an inventory database is down for an hour a day, it has an availability of about 0.96 (23/24). However, availability alone doesn't tell the whole story. If the same inventory database went down for 10 milliseconds each second, it would have an availability of 0.99 but would probably be useless for any practical purpose. Therefore, a reliability metric, the mean time between failures (MTBF), is used to report how long, on average, a system remains usable between failures. A related metric is mean time to repair (MTTR), which quantifies how long it takes to recover from a failure.
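The two database examples can be worked through numerically. This sketch assumes MTBF is taken as the average usable period between failures, as described above, so that availability equals MTBF / (MTBF + MTTR); the function name is hypothetical.

```python
def availability(mtbf_s, mttr_s):
    """Fraction of time available, assuming MTBF is the mean usable period
    and MTTR is the mean outage duration (both in seconds)."""
    return mtbf_s / (mtbf_s + mttr_s)


# Down for one hour a day: usable for 23 hours, then a 1-hour outage.
daily_outage = availability(23 * 3600, 1 * 3600)

# Down for 10 ms each second: usable for 0.99 s, then a 0.01 s outage.
frequent_outage = availability(0.99, 0.01)

print(round(daily_outage, 4))     # about 0.96
print(round(frequent_outage, 4))  # about 0.99
```

The second system has the higher availability, but its MTBF is under one second compared with 23 hours for the first, which is the difference the reliability metric exposes.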
The fraction of time that a system component, such as a CPU, disk, or data link, is active is its utilization. It follows from this definition that utilization values range between 0 and 1. The maximum throughput of a system (its throughput capacity) is reached when the busiest component reaches a utilization of 1. As a practical matter, response time increases rapidly as utilization approaches 1 (100%), so many systems are designed to keep utilization below some threshold such as 70% or 80%.
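The rapid growth of response time near full utilization can be illustrated with a simple queueing formula. The sketch below assumes a single-server M/M/1 queue, where response time equals service time divided by (1 − utilization). That model is an assumption brought in for illustration; real systems may behave differently, and the 10 ms service time is a made-up value.

```python
def mm1_response_time(service_time_s, utilization):
    """Mean response time of an M/M/1 queue: S / (1 - U).

    Assumes a single server with random (Poisson) arrivals and
    exponentially distributed service times.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_s / (1.0 - utilization)


# Hypothetical device with a 10 ms service time per request.
for u in (0.50, 0.70, 0.80, 0.90, 0.95):
    ms = mm1_response_time(0.010, u) * 1000
    print(f"utilization {u:.2f}: response time {ms:.1f} ms")
```

Under this model the response time doubles between 80% and 90% utilization and doubles again by 95%, which is consistent with the practice of designing to a threshold well below 100%.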
The path length of a device for a specific workload is the device utilization divided by the throughput. The path length has units of time, and it indicates how much time the device needs to process one unit of work, such as a transaction or a packet.
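As a worked example of this definition, the sketch below computes path length from utilization and throughput, and then the throughput at which the device would reach a utilization of 1. The function names and the measured values are hypothetical.

```python
def path_length_s(utilization, throughput_per_s):
    """Device time per unit of work: utilization divided by throughput."""
    return utilization / throughput_per_s


def max_throughput_per_s(path_len_s):
    """Throughput at which this device alone reaches a utilization of 1."""
    return 1.0 / path_len_s


# Hypothetical measurement: CPU is 60% busy at 200 transactions/second.
cpu_path = path_length_s(0.60, 200)

print(cpu_path)                        # seconds of CPU time per transaction
print(max_throughput_per_s(cpu_path))  # transactions/second at utilization 1
```

Here each transaction needs about 3 ms of CPU time, so this CPU could not sustain more than roughly 333 transactions per second for the same workload.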
Path length is short for "code path length," the number of instructions required to complete a specific task. As CPU design has progressed, the introduction of caches, virtual memory, pipelining, and concurrent execution of multiple instructions has made the relationship between instruction count and CPU time less predictable. For current technology, it's better to think of path length as "CPU time per transaction."
Knowledge of utilization and path length is required to do any sort of predictive performance modeling. Therefore, most processors and operating systems incorporate a facility for measuring utilization.
© 2002 FrontRunner Computer Performance Consulting. All Rights Reserved.
This page was last modified 20-Nov-2002 7:37 (US Central Time).