Performance Measurement

There are several issues that must be considered in planning a successful performance measurement: the purpose of the measurement, workload selection and implementation, the data to be collected, the instrumentation used to collect performance data, and how to validate the results. This page provides an introduction to each topic.

Reasons to Measure Performance

There are several motivations for measuring system performance, including:

Selecting a Workload

A system's performance can be measured at any time with any workload. Some system and network administrators regularly collect performance data as part of monitoring the overall health of the system. Performance measurement and analysis based on the "real" system and workload seem appealing at first, but they have several limitations. The workload applied to a deployed system can vary widely from day to day or even from minute to minute, and it is difficult, if not impossible, to rerun a particular day's workload at a later date to check another system configuration. Therefore, most performance measurements rely on a synthetic workload. A synthetic workload has characteristics similar to those of a real workload, but it is defined and implemented so as to be repeatable. In many cases, the workload is described by a few key load parameters, and the goal of the measurement study is to evaluate how changes in those parameters affect system performance.

Benchmarks are an important category of synthetic workloads. A benchmark defines not only the workload but also the performance metrics to be collected and reported. Development organizations can and do define benchmarks for their internal use, but independently published benchmarks are often preferred because they allow customers to compare results from multiple vendors. Many benchmark definitions include the source code required to implement the benchmark.

Implementing Workloads

All performance measurement experiments require three basic elements: the system under test (SUT), load generators (drivers) that apply a workload to the SUT, and instrumentation to collect performance data.

Load Generation

Depending on the experiment, load generation could be implemented by special-purpose hardware, software running on a separate system, or even a process running on the SUT. Care should be taken to ensure that the performance bottleneck lies in the SUT, not in the load generators! There are two basic techniques for generating workloads: stochastic and trace-driven.

Stochastic techniques describe the arrival patterns of customers and other aspects of the workload by sampling from a probability distribution. Many workloads can be described accurately by using the appropriate distribution. Stochastic workloads are a good choice when detailed information about the workload isn't available, or where there is a need to vary workload characteristics. Workload generation is efficient and does not require large data files.
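As a minimal sketch of the stochastic approach, the following Python fragment generates customer arrival times by sampling interarrival gaps from an exponential distribution, which yields a Poisson arrival process. The function name, the choice of distribution, and the rate and duration values are illustrative assumptions, not part of any particular benchmark.

```python
import random


def poisson_arrival_times(rate_per_sec, duration_sec, seed=None):
    """Return sorted arrival timestamps (in seconds) within [0, duration_sec).

    Interarrival gaps are drawn from an exponential distribution with the
    given mean rate, so the arrivals form a Poisson process. The name and
    parameters are hypothetical, chosen only for this illustration.
    """
    rng = random.Random(seed)  # a fixed seed makes the workload repeatable
    arrivals = []
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        arrivals.append(t)
        t += rng.expovariate(rate_per_sec)
    return arrivals


if __name__ == "__main__":
    # Assumed load parameters: 50 requests/second for 10 seconds.
    times = poisson_arrival_times(rate_per_sec=50.0, duration_sec=10.0, seed=42)
    print(len(times), "requests scheduled")
```

Changing rate_per_sec varies the offered load without any stored data, and reusing the same seed reproduces the same arrival sequence for a later run against a different configuration.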

In a trace-driven workload, the load generators simply replay a sequence of requests from a log file. For example, a previously recorded sequence of reads and writes would be sent to a file server. Depending on the source of the trace, this method can provide a realistic workload, at the expense of substantial storage requirements and a lack of flexibility.
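The replay idea can be sketched in a few lines of Python. The trace format used here (comma-separated timestamp, operation, path, and byte count) and the send callback are assumptions made for illustration; a real trace would use whatever format the recording tool produced, and send would issue the request to the SUT.

```python
import csv
import io
import time


def load_trace(text):
    """Parse 'timestamp,op,path,nbytes' lines (an assumed format) into records."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        ts, op, path, nbytes = row
        records.append((float(ts), op, path, int(nbytes)))
    return records


def replay(trace, send, sleep=time.sleep, speedup=1.0):
    """Replay records in order, preserving recorded gaps scaled by speedup.

    send(op, path, nbytes) is a hypothetical callback that issues one request.
    """
    previous_ts = trace[0][0] if trace else 0.0
    for ts, op, path, nbytes in trace:
        sleep((ts - previous_ts) / speedup)
        send(op, path, nbytes)
        previous_ts = ts


if __name__ == "__main__":
    sample = "0.0,read,/data/a,4096\n0.5,write,/data/b,512\n2.0,read,/data/a,4096\n"
    # speedup=100 compresses the 2-second trace so the demo finishes quickly.
    replay(load_trace(sample), lambda op, path, n: print(op, path, n), speedup=100.0)
```

Passing sleep as a parameter keeps the replay logic testable without real delays. The speedup factor is one of the few load parameters a trace allows, which illustrates the lack of flexibility noted above.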

Performance Instrumentation

The SUT can be instrumented to collect a variety of metrics. Many benchmarks simply measure latency and throughput. However, information about utilization of processors and other resources is very useful in identifying bottlenecks in the SUT or in future modeling efforts. Many modern hardware and software products include counters that provide useful performance information. For example, page faults might be recorded by a counter on the CPU chip, within the operating system kernel, or both. Traces of operations or events may be collected as well.
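The two most common metrics, latency and throughput, can be collected with very little instrumentation. The sketch below times repeated calls to an operation from the driver side; the function names, the stand-in operation, and the nearest-rank percentile method are illustrative choices, and a real study would also gather resource utilization from the SUT's own counters.

```python
import math
import time


def measure(operation, repetitions):
    """Call operation() repeatedly; return (latencies in seconds, ops per second)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return latencies, repetitions / elapsed


def percentile(values, fraction):
    """Nearest-rank percentile of values, with fraction in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Stand-in operation; a real experiment would send a request to the SUT.
    latencies, throughput = measure(lambda: sum(range(10_000)), repetitions=200)
    print("throughput (ops/s):", round(throughput, 1))
    print("median latency (s):", percentile(latencies, 0.5))
    print("95th percentile latency (s):", percentile(latencies, 0.95))
```

Reporting a high percentile alongside the median is a common choice because averages hide the occasional slow request.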

Resources for Performance Measurement

The most rigorous benchmarks are those sponsored by the Transaction Processing Performance Council, generally known as TPC. The TPC describes its mission as defining database and transaction processing benchmarks and delivering trusted results to the industry. Vendors cannot claim TPC benchmark results until their measurements have been verified by an external auditor.

The Standard Performance Evaluation Corporation (better known as SPEC®) seeks to establish, maintain, and endorse a standardized set of relevant benchmarks and metrics for performance evaluation of modern computer systems. The best-known SPEC® benchmark suite is SPEC® CPU2000, which covers CPU performance. SPEC® CPU2000 is the successor to SPECint® and SPECfp®. SPEC® offers a variety of other benchmarks as well. IOZone is a SPEC®-sponsored archive that contains public-domain characterization and benchmarking tools. The software is free and is supported by its authors.

The benchmarking FAQ (frequently asked questions) for the comp.benchmarks newsgroup offers a good introduction to basic benchmarking concepts and pitfalls. The Open Directory Project has directory pages on general benchmarking and Internet benchmarks.


© 2002 FrontRunner Computer Performance Consulting. All Rights Reserved. Other notices.
This page was last modified 20-Nov-2002 7:37 (US Central Time).