Performance Case Study:
OC-192 Network Processor

Product Description

Agere, Inc., an Austin, Texas startup, initiated design work on what is now the Agere Systems Payload Plus™ NP10 Network Processor. The goal for this second-generation device was to scale up from OC-48 (2.5 Gbit/sec) to OC-192 (10 Gbit/sec) and to increase the amount of classification (as measured by the number of header bits that were pattern-matched).

Performance Requirements

The primary performance metric was throughput: "run at line rate". This was refined by documenting assumptions about the data links in and out of the chip, as well as details about header lengths. A specific mix of packet sizes was specified, as was a "5-tuple plus" classification workload.
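To make "run at line rate" concrete, the requirement can be restated as a packet rate and a time budget per packet. The following is a minimal sketch in Python, not the analysis used on the project. It uses the nominal link rates quoted above and ignores framing overhead. The function name and the packet mix are hypothetical, chosen for illustration only, since the mix used in the study is not given here.

```python
# Nominal link rates quoted in the text, in bits per second.
OC48_BPS = 2.5e9
OC192_BPS = 10e9


def packets_per_second(link_bps, mix):
    """Packet rate needed to keep the link full, ignoring framing overhead.

    mix maps packet size in bytes to the fraction of packets of that size.
    The fractions must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("packet mix fractions must sum to 1")
    mean_bits = 8 * sum(size * frac for size, frac in mix.items())
    return link_bps / mean_bits


# Hypothetical mix for illustration only; NOT the mix specified in the study.
EXAMPLE_MIX = {40: 0.5, 576: 0.25, 1500: 0.25}

if __name__ == "__main__":
    rate = packets_per_second(OC192_BPS, EXAMPLE_MIX)
    print(f"{rate / 1e6:.2f} Mpackets/s, {1e9 / rate:.1f} ns per packet")
```

For a fixed mix, the packet rate scales linearly with the link rate, so moving from OC-48 to OC-192 cuts the time available per packet by a factor of four.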

Performance Support Activities

Since this was a second-generation chip, some infrastructure for performance analysis already existed: limited measurement data and a cycle-based simulator for the first-generation chip's core logic. The simulator was useful for estimating the number of processing cycles for common tasks. Several performance analysts were assigned to the project.

Based on the performance staff's background, a two-track approach was taken. One analyst developed a detailed queueing network model, while the rest of the team started hand-coding a cycle-based simulation that incorporated the first-generation simulator. Due to delays in programming and validating the simulation, most of the design decisions were guided by the queueing network model.

The queueing network model (QNM) required extensions to standard techniques in several areas.

The basic approach was to write a "front-end" model that read all the system and workload parameters and translated them into inputs to a large QNM. A multi-class model was used to accommodate multiple packet lengths. Initially, the model was written as a spreadsheet, but this proved difficult to maintain and use. The model was then translated to Mathematica™, which proved to be a more robust and flexible expression of the model. Later, a wrapper was written to determine throughput based on the maximum utilization at the bottleneck resource. The QNM was well suited to answering the design team's many questions about performance tradeoffs and to tracking changes in the device architecture.
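The idea behind a bottleneck-based throughput wrapper can be sketched with the utilization law for a multi-class model: at packet rate X, the utilization of resource k is X times the mix-weighted service demand at k, so the highest sustainable rate is set by the resource with the largest weighted demand. The Python sketch below is an illustration under stated assumptions, not the Mathematica model described above. The function name, resource names, packet classes, and all numbers are hypothetical.

```python
def max_throughput(demands, mix, max_util=1.0):
    """Bottleneck bound on packet rate for a multi-class workload.

    demands[resource][cls] is the service demand, in seconds per packet,
    that a packet of class cls places on the resource.
    mix[cls] is the fraction of packets in class cls.
    max_util is the highest utilization allowed at any resource.
    Returns (packets per second, name of the bottleneck resource).
    """
    # Mix-weighted demand per packet at each resource.
    weighted = {
        res: sum(frac * per_class.get(cls, 0.0) for cls, frac in mix.items())
        for res, per_class in demands.items()
    }
    bottleneck = max(weighted, key=weighted.get)
    if weighted[bottleneck] <= 0.0:
        raise ValueError("no resource has a positive service demand")
    # Utilization law: U = X * D, so X = max_util / D at the bottleneck.
    return max_util / weighted[bottleneck], bottleneck


# Hypothetical resources, classes, and demands, for illustration only.
EXAMPLE_DEMANDS = {
    "classifier": {"small": 20e-9, "large": 30e-9},
    "memory": {"small": 10e-9, "large": 80e-9},
}
EXAMPLE_CLASS_MIX = {"small": 0.75, "large": 0.25}

if __name__ == "__main__":
    rate, resource = max_throughput(EXAMPLE_DEMANDS, EXAMPLE_CLASS_MIX)
    print(f"bottleneck: {resource}, {rate / 1e6:.1f} Mpackets/s")
```

A bound of this kind is cheap to recompute, which fits the use described above: each proposed design change alters a few service demands, and the wrapper reports the new bottleneck and the throughput it allows.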

Throughput predictions from the QNM were within 10% of the simulation results. Later measurements based on a detailed functional verification model of the NP10 showed both models had been pessimistic in terms of the amount of processing that could be done per packet.

Observations

The complexity of both workload and device required some creativity to express as an analytic model, but the resulting model provided essential guidance to the design team and was reasonably accurate. Also note how the separately developed analytic and simulation models were used to cross-check each other.


© 2002 FrontRunner Computer Performance Consulting. All Rights Reserved.
This page was last modified 20-Nov-2002 7:37 (US Central Time).