Implementation Decisions for One-Way Delay and Packet Loss Measurement Infrastructures
Advanced Network & Services, Inc.
Advanced Network & Services deployed its first "measurement machine," which measures one-way delay and packet loss, in April 1997. Since then, about 44 measurement machines have been deployed at locations around the world.
The one-way delay and packet loss metrics are described in draft documents of the Internet Engineering Task Force's (IETF) Internet Protocol Performance Metrics (IPPM) working group. Compared to ping, which measures the sum of delays from a computer across a network to another computer and back, the one-way metrics essentially measure the delay and packet loss occurring in each direction on a network between two measurement machines.
This document summarizes some of the decisions made by the Surveyor project team at Advanced Network & Services, the context in which each decision was made, and the reasoning behind it. It closes with future plans related to those decisions.
Because one-way delay is measured in one direction only, the sending and receiving measurement machines must share the same notion of time. The accuracy with which the measurement machines are synchronized is a defining element in the accuracy of the measurements themselves. Because very-high-speed network paths may have delays of only a few milliseconds, the team judged that accuracies in the tens of microseconds were necessary to meet the Surveyor project's requirements.
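The effect of clock synchronization on a one-way delay measurement can be illustrated with a small sketch. This is not Surveyor code; the function name, the 3 ms path delay, and the clock offsets are hypothetical values chosen only to show how a disagreement between the two clocks appears directly as measurement error.

```python
def measured_one_way_delay_us(true_delay_us, sender_offset_us,
                              receiver_offset_us, send_time_us=0):
    """Return the delay (microseconds) that two imperfect clocks would report.

    Each clock is modeled as true time plus a fixed offset (an assumption;
    real clocks also drift).
    """
    send_stamp = send_time_us + sender_offset_us
    recv_stamp = send_time_us + true_delay_us + receiver_offset_us
    return recv_stamp - send_stamp


TRUE_DELAY_US = 3_000  # hypothetical 3 ms path

# Clocks that disagree by 1 ms: the error is a third of the true delay.
loose_sync = measured_one_way_delay_us(TRUE_DELAY_US, 0, 1_000)

# Clocks within +/-10 microseconds of true time: the error is under 1%.
tight_sync = measured_one_way_delay_us(TRUE_DELAY_US, -10, 10)

print(loose_sync, tight_sync)
```

The error in the reported delay equals the difference between the two clock offsets, which is why a path with a delay of a few milliseconds calls for synchronization far better than a millisecond.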
The options considered include:
GPS was selected because it offered accuracies at the receiver board level of about 150 nanoseconds and to within tens of microseconds once the measurements actually reach the network "wire." The board-level GPS time receivers were particularly attractive because they eliminate variations due to the measurement machine's processor speed and system clock and don't require complex kernel modifications. The accuracies GPS offers are globally available and consistent.
However, using GPS has created numerous problems. Although GPS receivers that pinpoint location are relatively common and inexpensive, commercial GPS receivers that provide accurate time are less common and more expensive (GPS cards used in Advanced Network & Services measurement machines cost between $2,500 and $3,500 US). GPS receiver set-up requires installation of rooftop antennas, which slows the measurement machine deployment. Still worse, antenna installation requirements make deployment in some locations impractical or impossible. GPS receivers can also be foiled where there is extensive radio frequency interference in the vicinity of the antenna (cellular transmissions are the most frequently blamed).
As suggested in the IPPM one-way delay draft, Surveyor measurement machines schedule their measurements according to a Poisson process. A primary intent of this scheduling is to reduce the likelihood that measurements become synchronized with network events.
Most measurement machines in the Advanced Network & Services infrastructure use a Poisson process with a lambda (mean rate) of two per second. This translates into an average of two measurements per second for each unidirectional path.
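A Poisson measurement schedule can be generated by drawing the gap between consecutive measurements from an exponential distribution whose mean is 1/lambda. The following is a minimal sketch of that standard technique, not the Surveyor implementation; the function name, the seed, and the one-hour duration are assumptions for illustration.

```python
import random


def poisson_send_times(rate_per_s, duration_s, seed=None):
    """Return send times (seconds) for a Poisson process over [0, duration_s)."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        # Exponential inter-arrival gaps with mean 1/rate give a Poisson process.
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


# One hour of measurements at lambda = 2 per second for a single path.
times = poisson_send_times(rate_per_s=2.0, duration_s=3600.0, seed=1)
mean_rate = len(times) / 3600.0
print(f"{len(times)} measurements, mean rate {mean_rate:.3f} per second")
```

Individual gaps vary widely around the half-second mean, which is the property that keeps the measurements from locking onto regularly timed network events.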
Having a frequency set this high strains the measurement infrastructure. The current database collects about 172,000 measurements per day per path and there are about 1,000 paths measured. About 1.7 gigabytes of data is generated each day. In addition to ensuring we have enough disk space to store the measurement data, we've had to be conscious of the performance of our database server and the calibration error on our measurement machines.
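The figures above can be checked with a few lines of arithmetic. The per-measurement storage cost is not stated in the text; the value computed here is only what the quoted totals imply, assuming a decimal gigabyte and exactly 1,000 paths.

```python
RATE_PER_S = 2            # mean measurements per second per path
SECONDS_PER_DAY = 86_400
PATHS = 1_000             # "about 1,000 paths"
DAILY_BYTES = 1.7e9       # "about 1.7 gigabytes", decimal gigabyte assumed

per_path_per_day = RATE_PER_S * SECONDS_PER_DAY   # 172,800
total_per_day = per_path_per_day * PATHS          # 172,800,000
implied_bytes_per_measurement = DAILY_BYTES / total_per_day

print(per_path_per_day, total_per_day,
      round(implied_bytes_per_measurement, 1))
```

Under these assumptions each stored measurement works out to roughly ten bytes, which is consistent with a compact record per measurement.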
One of the key questions we have asked ourselves is what the appropriate frequency should be. The current frequency provides a rich set of data measurements for research on the behavior of the Internet. More measurements provide more detail, but care must be taken that the frequency of the measurements doesn't stress the measurement infrastructure or congest the network being measured. An advantage of Poisson-scheduled measurements is that they allow data reduction without losing the Poisson nature of the measurement streams: it is easy to eliminate measurements from our database, but it is impossible to recover from performing too few measurements.
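One way to reduce a Poisson stream while keeping it Poisson is to keep each measurement independently with a fixed probability p; the result is a Poisson process with rate p times lambda. The sketch below assumes that form of random thinning, which the text does not specify, and uses hypothetical function names, a hypothetical seed, and an illustrative keep probability of 0.025.

```python
import random


def poisson_times(rate_per_s, duration_s, rng):
    """Generate Poisson-process event times over [0, duration_s)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


def thin(times, keep_probability, rng):
    """Keep each event independently with the given probability."""
    return [t for t in times if rng.random() < keep_probability]


rng = random.Random(7)
full = poisson_times(2.0, 86_400.0, rng)      # one day at 2 per second
reduced = thin(full, 0.025, rng)              # keep about 2.5% of them
reduced_rate_per_min = len(reduced) / (86_400.0 / 60.0)
print(len(full), len(reduced), round(reduced_rate_per_min, 2))
```

Keeping every 40th measurement instead would give the same average rate but much more regular spacing, so the reduced stream would no longer be Poisson.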
Impact of Other Measurements
Having measurement machines positioned at strategic locations on the Internet makes the concept of collecting other types of measurements, such as network profiling, on the measurement machine attractive. On the surface, this adds to the value of the measurement machine because it can do more things and offers opportunities to correlate the one-way delay and loss measurements with observations made by other measurement tools.
However, the Surveyor project team has been cautious about adding measurement tools and technologies to the measurement machines. A primary concern is the impact that a particular measurement will have on the accuracy of the core one-way delay and loss measurements.
A measurement tool we have added successfully is a modified traceroute tool which provides valuable insights about the route measurements take along a given path; the tool hasn't significantly degraded measurement accuracy. We are open to deploying non-aggressive measurements that aren't difficult to implement and which don't consume considerable CPU processing time.
For all the problems that using GPS causes, no new viable alternative has been found. The Surveyor infrastructure plans to move to a newer generation of PCI-bus GPS receiver boards, and interference-resistant antennas are being tested for deployment at interference-prone locations.
Once we better understand the dynamics of one-way delay and loss measurements, we may adjust the measurement frequency. In a recent comparison of the paths measured by both the Surveyor and RIPE one-way delay and loss infrastructures, there is strong evidence that the same conclusions can be drawn about the network paths even though the RIPE infrastructure performs only three measurements per minute per path. Reducing Surveyor measurements to the same frequency (2.5% of the current Surveyor frequency) would dramatically reduce the load on the entire infrastructure.
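The 2.5% figure follows directly from the two rates quoted in this document. A short check, with variable names chosen only for illustration:

```python
from fractions import Fraction

surveyor_per_min = 2 * 60   # two per second, per path
ripe_per_min = 3            # three per minute, per path

ratio = Fraction(ripe_per_min, surveyor_per_min)    # 1/40
percent = float(ratio * 100)                        # 2.5
reduced_per_path_per_day = ripe_per_min * 60 * 24   # 4,320

print(ratio, percent, reduced_per_path_per_day)
```

At three measurements per minute, each path would produce 4,320 measurements per day instead of roughly 172,800, a fortyfold reduction.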
The Surveyor team has also developed and is deploying operating system kernel enhancements that will embed and retrieve the time-stamp information on measurement packets. When deployed, measurement machines will be able to measure many more paths and it should be possible to introduce other measurement tools without severely impacting measurement accuracy.
Author contact information:
Bill Cerveny (firstname.lastname@example.org)
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504