Feature - Identifying network bottlenecks

Internet2 and the Open Science Grid have formed a partnership to deploy advanced network monitoring and diagnostic tools on the grid.

Network performance problems bedevil scientists and network operators alike. Troubleshooting requires a multi-pronged approach, and problem resolution often drags, holding back scientific progress.

To address this in the grid community, Internet2, the leading U.S. networking consortium in the research and education community, has formed a partnership with the Open Science Grid to deploy a set of advanced network monitoring and diagnostic tools on the grid.

Network users and operators solve problems most quickly when given transparent access to the entire network path involved. Typically, however, separate organizations operate different segments along a full network path and each so-called “domain” controls access to its own diagnostic and performance data. Some organizations make their performance data available while others do not. Without this data, troubleshooters find it extremely difficult to isolate problems.

Adding to the difficulty, performance degradation is often due to factors seemingly unrelated to the network. For example, an improperly configured host or an application’s inadequate internal buffer limits can masquerade as a network bottleneck.
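To see why an undersized buffer can look like a network fault, consider a rough back-of-the-envelope bound: a single stream that can keep only one buffer's worth of data in flight per round trip cannot exceed buffer size divided by round-trip time, no matter how fast the links are. The sketch below illustrates that arithmetic. The function names and the sample figures (a 64 KiB buffer, a 70 ms round trip, a 1 Gbit/s path) are assumptions chosen for illustration and do not come from the partnership or from perfSONAR.

```python
def max_throughput_bps(buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound, in bits per second, on a single stream that can keep
    at most buffer_bytes in flight per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return buffer_bytes * 8 / rtt_seconds


def required_buffer_bytes(path_bps: float, rtt_seconds: float) -> float:
    """Buffer size, in bytes, needed to keep a path of path_bps full
    (the bandwidth-delay product)."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return path_bps * rtt_seconds / 8


if __name__ == "__main__":
    # Illustrative values only (assumptions, not measurements).
    buffer_bytes = 64 * 1024   # 64 KiB application or socket buffer
    rtt = 0.070                # 70 ms round-trip time
    path_bps = 1e9             # 1 Gbit/s end-to-end path

    cap = max_throughput_bps(buffer_bytes, rtt)
    need = required_buffer_bytes(path_bps, rtt)
    print(f"Throughput ceiling: {cap / 1e6:.2f} Mbit/s")
    print(f"Buffer needed to fill the path: {need / 1e6:.2f} MB")
```

With these assumed numbers the ceiling is roughly 7.5 Mbit/s on a 1 Gbit/s path, a shortfall that a user could easily mistake for a congested or faulty link even though every network segment is healthy.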

A global collaboration including ESnet, GÉANT, Internet2, and RNP has developed an open, modular infrastructure of services and applications called perfSONAR that enables the gathering and sharing of network performance information, and facilitates troubleshooting of problems across network domains. OSG and Internet2 are working together to begin deploying perfSONAR-ps tools, an easy-to-install implementation of the perfSONAR protocols, in the grid community. Grid operators and network engineers can use these tools to check network performance on specific compute nodes, storage elements or network segments. They will be able to quickly identify problems that impact network performance, and isolate the suspect domain or network segment so that its operators can make corrections in a timely manner.
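The idea of isolating a suspect segment can be sketched in a few lines, assuming per-segment throughput results are already in hand. This is not the perfSONAR API or data format: the `SegmentMeasurement` record, the `suspect_segments` helper, the 50 percent threshold and the sample numbers are all hypothetical, and stand in for whatever results the measurement services along a path actually make available.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SegmentMeasurement:
    """Hypothetical record of one throughput test on one path segment."""
    domain: str           # organization operating the segment
    segment: str          # label for the segment
    measured_mbps: float  # throughput observed in the test
    expected_mbps: float  # throughput the segment should sustain


def suspect_segments(
    measurements: Iterable[SegmentMeasurement],
    threshold: float = 0.5,
) -> List[SegmentMeasurement]:
    """Return segments measuring below threshold * expected, worst first."""
    flagged = [
        m for m in measurements
        if m.expected_mbps > 0
        and m.measured_mbps < threshold * m.expected_mbps
    ]
    # Sort by fraction of expected throughput achieved, lowest first.
    return sorted(flagged, key=lambda m: m.measured_mbps / m.expected_mbps)


if __name__ == "__main__":
    # Made-up sample data for a three-domain path.
    path = [
        SegmentMeasurement("campus", "lab-to-border", 940.0, 1000.0),
        SegmentMeasurement("regional", "border-to-backbone", 120.0, 1000.0),
        SegmentMeasurement("backbone", "backbone-to-remote", 450.0, 1000.0),
    ]
    for m in suspect_segments(path):
        print(f"{m.domain}: {m.segment} at {m.measured_mbps:.0f} Mbit/s")
```

A ranking like this only works when every domain along the path exposes its measurements, which is the gap the shared infrastructure is meant to close.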

The diagram shows the major perfSONAR system components and how they interact with each other.

Measurement Point Service (MP)
Lookup Service (LS)
Authentication Service (AS)
Measurement Archive Service (MA)
Transformation Service (TS)
Resource Protector Service (RPS)

Image courtesy of Jason Zurawski, Internet2, and Martin Swany, University of Delaware.

“In the first week of operation, perfSONAR identified a significant performance bottleneck every place we looked,” said Brian Tierney, ESnet research scientist and a developer of perfSONAR. “These types of ‘soft errors’ — where the network path exists, but performance is severely degraded — are extremely difficult to find (without perfSONAR) and have usually required the combined effort of several senior network engineers all working together.”

Within the partnership, OSG will make the perfSONAR client tools available to scientists and their support staff. Internet2 will work with OSG to train administrators in the use of these tools and to prepare them to train others. Internet2 will also work with grid administrators to install perfSONAR servers, providing test points for both local and remote clients.

“Through the deployment of perfSONAR, we believe we can significantly enhance scientists’ ability to work effectively in the OSG environment — helping the community at large to realize the full potential and power of the OSG infrastructure to facilitate important research and discovery,” said Rich Carlson, Internet2 network engineer and OSG liaison.

Lauren Rotman, Internet2, and Anne Heavey, iSGTW

PerfSONAR, already under active deployment by the U.S. LHC ATLAS sites, is developed through a global collaboration led by the Department of Energy’s ESnet, GÉANT, Internet2 and Brazil’s RNP. In the U.S., development proceeds through partnerships among the University of Delaware, ESnet, Fermi National Accelerator Laboratory, Internet2, Pittsburgh Supercomputing Center and the SLAC National Accelerator Laboratory.
