Feature - Here to help: embedded cyberinfrastructure experts

It isn’t easy designing software that can run on a cluster like Fermilab's Grid Computing Center. That’s why advanced technical support is so essential. Photo by Reidar Hahn, Fermilab Visual Media Services.

Although much of today’s scientific research relies on advanced computing, many researchers find it daunting to learn how to adapt and optimize applications to run on supercomputers, grids, clouds, or clusters.

To help newcomers, many cyberinfrastructure providers offer in-depth support tailored to fit each user’s needs. This is much more than the typical technical support that helps users write scripts to enable their jobs to run. Instead, cyberinfrastructure experts are embedded directly into a user team to provide longer-term assistance.

One example is TeraGrid User Support and Services, led by director Sergiu Sanielevici.

“The designation of a supercomputer is that it’s basically five to ten years ahead of the regular curve of technology,” said Sanielevici. “We’ve known from the start that it would be an absolute necessity for there to be experts at the various supercomputing sites to make a bridge between this advanced technology and the users.”

Each of the 11 TeraGrid sites is part of the User Services Working Group, whose members meet bi-weekly to discuss best practices and share tips. To access around-the-clock support, users can email the TeraGrid help desk or submit a request via the TeraGrid user portal. All requests are immediately sorted, and site-specific problems are handed off to support staff at each site.

User teams can request a TeraGrid expert in their field to help solve a specific problem with their application. For example, applications originally created to run on only a few hundred processors must be adapted to run on thousands of processors. This can be a challenging task for an inexperienced grid or supercomputer user and could take several months to figure out without the help of an embedded TeraGrid expert.

About a month into each quarter, TeraGrid support staff at each site contact new users to see how their projects are going and whether they have had any problems accessing the resources. “Users are a little hesitant to ask for help sometimes, so this is a proactive way to try to see who’s having problems,” said Chris Hempel, associate director of user services at the Texas Advanced Computing Center at The University of Texas at Austin, a TeraGrid site.

Support staff at each site can also detect when an unusual amount of stress is being placed on the system and contact that particular user to help identify and correct the problem. “We work with them to fix their code so it becomes more efficient and places a lot less stress on the system,” Hempel said.

Open Science Grid offers a similar level of embedded support through its Engagement Program, which users can access by sending an email to the Engagement Team or submitting a ticket through the OSG portal.

“The Engagement Program is the front door for many users who come into OSG without a lot of existing knowledge of how to operate in a large-scale distributed environment,” said John McGee, OSG Engagement Program Coordinator.

Because OSG consists of many more sites than TeraGrid, the Engagement staff typically handles all support issues centrally instead of directing users to staff at individual sites.

The Engagement staff proactively monitors large job runs and offers support to users with problems or failures. A lot of time and effort is also spent behind the scenes to improve OSG infrastructure so users can more easily get up and running, and avoid failures in the first place.

To help new users get started, Engagement staff members add code to the user’s application to make it as efficient as possible and then run it on OSG to ensure it operates correctly. The staff teaches users how to modify the added code to best suit their needs and then continues to support them over time if questions arise.

It is important to have cyberinfrastructure experts from many scientific fields provide the advanced embedded support needed to help scientists learn how to take full advantage of high-performance computing resources, McGee said. “It’s really important to not only immerse scientists in the cyberinfrastructure, but also to immerse the cyberinfrastructure experts into what it is the scientists are trying to accomplish.”

Amelia Williamson, for iSGTW
