
Imagine cooking a gourmet meal from scratch using only a knife. With just that one tool, some steps, like mincing onions and slicing carrots, would be quick and easy because your tool was designed for those tasks. Other steps would be slow, producing sub-standard results, and some might not be possible at all — imagine trying to whip egg whites or taste the soup, for example.
Computational research is no different. Some 'recipes' (known as workflows in computing) involve only a few steps that require a single tool. Others involve multiple steps, each requiring different tools.
Open Science Grid (OSG), jointly funded by the U.S. Department of Energy (DOE) and the U.S. National Science Foundation (NSF), is optimized for high-throughput computing. Systems available through XSEDE, which launched this year to replace and expand TeraGrid, were designed for high-performance computing.
“In the past, researchers used either TeraGrid/XSEDE or Open Science Grid to run their large workflows,” said Paul Avery, OSG Council co-chair and professor of physics at the University of Florida. “Now ExTENCI, a partnership between the two cyberinfrastructures, provides tools that help them to take advantage of both.”
ExTENCI, which stands for Extending Science Through Enhanced National Cyberinfrastructure, was launched in 2010 under the leadership of Avery and co-principal investigators Ralph Roskies, co-scientific director of the Pittsburgh Supercomputing Center, and Daniel S. Katz, senior fellow at the University of Chicago/Argonne National Laboratory. The project brings together 11 U.S. universities and national laboratories, including the University of Chicago, Clemson University, Louisiana State University, Purdue University, University of Wisconsin-Madison, Fermi National Accelerator Laboratory, Brookhaven National Laboratory, Florida State University, and Florida International University, to develop technology that makes it easier for researchers to access resources.
“ExTENCI explored how to exploit the mutual capabilities of both TeraGrid/XSEDE and Open Science Grid,” Roskies said.
“Many TeraGrid/XSEDE users have a natural need for the high-throughput resources that the OSG provides. Similarly OSG users sometimes need access to high-performance computing resources such as those of TeraGrid/XSEDE,” said Michael Wilde, a fellow at the University of Chicago Computation Institute and software architect at Argonne National Laboratory. “The ExTENCI project is working to make the use of both cyberinfrastructures more seamless, and easier for individual scientists and smaller collaborations to leverage concurrently.”
“We’ve begun to do this in a few concrete cases, with the aim of leveraging the investments of both NSF and DOE in cyberinfrastructure resources and thereby to improve the productivity of U.S. computational scientists,” Roskies said.
One of those concrete cases is the protein structure prediction project that operates the Midway Folding Server, a collaboration between the laboratories of Karl Freed and Tobin Sosnick of the University of Chicago and Jinbo Xu of the Toyota Technological Institute at Chicago.
The most widely used form of structure prediction uses the known structures of proteins as templates from which to compute the structures of similar proteins. But that only works if there is a similar protein with a known structure. Nor does it give insight into how proteins fold in nature. Predicting a protein’s structure based solely on its amino acid sequence is more difficult – and more computationally intensive.
“We’re trying to predict protein structures by mimicking how we think proteins fold in nature,” said Aashish Adhikari, a researcher at the Institute for Biophysical Dynamics at the University of Chicago. “Experiments suggest that proteins fold in a stepwise fashion, where subunits of structure we call 'foldons' form cooperatively and add on to the existing structure in a process called sequential stabilization. Our algorithm follows a similar principle.”
This works well for protein sequences shorter than 100 amino acids, according to Adhikari. But “if you increase the number of amino acids, for every amino acid that you add the computation time increases exponentially. Our goal is to try to use our algorithm to fold increasingly bigger proteins.”
Because this process can make excellent use of both high-throughput computing and high-performance computing resources, Wilde identified the group as a good match for ExTENCI. Today, the protein structure prediction project regularly uses resources from both OSG and XSEDE, allowing the researchers to fold larger proteins than ever before.
This article first appeared in the 2011 TeraGrid/XSEDE Highlights book.