This article is the second part of a two-part series.
GPUs have come a long way since the days when they were used only to render video game graphics. “Special purpose hardware has long been applied in the domain of molecular dynamics simulations with modest success,” says Thomas Cheatham, an associate professor at the University of Utah who has been using Lincoln and AMBER to study how biomolecules interact and adapt to their surroundings. “The success was typically modest since CPU power kept improving. This has changed in recent years, largely driven by the demands of video game consumers, such that much more powerful GPUs have been developed that allow sufficient precision and accuracy, speed, and accessible memory for more general scientific applications.”
Klaus Schulten, a computational biologist with the University of Illinois’ Beckman Institute and director of the university’s Theoretical and Computational Biophysics Group, acknowledges that GPU acceleration is not without its challenges. As computers grow larger and computational problems become more data-intensive, the speedups that GPUs and GPU accelerators deliver tend to expose “bottlenecks” elsewhere in the system.
“Our main stumbling block is the communications bottleneck to the GPU device,” says Schulten, the TeraGrid’s largest user of GPUs for research, who runs on NCSA’s Lincoln. “We have developed software runs that conservatively give us a speedup factor of between 2 and 10, and we believe that we can further improve that. But it puts a burden on the communication path to the GPU, and that becomes problematic.”
“One problem is that writing a good GPU code takes time and effort; in many cases, you have to change completely how you think about the physics of the problem that you are working on,” says Kohlmeyer, who collaborated on the development of the HOOMD-blue coarse-grained MD code and, more recently, the GPULAMMPS project. “GPUs are in essence a disruptive technology, same as vector processors and Linux cluster computers were. As with any disruptive technology, we need good developers, good programmers, and good scientists.”
As Schulten and many other researchers point out, data analysis is becoming a larger part of computational science. “Today we are doing computational studies using bigger and more powerful computers,” says Schulten, whose group started and developed NAMD, a community molecular dynamics code used to study how proteins are synthesized and form functional structures. “As a result our studies have much more data, and analysis becomes more of a serious issue. We now spend easily 50% of our effort on analysis, as compared to the actual computation.”
Yet analysis is exactly where GPUs come in handy, says Schulten, whose team is also working with NVIDIA. “Here GPUs are very useful, and in some cases we are getting speedups by a factor of over 200.”
“GPUs provide an exciting option for high-throughput, highly-parallel computations, especially when co-processing work with the host CPUs,” according to Paul Navrátil, a visualization scientist at TACC’s Data and Information Analysis division who has been using Longhorn in his research to develop efficient algorithms for large-scale parallel visualization and data analysis (VDA) and innovative designs for large-scale VDA systems. “However, to fully harness the processing power of GPUs, there must be sufficient work to keep all elements of the GPU occupied and the work should be regular, or contain few branches. But CPUs are still superior for handling code with random memory accesses and data-driven instruction flow.”
The CPU vs. GPU debate is sure to continue as researchers develop the future computing architectures that lie on the path toward exascale systems. DARPA’s Ubiquitous High Performance Computing (UHPC) project was launched to explore what the agency calls “extreme scale” computing. Teams led by Intel, the Massachusetts Institute of Technology (MIT), NVIDIA, and Sandia National Laboratories have been tasked with creating a new generation of computing systems that overcome the limitations of current approaches.
“GPUs, along with ‘manycore’ processors, offer a path to future extreme-scale computing through high concurrency, which is the most promising way to hold power consumption at an acceptable level,” according to Nick Nystrom, director of strategic computing at the Pittsburgh Supercomputing Center (PSC) and head of the TeraGrid Extreme-Scale Working Group. The group focuses on the challenges and opportunities of deploying extreme-scale resources on the TeraGrid in ways that maximize scientific output and user productivity.
“Achieving that level of concurrency often requires revisiting algorithms, which also presents an opportunity to consider applying mixed precision to boost arithmetic speed and decrease communication volume,” adds Nystrom. “Given some care with numerical properties, rethinking algorithms in those ways will also benefit performance on manycore and multicore platforms, across which emerging tools are aiming to achieve single-source portability.”
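To make the communication-volume side of Nystrom’s point concrete, here is a minimal Python sketch. It is our illustration, not a method described by Nystrom or PSC: a buffer of values sent in IEEE 754 single precision takes half the bytes of the same buffer in double precision, at the cost of a relative rounding error of at most 2^-24 per value (round-to-nearest, normal numbers). The helper names `pack_values`, `unpack_values`, and `max_relative_error` are hypothetical; only Python’s standard `struct` module is assumed.

```python
import struct


def pack_values(values: list[float], single_precision: bool) -> bytes:
    """Serialize floats as little-endian IEEE 754 binary32 ('f') or binary64 ('d').

    Hypothetical helper, for illustration only.
    """
    fmt = "f" if single_precision else "d"
    return struct.pack(f"<{len(values)}{fmt}", *values)


def unpack_values(payload: bytes, single_precision: bool) -> list[float]:
    """Inverse of pack_values; the results are ordinary (double-precision) Python floats."""
    fmt = "f" if single_precision else "d"
    count = len(payload) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", payload))


def max_relative_error(original: list[float], received: list[float]) -> float:
    """Largest relative rounding error introduced by the pack/unpack round trip."""
    return max(abs(r - o) / abs(o) for o, r in zip(original, received))


if __name__ == "__main__":
    values = [i / 7 for i in range(1, 1001)]  # arbitrary nonzero sample data
    p64 = pack_values(values, single_precision=False)
    p32 = pack_values(values, single_precision=True)
    print(len(p64), len(p32))  # 8000 4000: single precision halves the message size
    print(max_relative_error(values, unpack_values(p32, single_precision=True)))
```

In a full mixed-precision scheme, such as iterative refinement, the reduced-precision data would typically feed a fast low-precision step while residuals or accumulations stay in double precision; the sketch covers only the data-volume trade-off.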
“One attraction of GPUs is the fact that one can, on some applications, show great performance per power (flops per watt) ratios,” notes Jeffrey Vetter, group leader of the Future Technologies Group in ORNL’s Computer Science and Mathematics Division, as well as a joint professor at Georgia Tech and the PI for Keeneland. “As an example, many of the top systems on the Green500 list are accelerator-based systems.”
GPU training is already offered throughout the TeraGrid, and GPUs are likely to be a major topic of discussion at TG’11, July 17-21 in Salt Lake City. But some researchers say more needs to be done to attract, train, and support developers of good GPU code, especially as the TeraGrid transitions to the eXtreme Digital (XD) program this year.
“The use of GPUs speeds up a single node considerably, sometimes more than 30 fold,” says SDSC’s Walker, noting that it is imperative that researchers invest the time and effort to use GPUs effectively. “But if at the same time we don't develop a 30-fold higher bandwidth and 30-fold lower latency interconnect, scaling will always be limited across clusters of GPUs.”
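Walker’s point can be illustrated with a simple Amdahl-style model, a sketch under assumed numbers rather than a measurement. If a fraction `comm_fraction` of each run is spent communicating between nodes and only the on-node computation gets faster, the overall speedup can never exceed `1 / comm_fraction`, however fast the GPUs become. The function name `overall_speedup` and the 10% communication share used below are hypothetical.

```python
def overall_speedup(comm_fraction: float, compute_speedup: float,
                    comm_speedup: float = 1.0) -> float:
    """Amdahl-style estimate of whole-application speedup (hypothetical model).

    comm_fraction:   share of the original runtime spent on inter-node communication
    compute_speedup: factor by which on-node computation gets faster (e.g. with GPUs)
    comm_speedup:    factor by which communication gets faster (1.0 = same interconnect)
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    compute_fraction = 1.0 - comm_fraction
    # New runtime, normalized so that the original runtime is 1.0.
    new_time = compute_fraction / compute_speedup + comm_fraction / comm_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Assumed workload: 10% of the runtime is communication.
    print(f"{overall_speedup(0.10, 30):.2f}")      # 7.69: interconnect unchanged
    print(f"{overall_speedup(0.10, 30, 30):.2f}")  # 30.00: interconnect also 30x faster
```

Under this assumption, a 30-fold node speedup yields less than 8x overall, and the ceiling of 1/0.10 = 10x is approached only as computation becomes arbitrarily fast. The model lumps latency and bandwidth into one communication term; a more realistic interconnect model would treat them separately.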
“I would hope that the XD program will include more targeted and integrated training and scientific computing awareness outreach than many of the current efforts,” says Kohlmeyer, who also favors XD having a diverse pool of GPU-rich resources, primarily because GPU technology is still rapidly changing and some approaches may favor certain scientific disciplines more than others.
“It may be that too many people are distracted by all the hype around GPUs and expect them to do miracles,” he adds. “But it is the people who make the difference, the ingenuity with which we use technology that moves us forward, not just to have more technology. After all, it doesn't help to get an answer 100 times faster if we don't ask the right questions!”