Herbert Stein, known as a pragmatic conservative, formulated what became known as Stein's Law: "If something cannot go on forever, it will stop." His point was that stopping requires no deliberate action or program; it takes little to no effort on our part. Starting, on the other hand, often demands much more effort – a plan, a jumping-off point from which no certain future can be seen. In the US, the absence of such a plan points to an uncertain exascale future, but collaborators are joining forces and building momentum.
Significant changes in high-performance computing software and hardware have been on the horizon for nearly a decade. What’s worrisome, says Pete Beckman, is that in that time the HPC community in the US hasn’t been able to start – to definitively mobilize a collective movement toward exascale.
Beckman is director of the Exascale Technology and Computing Institute at Argonne National Laboratory, near Chicago, Illinois, US. “We’ve known there is huge change coming,” he says. “But as a community looking for support from the US Department of Energy, the National Science Foundation, and others, we haven’t been able to come together and formalize a plan.”
Beckman’s early work, including collaboration with several colleagues, led to the International Exascale Software Project (IESP), which created national and international dialogue around solving the exascale problem. “IESP brought us together to talk about exascale and discuss agendas, but given current trajectories we’re well past the point of imagining that all of the technology must come from a single nation. We’ll be much better off solving these problems collaboratively and sharing pieces of the software,” Beckman says.
Application developers have been collaboratively developing parts of software for decades. If you look at cosmology or biology applications, development teams are spread across the globe. Systems software, however, is a different animal. Rarely thought of as a collaborative endeavor, it’s usually developed and honed in one place. Beckman has been working to shift this traditional model to one that is much more collaborative.
Big data in a variety of domains is now perhaps the most important driver of high-end, high-performance computing and simulation. Scientists, engineers, and researchers regularly need to analyze and look for patterns within extreme-scale data sets. Recognizing the need to account for this shift, Beckman and Jack Dongarra have put together a series of collaborative workshops to investigate architectures and software.
Dongarra was awarded the 2013 Ken Kennedy Award at SC13 in November. He is a University Distinguished Professor of Computer Science in the electrical engineering and computer science department at the University of Tennessee, US. Working with partners in France, Japan, and other countries, Beckman and Dongarra aim to determine how to add to or adapt current technologies to account for big data.
Along with big data initiatives, the US Department of Energy is funding several research projects aimed at developing different parts of the software stack. Two projects are focused on the operating system and runtime environment: Hobbes, headed by Ron Brightwell at Sandia National Laboratories and Barney Maccabe at Oak Ridge National Laboratory, and Argo, headed by Beckman and Marc Snir, director of Argonne's Mathematics and Computer Science division.
Work on the Argo project is split among four research areas and includes three national labs – Lawrence Livermore, Pacific Northwest, and Argonne – and the US universities of Illinois, Oregon, Tennessee, and Chicago. Beckman recently discussed the Argo project at SC13 in Denver, Colorado, US.
“Most hardware vendors are targeting what they’ll need in order to deliver their next chip, not what they’ll need five years from now,” says Beckman. “We're working directly with these vendors and including them in the research we’re doing now.”
This means Beckman and the Argo team are in the unique position of knowing exactly what each vendor’s roadmap entails, enabling them to propose a system architecture that would work well across several different platforms.
“We let the vendors know from the beginning that our plan is to develop an open source system, including the APIs,” Beckman says. “Power adjustment on future machines is a key concern, and there is no competitive advantage in each vendor developing its own API.”
In November 2013, the Argo team met with vendors for the first time and presented their initial system design – including machine management, runtime environment, memory hierarchy, and power management. “We've already gotten feedback from the vendors,” says Beckman. “Now that we have our initial design, we will likely have a more formal version ready in the spring.”
Developers in Japan, Europe, and China have started similar projects, resulting in what Beckman and others in the HPC community call ‘healthy competition.’ These countries may have the competitive advantage, however, because their governments are actively involved in the funding, planning, and execution of the projects. In contrast, two US exascale bills are currently stuck in committee in Congress.
“We will get to a point in a few years where there will be a lot of energy and buzz happening again in the US,” Beckman says. “It is always a cycle. There is an incredible push for leadership in a couple of new areas now, and lots of exciting ideas, but we are still pretty far off.”