Feature: A Fair Shake for Seismologists
In the Midwestern United States, in the spring, there are weeks when you can't get through an episode of your favorite television show without an alert telling you that your county is under a tornado watch or, more intrusive still, the local meteorologist interrupting programming to tell you to head to your basement. It saves lives, and it represents an incredible amount of simulation and data collection, even if you do have to scurry to the Web later to find out how “Lost” ended that week.
There may never be an equivalent for temblors, a local “earthquake man” breaking into prime time. But an ambitious group of more than 40 institutions, together called the Southern California Earthquake Center (SCEC), is “building earthquake modeling capabilities to transform seismology into a predictive science similar to climate modeling or weather forecasting,” according to Phil Maechling, SCEC's information technology architect.
To bring that vision of seismology as a predictive science to life, SCEC has built a set of grid-based scientific workflow tools. A series of simulations based on these tools (TeraShake 1, TeraShake 2, and the most recent, CyberShake) began in 2004. They've run on TeraGrid resources across the country, including the National Center for Supercomputing Applications (NCSA) and the San Diego Supercomputer Center (SDSC). Similar calculations by a SCEC team at Carnegie Mellon University are also being run at the Pittsburgh Supercomputing Center.
While the goals are to build a computational platform for predictive seismology and to improve the “hazard curves” that building designers use to estimate the peak ground motions that will occur over the lifetime of a building, these simulations are already yielding significant results.
TeraShake 2, for example, simulated a series of earthquakes along the San Andreas Fault. Run at NCSA and SDSC, it revealed a “striking contrast in ground motion between ruptures that started at the northwestern end of the fault and those that started at the southeastern end,” says geologist Kim Olsen of San Diego State University. This effect is further influenced by a chain of sedimentary basins that runs from the northern end of the fault to Los Angeles. In earthquakes that start at the southeast end, the basins trap seismic energy and channel it into the Los Angeles area. Studies of previous earthquakes in this region and others have confirmed the existence of this “wave guide” effect in nature.
These findings were presented at the Seismological Society of America's annual meeting in April 2006 and at a SCEC conference in June. They were also recently published in Geophysical Research Letters.
“The success of our national initiatives in supercomputing depends on the integration of hardware, software, and wetware, that is, technical expertise, into an effective cyberinfrastructure. We have been leveraging our partnership with TeraGrid to promote this vertical integration,” says Tom Jordan, SCEC's director and an earth sciences professor at the University of Southern California.
“These calculations really represent three distinct aspects of high-performance computing,” he says. “The SGTs [strain Green tensors], which are figuring out the physics at more than a billion mesh points, represent highly parallel interaction between hundreds of nodes. That's capability computing. Extracting the synthetic seismograms is embarrassingly parallel: lots of small, unlinked jobs being run to get more than 100,000 seismograms. That's capacity computing. And all of this is very data-intensive computing, requiring large amounts of storage and very fast I/O. The TeraGrid helped us solve all three parts of this.” The first two steps are combined into workflows for each physical site being simulated, each made up of between 11 and 1,000 of the two-step analysis components. The final step of creating the hazard curves, meanwhile, is done in-house by SCEC.
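The capacity-computing step Jordan describes can be pictured with a toy sketch. Everything here is hypothetical illustration, not SCEC's actual code: each rupture scenario is an independent job that yields one peak-ground-motion value, and a simple hazard curve is then just the fraction of scenarios whose motion exceeds each threshold.

```python
# Toy illustration (hypothetical, not SCEC code) of "capacity computing":
# many independent jobs, each producing one synthetic peak ground motion,
# aggregated afterward into a simple exceedance (hazard) curve.
import random

def simulate_scenario(seed):
    """Stand-in for one embarrassingly parallel job: returns a made-up
    peak ground acceleration (in g) for one rupture scenario."""
    rng = random.Random(seed)
    return rng.lognormvariate(-2.0, 0.8)  # arbitrary, illustrative distribution

def hazard_curve(peaks, thresholds):
    """Fraction of scenarios whose peak motion exceeds each threshold."""
    n = len(peaks)
    return [sum(p > t for p in peaks) / n for t in thresholds]

# In a real workflow these jobs would be farmed out to many processors;
# here they just run in a loop, since no job depends on any other.
peaks = [simulate_scenario(seed) for seed in range(10_000)]
thresholds = [0.05, 0.1, 0.2, 0.4]
curve = hazard_curve(peaks, thresholds)
print(curve)  # exceedance probabilities fall as the threshold rises
```

Because the jobs share no state, they can be scheduled in any order on any mix of machines, which is exactly what makes this phase capacity rather than capability computing.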
A succession of successes
TeraShake 2 moved from a single TeraGrid site to orchestrated, simultaneous runs at SDSC and NCSA. It also moved from kinematic source descriptions, which are based on historical earthquake data, to source descriptions simulated using the physics of friction-based sliding at the fault. In TeraShake 1, before these refinements, the team saw some modeled earthquakes that simply could not exist in nature. TeraShake 2's improved models significantly complicated the process and increased the computing power required, but they eliminated many of these outliers.
“In adding this physics, you're not quite sure what [the simulations are] going to show,” says Maechling. “TeraShake 1 predicted very large peak ground motions in the Los Angeles area while TeraShake 2 brought these motions down to earth. In this case, the results went in a positive direction [reducing the predicted impact that an earthquake would likely have]. But you just don't know. That's what makes this so exciting.”
CyberShake, again run at NCSA and SDSC, uses still more realistic physics. The hazard curves calculated by CyberShake tend to be significantly different from those issued by the U.S. Geological Survey, which are considered the standard. If the CyberShake curves are correct, this type of calculation could significantly change the character of the national probabilistic earthquake hazard maps. Accordingly, the team's next step will be to run more simulations with the CyberShake code as a base and to validate them.
“We completed about 10 sites with CyberShake, and they are very promising. But we need to scale that up,” says Maechling. That means increasing the frequency of the waves that propagate from the current 0.5 hertz to about 3 hertz. This is exceptionally taxing because each doubling of the frequency increases by roughly a factor of eight the number of mesh points required to simulate the physics.
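That cubic growth follows from resolution: higher frequencies mean shorter wavelengths, so the grid spacing must shrink proportionally in all three spatial dimensions. A quick back-of-the-envelope calculation (purely illustrative) shows why the jump to 3 hertz is so expensive:

```python
def mesh_point_factor(f_old_hz, f_new_hz):
    """Mesh-point counts scale with the cube of the simulated frequency,
    because grid spacing must shrink proportionally in x, y, and z."""
    return (f_new_hz / f_old_hz) ** 3

print(mesh_point_factor(0.5, 1.0))  # doubling the frequency: 8x the points
print(mesh_point_factor(0.5, 3.0))  # 0.5 Hz -> 3 Hz: 216x the points
```

And since the simulation time step must also shrink as frequency rises, the total compute cost grows even faster than the mesh-point count alone.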
That also means dramatically increasing the number of sites simulated. The team expects to need about 625 in order to create a comprehensive map of a relatively small region of Southern California. “We believe that our grid-based workflow tools, based on the Virtual Data System (VDS), will enable us to scale up our CyberShake calculations to the level necessary to calculate CyberShake-based hazard maps by supporting job scheduling, data transfers, and file management capabilities as a workflow,” says Maechling.
Moving from 0.5-hertz simulations to 3-hertz simulations will easily push the amount of storage needed for a single site simulation above one petabyte. These runs also require systems that can handle high-throughput input/output, so specialized I/O nodes on NCSA's Mercury system will continue to be crucial.
Sheer power is not enough, however. Large-scale collaborations like this one require the intimate relationships and expert services supplied by TeraGrid resource providers.
TeraShake 1 was based on a long-term collaboration between SDSC and SCEC. SDSC assisted with planning, code porting, and visualization, among other things. That relationship continues to this day.
The SCEC VDS-based workflow system handled job scheduling for the capacity runs using Condor glide-ins, which aggregate small jobs for queuing and then parcel them out to multiple processors once given access to the machine. Condor and the glide-in concept were developed at the University of Wisconsin in part under the auspices of NCSA.
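The aggregation idea behind glide-ins can be sketched in a few lines. This is a conceptual illustration only, with hypothetical names; real Condor glide-ins are pilot jobs that acquire processors and then pull work, not this Python. The point is simply that many small tasks pass through the batch queue as a handful of entries rather than one entry each:

```python
# Conceptual sketch (not actual Condor) of aggregating small jobs:
# queue one batch per acquired slot instead of one queue entry per task.
def make_batches(tasks, num_slots):
    """Deal tasks round-robin into one batch per processor slot.
    Each batch enters the queue once, then runs its tasks back to back."""
    batches = [[] for _ in range(num_slots)]
    for i, task in enumerate(tasks):
        batches[i % num_slots].append(task)
    return batches

tasks = [f"seismogram-{i}" for i in range(10)]  # hypothetical small jobs
batches = make_batches(tasks, num_slots=3)
for batch in batches:
    print(len(batch), batch)  # 3 queue entries cover all 10 tasks
```

Paying the queue-wait cost once per slot instead of once per task is what makes this approach pay off when a workflow contains tens of thousands of tiny, unlinked jobs.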
NCSA gave the SCEC team dedicated time in its computing queues to debug the final implementation of the Condor glide-ins and to integrate them into the larger workflow. Tailored allocations that give computing time when and how it is needed are an NCSA specialty.
“SCEC wants to use seismological simulations in order to make socially relevant predictive statements about earthquake hazards in Southern California,” says Maechling. “These kinds of collaborations between geoscientists and the high-performance computing community are essential to us reaching this goal.”
Learn more at the SCEC Web site.
This article originally appeared on the NCSA Web site.
- J. William Bell, National Center for Supercomputing Applications