Feature - The forecast before the storm

How supercomputers and hybrid workflows helped beat tornadoes to the chase

A Doppler On Wheels collects data in a tornado during VORTEX2, as PI Nolan Atkins stands nearby collecting photogrammetric data.

Image courtesy of VORTEX2.

Chasing tornadoes won’t get you very far if your goal is to understand how tornadoes form. To get results, researchers need to get their instruments on the ground before the tornado touches down.

That’s the big catch-22 of VORTEX2 (Verification of the Origins of Rotation in Tornadoes Experiment), according to principal investigator Joshua Wurman. Current techniques warn of tornadoes only 13 minutes in advance on average, which makes it difficult to evacuate or properly prepare for the impending disaster. To improve that lead time, or to learn how to predict how destructive a tornado will be, scientists need data recorded as tornadoes form.

“In order for us to collect good data we had to surround a supercell perhaps 40 or 50 minutes before the tornado,” Wurman explained. Yet, he added, “if we knew exactly when to surround the storm, one of the big motivations of VORTEX2 wouldn’t be there.”

The solution? For 35 days in 2009 and 45 in 2010, 110 researchers took to the field in search of tornadoes.

Their day typically began before 8 a.m., when the forecasters would consult their favorite weather models to decide where they were most likely to find a supercell, the type of storm that sometimes spawns tornadoes.

VORTEX2 by the numbers
  • 28,000 miles total
  • ~600 miles - longest distance in a single day
  • 8,000 hotels
  • > 40 cities
  • 10 states
  • 14 hours per work day
  • ~15 institutions
  • 110 researchers
  • 35 consecutive days in 2009
  • 45 consecutive days in 2010
  • 25 tornadoes
  • 44 supercell “storms”
  • ~20-30 terabytes recorded data
  • 2 submitted papers
  • ~20 papers in progress
  • $12 million total from NSF and NOAA

LEAD II by the numbers
  • 214 workflows
  • 109,568 CPU hours
  • 215 GB data
  • Over 9,100 2D products

At 11 a.m. they would leave their hotels for a day of fieldwork, driving en masse towards the most likely tornado site. After a long day of pursuing leads, they would head to a new hotel - selected based on availability and proximity to tomorrow’s likely storm locations - battening down for the night at 10 or 11 p.m.

“A lot of storms we followed didn’t make tornadoes, or made tornadoes before we got there,” Wurman said. Nonetheless, over the course of the two tornado seasons, the VORTEX2 team did succeed at getting data on about 25 tornadoes and 44 supercells.

The nomadic VORTEX2 effort is in stark contrast to more traditional tornado studies, where much smaller teams would make forays into the field from a single home base, bringing to bear far fewer scientific instruments.

Forecasting the distributed computing way

The forecasters drew on a variety of model forecasts generated on high performance computing systems to inform their assessments of the weather. Most of these were well-established models run regularly by organizations such as the U.S. National Oceanic and Atmospheric Administration (NOAA) or the U.S. National Center for Atmospheric Research (NCAR).

One relative newcomer to the list, the real-time analysis and forecast from the Center for Analysis and Prediction of Storms (CAPS), was delivered courtesy of a hybrid computing project called Linked Environments for Atmospheric Discovery II (LEAD II).

Every morning at 6 a.m., LEAD II would access the Big Red cluster at Indiana University to execute six one-hour CAPS forecasts using the latest weather data. Each succeeding forecast covered a shorter time window, so the forecasts grew increasingly accurate. Forecasters in the field could view the resulting visualizations and data through the LEAD portal, using either a mobile phone or a standard web browser.

“The cellphone was useful because once you left the hotel you had very tenuous connections to live data,” explained Keith Brewster, one of three principal investigators in charge of LEAD II. “With the cellphone link I could look at weather data, including the latest LEAD model run.”

Brewster roved tornado country with the VORTEX2 team during the first two weeks of the 2010 season, serving as part of the forecasting team. During that time, he used the LEAD II system on a regular basis. For the remainder of the season, he regularly sent thunderstorm forecasts based on LEAD II and other models to the forecasters in the field.

VORTEX2 PI Karen Kosiba operates a Doppler On Wheels radar collecting data during a supercell thunderstorm.

Image courtesy of VORTEX2.

LEADing the way for hybrid workflows

Although LEAD II was just a small part of the enormous effort involved in VORTEX2, from a computer science point of view, the project was particularly unusual.

“It’s the first time we demonstrated the hybrid workflow model of computing,” said Beth Plale, one of the LEAD II co-PIs.

The workflow they created used Microsoft’s Trident Scientific Workflow Workbench as the front-end workflow system, delegating pieces of the workflow to back-end Unix- and Linux-based resources such as Big Red.

“A lot of scientists use Windows tools such as Excel,” Plale explained. “We think that utilizing a Windows workflow system on a Windows box is a step towards providing broader flexibility, because of this affinity of a lot of scientists to use Excel and because of the emergence of the cloud-based Azure platform.”

Plale’s research team at the Data to Insight Center of the Pervasive Technology Institute at Indiana University had to tweak a number of different applications to get Trident to talk to Big Red. Trident would pass data to a program called XBaya, a workflow composer web service developed in-house.

Next, the workflow moved from XBaya to the Apache Orchestration Director Engine (ODE), which executed it on Big Red. Once completed, control transferred to a Windows HPC Server machine that carried out post-forecast processing and generated 2D visualizations. LEAD II also developed a new 2D visualization of helicity, which captures circular motion within the atmosphere.
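The article does not say which helicity quantity LEAD II plotted. A standard choice in severe-weather forecasting is storm-relative helicity (SRH), defined as SRH = -∫ k · (V - C) × ∂V/∂z dz over a low-level layer, where V is the wind, C is the storm motion and k is the vertical unit vector. Large positive values mean the winds veer (turn clockwise) with height relative to the storm, which favors rotating updrafts. The following is a minimal Python sketch of the usual layer-sum form of that formula, offered as an illustration rather than as LEAD II’s code; the function name, the inputs, and the assumption that wind levels are ordered from the ground upward are our own.

```python
"""Storm-relative helicity (SRH) from a discrete wind profile.

Illustrative sketch only, not LEAD II's implementation: the article does
not state which helicity field LEAD II visualized.
"""

from typing import Sequence


def storm_relative_helicity(
    u: Sequence[float],
    v: Sequence[float],
    storm_u: float,
    storm_v: float,
) -> float:
    """Return SRH in m^2/s^2 for wind levels ordered from the ground upward.

    u, v: wind components (m/s) at successive heights within the layer.
    storm_u, storm_v: assumed storm motion (m/s).
    Assuming the wind varies linearly between levels, the layer sum below
    equals SRH = -integral of k . (V - C) x dV/dz dz exactly.
    """
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need at least two levels with matching u and v")
    # Storm-relative wind at each level.
    su = [ui - storm_u for ui in u]
    sv = [vi - storm_v for vi in v]
    total = 0.0
    for k in range(len(u) - 1):
        # Each layer adds twice the area swept by the storm-relative
        # hodograph between levels k and k+1, positive when the wind
        # turns clockwise (veers) with height.
        total += su[k + 1] * sv[k] - su[k] * sv[k + 1]
    return total


# A southerly wind at the ground veering to westerly aloft, stationary storm.
print(storm_relative_helicity([0.0, 10.0], [10.0, 0.0], 0.0, 0.0))
```

Using the layer sum rather than numerical differentiation avoids estimating ∂V/∂z from sparse levels; it is exact whenever the wind is interpolated linearly between them.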

It was the hybrid nature of the workflow that made the project an interesting exercise, Plale explained. “It’s a model of computing that makes workflow systems more flexible.”

As the LEAD II team generated forecasts, it automatically collected metadata and tracked the data’s provenance, carefully documenting it for future reference and study.
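The control flow described above (compose the workflow, execute it on a Unix/Linux cluster, then hand off to a Windows machine for post-processing) can be modeled as a chain of stages that logs provenance at every hand-off. The Python sketch below is a toy under stated assumptions: the `Stage` and `HybridWorkflow` classes, the stage functions, and the product names are hypothetical, and none of the real Trident, XBaya, Apache ODE or Windows HPC Server interfaces are used. It shows only the general idea that each stage’s recorded input fingerprint matches the previous stage’s output, so the history of every data product can be traced.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


def digest(obj: Any) -> str:
    """Stable short fingerprint of a JSON-serializable object."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


@dataclass
class Stage:
    name: str
    platform: str  # label for where this step would run (hypothetical)
    run: Callable[[dict], dict]


@dataclass
class HybridWorkflow:
    stages: list[Stage]
    provenance: list[dict] = field(default_factory=list)

    def execute(self, data: dict) -> dict:
        for stage in self.stages:
            started = datetime.now(timezone.utc).isoformat()
            result = stage.run(data)
            # Record what went in, what came out, and where it ran.
            self.provenance.append({
                "stage": stage.name,
                "platform": stage.platform,
                "input": digest(data),
                "output": digest(result),
                "started": started,
            })
            data = result  # hand control to the next stage
        return data


# Hypothetical stand-ins for the real workflow steps.
def compose(d: dict) -> dict:
    return {**d, "plan": ["ingest", "forecast", "visualize"]}


def forecast(d: dict) -> dict:
    return {**d, "forecast_hours": 1}


def visualize(d: dict) -> dict:
    return {**d, "products": ["reflectivity_2d", "helicity_2d"]}


workflow = HybridWorkflow([
    Stage("compose", "workflow composer (XBaya role)", compose),
    Stage("forecast", "Unix/Linux cluster (Big Red role)", forecast),
    Stage("postprocess", "Windows HPC Server role", visualize),
])

final = workflow.execute({"run": "morning"})
for record in workflow.provenance:
    print(record["stage"], record["platform"], record["input"], "->", record["output"])
```

Keeping provenance in the runner rather than in each stage means the stages stay unaware of where they execute, which loosely mirrors the article’s point that a hybrid workflow can delegate pieces to whatever back end suits them.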

“We are using this curated data collection now as an example for our university’s effort to archive curated scientific data,” Plale said. Examples of effective data curation through automated metadata collection are necessary if researchers are to meet the National Science Foundation’s mandate, which dictates that beginning this fall all proposals must include a data management plan.

From data to improved forecasts, and beyond

In the aftermath of VORTEX2, a great deal of work remains. Research groups that participated in VORTEX2 have returned to their home institutions to analyze the data their instruments recorded. Although a few papers are already on track for publication, Wurman said, some of the most interesting papers, those that integrate large swathes of the VORTEX2 data, probably won’t be published until 2014 or 2015.

In the meantime, some scientists are using the data to test theoretical computer models of tornado formation.

“We are assimilating VORTEX2 observations into the Weather Research and Forecasting (WRF) model,” said Yvette Richardson, a scientist based at Pennsylvania State University.

Most of the data assimilation experiments conducted by Richardson’s research group make use of the NCAR supercomputer, although analysis of the results and some of the simpler simulations are performed using local desktops or clusters. The conclusions they reach could ultimately lead to improved forecasts.

“We will have to wait and see,” Richardson said. “To the extent that forecasters base their impressions of likely scenarios on high resolution models, the improvement of those models in terms of developing the correct types of storms and better representing the interactions of those storms with the environment and with other storms should lead to improved forecasts.”

What next?

“I anticipate there will be a potential lull in field programs over the next couple of years,” Wurman said. “We’ll probably be focusing on supporting analysis efforts on these terabytes of data we’ve gathered.”

The first VORTEX project concluded in 1995. And despite the fact that the VORTEX2 data will likely be exhausted in the next five or so years, it could very well be more than a decade before people start to entertain the notion of doing VORTEX3.

“You don’t want to repeat an experiment you just did,” Wurman explained. In order to make VORTEX2 worthwhile, the mobile radar observing systems pioneered by Wurman at the tail-end of VORTEX1 had to mature. A similar leap forward in technology would probably be necessary before anyone would consider organizing VORTEX3.

“We have all these things that can measure these conditions on the ground, but if we want to know these things above the ground, the only thing we’ve been able to come up with is unmanned aircraft,” Wurman said. “There were a couple of test deployments of these systems at the end of VORTEX2... I think the technology holds great promise.”

You can download the data from LEAD II’s simulations here. To learn more about VORTEX2, please visit their website here.

Or for a more light-hearted VORTEX2 fix, check out The Weather Channel’s series of short segments, in which they followed VORTEX2 throughout both tornado seasons.

—Miriam Boon, iSGTW
