Feature - The CMS “Top 100”
It takes audacity to throw away all but a hundred of every 40 million data points.
How can you be sure you’re saving the precious few that may lead to discovery?
For the CMS high-energy physics experiment, the answer involves gambling. Fortunately, these scientists have a good hand.
The Large Hadron Collider at the CERN laboratory in Geneva, Switzerland, will turn on its colliding proton beams late this year, ramping up to a rate of 40 million collisions per second.
Each collision produces particles that leave signals in a surrounding detector.
Finding the best of the best
The CMS experiment, positioned at one of the LHC collision points and subject to real-world network limitations, must select only the very best collision candidates for analysis.
Based on knowledge of how particles with particular attributes behave and how they interact with the detector materials, the physicists develop computer simulations of a variety of expected signals occurring in the CMS detector. These simulations are called “Monte Carlos” because they inject an element of randomness, as both dice and nature do. Physicists must run these simulations literally millions of times in order to gather enough statistics to understand which signals from the detector correspond to which particles and interactions.
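The core idea of a Monte Carlo simulation can be sketched in miniature. The toy below (plain Python, not the actual CMS simulation software) estimates the value of pi by throwing random points at a square, illustrating the two points made above: nature-like randomness drives each trial, and the estimate only becomes trustworthy after a large number of trials, since the statistical error shrinks roughly as one over the square root of the trial count.

```python
import random

def estimate_pi(n_trials, seed=42):
    """Toy Monte Carlo: sample random points in the unit square and count
    how many fall inside the inscribed quarter circle (area = pi/4)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_trials

# Accuracy improves roughly as 1/sqrt(N): more trials, better statistics.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n))
```

The same trade-off drives CMS production: each simulated event is cheap, but millions are needed before the fluctuations average out enough to tell one kind of signal from another.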
Simulating for success
“We need to run full analysis on millions of simulations before start-up,” says Tulika Bose of Brown University, co-leader of a Monte Carlo production team at Fermilab.
“We need to learn to identify signals that represent interesting physics and figure out how many of the coveted 100 spots each kind should take up. We also want to leave room to capture something new if it comes along.”
To complicate matters further, early in 2006, at roughly T minus 18 months, CMS began transitioning to a new software analysis framework. With each new version, the physicists must validate the simulations' output against known results.
CMS has several production teams running simulations around the clock on computers all over the world on the LHC Computing Grid, which includes Open Science Grid in the United States and EGEE in Europe.
“We use OSG to farm out thousands of jobs in parallel to our grid sites and to other non-CMS sites,” explains Ajit Mohapatra of the University of Wisconsin at Madison, who heads a U.S.-based production team that runs exclusively on dedicated OSG resources. “Monte Carlo production runs much faster on the grid, and we don’t clog any one site’s resources.”
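Why Monte Carlo production parallelizes so well can be shown with a small sketch. Each job is independent: give it its own random seed and event count, and the results simply add up at the end. The example below uses plain Python multiprocessing as a stand-in for grid submission (the job function and its toy 5% selection are illustrative, not part of any CMS or OSG tool):

```python
from multiprocessing import Pool
import random

def run_job(args):
    """One independent simulation 'job': its own seed, its own event count.
    Returns the number of toy events passing a stand-in 5% selection."""
    seed, n_events = args
    rng = random.Random(seed)
    return sum(1 for _ in range(n_events) if rng.random() < 0.05)

if __name__ == "__main__":
    # Eight independent jobs; on the grid, each could run at a different site.
    jobs = [(seed, 10_000) for seed in range(8)]
    with Pool() as pool:
        passed = pool.map(run_job, jobs)
    print("accepted events per job:", passed)
    print("total:", sum(passed))
```

Because no job waits on any other, a thousand jobs finish in roughly the time of one, which is why production "runs much faster on the grid" without clogging any single site.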
“Everything has to be ready when data starts,” adds Bose. “We simply couldn’t do it without the grid.”
- Anne Heavey, Open Science Grid