
CERN data center passes 100 petabytes

Servers at the CERN data center collected 75 petabytes of LHC data in the last three years, bringing the total recorded physics data to over 100 petabytes. Image courtesy CERN.

Computer engineers at CERN last week announced that the CERN data center has recorded over 100 petabytes of physics data over the last 20 years. Collisions in the Large Hadron Collider (LHC) generated about 75 petabytes of this data in the past three years.

One hundred petabytes (equal to 100 million gigabytes) is a very large amount of data, roughly equivalent to 700 years of full HD-quality movies. Storing it is a challenge. At CERN, the bulk of the data (about 88 petabytes) is archived on tape using the CERN Advanced Storage system (CASTOR), and the rest (13 petabytes) is stored on the EOS disk pool system, which is optimized for fast analysis access by many concurrent users.

"We have eight robotic tape libraries distributed over two buildings, and each tape library can contain up to 14,000 tape cartridges," says German Cancio Melia of the CERN IT department. "We currently have around 52,000 tape cartridges with a capacity ranging from one terabyte to 5.5 terabytes each. For the EOS system, the data are stored on over 17,000 disks attached to 800 disk servers." 
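The figures quoted above can be cross-checked with some simple arithmetic. The following is a minimal sketch in Python, not anything published by CERN: it assumes decimal (SI) units, treats "over 17,000 disks" as exactly 17,000, and all function and variable names are chosen here purely for illustration.

```python
# Back-of-the-envelope checks on the storage figures quoted in the article.
# Assumption: decimal (SI) units, i.e. 1 PB = 10**15 bytes.
GB = 10**9
TB = 10**12
PB = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_bytes = 100 * PB


def gigabytes(n_bytes):
    """Convert a byte count to gigabytes."""
    return n_bytes / GB


def implied_video_bitrate_mbps(n_bytes, years):
    """Average bitrate in megabits per second if n_bytes held
    `years` of continuous video."""
    return n_bytes * 8 / (years * SECONDS_PER_YEAR) / 1e6


# Tape: eight libraries with up to 14,000 cartridge slots each.
tape_slots = 8 * 14_000
tape_slot_usage = 52_000 / tape_slots

# Disk: roughly 17,000 disks on 800 servers holding 13 PB in total.
disks_per_server = 17_000 / 800
avg_tb_per_disk = 13 * PB / 17_000 / TB

print(f"100 PB = {gigabytes(total_bytes):,.0f} GB")
print(f"Implied HD bitrate: {implied_video_bitrate_mbps(total_bytes, 700):.1f} Mbit/s")
print(f"Tape slots: {tape_slots:,} ({tape_slot_usage:.0%} occupied)")
print(f"Disks per server: {disks_per_server:.2f}")
print(f"Average data per disk: {avg_tb_per_disk:.2f} TB")
```

Under these assumptions, the "700 years of HD movies" comparison implies an average video bitrate of roughly 36 megabits per second, and the 52,000 cartridges fill a little under half of the available library slots.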

The first three-year LHC running period at CERN reached its conclusion on 16 February 2013, with the LHC now beginning its first long shutdown, known as LS1. Over the coming months, major consolidation and maintenance work will be carried out across the whole of CERN’s accelerator chain. The LHC will be readied for higher-energy running, and the experiments will undergo essential maintenance. LHC running is scheduled to resume in 2015, with the rest of the CERN complex starting up again in the second half of 2014.

Despite this, CERN’s experimental physics community still has plenty of data to analyse during LS1. You can read more about this on the CERN website, here.

Not all the information was generated by LHC experiments. "CERN IT hosts the data of many other high-energy-physics experiments at CERN, past and current, as well as a data centre for the AMS experiment," says Dirk Duellmann of the IT department.

"For both tape and disk, providing efficient data storage and access is very important," says Duellmann, "and this involves identifying performance bottlenecks and understanding how users want to access the data."

Tapes are checked regularly to make sure they stay in good condition and are accessible to users. To optimize storage space, the complete archive is regularly migrated to the newest high-capacity tapes. Disk-based systems are replicated automatically after hard-disk failures, and a scalable namespace enables fast concurrent access to millions of individual files.

The data center will keep busy during the long shutdown of the whole accelerator complex, analysing data taken during the LHC's first three-year run and preparing for the higher expected data flow when the upgraded accelerators and experiments start up again. An extension of the center and the use of a remote data center in Hungary will further increase the data center's capacity.

Expect further petabytes.


This article was originally published on the CERN website, here.

- Cian O'Luanaigh
