15 Petabytes of Data Each Year
When in operation, the Large Hadron Collider (LHC) is expected to produce 15 million gigabytes of data per year. That's a lot, to say the least: enough information to create a 21-kilometre-high stack of CDs annually.
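As a sanity check on that CD-stack figure, here is the arithmetic as a short Python sketch. The 700 MB capacity and 1.2 mm disc thickness are my own assumptions, not CERN's numbers:

```python
# Back-of-the-envelope check of the "stack of CDs" claim.
# Assumed figures (mine, not CERN's): 700 MB per CD, 1.2 mm per disc.
DATA_PER_YEAR_GB = 15_000_000   # 15 PB expressed in gigabytes
CD_CAPACITY_GB = 0.7            # 700 MB per disc
CD_THICKNESS_M = 1.2e-3         # a standard disc is 1.2 mm thick

discs = DATA_PER_YEAR_GB / CD_CAPACITY_GB
stack_km = discs * CD_THICKNESS_M / 1000

print(f"{discs:,.0f} discs, stack {stack_km:.1f} km high")
```

With these assumptions the stack comes out a bit taller than the quoted 21 km (around 25–26 km), so CERN's figure presumably assumes a slightly larger per-disc capacity or thinner packing, but the order of magnitude checks out either way.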
To crunch all that data, CERN will use a grid computing system unveiled today: the Worldwide LHC Computing Grid (WLCG – physicists love acronyms).
A Few Facts About the Worldwide LHC Computing Grid
- 100,000 processors (at least) in 32 countries
- Tier-0 is one site: the CERN Computing Centre. All data passes through this central hub but it provides less than 20% of the total compute capacity.
- Tier-1 comprises eleven sites: in Canada, France, Germany, Italy, the Netherlands, the Nordic countries, Spain, Taipei, and the UK, plus two sites in the USA.
- Tier-2 comprises over 140 sites, grouped into 38 federations covering Australia, Belgium, Canada, China, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, the Republic of Korea, the Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, the UK, Ukraine, and the USA. Tier-2 sites will provide around 50% of the capacity needed to process the LHC data.
- Tier-2 sites then feed their data to PC clusters in physics institutes around the world, so that groups of scientists and individual researchers can analyze LHC data from their own desks.
- Access to experimental data needs to be provided for the 5000 scientists in some 500 research institutes and universities worldwide.
- In order to distribute this data, CERN relies on dedicated 10 Gbit/s fiber-optic lines that connect CERN with the 11 Tier-1 data centers on the grid. The Tier-2 centers are connected to the grid via regular Internet connections.
- All data need to be available over the estimated 15-year lifetime of the LHC.
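Those figures also let us sanity-check the network side: spreading 15 PB evenly over a year gives an average data rate that can be compared with a single 10 Gbit/s Tier-1 link. A rough sketch (it assumes decimal units, 1 PB = 10^15 bytes, and ignores the fact that real traffic is bursty and each dataset is shipped to multiple sites):

```python
# Average rate needed to move 15 PB in one year, vs. a 10 Gbit/s link.
# Assumes decimal units: 1 PB = 10**15 bytes.
SECONDS_PER_YEAR = 365 * 24 * 3600
DATA_PER_YEAR_BYTES = 15 * 10**15

avg_bytes_per_s = DATA_PER_YEAR_BYTES / SECONDS_PER_YEAR
avg_gbit_per_s = avg_bytes_per_s * 8 / 10**9

print(f"average rate: {avg_gbit_per_s:.1f} Gbit/s")
```

The average works out to under 4 Gbit/s, so one 10 Gbit/s line covers it with headroom; eleven of them leave room for bursts and for sending the same data to several Tier-1 centers at once.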
This is only anecdotal, but a photo from CERN shows a computer booting, and on the screen we can see that it has two quad-core Intel Xeon E5345 processors at 2.33 GHz. The amount of RAM is hard to make out, but I think it's 16 gigabytes.
So far LHC@home has been running a simulation of “particles traveling around the LHC to study the stability of their orbits” (more details here), but CERN is looking for ways to put volunteer CPUs to broader use. Because of the enormous amounts of data that would need to be transferred, LHC@home volunteers won't be able to do the same kind of crunching that the WLCG sites do, but with a little luck something interesting will be found for them to do (though I still think that crunching for Rosetta@home is the best way to donate your idle CPU cycles).
- Worldwide LHC Computing Grid
- CERN Officially Unveils Its Grid: 100,000 Processors, 15 Petabytes a Year
- CERN Document Server
- Graphics Processing Units (GPUs): The Future of Scientific Distributed Computing
- Supercomputers Break the Petaflops Barrier