CERN Revamps Architecture to Better Support Hadron’s Big Data

Written By Loraine Lawson
Sep 19, 2013
Slide Show

Five Ways to Know if Your Challenge Is Big Data or Lots of Data

When it comes to physics, CERN may be at the top of the food chain. But when it comes to managing Big Data, it’s trying to become more like Google, CERN Infrastructure manager Tim Bell said at Structure: Europe.

“We conceded that our challenge is not special — Google is way ahead of us in scale. We need to build on what they’ve done,” Bell said, according to this recent GigaOm article.

Normally, I’d leave infrastructure to IT Business Edge blogger Arthur Cole, but as is so often the case with Big Data, this topic falls somewhere between data software and hardware. Here’s the challenge CERN faces today:

  • CERN generates 40 million pictures per second of proton collisions, which, with a 100-megapixel camera, translates into 1 petabyte of data per second.
  • So far, CERN has 35 petabytes of data to record per year, but that will double when CERN upgrades the collider.
  • Physicists want to keep all that data for 20 years.
  • The result is a hardware problem: archiving currently requires 45,000 tape drives.
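The scale of the archiving problem falls out of simple arithmetic. Here is a back-of-envelope sketch using only the figures above (the per-drive share is my own illustrative calculation, not a number from CERN):

```python
PB = 10**15                       # one petabyte, in decimal bytes

current_rate = 35                 # PB archived per year (from the article)
upgraded_rate = 2 * current_rate  # the article says this will double
retention = 20                    # years physicists want to keep the data

total_current = current_rate * retention    # 700 PB if the rate stayed flat
total_upgraded = upgraded_rate * retention  # 1,400 PB at the doubled rate

# Rough share of a flat-rate archive per drive, given today's 45,000 tape drives:
per_drive_tb = total_current * PB / 45_000 / 10**12
print(total_current, total_upgraded, round(per_drive_tb, 1))  # 700 1400 15.6
```

Even without the upgrade, two decades of data works out to roughly 0.7 exabytes, which is why the tape-drive count alone becomes an infrastructure problem.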

The solution, however, is a blend of infrastructure and software changes.

Last year, I had the privilege of interviewing CERN physicist Axel Naumann about how the organization handles Big Data. CERN scientists use ROOT, a home-grown, object-oriented program and library that physicists developed to perform the actual analysis of the data. Managing it was a challenge, since developers used 15 different platforms.

CERN will still use ROOT, as far as I can tell, but it will no longer rely on home-grown software to manage the supporting Big Data clusters. Instead, it’s replacing that custom tooling with off-the-shelf tools such as Puppet.

It will also use OpenStack, the open source infrastructure-as-a-service cloud platform, to virtualize its infrastructure. That should work well, since one of CERN’s major IT challenges is its distributed and ever-changing workforce. It’s also adopting the open source Puppet for configuration management.

What made me happy about this shift is the reason behind it:

“Users also want to provision an analysis cluster with 50 machines themselves for an afternoon that then goes away again. It is about providing those kinds of services,” Ian Bird, the Large Hadron Collider computing grid project leader, told PC World.
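The pattern Bird describes — spin up an ephemeral analysis cluster, use it for an afternoon, tear it down — is exactly what an IaaS API makes routine. A minimal sketch of that lifecycle in Python (this is not CERN's actual tooling; the `image_id` and `flavor_id` values are hypothetical placeholders you would fill in for your own cloud):

```python
def provision_analysis_cluster(conn, n=50, image_id="IMAGE", flavor_id="FLAVOR"):
    """Boot n identical worker VMs and return the server objects."""
    servers = []
    for i in range(n):
        servers.append(conn.compute.create_server(
            name=f"analysis-{i:02d}",   # analysis-00 ... analysis-49
            image_id=image_id,
            flavor_id=flavor_id))
    return servers

def teardown_cluster(conn, servers):
    """Delete every VM once the afternoon's analysis is done."""
    for s in servers:
        conn.compute.delete_server(s)
```

With the real openstacksdk library, `conn = openstack.connect(cloud="mycloud")` would supply the connection; the functions above work against any object exposing `compute.create_server` and `compute.delete_server`, which also makes the pattern easy to test with a stub.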

If you’d like to see CERN’s new architecture data flow, check out slide 38 of the deck Bell presented at the 2012 Puppet Conference.
