How Scientists Tackle NASA’s Big Data Deluge


Every hour, NASA’s missions collectively compile hundreds of terabytes of information. Printed out in hard copy, that would use up the equivalent of tens of millions of trees’ worth of paper.

This deluge of material poses some big data challenges for the space agency. But a team at NASA’s Jet Propulsion Laboratory (JPL) in Pasadena, Calif., is coming up with new strategies to tackle problems of information storage, processing and access so that researchers can harness gargantuan amounts of data that would be impossible for humans to parse by hand.

“Scientists use big data for everything from predicting weather on Earth to monitoring ice caps on Mars to searching for distant galaxies,” JPL’s Eric De Jong said in a statement. De Jong is the principal investigator for one of NASA’s big data programs, the Solar System Visualization project, which aims to convert the scientific information gathered in missions into graphics that researchers can use.

“We are the keepers of the data, and the users are the astronomers and scientists who need images, mosaics, maps and movies to find patterns and verify theories,” De Jong explained. For example, his team makes movies from data sets like the 120-megapixel photos taken by NASA’s Mars Reconnaissance Orbiter during its surveys of the Red Planet.

But even just archiving big data for some of NASA’s missions and other international projects can be daunting. The Square Kilometer Array, or SKA, for example, is a planned array of thousands of telescopes in South Africa and Australia, slated to begin construction in 2016. When it goes online, the SKA is expected to produce 700 terabytes of data each day, which is equivalent to all the data racing through the Internet every two days.

JPL researchers will help archive this flood of information. Rather than inventing new products for projects like the SKA, big data specialists at the center say they are using existing hardware, developing cloud computing techniques and adapting open-source programs to suit their needs.

“We don’t need to reinvent the wheel,” Chris Mattmann, a principal investigator for JPL’s big data initiative, said in a statement. “We can modify open-source computer codes to create faster, cheaper solutions.”

NASA’s big data team is also devising new ways to make this archived information more accessible and versatile for public use.

“If you have a giant bookcase of books, you still have to know how to find the book you’re looking for,” Steve Groom, of NASA’s Infrared Processing and Analysis Center at the California Institute of Technology, explained in a statement.

Groom’s center manages data from several NASA astronomy missions, including the Spitzer Space Telescope and the Wide-field Infrared Survey Explorer (WISE).

“Astronomers can also browse all the ‘books’ in our library simultaneously, something that can’t be done on their own computers,” Groom added.


What Is Big Data?

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.


Big data spans four dimensions: Volume, Velocity, Variety, and Veracity.

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

  • Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
  • Convert 350 billion annual meter readings to better predict power consumption

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

  • Scrutinize 5 million trade events created each day to identify potential fraud
  • Analyze 500 million daily call detail records in real time to predict customer churn faster
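
To put the Velocity figures above in per-second terms, here is a minimal back-of-the-envelope sketch. It assumes events arrive evenly across a 24-hour day, which real trading and call traffic will not do (peaks will be higher), and the function names are illustrative only, not part of any product.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average arrival rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def arrivals_in_window(events_per_day: float, window_seconds: float) -> float:
    """Average number of events that arrive during a delay of window_seconds."""
    return per_second(events_per_day) * window_seconds


# Daily totals quoted in the two Velocity bullets.
daily_volumes = {
    "trade events": 5_000_000,
    "call detail records": 500_000_000,
}

for name, per_day in daily_volumes.items():
    rate = per_second(per_day)
    backlog = arrivals_in_window(per_day, 120)  # the "2 minutes" from the text
    print(f"{name}: ~{rate:,.0f}/s on average, ~{backlog:,.0f} arriving in 2 minutes")
```

Under that even-spread assumption, 5 million trade events a day averages about 58 per second, and 500 million call records a day averages about 5,800 per second, so a two-minute delay leaves several thousand trades and several hundred thousand call records unexamined.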

Variety: Big data is any type of data, structured or unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

  • Monitor hundreds of live video feeds from surveillance cameras to target points of interest
  • Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Veracity: One in three business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach. Until now, there was no practical way to harvest this opportunity. Today, IBM’s platform for big data uses state-of-the-art technologies, including patented advanced analytics, to open the door to a world of possibilities.