IBM Announces Major Commitment In Big Data Software Apache Spark

IBM announced plans to incorporate Apache Spark technology into the company’s enterprise, analytics and cloud platforms. Apache Spark is a high-performance open-source cluster-computing framework originally developed by AMPLab at UC Berkeley. Spark’s in-memory processing offers significant performance improvements over similar technologies such as traditional Hadoop clusters. IBM is also committing considerable R&D and educational resources to the project.

“Most people call Spark a data analytics engine or a programming framework, but I see things a little differently. To me it’s really an analytics operating system. Like Linux, it’s a foundation upon which developers of all types, from startups to giant corporations, can build applications,” said Bob Picciano, Senior Vice President, IBM Analytics.

IBM has had a close association with the Apache Spark project, dating back to before Spark’s initial release in 2009. IBM is a founding member of UC Berkeley’s AMPLab, along with Google, SAP and Amazon Web Services. IBM said that it plans to commit over 3,500 researchers and developers to work on Spark-related projects.

IBM said that it plans to build Spark into its analytics and commerce technologies. Watson Health Cloud, the cloud-based healthcare data sharing hub, will use Spark to help its customers quickly gain new insights from the analysis of massive amounts of personal health data. To make it easier for developers to access Spark technology, IBM will make Spark available as a service on the company’s Bluemix cloud.

IBM will open-source its SystemML declarative machine learning (ML) system and contribute it to the Apache Spark project. Companies wishing to run ML algorithms on very large datasets using low-level MapReduce programs can be faced with building complex systems at prohibitive development cost. SystemML expresses many of the building blocks of these algorithms in a high-level language called Declarative Machine Learning Language (DML), which is compiled and optimized to run on Hadoop, significantly reducing complexity and cost.

Spark was donated to the Apache Software Foundation in 2013 and became a top-level Apache project in 2014.

“IBM makes its money higher up, building solutions for customers,” said Mike Gualtieri, an analyst at Forrester Research. “That’s ultimately why this makes sense for IBM.”