
MapR Ushers in New Year for Big Data; Breaks Down Data Silos with Converged Data Platform
February 12, 2016 Blog big data

This article was originally published by and can be viewed in full here


In January, MapR will ship its new Converged Data Platform, designed to break down silos by integrating file, database, stream processing, and analytics on one platform. IDN talks with MapR about the new platform – and tips for IoT-ready big data.

To kick off the 2016 New Year, MapR is set to make big noise for big data. In January, MapR’s Converged Data Platform will ship, designed to break down silos by integrating file, database, stream processing, and analytics on one platform.

MapR’s Converged Data Platform sports features that open a new world of easier-to-deliver projects for data-at-rest, data-in-motion, Internet of Things and even a mix, Jack Norris, MapR’s chief marketing officer, told IDN.

“MapR’s Converged Data Platform provides organizations operational agility by eliminating the delay and lack of integrated insights that are only possible with a converged data platform. By unifying platform data services in one system/cluster, cluster sprawl and total cost of ownership is reduced and the overall data-to-action cycle is accelerated,” Norris told IDN.

Its design aims to confer benefits on developers, IT operations and even business managers, in short all stakeholders working on big data projects, he added.

How MapR’s Converged Data Platform Unifies File, Database, Streams, Analytics

With its latest platform update, MapR brings together its proven MapR distro capabilities of enterprise storage [MapR-FS]; database [MapR-DB]; and open source processing engines [Hadoop, Spark] with a new global event streaming system, called MapR Streams. This adds support for high-throughput, real-time streaming to the same platform. Further, all features are available within a single cluster.

Because MapR’s Converged Data Platform natively integrates MapR Streams with MapR’s Hadoop distribution, organizations can continuously collect, analyze and act on streaming data, Norris said.

Technically, MapR’s Converged Data Platform approach provides big benefits across the big data lifecycle, offering both devs and IT ops several advantages, Norris added.

It allows developers “to accelerate application development where developers have a full set of open API’s and open source projects to choose from,” he said. For IT, it provides “a simplified enterprise architecture with unified administration, unified security, unified HA/DR services, a global/single namespace for files, tables, event streams, and more.”

To drill down a bit, MapR’s Converged Data Platform is able to:

  • Easily build scalable, continuous high-throughput streams across thousands of locations with millions of topics and billions of messages
  • Unite analytics, transaction, and stream processing to reduce data duplication, latency, and cluster sprawl while using existing open source projects like Spark Streaming, Apache Storm, Apache Flink, and Apache Apex
  • Enable reliable message delivery with auto-failover and order consistency
  • Ensure cross-site replication to build global real-time applications
  • Provide unlimited persistence of all messages in a stream
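
The delivery guarantees in the list above (ordering within a topic, persistence of every message, and resuming after a failure) can be illustrated with a toy in-memory model. This is a minimal sketch, not the MapR Streams API: the `ToyStream` class, its method names, and the topic name are hypothetical, and a real system would distribute and replicate the log across a cluster.

```python
from collections import defaultdict


class ToyStream:
    """Hypothetical in-memory stand-in for a persistent event stream."""

    def __init__(self):
        # One append-only list per topic; messages are never deleted.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return all messages at or after `offset`, in publish order."""
        return list(self._topics[topic][offset:])


stream = ToyStream()
for reading in (21.5, 21.7, 22.0):
    stream.publish("sensor-42", reading)

# Assume a consumer processed two messages and then failed.
committed_offset = 2

# Its replacement resumes from the committed offset. Nothing is lost,
# because reading does not remove messages from the log.
resumed = stream.read("sensor-42", committed_offset)

# The full history is still available for replay.
replayed = stream.read("sensor-42")
```

Because the log is append-only and consumers track their own offsets, a restarted consumer picks up where the failed one stopped, and any other reader can still replay the whole topic from the beginning.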

Real Business Benefits from Streaming Big Data for Real-Time and IoT Projects

For years, big data architectures have promised ways to deliver insights and other business benefits with streaming and real-time data. That said, getting these results has been a struggle for many adopters. Norris shared with IDN how MapR’s Converged Data Platform may finally overcome today’s obstacles.

“Current data streaming architecture usually involve Kafka or Flume integrated with Spark Streaming or Storm to provide real-time insights. This requires separate hardware clusters and data movement over the network,” Norris said.

MapR Converged Data Platform was engineered to overcome these requirements in several ways, Norris said. Among them:

  • Unlike legacy special-purpose message queues, MapR Streams is a distributed system that runs on commodity hardware and scales linearly.
  • Compared to other scale-out approaches, MapR eliminates the need for separate clusters for data transport (e.g., Kafka) and data processing (e.g., Hadoop or Spark). One unified cluster meets both needs.
  • Enterprise features such as HA / DR (high-availability / disaster recovery with mirroring and snapshots), security (with authentication, access control, and encryption), and multi-tenancy are already bundled in.
  • No geo limits; data can be produced and consumed anywhere across the globe in real-time.
  • Batch (e.g., MapReduce), interactive (Drill), and stream processing (e.g., Spark Streaming) frameworks all have direct access to event streams. This key architecture eliminates the need for moving data, and thus can ensure data integrity and consistency.
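
The last point, that batch and stream processing read the same event stream rather than separate copies, can be sketched in a few lines. This is an illustrative toy under stated assumptions: the `events` list stands in for a shared stream, and `batch_total` and `RunningTotal` are hypothetical names, not Spark, MapReduce, Drill, or MapR APIs.

```python
# A single shared event log (hypothetical data): (device, value) pairs.
events = [("pump-1", 3), ("pump-2", 5), ("pump-1", 4)]


def batch_total(log):
    """Batch-style job: scan the whole log once and aggregate per device."""
    totals = {}
    for device, value in log:
        totals[device] = totals.get(device, 0) + value
    return totals


class RunningTotal:
    """Stream-style job: update the aggregate as new events arrive."""

    def __init__(self):
        self.totals = {}
        self.position = 0  # next offset to read from the shared log

    def poll(self, log):
        """Process only the events added since the previous poll."""
        for device, value in log[self.position:]:
            self.totals[device] = self.totals.get(device, 0) + value
        self.position = len(log)
        return self.totals


streaming = RunningTotal()
streaming.poll(events)         # processes the first three events
events.append(("pump-2", 1))   # a new event arrives in the same log
streaming.poll(events)         # processes only the new event

# Both jobs read the same log, so their results agree.
batch_result = batch_total(events)
```

With one log there is no second copy that could drift out of date, which is the consistency argument the bullet makes.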

One high-profile use case driving questions about big data’s support for streaming and real-time data is the Internet of Things (IoT). We asked Norris to share some questions companies should ask about their big data architecture and staff to see if they’re ready for IoT workloads.

Here are the questions Norris says are worth putting on your list:

  • Do you have developers in place who can take advantage of open source projects in Hadoop, Spark and more?
  • Do you have a horizontal scale-out architecture in place that scales incrementally as data volumes grow?
  • Are you looking for a global, multi-data center solution out of the box?
  • Is real-time data processing as events take place a critical business requirement?
  • Are you looking for a simplified architecture for storage, database, and streaming?
  • Are enterprise features such as HA/DR, security, multi-tenancy critical?

Norris also said some technologies powering the MapR Converged Data Platform will be shared with open source, in line with MapR’s tradition of supporting other popular open source big data projects (including HBase, Spark and YARN).

“We have open-sourced the OJAI API, which supports a JSON document model and is a common interface across files, database tables, and streams,” Norris said. The OJAI API aims to make it easier to build applications that can call on any of these services from a common API, he added.
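
The idea of one document interface over several data services can be sketched as follows. This is a hypothetical illustration in Python, not the OJAI API itself: the `FileSource`, `TableSource`, and `StreamSource` classes, the `documents()` method, and the sample data are all assumptions made for the example.

```python
import io
import json


class FileSource:
    """Hypothetical: reads one JSON document per line from a file-like object."""

    def __init__(self, fileobj):
        self._fileobj = fileobj

    def documents(self):
        for line in self._fileobj:
            if line.strip():
                yield json.loads(line)


class TableSource:
    """Hypothetical: a table modeled as a dict of row key -> document."""

    def __init__(self, rows):
        self._rows = rows

    def documents(self):
        for key, doc in self._rows.items():
            yield {"_id": key, **doc}


class StreamSource:
    """Hypothetical: a stream modeled as a list of JSON-encoded messages."""

    def __init__(self, messages):
        self._messages = messages

    def documents(self):
        for raw in self._messages:
            yield json.loads(raw)


def count_by_status(source):
    """Application code: works with any source that yields documents."""
    counts = {}
    for doc in source.documents():
        counts[doc["status"]] = counts.get(doc["status"], 0) + 1
    return counts


file_src = FileSource(io.StringIO('{"status": "ok"}\n{"status": "error"}\n'))
table_src = TableSource({"row1": {"status": "ok"}, "row2": {"status": "ok"}})
stream_src = StreamSource(['{"status": "error"}'])

results = [count_by_status(s) for s in (file_src, table_src, stream_src)]
```

The application function `count_by_status` never needs to know which kind of service it is reading from, which is the benefit a common interface is meant to provide.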

In addition, MapR is a co-lead and committer to the new Apache Myriad project, which integrates Apache YARN with Apache Mesos, providing a data-center-wide resource management framework for Hadoop and non-Hadoop applications, he said.

“Myriad paves the way for Hadoop jobs to co-exist with non-Hadoop jobs in large-scale clusters that can span across multiple data centers. With Myriad in place, entire data center resources can be managed as a single pool of resources, breaking down any processing silos,” according to the Apache website.