In January, MapR will ship its new Converged Data Platform, designed to break down silos by integrating file, database, stream processing, and analytics on one platform. IDN talks with MapR about the new platform – and tips for IoT-ready big data.
To kick off the 2016 New Year, MapR is set to make big noise for big data. In January, MapR’s Converged Data Platform will ship, designed to break down silos by integrating file, database, stream processing, and analytics on one platform.
The Converged Data Platform's features make it easier to deliver projects involving data at rest, data in motion, the Internet of Things (IoT), or a mix of these, Jack Norris, MapR’s chief marketing officer, told IDN.
“MapR’s Converged Data Platform provides organizations operational agility by eliminating the delay and lack of integrated insights that are only possible with a converged data platform. By unifying platform data services in one system/cluster, cluster sprawl and total cost of ownership are reduced and the overall data-to-action cycle is accelerated,” Norris told IDN.
Its design aims to confer benefits on developers, IT operations and even business managers, that is, on all the stakeholders working on big data projects, he added.
How MapR’s Converged Data Platform Unifies File, Database, Streams, Analytics
With its latest platform update, MapR combines the established capabilities of its distribution, namely enterprise storage (MapR-FS), a database (MapR-DB), and open source processing engines (Hadoop, Spark), with a new global event streaming system called MapR Streams. MapR Streams adds high-throughput, real-time streaming to the same platform, and all of these features run within a single cluster.
Because MapR’s Converged Data Platform natively integrates MapR Streams with MapR’s Hadoop distribution, organizations can continuously collect, analyze and act on streaming data, Norris said.
Technically, the converged approach offers advantages to both developers and IT operations across the big data lifecycle, Norris added.
It allows developers “to accelerate application development where developers have a full set of open APIs and open source projects to choose from,” he said. For IT, it provides “a simplified enterprise architecture with unified administration, unified security, unified HA/DR services, a global/single namespace for files, tables, event streams, and more.”
To drill down a bit, MapR’s Converged Data Platform is designed to:
- Make it easy to build scalable, continuous, high-throughput streams across thousands of locations, with millions of topics and billions of messages
- Unite analytics, transaction, and stream processing to reduce data duplication, latency, and cluster sprawl, while supporting existing open source projects such as Spark Streaming, Apache Storm, Apache Flink, and Apache Apex
- Deliver messages reliably, with automatic failover and consistent message ordering
- Replicate streams across sites, so that global real-time applications can be built
- Persist all messages in a stream indefinitely
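To make the ordering and persistence items concrete, here is a minimal in-memory sketch of the partitioned, append-only log model that event streaming systems commonly use. It is an illustration under stated assumptions, not MapR Streams' actual API or implementation: the `Stream` class, its `produce`/`consume` methods, the key-hashing rule, and the partition count are all hypothetical. Messages with the same key land in the same partition and keep their order, and because nothing is deleted, any reader can replay from any offset.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Tuple


@dataclass
class Stream:
    """Hypothetical in-memory event stream (illustrative; not the MapR Streams API)."""

    partitions_per_topic: int = 3
    # (topic, partition) -> append-only list of messages. Nothing is ever
    # removed, which mimics a stream that retains every message.
    _log: DefaultDict[Tuple[str, int], List[bytes]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def produce(self, topic: str, key: str, value: bytes) -> Tuple[int, int]:
        # A stable hash sends every message with the same key to the same
        # partition, so those messages keep their relative order.
        partition = zlib.crc32(key.encode()) % self.partitions_per_topic
        log = self._log[(topic, partition)]
        log.append(value)
        return partition, len(log) - 1  # (partition, offset of this message)

    def consume(self, topic: str, partition: int, offset: int) -> List[bytes]:
        # Readers track their own offsets, so a batch job and a streaming job
        # can read the same messages independently, without copying them.
        return list(self._log[(topic, partition)][offset:])


stream = Stream()
p, first = stream.produce("sensors", "device-42", b"temp=20")
_, second = stream.produce("sensors", "device-42", b"temp=21")
print(stream.consume("sensors", p, first))   # [b'temp=20', b'temp=21']
print(stream.consume("sensors", p, second))  # [b'temp=21']
```

Keying by device ID is one common design choice: it preserves per-device ordering while still spreading load across partitions.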
Real Business Benefits from Streaming Big Data for Real-Time and IoT Projects
For years, big data architectures have promised ways to deliver insights and other business benefits with streaming and real-time data. That said, getting these results has been a struggle for many adopters. Norris shared with IDN how MapR’s Converged Data Platform may finally overcome today’s obstacles.
“Current data streaming architectures usually involve Kafka or Flume integrated with Spark Streaming or Storm to provide real-time insights. This requires separate hardware clusters and data movement over the network,” Norris said.
MapR’s Converged Data Platform was engineered to remove these requirements in several ways, Norris said. Among them:
- Unlike legacy special-purpose message queues, MapR Streams is a distributed system that runs on commodity hardware and scales linearly.
- Compared to other scale-out approaches, MapR eliminates the need for separate clusters for data transport (e.g., Kafka) and data processing (e.g., Hadoop or Spark). A single unified cluster meets both needs.
- Enterprise features such as HA/DR (high availability and disaster recovery, with mirroring and snapshots), security (with authentication, access control, and encryption), and multi-tenancy are already built in.
- No geographic limits: data can be produced and consumed anywhere in the world in real time.
- Batch (e.g., MapReduce), interactive (e.g., Drill), and stream processing (e.g., Spark Streaming) frameworks all have direct access to event streams. Because data does not have to be moved between systems, this architecture helps preserve data integrity and consistency.
The Internet of Things (IoT) is one high-profile use case driving questions about how well big data systems support streaming and real-time data. We asked Norris to share some questions he thinks companies should ask about their big data architecture and staff to see if they’re ready for IoT workloads.
Here are the questions Norris says are worth putting on your list when assessing whether your big data systems are ready for IoT:
- Do you have developers in place who can take advantage of open source projects in Hadoop, Spark and more?
- Do you have a horizontal scale-out architecture in place that scales incrementally as data volumes grow?
- Are you looking for a global, multi-data center solution out of the box?
- Is real-time data processing as events take place a critical business requirement?
- Are you looking for a simplified architecture for storage, database, and streaming?
- Are enterprise features such as HA/DR, security, and multi-tenancy critical?
Norris also said some technologies powering the MapR Converged Data Platform will be contributed to open source, in line with MapR’s tradition of supporting other popular open source big data projects (including HBase, Spark and YARN).
“We have open-sourced the OJAI API, which supports a JSON document model and is a common interface across files, database tables, and streams,” Norris said. OJAI (Open JSON Application Interface) aims to make it easier to build applications that can call on any of these services through a common API, he added.
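The idea of one document API spanning tables and streams can be sketched as follows. This is not the OJAI API itself; the `DocumentStore` protocol, the `TableStore` and `StreamStore` classes, and the `hot_devices` function are hypothetical names chosen for illustration. The point is that application logic written once against a shared JSON-document interface runs unchanged whether the documents live in a table-like store or a stream-like one.

```python
import json
from typing import Callable, Dict, List, Protocol

Document = Dict[str, object]


class DocumentStore(Protocol):
    """Hypothetical shared interface over JSON documents (not the real OJAI API)."""

    def insert(self, doc: Document) -> None: ...

    def find(self, predicate: Callable[[Document], bool]) -> List[Document]: ...


class TableStore:
    """Table-like store: documents keyed by their '_id' field."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def insert(self, doc: Document) -> None:
        self._rows[str(doc["_id"])] = json.dumps(doc)  # upsert by key

    def find(self, predicate: Callable[[Document], bool]) -> List[Document]:
        return [d for d in map(json.loads, self._rows.values()) if predicate(d)]


class StreamStore:
    """Stream-like store: documents appended in arrival order."""

    def __init__(self) -> None:
        self._events: List[str] = []

    def insert(self, doc: Document) -> None:
        self._events.append(json.dumps(doc))  # append-only, nothing overwritten

    def find(self, predicate: Callable[[Document], bool]) -> List[Document]:
        return [d for d in map(json.loads, self._events) if predicate(d)]


def hot_devices(store: DocumentStore) -> List[str]:
    # Written once against the shared interface; works with either backend.
    return [d["device"] for d in store.find(lambda d: d["temp"] > 30)]


for store in (TableStore(), StreamStore()):
    store.insert({"_id": "1", "device": "a", "temp": 35})
    store.insert({"_id": "2", "device": "b", "temp": 20})
    print(type(store).__name__, hot_devices(store))  # prints ['a'] for both
```

The table keeps only the latest document per `_id`, while the stream keeps every event; the calling code does not need to know which one it is talking to.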
In addition, MapR is a co-lead of and committer to the new Apache Myriad project, which integrates Hadoop YARN with Apache Mesos, providing a data-center-wide resource management framework for Hadoop and non-Hadoop applications, he said.
“Myriad paves the way for Hadoop jobs to co-exist with non-Hadoop jobs in large-scale clusters that can span across multiple data centers. With Myriad in place, entire data center resources can be managed as a single pool of resources, breaking down any processing silos,” according to the Apache website.