Exploring Hadoop distributions for managing big data
February 12, 2016

This article was originally published elsewhere and can be viewed in full at the original source.

Hadoop is an open source technology that today is the data management platform most commonly associated with big data applications. The distributed processing framework was created in 2006, primarily at Yahoo and based partly on ideas outlined by Google in a pair of technical papers; soon, other Internet companies such as Facebook, LinkedIn and Twitter adopted the technology and began contributing to its development. In the past few years, Hadoop has evolved into a complex ecosystem of infrastructure components and related tools, which are packaged together by various vendors in commercial Hadoop distributions.

Running on clusters of commodity servers, Hadoop offers a high-performance, low-cost approach to establishing a big data management architecture for supporting advanced analytics initiatives. As awareness of its capabilities has increased, Hadoop’s use has spread to other industries, for both reporting and analytical applications involving a mix of traditional structured data and newer forms of unstructured and semi-structured data. This includes Web clickstream data, online ad information, social media data, healthcare claims records, and sensor data from manufacturing equipment and other devices on the Internet of Things.
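The distributed processing model behind these workloads is MapReduce: a map phase emits key-value pairs from raw records, the framework sorts and groups them by key across the cluster, and a reduce phase aggregates each group. The sketch below illustrates that flow with a word count, the canonical MapReduce example, as ordinary Python functions; in a real Hadoop Streaming job, the mapper and reducer would be separate scripts reading stdin and writing stdout on cluster nodes. This is an illustrative sketch of the model, not the Hadoop API itself, and the function names are our own.

```python
# Word count in the MapReduce style used by Hadoop Streaming.
# Illustrative sketch: map_phase and reduce_phase are hypothetical names,
# standing in for the separate mapper/reducer scripts a real job would use.
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit (word, 1) pairs, as a streaming mapper would print them."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def reduce_phase(pairs):
    """Sum counts per word. Hadoop sorts by key between map and reduce;
    sorted() simulates that shuffle step locally."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


if __name__ == "__main__":
    docs = ["big data needs big clusters", "hadoop manages big data"]
    print(dict(reduce_phase(map_phase(docs))))
```

Because each mapper sees only its own slice of the input and each reducer only its own keys, the same logic scales out across the commodity servers described above with no change to the map or reduce code.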