Strata+Hadoop Interview Series – Doug Cutting, Creator of Hadoop and Mike Olson, CSO, Cloudera


During Big Community’s stint in Singapore at the Strata+Hadoop Conference, we managed to squeeze some time out of the busy schedules of Doug Cutting and Mike Olson to get their views on the future of Big Data in the region.

We spoke to them about the latest technologies that could improve upon Hadoop, and beyond.

Doug explains that “Hadoop” can mean two things. “There are two senses in this case. The most precise sense is the Apache Hadoop open source project, and the more general sense, which people oftentimes mean, is all the projects that have been built upon Hadoop. In the more precise sense, the technologies in Apache Hadoop are already being replaced. We’ve got a storage system in HDFS, we’ve got a scheduler in YARN and we’ve got an execution engine in MapReduce.”


Doug Cutting

There are already competitors for each of these, and in some cases the competitors are beating out the original technologies. Spark, for instance, is a better solution than MapReduce, and there are only a few cases where MapReduce still excels. Newer storage systems such as Apache Kudu are better than HDFS in most but not all cases, so a general-purpose file system like HDFS still comes in useful at times.

“With YARN, we see many competitors that aren’t exact replacements, but I think that’s a good thing, because it shows the success of this model of having a loosely connected set of open source projects that are run independently. It also makes Cloudera’s role clear as a curator of this ecosystem of open source projects, without being tied to any one project specifically,” he added.

Mike Olson went on to say that the early design of the Hadoop project had three separate components. “It was modular from the beginning. When Cloudera started, we shipped MapReduce and HDFS as the core project. Today we bundle 27 projects, including Apache Spark, Impala and many other ecosystem projects.”


Mike Olson

Hadoop has expanded dramatically since its early days. Although some of those projects are not used as often, the ecosystem as a whole has become robust. The pace at which these new products have been created, and the range of problems they address, have been nothing short of impressive.

On machine learning and artificial intelligence, Doug says the technology does have a big impact, but at the moment much of the hype is the press’s doing.

“We find that a majority of customers are getting value from simpler technologies. Just being able to count things they couldn’t count before is something customers already find incredibly valuable, as is figuring out what needs to be counted and what can’t be counted.”

Mike Olson took the opposite view, looking at Big Data’s effect on machine learning rather than machine learning’s effect on Big Data.

“Finally, we can collect these very large data sets and train machine learning systems on them. We can build much more accurate models than we could before,” he says.

This is especially true in financial services, where banks and credit card companies need to watch for fraud and monitor money-laundering activity. Using machine learning techniques to go over historical data and recognise fraudulent events in the timeline is crucial to locating fraudulent behaviour in the system. These organisations also need to recognise such activity as it happens, in real time.

“The same is true in cybersecurity,” Mike added. “When the bad guys try to break into your network, it helps to know what your network normally looks like. Training your models on historical data lets you recognise different behaviour in the future.”

On the danger of new technology falling into the wrong hands, Mike views any technology as a double-edged sword, and Big Data is no different. Ethical practices within the legal framework should be followed to maintain and protect people’s interests.

“There’s been a lot of great work done on data privacy,” Mike adds. “Around the world, we make sure that privacy is protected with strong encryption and good access controls: noticing log-ins, who touches the data and what they do with it. As with any technology, the companies that build these systems need to look at the law and at ethical considerations. This isn’t a new problem. It’s really a new system running into an old problem.”

In his experience, the companies he deals with work hard to be responsible and ethical. He believes everyone recognises that the public is watching this very closely. No one wants bad press in these times of discovery, so he believes companies are taking the right approach by setting good standards and adhering to them.

Big Data is being used to understand consumer behaviour and to drive better engagement and real-time interaction with customers. As many keynote speakers from the banking and service industries described, it has also delivered real results for many corporations throughout the region.

“The partner ecosystems, the solution providers and the applications that run on the platform are growing quite quickly in this region. We’re seeing big banks and big insurance companies, the big data consumers, adopt the technology much more than they did a few years ago,” Mike shares.

Doug agreed wholeheartedly, saying the technology has really taken off in the last two years, and it is showing in how phenomenally their business is growing in the region.