Waterline Data Releases Smart Data Catalog 4.0 for Faster Use of Big Data


Waterline Data, a leader in Data Lifecycle Management, today announced its latest platform offering, Smart Data Catalog 4.0.

“It has become clear that the data catalog is a fundamental enabler not just of the management of the data within a data lake, but also for a variety of related business use cases,” said Matt Aslett, Research Director. “By creating an inventory of data and data lineage, tagging sensitive data to control access, and even identifying data redundancy, the data catalog can be used to identify data for analysis, enable data governance and rationalize excess data sets, unlocking the potential value of big data projects.” 

Connecting the Right Data to the Right People
Smart Data Catalog 4.0 replaces manual tagging of metadata with an automated process that rapidly classifies and organizes all of an organization’s data assets and their lineage, making data readily available for:

  1. Self-Service Analytics.
  2. Data Governance and access control for regulatory compliance.
  3. Data Rationalization for greater storage and cost efficiency.

Smart Data Catalog 4.0 answers fundamental questions that most organizations have regarding data. Where do I find it? Where did it come from? What’s in the data? Who can use the data? 

Smart Data Catalog 4.0 Key Features
SDC 4.0’s new enhancements were all designed to accelerate the usability of trusted data in the enterprise. New capabilities include:

  • Support for directly fingerprinting and cataloging data located in Teradata, Oracle, MySQL, and other relational databases, which expands Waterline beyond the Hadoop-only data sources supported in prior versions.
  • Support for data lakes operating in Amazon Web Services (AWS).
  • Tag-based access control identifies sensitive data fields and allows data tagged as “sensitive” to have access automatically controlled by Apache Ranger and Cloudera Sentry, along with other access control tools via REST API integration.
  • A dramatically improved user experience for the business professional, with a new user interface “skin”; faster, more scalable search based on the industry-standard Solr search platform; and improved crowdsourced ratings, annotation, reviews, and collaboration features.
  • The industry’s most extensible, open architecture, which supports Hadoop, Spark, and cloud deployment environments and includes an RDBMS plug-in architecture for relational sources as well as extensive REST API support for partner integration and extensibility.

With its unique combination of automated data inventorying plus crowdsourcing, Smart Data Catalog 4.0 allows data professionals to “fingerprint” data at scale by analysing actual data values. The software automatically tags data fingerprints with glossary terms, matches further terms through crowdsourcing, and then curates the results by allowing data stewards to accept or reject tags. Meanwhile, business professionals can easily search and use data through a user-friendly interface or through a variety of third-party applications.
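
The fingerprint, tag, and curate workflow described above can be illustrated with a minimal Python sketch. It is not Waterline's algorithm, which the announcement does not describe. It assumes, purely for illustration, that a fingerprint is the fraction of a column's values matching a pattern for each glossary term, and every name in it (GLOSSARY, TagSuggestion, fingerprint, suggest_tags, review) is hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical glossary: business term -> pattern a matching value should satisfy.
GLOSSARY = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE),
    "us_zip_code": re.compile(r"\d{5}(-\d{4})?"),
}


@dataclass
class TagSuggestion:
    column: str
    term: str
    confidence: float  # fraction of sampled values matching the term's pattern
    status: str = "suggested"  # set to "accepted" or "rejected" after review


def fingerprint(values):
    """Return, per glossary term, the fraction of non-empty values that match."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return {}
    return {
        term: sum(1 for v in cleaned if pattern.fullmatch(v)) / len(cleaned)
        for term, pattern in GLOSSARY.items()
    }


def suggest_tags(column, values, threshold=0.8):
    """Propose a tag for every term whose match rate meets the threshold."""
    return [
        TagSuggestion(column, term, score)
        for term, score in fingerprint(values).items()
        if score >= threshold
    ]


def review(suggestion, accept):
    """Record a data steward's decision on a suggested tag."""
    suggestion.status = "accepted" if accept else "rejected"
    return suggestion
```

Calling suggest_tags on a sample of an unlabelled column's values returns candidate tags based on the values themselves rather than on the column name, and review records the steward's decision on each candidate. The 0.8 threshold is an arbitrary choice for the sketch.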

“Our mission at Waterline Data is to connect the right people to the right data while information is still fresh,” said Alex Gorelik, CEO at Waterline Data. “Most organizations have more than 50% of their data stagnating in quarantine zones or lost in data swamps, because nobody has the time or expertise to identify and organize the assets and decide who should have access to them. Waterline Smart Data Catalog 4.0 delivers a unique combination of automation and crowdsourcing that allows our customers to quickly get their data out of quarantine and into use with the confidence that the data is properly tagged so it can be governed and put into use in days instead of weeks or months.”

This article was originally published on and can be viewed in full