April 19, 2018

Data Audit for a Post-RDBMS World

We live in a post-RDBMS world.

Yet audit and compliance tools are still designed predominantly for RDBMSs (Oracle, MySQL).

Consider the following facts regarding unstructured data:

  • Unstructured data is on the rise.
    • 80% of enterprise data today is unstructured. [Gartner]
    • Unstructured data is growing at an alarming rate of 70% per year. [Symantec]
  • Unstructured data is vulnerable.
    • 41% of companies have over 1,000 sensitive files open to everyone. [Varonis]
  • Unstructured data is hard to audit.
    • There is a lack of tooling to discover what sensitive information might be lurking in unstructured data sources.

Also, consider the following facts regarding scanned documents (stored as image files):

  • Many consumer-facing companies ask their customers for scanned copies of a driver’s license, credit card, and similar documents.
    • But the security controls around scanned documents are inadequate because of a lack of tooling support.
  • Case in point: The recent FedEx data breach
    • "FedEx was storing more than 100,000 scanned documents including passports, drivers licenses, and security IDs on an unsecured Amazon S3 server. This Amazon S3 server was forgotten in an years-old acquisition."

Lack of Tooling for Auditing Unstructured Data

Audit and compliance tools have not kept pace with the changing data landscape. There simply are not any good tools to reliably and accurately audit unstructured text data and image data. The existing tools still serve the pre-NoSQL, pre-Cloud, and pre-Hadoop world of RDBMSs.

To address this glaring gap, we created Kogni. Kogni discovers sensitive information in unstructured text data and images, in addition to traditional RDBMSs. It does so for both on-premises and cloud data stores. Finally, Kogni makes this audit information available in an intuitive and interactive web-based dashboard.
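To make the idea of discovering sensitive information in unstructured text concrete, here is a minimal sketch in Python. It is not how Kogni works internally; it only illustrates one common approach, pattern matching plus a checksum. The names SSN_RE, CARD_RE, luhn_valid, and find_sensitive are hypothetical, and the two patterns (US Social Security numbers and payment card numbers) are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for illustration only; a real scanner needs many more
# rules, plus context checks to keep false positives down.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str):
    """Return (kind, matched_text) pairs for SSN-like and card-like strings."""
    findings = []
    for m in SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):  # drop digit runs that cannot be card numbers
            findings.append(("card", m.group()))
    return findings
```

Calling find_sensitive on the contents of a text file returns a list of candidate findings that an auditor could then review. The Luhn check is there because long digit runs (order numbers, tracking IDs) are common in free text, and most of them are not card numbers.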

One More Thing

The very first question an audit team asks its client is, “What data stores do you have?” Unsurprisingly, the answer is often a bunch of spreadsheets that became outdated soon after they were created. Here is why:

  • New data stores are constantly being added
  • Cloud migration
  • Hadoop is still the Wild West when it comes to data governance
  • Multiple IT environments for development, QA, staging, and production

To handle this common audit bottleneck, we have recently released Kogni Database Discoverer. Kogni Database Discoverer scans various IT environments to compile a comprehensive list of all active databases. This simple step alone can jump-start the audit process.
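As a rough illustration of what compiling a list of active databases can involve, the Python sketch below probes a set of hosts on well-known default database ports. This is an assumption about one possible technique, not a description of how Kogni Database Discoverer is implemented. The names DEFAULT_PORTS, tcp_port_open, and discover_databases are hypothetical, and an open port only suggests, rather than proves, which engine is listening.

```python
import socket

# Well-known default ports. Real deployments often use non-default ports,
# so a port match is only a hint about the engine.
DEFAULT_PORTS = {
    1521: "Oracle",
    3306: "MySQL",
    5432: "PostgreSQL",
    6379: "Redis",
    9042: "Cassandra",
    27017: "MongoDB",
}


def tcp_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def discover_databases(hosts, ports=DEFAULT_PORTS, probe=tcp_port_open):
    """Build an inventory of (host, port, likely engine) for every open port."""
    inventory = []
    for host in hosts:
        for port, engine in sorted(ports.items()):
            if probe(host, port):
                inventory.append(
                    {"host": host, "port": port, "likely_engine": engine}
                )
    return inventory
```

Running discover_databases over the host lists for development, QA, staging, and production yields a starting inventory that can replace a stale spreadsheet. The probe parameter is injectable so that the scan logic can be tested, or swapped for a cloud provider's inventory API, without touching the network. Only scan networks you are authorized to audit.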

Summing Up

In today’s post-RDBMS world, an ever-growing amount of data is in unstructured text and image formats. At the same time, NoSQL databases, cloud migration, and the emerging trend of multi-cloud have made manual audits and traditional RDBMS-based audit tools non-starters.

We also looked at how Kogni, with its Database Discoverer and support for unstructured text/image data, hits the sweet spot for today’s heterogeneous and high-entropy data world.

Want to see Kogni in action? Request Demo