Why an Automated Sensitive Data Catalog?
A common misconception is that IT teams can manually compile a list of sensitive data locations. However, manual sensitive data discovery is impractical for the following reasons:
- In traditional relational databases such as Oracle, sensitive data can hide in opaque VARCHAR, CLOB, or BLOB columns.
- In NoSQL databases such as MongoDB, the lack of a predefined schema makes it hard to assess manually what sensitive data might be stored in the database.
- Application log files sometimes contain sensitive data. This is usually the result of a developer mistake, but it does happen.
- Traditional data warehouses, cloud data stores, and Hadoop often result in the same sensitive data being replicated across many data stores (Vertica, Amazon S3, Hadoop, etc.).
- Sensitive data, like other data, is often archived rather than deleted. Each archive is one more location where sensitive data is stored.
- Scanned documents may also contain sensitive data.
The sheer number of data stores potentially containing sensitive data makes manual discovery impractical. Clearly, automation is needed.
Add to this the fact that, in today’s dynamic IT environment, sensitive data locations keep changing over time, and automated sensitive data discovery becomes almost a must-have.
The Kogni Discovery Engine is the ideal solution for this problem. It scans an enterprise’s data stores and automatically builds a sensitive data catalog. The sensitive data catalog can be explored in Kogni’s intuitive, interactive dashboard, and it can also be accessed through an API to build higher-level functionality such as data-erasure logic.
Kogni’s Interactive Sensitive Data Dashboard
Highlights of the Kogni Discovery Engine:
- Discovers sensitive data stored in text and images
- Inspects Hadoop, S3, NoSQL, and RDBMS
- Purpose-built classifiers for sensitive data like credit card numbers, SSNs, emails, phone numbers, and more
- User-defined classifiers to identify sensitive data types unique to your enterprise
- Dashboard view of sensitive data across enterprise data sources
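To make the idea of classifiers and a catalog concrete, here is a minimal sketch of pattern-based sensitive data discovery. It is an illustration only and not Kogni's implementation: the names `CLASSIFIERS`, `classify`, `build_catalog`, and `SAMPLE_STORES`, the regular expressions, and the sample data are all assumptions made for this example. It pairs simple regex patterns with an optional validator (a Luhn checksum for card numbers) and records which labels were found at each location.

```python
import re
from collections import defaultdict


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical classifiers: label -> (pattern, optional validator).
# Real-world patterns would need to be far more thorough than these.
CLASSIFIERS = {
    "EMAIL": (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), None),
    "SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), None),
    "CREDIT_CARD": (re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), luhn_valid),
}


def classify(text, classifiers=CLASSIFIERS):
    """Return the set of labels whose pattern (and validator) match `text`."""
    found = set()
    for label, (pattern, validator) in classifiers.items():
        for match in pattern.finditer(text):
            if validator is None or validator(match.group()):
                found.add(label)
                break  # one confirmed hit per label is enough
    return found


def build_catalog(stores, classifiers=CLASSIFIERS):
    """Map each location to the sensitive labels found in its values."""
    catalog = defaultdict(set)
    for location, values in stores.items():
        for value in values:
            catalog[location] |= classify(value, classifiers)
    # Keep only locations where something sensitive was found.
    return {loc: labels for loc, labels in catalog.items() if labels}


# Made-up sample data standing in for a database column, a log file, and an object store.
SAMPLE_STORES = {
    "oracle.customers.notes": ["Call jane.doe@example.com re: card 4111 1111 1111 1111"],
    "app.log": ["user login ok", "ssn=123-45-6789 submitted"],
    "s3://archive/readme.txt": ["nothing sensitive here"],
}

if __name__ == "__main__":
    for location, labels in build_catalog(SAMPLE_STORES).items():
        print(location, sorted(labels))
```

A user-defined classifier fits the same shape: copy `CLASSIFIERS`, add an entry such as `"EMPLOYEE_ID": (re.compile(r"\bEMP-\d{6}\b"), None)` (a made-up format), and pass the extended dictionary to `build_catalog`.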