April 20, 2020

Data Discovery- A Foundation for Your Data Security Strategy

Remember the long-forgotten times when employees would type up information or write it out by hand to create hard copies of files? These copies would then be photocopied several times and circulated among different teams. Because the volume of data was much smaller back then, information was easy to trace and control.

The current scenario paints a strikingly different picture. Let us assume Pam is an employee at your enterprise. Pam creates a (digital) draft and mails it to a few people for their initial review. The draft comes back with their comments, which she incorporates. She sends the revised draft to her boss, who makes further revisions. The process ends when Pam stores the final draft on the local server or SharePoint and sends it to the intended audience.

Now, let’s retrace our steps and focus on the employees who reviewed the draft(s), to understand how complicated information sharing has become. They could have downloaded Pam’s copy, made changes to it and sent it to their colleagues for further input, or they could have simply sent it back to Pam with their comments. They could also have saved it to the cloud or downloaded it onto their mobile devices to read at home later. There are now multiple copies and versions of Pam’s initial draft, held by several of her co-workers and stored in various locations.

With data storage options expanding more than ever and the cost of storage decreasing, more and more people are creating and saving their data in multiple locations.

Circling back to Pam, what if her draft contained sensitive information? What if it had Personally Identifiable Information (PII), health data or Intellectual Property? You wouldn’t want to rely on guesswork to confirm whether any such data was present in it. And if it was, you would certainly not want it to leave your enterprise’s premises, where it could be misused or stolen.

Many enterprises have little understanding of what data they hold, how much of it there is, where it resides and who has access to it. This is where an efficient Data Discovery solution comes in.

Data Discovery explained

Data Discovery is the process of identifying data across your organization’s data landscape. Within the context of Sensitive Data Security, Sensitive Data Discovery is generally accomplished by software that scans multiple or all data sources across your enterprise. Such software can be part of a comprehensive Sensitive Data Loss Prevention, monitoring and privacy compliance solution like Kogni.

An effective Data Discovery Solution gives a complete inventory of your enterprise’s data landscape. An enterprise can achieve this in three steps-

-The process begins with the enterprise defining what it considers sensitive data. Data can be classified as sensitive:

  -by evaluating the enterprise’s business needs

  -by evaluating the methods used by the enterprise to gather and manage said data

  -by recognizing data that falls under government or industry data regulations

  -by deciding whether the data, if stolen or misused, could damage the enterprise’s reputation and/or cause financial losses

-Once “sensitive data” has been appropriately defined, the enterprise should list all the locations where sensitive data may reside across its data landscape. The list could include email, BYOD devices, local servers and the cloud, among many other locations.

-The final step is to decide what needs to be done with the sensitive data. This is where most basic data discovery tools fall short. Most tools can pinpoint sensitive data and its location in your enterprise, and may even give you the option to erase it if need be.

A comprehensive Sensitive Data Security software such as Kogni, on the other hand, can Discover sensitive data, Classify it, Secure it, continuously Monitor for anomalies and policy violations, and accelerate Compliance with ever-evolving data privacy regulations.
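To make the discovery steps above more concrete, here is a minimal Python sketch. It is not how Kogni or any particular product works: the two regular expressions, the `discover` function and the sample locations are all hypothetical, and real tools use far richer detection than simple pattern matching. It only shows the idea of defining what counts as sensitive, listing the locations to scan, and producing an inventory.

```python
import re

# Step 1: define what counts as "sensitive" (illustrative patterns only).
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def discover(sources):
    """Return {location: {data_type: count}} for locations holding sensitive data."""
    inventory = {}
    for location, text in sources.items():
        found = {}
        for name, pattern in PATTERNS.items():
            count = len(pattern.findall(text))
            if count:
                found[name] = count
        if found:
            inventory[location] = found
    return inventory

# Step 2: list the locations to scan (made-up content standing in for real stores).
sources = {
    "email/pam_draft_v1.txt": "Contact jim@example.com, SSN 123-45-6789.",
    "sharepoint/final_draft.txt": "Reviewed by jim@example.com and dwight@example.com.",
    "cloud/notes.txt": "Nothing sensitive here.",
}

# Step 3: the inventory is what you would act on (secure, monitor, erase).
inventory = discover(sources)
for location, found in inventory.items():
    print(location, found)
```

Locations with no matches are left out of the inventory, so the result lists only the places that need attention.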

What is Data Classification?

Data Classification enables your enterprise to label the identified sensitive data with the right degree of sensitivity.

Data classification empowers enterprises to-

  • leverage the power of sensitive data better
  • protect sensitive information in a more efficient manner
  • better adhere to data regulations
  • easily search, track and retrieve sensitive information

Data can be classified under four labels, based on its degree of sensitivity-

  • Restricted- Data that is highly sensitive in nature falls under this category. Exposure of this data could lead to legal consequences, reputational or financial damage, and loss of credibility.

Social Security numbers, unpublished research data, medical certificates, criminal record certificates, credit card numbers, CVVs, financial information, health data, employee records, etc. come under this category.

  • Confidential- This type of data is still sensitive, and access to it is limited to a specific team or set of employees within the organization.

Data such as identity card numbers, addresses, bank information, biometric data, etc. falls under this category.

  • Internal- Any non-sensitive enterprise data that is meant to be used within the organization.

Internal web pages, user guides, manuals, company policies, etc. are some examples.

  • Public- Any data that is approved for public viewing and access falls under this category. Vacancy listings, calendars, company news, newsletters, etc. are some examples of public data.
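The four labels can be sketched as an ordered scale, where a document takes the most sensitive label among the data types found in it. This is an illustrative Python sketch, not a prescribed policy: the `Sensitivity` enum, the `LABEL_BY_DATA_TYPE` mapping and the `classify` function are hypothetical names, the mapping simply reuses a few of the examples above, and treating unknown data types as Internal is an assumption.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Illustrative mapping based on the examples above; a real policy is enterprise-specific.
LABEL_BY_DATA_TYPE = {
    "social_security_number": Sensitivity.RESTRICTED,
    "credit_card_number": Sensitivity.RESTRICTED,
    "health_data": Sensitivity.RESTRICTED,
    "address": Sensitivity.CONFIDENTIAL,
    "biometric_data": Sensitivity.CONFIDENTIAL,
    "company_policy": Sensitivity.INTERNAL,
    "vacancy_listing": Sensitivity.PUBLIC,
}

def classify(data_types, default=Sensitivity.INTERNAL):
    """Label a document with the most sensitive label among its data types.

    Unknown data types (and empty input) fall back to `default`, an assumption
    made here so that unrecognized data is never treated as public.
    """
    labels = [LABEL_BY_DATA_TYPE.get(t, default) for t in data_types]
    return max(labels, default=default)

print(classify(["address", "credit_card_number"]).name)
print(classify(["vacancy_listing"]).name)
```

Using an ordered enum keeps the "highest label wins" rule to a single `max` call, and adding a custom tier only means adding one more member.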

Why are Data Discovery and Classification vital to your enterprise?

While employees are given access to a sensitive database only to do their jobs, a few tend to abuse the privilege to their advantage. In 2018, Facebook reportedly fired a security engineer accused of exploiting his data access to stalk women online.

Insider cybersecurity threats have been steadily increasing in the past few years. They originate from human error, negligence, or employees who abuse the access they already have to sensitive data. While perimeter security is necessary, it loses its purpose when your enterprise’s biggest threats originate from within.

Effective Data Discovery and Classification tools are necessary to combat data-centric threats for the following reasons-

  • Data Discovery and Classification is the foundation for a comprehensive data-centric approach to security. An effective Data Discovery Solution enables your enterprise to discover sensitive data anywhere across your data landscape and lets you identify which databases and/or data sources contain the largest volume of sensitive data.

An efficient Data Classification system lets you classify your sensitive information according to predefined or custom-made tiers of sensitivity. These processes enable you to determine who has, or can be given, access to which sensitive database.

A comprehensive enterprise Data Discovery and Classification Solution, backed by a carefully architected AI-centric model, will also study access patterns and alert you should a deviation occur.

This exercise lets you proactively identify insider data threats and stop them from turning into full-blown data breaches.

  • Best-of-breed Data Discovery and Classification Solutions can help manage and accelerate your enterprise’s compliance with data privacy regulations. Regardless of its size, your enterprise may be subject to such regulations if it handles or processes your customers’ sensitive data.

Ever-evolving data privacy regulations such as GDPR, CCPA and HIPAA require that your enterprise be aware of the volume, location, and degree of sensitivity of your sensitive data. The regulations also demand that you know who has access to which sensitive database, how that access affects the data, and more.

Without a data discovery solution pointing out the whereabouts of your sensitive data and how much of it resides in each database, you may not be able to comply with mandatory privacy regulations. Compliance may also not be achievable without a data classifier informing you which data falls under the “highly sensitive data” category.

  • Marriott disclosed its second major data breach in two years in March 2020. The incident may have exposed the personal information of up to approximately 5.2 million guests.

Marriott took more than a month to identify the breach, after which it investigated and remediated the attack and notified its affected customers.

The data that was exposed included

-mailing addresses

-email addresses

-phone numbers

-company names

-gender identity

among others.

Though the attack posed huge challenges to Marriott, the hospitality giant was able to identify the set of information that was not exposed to hackers. It reported that information such as payment card and passport details and driver’s license-related information was not compromised as part of the breach.

An effective Data Classification Solution would let you make the same distinction in the unfortunate event of a breach. Wouldn’t you be glad to find out that the data compromised during a breach was not sensitive enough to have a disastrous effect on your enterprise?
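As a rough illustration of that last point, the Python sketch below shows how a classification catalog could be used after a breach to separate what was exposed from what was not. Everything in it is hypothetical: the `CATALOG` contents, the label assigned to each field and the `breach_report` function are assumptions made for the example, not Marriott’s or Kogni’s actual data or method.

```python
# Hypothetical catalog built during classification: field name -> label.
# The label assignments are illustrative assumptions only.
CATALOG = {
    "mailing_address": "Confidential",
    "email_address": "Confidential",
    "phone_number": "Confidential",
    "company_name": "Internal",
    "payment_card": "Restricted",
    "passport_number": "Restricted",
}

def breach_report(exposed_fields, catalog):
    """Group exposed fields by label and list the Restricted fields left untouched."""
    exposed = set(exposed_fields)
    by_label = {}
    for field in sorted(exposed):
        label = catalog.get(field, "Unclassified")
        by_label.setdefault(label, []).append(field)
    safe_restricted = sorted(
        field for field, label in catalog.items()
        if label == "Restricted" and field not in exposed
    )
    return by_label, safe_restricted

by_label, safe_restricted = breach_report(
    ["mailing_address", "email_address", "phone_number", "company_name"],
    CATALOG,
)
print("Exposed, by label:", by_label)
print("Restricted fields not exposed:", safe_restricted)
```

With a catalog like this already in place, the question "was anything Restricted exposed?" becomes a lookup rather than an investigation.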

What should you look for in an effective Data Discovery platform?

1. It should make data classification a significant part of the discovery process

As discussed above, Data Classification is as important as Data Discovery. Data Discovery, when coupled with Classification, offers a holistic data security process that can keep unauthorized access and breaches at bay.

2. It should identify your sensitive information from ANY data source

A comprehensive Data Discovery Solution must have a 360-degree data identification range. This enables the solution to identify sensitive data from anywhere and any data source across your data landscape.

Our Sensitive Data Discovery Software explores different repositories including cloud, on-premise, and third-party controlled storage centers for unknown, sensitive and critical information.

Our predefined data sources include Amazon S3 buckets, Amazon Redshift, Oracle, Sybase, SQL Server, Informix, PostgreSQL, Office 365, MySQL, MongoDB, Google Drive, and many more.

3. It should offer a wide range of data classification types

When looking for a data security solution to suit your enterprise’s unique needs, consider a solution that offers data classification with predefined and user-defined classifiers.

Our carefully architected classification algorithm uses proprietary context-driven AI and Machine Learning to improve the accuracy of data discovery. Kogni’s pre-defined data types include, but are not limited to, Credit card, Date of Birth, IBAN, Zip Code, City, URL, Age, Phone, Country, Swift Code, Date, and over 128 other unique sensitive data patterns.
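To show what "predefined and user-defined classifiers" can mean in practice, here is a small Python sketch. It is not Kogni’s algorithm, which is described above as proprietary and context-driven. It swaps in plain regular expressions plus a Luhn checksum to reduce false positives on card numbers. The `CLASSIFIERS` registry, the `register` and `find_sensitive` functions, and the `EMP-` badge format are hypothetical.

```python
import re

def luhn_valid(candidate):
    """Luhn checksum, used to filter out digit runs that are not card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

# Predefined classifiers: name -> (pattern, optional validator).
CLASSIFIERS = {
    "credit_card": (re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), luhn_valid),
    "zip_code": (re.compile(r"\b\d{5}(?:-\d{4})?\b"), None),
}

def register(name, pattern, validator=None):
    """Add a user-defined classifier alongside the predefined ones."""
    CLASSIFIERS[name] = (re.compile(pattern), validator)

def find_sensitive(text):
    hits = []
    for name, (pattern, validator) in CLASSIFIERS.items():
        for match in pattern.finditer(text):
            if validator is None or validator(match.group()):
                hits.append((name, match.group()))
    return hits

# A user-defined data type, e.g. an internal badge number (made-up format).
register("employee_id", r"\bEMP-\d{6}\b")

hits = find_sensitive("Card 4111 1111 1111 1111, ZIP 85001, badge EMP-004217.")
print(hits)
```

The validator hook is the design choice worth noting: a pattern finds candidates, and an optional check such as the Luhn checksum decides whether a candidate is worth reporting.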

4. It should identify sensitive information within ANY data format

Is your data landscape a mix of structured, unstructured and semi-structured data? Kogni leaves no stone unturned as it can scan any of these data formats to identify sensitive data.
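The idea of scanning different data formats can be sketched by reducing each format to a stream of text values and running the same detector over all of them. The Python example below is only an illustration under that assumption: the helper names (`values_from_csv`, `values_from_json`, `find_emails`) and the sample data are hypothetical, and a single email pattern stands in for a full set of classifiers.

```python
import csv
import io
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def values_from_csv(text):
    """Structured data: yield every cell of every row."""
    for row in csv.reader(io.StringIO(text)):
        yield from row

def values_from_json(obj):
    """Semi-structured data: walk nested dicts and lists, yielding string values."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from values_from_json(value)
    elif isinstance(obj, list):
        for value in obj:
            yield from values_from_json(value)
    elif isinstance(obj, str):
        yield obj

def find_emails(values):
    """Run the same detector over any stream of text values."""
    return sorted({match for value in values for match in EMAIL.findall(value)})

csv_text = "name,email\nPam,pam@example.com\n"
json_text = '{"owner": {"contact": "jim@example.com"}, "tags": ["draft"]}'
free_text = "Send the final draft to dwight@example.com today."

found = {
    "structured": find_emails(values_from_csv(csv_text)),
    "semi-structured": find_emails(values_from_json(json.loads(json_text))),
    "unstructured": find_emails([free_text]),  # plain text is scanned as-is
}
print(found)
```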

5. It should generate an all-encompassing sensitive data landscape dashboard on demand

Kogni’s comprehensive sensitive data dashboard provides an overview of the distribution of sensitive information across your enterprise’s data sources and types.

Explore is a one-of-a-kind feature that allows you to inspect, at an individual level, the number and types of sensitive information that any data source may contain.

The dashboard also summarises the categories of sensitive information. Other special features include a notification window that alerts users to important messages.

6. It should effectively manage your enterprise’s data compliance with the continuously evolving regulations

Kogni enables you to accelerate compliance with international data regulations and industry standards with its leading Sensitive Data Discovery tool. It supports compliance with any chosen regulation by identifying and classifying sensitive data as per that regulation.

If you are a healthcare provider, we know that you must comply with HIPAA. We also know that your patients’ medical records fall under your ‘highly sensitive’ data bucket. Kogni discovers all your HIPAA-related sensitive data regardless of its location. It then classifies the data under preset groups created by Kogni or custom groups created to suit your entity’s unique needs. It makes identifying the data’s location simple at any point in time by adding tags to your data and mapping it across users, folders, and permissions. Whether your PHI/ePHI is in a database, a filesystem, a NoSQL store, a Big Data platform or anywhere else across your enterprise’s data landscape, Kogni helps you locate it in no time.

Minimize the risk of data leaks and the cost of non-compliance with data protection and privacy regulations like HIPAA, GDPR, CCPA, and the PDP Bill, amongst others.

Kogni’s uncomplicated and cost-effective Data Discovery and Classification solutions are easy to implement and are a comprehensive approach to your Data Security strategy. With Kogni, you can significantly reduce the time you spend on discovering your sensitive data. Leave it to Kogni to efficiently automate your Data Discovery, Classification, Security, and Compliance!