What Is a Data Classification Policy?
Your organization's databases contain data with different levels of confidentiality—some data is more sensitive than others. Data classification policies can assist in enterprise information management, ensuring that sensitive information is properly handled to mitigate potential threats.
Data classification policies help ensure that information is visible and accessible only to those who need it and are authorized to use it. They also take into account how collected data is used and structured within the organization, so that authorized personnel can obtain the right information at the right time.
Leveraging the AWS Cloud for Data Classification
Customers in tightly regulated industries, small and medium-sized businesses, and the public sector can all run secure workloads in the cloud while adhering to data classification regulations and policies. Many organizations are migrating workloads to the cloud and must ensure that data is classified so it can be properly managed and secured.
Cloud service providers, including AWS, offer a utility-based, standardized service that users provision themselves. Providers have no visibility into the kind of data customers process in the cloud, so they do not differentiate between, for example, personal information and other customer information when offering cloud services.
Customers are responsible for classifying their information and putting the correct controls in place within the cloud environment (for example, encryption). Nevertheless, customers can use the security practices that providers implement in their infrastructure, and the services they offer, to meet their most sensitive information requirements.
AWS Services Supporting Data Classification
AWS provides various features and services that can support an organization's data classification process.
For instance, Amazon Macie can help customers inventory and classify business-critical and sensitive data stored in AWS. Amazon Macie uses machine learning to automate tasks such as discovering data, classifying it, and applying protection rules to it. This gives customers insight into where sensitive data is stored and how it is accessed, including access patterns and authentications.
Additional AWS features and services that can support information classification include:
- AWS Identity and Access Management (IAM): for setting permissions, managing user credentials, authorizing access, and supporting several types of authentication.
- Amazon GuardDuty: threat detection that supports ongoing monitoring demands.
- AWS Glue: for cataloging data and identifying associated metadata, such as schema and table definitions, via the Data Catalog. Once cataloged, your data can be searched and is available for ETL.
- Amazon Neptune: a fully managed graph database that gives customers information about the connections between different sets of data. This can include traceability and identification of sensitive data through metadata analysis.
How to Implement a Classification Policy Using AWS Tools
Use the following process to implement your data classification policy using Amazon services.
1. Identify the Data Within Your AWS Workload
Your organization controls the types of data each cloud workload processes. Map your data and collect the following details:
- Role of the data in business processes
- Who owns the data
- Legal and industry compliance requirements affecting the data, including personally identifiable information (PII) and protected health information (PHI)
- Where the data is stored
- Whether the data should be publicly available, or is only for internal use
- Whether the data contains intellectual property or other sensitive business information
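The details listed above can be captured in a simple inventory record. The following is a minimal sketch, not an AWS API: the DataAsset class, the Sensitivity levels, and the rule that maps an asset to a level are all hypothetical and should be replaced with your organization's own classification scheme.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Sensitivity(Enum):
    """Hypothetical four-level scheme; substitute your own levels."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass
class DataAsset:
    """One inventory entry, with a field for each detail to collect."""
    name: str
    business_role: str          # role of the data in business processes
    owner: str                  # who owns the data
    location: str               # where the data is stored
    compliance: List[str] = field(default_factory=list)  # e.g. ["PII", "PHI"]
    public: bool = False        # publicly available or internal only
    contains_ip: bool = False   # intellectual property or sensitive business data

    def sensitivity(self) -> Sensitivity:
        # Illustrative rule only: regulated data outranks everything else.
        if self.compliance:
            return Sensitivity.RESTRICTED
        if self.contains_ip:
            return Sensitivity.CONFIDENTIAL
        if self.public:
            return Sensitivity.PUBLIC
        return Sensitivity.INTERNAL


records = DataAsset(
    name="patient-records",
    business_role="billing",
    owner="finance-team",
    location="s3://example-bucket/patients/",
    compliance=["PHI"],
)
print(records.name, records.sensitivity().name)
```

Keeping the inventory as structured records makes it easier to feed the same classification into tagging and access control later.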
2. Define Data Protection Controls
Resource tags let you classify AWS resources by sensitivity. You can then use IAM policies, AWS Key Management Service (KMS), and AWS CloudHSM (a hardware security module that generates and stores encryption keys) to implement data classification and protection strategies.
For example, if your project contains an S3 bucket with very sensitive data or an EC2 instance that handles sensitive data, you can mark it with a tag that is only meaningful to the project team. You can then use this tag internally as an attribute affecting access control. At the same time, unauthorized individuals or attackers will not know the tag indicates sensitive or valuable data.
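One way to turn such a tag into an access control attribute is an IAM policy that denies reads unless the caller carries a matching tag. The sketch below only builds the policy document as a Python dictionary. It is an assumption-laden illustration: it tags S3 objects rather than the bucket, because s3:ExistingObjectTag is the condition key that applies to object reads, and the bucket name, tag keys, and tag values are hypothetical.

```python
import json


def build_tag_guard_policy(bucket, object_tag_key, object_tag_value,
                           principal_tag_key, principal_tag_value):
    """Deny s3:GetObject on tagged objects to principals without the team tag.

    Both conditions in the statement must hold for the Deny to apply:
    the object carries the opaque tag AND the caller's tag does not match.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyTaggedObjectsOutsideTeam",
                "Effect": "Deny",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {
                        f"s3:ExistingObjectTag/{object_tag_key}": object_tag_value
                    },
                    "StringNotEquals": {
                        f"aws:PrincipalTag/{principal_tag_key}": principal_tag_value
                    },
                },
            }
        ],
    }


# Hypothetical values: "grp=k7" means nothing to anyone outside the team.
policy = build_tag_guard_policy("example-bucket", "grp", "k7", "team", "atlas")
print(json.dumps(policy, indent=2))
```

Because the statement is a Deny, it restricts access without granting any. A separate Allow statement is still needed for team members to read the objects.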
3. Define Data Lifecycle Management
Define a lifecycle strategy for the data, based on sensitivity levels, organizational processes, and legal requirements. Aspects such as data retention, data disposal, data access control, data transformation, and data sharing must be considered.
When choosing how to classify your data, balance usability against access restrictions. Ensure that at every level of sensitivity, there are security controls that are effective but still easy to use. Use defense-in-depth methods to reduce direct human access to data, and leverage mechanisms that can automatically transform, delete, or copy data at different stages of the life cycle.
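For data stored in Amazon S3, retention and disposal can be automated with lifecycle rules that filter on a classification tag. The sketch below builds a configuration in the shape accepted by the S3 PutBucketLifecycleConfiguration operation. The tag key, sensitivity levels, and day counts are hypothetical placeholders, not recommendations.

```python
from typing import Dict, Optional, Tuple


def lifecycle_configuration(
    tag_key: str,
    schedule: Dict[str, Tuple[Optional[int], int]],
) -> dict:
    """Build one lifecycle rule per sensitivity level.

    schedule maps a tag value to (archive_after_days, delete_after_days).
    Use None for archive_after_days to skip archiving.
    """
    rules = []
    for level, (archive_after, delete_after) in schedule.items():
        if archive_after is not None and archive_after >= delete_after:
            raise ValueError(f"{level}: archive must happen before deletion")
        rule = {
            "ID": f"lifecycle-{level}",
            "Filter": {"Tag": {"Key": tag_key, "Value": level}},
            "Status": "Enabled",
            "Expiration": {"Days": delete_after},
        }
        if archive_after is not None:
            rule["Transitions"] = [
                {"Days": archive_after, "StorageClass": "GLACIER"}
            ]
        rules.append(rule)
    return {"Rules": rules}


# Hypothetical schedule: archive internal data after 90 days and delete it
# after a year; delete restricted data after 30 days without archiving.
EXAMPLE_SCHEDULE = {"internal": (90, 365), "restricted": (None, 30)}
config = lifecycle_configuration("classification", EXAMPLE_SCHEDULE)
print(config)
```

With boto3 installed and credentials configured, the result could be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call.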
4. Automate Identification and Classification
Automated data identification and classification can help you implement the appropriate data access controls. Using automation instead of direct manual access can minimize risks associated with human error and exposure.
You can use a tool like Macie to assess data automatically. Macie uses machine learning to detect, classify, and protect sensitive data. It recognizes sensitive data such as PII and intellectual property, and provides visibility into how that data is accessed and moved through dashboards and alerts.
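As a rough illustration, a Macie scan of specific buckets is started by creating a classification job. The sketch below only assembles the request parameters for the Macie CreateClassificationJob operation and makes no AWS call. The account ID, bucket name, and job name are hypothetical, and the helper function is not part of any AWS SDK.

```python
import uuid
from typing import Iterable


def macie_job_request(account_id: str, buckets: Iterable[str], name: str) -> dict:
    """Assemble parameters for a one-time sensitive data discovery job."""
    return {
        # Idempotency token so a retried request does not create a duplicate job.
        "clientToken": str(uuid.uuid4()),
        "jobType": "ONE_TIME",
        "name": name,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }


# Hypothetical account and bucket.
request = macie_job_request("111122223333", ["example-bucket"], "pii-discovery")
print(request["name"], request["s3JobDefinition"])
```

Assuming boto3 is installed and Macie is enabled in the account, the request could be submitted with boto3.client("macie2").create_classification_job(**request).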
In this article, I explained the basics of data classification policies, and how to use Amazon tools to implement a classification policy for your cloud workloads. I reviewed the use of Amazon services including:
- Amazon Macie: provides automated data classification based on machine learning algorithms
- AWS Identity and Access Management (IAM): lets you specify permissions and authorization for data access depending on sensitivity level
- Amazon GuardDuty: lets you monitor sensitive datasets and identify threats
- AWS Glue: allows you to manage metadata to make data easily searchable, and perform ETL to automate data processing workflows
Finally, I explained how to implement data classification in four steps: mapping out and identifying your datasets, defining data protection controls, defining data lifecycle management, and automating identification and classification.
I hope this will be helpful as you build your automated data classification process in the Amazon cloud.
Author Bio: Gilad David Maayan
Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Imperva, Samsung NEXT, NetApp and Ixia, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership. Today he heads Agile SEO, the leading marketing agency in the technology industry.