
Ten Best Practices for Securing Sensitive Data in Hadoop


Written By
ITBE Staff
Apr 8, 2013

Dataguise, a leading innovator of data security intelligence and protection solutions, recently released 10 security best practices for organizations considering or implementing Hadoop. By following these procedures to manage privacy risk, data management and security, professionals can prevent costly exposure of sensitive data, reduce their risk profile and better adhere to compliance mandates. With Hadoop security deployments among the Fortune 200, Dataguise has developed these practices and procedures from significant experience in securing these large and diverse environments.

The explosion in information technology tools and capabilities has enabled advanced analytics using Big Data. However, the benefits of this new technology area are often coupled with data privacy issues. These large information repositories may contain personally identifiable information (PII) such as names, addresses and Social Security numbers. Financial data such as credit card and account numbers might also be found in large volumes across these environments and pose serious access-related concerns. Through careful planning, testing, pre-production preparation and the appropriate use of technology, many of these concerns can be alleviated.

The following 10 Hadoop security best practices provide valuable guidance throughout Hadoop project implementations, but are especially important in the early planning stages.


1. Determine the data privacy protection strategy during the planning phase of a deployment, preferably before moving any data into Hadoop. This prevents damaging compliance exposure for the company and avoids unpredictability in the rollout schedule.

2. Identify which data elements are defined as sensitive within your organization. Consider company privacy policies, pertinent industry regulations and governmental regulations.

3. Discover whether sensitive data is already embedded in the environment, is being assembled in Hadoop, or will be assembled there.
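As an illustration of this discovery step, the sketch below scans text records for two of the sensitive patterns the article mentions, Social Security numbers and credit card numbers, validating card candidates with the Luhn checksum. The patterns and function names here are illustrative only, not part of any Dataguise product; commercial discovery tools use far richer detection logic.

```python
import re

# Illustrative patterns; real discovery tools cover many more formats
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(number: str) -> bool:
    """Luhn checksum, used to weed out random digit runs."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_sensitive(record: str) -> dict:
    """Return candidate PII found in a single text record."""
    return {
        "ssn": SSN_RE.findall(record),
        "card": [c for c in CARD_RE.findall(record) if luhn_ok(c)],
    }
```

Running such a scan over a sample of each incoming data set, before it lands in Hadoop, is one way to turn "discover whether sensitive data is embedded" into a concrete, repeatable check.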

4. Determine the compliance exposure risk based on the information collected.

5. Determine whether business analytic needs require access to real data or if desensitized data can be used. Then, choose the right remediation technique (masking or encryption). If in doubt, remember that masking provides the most secure remediation, while encryption provides the most flexibility should future needs evolve.
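The trade-off between the two remediation techniques can be sketched in a few lines: masking is one-way (the original value is unrecoverable, which is why it is the more secure choice), while encryption is reversible for authorized users (which is why it is the more flexible choice). The key handling and the XOR "cipher" below are toys for illustration only; a real deployment would use a vetted cipher such as AES via an established library and centrally managed keys.

```python
import hashlib
import hmac

SECRET = b"example-key"  # illustrative; real keys come from a key manager

def mask(value: str) -> str:
    """One-way masking: the original value cannot be recovered."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:12]

def _keystream(length: int) -> bytes:
    """Derive a deterministic keystream from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(SECRET + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(value: str) -> bytes:
    """Toy reversible transform (XOR stream). NOT secure: the keystream is
    reused for every value. Shown only to contrast with one-way masking."""
    data = value.encode()
    return bytes(a ^ b for a, b in zip(data, _keystream(len(data))))

def decrypt(blob: bytes) -> str:
    """Authorized users with the key can recover the original value."""
    return bytes(a ^ b for a, b in zip(blob, _keystream(len(blob)))).decode()
```

If analysts only ever need desensitized data, masking removes the exposure permanently; if some downstream process may legitimately need the original value back, encryption keeps that door open at the cost of key management.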

6. Ensure the data protection solutions under consideration support both masking and encryption remediation techniques, especially if the goal is to keep both masked and unmasked versions of sensitive data in separate Hadoop directories.

7. Ensure the data protection technology used implements consistent masking across all data files (Joe becomes Dave in all files) to preserve the accuracy of data analysis across every data aggregation dimension.
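One common way to get the "Joe becomes Dave in all files" property is deterministic masking: a keyed hash of the original value selects the replacement, so the same input always yields the same alias regardless of which file, or which masking run, it appears in. The key and substitute-name list below are illustrative; note that with a short substitute list, distinct names can collide on the same alias.

```python
import hashlib
import hmac

MASK_KEY = b"masking-key"  # illustrative; manage real masking keys centrally
FAKE_NAMES = ["Dave", "Erin", "Frank", "Grace", "Heidi"]  # sample substitutes

def mask_name(name: str) -> str:
    """Keyed hash -> index, so 'Joe' maps to the same alias in every file."""
    digest = hmac.new(MASK_KEY, name.encode(), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]

# The same name masks identically no matter which file it appears in
file_a = ["Joe", "Alice", "Joe"]
file_b = ["Bob", "Joe"]
masked_a = [mask_name(n) for n in file_a]
masked_b = [mask_name(n) for n in file_b]
```

Because the mapping is a pure function of the value and the key, joins and aggregations across files still line up after masking, which is exactly the analytic-accuracy property this practice is about.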

8. Determine whether tailored protection for specific data sets is required, and consider dividing Hadoop directories into smaller groups where security can be managed as a unit.
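Managing directory groups as units amounts to keeping a policy table keyed by directory subtree. The zone names, tiers and fields below are hypothetical, sketched only to show the longest-prefix lookup such a scheme implies; in practice the policies would be enforced by the cluster's permission and encryption tooling, not by application code like this.

```python
# Hypothetical zone map: each Hadoop directory subtree is managed as a unit
ZONE_POLICY = {
    "/data/raw/pii": {"tier": "restricted", "encrypt": True},
    "/data/masked":  {"tier": "general",    "encrypt": False},
    "/data/public":  {"tier": "open",       "encrypt": False},
}

def policy_for(path: str) -> dict:
    """Return the policy of the longest matching zone prefix."""
    matches = [zone for zone in ZONE_POLICY if path.startswith(zone)]
    if not matches:
        raise LookupError(f"no zone covers {path}")
    return ZONE_POLICY[max(matches, key=len)]
```

Keeping the zones small and explicit makes it easy to answer "what protection applies to this file?" and to audit each group independently.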

9. Ensure the selected encryption solution interoperates with the company's access control technology and that both allow users with different credentials to have the appropriate, selective access to data in the Hadoop cluster.

10. Ensure that when encryption is required, the proper technology (Java, Pig, etc.) is deployed to allow for seamless decryption and expedited access to data.
