Data Anonymization

Data Anonymization’s Critical Role in Protecting Data Privacy

Data anonymization uses various techniques to remove or obscure identity data and sensitive data in electronic records so that the records can no longer be tied to a specific individual. Identity data is referred to as personally identifiable information (PII) and includes name, email, phone number, address, or other information that relates to a specific person. Sensitive data typically relates to an individual’s finances, health, or political or religious beliefs. PII and sensitive data can be anonymized by removal, redaction, substitution, or randomization:

  1. Removal: PII or sensitive data fields are simply removed. This technique has limited use as many applications and reports will experience errors if data fields are missing.
  2. Redaction: Data is blurred or covered so that the original values can’t be viewed. This technique is useful for reports or application screens where PII and sensitive data are obfuscated for unauthorized users.
  3. Substitution: PII or sensitive data is substituted with similar but unrelated data. Data substitutions typically fall within specific ranges to provide realistic but anonymized data sets. This method is used frequently for anonymizing application test data.
  4. Randomization: PII or sensitive data is replaced with random values. This approach suits cases where the value of the randomized data does not affect the use of the remaining data in an electronic record. An example is a report that details purchases by zip code. Randomizing the customer identities would not impact the analysis, as the information needed is which products are being purchased in which local markets.
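
The four techniques above can be sketched on a single record. This is a minimal illustration, not a production masking tool: the field names, the sample record, the pool of substitute names, and the helper functions are all hypothetical and chosen only for the example.

```python
import random

# Hypothetical pool of realistic but unrelated substitute values.
FAKE_NAMES = ["Alex Morgan", "Sam Lee", "Jordan Diaz"]


def remove(record, fields):
    """Removal: drop the listed fields from the record entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def redact(value, visible=4):
    """Redaction: cover all but the last `visible` characters."""
    text = str(value)
    hidden = max(len(text) - visible, 0)
    return "*" * hidden + text[hidden:]


def substitute_name(rng):
    """Substitution: swap in a similar but unrelated value."""
    return rng.choice(FAKE_NAMES)


def randomize_digits(value, rng):
    """Randomization: replace every digit with a random digit."""
    return "".join(
        rng.choice("0123456789") if ch.isdigit() else ch for ch in str(value)
    )


def anonymize(record, rng):
    """Apply one technique per field; zip and product stay usable."""
    out = remove(record, {"email"})
    out["phone"] = redact(out["phone"])
    out["name"] = substitute_name(rng)
    out["customer_id"] = randomize_digits(out["customer_id"], rng)
    return out


record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-867-5309",
    "customer_id": "10482",
    "zip": "92694",
    "product": "garden hose",
}
anonymized = anonymize(record, random.Random(0))
print(anonymized)
```

The zip code and product are left untouched, so a purchases-by-zip-code report built from the anonymized record would still produce the same analysis.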

Data anonymization provides privacy control for business processes such as training, software development, or customer service by preventing the unauthorized viewing of personal information. An example would be application testers who need production data to test new applications or features thoroughly. By using anonymized production data, testers cannot see PII and sensitive information, but they can still test their new software with the best data sets available (production data).
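
One way to keep unauthorized viewers from seeing personal information is to mask fields according to the viewer’s role. The sketch below is only an illustration under assumed names: the roles, the set of sensitive fields, and the sample record are hypothetical, and a real system would take them from its own access-control configuration.

```python
# Hypothetical configuration: which fields are sensitive and which
# roles are allowed to see them unmasked.
SENSITIVE_FIELDS = {"email", "card_number"}
ROLES_WITH_FULL_ACCESS = {"finance"}


def mask(value):
    """Cover all but the last four characters of a value."""
    text = str(value)
    hidden = max(len(text) - 4, 0)
    return "*" * hidden + text[hidden:]


def view_for(role, record):
    """Return the record as the given role is allowed to see it."""
    if role in ROLES_WITH_FULL_ACCESS:
        return dict(record)
    return {
        key: mask(value) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


record = {
    "customer": "C-1001",
    "card_number": "4111111111111111",  # well-known test card number
    "balance": 250,
}
print(view_for("finance", record))
print(view_for("customer_service", record))
```

The stored record is never altered; only the view returned to each caller changes.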

What tools and techniques are used for data anonymization?

  1. Data masking software: Data masking allows the organization to mask data in two ways. First, dynamic masking will mask data based on a user’s role. For example, a staff member in finance may be able to see sensitive data while a customer service representative may not. Second, persistent data masking will mask copies of production data. These copies are used for software testing, analytics, and training.
  2. Format-preserving encryption (FPE): FPE uses encryption to produce replacement values that keep the format of the original data. For example, FPE might turn a zip code of 92694 into 33333; the person’s location is kept confidential, and reports or analytics continue to operate normally because the field still holds a validly formatted value rather than a null that would cause errors. Only authorized users have the ability to decrypt the anonymized fields.
  3. Data transformation solutions: Data transformation tools allow organizations to migrate and reformat data from one platform (such as Oracle) to a new target platform (such as Microsoft SQL Server). Data can be masked as part of the transformation process.
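
To make the format-preserving idea concrete, here is a toy sketch that maps a string of decimal digits to another string of the same length and maps it back with the same key. It is a small Feistel construction built on HMAC-SHA256 and is an assumption made for illustration only: it is not a vetted cipher, not the NIST FF1 algorithm, and not what any particular masking product does. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac


def _round_value(key, round_no, half, modulus):
    """Keyed pseudo-random number in [0, modulus) from HMAC-SHA256."""
    message = f"{round_no}:{half}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _half_lengths(digits, rounds):
    """Validate the input and return the digit counts of the two halves."""
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two decimal digits")
    if rounds <= 0 or rounds % 2:
        raise ValueError("rounds must be a positive even number")
    left = len(digits) // 2
    return left, len(digits) - left


def toy_fpe_encrypt(digits, key, rounds=8):
    """Encrypt a digit string into a digit string of the same length."""
    u, v = _half_lengths(digits, rounds)
    a, b = int(digits[:u]), int(digits[u:])
    for i in range(rounds):
        m = u if i % 2 == 0 else v  # digit count of the half being mixed
        a, b = b, (a + _round_value(key, i, b, 10 ** m)) % 10 ** m
    return f"{a:0{u}d}{b:0{v}d}"


def toy_fpe_decrypt(digits, key, rounds=8):
    """Undo toy_fpe_encrypt by running the rounds in reverse."""
    u, v = _half_lengths(digits, rounds)
    a, b = int(digits[:u]), int(digits[u:])
    for i in reversed(range(rounds)):
        m = u if i % 2 == 0 else v
        a, b = (b - _round_value(key, i, a, 10 ** m)) % 10 ** m, a
    return f"{a:0{u}d}{b:0{v}d}"


key = b"demo-key-not-for-production"  # hypothetical key
masked_zip = toy_fpe_encrypt("92694", key)
print(masked_zip)                        # still five digits
print(toy_fpe_decrypt(masked_zip, key))  # original value for key holders
```

Because the output is still a five-digit string, a report keyed on zip code keeps running, while only someone holding the key can recover the real value.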