Pseudonymization’s Role in Data Privacy Protection and Analytics

Pseudonymized data cloaks the identity of the real person. Much like anonymization, pseudonymization alters identifying characteristics such as the name, identity numbers, and other attributes of a person so that a data record cannot be traced back to the person it describes.

What is the difference between anonymization and pseudonymization?

The key difference between anonymization and pseudonymization is that pseudonymization preserves a way for the data record to be re-identified. Secret keys (for example, the key to a keyed hash, or a lookup table held apart from the data) can be used to point back to the original data when it needs to be re-identified.
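A minimal sketch of this idea in Python, assuming an HMAC-SHA256 keyed hash as the pseudonym and a lookup table that the data controller keeps apart from the shared data set. The function names, the key, and the sample identifier are hypothetical and only illustrate the mechanism. A keyed hash is one-way, so re-identification in this sketch goes through the controller's table rather than by reversing the hash.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be randomly generated
# and stored securely by the data controller, never alongside the data.
SECRET_KEY = b"example-key-held-only-by-the-data-controller"

# Lookup table (pseudonym -> original identifier), kept separately from
# the pseudonymized data set that is handed to analysts.
reidentification_table = {}


def make_pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(identifier: str) -> str:
    """Return the pseudonym and record the mapping for later re-identification."""
    pseudonym = make_pseudonym(identifier, SECRET_KEY)
    reidentification_table[pseudonym] = identifier
    return pseudonym


def reidentify(pseudonym: str) -> str:
    """Look up the original identifier; only the controller can do this."""
    return reidentification_table[pseudonym]


token = pseudonymize("patient-0001")
original = reidentify(token)
```

Because the same identifier and key always produce the same pseudonym, records belonging to one person can still be linked for analysis without exposing who that person is.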

An example would be a medical researcher conducting analytics on a pseudonymized data set. Suppose one of the data records indicates a likelihood of cancer. The researcher can report the result to the data controller, who can then use the secret key related to that record to identify the individual, so that the person can be notified and proper action can be taken for medical care.

GDPR and Pseudonymization

GDPR refers to pseudonymization six times in the legislation, and defines pseudonymization as: “‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person”.

How do organizations pseudonymize data?

  1. Custom scripts: programmers write scripts that modify the fields containing personal or sensitive data and create an encrypted index for re-identification.
  2. Packaged software: numerous vendors provide pseudonymization capabilities through their data masking, encryption, or data loss prevention (DLP) software.
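The custom-script approach might look roughly like the following Python sketch. It assumes records arrive as dictionaries and that the fields named in `IDENTIFYING_FIELDS` are the direct identifiers; both the field names and the function name are hypothetical. Each record gets a random token, and the original values go into a separate index. Encrypting that index at rest, as the list above describes, is left out here and would need a vetted encryption library.

```python
import secrets

# Hypothetical field names treated as direct identifiers in this sketch.
IDENTIFYING_FIELDS = ("name", "national_id")


def pseudonymize_records(records, fields=IDENTIFYING_FIELDS):
    """Split records into a shareable data set and a re-identification index.

    Returns (pseudonymized, index), where index maps each random token to
    the identifying values removed from the corresponding record.
    """
    pseudonymized = []
    index = {}
    for record in records:
        token = secrets.token_hex(16)  # random token, unrelated to the data
        # Store the removed identifiers under the token.
        index[token] = {f: record[f] for f in fields if f in record}
        # Keep only the non-identifying fields, plus the token.
        cleaned = {k: v for k, v in record.items() if k not in fields}
        cleaned["pseudonym"] = token
        pseudonymized.append(cleaned)
    return pseudonymized, index


# Example usage with made-up data.
records = [{"name": "Jane Doe", "national_id": "X123", "diagnosis": "flagged"}]
shared, index = pseudonymize_records(records)
```

Only `shared` would be passed to analysts. The `index` stays with the data controller, stored separately and under its own access controls, which mirrors the GDPR requirement that the additional information be kept apart from the data.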