Toxic combinations of data can re-identify or reveal new insights about an individual.
Many organizations mask the identities of customers, consumers, or patients for analytics projects, but combinations of data elements can still re-identify an individual. Data elements that are each de-identified on their own, yet identify an individual when combined, are called toxic combinations. The risk is common in data lakes, which take in a diverse mix of data sources and data types, including structured, semi-structured, and unstructured data. Data in motion can also be a blind spot, since most organizations don’t know what data enters and leaves them every day.
Toxic combinations are usually unintentional, and the re-identification they enable is unauthorized. An example is a dataset that provides the date of birth, ZIP code, and gender of each individual. Latanya Sweeney's widely cited analysis of 1990 US census data estimated that about 87% of the US population could be uniquely identified from these three fields alone. The rates may be lower for de-identified health or legal data, but organizations must still exercise due diligence and due care to protect the privacy of individuals whose data is used for analytics.
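One way to gauge this risk in a dataset is to count how many records have a combination of values that no other record shares. The sketch below is a minimal illustration, not a complete privacy assessment: the function name `unique_fraction`, the field names, and the sample records are all hypothetical and made up for the example.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier values are unique."""
    if not records:
        return 0.0
    # Build one tuple of values per record for the chosen fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is at risk when no other record shares its combination.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical, made-up records with no names attached.
records = [
    {"dob": "1980-01-15", "zip": "02139", "gender": "F"},
    {"dob": "1980-01-15", "zip": "02139", "gender": "M"},
    {"dob": "1975-06-30", "zip": "02139", "gender": "F"},
    {"dob": "1975-06-30", "zip": "02139", "gender": "F"},
]

print(unique_fraction(records, ["gender"]))                # 0.25
print(unique_fraction(records, ["dob", "zip", "gender"]))  # 0.5
```

In this toy data, gender alone singles out one record in four, while the three fields together single out two in four. Each added field can only keep the unique share the same or raise it, which is why fields that look harmless in isolation become risky in combination.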
What are some examples of toxic combinations of personal data?
- Age, sex, and ZIP code can re-identify a person.
- Location can reveal travel patterns and personal interests.
- Diagnostic codes, age, sex, and ZIP code can reveal a person’s health status.
- Browsing activity and social activity can reveal political and religious beliefs.
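The first and third examples typically work through linkage: a de-identified dataset is joined to a second dataset that carries names, using the fields the two share. The sketch below shows the idea under simple assumptions. The function `link_records`, the field names, and every record are hypothetical and invented for illustration, and only one-to-one matches are treated as re-identifications.

```python
from collections import defaultdict

def link_records(deidentified, identified, keys):
    """Pair de-identified rows with named rows when the match is one-to-one."""
    # Index the named dataset by the shared fields.
    index = defaultdict(list)
    for person in identified:
        index[tuple(person[k] for k in keys)].append(person)

    # Group the de-identified rows by the same fields.
    groups = defaultdict(list)
    for row in deidentified:
        groups[tuple(row[k] for k in keys)].append(row)

    matches = []
    for key, rows in groups.items():
        candidates = index.get(key, [])
        # Only an unambiguous match on both sides counts as re-identification.
        if len(rows) == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], rows[0]))
    return matches

# Hypothetical de-identified health records (no names).
health = [
    {"age": 34, "sex": "F", "zip": "60614", "diagnosis_code": "E11"},
    {"age": 52, "sex": "M", "zip": "60614", "diagnosis_code": "I10"},
]

# Hypothetical public list that includes names.
public = [
    {"name": "Alice Example", "age": 34, "sex": "F", "zip": "60614"},
    {"name": "Bob Example", "age": 52, "sex": "M", "zip": "60614"},
    {"name": "Carl Example", "age": 52, "sex": "M", "zip": "60614"},
]

for name, row in link_records(health, public, ["age", "sex", "zip"]):
    print(name, row["diagnosis_code"])  # Alice Example E11
```

Only the first health record is linked to a name, because two people in the public list share the second record's age, sex, and ZIP code. That ambiguity is what generalizing or suppressing fields aims to create for every record.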