What is Anonymisation?
Anonymisation is a standard of de-identification that currently remains undefined; it is to be specified by the Data Protection Authority of India (to be established under the PDP Bill once it is enacted). De-identification is a process by which identifiers that help in attributing data to an individual are removed, so that the data is delinked from the individual.
The PDP Bill defines anonymisation as the “irreversible process of transforming or converting personal data to a form in which a data principal cannot be identified, which meets the standards of irreversibility specified by the Authority.” Even though the PDP Bill is yet to be enacted, the characterisation of the process as irreversible indicates that the standard must be fairly high. That said, there have been studies which suggest that personal data can never be truly and irreversibly anonymised.
To better understand the process of de-identification, let us consider one of the techniques mentioned in the Gopalakrishnan Committee Report: k-anonymity. K-anonymity helps prevent attempts to link data to a particular person by generalising existing attributes, so that each record shares its identifying attributes with at least k-1 other records.
Let us assume that a digital contact tracing app collects some personal information at the time of registration. This could include identifiers such as date of birth, name, city, health condition and gender, as represented in table 1 below:
Table 1
Date of Birth | Name | City | COVID Status | Gender
01.01.1967 | Alisha | Mumbai | COVID-19 Positive | Female
04.04.1976 | Ankit | New Delhi | COVID-19 Negative | Male
Table 2 generalises and de-identifies the information collected by the app, as represented earlier in table 1, to illustrate the process of k-anonymity. Comparing the two tables closely, the names of the individuals and their exact dates of birth have been omitted to attain some degree of generalisation. Only their year of birth, city, gender and COVID status are accessible now:
Table 2
Date of Birth | Name | City | COVID Status | Gender
XX.XX.1967 | Patient 1 | Mumbai | COVID-19 Positive | Female
XX.XX.1976 | Patient 2 | New Delhi | COVID-19 Negative | Male
To some (albeit limited) extent, therefore, the information in table 1 has been de-identified in table 2. Does this mean that the data (as represented in table 2) has really been anonymised?
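One way to probe this question is to reproduce the generalisation step in code and then count how many records share each combination of the remaining attributes. The following is only a minimal sketch: the field names, the helper functions `generalise` and `smallest_group_size`, and the choice of date of birth, city and gender as quasi-identifiers are assumptions made for illustration, not part of the PDP Bill or the Gopalakrishnan Committee Report.

```python
from collections import Counter

# Records copied from table 1 (field names are illustrative).
table_1 = [
    {"date_of_birth": "01.01.1967", "name": "Alisha", "city": "Mumbai",
     "covid_status": "COVID-19 Positive", "gender": "Female"},
    {"date_of_birth": "04.04.1976", "name": "Ankit", "city": "New Delhi",
     "covid_status": "COVID-19 Negative", "gender": "Male"},
]

# Assumption: these attributes are treated as quasi-identifiers, i.e.
# attributes that could be combined to single out a person.
QUASI_IDENTIFIERS = ("date_of_birth", "city", "gender")


def generalise(records):
    """Mask the day and month of birth and replace names with placeholders."""
    result = []
    for index, record in enumerate(records, start=1):
        year = record["date_of_birth"].split(".")[-1]
        result.append({
            **record,
            "date_of_birth": f"XX.XX.{year}",
            "name": f"Patient {index}",
        })
    return result


def smallest_group_size(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group of records that share
    the same values for every quasi-identifier."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())


table_2 = generalise(table_1)
k = smallest_group_size(table_2)

for row in table_2:
    print(row)
print("k =", k)
```

With only the two records shown, each combination of year of birth, city and gender occurs exactly once, so the smallest group has a single member. Under the usual reading of k-anonymity, every record would need to share its quasi-identifier values with at least k-1 others, which this toy dataset does not achieve for any k greater than 1.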