The EU’s cybersecurity agency has called for further research into the use of pseudonymization to help bolster data protection measures in the healthcare sector.
Pseudonymization de-associates a data subject’s identity from their personal data by replacing personal identifiers with pseudonyms, or fictitious names.
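In code terms, the core idea can be sketched as a substitution plus a separately held lookup table (the function and field names below are illustrative, not from the ENISA report):

```python
import secrets

# Hypothetical sketch: replace a direct identifier with a random
# pseudonym, keeping the lookup table -- the "additional information" --
# separate from the pseudonymized records.
def pseudonymize(records, field, mapping=None):
    mapping = {} if mapping is None else mapping
    out = []
    for rec in records:
        value = rec[field]
        if value not in mapping:
            mapping[value] = secrets.token_hex(8)  # random pseudonym
        out.append({**rec, field: mapping[value]})
    return out, mapping

patients = [{"name": "Alice", "diagnosis": "flu"},
            {"name": "Bob", "diagnosis": "asthma"}]
pseudo, table = pseudonymize(patients, "name")
# `pseudo` no longer contains names; `table` re-links pseudonyms to
# identities and must be stored and protected separately.
```

The split matters: the pseudonymized records are only as protected as the mapping table that can reverse them.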
The European Union Agency for Cybersecurity (ENISA) has urged researchers, regulators, and application developers to play their part in improving pseudonymization techniques and best practices amid evolving medical technologies, a ballooning attack surface, and soaring numbers of cyber-attacks.
“This is not only relevant to the choice of the technique itself but also to the overall design of the pseudonymization process including, especially, the protection of the additional information,” says ENISA in a new report that considers healthcare use cases of pseudonymization techniques.
If attackers separately obtain this “additional information” they could potentially correlate breached, pseudonymized data with specific individuals, meaning pseudonymized data still falls under the ambit of the General Data Protection Regulation (GDPR).
But combined with other security controls, such as encryption, pseudonymization can provide significant reassurance to data subjects, says ENISA.
‘Detective work’
Andrew Patel, a researcher for WithSecure – formerly F-Secure Business – at its Artificial Intelligence Centre of Excellence, told The Daily Swig that attackers should not be able to infer a data subject’s identity if a dataset with pseudonymized fields is “stored separately from any data or methods that would allow those fields to be reversed back to their original data”.
But, he added: “This, of course, depends upon which fields from the original data that was pseudonymized, and whether other fields, which weren’t pseudonymized, but that may allow an attacker to infer or guess pseudonymized fields, are present.
“As an example, medical data would naturally omit or pseudonymize fields that can be used to infer who the patient is (name, social security number, address, telephone number, patient number, etc). However, if patient visit dates and locations are not anonymized or removed from data, it may be possible to figure out the patient’s identity using a bit of detective work”.
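The "detective work" Patel describes is essentially a join on quasi-identifiers. A minimal sketch, with made-up data, shows how cleartext visit dates and locations can link pseudonymized records back to a named auxiliary dataset:

```python
# Hypothetical illustration: an attacker joins pseudonymized records to
# an auxiliary dataset on quasi-identifiers (visit date and clinic)
# that were left in the clear.
pseudonymized = [
    {"pseudonym": "a1b2", "visit_date": "2023-03-14", "clinic": "Oak St"},
    {"pseudonym": "c3d4", "visit_date": "2023-03-15", "clinic": "Elm Ave"},
]
auxiliary = [  # e.g. leaked or scraped appointment check-in data
    {"name": "Alice", "visit_date": "2023-03-14", "clinic": "Oak St"},
]

matches = {
    aux["name"]: rec["pseudonym"]
    for rec in pseudonymized
    for aux in auxiliary
    if (rec["visit_date"], rec["clinic"]) == (aux["visit_date"], aux["clinic"])
}
# Alice is now linked to pseudonym "a1b2" despite the pseudonymization.
```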
‘Very complex process’
The most common methods for generating pseudonyms include counter, random number, encryption, hash function, and hash-based message authentication code (HMAC) techniques.
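Four of those five techniques can be demonstrated with the Python standard library alone (encryption-based pseudonyms are omitted here, as they would need a third-party cryptography package); the identifier value is invented for illustration:

```python
import hashlib
import hmac
import secrets
from itertools import count

identifier = "patient-12345"  # illustrative identifier

# Counter: sequential pseudonyms -- simple, but ordering is predictable.
counter = count(1)
counter_pseudo = f"P{next(counter):06d}"

# Random number: unlinkable without the stored mapping table.
random_pseudo = secrets.token_hex(8)

# Hash function: deterministic, but open to dictionary attacks when the
# identifier space is small or guessable.
hash_pseudo = hashlib.sha256(identifier.encode()).hexdigest()

# HMAC: keyed hash -- deterministic, yet not reproducible without the key.
key = secrets.token_bytes(32)
hmac_pseudo = hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()
```

Each trades off differently between the protection, utility, and scalability criteria ENISA mentions: counters scale trivially but leak ordering, plain hashes need no stored state but are guessable, and HMACs shift the secret from a mapping table to a key.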
“Different solutions might provide equally good results in specific scenarios, depending on the requirements in terms of protection, utility, scalability, etc,” said ENISA.
“Starting from a plain token, pseudonymization can be a ‘simple’ option to adopt, but it can also be comprised of a very complex process both at technical and at organisational levels.”
As such, ENISA has emphasized the importance of defining clear objectives before crafting pseudonymization policies based on variables such as regulations, speed, simplicity, predictability, and budgets.
ENISA recommends that clinical trials, which typically gather sensitive information such as age, gender, and home address, pseudonymize participants' main identifying data and use multiple pseudonyms for each item of identifying data across the various clinical parameters.
“Such an approach could limit the personal data related to each pseudonym that, paired with a robustness that can be enforced using a solid hashing function like SHA-2 with a random seed value […] would make re-identification even more difficult.”
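One way that scheme might look in practice is a per-parameter random salt (the "random seed value" in the report) combined with a SHA-2 hash, so the same participant carries a different, unlinkable pseudonym in each parameter's dataset. The function and parameter names here are illustrative:

```python
import hashlib
import secrets

def pseudonym_for(identifier, parameter, salts):
    # One random salt ("seed") per clinical parameter, so the same
    # participant receives a different pseudonym in each dataset.
    if parameter not in salts:
        salts[parameter] = secrets.token_bytes(16)
    data = salts[parameter] + identifier.encode()
    return hashlib.sha256(data).hexdigest()  # SHA-2 family

salts = {}  # must be protected like any re-identification key material
p_bp = pseudonym_for("participant-007", "blood_pressure", salts)
p_hr = pseudonym_for("participant-007", "heart_rate", salts)
# p_bp != p_hr: records for different parameters cannot be linked
# to one another without access to the salts.
```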
The report also considers use cases for exchanging patient data between departments and service providers, as well as when patients’ own health-monitoring devices pseudonymize data ready for transit to doctors.
WithSecure’s Patel warns that pseudonymization, however sophisticated the technique, cannot inoculate organizations against the impact of data breaches.
“If an attacker has reached the point that they’ve acquired pseudonymized data from an organization’s internal systems, it is likely they’ll also have access to other systems (and data), and thus may be able to reverse engineer the original fields regardless of safeguards put in place.”