Hashing Isn’t a Magic Cloak: Why Data Remains Unmasked
Hashing has been a long-time favourite tool in the data analysis inventory for companies of all shapes and sizes. People have relied on its magical capabilities to convert any piece of personal data into an anonymous state with the simple press of a button. However, the reality is that hashed data does not equate to anonymized data.
What is Hashing?
Before we jump into why we shouldn’t assume anonymity when it comes to hashing, let’s first review what hashing is. Hashing involves converting data (like email addresses, phone numbers, or user IDs) into fixed-length strings of characters (hashes). To better understand hashing, we can look at how passwords are stored. When you sign up for a website or app, they don’t store your password in plain text (that would be like leaving your house key under the welcome mat). Instead, the company will hash your password*. So your chosen password “P@ssw0rd”, when hashed, becomes “161ebd7d45089b3446ee4e0d86dbcf92”.
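As a rough illustration of the fixed-length property, the short Python sketch below hashes a few made-up values with SHA-256 from the standard library. The sample inputs, the helper name sha256_hex, and the choice of SHA-256 are assumptions made for the example; nothing here reflects the algorithm any particular company uses.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up inputs of very different lengths.
samples = ["a@example.com", "+1-555-0100", "user-" + "9" * 50]

for sample in samples:
    digest = sha256_hex(sample)
    # Every digest has the same length, whatever the size of the input.
    print(len(sample), len(digest), digest)
```

Whatever goes in, the output is always 64 hexadecimal characters, and hashing the same value again yields the same string.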
Hashes may seem like random gibberish, but the same input always produces the same hash. And here’s the twist: although a hash function cannot simply be run backwards, the original information can often be recovered by hashing likely inputs until one matches, or by using a precomputed dictionary of hashes, such as a “Rainbow Table”, to match hashes to known values. Data is only anonymous when it can never be associated back to a person.
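To make the guessing approach concrete, here is a minimal Python sketch of a precomputed lookup table. It is a plain dictionary rather than a true rainbow table (which uses hash chains to save memory), and the phone prefix, the numbers, and the function names are all hypothetical. It assumes the person holding the hash already knows the first six digits of a ten-digit number, leaving only 10,000 possibilities.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_lookup_table(prefix: str, digits: int) -> dict[str, str]:
    """Map hash -> original for every number that starts with `prefix`."""
    table = {}
    for n in range(10 ** digits):
        number = f"{prefix}{n:0{digits}d}"
        table[sha256_hex(number)] = number
    return table


# Hypothetical scenario: the first six digits are known, so only the
# last four digits (10,000 possibilities) need to be enumerated.
table = build_lookup_table(prefix="555010", digits=4)

# A "hashed" phone number as it might appear in a shared dataset.
leaked_hash = sha256_hex("5550101234")

# A single dictionary lookup recovers the original value.
recovered = table.get(leaked_hash)
print(recovered)
```

Personal identifiers such as phone numbers come from small, predictable sets of values, which is why this kind of guessing tends to be practical.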
A good tip to keep in mind when assessing if your data is anonymized or pseudonymized is to look for irreversibility.
Anonymization: True anonymization involves irreversible changes. Once data is anonymized, it can never be linked back to an individual or used to identify them (a real magic cloak).
Pseudonymization: Pseudonymized data, on the other hand, retains a reversible link and acts more like a mask – you can unmask or re-identify an individual if you hold the right clue (like the original data). This is the category hashed data falls into.
Remember, the choice between these tools depends on your specific use case and privacy needs!
Why Do Companies Hash?
Hashing different types of data means a company can hold onto datasets and use them for maintaining data integrity or sharing with third parties. Some companies will simply repurpose hashed data for something other than its original intent under the guise that it is okay because the personal information has been “anonymized”. This practice is a slippery slope, and companies need to remember that hashing is not a way to magically turn personal information into anonymous data to circumvent privacy legislation on a technicality. Hashing is not a magic cloak, and the Federal Trade Commission (FTC) is reminding people that this practice will not hold up and will not go without consequences.
The Flawed Logic of Anonymity
Hashing doesn’t equal anonymity. Why?
1. Unique Signatures
Even though a hash might look completely random, it’s still a unique signature. If you hash the same input data twice, you’ll get the same hash. This kind of traceability and consistency allows tracking. Whether it’s a phone number or a device ID, the trail remains.
2. Contextual Information
Even if the data is hashed, additional contextual information can sometimes be used to infer the original data. For example, if hashed email addresses are used in conjunction with other identifiable information, it may be possible to deduce the original email addresses.
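The two points above can be sketched in a few lines of Python. The datasets, names, and page paths are invented for illustration, and the normalization step (trimming and lower-casing emails before hashing) is an assumption about how a sender and a recipient might agree to prepare the data.

```python
import hashlib


def sha256_hex(value: str) -> str:
    # Normalize first so the same address always yields the same hash.
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


# Dataset A: shared as "anonymous" because emails were hashed first.
shared_records = [
    {"email_hash": sha256_hex("alice@example.com"), "page": "/anxiety-quiz"},
    {"email_hash": sha256_hex("bob@example.com"), "page": "/pricing"},
]

# Dataset B: the recipient's own customer list, with real emails.
recipient_customers = [
    {"email": "Alice@Example.com", "name": "Alice A."},
    {"email": "carol@example.com", "name": "Carol C."},
]

# The recipient hashes its own emails the same way and joins on the hash.
by_hash = {sha256_hex(c["email"]): c for c in recipient_customers}

reidentified = [
    {"name": by_hash[r["email_hash"]]["name"], "page": r["page"]}
    for r in shared_records
    if r["email_hash"] in by_hash
]
print(reidentified)
```

Because the hash is a stable signature, any party that already holds the original email can attach a name to the "anonymous" record without ever reversing the hash.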
Real-World Examples
Over the years there have been various instances where companies mistakenly relied on hashing for anonymity.
In 2023, we witnessed the BetterHelp case, where the FTC charged the online therapy provider with sharing sensitive consumer data, including email addresses, IP addresses, and answers to personal health questions, with advertising platforms like Facebook and Snapchat, which then targeted ads to users of BetterHelp. Although BetterHelp claimed that the data was anonymized through hashing, the FTC argued that the hashed data still allowed for re-identification. This led to a $7.8 million settlement, with BetterHelp agreeing to provide refunds to affected customers.
In the 2015 Nomi case, the FTC took action against Nomi Technologies for hashing Medium Access Control (MAC) addresses, which are used for device identification. Although Nomi claimed this process anonymized the data, the FTC argued that hashing created a persistent unique identifier, not true anonymity. This meant that the data could still be linked back to individual devices, leading to allegations of unfair or deceptive trade practices.
So, what are the privacy implications you should consider?
The FTC has been waving its “Hashed data is not anonymous data” banner since 2012, when it first spoke out on the privacy implications of this practice. Since then, the FTC has been cracking down on companies that misuse data and penalizing those that continue to treat hashed data as if it were anonymous. This is likely why it has recently spoken up once again to remind the public of the privacy implications of hashing data and passing it off as fully anonymous.
Several harms can follow when hashed data is misused or assumed to be anonymous. One of the biggest risks is re-identification, which can compromise a data subject’s right to privacy. Another significant risk is deceptive claims. Companies claiming “hashed data is anonymous” mislead users into thinking their data is concealed, thereby breaking consumer trust and jeopardizing business longevity. Transparency is key, and it is important to remain honest with data subjects about how you are using and processing their data. A third risk is regulatory noncompliance. Under the GDPR, hashed data is considered pseudonymized, not fully anonymized. This means it still falls under data protection laws and requires appropriate security measures. Finally, hashing can foster a false sense of security when a company shares data with a third party in the belief that the data is anonymous when it is not. That third party can put the company at risk by re-identifying individuals and misusing the data.
Putting Thought into Practice
There is a lot to take away from the FTC’s warning about hashed data. However, companies shouldn’t conclude that they should stop hashing data entirely. Hashing is still a good tool for strengthening data security and privacy, but companies should not rely on it alone or keep telling themselves and their consumers that it anonymizes data. Hashing can still be useful for password storage or for supporting digital signatures, but the key is to be transparent and to supplement hashing with additional security measures.
The cautions expressed by the FTC are not new. The agency has long worked to correct the inaccurate information circulating about hashing and continues to do so. Its recent restatement of that stance suggests that companies should be more vigilant when using hashed data to avoid being penalized down the line. Hashing isn’t a cloak of invisibility. It’s a tool, not a solution, and true anonymity requires more robust techniques.
*An important footnote: the password hashing used in this example is not completely indicative of a “real world” scenario. Given the sensitivity of passwords, additional measures may be implemented along with hashing, such as adding extra text to the input before hashing (known as “salting”), so that the resulting hashes are unique to the company and cannot be matched against generic precomputed tables. When that extra text is a secret known only to the party doing the hashing, it is often called a “pepper” or key. The threat remains, though, that someone who knows the added value can still re-identify the information.
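The footnote’s last point can be illustrated with a small Python sketch. It swaps in HMAC-SHA256 from the standard library as one common way of mixing a secret into a hash; this is an assumption chosen for the example, not a description of any specific company’s practice, and the secret and email address are hypothetical.

```python
import hashlib
import hmac


def keyed_hash(secret: bytes, value: str) -> str:
    """HMAC-SHA256 of `value` under `secret`, as a hex string."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


company_secret = b"hypothetical-secret-held-by-the-company"
stored = keyed_hash(company_secret, "alice@example.com")

# Without the secret, an outsider's plain hash of a guess does not match.
outsider_guess = hashlib.sha256(b"alice@example.com").hexdigest()
print(outsider_guess == stored)  # False

# Anyone holding the secret can test a guess and confirm the match.
insider_guess = keyed_hash(company_secret, "alice@example.com")
print(hmac.compare_digest(insider_guess, stored))  # True
```

The secret raises the bar for outsiders, but the data remains pseudonymized: whoever holds the secret can still link a hash back to a person.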