Creating a Test Database using Format-Preserving Encryption

The idea of format-preserving encryption is to encrypt data while keeping its form, a sort of encryption in kind. An encrypted credit card number would look like a credit card number, a string of text would be replaced with a string of text, etc.

Format preserving encryption (FPE) is useful in creating a test or demo database. You want realistic data without having accurate data (at least for sensitive data fields), and using FPE on real data might be the way to go.

If a field is supposed to contain a 10-digit number, say a phone number, you want the test data to also contain a 10-digit number. Otherwise validation software might break. And if that number is a key that links tables together, simply substituting a random number would break the relationships unless the same random replacement was used everywhere. Also, two clear numbers could be replaced with the same randomly chosen value. FPE would be a simple way to avoid these problems.

FPE is a two-edged sword. It may be desirable to preserve formatting, but it could also cause problems. Using any form of encryption, format-preserving or not, to preserve the database structure could reveal information you don’t intend to reveal.

It’s quite possible to encrypt data and still compromise privacy. If you encrypt data, but not metadata, then you might keep enough information to re-identify individuals. For example, if you encrypt someone’s messages but retain the time stamp on the messages, that might be enough to identify that person.

The meaning of “format-preserving” can vary, and that could inadvertently leak information. What does it mean to encrypt English text in a format-preserving way? It could mean that English words are replaced with English words. If this is done simplistically, then the number of words in the clear text is revealed. If a set of English words is replaced with a variable number of English words, you’re still revealing that the original text was English.

FPE may not reveal anything that wasn’t already known. If you know that a field in a database contains a 9-digit number, then encrypting it as a 9-digit number doesn’t reveal anything per se. But it might be a problem if it reveals that two numbers are the same number.

What about errors? What happens if a field that is supposed to be a 9-digit number isn’t? Maybe it contains a non-digit character, or it contains more than 9 digits. The encryption software should report an error. But if it doesn’t, maybe the encrypted output is not a 9-digit number, revealing that there was an error in the input. Maybe that’s a problem, maybe not. Depends on context.

Related: Data privacy consulting

2 thoughts on “Format-preserving encryption (FPE) for privacy”

Tom Shrimpton

25 October 2018 at 14:42

John, I know quite a bit about FPE (and related “format-transforming encryption”, or FTE) and would be happy to talk to you about it, directly. A number of the points you raise above are easily answered, and have already been considered in the literature.
John

25 October 2018 at 15:03

I didn’t mean to imply that these possible issues are unsolved problems. Only that like any technology, FPE can’t be applied naively.

Comments are closed.