Postgres

Mask (Redact) sensitive data in PostgreSQL

Problem

Sensitive data in database tables should be masked to protect its privacy and security. Data such as personal identification numbers, passwords, or credit card numbers is vulnerable to unauthorized access or disclosure when it is stored in a database. Masking helps prevent this by making the data unreadable or unusable to anyone who does not have permission to access it.

There are several reasons why one might choose to mask sensitive data in a database:

  1. Compliance: Many industries have regulations that require the protection of sensitive data, such as HIPAA for healthcare data, PCI DSS for payment card data, and GDPR for personal data. Masking sensitive data can help ensure compliance with these regulations.
  2. Security: Masking sensitive data can help prevent unauthorized access to the data, either by malicious actors or by employees who do not have a legitimate need to access it.
  3. Privacy: Masking sensitive data can help protect the privacy of individuals whose data is stored in the database. Masking helps ensure that, even if the database is breached, the sensitive information is unreadable and therefore of little use to the attacker.
  4. Risk management: Masking sensitive data can help reduce the risk of data breaches or other security incidents. By limiting the amount of sensitive data that is stored in the database in its original form, you can reduce the potential impact of a security incident.

Overall, masking sensitive data in database tables is an important step in protecting the privacy and security of that data, and in complying with industry regulations and best practices.

Solution

There are several ways to mask data in a database table, including:

  1. Tokenization: Tokenization replaces sensitive data with a meaningless, unique identifier called a token. For example, the credit card number "1234 5678 9012 3456" may be replaced with a token such as "tkn_T4Ngz9sLsZ", which is meaningless outside of the payment processing system.
  2. Format-Preserving Pseudonyms: Format-preserving pseudonyms are synthetic identifiers that are derived from sensitive data and that preserve the format of the original data (and, in some schemes, its length). For example, the name "John Doe" can be replaced with "Charles Smith", or the date of birth "12/01/1923" can be replaced with "02/13/1982".
  3. Masking: Masking reveals only some parts of the data and replaces the rest with a character such as * or X. For example, the email address "johndoe@example.com" could be masked as "*****@example.com" or "j******e@e*****.com".
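
The three techniques above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Strac's implementation: TokenVault, pseudonymize_name, mask_email, SECRET_KEY, and the name pools are all hypothetical names introduced for this example. The token vault is kept in memory for brevity; a real system would persist and protect it.

```python
import hashlib
import hmac
import secrets

# All names below are hypothetical and exist only for this sketch.
SECRET_KEY = b"demo-only-secret"  # assumption: in practice, load from a secret store
FIRST_NAMES = ["Charles", "Maria", "Priya", "Oliver", "Aisha", "Kenji"]
LAST_NAMES = ["Smith", "Garcia", "Patel", "Brown", "Khan", "Sato"]


class TokenVault:
    """In-memory tokenization: maps each value to a random, unique token."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tkn_" + secrets.token_urlsafe(8)
        while token in self._value_by_token:  # regenerate on an (unlikely) collision
            token = "tkn_" + secrets.token_urlsafe(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only a holder of the vault can recover the original value.
        return self._value_by_token[token]


def pseudonymize_name(value: str) -> str:
    """Replace a name with a fake "First Last" name, chosen deterministically."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_email(value: str, mask_char: str = "*") -> str:
    """Mask the username with a fixed-width run of mask_char and keep the domain."""
    _username, _at, domain = value.partition("@")
    return mask_char * 5 + "@" + domain
```

With these definitions, mask_email("johndoe@example.com") returns "*****@example.com". The token is random, so it carries no information about the original value, while the pseudonym is derived from a keyed hash so that the same input always yields the same fake name.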

Strac connects to the database instance and masks data according to the configuration that is supplied.

Consider an example. The table below has five fields: user_id, name, email, company_name, and phone.

Database Table before any redaction

We will apply a different redaction method to each field of the table above:

  1. user_id: kept as-is, so user_id values are the same after redaction.
  2. name: replaced with a generated pseudonym, that is, fake data that preserves the format.
  3. email: the username is masked and the domain name is kept. Note: the mask does not preserve the length of the username.
  4. company_name: only the first character is kept; the remaining characters are masked, preserving the length.
  5. phone: the phone number is tokenized, that is, replaced with a generated token.

Database Table after Strac Redaction
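
To make the per-column rules concrete, here is a minimal Python sketch that applies one rule per field to a single row. It does not reflect Strac's configuration format or internals: the RULES mapping, the helper functions, and the sample row are all hypothetical and made up for illustration.

```python
import hashlib
import secrets

# Hypothetical pool of fake names used for pseudonyms.
FAKE_NAMES = ["Charles Smith", "Maria Garcia", "Priya Patel", "Oliver Brown"]


def keep(value):
    """Return the value unchanged."""
    return value


def pseudonym(value: str) -> str:
    """Pick a fake name deterministically from the pool."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]


def mask_email_username(value: str) -> str:
    """Mask the username with a fixed-width mask (not length preserving)."""
    _username, _at, domain = value.partition("@")
    return "*****@" + domain


def mask_keep_first(value: str) -> str:
    """Keep the first character and mask the rest, preserving the length."""
    return value[:1] + "*" * (len(value) - 1)


def tokenize(value: str) -> str:
    """Replace the value with a random token (no vault is kept in this sketch)."""
    return "tkn_" + secrets.token_urlsafe(8)


# Hypothetical per-column configuration: column name -> redaction rule.
RULES = {
    "user_id": keep,
    "name": pseudonym,
    "email": mask_email_username,
    "company_name": mask_keep_first,
    "phone": tokenize,
}


def redact_row(row: dict) -> dict:
    """Apply the configured rule to every column; unknown columns raise KeyError."""
    return {column: RULES[column](value) for column, value in row.items()}
```

For a made-up row such as {"user_id": 1, "name": "John Doe", "email": "johndoe@example.com", "company_name": "Acme Corp", "phone": "555-0100"}, redact_row leaves user_id at 1, turns the email into "*****@example.com", and turns company_name into "A" followed by eight asterisks. Columns without a rule raise an error in this sketch, so that a newly added column is not passed through unredacted by accident.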

Support

Please contact hello@strac.io with any questions.