October 30, 2022
5 min read

Why Should You Tokenize Sensitive Data Like PII and Credit Cards?

Learn how to de-identify or pseudonymize sensitive data

What is Tokenization?

Tokenization is the process of generating a non-sensitive identifier for a given sensitive data element. That non-sensitive identifier is called a Token. Think of a Token as a random UUID.

Tokenization

A Token has no intrinsic or exploitable meaning or value. In layman's terms: if someone steals a Token, it does no harm on its own, because the Token by itself is meaningless. It is just a reference to the sensitive data.
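To make the idea concrete, here is a minimal sketch of a token vault. The `TokenVault` class, its in-memory dictionary, and the `tkn_` prefix on a random UUID are illustrative assumptions, not any provider's actual implementation.

```python
import uuid


class TokenVault:
    """Toy in-memory vault; a real one would be an isolated, access-controlled service."""

    def __init__(self) -> None:
        self._values_by_token: dict[str, str] = {}

    def tokenize(self, sensitive_value: str) -> str:
        # The token is random, so it carries no information about the value.
        token = "tkn_" + uuid.uuid4().hex
        self._values_by_token[token] = sensitive_value
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the original value.
        return self._values_by_token[token]


vault = TokenVault()
ssn_token = vault.tokenize("123-45-6789")
```

Because the token is random, tokenizing the same value twice yields two different tokens, and a stolen token reveals nothing without access to the vault.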

Tokenization is the technical solution for De-identification & Pseudonymization. De-identification is the process used to prevent someone's personal identity from being revealed. Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms.
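As a rough illustration of pseudonymization, the sketch below replaces the sensitive fields of a record with tokens and leaves the other fields alone. The record layout, the field names, and the `tokenize` helper backed by a plain dictionary are all hypothetical.

```python
import uuid

vault: dict[str, str] = {}  # token -> original value (stand-in for a tokenization service)


def tokenize(value: str) -> str:
    token = "tkn_" + uuid.uuid4().hex
    vault[token] = value
    return token


def pseudonymize(record: dict[str, str], sensitive_fields: set[str]) -> dict[str, str]:
    """Return a copy of the record with sensitive fields replaced by tokens."""
    return {
        field: tokenize(value) if field in sensitive_fields else value
        for field, value in record.items()
    }


customer = {"customer_id": "c_1001", "name": "Jane Doe", "ssn": "123-45-6789", "plan": "premium"}
safe_customer = pseudonymize(customer, sensitive_fields={"name", "ssn"})
```

`safe_customer` keeps `customer_id` and `plan` as they were, while `name` and `ssn` now hold tokens that only the vault can resolve.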

Old World (without Tokenization)

Let's look at how the world would look without Tokenization, starting with how we would store sensitive data in our database.

Database with sensitive fields

Although the sensitive fields would be encrypted at rest with the database server's encryption key, anyone who can access the data sees it in plain text, that is, in raw form.

At a high level, there are four use cases for how data is stored in and retrieved from the database:

  1. Collect Data from Customers
  2. Display Data to Customers or Authorized Users
  3. Send Data to Third Party Partners
  4. Perform Analytics, aka Database Queries

Multiple services within your cloud touch this sensitive data to perform these four broad use cases. These services can be broadly categorized into Application Servers, Networks, Log Files and Databases. Internal employees will consume these services.

Step back and count how many such services touch this sensitive data, then think of the security risks each one introduces. Vulnerabilities can sit in the application server (compute), the database server (storage), Identity & Access Management (IAM), the network, internet access, or the humans themselves!

Here is the simplest cloud server farm of a small company. Think of how quickly this grows: over time, no single person knows the architecture of the entire company, much less how sensitive data flows through this network of servers.

Current-state cloud server farm

New World (with Tokenization)

The idea of Tokenization is compelling because storage and compute services never deal with sensitive data. Sensitive data is tokenized, and the basic plain-text version is isolated to a Tokenization service. Only that system can perform actions on sensitive data.

Database with tokenized fields
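The isolation described above can be sketched as a service that stores plain-text values and refuses to reveal them to callers that are not explicitly allowed. The class name, the caller names, and the allow-list check are assumptions made for illustration; a production system would rely on real authentication and authorization.

```python
import uuid


class TokenizationService:
    """Toy stand-in for the isolated service that holds plain-text values."""

    def __init__(self, allowed_callers: set[str]) -> None:
        self._allowed_callers = allowed_callers
        self._values_by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tkn_" + uuid.uuid4().hex
        self._values_by_token[token] = value
        return token

    def detokenize(self, token: str, caller: str) -> str:
        # Every other service is refused, so plain text stays inside this boundary.
        if caller not in self._allowed_callers:
            raise PermissionError(f"{caller} may not read plain-text values")
        return self._values_by_token[token]


service = TokenizationService(allowed_callers={"identity-verification"})
dob_token = service.tokenize("1990-05-17")

# The application database row holds only the token.
customer_row = {"customer_id": "c_1001", "dob": dob_token}
```

Application servers, logs, and analytics jobs that only ever see `customer_row` handle a token, not a date of birth.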

Let's talk about the four broad use cases we discussed earlier and how they will be achieved in this new world with Tokenization.

Collect Data from Customers

Collecting any data starts with a simple HTML form and some basic JavaScript, and sensitive data is no different. In the old world, sensitive data travels from the browser or app to a server API endpoint and is passed between multiple services until it reaches the service that is the single source of truth. All of those services touch sensitive data they have no need to touch, which increases the company's security and compliance burden.

In the new world, the sensitive data is entered into input fields inside an iframe hosted by the Tokenization provider. For example, with Strac as the Tokenization provider, Strac provides UI Components. With Strac's UI Components, the parent page can never access sensitive data, so sensitive data never touches the business's servers. Strac tokenizes the sensitive data and returns the tokens to the UI application.

Strac UI Components (Widgets) to collect sensitive data

Display Data to Customers or Authorized Users

In the PII/PHI world, displaying data collected from customers or patients is routine: for example, showing the last four digits of an SSN, a date of birth, or simply a first and last name. Since the sensitive data is tokenized with a Tokenization provider like Strac, the same Strac UI Components also take care of displaying it securely.
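The "last four digits" display can be sketched as a small masking function. This is only an illustration of the masking logic, which would run on the tokenization provider's side of the boundary; `mask_last4` is a hypothetical helper, not part of any Strac API.

```python
def mask_last4(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `visible` ones, keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_last4("123-45-6789"))  # prints ***-**-6789
```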

Send Data to Third Party Partners

Business application databases hold only tokens, so to send the underlying data to third-party partners, use the Strac Interceptor API:

curl --location --request <your verb> 'https://api.strac.io/proxy' \
    --header 'X-Api-Key: <your API key>' \
    --header 'Content-Type: application/json' \
    --header 'Target-Url: <your third party endpoint>' \
    --data-raw '{
        "tin": "tkn_lT8RtnYLfpmfecvAfWqzlMnO"
    }'

Strac Interceptor API to send sensitive data to any server
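For callers that prefer code over curl, the same request can be built with Python's standard library. The endpoint, header names, and token come from the curl example above; the `build_proxy_request` helper, the POST verb, and the partner URL are assumptions for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

STRAC_PROXY_URL = "https://api.strac.io/proxy"  # endpoint from the curl example


def build_proxy_request(method: str, api_key: str, target_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the proxy request shown in the curl example."""
    return urllib.request.Request(
        STRAC_PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/json",
            "Target-Url": target_url,
        },
    )


request = build_proxy_request(
    method="POST",  # assumed verb; use whatever the third party expects
    api_key="<your API key>",
    target_url="https://partner.example/tin-check",  # hypothetical third-party endpoint
    payload={"tin": "tkn_lT8RtnYLfpmfecvAfWqzlMnO"},
)
# To send it: urllib.request.urlopen(request)
```

Note that the business's own code only ever places the token in the request body; it never handles the plain-text TIN.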

Perform Analytics aka Database Queries

Queries against sensitive data are very common: for example, string equality on a date of birth or on the zip code of an address. With Strac tokens, you can still perform database queries by leveraging Strac APIs.
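The article does not describe how Strac's APIs support queries. One general technique that makes equality queries possible on tokenized columns is deterministic tokenization, sketched below with a keyed HMAC. This is an assumption chosen for illustration, not Strac's documented method; the key, the table rows, and the `deterministic_token` helper are hypothetical.

```python
import hashlib
import hmac

TOKEN_KEY = b"demo-key-kept-inside-the-tokenization-service"  # hypothetical secret


def deterministic_token(value: str, key: bytes = TOKEN_KEY) -> str:
    """Same input and key always yield the same token, so equality survives tokenization."""
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tkn_" + digest[:32]


# The application table stores only tokens in the zip code column.
customers = [
    {"customer_id": "c_1", "zip_token": deterministic_token("94105")},
    {"customer_id": "c_2", "zip_token": deterministic_token("10001")},
    {"customer_id": "c_3", "zip_token": deterministic_token("94105")},
]

# Equivalent of: SELECT customer_id FROM customers WHERE zip = '94105'
wanted = deterministic_token("94105")
matches = [row["customer_id"] for row in customers if row["zip_token"] == wanted]
```

The trade-off is that deterministic tokens reveal which rows share a value, and they support equality only, not range or substring queries.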

Real World Scenarios

1. Collect Bank Account/Credit Card Numbers on the website/app to charge customers

Online businesses often charge customers by credit card, one of the most common forms of online payment. To accept credit card data, the online business has to achieve PCI compliance. PCI compliance pushes you toward a tokenization system that is the only component dealing with credit card data, so that the rest of your cloud server farm never touches it.


2. Collect PII Data (SSN, DoB) to perform Identity Verification

Identity Verification is mandatory in almost all financial and health-related businesses - whether to perform a background check, fraud check, patient look up or even to do taxes.


3. Redact sensitive documents (Passport, Driver's License) and analyze demographics

Targeted marketing allows businesses to tailor and personalize online advertisements. Businesses can extract anonymized customer information (e.g., area of residence, ethnicity, gender, age group) from identity documents and perform analytics without handling PII on their own servers.
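As one small example of such an anonymized attribute, the sketch below turns a date of birth into an age group, so that analytics can store the bucket instead of the date itself. The bucket boundaries and the `age_group` helper are illustrative assumptions.

```python
from datetime import date

# (highest age in bucket, label); anyone older falls into "65+".
AGE_BUCKETS = [
    (17, "under 18"),
    (24, "18-24"),
    (34, "25-34"),
    (44, "35-44"),
    (54, "45-54"),
    (64, "55-64"),
]


def age_group(date_of_birth: date, today: date) -> str:
    """Map a date of birth to a coarse age bucket as of `today`."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    age = today.year - date_of_birth.year - (0 if had_birthday else 1)
    for upper, label in AGE_BUCKETS:
        if age <= upper:
            return label
    return "65+"


print(age_group(date(1990, 5, 17), today=date(2022, 10, 30)))  # prints 25-34
```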

Founder. YC W22. 11 years at Amazon building Payments Infrastructure (Widget, API, Security).
