October 30, 2022

Data Tokenization: Protect PII, PHI & Credit Card Data

Explore the power of data tokenization in enhancing security across digital platforms. Dive into its benefits for SaaS, cloud, and AI enterprise applications.

What is Tokenization?

Data Tokenization is the process of generating a non-sensitive identifier for a given sensitive data element. That non-sensitive identifier is called a Token. Think of a Token as a random UUID.

Diagram showing process of Tokenisation of Data

A Token does not have any intrinsic or exploitable meaning or value. In layman's terms, that means: If someone steals a Token, no harm can be done because the Token in and of itself is meaningless. It is just a reference to the sensitive data.
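To make this concrete, here is a minimal sketch of the idea in Python. The `TokenVault` class is a hypothetical in-memory stand-in for a tokenization service, not any vendor's API, and the `tkn_` prefix simply mirrors the sample token that appears later in this post.

```python
import uuid


class TokenVault:
    """In-memory stand-in for a tokenization service (illustration only)."""

    def __init__(self):
        self._store = {}  # token -> sensitive value

    def tokenize(self, value: str) -> str:
        # The token is random, so it carries no information about the value.
        token = f"tkn_{uuid.uuid4().hex}"
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the original value.
        return self._store[token]


vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(token)                    # e.g. tkn_3f2b... (different on every run)
print(vault.detokenize(token))  # 123-45-6789
```

A stolen token is useless on its own: without access to the vault's lookup table there is nothing to reverse.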

Data Tokenization is a technical solution for de-identification and pseudonymization. De-identification is the process used to prevent someone's personal identity from being revealed. Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms.
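Pseudonymizing a record can be sketched in a few lines of Python. The field names in `SENSITIVE_FIELDS` and the plain dictionary used as a vault are assumptions made for illustration; a real system would call out to a tokenization service instead.

```python
import uuid

SENSITIVE_FIELDS = {"name", "ssn", "date_of_birth"}  # assumed field names


def pseudonymize(record: dict, vault: dict) -> dict:
    """Return a copy of record with sensitive fields replaced by tokens."""
    result = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            token = f"tkn_{uuid.uuid4().hex}"
            vault[token] = value   # the original value lives only in the vault
            result[field] = token
        else:
            result[field] = value  # non-sensitive fields pass through unchanged
    return result


vault = {}
patient = {"id": 42, "name": "Jane Doe", "ssn": "123-45-6789", "plan": "gold"}
safe = pseudonymize(patient, vault)
print(safe["id"], safe["plan"])  # 42 gold
```

The returned record keeps its shape, so downstream code that only needs the non-sensitive fields keeps working.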

What is Data Tokenization in Data Security?

Let's look at how the world would look without data tokenization, starting with how we would store sensitive data in our database.

Data Tokenisation Table

Although the sensitive fields would be encrypted at rest using the database server's encryption key, anyone with access to the database would see the data in plain text, aka raw form.

At a high level, there are four use cases for how data is stored/retrieved in/from the database:

  1. Collect Data from Customers
  2. Display Data to Customers or Authorized Users
  3. Send Data to Third Party Partners
  4. Perform Analytics, aka Database Queries

Multiple services within your cloud touch this sensitive data to perform these four broad use cases. These services can be broadly categorized into Application Servers, Networks, Log Files and Databases. Internal employees will consume these services.

Take a step back and consider how many such services touch this sensitive data, and think of all the security risks each one introduces. Risks and vulnerabilities can arise at the application server (compute), database server (storage), Identity & Access Management (IAM), network, internet access, or the humans themselves!

Here is the simplest cloud server farm of a small company. Think of how this grows exponentially - so much so that no single person knows the architecture of your entire company over time, much less the sensitive data flowing through this network of servers.

Diagram of Cloud Server Farm of a Small Company

How Does Data Tokenization Work?

The idea of data Tokenization is compelling because storage and compute services never deal with sensitive data. Sensitive data is tokenized, and the basic plain-text version is isolated to a Tokenization service. Only that system can perform actions on sensitive data.

Table showing Tokenisation

Let's talk about the four broad use cases we discussed earlier and how they will be achieved in this new world with Tokenization.

1. Collect Data from Customers

It all starts with a simple HTML form and some basic JavaScript to collect any data. The same goes for sensitive data. In the old world, sensitive data goes from the browser/app to a server API endpoint and gets passed around to multiple services until it hits the service that is the single source of truth. All those services touch sensitive data when they don't need to, thereby increasing the security and compliance risk burden of the company.

In the new world, the sensitive data is accepted via input fields that are part of an iFrame. The Tokenization provider hosts this iFrame. For example: As Strac is the data Tokenization provider, Strac provides UI Components. With Strac's UI Components, the parent page can never access sensitive data; therefore, sensitive data will never touch the business' server. Strac will tokenize the sensitive data and return those tokens to the UI application.
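On the business side, the server then only ever receives tokens. Here is a hedged sketch of a check a backend might run on a form submission to confirm that no raw values slipped through. The token shape is modeled on the sample `tkn_...` value used later in this post, and the field names and `validate_submission` helper are hypothetical.

```python
import re

# Assumed token shape, modeled on the "tkn_..." sample value in this post.
TOKEN_PATTERN = re.compile(r"tkn_[A-Za-z0-9]+")
TOKENIZED_FIELDS = ("ssn", "date_of_birth")  # hypothetical form fields


def validate_submission(payload: dict) -> list:
    """Return the names of fields that are missing or do not look like tokens."""
    return [
        field
        for field in TOKENIZED_FIELDS
        if not TOKEN_PATTERN.fullmatch(str(payload.get(field, "")))
    ]


ok = {"ssn": "tkn_lT8RtnYLfpmfecvAfWqzlMnO", "date_of_birth": "tkn_9aBcD3fGh1"}
bad = {"ssn": "123-45-6789", "date_of_birth": "tkn_9aBcD3fGh1"}
print(validate_submission(ok))   # []
print(validate_submission(bad))  # ['ssn']
```

Rejecting anything that does not look like a token is a cheap guard against a misconfigured form sending plain-text values to your servers.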

Strac UI Components (Widgets) to collect sensitive data

2. Display Data to Customers or Authorized Users

In the PII/PHI world, displaying data collected from customers/patients is pretty standard. For example: showing the last 4 of the SSN or Date of Birth or as simple as first/last name. Since the sensitive data is tokenized with a Tokenization provider like Strac, the above Strac UI Components also take care of displaying sensitive data securely.
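As an illustration of partial display, here is the kind of masking logic such a component might apply before rendering a value. This is a generic sketch, not Strac's implementation; in a tokenized setup it would run inside the provider's boundary, not on the business' servers.

```python
def mask_ssn(ssn: str) -> str:
    """Show only the last four digits of an SSN, e.g. for a profile page."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + "".join(digits[-4:])


print(mask_ssn("123-45-6789"))  # ***-**-6789
```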

3. Send Data to Third Party Partners

Since business application databases hold only tokens, use the Strac Interceptor API to send the underlying data to third-party partners:

curl --location --request <your verb> 'https://api.strac.io/proxy' \
    --header 'X-Api-Key: <your API key>' \
    --header 'Content-Type: application/json' \
    --header 'Target-Url: <your third party endpoint>' \
    --data-raw '{
        "tin": "tkn_lT8RtnYLfpmfecvAfWqzlMnO"
    }'

Banner showing Strac Interceptor API
Strac Interceptor API to send sensitive data to any server
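For teams working in Python, the same call can be sketched with the standard library. The URL, header names, and sample token come from the curl example in this post. The partner URL is a placeholder, and the `POST` default is an assumption, since the curl example leaves the verb open. The request is built but not sent.

```python
import json
import urllib.request


def build_proxy_request(api_key: str, target_url: str, body: dict, method: str = "POST"):
    """Build (but do not send) a request mirroring the curl call in this post."""
    return urllib.request.Request(
        "https://api.strac.io/proxy",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/json",
            "Target-Url": target_url,  # the third-party endpoint
        },
        method=method,
    )


req = build_proxy_request(
    api_key="<your API key>",
    target_url="https://partner.example/tax-forms",  # hypothetical partner URL
    body={"tin": "tkn_lT8RtnYLfpmfecvAfWqzlMnO"},
)
print(req.get_method(), req.full_url)  # POST https://api.strac.io/proxy
# To actually send it: urllib.request.urlopen(req)
```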

4. Perform Analytics, aka Database Queries

Performing queries against sensitive data like string equality on date of birth or zip code of an address or any operation on sensitive data is super common! With Strac tokens, you can still perform database queries by leveraging Strac APIs.
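One common way to support equality queries is to make sure the same value always maps to the same token, so the query compares tokens instead of raw values. The sketch below assumes that behavior. `DeterministicVault` is a hypothetical stand-in, and this post does not say whether Strac's APIs work this way.

```python
import sqlite3
import uuid


class DeterministicVault:
    """Stand-in tokenization service that reuses the token for a repeated value."""

    def __init__(self):
        self._by_value = {}  # sensitive value -> token
        self._by_token = {}  # token -> sensitive value

    def tokenize(self, value: str) -> str:
        if value not in self._by_value:
            token = f"tkn_{uuid.uuid4().hex}"
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]


vault = DeterministicVault()
db = sqlite3.connect(":memory:")  # stand-in for the application database
db.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, dob_token TEXT)")
for dob in ["1990-01-15", "1985-07-04", "1990-01-15"]:
    db.execute("INSERT INTO patients (dob_token) VALUES (?)", (vault.tokenize(dob),))

# Equality query: tokenize the search value, then compare tokens in SQL.
needle = vault.tokenize("1990-01-15")
matches = db.execute(
    "SELECT id FROM patients WHERE dob_token = ? ORDER BY id", (needle,)
).fetchall()
print(matches)  # [(1,), (3,)]
```

This only covers exact matches. Random tokens have no ordering, so range queries (for example "born before 1990") have to be answered by the tokenization service itself.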

Data Tokenization Vs Encryption

In the realm of data security, both tokenization and encryption play pivotal roles. Understanding the differences between them is crucial for determining which tool is best suited for a particular application.

Encryption is a process wherein data is converted into a coded form to prevent unauthorized access. By using cryptographic keys, original data (plaintext) is transformed into encrypted data (ciphertext). Only those possessing the appropriate decryption key can convert the ciphertext back to its original form. As powerful as encryption is, it's not without vulnerabilities. Encrypted data can still be decrypted if the encryption keys are compromised. Moreover, encryption and key management add computational and operational overhead, which may matter for latency-sensitive applications.

On the other hand, tokenization replaces sensitive data with a non-sensitive equivalent, called a token. These tokens typically don’t have any inherent value and cannot be mathematically reverse-engineered back to the original data. A randomly generated token is not derived from the data with a cryptographic key, which means there’s no key whose compromise would let an attacker reverse a token on its own. Tokenization is frequently used in payment processing systems where credit card numbers are replaced with tokens. While the original data is stored in a secure vault, the token, which is meaningless outside its specific context, can be used for processing without risking exposure of the sensitive data.

In comparing the two:

  1. Purpose: While both aim for data security, encryption masks the data, and tokenization substitutes the data.
  2. Reversibility: Encrypted data can be decrypted using the correct key, whereas tokenized data can't be transformed back without a reference to the tokenization system.
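  3. Key dependency: Encryption relies on cryptographic keys which, if compromised, can expose the data. A random token is not derived from a key, so a leaked token cannot be reversed; the risk shifts to the tokenization system, which must itself be secured.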
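The reversibility difference can be shown with a toy example. The `toy_xor_cipher` function below is deliberately simplistic and NOT secure; it only stands in for "a keyed, reversible transform" so the example needs nothing beyond the standard library. The dictionary vault is likewise a hypothetical stand-in.

```python
import hashlib
import secrets


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. NOT secure; illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


secret = b"4111111111111111"  # well-known test card number

# Encryption: anyone holding the key can reverse the ciphertext, anywhere.
key = secrets.token_bytes(32)
ciphertext = toy_xor_cipher(key, secret)
print(toy_xor_cipher(key, ciphertext) == secret)  # True

# Tokenization: the token is random, so reversing it needs the vault's lookup table.
vault = {}
token = "tkn_" + secrets.token_hex(12)
vault[token] = secret
print(vault[token] == secret)  # True
```

The ciphertext is a function of the key and the data, while the token is unrelated to the data and only the vault connects the two.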

Key Benefits of Data Tokenization

Data tokenization has emerged as a robust strategy for enhancing data security, particularly in SaaS, cloud, and AI-driven enterprises. Here are the key benefits of data tokenization:

1. Enhanced Data Security

By replacing sensitive data with non-sensitive tokens, enterprises minimize the risk of data exposure. Even if tokens are leaked, they hold no intrinsic value, ensuring the original data remains protected.

2. Reduced Compliance Burden

Tokenization can limit the scope of compliance audits, particularly in industries with strict regulations on data storage and transmission. For example, tokenizing credit card details can alleviate certain PCI DSS requirements.

3. Versatility Across Platforms

Tokenization offers consistent data security, whether it's integrated into SaaS solutions, cloud platforms, or AI-driven tools. This ensures a uniform security layer across diverse digital environments.

4. Data Integrity and Accuracy

Even though sensitive information is masked, format-preserving tokenization can maintain the original data's format and structure. This preservation is crucial for accurate AI analytics and model training.
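A format-preserving token can be sketched as follows: digits are swapped for random digits while separators and the last four digits stay in place. The `format_preserving_token` helper is hypothetical, and a real system would also record the token-to-value mapping in its vault and guard against collisions.

```python
import secrets
import string


def format_preserving_token(card_number: str, keep_last: int = 4) -> str:
    """Replace digits with random digits, keeping separators and the last digits."""
    total_digits = sum(c.isdigit() for c in card_number)
    seen = 0
    out = []
    for c in card_number:
        if c.isdigit():
            seen += 1
            if seen > total_digits - keep_last:
                out.append(c)  # keep the trailing digits
            else:
                out.append(secrets.choice(string.digits))  # random replacement
        else:
            out.append(c)  # keep spaces and dashes
    return "".join(out)


token = format_preserving_token("4111-1111-1111-1111")
print(token)  # e.g. 8302-5719-0046-1111 (random digits, last four kept)
```

Because the token has the same length and layout as a card number, existing schemas, validators, and UI fields can usually handle it without changes.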

5. Cost Savings

By mitigating data breach risks and narrowing the scope of compliance, enterprises can realize significant financial benefits. Additionally, the potential fallout and reputational damage from data breaches are curtailed.

6. Enhanced User Trust

In an era where data breaches are frequent news, leveraging tokenization can bolster an enterprise's reputation. Assuring clients and customers that their sensitive data is tokenized can build stronger trust and loyalty.

By prioritizing these benefits, enterprises can better navigate the complexities of the digital landscape, ensuring both operational excellence and robust data security.

Real World Data Tokenization Examples

1. Credit Card Tokenization & PCI DSS Compliance

Online businesses typically charge customers by credit card, one of the most common forms of online payment. To accept credit card data, an online business has to achieve PCI compliance.

Payment Card Industry Data Security Standard (PCI DSS) compliance pushes you toward a tokenization system so that the rest of your cloud (application server) farm never touches credit card data.

2. Tokenization of PII (Personally Identifiable Information) data

Identity Verification is mandatory in almost all financial and health-related businesses - whether to perform a background check, fraud check, patient look up or even to do taxes.

3. Tokenization of Sensitive Documents (Passport, Driver's License) to Analyze Demographics

Targeted marketing allows businesses to tailor and personalize online advertisements. Businesses can extract anonymized customer information (e.g., area of residence, ethnicity, gender, age group) from identity documents and perform analytics without handling PII on their own servers. To learn more about how to redact sensitive documents, please check out this blog post.
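As a rough sketch of what "anonymized customer information" can look like, the snippet below generalizes a date of birth to an age group and a ZIP code to a coarse area, then counts customers per bucket. The bucket boundaries, the three-digit ZIP prefix, and the sample data are assumptions for illustration. In a tokenized setup this derivation would run inside the tokenization provider, so only the coarse values reach the business.

```python
from collections import Counter
from datetime import date


def age_group(dob: date, today: date) -> str:
    """Map a date of birth to a coarse age bucket."""
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if age < 18:
        return "under 18"
    if age < 35:
        return "18-34"
    if age < 55:
        return "35-54"
    return "55+"


def zip_area(zip_code: str) -> str:
    """Generalize a 5-digit ZIP code to its first three digits."""
    return zip_code[:3] + "XX"


today = date(2022, 10, 30)  # fixed date so the example is reproducible
customers = [
    (date(1990, 1, 15), "94107"),
    (date(1985, 7, 4), "94110"),
    (date(2001, 12, 1), "10001"),
]
summary = Counter((age_group(dob, today), zip_area(z)) for dob, z in customers)
print(summary[("18-34", "941XX")])  # 1
```

Generalizing values reduces identifiability but does not guarantee anonymity on its own; very small buckets can still single out individuals.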

Strac DLP - A robust Data Tokenization Solution

Strac offers a quick and easy solution to ensure your organization has the right compliance measures in place for audits. Our DLP solution helps you meet compliance requirements efficiently by automating daily tasks and streamlining data protection processes. 

With Strac's redaction experience, you can easily block sensitive customer data such as:

  • PII (Personally Identifiable Information)
  • PHI (Protected Health Information)
  • PCI (Payment Card Industry) information

This ensures that your organization remains compliant while keeping sensitive data secure. Strac's audit reports give 100% visibility and control over data, providing detailed insights into your data usage, allowing you to monitor and manage it effectively.


Founder, Strac. ex-Amazon Payments Infrastructure (Widget, API, Security) Builder for 11 years.
