May 5, 2023

How to NOT Pass Customer PII Data to an OpenAI LLM?

Want to call AI APIs but don't want to share sensitive customer data?

With the breakthroughs in Foundation Models (FM) and Large Language Models (LLM), developers are building apps at a remarkable pace. Let's dive into why developers are building apps on FM and LLMs, what kinds of sensitive data are shared with AI models like ChatGPT, GPT-4, or custom ones, why it is recommended not to pass sensitive data to AI models, and how to prevent sharing sensitive data with AI models.

Why build apps on Foundation Models (FM) and Large Language Models (LLM)?

  1. Advanced Natural Language Understanding: FM and LLM can understand and process human language at an unprecedented level. This enables developers to create applications that communicate effectively with users, making the interaction more engaging and efficient.
  2. Improved Decision-Making and Prediction: These models can analyze vast amounts of data and generate insights, offering valuable predictions and recommendations. This helps developers create applications that can make better decisions, automate processes, and optimize workflows.
  3. Enhanced Creativity: FM and LLM can generate original content, such as text, music, or images, by learning from existing data. This capability enables developers to create applications that can offer creative suggestions, generate personalized content, or inspire new ideas.
  4. Time and Cost Savings: By leveraging the powerful capabilities of FM and LLM, developers can build applications that would previously require significant time and resources, saving both time and money.
  5. Scalability and Adaptability: Foundation Models can be fine-tuned and adapted to various domains and industries, enabling developers to create tailored solutions that address specific needs or problems.

What kind of sensitive data is generally passed to AI models?

Sensitive data that may be inadvertently passed to AI models can vary depending on the context and use case. Some common types of sensitive data include:

Customer-sensitive data (driver's license)
  1. Personally Identifiable Information (PII): This refers to any information that can be used to identify an individual, directly or indirectly. Examples include names, addresses, phone numbers, email addresses, Social Security numbers, and driver's license numbers.
  2. Financial Information: Data related to an individual's financial status or transactions, such as bank account numbers, credit card numbers, transaction history, and credit scores.
  3. Health Information: Medical records, health conditions, diagnoses, treatments, and other health-related data that can be linked to an individual. This information is often regulated under laws such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States.
  4. Biometric Data: Unique physical or behavioral characteristics of an individual, such as fingerprints, facial recognition data, voice patterns, or DNA sequences.
  5. Employment Information: Data related to an individual's employment history, including job titles, salary, performance evaluations, and disciplinary records.
  6. Education Information: Records related to an individual's educational background, such as transcripts, test scores, or enrollment history.
  7. Location Data: Precise geolocation information can reveal an individual's whereabouts or movements over time.
  8. Communications Data: Contents of private communications, such as emails, text messages, or instant messages, may contain sensitive information or reveal personal relationships.
  9. Online Behavior: Data related to an individual's online activities, such as browsing history, search queries, or social media profiles, which can reveal personal interests, preferences, or affiliations.
  10. Legal Information: Data related to an individual's legal history, such as criminal records, court proceedings, or background checks.

Why should developers NOT pass sensitive data to AI models?

  1. Privacy and Security Concerns: Passing sensitive data, such as Personally Identifiable Information (PII) or Protected Health Information (PHI), to AI models may expose this information to unauthorized parties, leading to privacy breaches and possible legal repercussions. Developers should comply with industry-standard compliance frameworks like PCI-DSS, HIPAA, SOC 2, ISO 27001 and data protection regulations like GDPR to avoid penalties and maintain user trust.
  2. Data Retention Policies: AI service providers may have data retention policies that dictate how long user data is stored. If sensitive information is passed to the AI model, there's a risk it could be retained and potentially exposed at a later time.
  3. Unintended Consequences: AI models may inadvertently reveal sensitive information through their generated outputs, even if the original input data is anonymized. This can lead to unintended privacy violations and potentially harm individuals or organizations.
  4. Model Training: AI models, especially LLMs, are trained on vast amounts of data, which may include sensitive information. Inadvertently including PII or PHI in the training data increases the risk of privacy breaches and may expose the model to potential biases.
  5. Ethical Considerations: Using AI models that process sensitive data can raise ethical concerns, as the potential misuse of such data can lead to discrimination, social stigmatization, or other negative consequences. Developers should consider the ethical implications of their applications and strive to create responsible AI solutions.

How should developers avoid passing sensitive data to AI models?

Redacted driver's license

Developers should take several precautions to ensure they do not pass sensitive data to AI models. Here are some best practices to follow:

  1. Data Anonymization or Redaction: Remove or obfuscate any sensitive information, such as names, addresses, phone numbers, email addresses, and identification numbers, from the data before passing it to the AI model. Techniques like masking, pseudonymization, and generalization can anonymize the data while retaining its utility.
    1. Strac was built to secure sensitive data (PII, PHI, API keys). It protects businesses by automatically redacting sensitive data across communication channels such as email (Gmail, Microsoft 365), Slack, customer support tools (Zendesk, Intercom, Kustomer, Help Scout, Salesforce, ServiceNow), and cloud storage solutions (OneDrive, SharePoint, Google Drive, Box, Dropbox).
    2. Strac exposes APIs to detect sensitive information or redact sensitive parts in a document.
  2. Data Tokenization or Encryption: Tokenize or encrypt data before transmitting it to the AI model to protect it from unauthorized access.
    1. Strac exposes APIs to tokenize sensitive data. Learn more on how Strac does tokenization here.
    2. Tokenized Data
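To make the first practice concrete, here is a minimal redaction sketch in Python. The regex patterns and the `redact` helper are illustrative assumptions, not Strac's actual API; a production system (or a detection service like Strac's) would use far more robust detection, including ML-based entity recognition.

```python
import re

# Hypothetical patterns for a few common PII types. Real-world detection
# needs many more patterns plus ML-based recognition to avoid misses.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with a category placeholder so the raw
    values never leave your infrastructure."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Customer John wrote from john.doe@example.com, SSN 123-45-6789."
safe_prompt = redact(prompt)
# safe_prompt, not the raw prompt, is what you send to the AI model
print(safe_prompt)
```

The key design point: redaction happens before the network call, so even if the AI provider logs or retains requests, it only ever sees placeholders.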
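The second practice, tokenization, can be sketched as a reversible token vault: sensitive values are swapped for opaque tokens before the prompt goes out, and the tokens are swapped back in the model's response. The `TokenVault` class below is a hypothetical in-memory illustration, not Strac's implementation; a real deployment would persist the vault in encrypted storage with strict access controls.

```python
import secrets

class TokenVault:
    """Minimal in-memory token vault (illustrative only)."""

    def __init__(self):
        self._vault = {}

    def tokenize(self, value: str, kind: str = "PII") -> str:
        # Issue an opaque, non-reversible token for a sensitive value
        token = f"<{kind}_{secrets.token_hex(4)}>"
        self._vault[token] = value
        return token

    def detokenize(self, text: str) -> str:
        # Restore original values, e.g. in the model's response
        for token, value in self._vault.items():
            text = text.replace(token, value)
        return text

vault = TokenVault()
email_token = vault.tokenize("jane@example.com", "EMAIL")
prompt = f"Draft a reply to the customer at {email_token}."
# prompt is safe to send to the AI model; the model only sees the token.
response = f"Sure, I will reply to {email_token} shortly."  # simulated model output
restored = vault.detokenize(response)  # original email restored locally
```

Unlike redaction, tokenization is reversible on your side, which is useful when the application needs the real value back after the model responds.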

With Strac, you can safely call AI models without worrying about exposing sensitive customer data. So go ahead and call your favorite AI model.

Get Started

To get started, please email us, join our Slack community, or book a meeting with us.

Founder, Strac. ex-Amazon Payments Infrastructure (Widget, API, Security) Builder for 11 years.
