AI Data Security Risks & DLP for AI

TL;DR

Regulators are watching AI data security more closely. If your business handles sensitive customer data in generative AI apps, you cannot escape the scrutiny of regulatory bodies.

Generative AI presents an exciting frontier and promises to transform the way we work, but the risks follow the rewards. What’s worrying is that AI vendors rarely disclose whether they fully comply with the latest regulations, or whether their customers may face data security risks later.

This blog post aims to present the challenges posed by popular AI tools and how organizations can tackle AI data security risks head-on.

Top 5 AI Tools and Their Data Security Risks

In 2022, Americans lost nearly $10.3 billion to internet scams. AI automation tools with “done for you” services could add to harm on that scale. When such tools are free, they eventually come at a higher cost: compromised business security and potential loss of your customers’ data.

Let’s review the top 5 AI tools and the data security risks you should know about.

1. ChatGPT

In a shocking report by Gizmodo, GPT-4 faked a visual impairment to manipulate a human into solving a CAPTCHA puzzle for it, bypassing a security test. It is an alarming example of how good AI tools can be at deceit.

Here are a few more AI security risks:

Hackers can misuse ChatGPT to generate sophisticated malware code.
ChatGPT can be manipulated into writing authentic-looking phishing emails designed to steal user data.
ChatGPT plug-ins could be exploited to steal users' chat histories, extract personal information, and execute malicious code on remote devices.

The chatbot’s March 20, 2023 outage, which exposed payment-related and other personal information of 1.2% of ChatGPT Plus subscribers, is clear proof of its data security loopholes.

2. Google’s Bard chatbot

When Google launched its Bard chatbot, the news fueled concerns about data security and misinformation, and those concerns materialized sooner than expected.

Bard presented the following risks:

Bard is trained on data from the internet. Like every AI model trained on text scraped from the internet, Bard is prone to picking up gender bias, racial discrimination, and controversial or hateful messaging.
Hackers can tap into vulnerabilities to exploit Bard and its training data. For example, they can mount backdoor attacks, in which hidden triggers are planted in the training data or model to sabotage the output and steal user data.
Non-compliance with the latest regulations, such as GDPR

3. Zendesk chatbot

Next in line are Zendesk customer chatbots. Given the volume of data flowing through Zendesk every day, the following risks are hard to avoid:

App and system integrations may lead to data loss and unauthorized access unless they are monitored at a granular level.
Files and attachments can be downloaded directly from their links in Zendesk without authentication.
Customized user interfaces can cause accidental leakage of sensitive data.

4. JIRA service desk chatbot

JIRA Align, part of Atlassian’s wide suite of cloud services, has faced backlash over potential vulnerabilities and malware risks. Notably, the reported vulnerabilities could let attackers obtain elevated privileges, extract Atlassian cloud credentials, and potentially infiltrate Atlassian infrastructure.

5. Zoom AI Companion

Zoom has been on the radar of regulatory bodies, mainly due to its long rap sheet of data privacy and security concerns. Zoom AI Companion, a generative AI assistant, was released to boost productivity. However, given the company’s past data collection practices, customers are worried about the following:

Hidden clauses that allow personal data to be extracted for training AI models
False promises of end-to-end encryption that may lead to Zoom-bombing intrusions by bad actors (similar to Zoom’s 2021 data security fiasco)
Non-adherence to data privacy regulations and misuse of “service-generated data” for training purposes

Ways DLP Solutions Can Combat AI Security Risks

Despite all the drawbacks, generative AI tools are here to stay. Businesses need to deploy the best security measures to stay a few steps ahead of cybercriminals, and here’s how.

1. IP Leak Prevention

Samsung never imagined its trade secrets would end up in the hands of OpenAI. The mishap occurred when Samsung employees entered confidential data, such as source code for a new program, into ChatGPT. At the time, OpenAI could use ChatGPT conversations to train its models unless users opted out. This means data that was supposed to remain the company’s confidential, proprietary information left its control and could potentially surface beyond it.

This raises concerns about intellectual property (IP) leakage and confidentiality when using generative AI. While companies can issue countless data usage policies and train employees on customer data hygiene, securing high-risk data at the source is the first step. No matter where your data flows, masking sensitive data (e.g., proprietary source code or customer identifiers) and encrypting data in transit help mitigate the data security risks posed by AI.
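To make “masking at the source” concrete, here is a minimal Python sketch that replaces a few sensitive values with placeholders before a prompt leaves the user’s machine. The `PATTERNS` table and `mask` function are hypothetical illustrations, not Strac’s implementation, and the regular expressions are deliberately simple assumptions; production DLP relies on far more robust detection.

```python
import re

# Hypothetical, deliberately simple detectors (illustration only).
# Order matters: more specific patterns run before broader ones.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or dashes.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def mask(text: str) -> str:
    """Replace every detected value with a [TYPE] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `mask("Contact jane.doe@example.com, SSN 123-45-6789")` returns `"Contact [EMAIL], SSN [US_SSN]"`, so the AI tool still receives a usable prompt without the raw values.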

The Strac Advantage:

Strac’s Data Loss Prevention (DLP) capabilities prevent the leakage of IP data from SaaS and AI apps by scanning (discovering), classifying, and remediating sensitive IP data, such as confidential documents and code, shared with AI websites like ChatGPT, Google Bard, Microsoft Copilot, and more. Strac DLP also protects LLM apps. See more: https://docs.strac.io/#operation/outboundProxyRedact

Strac Scanner: Detecting Sensitive Data Sent to ChatGPT
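The scan, classify, and remediate flow described above starts with classification. The sketch below is a hypothetical Python illustration (not Strac’s scanner) that tags an outgoing prompt with sensitive-data categories. It assumes a Luhn checksum to filter card-number candidates and simple markers for private keys, “confidential” labels, and source code; the category names are made up for this example, and real classifiers weigh context and confidence rather than single patterns.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers; filters out random digit runs."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


# Hypothetical detectors for illustration only.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
PRIVATE_KEY = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")
CONFIDENTIAL_LABEL = re.compile(r"\b(?:confidential|internal only)\b", re.IGNORECASE)
SOURCE_CODE = re.compile(r"^\s*(?:def |class |import |#include |public class )", re.MULTILINE)


def classify(text: str) -> set[str]:
    """Return the sensitive-data categories found in an outgoing prompt."""
    found = set()
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(text)):
        found.add("PCI")
    if PRIVATE_KEY.search(text):
        found.add("SECRET")
    if CONFIDENTIAL_LABEL.search(text):
        found.add("CONFIDENTIAL_DOC")
    if SOURCE_CODE.search(text):
        found.add("SOURCE_CODE")
    return found
```

The Luhn step is what keeps an order number like `4111 1111 1111 1112` from being flagged, while the well-known test card `4111 1111 1111 1111` is classified as PCI.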

2. Sensitive PII, PCI, and PHI Data Leakage

Companies are worried about sensitive or confidential data leaking to ChatGPT or other AI sites such as Grok and Google Bard.

The Strac Advantage:

Strac offers detection and remediation features such as blocking, alerting, and redaction to protect sensitive data shared as text or files with any AI website. You can also configure custom policies that define:

What data elements to redact
When to remediate
Who should be allowed access
Which audit reports to create
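Custom policies like these can be pictured as a small decision function. In this hypothetical Python sketch, the `Policy` fields map to the options above (data elements to redact, when to block, who is exempt, and an audit trail). The category names, the `decide` method, and the block-over-redact precedence are assumptions for illustration, not Strac’s policy model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Policy:
    # Hypothetical policy fields mirroring the options listed above.
    redact_types: set[str]  # what data elements to redact
    block_types: set[str]  # data elements that block the request outright
    exempt_users: set[str] = field(default_factory=set)  # who may send data unremediated
    audit_log: list[dict] = field(default_factory=list)  # raw records for audit reports

    def decide(self, user: str, found: set[str]) -> str:
        """Return 'allow', 'block', 'redact', or 'alert' and record the decision."""
        if user in self.exempt_users or not found:
            action = "allow"
        elif found & self.block_types:
            action = "block"  # strictest action wins
        elif found & self.redact_types:
            action = "redact"
        else:
            action = "alert"  # sensitive but not covered: let it through and notify
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "found": sorted(found),
            "action": action,
        })
        return action
```

Checking the strictest action first means a prompt containing both a credential and a card number is blocked rather than merely redacted, and every decision, including allowed ones, lands in the audit log.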

3. Tokenize or Pseudonymize Sensitive Data Before Sending It to an AI Website or LLM Provider

It is common for PII, PCI, PHI, or other confidential data to be accidentally sent to AI sites. With its tokenization and pseudonymization technology, Strac can automatically detect and tokenize sensitive data, insert the tokens into the prompt, and send the tokenized prompt to AI websites or LLMs. Strac also lets users toggle between the tokenized data and the real sensitive data when viewing it on ChatGPT or any other AI website. See the example below.

Strac tokenization and pseudonymization: sensitive data is converted to tokens and sent to ChatGPT, and a toggle switches between the tokens and the original sensitive data.
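Tokenization with a reversible toggle can be sketched as a small vault that maps each sensitive value to a random token and back. This hypothetical Python example handles only email addresses and keeps the mapping in memory; a real system would cover many data types and store the vault securely. `TokenVault`, `tokenize`, and `detokenize` are names made up for this illustration and do not reflect Strac’s API.

```python
import re
import secrets


class TokenVault:
    """Hypothetical in-memory vault: swaps email addresses for random tokens and back."""

    EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def _new_token(self) -> str:
        # Random 8-hex-digit suffix; regenerate on the (unlikely) collision.
        token = f"EMAIL_{secrets.token_hex(4)}"
        while token in self._by_token:
            token = f"EMAIL_{secrets.token_hex(4)}"
        return token

    def tokenize(self, prompt: str) -> str:
        """Replace each email with a token before the prompt is sent to the LLM."""
        def swap(match: re.Match) -> str:
            value = match.group()
            # Reuse the same token for a repeated value so the pseudonym stays consistent.
            if value not in self._by_value:
                token = self._new_token()
                self._by_value[value] = token
                self._by_token[token] = value
            return self._by_value[value]

        return self.EMAIL.sub(swap, prompt)

    def detokenize(self, text: str) -> str:
        """The 'toggle': restore original values for a user allowed to see them."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text
```

Reusing one token per value keeps the prompt coherent for the LLM (the same person stays the same pseudonym), while `detokenize` plays the role of the toggle, applied only to the response shown to an authorized user.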

DLP for ChatGPT or Any AI Site

Check out Strac DLP for ChatGPT, as well as the Strac DLP Chrome extension, which covers any website.

DLP for LLM API

Check out the Strac API to automatically block or redact sensitive data sent to LLM APIs such as OpenAI, AWS Bedrock, and more.
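For LLM APIs, the same idea applies to the request body before it is forwarded to the provider. The sketch below assumes an OpenAI-style Chat Completions payload (a `messages` list of `role`/`content` objects). The detectors, the `BlockedRequest` exception, `sanitize_chat_payload`, and the choice to redact emails but block AWS access key IDs are hypothetical, not the Strac API. Other providers, such as AWS Bedrock, use different request shapes and would need their own adapter.

```python
import copy
import re

# Hypothetical detectors for illustration only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # long-term AWS access key ID format


class BlockedRequest(Exception):
    """Raised when a request must not reach the LLM provider at all."""


def sanitize_chat_payload(payload: dict) -> dict:
    """Return a copy of an OpenAI-style chat payload with emails redacted; block on credentials."""
    clean = copy.deepcopy(payload)  # never mutate the caller's request
    for message in clean.get("messages", []):
        content = message.get("content")
        if not isinstance(content, str):
            continue  # list-style (multimodal) content would need its own handling
        if AWS_KEY_ID.search(content):
            raise BlockedRequest("credential detected in prompt")
        message["content"] = EMAIL.sub("[REDACTED_EMAIL]", content)
    return clean
```

Running this check in a proxy or wrapper in front of the provider SDK means application code does not need to change, and a blocked request never leaves your network.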

Discover & Protect Data on SaaS, Cloud, Generative AI

Strac provides end-to-end data loss prevention for all SaaS and Cloud apps. Integrate in under 10 minutes and experience the benefits of live DLP scanning, live redaction, and a fortified SaaS environment.
