Snowflake DSPM: Data Discovery & Remediation Guide

Snowflake has become the world’s data lake—transaction logs from 2020, raw JSON dumps in internal stages, "Zero-Copy Clones" of production databases created for testing, and unmasked views containing PII... all sitting somewhere in your account with unknown access.

And here’s the uncomfortable truth:

You can’t protect what you can’t see.

That’s exactly where Snowflake DSPM (Data Discovery) comes in.

This is the guide you wish existed years ago—tactical, real-world, and written specifically for the Snowflake Data Cloud environment.

TL;DR

  • Snowflake DSPM (Data Discovery) gives full visibility into sensitive data across all Snowflake assets—not just active tables, but internal stages, time-travel history, and cloned schemas.
  • Most risk comes from "Shadow Data"—Zero-Copy Clones created for dev/test and forgotten, or raw files (CSV/JSON) sitting in stages.
  • DSPM identifies what data exists (e.g., SSNs in a VARIANT column), where it lives, who (Roles/Users) has access, and how exposed it is.
  • Remediation includes applying Dynamic Masking, setting Row Access Policies, and revoking dangerous grants like PUBLIC.
  • DSPM is critical before enabling AI services like Snowflake Cortex or Copilot.
  • Strac provides automated scanning, risk scoring, RBAC mapping, and bulk remediation for Snowflake.

What Is Snowflake DSPM (Data Discovery)?

Snowflake DSPM (Data Discovery) is the process of:

  1. Discovering all sensitive data stored in Snowflake (Tables, Views, Internal Stages, Streams, Clones).
  2. Classifying it (PII, PHI, PCI, API Keys, Secrets in JSON/XML columns).
  3. Mapping access (Roles with SELECT privileges, grants to PUBLIC, Service Accounts).
  4. Assessing risk (Unmasked PII in dev environments, "Time Travel" retention of deleted secrets).
  5. Remediating exposure (Dynamic Masking, Tagging, Row-Level Security, Deletion).

In short: DSPM = Visibility + Understanding + Action
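To make the "Discover" and "Classify" steps concrete, here is a minimal sketch of a pattern-based classifier. The `PATTERNS` table and `classify` function are illustrative assumptions, not Strac's or Snowflake's detectors; production classifiers typically add checksums, context, and validation on top of regexes.

```python
import re

# Illustrative patterns only (assumption); real detectors are far stricter.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def classify(value: str) -> set[str]:
    """Return the labels of every pattern that matches the given cell value."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(value)}
```

Running `classify` over sampled values from each column gives a first-pass label set that the later steps (access mapping, risk, remediation) can build on.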

Snowflake DSPM (Data Discovery) vs. Snowflake DLP — Why You Need Both

Think of it as:

DSPM = X-ray (Scans existing petabytes of historical tables and stages at rest)

DLP = Treatment (Blocks new sensitive data from being inserted or queried in real-time)

Once DSPM uncovers where sensitive data lives (e.g., cleartext credit card numbers in a raw staging table), companies need DLP or ingestion controls to prevent new sensitive data from landing there moving forward.

👉 Learn more with our Snowflake DLP solution

This pairing creates true closed-loop protection.

Why Companies Need Snowflake DSPM (Data Discovery)

Snowflake is the default backend for modern data stacks. It stores:

  • Marketing/Sales: Customer emails, purchase history, lead lists.
  • Product: Usage logs, raw JSON events (often containing accidental PII).
  • Finance: Transaction records, bank account details.

And these problems make Snowflake high-risk:

✅ 1. The "Zero-Copy Clone" Sprawl

Snowflake’s "Zero-Copy Clone" feature allows developers to instantly copy a Production database to a Test environment without using extra storage. While convenient, this creates massive "Shadow Data" risks where real PII is effectively duplicated into lower-security environments that developers access freely.
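One way to spot clones is to compare table identifiers, sketched below. It assumes rows shaped like Snowflake's `TABLE_STORAGE_METRICS` view, where `CLONE_GROUP_ID` equals `ID` for a table that is not a clone; the `find_clones` helper itself is hypothetical.

```python
def find_clones(rows: list[dict]) -> list[str]:
    """Return fully qualified names of tables whose ID differs from their clone group ID."""
    return [
        f'{r["TABLE_CATALOG"]}.{r["TABLE_SCHEMA"]}.{r["TABLE_NAME"]}'
        for r in rows
        if r["ID"] != r["CLONE_GROUP_ID"]  # differs only when the table was cloned from another
    ]
```

The resulting list is a starting inventory; each entry still needs a content scan to confirm whether it carries real production PII.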

✅ 2. The "PUBLIC" Role Crisis

A common misconfiguration involves granting USAGE or SELECT permissions to the PUBLIC role. This effectively allows any user in your Snowflake account (including low-level analysts or temporary contractors) to query sensitive tables.
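A rough sketch of how this check could be automated: take the rows returned by `SHOW GRANTS TO ROLE PUBLIC` (assumed here to be dictionaries keyed by that command's `privilege`, `granted_on`, `name`, and `grantee_name` columns) and turn risky read grants into `REVOKE` statements for review. The `revoke_public_grants` function is a hypothetical helper, and it only builds SQL text; it does not execute anything.

```python
def revoke_public_grants(grants: list[dict]) -> list[str]:
    """Build REVOKE statements for SELECT on tables or views granted to PUBLIC."""
    statements = []
    for g in grants:
        if g["grantee_name"] != "PUBLIC":
            continue  # only the account-wide PUBLIC role is in scope here
        if g["privilege"] == "SELECT" and g["granted_on"] in ("TABLE", "VIEW"):
            statements.append(
                f'REVOKE SELECT ON {g["granted_on"]} {g["name"]} FROM ROLE PUBLIC;'
            )
    return statements
```

Generating statements rather than running them keeps a human in the loop, which matters because some PUBLIC grants are intentional.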

✅ 3. Unstructured Data in Stages

Data usually lands in an "Internal Stage" (a file storage area inside Snowflake) before being loaded into tables. These CSV, JSON, or Parquet files often contain raw, unredacted PII that bypasses table-level security controls but is still accessible via SQL queries.

✅ 4. Secrets in Semi-Structured Data (VARIANT)

Snowflake shines at handling JSON data using the VARIANT data type. However, developers often dump entire API response payloads into these columns. DSPM is required to parse inside these JSON blobs to find buried API keys or passwords that standard column naming conventions miss.
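As an illustration of what "parsing inside the JSON blob" means, the sketch below walks a decoded VARIANT value and reports string values stored under secret-looking keys. The key hints, the `find_secrets` function, and the sample payload are all assumptions made for this example; a real scanner would also inspect the values themselves.

```python
import json
import re

# Hypothetical key-name hints; extend for your own payloads.
SECRET_KEY_HINT = re.compile(r"(api[_-]?key|secret|password|token)", re.IGNORECASE)

def find_secrets(payload, path="$"):
    """Yield (json_path, value) for string leaves stored under secret-looking keys."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if isinstance(value, str) and SECRET_KEY_HINT.search(key):
                yield child, value
            else:
                yield from find_secrets(value, child)  # recurse into nested objects
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            yield from find_secrets(item, f"{path}[{i}]")

# Example: one row's VARIANT value, already fetched as a JSON string.
row = json.loads(
    '{"user": {"id": 7, "auth": {"api_key": "sk_test_123"}}, "events": [{"password": "hunter2"}]}'
)
hits = list(find_secrets(row))
```

Note that neither `user` nor `events` would look suspicious as a column name; the secrets only show up once the nested structure is traversed.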

✅ 5. Compliance & "Right to Be Forgotten"

GDPR and CCPA require you to delete customer data upon request. If that data still exists in "Time Travel" history (retained for up to 90 days, depending on edition and settings) or in forgotten clones, the deletion is incomplete and you risk non-compliance. You need an inventory that maps a customer ID to every table it populates.

✅ 6. AI Risk (Snowflake Cortex)

If you enable Snowflake Cortex (AI Search/Analyst), it may index your data for RAG (Retrieval-Augmented Generation). If your access controls are loose, a user could ask the AI, "Show me the payroll table," and the AI might retrieve data the user should never be able to see.

Historical Scanning in Snowflake DSPM

Most companies only govern new tables. The real danger lives in the terabytes of data loaded years ago.

Historical scanning answers:

  • Which 2021 backup tables contain unmasked SSNs?
  • Do we have API keys stored in staged file names (METADATA$FILENAME) or raw logs?
  • Are there "temporary" schemas from last year's migration that still exist?
  • Is that "Dev_Clone" actually a full copy of Production?

Historical scanning must cover:

  • Tables & Views (including VARIANT columns)
  • Internal Stages (raw files waiting to be loaded)
  • Zero-Copy Clones
  • Time Travel & Fail-safe Storage

Without historical scanning, you’re blind to most of your data risk.

Access Visibility: Who Can See Your Data?

Finding the data is only half the story. You must know: Who has the Role to query it?

Snowflake DSPM identifies:

  • Over-Privileged Roles: Users granted ACCOUNTADMIN or SYSADMIN who don't need those roles.
  • "Public" Grants: Tables readable by the system-wide PUBLIC pseudo-role.
  • Service Accounts: Non-human users (e.g., Looker, Tableau) that have broader access than necessary.
  • Future Grants: Schemas configured to automatically grant access to future tables, silently expanding risk exposure.

This is the difference between "This table contains 1M emails" and "This table contains 1M emails and is queryable by the Summer Intern role."

Only the second is an immediate emergency.
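The same idea can be expressed as a tiny prioritization rule that combines what a table contains with who can query it. This is a sketch under stated assumptions: the `Finding` shape, the `BROAD_ROLES` set (including the `SUMMER_INTERN` role name), and the three priority labels are hypothetical, not a scoring model from any product.

```python
from dataclasses import dataclass

# Hypothetical roles considered "too broad" for sensitive data.
BROAD_ROLES = {"PUBLIC", "SUMMER_INTERN"}

@dataclass
class Finding:
    table: str
    sensitive_rows: int
    roles_with_select: set

def priority(finding: Finding) -> str:
    """Rank a finding by combining data sensitivity with access exposure."""
    if finding.sensitive_rows == 0:
        return "none"
    if finding.roles_with_select & BROAD_ROLES:
        return "emergency"  # sensitive data readable by a broad role
    return "review"  # sensitive data, but access looks contained
```

The point of the sketch is that neither input alone decides the outcome; a table only escalates when sensitivity and exposure overlap.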

✨Remediation in Strac Snowflake DSPM

Visibility without action is useless. Strac allows you to fix Snowflake risks instantly.

  • Tagging & Classification: Automatically apply Snowflake Object Tags (e.g., confidentiality=high) to columns. These tags can trigger Snowflake's native Masking Policies automatically.
  • Dynamic Masking: Enforce data masking policies (e.g., MASK_SSN) that redact data based on the user's role, showing plain text to HR but ***-**-**** to analysts.
  • Revoking Dangerous Grants: One-click remediation to revoke PUBLIC access or strip excessive permissions from specific roles.
  • Row Access Policies: Implement Row-Level Security (RLS) to ensure users can only see rows relevant to their region or department.
  • Cleaning Up Clones: Identify and drop stale Zero-Copy Clones that haven't been queried in months but still pose a security risk.
  • Bulk Remediation: Apply masking policies to hundreds of columns across different databases in one action.
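For readers who want to see what Dynamic Masking looks like underneath, here is a hedged sketch that builds the Snowflake SQL for a role-based masking policy and attaches it to a column. The helper names, the `HR` role, and the table and column names are assumptions for illustration; the functions only produce SQL text and interpolate identifiers directly, so they should only be fed trusted input.

```python
def masking_policy_sql(policy: str, allowed_roles: list[str], mask: str = "***-**-****") -> str:
    """Build a CREATE MASKING POLICY statement that reveals values only to allowed roles."""
    roles = ", ".join(f"'{r}'" for r in allowed_roles)
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val ELSE '{mask}' END;"
    )

def apply_policy_sql(table: str, column: str, policy: str) -> str:
    """Build the ALTER TABLE statement that attaches the policy to one column."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};"
```

Bulk remediation is then a loop: call `apply_policy_sql` once per flagged column and review the generated statements before running them.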


How Snowflake DSPM Protects Against AI & Cortex Risk

AI services like Snowflake Cortex are powerful, but they operate on the data you give them access to.

When you use Cortex for RAG (Retrieval-Augmented Generation), you risk:

✅ AI RISK #1: RAG Leaks via "Owner's Rights"

In some configurations, AI services might run with elevated privileges ("Owner's Rights"), bypassing the row-level security policies meant for the user ("Caller's Rights"). This could allow the AI to summarize data the user shouldn't see.

✅ AI RISK #2: Inference on Sensitive Clones

If you point Cortex or a custom LLM at a "Dev" clone that contains unmasked production data, the model can surface that sensitive information in its responses.

✅ Snowflake DSPM is Step Zero for AI

Before enabling Snowflake Cortex:

  1. Scan your tables and stages.
  2. Identify sensitive columns.
  3. Remediate (mask PII, drop old clones).
  4. Create a trusted, governed dataset for the AI to access.
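The checklist above can be turned into a simple pre-flight gate, sketched here. The column record shape (`name`, `sensitive`, `masking_policy`) and the `unmasked_sensitive_columns` function are hypothetical; in practice the inputs would come from your scan results and policy inventory.

```python
def unmasked_sensitive_columns(columns: list[dict]) -> list[str]:
    """Return sensitive columns that still have no masking policy attached."""
    return [
        c["name"]
        for c in columns
        if c["sensitive"] and not c.get("masking_policy")
    ]
```

An empty result is a reasonable minimum bar before pointing an AI service at the dataset; any name in the list is a column to mask or exclude first.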

✨How Strac Solves Snowflake DSPM (Data Discovery)

Strac provides a unified Data Security Platform for Snowflake:

  • Coverage: Tables, Views, Materialized Views, Internal Stages, Streams.
  • Detection: PII, PHI, PCI, API Keys, Secrets, IP, Custom Regex.
  • Deep Scanning: Parses JSON/XML in VARIANT columns and unstructured files in Stages.
  • Real-Time & Historical: Scans existing data warehouses and monitors for new schema changes.
  • Compliance: Maps findings to SOC2, HIPAA, PCI-DSS, GDPR, NIST.
  • Remediation: Auto-Tagging, Dynamic Masking, Access Revocation.

🔗 Explore Strac's Snowflake Integrations


🌶️ Spicy FAQs on Snowflake DSPM

Doesn't Snowflake's "Governance" tab do this?

Snowflake's native governance features (like Horizon) are excellent frameworks, but they often require manual configuration of classifiers and policies. Classification also consumes compute credits, which can get expensive at scale. Strac provides an automated, external "auditor" view with pre-built detectors and remediation workflows that often work faster and more cost-effectively.

Can Strac find secrets in VARIANT (JSON) columns?

Yes. This is a critical differentiator. Many tools only look at column names. Strac scans the actual data values inside semi-structured JSON columns to find hidden API keys or PII.

Does this handle "Zero-Copy Clones"?

Yes. Strac identifies clones and treats them as distinct data assets, ensuring you know if a "Dev" clone is actually holding unmasked "Production" data.

Does this help with HIPAA/SOC2 compliance?

Absolutely. Auditors require an up-to-date inventory of where PHI resides. Strac provides the automated mapping and evidence that you are managing access to sensitive health data in your warehouse.

