Azure DSPM (Data Discovery): Scan Blob Storage & SQL

Microsoft Azure is the backbone of the modern enterprise. But it often resembles a sprawling digital attic: Blob Storage containers from 2019, unmanaged Azure SQL databases, Cosmos DB collections with raw customer JSON, and unencrypted snapshots sitting in forgotten Resource Groups.

And here’s the uncomfortable truth: You can’t protect what you can’t see.

That’s exactly where Azure DSPM (Data Discovery) comes in.

This is the guide you wish existed years ago—tactical, real-world, and written specifically for the Azure cloud environment.

TL;DR

Azure DSPM (Data Discovery) provides visibility into sensitive data across Azure data stores such as Blob Storage, Azure SQL, Cosmos DB, and Synapse.
Most risk comes from "SAS Token Hell" (over-privileged shared access signatures) and forgotten storage accounts with public access.
DSPM identifies what data exists, where it lives, who (Entra ID users/Service Principals) has access, and how exposed it is.
Remediation includes automated tagging, redaction, key rotation, and enforcing private endpoints.
DSPM is a prerequisite for safe AI adoption with Azure OpenAI or Copilot.
Strac provides automated scanning, risk scoring, and remediation for Azure in a unified pane of glass.

✨What Is Azure DSPM (Data Discovery)?

Azure DSPM (Data Security Posture Management) is the process of:

Discovering sensitive data across Azure Storage (Blob, Files), Databases (SQL, Cosmos DB), and Analytics (Synapse).
Classifying it (PII, PHI, PCI, Secrets, Intellectual Property).
Mapping access (Which Entra ID users, Guest Accounts, or SAS tokens can read this data?).
Assessing risk (Is this Blob container public? Does the SAS token stay valid for years?).
Remediating exposure (Redaction, Encryption, Access Revocation).

In short: DSPM = Visibility + Understanding + Action

Azure DSPM (Data Discovery) vs. Azure Information Protection (AIP)

Think of it as:

✅ DSPM = The Radar (Finds sensitive data everywhere, including unmanaged JSON files, logs, and shadow databases).

✅ AIP/MIP = The Label (Great for Office docs, but often struggles with raw data in Cosmos DB or unstructured logs).

You need DSPM to find the risks that AIP labels miss.

Why Companies Need Azure DSPM (Data Discovery)

Azure is the default cloud for many enterprises. It stores:

Blob Storage: Corporate archives, legal documents, massive data lakes.
Azure SQL: Core business transaction data.
Cosmos DB: Real-time user profiles and session data.

And these problems make Azure high-risk:

✅ 1. The "SAS Token" Nightmare: Azure uses Shared Access Signatures (SAS) to delegate access. Developers often create SAS tokens with read/write permissions, set them to expire in 10 years, and hardcode them into applications. If that token leaks, your data is gone: anyone holding it can use those permissions until the token expires or its signing key is rotated.

✅ 2. "Shadow" Resource Groups: DevOps teams spin up new Resource Groups for "POCs" and forget to delete them. These environments often lack the policies of production but contain real customer data.

✅ 3. Public Blob Containers: A single setting, the container's public access level, can make a Blob container readable by anyone on the internet (as long as the storage account allows anonymous access). Without continuous scanning, a developer debugging an issue might accidentally expose terabytes of data.

✅ 4. Guest Access (Entra ID): Azure makes it easy to invite external "Guest Users" (vendors, partners). DSPM answers the question: "Does that vendor from 2021 still have read-access to our financial backups?"

✅ 5. Compliance & Sovereignty: GDPR restricts transfers of personal data outside the European Economic Area, so data residency matters. Do you know if your East US region accidentally contains data from your German customers?

✅ 6. AI Risk (Azure OpenAI): If you connect Azure OpenAI to your data lake for RAG (Retrieval-Augmented Generation), the retrieval index can pick up everything in the connected stores. If your Blob Storage contains executive payroll data, the AI becomes a leakage vector.
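Item 1 is the easiest of these to check mechanically, because a SAS token carries its own permissions (`sp`), expiry (`se`), and allowed protocols (`spr`) as query parameters. The sketch below is a local heuristic over those documented fields, not Strac's or Microsoft's implementation: the 7-day threshold, the set of letters treated as write-class, and the `audit_sas` name are assumptions. It also cannot tell whether the signing key has been rotated since the token was issued.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional
from urllib.parse import parse_qs, urlsplit

# Permission letters that let the holder change or destroy data
# (add, create, write, delete, delete-version). Assumed policy, not Azure guidance.
WRITE_PERMS = set("acwdx")


def _parse_expiry(value: str) -> datetime:
    # 'se' is ISO 8601 in UTC; replace a trailing 'Z' so older Pythons can parse it.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def audit_sas(url_or_query: str, max_days: int = 7,
              now: Optional[datetime] = None) -> List[str]:
    """Return human-readable risk flags for one SAS URL or query string."""
    now = now or datetime.now(timezone.utc)
    query = urlsplit(url_or_query).query if "?" in url_or_query else url_or_query
    params = {k: v[0] for k, v in parse_qs(query).items()}  # also URL-decodes values
    issues: List[str] = []

    perms = set(params.get("sp", ""))
    if perms & WRITE_PERMS:
        issues.append("write/delete permissions: " + "".join(sorted(perms & WRITE_PERMS)))
    if "l" in perms:
        issues.append("can list contents")

    if "se" in params:
        lifetime = _parse_expiry(params["se"]) - now
        if lifetime > timedelta(days=max_days):
            issues.append(f"expires in {lifetime.days} days")
    else:
        # The expiry may live in a stored access policy instead of the token.
        issues.append("no expiry in token (check stored access policy)")

    # When 'spr' is omitted, Azure permits both HTTPS and HTTP.
    if "http" in params.get("spr", "https,http").split(","):
        issues.append("allows plain HTTP")
    return issues


if __name__ == "__main__":
    leaked = ("https://acct.blob.core.windows.net/backups/db.bak"
              "?sv=2022-11-02&sr=b&sp=rwdl&se=2034-01-01T00%3A00%3A00Z&sig=REDACTED")
    for issue in audit_sas(leaked):
        print(issue)
```

A token that trips every check (write access, a decade-long lifetime, HTTP allowed) is a strong candidate for immediate key rotation.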

Historical Scanning in Azure DSPM

Many native tools are event-driven: they react to new activity, such as a file upload. They miss the petabytes of data that have been sitting there for years.

Historical scanning answers:

Which Blob container holds the unencrypted database export from 2020?
Are there hardcoded secrets in our old Azure Function logs?
Did we leave unmasked PII in a Cosmos DB "Dev" collection?
Is that "Archive" storage account publicly readable?

Historical scanning must cover:

✅ Azure Blob Storage (Hot, Cool, and Archive tiers)

✅ Azure SQL Database & Managed Instances

✅ Cosmos DB (NoSQL)

✅ Azure Files & NetApp Files

✅ Disk Snapshots

Without historical scanning, you’re blind to most of your cloud risk: the data that was already sitting there before monitoring began.
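To make "historical" concrete, the scan walks every stored object, whatever its age, instead of waiting for an upload event. This is a minimal, SDK-agnostic sketch. `StoredObject`, `Finding`, and `scan_all` are hypothetical names, and the two regex detectors (SSN-like strings and Azure Storage account keys in connection strings) stand in for a real classification engine. In practice the object stream could be fed from the `azure-storage-blob` package, whose `ContainerClient.list_blobs()` yields blob properties including `name`, `last_modified`, and `blob_tier`. That wiring is left out so the example runs offline.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass
class StoredObject:
    # Hypothetical, SDK-agnostic view of one blob or file.
    name: str
    last_modified: datetime
    tier: str       # e.g. "Hot", "Cool", "Archive"
    content: bytes


@dataclass
class Finding:
    name: str
    last_modified: datetime
    kind: str


# Deliberately narrow example detectors.
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")
# Azure Storage connection strings carry the account key after "AccountKey=".
ACCOUNT_KEY_RE = re.compile(rb"AccountKey=[A-Za-z0-9+/=]{20,}")


def scan_all(objects: Iterable[StoredObject]) -> List[Finding]:
    """Scan every object regardless of age; no upload event is required."""
    findings: List[Finding] = []
    for obj in objects:
        if obj.tier.lower() == "archive":
            # Archive-tier blobs are offline: they must be rehydrated before
            # they can be read, so record them instead of silently skipping.
            findings.append(Finding(obj.name, obj.last_modified, "unscanned: archive tier"))
            continue
        if SSN_RE.search(obj.content):
            findings.append(Finding(obj.name, obj.last_modified, "pii: ssn-like"))
        if ACCOUNT_KEY_RE.search(obj.content):
            findings.append(Finding(obj.name, obj.last_modified, "secret: storage account key"))
    # Oldest first: long-forgotten data is usually the least governed.
    findings.sort(key=lambda f: f.last_modified)
    return findings


if __name__ == "__main__":
    old_export = StoredObject(
        "exports/2020/users.csv",
        datetime(2020, 3, 1, tzinfo=timezone.utc),
        "Cool",
        b"name,ssn\nAda,123-45-6789\n",
    )
    for f in scan_all([old_export]):
        print(f.last_modified.date(), f.name, f.kind)
```

Recording Archive-tier blobs as unscanned, rather than skipping them, keeps the coverage gap visible; reading them requires rehydrating them to an online tier first, which takes time and costs money.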

Access Visibility: Who Can See Your Data?

Finding the data is only half the story. You must know: Who has the permission to read it?

Azure DSPM identifies:

Public Exposure: Containers allowing "Blob" or "Container" level public access.
Toxic SAS Tokens: Tokens with excessive permissions or long expiration dates.
Over-Privileged Identities: Service Principals with Storage Blob Data Owner roles on sensitive accounts.
Cross-Tenant Access: Data shared with external Azure tenants.
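The Public Exposure check depends on two settings lining up: the storage account must allow anonymous (public) blob access, and the container's public access level must be Blob or Container. A minimal sketch of that logic follows, assuming you already have an inventory of both settings; the `StorageAccount` and `Container` dataclasses are hypothetical stand-ins, not Azure SDK types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PublicAccess(Enum):
    """Container-level public access levels, mirroring Azure's three options."""
    PRIVATE = "private"      # no anonymous access
    BLOB = "blob"            # anonymous read of blobs whose URL is known (no listing)
    CONTAINER = "container"  # anonymous read of blobs plus anonymous listing


@dataclass
class StorageAccount:
    name: str
    allow_blob_public_access: bool  # account-level "allow anonymous access" switch


@dataclass
class Container:
    name: str
    public_access: PublicAccess


def effective_anonymous_access(account: StorageAccount, container: Container) -> Optional[str]:
    """Describe what an anonymous caller can do with this container, or None."""
    # The account-level setting overrides the container level: when it is off,
    # anonymous requests are rejected whatever the container says.
    if not account.allow_blob_public_access:
        return None
    if container.public_access is PublicAccess.CONTAINER:
        return "anonymous read + list"
    if container.public_access is PublicAccess.BLOB:
        return "anonymous read (URL must be known)"
    return None


if __name__ == "__main__":
    acct = StorageAccount("archive2019", allow_blob_public_access=True)
    print(effective_anonymous_access(acct, Container("legal-exports", PublicAccess.CONTAINER)))
```

Because the account-level setting overrides every container in the account, turning it off is the fastest blanket fix when an account should never serve anonymous traffic.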

This is the difference between:

"We have financial records in Azure" and "We have financial records in a Blob container that allows anonymous public read access."

Only the second is an immediate emergency.
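One simple way to encode that distinction is to score each finding on two axes, how sensitive the data is and how exposed it is, and multiply them so that only the combination ranks high. The categories, weights, and the `DataAsset`/`triage` names below are illustrative assumptions, not a standard or Strac's scoring model.

```python
from dataclasses import dataclass
from typing import List

# Illustrative weights only; calibrate them to your own risk model.
SENSITIVITY = {"none": 0, "internal": 1, "pii": 3, "financial": 4, "secret": 5}
EXPOSURE = {"private": 1, "broad_internal": 2, "guest_or_sas": 3, "anonymous_public": 5}


@dataclass
class DataAsset:
    location: str
    sensitivity: str  # key into SENSITIVITY
    exposure: str     # key into EXPOSURE

    @property
    def score(self) -> int:
        # Multiplicative, so only "sensitive AND exposed" ranks high.
        return SENSITIVITY[self.sensitivity] * EXPOSURE[self.exposure]


def triage(assets: List[DataAsset]) -> List[DataAsset]:
    """Highest-risk assets first."""
    return sorted(assets, key=lambda a: a.score, reverse=True)


if __name__ == "__main__":
    ranked = triage([
        DataAsset("sql/finance-db", "financial", "private"),
        DataAsset("blob/public-exports", "financial", "anonymous_public"),
        DataAsset("blob/marketing-images", "none", "anonymous_public"),
    ])
    for asset in ranked:
        print(asset.score, asset.location)
```

Multiplying rather than adding is the key design choice: a public container of marketing images and a locked-down finance database both score low, while financial data behind anonymous access rises to the top.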

✨Remediation in Strac Azure DSPM

Visibility without action is useless. Strac allows you to fix Azure risks instantly.

✅ Auto-Tagging: Automatically apply Azure Tags (e.g., Confidentiality: High, Compliance: HIPAA) to resources. Downstream policies, such as Azure Policy rules that evaluate tags, can then act on them.

✅ Redaction: Strac can physically redact sensitive values inside files (CSV, JSON, text) stored in Blob Storage. For example, it replaces a credit card number with ****-****-****-1234 directly in the object.

✅ Fixing Public Access: Instantly revoke public access on misconfigured Storage Accounts.

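✅ Encryption Enforcement: Azure Storage encrypts data at rest with Microsoft-managed keys by default, so the real gaps are stores that lack the stronger controls your policy requires. Identify them and trigger workflows to enable Infrastructure Encryption or Customer-Managed Keys (CMK).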

✅ Least-Privilege Cleanup: Identify Entra ID users who have high-level access to sensitive data but haven't signed in for 90 days, and suggest revoking access.
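The Least-Privilege Cleanup rule above, and the stale-guest question from earlier, can be sketched as a filter over role assignments. The record fields are modeled loosely on data Microsoft Graph exposes for users (`userType` of Member or Guest, and `signInActivity.lastSignInDateTime`), but the `Assignment` type and `stale_access` function are hypothetical. The sketch only considers the three built-in Storage Blob Data roles; control-plane roles that can list account keys (such as Contributor) also grant effective data access and would need to be added in practice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Built-in data-plane roles that grant read (or more) on blob contents.
SENSITIVE_ROLES = {
    "Storage Blob Data Owner",
    "Storage Blob Data Contributor",
    "Storage Blob Data Reader",
}


@dataclass
class Assignment:
    principal: str
    user_type: str                    # "Member" or "Guest", as in Entra ID
    role: str
    scope: str
    last_sign_in: Optional[datetime]  # None = never signed in / unknown


def stale_access(assignments: List[Assignment], days: int = 90,
                 now: Optional[datetime] = None) -> List[Assignment]:
    """Sensitive-role assignments whose principal hasn't signed in within `days`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    flagged = [
        a for a in assignments
        if a.role in SENSITIVE_ROLES
        and (a.last_sign_in is None or a.last_sign_in < cutoff)
    ]
    # Guests first (False sorts before True); the sort is stable otherwise.
    flagged.sort(key=lambda a: a.user_type != "Guest")
    return flagged


if __name__ == "__main__":
    vendor = Assignment("vendor@partner.example", "Guest", "Storage Blob Data Reader",
                        "finance-backups", datetime(2021, 5, 1, tzinfo=timezone.utc))
    for a in stale_access([vendor]):
        print(a.principal, a.role, a.scope)
```

Listing guests first reflects the scenario from item 4: an inactive vendor account with read access to financial backups is the classic leftover worth revoking first.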

How Azure DSPM Protects Against AI & GenAI Risk

Microsoft is leading the AI charge with Copilot and Azure OpenAI Service.

When you connect your Azure data estate to these tools, you risk:

✅ AI RISK #1: The "Copilot" Leak: Microsoft 365 Copilot respects user permissions. But if your permissions are messy (e.g., "Everyone" has read access to a sensitive document), Copilot will surface that sensitive data to anyone who asks.

✅ AI RISK #2: Model Grounding: When building custom AI apps using Azure OpenAI, you often "ground" the model in your data (RAG). If that data includes secrets or PII, the AI will confidently recite them to users.

✅ Azure DSPM is Step Zero for AI: Before enabling Copilot or Azure OpenAI:

Scan your data sources (Blob, SQL, SharePoint).
Clean (Redact/Delete) toxic data.
Certify the dataset as "AI-Ready."
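The scan, clean, and certify steps can be sketched as a pre-indexing filter that runs before any document reaches an embedding or search index. This is an illustration under narrow assumptions: it detects only Luhn-valid card numbers (masked to the ****-****-****-1234 format described in the Redaction step), Azure Storage account keys, and PEM private keys. The `Doc` type and `certify_for_ai` function are hypothetical names, not part of Azure OpenAI or Strac.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# 13-19 digits, optionally separated by single spaces or dashes; must end on a digit.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Azure Storage account keys in connection strings, and PEM private key headers.
SECRET_RE = re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}|-----BEGIN [A-Z ]*PRIVATE KEY-----")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to avoid masking order numbers and other digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _mask(match) -> str:
    digits = re.sub(r"\D", "", match.group())
    if not luhn_ok(digits):
        return match.group()  # not a plausible card number: leave it alone
    return "****-****-****-" + digits[-4:]


@dataclass
class Doc:
    doc_id: str
    text: str


def certify_for_ai(docs: List[Doc]) -> Tuple[List[Doc], List[str]]:
    """Return (AI-ready documents, ids of documents excluded from indexing)."""
    ready: List[Doc] = []
    excluded: List[str] = []
    for doc in docs:
        # Scan: any secret disqualifies the whole document.
        if SECRET_RE.search(doc.text):
            excluded.append(doc.doc_id)
            continue
        # Clean: mask card numbers in place, keeping the last four digits.
        ready.append(Doc(doc.doc_id, CARD_RE.sub(_mask, doc.text)))
    # Certify: only `ready` should be handed to the indexing pipeline.
    return ready, excluded


if __name__ == "__main__":
    ready, excluded = certify_for_ai([Doc("invoice-1", "Paid with 4111 1111 1111 1111.")])
    print(ready[0].text, excluded)
```

Documents containing secrets are excluded outright rather than redacted, because a leaked credential should be rotated at the source, while masked card numbers leave the rest of a document usable for grounding.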

✨How Strac Solves Azure DSPM (Data Discovery)

Strac provides a unified Data Security Platform for the Multi-Cloud era:

Coverage: Azure Blob, Azure SQL, Cosmos DB, Azure Files, Azure Database for MySQL/PostgreSQL.
Detection: PII, PHI, PCI, API Keys, SAS Tokens, Secrets, IP.
OCR: Scans images (IDs, scanned contracts) and PDFs in Blob storage.
Real-Time & Historical: Scans existing "dark data" and monitors new data streams.
Compliance: Maps findings to SOC 2, HIPAA, PCI-DSS, GDPR, NIST.
Remediation: Redact, Label, Encrypt, Block Access.


🌶️ Spicy FAQs on Azure DSPM

Why can't I just use Microsoft Purview?

Purview is a powerful governance tool, but it can be complex to deploy and expensive at scale. More importantly, Purview's data discovery is built primarily around cataloging and governance. Strac focuses on remediation (like redaction) and provides a unified view if you also use AWS, Google Cloud, or SaaS apps (Slack, Jira). You don't want policies fragmented across clouds.

Does this replace Microsoft Defender for Cloud?

No. Defender for Cloud is a CSPM/CWPP tool (Cloud Security Posture Management / Cloud Workload Protection Platform). It protects the infrastructure (e.g., "Is port 3389 open? Is the OS patched?"). Strac DSPM protects the data inside (e.g., "Does this SQL table contain unencrypted SSNs?"). You need both.

Can Strac find secrets in Azure DevOps?

Yes. Strac integrates with developer environments to find hardcoded credentials, ensuring they don't leak into production builds.

Does Strac move my data out of Azure to scan it?

Strac is architected with privacy in mind. We use ephemeral scanning where data is processed in memory and never stored on our servers. We give you the verdict (Risk/No Risk), not the data custody headache.
