AWS DSPM (Data Discovery)

Scan, Label, and Remediate Sensitive Data in S3, RDS, and DynamoDB

AWS has quietly become the biggest data store most companies own: S3 buckets from 2018, unmanaged RDS instances, DynamoDB tables with customer logs, EBS snapshots, and CloudWatch logs containing secrets, all sitting somewhere in your account with unknown access.

And here’s the uncomfortable truth:

You can’t protect what you can’t see.

That’s exactly where AWS DSPM (Data Discovery) comes in.

This is the guide you wish existed years ago—tactical, real-world, and written specifically for the AWS cloud environment.

TL;DR

  1. AWS DSPM (Data Discovery) gives full visibility into sensitive data across all AWS data stores—not just S3, but RDS, DynamoDB, and logs.
  2. Most risk comes from "Shadow Data"—forgotten S3 buckets, unencrypted database snapshots, and public-read ACLs.
  3. DSPM identifies what data exists, where it lives, who (IAM roles/users) has access, and how exposed it is.
  4. Remediation includes tagging, encrypting, blocking public access, and redacting sensitive data.
  5. DSPM is critical before rolling out AI services like Amazon Bedrock or Amazon SageMaker.
  6. Strac provides automated scanning, risk scoring, IAM access mapping, and bulk remediation for AWS.

✨What Is AWS DSPM (Data Discovery)?

AWS DSPM (Data Discovery) is the process of:

  • Discovering all sensitive data stored in AWS (S3, RDS, DynamoDB, Redshift, EBS).
  • Classifying it (PII, PHI, PCI, API Keys, Secrets, IP).
  • Mapping access (Public buckets, Cross-account roles, "Any Authenticated User").
  • Assessing risk (Unencrypted at rest, public exposure).
  • Remediating exposure (Redaction, Encryption, Permissions reset).

In short:

DSPM = Visibility + Understanding + Action
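
To make the "Classifying" step concrete, here is a minimal sketch of pattern-based detection. The pattern set, the label names, and the classify function are illustrative assumptions, not Strac's detection engine; a real classifier needs far more coverage and context checks.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}
# 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> dict[str, int]:
    """Return a count of matches per sensitive-data label found in text."""
    findings = {}
    for label, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            findings[label] = count
    cards = 0
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            cards += 1
    if cards:
        findings["PCI_CARD_NUMBER"] = cards
    return findings
```

The Luhn check is there to cut false positives: plenty of 16-digit strings in logs are order IDs, not card numbers.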

AWS DSPM (Data Discovery) vs. AWS DLP — Why You Need Both

Think of it as:

✅ DSPM = X-ray (Scans existing data at rest in S3/RDS)

✅ DLP = Treatment (Blocks new data from leaving or entering)

Once DSPM uncovers where sensitive data lives (e.g., a credit card number in an old S3 log file), companies need DLP to stop new sensitive data from being uploaded or exfiltrated going forward.

👉 Learn more with our AWS S3 DLP solution

This pairing creates true closed-loop protection.

Why Companies Need AWS DSPM (Data Discovery)

AWS is the default backend for modern applications. It stores:

  • S3: Customer documents, KYC images, backups.
  • RDS/DynamoDB: User profiles, transaction histories.
  • CloudWatch: Application logs (often leaking API keys).

And these problems make AWS high-risk:

✅ 1. The S3 "Public Bucket" Crisis

A single misconfigured bucket policy or ACL can turn "Private" into "Public Internet." Misconfigurations like this are among the most common causes of cloud data breaches.
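
As a rough illustration of what a "public" bucket policy looks like, the sketch below flags Allow statements whose principal is the wildcard "*". The function name is hypothetical, and treating any Condition block as "not public" is a deliberate simplification; AWS's own public-access evaluation is more nuanced.

```python
import json


def public_statements(policy_json: str) -> list[dict]:
    """Return Allow statements that grant access to everyone ("*")."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]

    public = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # Principal is either the string "*" or a mapping such as {"AWS": ...}.
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        values = aws if isinstance(aws, list) else [aws]
        # Simplification (assumption): any Condition is treated as restricting access.
        if "*" in values and not stmt.get("Condition"):
            public.append(stmt)
    return public
```

A policy text fetched from the bucket could be passed straight in; an empty result means no unconditioned wildcard grant was found, not that the bucket is safe, since ACLs are a separate path to exposure.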

✅ 2. Shadow Data & Snapshots

DevOps teams spin up RDS instances for testing and forget them. They take EBS snapshots and leave them unencrypted. This data sits dormant but dangerous.

✅ 3. Log Leakage (CloudWatch)

Developers often accidentally log JSON bodies containing passwords, API keys, or PII into CloudWatch logs. Without DSPM, that data sits in plaintext, searchable by anyone with access to the log group, and nobody knows it is there.
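
Here is a small sketch of what scrubbing a leaked log line could look like. The key names, the regular expressions, and the redact_log_line helper are assumptions chosen for illustration; real logs need a much broader rule set.

```python
import re

# Field names treated as sensitive (an illustrative, incomplete list).
SENSITIVE_KEYS = r"(?:password|passwd|secret|api[_-]?key|token)"
# Matches "password": "value" style JSON fields, capturing everything but the value.
JSON_FIELD = re.compile(rf'("{SENSITIVE_KEYS}"\s*:\s*")[^"]*(")', re.IGNORECASE)
# AWS access key IDs start with AKIA (long-term) or ASIA (temporary).
ACCESS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def redact_log_line(line: str) -> str:
    """Mask sensitive JSON field values and AWS access key IDs in one log line."""
    line = JSON_FIELD.sub(r"\1[REDACTED]\2", line)
    return ACCESS_KEY.sub("[REDACTED_AWS_KEY]", line)
```

For example, a line like POST /login {"user": "ana", "password": "hunter2"} keeps its structure but loses the password value, so the log stays useful for debugging.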

✅ 4. Cross-Account Access

Data might be secure in your account, but if an IAM role allows access from a vendor's AWS account (or a developer's personal account), your perimeter is broken.
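
One way to spot this is to pull the account IDs out of a resource policy's principals and compare them with your own. The sketch below assumes the policy is already parsed into a dict and only looks at the "AWS" principal type; the external_accounts name is hypothetical.

```python
import re

# Accepts a bare 12-digit account ID or an IAM ARN such as
# arn:aws:iam::444455556666:role/vendor
ACCOUNT_ID = re.compile(r"^(?:arn:aws:iam::)?(\d{12})(?::.*)?$")


def external_accounts(policy: dict, own_account_id: str) -> set[str]:
    """Return account IDs, other than our own, that Allow statements grant access to."""
    found = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        for value in ([aws] if isinstance(aws, str) else aws):
            match = ACCOUNT_ID.match(value)
            if match and match.group(1) != own_account_id:
                found.add(match.group(1))
    return found
```

Each ID returned is a question for the data owner: is this a known vendor, or someone's personal account?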

✅ 5. Compliance Gaps

SOC 2, HIPAA, PCI DSS, and GDPR all expect you to know where regulated data such as PII lives, down to the specific S3 bucket, and to prove that access is restricted.

✅ 6. AI Risk (Bedrock & SageMaker)

If your S3 data lake is fed into Amazon Bedrock for RAG (Retrieval-Augmented Generation), any sensitive data inside becomes part of the AI's knowledge base.

Historical Scanning in AWS DSPM

Most companies only monitor new objects. The real danger lives in the petabytes of data from years ago.

Historical scanning answers:

  • Which S3 buckets contain unencrypted PII?
  • Do we have secrets stored in DynamoDB tables?
  • Are there old RDS snapshots with real customer data?
  • Is that "test" bucket actually public?

Historical scanning must cover:

✅ S3 Buckets (including archived/Glacier objects)

✅ RDS & Aurora (Databases)

✅ DynamoDB Tables

✅ CloudWatch Logs

✅ EBS Volumes

Without historical scanning, you’re blind to most of your cloud risk.
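
A historical S3 scan usually starts with an inventory pass. The sketch below assumes a boto3-style S3 client is passed in and uses its list_objects_v2 paginator; the function name and the cutoff date are arbitrary choices for illustration. Objects in the GLACIER and DEEP_ARCHIVE storage classes are set aside because they must be restored before their contents can be read.

```python
from datetime import datetime, timezone

ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}
# Arbitrary example cutoff; it must be timezone-aware to compare with LastModified.
DEFAULT_CUTOFF = datetime(2022, 1, 1, tzinfo=timezone.utc)


def plan_historical_scan(s3_client, bucket: str, cutoff: datetime = DEFAULT_CUTOFF):
    """Split objects older than cutoff into readable-now and needs-restore lists."""
    scan_now, needs_restore = [], []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):  # empty buckets return no "Contents"
            if obj["LastModified"] >= cutoff:
                continue  # recent objects are handled by ongoing monitoring
            if obj.get("StorageClass") in ARCHIVE_CLASSES:
                needs_restore.append(obj["Key"])
            else:
                scan_now.append(obj["Key"])
    return scan_now, needs_restore
```

Passing the client in as a parameter keeps the function easy to test with a stub and leaves credentials and region handling to the caller.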

Access Visibility: Who Can See Your Data?

Finding the data is only half the story. You must also know: which IAM users and roles can actually read it?

AWS DSPM identifies:

  • Publicly Accessible Buckets: Read/write access granted to "Everyone."
  • Cross-Account Access: Buckets shared with external AWS account IDs.
  • Over-Permissioned Roles: Users and roles with s3:* or dynamodb:* permissions on sensitive resources.
  • Unencrypted Resources: Data stored without KMS keys.

This is the difference between:

"This bucket contains 10,000 SSNs."

and

"This bucket contains 10,000 SSNs and is readable by any AWS user."

Only the second is an immediate emergency.
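
The SSN example above boils down to combining content findings with exposure. A minimal sketch of that idea follows; the Finding fields and the four risk levels are assumptions for illustration, not a scoring model taken from Strac.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource: str
    sensitive_records: int
    public: bool = False
    cross_account: bool = False
    encrypted: bool = True


def risk_level(finding: Finding) -> str:
    """Rank a finding by how exposed its sensitive data is (illustrative rules)."""
    if finding.sensitive_records == 0:
        return "none"
    if finding.public:
        return "critical"  # sensitive data readable by anyone
    if finding.cross_account or not finding.encrypted:
        return "high"
    return "medium"  # sensitive, but private and encrypted
```

With these rules, a private bucket holding 10,000 SSNs rates "medium", and the same bucket made public rates "critical".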

✨ Remediation in Strac AWS DSPM

Visibility without action is useless. Strac allows you to fix AWS risks instantly.

✅ Tagging & Labeling

Automatically tag S3 objects or RDS instances as Confidential, PII, or HIPAA. Tag-based policies, such as IAM policy conditions or AWS Service Control Policies (SCPs), can then enforce stricter controls on the tagged resources.
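
As a sketch of how scan findings could turn into tags, the helper below builds the Tagging payload shape that the S3 PutObjectTagging API expects. The tag keys ("data-classification", "data-types") and the build_tag_set name are made up for this example.

```python
def build_tag_set(findings: dict[str, int]) -> dict:
    """Build an S3 Tagging payload from per-label finding counts."""
    labels = sorted(label for label, count in findings.items() if count > 0)
    classification = "Confidential" if labels else "Unclassified"
    tags = [{"Key": "data-classification", "Value": classification}]
    if labels:
        # "+" is among the characters S3 allows in tag values.
        tags.append({"Key": "data-types", "Value": "+".join(labels)})
    return {"TagSet": tags}


# With a boto3 S3 client, the payload would be applied roughly like this.
# Note that put_object_tagging replaces the object's existing tag set.
#   s3.put_object_tagging(Bucket=bucket, Key=key, Tagging=build_tag_set(findings))
```

Because the call overwrites existing tags, a production version would read the current tags first and merge them.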

✅ Blocking Public Access

One-click remediation to enable "Block Public Access" on exposed S3 buckets.
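
Under the hood, this kind of fix maps to the S3 PutPublicAccessBlock API. The sketch below is one possible way to apply it across many buckets, assuming a boto3-style S3 client is passed in; it is not Strac's implementation, and the block_public_access helper is hypothetical.

```python
# The four Block Public Access settings, all switched on.
BLOCK_ALL = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def block_public_access(s3_client, buckets: list[str]) -> dict[str, str]:
    """Enable Block Public Access on each bucket and report per-bucket outcomes."""
    results = {}
    for bucket in buckets:
        try:
            s3_client.put_public_access_block(
                Bucket=bucket,
                PublicAccessBlockConfiguration=BLOCK_ALL,
            )
            results[bucket] = "blocked"
        except Exception as exc:  # keep going so one failure does not stop the batch
            results[bucket] = f"failed: {exc}"
    return results
```

Returning a per-bucket result instead of raising on the first error matters for bulk runs: you want a report of what was fixed and what still needs attention.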

✅ Redaction

Strac can physically redact sensitive values (like masking a credit card number) inside the file or log.

✅ Encryption Enforcement

Flag and alert on resources that are not encrypted with AWS KMS.

✅ Least-Privilege Enforcement

Identify and suggest removal of unused IAM roles or excessive permissions on sensitive data stores.
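
A simple starting point for spotting excessive permissions is to flag IAM policy statements that allow every action on a data service. This is a hedged sketch: the wildcard_statements name and the default service list are assumptions, and it ignores NotAction, resource scoping, and partial wildcards such as s3:Get*.

```python
def wildcard_statements(policy: dict, services=("s3", "dynamodb")) -> list[dict]:
    """Return Allow statements granting "*" or "<service>:*" for the given services."""
    flagged = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            name = action.lower()  # IAM action names are case-insensitive
            if name == "*" or any(name == f"{svc}:*" for svc in services):
                flagged.append(stmt)
                break  # one wildcard is enough to flag the statement
    return flagged
```

Flagged statements are candidates for review rather than automatic removal; some admin roles legitimately need broad access.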

✅ Bulk Remediation

Fix thousands of misconfigured objects or logs in one action.

How AWS DSPM Protects Against AI & GenAI Risk

AI services like Amazon Bedrock, Amazon Q, and Amazon SageMaker are powerful, but they are data amplifiers.

When you connect an S3 Data Lake to a Large Language Model (LLM), you risk:

✅ AI RISK #1: RAG (Retrieval-Augmented Generation) Leaks

If you point Amazon Q to your S3 buckets for "company knowledge," it will index everything. If that includes a spreadsheet of employee salaries, the AI will happily answer: "What is the CEO's salary?"

✅ AI RISK #2: Model Training

If sensitive customer data is ingested into a custom SageMaker model, that data is "baked in." You generally cannot remove it without retraining the model from scratch.

✅ AWS DSPM is Step Zero for AI

Before enabling Amazon Bedrock or Amazon Q:

  1. Scan your data sources (S3, RDS).
  2. Identify sensitive files.
  3. Remediate (delete, redact, or move) toxic data.
  4. Create a clean "AI-Ready" dataset.
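
The last two steps can be sketched as a simple partition of an object list based on scan results. The function name and the shape of findings_by_key (object key mapped to a dict of label counts) are assumptions for this example.

```python
def split_ai_ready(object_keys: list[str], findings_by_key: dict[str, dict]):
    """Separate objects with no sensitive findings from those needing remediation."""
    clean, quarantined = [], []
    for key in object_keys:
        # An object with any recorded finding stays out of the AI dataset
        # until it has been deleted, redacted, or moved.
        if findings_by_key.get(key):
            quarantined.append(key)
        else:
            clean.append(key)
    return clean, quarantined
```

Only the clean list would be synced to the bucket or prefix that the AI service indexes. Objects that were never scanned have no findings and would land in clean here, so a stricter version should treat "not scanned" as quarantined.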

🎥How Strac Solves AWS DSPM (Data Discovery)

Strac provides a unified Data Security Platform for AWS:

  • Coverage: S3, RDS, DynamoDB, CloudWatch, Redshift.
  • Detection: PII, PHI, PCI, API Keys, Secrets, IP, Custom Regex.
  • OCR: Scans images (passports, IDs) and PDFs in S3.
  • Real-Time & Historical: Scans existing data and monitors new streams.
  • Compliance: Maps findings to SOC2, HIPAA, PCI-DSS, GDPR, NIST.
  • Remediation: Redact, Label, Encrypt, Block Access.

🔗 Explore Strac's AWS Integrations

🌶️ Spicy FAQs on AWS DSPM

Doesn't Amazon Macie do this?

Macie is a good start, but it is limited primarily to S3. It can become very expensive at scale and often lacks the remediation workflows (like redaction) and broader coverage (RDS, DynamoDB, Logs) that Strac provides.

What is the difference between AWS Security Hub and DSPM?

Security Hub focuses on infrastructure posture (e.g., "Is MFA enabled?"). DSPM focuses on data content (e.g., "Is there a credit card number in this file?"). You need both.

Can Strac find secrets in CloudWatch logs?

Yes. This is a common leak vector. Strac scans logs for API keys, passwords, and tokens and can alert or redact them.

Does this help with HIPAA/SOC2 compliance?

Absolutely. Auditors require an up-to-date inventory of sensitive data. Strac provides the reports and evidence showing that you know exactly where your PHI/PII resides.

Trusted by enterprises

Discover & Remediate PII, PCI, PHI, and Secrets in AWS

Book a Demo