June 8, 2026
15 min read

Databricks MCP Server: Secure Genie & AI Agent Setup (2026)

The Databricks MCP server lets Claude, Cursor, and AI agents run NL→SQL with Genie, call Unity Catalog functions, and search your lakehouse. Here's the official setup, the real blast-radius risks, and how to govern it with DLP-grade redaction at the MCP layer.


TL;DR

  • The Databricks MCP server is how AI agents (Claude, Cursor, ChatGPT, custom agents) reach into your lakehouse via the Model Context Protocol — running natural-language-to-SQL through Genie, calling Unity Catalog functions, and doing vector/semantic search across every Delta table the authorizing principal can read.
  • Setup is documented in the official Databricks managed MCP server guide; Databricks ships managed servers for Genie spaces, Unity Catalog functions, and Vector Search, plus the option to host custom MCP servers on Databricks Apps. Authentication is OAuth on-behalf-of the user.
  • Strac Databricks MCP DLP is the governance layer for AI-agent access to the lakehouse. Strac governs every tool call between the agent and Databricks: it controls what each agent and principal can query and do (allow, block, or require approval on broad or high-risk queries and exports), protects the PHI / PII / financial data sitting in result rows with redaction, masking, tokenization, and vaulting — including column-level — and logs every query as audit evidence mapped to HIPAA / SOC 2 / PCI / GDPR / EU AI Act / ISO 42001. One control plane across every Genie query, UC function call, and vector search.
  • Setup is agentless and under 10 minutes per workspace. No notebook changes, no agent SDK changes, no Unity Catalog re-permissioning.

What Is the Databricks MCP Server?

The Databricks MCP server is a Model Context Protocol implementation that exposes your lakehouse as a standardized set of tools to AI agents. Databricks provides managed servers out of the box — a Genie server for natural-language-to-SQL across your data, Unity Catalog Functions servers that let an agent invoke predefined SQL queries and tools, and Vector Search (AI Search) servers for semantic retrieval over indexed documents — and lets you host custom MCP servers on Databricks Apps for anything beyond that.

See the official Databricks managed MCP documentation for the current server list, endpoint patterns, and OAuth scopes. Each managed server has its own endpoint — Genie at /api/2.0/mcp/genie/{genie_space_id}, UC functions at /api/2.0/mcp/functions/{catalog}/{schema}/{function_name}, vector search at /api/2.0/mcp/vector-search/{catalog}/{schema}/{index_name} — and an external agent connects by registering that endpoint and authenticating with an OAuth token on behalf of the user.
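
As a minimal sketch, the three endpoint patterns above can be assembled as follows. The helper names and the example hostname are hypothetical and not part of any Databricks SDK; only the URL paths come from the patterns quoted above.

```python
from urllib.parse import quote


def _seg(value: str) -> str:
    """Percent-encode one path segment so identifiers cannot break the URL."""
    return quote(value, safe="")


def genie_endpoint(host: str, genie_space_id: str) -> str:
    """Managed Genie MCP endpoint for a single Genie space."""
    return f"https://{host}/api/2.0/mcp/genie/{_seg(genie_space_id)}"


def functions_endpoint(host: str, catalog: str, schema: str, function_name: str) -> str:
    """Managed Unity Catalog functions MCP endpoint for one function."""
    path = "/".join(_seg(p) for p in (catalog, schema, function_name))
    return f"https://{host}/api/2.0/mcp/functions/{path}"


def vector_search_endpoint(host: str, catalog: str, schema: str, index_name: str) -> str:
    """Managed Vector Search MCP endpoint for one index."""
    path = "/".join(_seg(p) for p in (catalog, schema, index_name))
    return f"https://{host}/api/2.0/mcp/vector-search/{path}"


# Hypothetical workspace hostname and identifiers, for illustration only.
HOST = "example-workspace.cloud.databricks.com"
print(genie_endpoint(HOST, "space123"))
print(functions_endpoint(HOST, "main", "claims", "readmission_rate"))
print(vector_search_endpoint(HOST, "main", "docs", "notes_index"))
```

Whatever registers these endpoints still has to present an OAuth token for the user; the sketch only covers URL construction.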

From the analyst's perspective, the AI agent suddenly understands their lakehouse — ask a question in English, get an answer computed over real Delta tables. From the security perspective, the AI agent now has query access to every row that principal can reach across the catalog.

That's the value. It's also where data-security teams need a control layer.

What AI Agents Can Actually Do With Databricks MCP

Wire the Databricks MCP server into an agent and the lakehouse starts answering questions on its own. Acting as the authorizing principal, Claude can reach Genie spaces, Unity Catalog functions, and your Delta tables without anyone opening a notebook. The concrete wins:

  • Answer questions in plain English over real tables — "What's our 90-day readmission rate by facility?" or "Which merchants had the highest chargeback ratio last quarter?" resolves through Genie into live SQL against Delta tables, not a stale dashboard export.
  • Run natural-language-to-SQL at lakehouse scale — Genie translates a prompt into a query that can scan, join, and aggregate across many tables and millions of rows, then returns the result set to the model.
  • Invoke Unity Catalog functions as tools — predefined, governed SQL functions become callable agent actions, so the agent runs parameterized queries your data team blessed.
  • Do vector and semantic search — query a Vector Search index to pull the most relevant documents, notes, or records for retrieval-augmented answers.
  • Compile cross-table reports — combine Genie queries, function calls, and vector lookups in one prompt to assemble an analysis that would otherwise be a multi-step notebook job.

Every one of those actions runs through Databricks' own APIs and Unity Catalog's permission model — which is what makes it genuinely useful, and exactly why the regulated data those queries return needs an inspection layer in the tool-call path.
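
To make "tool call" concrete: MCP messages are JSON-RPC 2.0, and an agent invokes a tool with the `tools/call` method. The sketch below builds such a request and pulls the text out of a result. The tool name `ask_genie` and the `question` argument are hypothetical; a real server advertises its own tool names and input schemas through `tools/list`.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def text_from_result(response: dict[str, Any]) -> str:
    """Join the text items of a tools/call result, skipping non-text content."""
    items = response.get("result", {}).get("content", [])
    return "\n".join(item["text"] for item in items if item.get("type") == "text")


# Hypothetical tool and argument names, for illustration only.
request = build_tool_call(1, "ask_genie", {"question": "90-day readmission rate by facility?"})
print(json.dumps(request, indent=2))
```

The text returned in `result.content` is what lands in the model's context, so that is the payload an inspection layer has to examine.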

The Real Security Risks of the Databricks MCP Server

The risks fall into four categories that every healthcare and regulated team running Databricks should price into the deployment.

1. Genie NL→SQL has enormous blast radius. Unlike a single-object read, one Genie prompt can be translated into a query that scans and joins across the entire lakehouse. There is no WHERE not_regulated = true clause — Genie returns whatever the generated SQL selects, and the raw result rows flow straight to the model. A vague question can pull far more sensitive data than the user intended to expose.

2. Unity Catalog grants are usually broad. In practice, analysts and service principals are granted catalog- or schema-level access so they can do their jobs. Unity Catalog enforces those grants faithfully — but if the principal can read a schema full of PHI, so can the agent acting on its behalf. The MCP server honors permissions; it doesn't narrow them.

3. PHI, PII, and financial data sit in Delta tables and get returned raw. Lakehouses are where regulated data concentrates — patient records, claims, transactions, card data, customer identifiers. When Genie or a UC function returns those rows, the values land in the AI model's context window in the clear. Nothing inspects them on the way out.

4. Row filters and column masks don't follow the data to an external agent. Unity Catalog supports row filters and column masks, and they're valuable, but they're evaluated against the principal's read path inside Databricks: whatever that principal is entitled to see unmasked is returned unmasked. They don't re-inspect or redact what then flows out over MCP to an external agent and into a third-party model's context. The moment the result set leaves the lakehouse boundary, those controls are behind it, not in front of the model.
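
The blast-radius concern in risk 1 can be illustrated with a deliberately crude check on generated SQL. This is a sketch under stated assumptions: the flag names and the join threshold are invented for illustration, and a production gate would parse the SQL properly rather than pattern-match it.

```python
import re


def blast_radius_flags(sql: str) -> list[str]:
    """Return coarse warning flags for a generated SQL statement."""
    text = re.sub(r"\s+", " ", sql.strip()).upper()
    flags = []
    if re.search(r"SELECT\s+\*", text):
        flags.append("select_star")   # every column, including regulated ones
    if not re.search(r"\bWHERE\b", text):
        flags.append("no_where")      # no row predicate at all
    if not re.search(r"\bLIMIT\s+\d+", text):
        flags.append("no_limit")      # unbounded result set
    if len(re.findall(r"\bJOIN\b", text)) >= 3:
        flags.append("many_joins")    # reaches across several tables
    return flags


print(blast_radius_flags("SELECT * FROM claims"))
print(blast_radius_flags(
    "SELECT facility, COUNT(*) FROM visits WHERE year = 2025 GROUP BY facility LIMIT 100"
))
```

A vague prompt tends to produce the first kind of query; an aggregate with a predicate and a limit raises no flags.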

The traditional DLP a company already runs (at the network edge, on the file share, inside a SaaS rule engine) does not sit in the MCP path. The Genie or function response goes straight from the lakehouse into the AI agent's context window. That reach is exactly why each agent's access to Databricks must be governed: control what it can query and do, protect the sensitive data in the result rows, and audit every query. That is where Strac Databricks MCP DLP lives.

✨ Strac Databricks MCP DLP — Production-Ready Agent Governance

Strac's Databricks MCP DLP is the governance layer that sits between AI agents and the Databricks MCP server. Strac governs every tool call: it sees exactly what each agent queries across the lakehouse, controls its actions (allow, block, or require approval on broad scans, high-risk exports, and write queries), protects the sensitive data in the returned rows by redacting, masking, tokenizing, or vaulting it — down to the column level — and proves it by logging every query as audit evidence. Non-sensitive, in-policy queries flow through untouched.
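
The allow / block / require-approval idea can be sketched as a small decision function. This is an illustrative model only, not Strac's policy engine: the `ToolCall` fields, the `SENSITIVE` set, and the row threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


@dataclass(frozen=True)
class ToolCall:
    principal: str
    operation: str            # "read", "export", or "write"
    data_classes: frozenset   # e.g. frozenset({"PHI"})
    estimated_rows: int


SENSITIVE = frozenset({"PHI", "PII", "PCI"})


def decide(call: ToolCall, row_limit: int = 10_000) -> Decision:
    """Map one tool call to a disposition using hypothetical rules."""
    if call.operation == "write":
        return Decision.BLOCK                 # agents never write in this sketch
    touches_sensitive = bool(call.data_classes & SENSITIVE)
    if call.operation == "export" and touches_sensitive:
        return Decision.BLOCK                 # no bulk export of regulated data
    if call.operation == "export" or call.estimated_rows > row_limit:
        return Decision.REQUIRE_APPROVAL      # broad or bulk access needs sign-off
    return Decision.ALLOW


print(decide(ToolCall("analyst@example.com", "read", frozenset({"PHI"}), 50)))
```

In-policy reads fall through to `ALLOW`, which matches the point that non-sensitive, in-policy queries pass untouched.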

Figure: Strac Databricks MCP DLP architecture. The gateway intercepts every tool call between any AI agent (Claude, Cursor, ChatGPT, custom) and the Databricks MCP server. PHI, PII, card numbers, and financial identifiers in Genie result rows, function outputs, and vector-search hits are redacted or masked before the agent ever reads them.

What this looks like in practice:

  • Read queries are filtered at the row and column level. When the agent runs a Genie query, calls a UC function, or hits a vector index, Strac inspects the returned result set, masks or tokenizes the columns and values that hold SSNs / MRNs / card numbers / clinical data / financial identifiers inline, and passes the clean payload to the agent. The agent still answers the question; the regulated values never enter the model context.
  • Broad and high-risk queries are guardrailed. When a Genie prompt translates into a query with outsized blast radius — a full-schema scan, an export, a write — Strac can require approval or block based on the principal, the data class, and the volume returned.
  • Result rows are inspected at depth. Free-text columns, document blobs from vector search, and exported files are parsed with the same OCR and document-parser pipeline Strac uses across its DLP product line — so sensitive content buried inside notes, attachments, and scanned PDFs is found and redacted, not just structured fields.
  • Every query is logged. Each record captures the AI client, principal, server (Genie / functions / vector search), query or function invoked, tables and columns touched, data classes detected, redactions and masks applied, vault references, and disposition. The log is the HIPAA / SOC 2 / PCI / GDPR audit evidence, produced automatically.
  • Policy is contextual. Different catalogs, schemas, and data classes get different policies. Strac maps to your existing Unity Catalog classification, not an MCP-specific silo.
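
Row- and column-level filtering of a result set can be sketched as below. It assumes rows arrive as dictionaries, that the set of columns to mask is supplied by policy, and that a single SSN regex stands in for a real detector library; none of this reflects Strac's actual implementation.

```python
import re

# US SSN in the common 3-2-4 dashed form; real detectors cover many more formats.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_rows(rows: list[dict], masked_columns: set[str]) -> list[dict]:
    """Mask whole columns by name and scrub SSN-shaped values from free text."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in masked_columns:
                out[column] = "[MASKED]"                   # column-level mask
            elif isinstance(value, str):
                out[column] = SSN_RE.sub("[SSN]", value)   # value-level redaction
            else:
                out[column] = value                        # numbers pass through
        cleaned.append(out)
    return cleaned


rows = [{"mrn": "A123", "note": "SSN 123-45-6789 on file", "visits": 3}]
print(mask_rows(rows, {"mrn"}))
```

The aggregate (`visits`) survives, so the agent can still answer the question while the identifiers stay out of the model context.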

The same Strac MCP DLP layer covers other data-platform surfaces — Snowflake MCP, BigQuery MCP, and Postgres MCP — one control plane across every place AI agents query your regulated data. For the full picture of why this matters as data moves into models, see MCP DLP and AI DLP.

✨ Strac Native Databricks DLP — The Companion to MCP DLP

Figure: Strac data discovery dashboard continuously scanning a connected lakehouse and classifying PII, PHI, PCI, and secrets in real time. Strac natively discovers and classifies the regulated columns inside your Databricks lakehouse before any agent queries them; it is the companion to Databricks MCP DLP that maps where sensitive data lives.

MCP DLP protects the AI-agent surface. Strac's native Databricks DLP protects the data-at-rest surface — the same lakehouse, but discovered and classified where the regulated data actually lives. Most regulated teams run both: native DSPM to know what sensitive data sits in which Delta tables and cloud storage, MCP DLP to govern what agents do with it. Together they cover every path regulated data can take into and out of Databricks.

What Strac's native Databricks DLP includes:

  • Continuous discovery and classification of PHI, PII, PCI, and financial data across Delta tables, Unity Catalog schemas, and the underlying cloud storage (S3 / ADLS / GCS) — this is DSPM for AI applied to the lakehouse
  • Column-level inspection — Strac classifies which columns hold SSNs, MRNs, card numbers, and customer identifiers, so masking policies target the right fields
  • Document and blob inspection at depth — free-text columns, exported reports, and files in attached storage, with OCR for scanned documents
  • A live data map of where regulated data concentrates, so you can right-size Unity Catalog grants before an agent ever connects
  • Audit findings mapped to HIPAA Security Rule, SOC 2 CC6, PCI Req. 3/4/7/10, and GDPR

For the broader integration catalog — every SaaS, cloud, browser, and endpoint surface Strac covers — see strac.io/integrations.

✨ See Strac MCP DLP in Action

The screenshot below shows Strac's MCP DLP redacting sensitive data from a real Claude session — patient identifiers, customer emails, and card numbers tokenized inline before the model received the prompt. The same inspection pattern runs on every Genie query, UC function call, and vector search routed through Strac.

Figure: Strac DLP at work inside a Claude conversation. PHI, PII, and PCI elements are replaced with tokenized placeholders inline before the model sees them. The same pattern runs at the MCP layer for every Databricks tool call.

How to Set Up Strac Databricks MCP DLP

Setup is agentless and takes under 10 minutes.

  1. Authorize Strac with your Databricks workspace via OAuth. Strac requests the on-behalf-of scopes for the managed servers you want covered (Genie, functions, vector search). It honors Unity Catalog's permission model — Strac only sees what the authorizing principal can see.
  2. Point your agent at the Strac MCP gateway endpoint. Strac issues an MCP server endpoint that drops into your AI client's MCP configuration in place of the raw Databricks endpoint, so every tool call flows through Strac. For Claude Desktop:

       "mcpServers": {
         "databricks": {
           "url": "https://mcp.strac.io/databricks",
           "auth": { "type": "bearer", "token": "<your-strac-token>" }
         }
       }

     For Cursor, OpenAI Agents, and custom agents: same endpoint, same auth.
  3. Pick your policy. Out-of-the-box templates for HIPAA, SOC 2, PCI, and GDPR. Custom policies (catalog-level, schema-level, column-level, data-class-level) take minutes to configure.
  4. Done. Every MCP tool call between your agent and Databricks now routes through the Strac gateway. No notebook changes, no agent code changes, no Unity Catalog re-permissioning. The audit log starts populating immediately.
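
One way to keep the bearer token out of a checked-in config file is to generate the entry from step 2 at deploy time. The sketch below does that; the environment variable name `STRAC_MCP_TOKEN` is an assumption made for the example, while the URL and field names are the ones shown in step 2.

```python
import json
import os


def strac_databricks_entry(token: str) -> dict:
    """Build the mcpServers entry from step 2 with the given bearer token."""
    return {
        "mcpServers": {
            "databricks": {
                "url": "https://mcp.strac.io/databricks",
                "auth": {"type": "bearer", "token": token},
            }
        }
    }


# STRAC_MCP_TOKEN is a hypothetical variable name; fall back to the placeholder.
token = os.environ.get("STRAC_MCP_TOKEN", "<your-strac-token>")
print(json.dumps(strac_databricks_entry(token), indent=2))
```

The printed JSON can then be merged into whichever MCP configuration file your AI client reads.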

Compliance Coverage Out of the Box

The same Strac Databricks MCP DLP control produces evidence mapped to every major compliance framework. For lakehouses holding patient and claims data, HIPAA is the headline — Strac puts a redaction and accounting-of-disclosures layer between PHI in Delta tables and any external model.

| Framework | What Strac Databricks MCP DLP Satisfies |
|---|---|
| HIPAA | §164.312(b) (audit controls over AI access to ePHI), §164.502(b) (minimum necessary: masking what the agent doesn't need), §164.514 (de-identification of returned values), §164.528 (accounting of disclosures to AI agents) |
| SOC 2 | CC6.6 (unauthorized data exposure), CC6.7 (restricted transmission of data to external systems), CC7.2 (monitoring for anomalous AI query activity) |
| PCI DSS v4.0.1 | Req. 3.4.1 (PAN masking in result rows), Req. 4.x (encryption in transit), Req. 7 (least privilege on queries), Req. 10 (log every access) |
| GDPR | Art. 5 (purpose & storage limitation), Art. 25 (data protection by design), Art. 30 (records of processing), Art. 32 (security of processing) |
| EU AI Act | Art. 10 (data governance for high-risk AI systems) |
| ISO/IEC 42001 | Clause 6.1.3 (AI risk treatment), Clause 8.1 (operational planning and control), Annex A.7 (data for AI systems) |

For the broader AI-data-governance program this sits inside, see DSPM for AI.

🌶️ Spicy FAQs for Databricks MCP Server

What is the Databricks MCP server?

The Databricks MCP server is a Model Context Protocol implementation that lets AI agents (Claude, Cursor, ChatGPT, custom agents) query your lakehouse via standardized tool calls. Databricks ships managed servers for Genie (natural-language-to-SQL), Unity Catalog functions, and Vector Search, plus the option to host custom MCP servers on Databricks Apps. It's how an AI assistant gets contextual access to every Delta table the authorizing principal can read.

Databricks MCP vs Databricks Genie/Assistant — what's the difference?

Genie and the Databricks Assistant are agents that run inside Databricks — they translate natural language to SQL and answer questions within Databricks' own boundary and UI. The Databricks MCP server points the other direction: it exposes the lakehouse (including Genie itself) to external agents like Claude, Cursor, and custom agents over the open Model Context Protocol, so the AI client your team already uses can query your data. Genie keeps the interaction inside Databricks' guardrails; MCP lets an external agent reach in, and the result rows leave those guardrails the moment they return to the client. That hand-off is exactly where Strac Databricks MCP DLP inspects, masks, and redacts.

Is the Databricks MCP server safe to use with PHI and regulated data?

By itself, no — not without an additional DLP layer. The Databricks MCP server enforces Unity Catalog permissions, but it returns whatever the principal can read, and analysts are usually granted broad schema-level access. A single Genie query can pull thousands of rows of PHI, PII, or card data straight into the model's context. For regulated use, you need an MCP-layer control like Strac Databricks MCP DLP that inspects and masks every result row before content reaches the AI model.

Don't Unity Catalog row filters and column masks already protect the data?

They protect it inside Databricks, for the principal's own read path. Row filters and column masks are scoped to how that principal sees data in the lakehouse — they don't redact what flows out over MCP to an external agent and into a third-party model's context window. The moment the result set crosses the lakehouse boundary, those controls are behind it. Strac sits in the tool-call path itself, so masking follows the data all the way to the model.

How is Strac Databricks MCP DLP different from Databricks' built-in protections?

Databricks' built-in protections operate at the catalog and policy layer — Unity Catalog grants, row filters, column masks, lineage. None of those inspect the payload that leaves over MCP to an external model. Strac is purpose-built for the MCP layer: it intercepts every tool call before the result reaches the AI agent's context window, with detection breadth (PII / PHI / PCI / financial / secrets / OCR-in-documents) and column-level masking that goes beyond static catalog policies.

Does Strac Databricks MCP DLP work with Claude, Cursor, ChatGPT, and custom agents?

Yes. Strac exposes a standard MCP gateway endpoint, so any MCP-aware AI client routes its Databricks tool calls through it with one configuration change. No SDK changes, no notebook changes, no application code changes.

What sensitive data types does Strac detect in Databricks MCP result rows?

PII (SSN, driver's license, passport, address, phone, email), PHI (clinical notes, MRN co-occurrence, ICD-10 codes adjacent to identifiers, lab values), PCI (full and partial card numbers via Luhn check), financial identifiers (account and routing numbers, balances), credentials (API keys, cloud access keys, OAuth tokens, JWTs, SSH keys — 48+ patterns), and custom detectors trained on your Unity Catalog classifications. Detection runs across structured columns, free-text fields, vector-search document hits, and files (OCR).
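
The Luhn check mentioned for card numbers is a standard checksum, and a minimal version looks like the sketch below. The candidate regex and the 13 to 19 digit length bound are simplifying assumptions; the checksum only applies to full numbers, so partial card numbers need a different detector.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def find_card_numbers(text: str) -> list[str]:
    """Return digit runs in `text` that look like cards and pass Luhn."""
    return [m.group(0) for m in CANDIDATE_RE.finditer(text) if luhn_valid(m.group(0))]


print(find_card_numbers("card 4111 1111 1111 1111 on file"))
```

Running the checksum after the regex cuts false positives from order IDs and other long digit strings that merely look like card numbers.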

How long does Strac Databricks MCP DLP take to deploy?

Under 10 minutes for the first workspace. Authorize Strac with Databricks over OAuth, swap the Strac MCP gateway endpoint into your AI client's config, pick a policy template, done. No agents to install, no Unity Catalog re-permissioning, no notebook changes.

Where does masked data go — is it stored?

Masked and redacted values are replaced inline in the result rows. Optionally, sensitive values can be vaulted — replaced with a short-lived retrieval link that only authorized users can resolve, so the original is retrievable for legitimate use without ever entering the AI context. Vaulted data is stored encrypted at rest in your Strac tenant; you control retention.
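
The vaulting pattern can be sketched with an in-memory stand-in. Everything here is illustrative: the `vault://` reference format and the class are hypothetical, and the sketch omits the encryption at rest, link expiry, and real authorization that a vault service would provide.

```python
import secrets


class InMemoryVault:
    """Toy vault: swaps a sensitive value for an opaque reference."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Store the value and return a random reference that reveals nothing."""
        token = f"vault://{secrets.token_urlsafe(16)}"
        self._store[token] = value
        return token

    def resolve(self, token: str, authorized: bool) -> str:
        """Return the original value, but only for an authorized caller."""
        if not authorized:
            raise PermissionError("caller is not allowed to resolve vault references")
        return self._store[token]


vault = InMemoryVault()
reference = vault.tokenize("123-45-6789")
print(reference)                                  # this is what the agent sees
print(vault.resolve(reference, authorized=True))  # an authorized user gets the original
```

Because the reference is random rather than derived from the value, nothing about the original can be recovered from what the model receives.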

Can I see what an AI agent queried in my lakehouse?

Yes. Strac produces a per-call audit log: timestamp, AI client identity, principal, server invoked (Genie / functions / vector search), query or function, tables and columns touched, data classes detected, redactions and masks applied, vault references, disposition. The log is queryable in the Strac console and exportable to your SIEM. This is the evidence trail HIPAA, SOC 2, PCI, and GDPR auditors will ask about for AI-agent activity against your lakehouse.
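
A per-call audit record with the fields listed above could be modeled as follows. The field names and the JSON-lines serialization are assumptions for illustration, not Strac's actual log schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditRecord:
    ai_client: str
    principal: str
    server: str                 # "genie", "functions", or "vector-search"
    invoked: str                # query text or function name
    tables: list
    data_classes: list
    redactions_applied: int
    disposition: str            # e.g. "allowed", "blocked", "approval_required"
    vault_refs: list = field(default_factory=list)
    timestamp: str = field(default_factory=_utc_now)

    def to_json_line(self) -> str:
        """Serialize as one JSON line, a common shape for SIEM ingestion."""
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    ai_client="claude-desktop",
    principal="analyst@example.com",
    server="genie",
    invoked="90-day readmission rate by facility",
    tables=["main.clinical.visits"],
    data_classes=["PHI"],
    redactions_applied=12,
    disposition="allowed",
)
print(record.to_json_line())
```

One JSON object per line keeps the log easy to append to and easy to ship to whatever SIEM you export to.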

The Bottom Line

The Databricks MCP server is rapidly becoming the way AI agents read into the lakehouse — and the lakehouse is where the most concentrated PHI, PII, and financial data your organization has actually lives. Genie's natural-language-to-SQL gives an agent enormous reach across that data, Unity Catalog grants are usually broad, and row filters and column masks stop at the lakehouse boundary. Running Databricks MCP in 2026 without an MCP-layer DLP control is not a question of if the first incident reaches your security team; it's when.

Strac Databricks MCP DLP gives you the governance layer, the column-level masking, the audit evidence, and the framework-agnostic compliance coverage so healthcare and regulated teams can let analysts use Databricks with Claude, Cursor, ChatGPT, and any future AI client without making each one a separate security exception.

If you are running — or about to run — Databricks MCP in production, book a 30-minute demo. We'll walk through the architecture, the policy templates, and a deployment plan for your specific lakehouse, Genie spaces, and AI clients.

For the broader MCP DLP control plane across every data surface, see the MCP DLP pillar. For more data-platform deep dives: Snowflake MCP, BigQuery MCP, Postgres MCP.
