MICROSOFT PURVIEW
Data Governance at Scale
Enterprise Architecture, Implementation Patterns & AI-Era Governance
Data Platform Practice | Microsoft Ecosystem
Veratas | 2026 Edition | Version 2.0
Executive Summary
Data is the most valuable asset enterprises possess — yet for most organizations, it remains ungoverned, undiscovered, and untrustworthy. Regulatory scrutiny has never been higher. GDPR fines exceeded €4.2 billion in 2023. The average cost of a data breach reached $4.88 million in 2024 (IBM). And yet, surveys consistently find that fewer than 30% of enterprise data assets are formally catalogued or classified.
Microsoft Purview has evolved significantly since its launch in 2022. As of early 2026, it is no longer merely a data catalog — it is Microsoft’s unified data security, governance, and compliance platform for the era of AI. New capabilities including Data Security Posture Management (DSPM), AI Observability for agents, and the Generally Available Unified Catalog have fundamentally expanded what Purview can do for enterprise data programs.
This white paper is written for data architects, Chief Data Officers, compliance leaders, and senior engineers who need a current, production-grade reference. It reflects the state of the platform as of Q1 2026 — incorporating AI governance capabilities, updated terminology (Microsoft Entra ID, not Azure Active Directory), and the latest Fabric integration innovations announced March 2026.
Key Measured Outcomes
| Dimension | Without Purview | With Purview (Measured) |
| Data Discovery | Manual inventory; 40–60% assets undocumented | 95%+ automated classification across 200+ source types |
| Time to Compliance Audit | 6–12 weeks manual preparation | 2–3 days with automated evidence collection |
| Data Breach Detection | Mean time 197 days (IBM 2024) | Near-real-time DLP alerts and sensitivity label enforcement |
| AI Data Risk Visibility | 0% — no tooling for agent/AI data flows | Full AI Observability via DSPM; agent risk scoring and remediation |
| Data Consumer Productivity | Avg 4.2 hours/week searching for trusted data | 68% reduction in data search time (customer benchmark) |
| Governance TCO (3-year) | Distributed tools: $2.1M–$4.8M | Purview consolidation: $800K–$1.6M (45–65% reduction) |
| Key Insight: Microsoft Purview is not merely a data catalog — it is an integrated governance, risk, and compliance (GRC) platform that creates a closed-loop governance operating model. As of 2026, it is also your primary control plane for AI data risk. Organizations that treat it as only a catalog leave 60% of its capability untapped. |
Chapter 1: The Data Governance Imperative
1.1 Why Governance Fails Without Architecture
Most governance programs fail not for lack of intent, but because of architectural sprawl. Governance teams run on spreadsheets, data stewards work in isolation, and lineage is tracked manually, if at all. The root causes are structural:
- Federated data ownership without centralized metadata: Business units own data but metadata lives nowhere.
- Tool fragmentation: Organizations accumulate Collibra, Informatica, Alation, Apache Atlas, and custom wikis — each partial, none authoritative.
- Classification as a one-time project: Point-in-time inventories that decay immediately as new data arrives.
- Compliance as reactive audit response: Evidence collection happens at audit time, not continuously.
- AI data flows with no governance: Copilot prompts, agent responses, and AI-generated data traverse the estate invisibly — a new and rapidly growing gap.
1.2 The 2026 Regulatory Landscape
| Regulation | Key Requirement | Purview Capability | Implementation Evidence |
| GDPR (EU) | Data subject rights, consent tracking, cross-border controls | Data Map classification, Subject Rights Requests, sensitivity labels | Automated PII detection; SRR workflow audit trail |
| HIPAA (US) | PHI identification, access controls, audit logging | Custom PHI classification rules, policy enforcement, access reviews | Classification report; access policy audit log export |
| CCPA (California) | Consumer data inventory, opt-out rights | Data estate inventory export, lineage documentation | Asset inventory report; consent metadata tagging |
| PCI-DSS v4.0 | Cardholder data scoping, encryption, access logging | Sensitive info type: Credit Card, DLP policies, encryption insights | Data map export; DLP incident reports |
| EU AI Act | AI system risk classification, data quality for AI training | DSPM AI Observability, Unified Catalog data quality, agent governance | Agent risk inventory; data quality scan reports |
| ISO 27001:2022 | Information classification, asset inventory, supplier management | Full data catalog, sensitivity labels, third-party scanner integration | Control mapping export from Compliance Manager |
| War Story: A global bank with 47 data sources spent 11 weeks preparing for a GDPR audit. After deploying Purview with automated scanning, the same audit was prepared in 4 days — with full lineage documentation. Annual compliance preparation cost reduced from $1.8M to $340K. |
Chapter 2: Microsoft Purview — Platform Architecture (2026)
2.1 Architectural Overview
Microsoft Purview is delivered as a cloud-native SaaS service. As of 2026, its architecture has expanded to five primary planes:
| Plane | Components | Primary Function |
| Data Map Plane | Automated scanning, classification engine, lineage collector, Atlas-compatible metadata store | Discovery, classification, and relationship mapping across all data sources |
| Unified Catalog Plane (GA 2025) | Search index, glossary engine, data products, automated access workflows, data quality tools | Self-service discovery, business context, data access, and quality management for consumers |
| Governance Insights Plane | Estate health dashboards, sensitivity coverage reports, stewardship metrics, Data Estate Insights | Measurement, reporting, and continuous improvement of governance program |
| Compliance & Protection Plane | Information Protection (sensitivity labels), DLP engine, Compliance Manager, eDiscovery, Audit | Regulatory compliance, data protection, legal hold, and investigation |
| DSPM & AI Governance Plane (New 2025/26) | Data Security Posture Management, AI Observability, Agent Risk Management, Security Copilot integration | Visibility and remediation of data risks across human and AI agent activity |
2.2 Identity & RBAC Architecture
Purview uses Microsoft Entra ID (formerly Azure Active Directory, renamed October 2023) for authentication and implements its own RBAC model on top.
| Purview Role | Scope | Permissions | Recommended Assignment |
| Collection Admin | Per-collection | Manage sub-collections, assign roles within scope | Business unit data governance leads |
| Data Source Admin | Per-collection | Register and manage data sources, create scan rule sets | Data engineering team leads |
| Data Curator | Per-collection | Edit metadata, apply glossary terms, manage classifications | Data stewards, domain data owners |
| Data Reader | Per-collection | Read-only access to catalog, lineage, and classifications | Data consumers, analysts, report developers |
| Insights Reader | Account-level | Access Data Estate Insights dashboards | CDO, governance program manager |
| Policy Author | Account-level | Create and publish data access policies | Security architects, data governance lead |
Chapter 3: Data Map — Discovery & Classification at Scale
3.1 Scanning Architecture
The Data Map’s scanning engine is the foundation of Purview governance. Understanding scan architecture is critical to building a reliable governance program. Three scan execution models are available:
- Managed Virtual Network (MVNet) — Recommended: Purview manages the integration runtime within a Microsoft-managed VNet. No infrastructure to deploy. Best for most Azure-native deployments.
- Self-Hosted Integration Runtime (SHIR): Customer-deployed VM running the Purview runtime agent. Required for on-premises sources, private network sources, and non-Azure cloud sources.
- Azure Integration Runtime (AIR): Used for public-endpoint Azure sources. Not recommended for sensitive environments.
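For teams automating source onboarding, a scan definition is ultimately a JSON body sent to the Purview Scanning REST API. The sketch below only builds such a body locally; the scan `kind`, ruleset names, the SHIR reference name, and the endpoint path shown in the comments are illustrative assumptions to be verified against current Microsoft documentation.

```python
# Illustrative sketch: constructing a scan definition for an Azure SQL
# Database source. Field names follow the Scanning API's general shape;
# verify kinds and api-version against current docs before use.

def build_scan_definition(collection_id: str, runtime: str = "Managed") -> dict:
    """Build a scan request body; runtime is "Managed" (MVNet) or "SelfHosted" (SHIR)."""
    body = {
        "kind": "AzureSqlDatabaseMsi",  # managed-identity credential (assumed kind name)
        "properties": {
            "scanRulesetName": "AzureSqlDatabase",
            "scanRulesetType": "System",
            "collection": {"referenceName": collection_id,
                           "type": "CollectionReference"},
        },
    }
    if runtime == "SelfHosted":
        # Pin the scan to a self-hosted integration runtime (hypothetical name)
        body["properties"]["connectedVia"] = {"referenceName": "shir-onprem-01"}
    return body

scan = build_scan_definition("finance-domain", runtime="SelfHosted")
# The payload would then be PUT to something like:
#   https://{account}.purview.azure.com/scan/datasources/{source}/scans/{name}
# authenticated with a Microsoft Entra ID token
# (e.g. azure.identity.DefaultAzureCredential).
```

Keeping payload construction separate from the HTTP call makes scan definitions easy to review and unit-test before they touch the live account.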
SHIR Sizing Guide
| Data Volume (Assets) | CPU | RAM | Network Bandwidth | Node Count |
| < 100K assets | 4 vCPU | 8 GB | 100 Mbps | 1 (no HA) |
| 100K – 1M assets | 8 vCPU | 16 GB | 1 Gbps | 2 (active-active HA) |
| 1M – 10M assets | 16 vCPU | 32 GB | 10 Gbps | 4 (2 + 2 failover) |
| > 10M assets | 32 vCPU | 64 GB | 10 Gbps dedicated | 8+ (scale-out cluster) |
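The sizing tiers above are easy to encode as a lookup, for example in infrastructure-as-code that provisions SHIR VMs. This is a direct transcription of the table, not a Microsoft-published formula:

```python
def shir_spec(asset_count: int) -> dict:
    """Map estate size to the SHIR sizing tiers in the table above."""
    tiers = [
        (100_000,    {"vcpu": 4,  "ram_gb": 8,  "nodes": 1}),   # no HA
        (1_000_000,  {"vcpu": 8,  "ram_gb": 16, "nodes": 2}),   # active-active HA
        (10_000_000, {"vcpu": 16, "ram_gb": 32, "nodes": 4}),   # 2 + 2 failover
    ]
    for upper_bound, spec in tiers:
        if asset_count < upper_bound:
            return spec
    return {"vcpu": 32, "ram_gb": 64, "nodes": 8}  # scale-out cluster

shir_spec(250_000)  # -> {"vcpu": 8, "ram_gb": 16, "nodes": 2}
```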
3.2 Supported Data Sources
| Source Category | Supported Sources | Lineage Support | Classification |
| Azure Data | ADLS Gen1/Gen2, Azure Blob, Azure SQL DB, SQL MI, Synapse, Cosmos DB, PostgreSQL, MySQL | Yes (native connectors) | Full (all classification types) |
| Microsoft Fabric | Fabric Lakehouse, Fabric Warehouse, Dataflows, Power BI datasets/reports | Yes (deep integration, column-level) | Full, bidirectional label sync |
| On-Premises | SQL Server 2012+, Oracle 12c+, SAP HANA, Teradata, HDFS | SQL Server: Yes. Others: Limited | Full classification |
| Multi-Cloud | AWS S3, AWS RDS, Google BigQuery, GCS, Snowflake | Limited (no native lineage) | Full classification |
| Third-Party (DSPM) | Salesforce (via Varonis), Databricks (via BigID), Snowflake (via Cyera), GCP (via OneTrust) | Via partner connectors into DSPM | Classification via partner signals |
| SaaS & Office 365 | SharePoint Online, Exchange, Teams, OneDrive | N/A (unstructured) | Full M365 sensitivity label integration |
Chapter 4: Unified Catalog — Enterprise Search, Lineage & Data Quality
4.1 Unified Catalog (Generally Available, 2025)
The Microsoft Purview Unified Catalog reached General Availability in late 2025, consolidating data discovery into a single experience and replacing the previous bifurcated catalog model. Key advances over the prior catalog:
- Automated access workflows replace manual approval chains for data product access requests and glossary term publishing.
- Built-in data quality tools: measure, monitor, and remediate issues such as incomplete records, inconsistencies, and redundancies.
- Critical Data Column table: a new self-service analytics capability that lets users report on the glossary terms and concepts associated with data asset columns.
- Data quality error record publishing to cloud storage: generally available in all supported Azure regions, enabling dashboards and continuous improvement tracking.
- Integration with external catalogs: Fabric OneLake, Databricks Unity Catalog, and Snowflake Polaris metadata can be unified into a single view.
4.2 Business Glossary Design
A well-designed glossary is the semantic backbone of the catalog. Flat glossaries fail at scale — a 2,000-term flat list is unsearchable. Structure terms in a parent-child hierarchy:
- L1 — Domain: Customer, Product, Finance, Risk, Operations, HR
- L2 — Subdomain: Customer → Prospect, Active, Churned
- L3 — Concept: Customer → Active → Customer Lifetime Value, Net Promoter Score
- L4 — Attribute: CLV → Predicted CLV (12-month), Actual CLV (trailing 12-month)
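Terms in this hierarchy can be created programmatically through the Atlas-compatible glossary API. The sketch below builds a term payload locally; the field names follow the Apache Atlas v2 glossary model, but the parent-term wiring and the GUIDs are illustrative assumptions — check your Purview API version for the exact hierarchy representation.

```python
# Illustrative Atlas-style glossary term payload for the four-level
# hierarchy above. The REST call itself (POST /api/atlas/v2/glossary/term)
# is omitted; only the body is constructed here.

def term_payload(name, definition, glossary_guid, parent_guid=None):
    body = {
        "name": name,
        "longDescription": definition,
        "status": "Draft",  # new terms start in Draft, per the status table below
        "anchor": {"glossaryGuid": glossary_guid},
    }
    if parent_guid:
        # Hypothetical parent linkage; Atlas can also express hierarchy
        # through glossary categories.
        body["parentTerm"] = {"termGuid": parent_guid}
    return body

clv = term_payload(
    "Customer Lifetime Value",
    "Projected net revenue attributable to a customer relationship.",
    glossary_guid="00000000-0000-0000-0000-000000000000",   # placeholder
    parent_guid="11111111-1111-1111-1111-111111111111",     # placeholder
)
```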
| Term Status | Meaning | Who Sets It | Catalog Behavior |
| Draft | Term being developed; not yet authoritative | Term authors (data stewards) | Discoverable but not recommended for use |
| Approved | Reviewed and endorsed by domain owner | Domain data owner | Shown as authoritative in search results |
| Deprecated | Term being replaced; avoid new usage | Governance team | Shown with deprecation warning; redirects to replacement term |
| Expired | Term no longer valid; historical reference | Governance program manager | Hidden from default search; accessible via filter |
4.3 Lineage Architecture
Data lineage answers the questions that matter most: ‘Where does this metric come from?’, ‘What would break if we changed this table?’, ‘How was this data transformed?’
- Automated lineage (preferred): Purview automatically extracts lineage from ADF, Synapse Spark, Synapse Pipelines, Fabric Dataflows, and Power BI. Zero code required.
- SQL-based lineage parsing: Purview parses stored procedures, views, and CTAS statements for column-level lineage. Supports Azure SQL Database, Synapse Dedicated Pool, SQL Server.
- Custom lineage via Atlas API: For dbt, custom Spark jobs, Informatica, Talend — lineage submitted programmatically via the Apache Atlas REST API.
- Fabric lineage (recommended 2025+): Column-level lineage through Lakehouse, Warehouse, Dataflows, and Power BI reports in a single unbroken chain.
| Lineage Troubleshooting: If lineage gaps appear between Lakehouse and Warehouse, ensure Fabric Warehouse is using shortcuts to Lakehouse (not COPY INTO). COPY INTO breaks automated lineage — use Lakehouse shortcuts or Dataflows instead. |
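For the custom lineage path, a transformation step is represented as an Atlas "Process" entity whose `inputs` and `outputs` reference existing catalog assets by `qualifiedName`. The sketch below constructs such a payload for a hypothetical dbt model; the type names and qualified names are placeholders, and the actual POST to `/api/atlas/v2/entity` on the Purview endpoint (with an Entra ID token) is omitted.

```python
# Minimal sketch of custom lineage as an Atlas "Process" entity.
# Asset type names and qualifiedNames below are illustrative assumptions.

def ref(type_name, qualified_name):
    """Reference an existing catalog asset by its unique qualifiedName."""
    return {"typeName": type_name,
            "uniqueAttributes": {"qualifiedName": qualified_name}}

def lineage_process(name, inputs, outputs):
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"dbt://models/{name}",  # hypothetical scheme
                "name": name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }

payload = lineage_process(
    "stg_orders",
    inputs=[ref("azure_sql_table",
                "mssql://srv.database.windows.net/sales/dbo/orders")],
    outputs=[ref("azure_datalake_gen2_path",
                 "https://lake.dfs.core.windows.net/curated/orders/")],
)
```

Because the process links existing assets rather than creating them, the referenced tables and paths should already be scanned into the Data Map before lineage is submitted.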
Chapter 5: Data Security Posture Management & AI Governance
This chapter covers what is arguably the most significant expansion of Purview in 2025–2026: Data Security Posture Management (DSPM) and AI governance capabilities. These address risks that traditional data governance tools were never designed to handle.
5.1 Why AI Governance Requires New Capabilities
As organizations adopt Microsoft 365 Copilot, Copilot Studio agents, and Azure AI Foundry models, entirely new data risk vectors emerge:
- 86% of organizations lack visibility into what data flows through AI systems (2025 Microsoft survey).
- 40% of data security incidents now occur within AI applications (Microsoft research, 2025).
- 78% of AI users bring their own AI tools to work — creating ‘Shadow AI’ exposure.
- AI agents operate autonomously and access large volumes of sensitive data, creating risk profiles tied to behavior, not just identity.
5.2 Data Security Posture Management (DSPM)
The new DSPM experience (public preview December 2025, GA target April 2026) unifies the previous DSPM (classic) and DSPM for AI (classic) experiences into a single, outcome-based platform:
- Outcome-based guided workflows: Choose a data security objective and receive step-by-step remediation guidance.
- AI Observability: A dedicated inventory of all AI apps and agents — including first-party (Copilot Studio, Azure AI Foundry), third-party, and custom-built agents — with activity in the last 30 days, risk levels, and sensitive interaction counts.
- Item-level remediation: Bulk disable overshared SharePoint links, apply sensitivity labels, and activate protection policies directly from DSPM.
- External platform visibility: Third-party signals from Salesforce (Varonis), Databricks (BigID), Snowflake (Cyera), and Google Cloud Platform (OneTrust) surface in a unified view via Microsoft Sentinel Data Lake.
- Advanced reports: Instant visibility into sensitivity label coverage, DLP policy activity, and posture trends with drill-down filters.
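To make the AI Observability inventory concrete, the toy model below assigns a risk level from the fields described above (30-day sensitive interaction count, DLP coverage). The thresholds are invented purely for illustration; DSPM computes risk levels natively with its own signals.

```python
# Toy model of agent risk-level assignment. Thresholds are illustrative
# assumptions, not Microsoft's scoring logic.

def agent_risk_level(sensitive_interactions_30d: int, has_dlp_policy: bool) -> str:
    if sensitive_interactions_30d == 0:
        return "Low"                       # no sensitive data touched
    if sensitive_interactions_30d < 50 and has_dlp_policy:
        return "Medium"                    # sensitive activity, but protected
    return "High"                          # high volume or unprotected

agent_risk_level(120, has_dlp_policy=False)   # -> "High"
```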
5.3 AI Agent Governance
AI agents are now first-class entities in Purview’s governance model — not afterthoughts:
| Capability | What It Does | Applies To |
| AI Observability | Inventory of all AI apps and agents; risk level assignment per agent; sensitive interaction count | Copilot Studio, Azure AI Foundry, third-party agents, Agent 365 |
| Agentic Risk in IRM | Agent-specific risk indicators detect unauthorized data access and anomalous behaviors | All agents with M365 access |
| DLP for Agents | Agents inherit DLP protections — prevented from accessing labeled files or sending sensitive data via Teams | First-party and Copilot Studio agents |
| Communication Compliance | Detects non-compliant activity in human-agent interactions; proactive policy-based governance | All agent interactions in M365 |
| eDiscovery & Audit | Agent prompt/response retention, deletion policies, and legal hold extended to agent interactions | All M365-connected agents |
| Risky Agents Policy Template | IRM template detects anomalous agent behaviors including exfiltration patterns | Copilot Studio and Microsoft Foundry agents |
| Key Architectural Principle: Purview treats AI agents as data principals — they inherit the same protections as human users. A Highly Confidential labeled file cannot be accessed by an agent any more than by an unauthorized human. This governance-by-design approach eliminates the ‘Shadow AI’ data exposure gap. |
Chapter 6: Data Policy & Access Governance
6.1 Purview Data Policy Architecture
Purview’s Data Policy capability represents a fundamental shift from infrastructure-level ACLs managed by engineers to business-level policies managed by data owners and governance teams. A Purview data access policy states: ‘Users in group X can perform action Y on data assets matching classification Z.’
6.2 Policy Types
- Data owner policies: Grant read or read/modify access to Azure Storage, ADLS Gen2, Azure SQL, and Fabric without involving the infrastructure team.
- DevOps policies: Grant SQL performance monitoring access (VIEW DATABASE STATE) to DevOps engineers without granting data read permissions.
- Self-service data access policies: Data consumers request access through the Unified Catalog. Automated workflow routes to data owner. Access provisioned or rejected with full audit trail.
- Attribute-based access control (ABAC): Grant access based on asset classifications rather than specific named assets. New assets automatically inherit correct policies as they are classified.
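The ABAC idea above can be modeled locally to show why it scales: access derives from asset classifications, not named assets, so a newly scanned asset is covered the moment it is classified. This is a toy model of the logic, not the Purview policy engine; the group names are hypothetical (the classification names are real Purview system classifications).

```python
# Toy ABAC evaluation: (principal group, action, classification) triples.
POLICIES = [
    ("grp-finance-analysts", "read", "MICROSOFT.FINANCIAL.CREDIT_CARD_NUMBER"),
    ("grp-data-scientists",  "read", "MICROSOFT.PERSONAL.NAME"),
]

def is_allowed(groups, action, asset_classifications):
    """Grant if any policy matches the caller's groups, the requested
    action, and any classification carried by the asset."""
    return any(
        g in groups and a == action and c in asset_classifications
        for g, a, c in POLICIES
    )

# A brand-new asset classified during its first scan is covered immediately:
is_allowed({"grp-finance-analysts"}, "read",
           {"MICROSOFT.FINANCIAL.CREDIT_CARD_NUMBER"})   # -> True
```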
6.3 Policy Enforcement Architecture
| Data Source | Policy Enforcement Point | Latency to Enforce | Granularity |
| ADLS Gen2 | Azure Storage RBAC (via Purview policy propagation) | < 5 minutes | Container, folder, file level |
| Azure SQL Database | SQL permissions (system-managed) | < 2 minutes | Database, schema, table level |
| Microsoft Fabric | Fabric workspace and item permissions | < 10 minutes | Workspace, lakehouse, table level |
| Azure Synapse Analytics | Synapse workspace RBAC | < 5 minutes | Workspace, pool level |
| Fabric Warehouse/KQL DBs (New GA) | DLP policy tip triggering on sensitive data upload | Near real-time | Asset-level with sensitive data detection |
| Architectural Constraint: Purview data policies do NOT replace row-level security (RLS), column masking, or Dynamic Data Masking (DDM) in SQL databases. Purview policies govern who can connect and query. RLS/DDM governs what data they see within an allowed connection. Both layers are required for complete access governance. |
Chapter 7: Information Protection & DLP
7.1 Sensitivity Label Taxonomy
Sensitivity labels are the governance primitive that spans cloud storage, databases, Office documents, Teams messages, third-party applications, and — as of 2026 — AI agent interactions. The taxonomy must balance usability with enforcement precision. More than 10 labels typically causes label fatigue.
| Label | Definition | Protection Actions | Auto-labelling Trigger |
| Public | Approved for external publication. No restrictions. | None | No sensitive classifications detected |
| Internal | Business information for employee use. Not for public sharing. | Watermark on documents | Default label applied to all unlabelled items |
| Confidential | Sensitive business data. External sharing requires approval. | Encryption, external sharing DLP block, watermark | PII classification, financial data patterns |
| Highly Confidential | Regulated data, trade secrets, executive communications. | Encryption, download restrictions, MFA for access, audit logging | PHI, PCI data, credentials, classified IP |
| Restricted | Legal hold, regulatory investigation, M&A sensitive. | Encryption, access list restricted to named individuals, no forwarding | Legal trigger or manual assignment only |
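The auto-labelling column above follows a simple rule: detected sensitive information types map to the highest-priority applicable label, with Internal as the default and Restricted reserved for manual or legal-trigger assignment. The sketch below is a local model of that logic for illustration; Purview evaluates auto-labelling service-side, and the trigger mappings shown are assumptions.

```python
# Local model of auto-labelling priority. Label order mirrors the
# taxonomy table; trigger mappings are illustrative.
LABEL_PRIORITY = ["Public", "Internal", "Confidential",
                  "Highly Confidential", "Restricted"]

TRIGGERS = {
    "Credit Card Number": "Highly Confidential",
    "U.S. Social Security Number (SSN)": "Highly Confidential",
    "All Full Names": "Confidential",
}

def auto_label(detected_types):
    candidates = [TRIGGERS[t] for t in detected_types if t in TRIGGERS]
    if not candidates:
        return "Internal"   # default label for unlabelled items (per table)
    return max(candidates, key=LABEL_PRIORITY.index)  # highest priority wins

auto_label(["All Full Names", "Credit Card Number"])  # -> "Highly Confidential"
```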
7.2 DLP Policy Architecture (2026)
As of 2026, Purview DLP has been restructured (the DLP documentation was reorganized to match) and now explicitly covers three scenarios: protecting enterprise data, protecting enterprise data on devices, and inline data protection.
| DLP Scope | Trigger Condition | Action | Business Justification Override |
| Exchange Email | Highly Confidential label; external recipient | Block delivery; notify sender; generate incident | Yes — manager approval workflow |
| SharePoint/OneDrive | Confidential label; public sharing link created | Block link creation; notify user; generate incident | Yes — data owner approval |
| Teams Messages | Credit card number, SSN pattern in message | Block send; notify user; policy tip displayed | No — hard block (financial regulatory) |
| Fabric Warehouse (New GA) | Sensitive data detected in asset uploaded to Warehouse | Policy tip trigger; restrict access for KQL/SQL DBs | Admin configurable |
| AI Agents (New) | Agent attempts to access Highly Confidential labeled file | Block agent access; audit log entry; alert to admin | No — security team review required |
Chapter 8: Deployment Architecture & Operating Model
8.1 Deployment Architecture Patterns
- Pattern 1 — Centralized Governance (Single Account): Best for organizations with <50,000 data assets, single-geography operation, or strong central governance team. Simple operations, lower cost.
- Pattern 2 — Federated Governance (Hub-and-Spoke): Best for large multi-geography organizations with autonomous business units. Central CDO office hub; business unit spoke accounts synchronized via Purview metadata API.
- Pattern 3 — Domain-Aligned (Data Mesh): Best for organizations implementing Data Mesh. Each data domain owns its own Purview account. Enterprise governance sets standards; federated computational governance via shared glossary and classification taxonomy.
8.2 Governance Operating Model
| Role | Responsibilities | Time Commitment | Purview Role |
| Chief Data Officer | Set governance strategy, approve glossary, report metrics to board | 2–4 hrs/week | Insights Reader, Collection Admin (Root) |
| Data Governance Lead | Operate Purview program, manage stewards, evolve policies | Full time | Collection Admin, Policy Author |
| Domain Data Owner | Own data quality for domain, approve certifications and access requests | 4–8 hrs/week | Data Curator (domain collection) |
| Data Steward | Enrich metadata, link glossary terms, resolve classification issues, review flagged assets | 50–100% FTE | Data Curator |
| Data Engineer | Register data sources, configure scans, build custom lineage integration | 20% allocation | Data Source Admin |
| AI Governance Analyst (New Role) | Monitor AI agent risk scores in DSPM, review AI Observability reports, manage agentic risk policies | 20–50% FTE | Insights Reader + Security Admin |
| Operating Model Insight: A Purview deployment without assigned Data Stewards is a catalog that fills with metadata but never becomes trusted. For a 50,000-asset estate, plan for 2–3 full-time stewards in year one. Automation can increase effective capacity to 1 FTE per 100,000 assets at maturity. For organizations deploying Copilot or AI agents, an AI Governance Analyst role is now essential — this is not optional in the AI era. |
Chapter 9: Scale, Performance & Cost Optimization
9.1 Capacity Planning
| Resource | Scale Limit (2025) | Recommendation |
| Assets in Data Map | 100 million assets per account | For >80M assets, begin planning federated architecture |
| Registered sources | 3,000 sources per account | Consolidate similar source types into single registered sources where possible |
| Concurrent scans | 100 concurrent scan runs | Use scan scheduling to avoid peak concurrency; prioritize by source criticality |
| Glossary terms | 100,000 terms per account | Maintain term hygiene; deprecate unused terms quarterly |
| Collections | 256 collections per account | Design flat-ish hierarchies; max 4–5 levels for most organizations |
| Custom classification rules | 500 per account | Consolidate similar patterns; use regex groups over multiple single patterns |
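The "regex groups over multiple single patterns" guidance means collapsing variant formats into one alternation so a single custom rule replaces several. The employee-ID formats below are hypothetical, invented to show the pattern:

```python
import re

# One rule covering three legacy employee-ID formats (hypothetical):
#   EMP-123456, E/123456, and 123456-EMP
# instead of three separate single-pattern rules.
EMPLOYEE_ID = re.compile(r"\b(?:EMP-\d{6}|E/\d{6}|\d{6}-EMP)\b")

sample = "Payroll rows for EMP-204881 and 573310-EMP were flagged."
EMPLOYEE_ID.findall(sample)   # -> ['EMP-204881', '573310-EMP']
```

Fewer, broader rules also scan faster, since each rule in a scan rule set is evaluated against every sampled value.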
9.2 Scan Performance Optimization
| Optimization Lever | Description | Performance Impact | Tradeoff |
| Incremental scanning | Scan only new/modified assets since last scan using watermark-based detection | 60–80% reduction in scan time for stable sources | May miss classification changes on unmodified assets |
| Targeted scanning | Scope scans to specific folders, schemas, or file patterns | 40–70% faster | Requires good source naming conventions |
| Classification rule optimization | Use targeted rule sets per source type; reduce rules per scan rule set | 30–50% faster per scanned asset | Requires maintaining multiple scan rule sets |
| Off-hours scheduling | Schedule large scans for 2–6 AM to avoid competing with production workloads | No throughput gain but avoids source contention | Delayed freshness; not suitable for compliance triggers |
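The watermark mechanism behind incremental scanning is worth making concrete: only assets whose last-modified timestamp exceeds the previous successful scan's watermark are re-processed. Purview tracks this internally; the model below simply illustrates the mechanism (and why classification changes on unmodified assets can be missed).

```python
from datetime import datetime, timezone

# Illustrative watermark filter for incremental scanning.
def assets_to_scan(assets, watermark):
    """assets: iterable of (name, last_modified) tuples.
    Returns only assets touched after the watermark."""
    return [name for name, modified in assets if modified > watermark]

watermark = datetime(2026, 3, 1, tzinfo=timezone.utc)   # last successful scan
estate = [
    ("sales/orders.parquet",  datetime(2026, 3, 4,  tzinfo=timezone.utc)),
    ("sales/refunds.parquet", datetime(2026, 2, 11, tzinfo=timezone.utc)),
]
assets_to_scan(estate, watermark)   # -> ['sales/orders.parquet']
```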
9.3 Cost Optimization
Purview pricing is based on Data Map capacity units (CUs), scan compute, and Microsoft 365 Compliance licensing. A new pay-as-you-go pricing model (available alongside the Suite license) covers data estates, analytics, and AI apps — use the DSPM Usage Center to track consumption per investigation and avoid over-provisioning.
| Cost Component | Billing Model | Optimization Strategy |
| Data Map capacity units | $0.496/CU/hour (1 CU = 1GB metadata storage + processing capacity) | Incremental scans reduce CU consumption by 60–75% |
| Scan compute | Billed per vCore-hour for SHIR; Managed VNet included in CUs | Right-size SHIR VMs; schedule to minimize runtime; use MVNet where possible |
| M365 Compliance (DLP, Labels) | Included in M365 E5 or E5 Compliance add-on | Audit license assignments; unused Compliance seats are common waste |
| DSPM & AI Governance (New) | Pay-as-you-go; Data Security Investigation Compute Units (DSICUs) replaced SCUs | Use the Usage Center dashboard to track per-investigation consumption |
| Defender for Cloud integration | Charged per resource per hour | Enable only for in-scope regulated workloads |
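Because Data Map capacity bills for every hour of the month, the CU line dominates steady-state cost. A back-of-envelope estimator using the rate quoted in the table above (and Azure's standard 730-hour month convention) makes the effect of right-sizing visible:

```python
# Estimator for the Data Map capacity cost line, using the table's
# quoted rate. Actual rates vary by region and over time -- check the
# current Azure pricing page before budgeting.
RATE_PER_CU_HOUR = 0.496
HOURS_PER_MONTH = 730   # Azure's standard monthly-hour convention

def data_map_monthly_cost(capacity_units: int) -> float:
    return round(capacity_units * RATE_PER_CU_HOUR * HOURS_PER_MONTH, 2)

data_map_monthly_cost(1)    # -> 362.08
data_map_monthly_cost(10)   # -> 3620.8
```

A 60–75% CU reduction from incremental scanning therefore translates almost linearly into the monthly bill.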
Chapter 10: Real-World Case Studies
Case Study 1: Global Financial Services Firm — GDPR & PCI Compliance Transformation
| Dimension | Details |
| Organization | Pan-European retail bank, 12,000 employees, 85 data sources |
| Challenge | GDPR audit failed in 2022 due to inability to demonstrate PII data inventory. €2.3M fine issued. Compliance team spent 14 weeks per audit cycle manually documenting data assets. |
| Purview Scope | Data Map (85 sources), full estate classification, GDPR & PCI assessments in Compliance Manager, sensitivity labels across M365 and Azure Storage, DLP policies for credit card and IBAN patterns |
| Timeline | 16 weeks to full production deployment across all 85 sources |
| Metric | Before Purview | After Purview (12 months) |
| Compliance audit preparation time | 14 weeks manual | 6 days automated |
| PII coverage (classified assets) | 23% (manual inventory) | 94% (automated) |
| Sensitivity label coverage | 8% (M365 only) | 87% across Azure + M365 |
| GDPR SRR response time | 28 days | 4 days |
| Annual compliance staffing cost | £1.2M (12 FTE) | £380K (3 FTE + automation) |
| Purview TCO (Year 1) | — | £420K (all-in) |
Case Study 2: Healthcare Network — PHI Governance & HIPAA Continuous Compliance
| Dimension | Details |
| Organization | US regional hospital network, 22 hospitals, 6,500 clinical staff, 140TB of health data across Azure and on-premises |
| Challenge | Inability to demonstrate the minimum-necessary access principle for PHI (HIPAA §164.514). Repeated exposure of PHI to non-clinical staff through misconfigured Power BI reports. |
| Purview Scope | Healthcare-specific classification rules (34 custom PHI types), Data Map across Epic EHR integration layer + Azure SQL + ADLS Gen2, Purview policy for PHI access restriction, DLP to block PHI in Teams/email, Compliance Manager HIPAA assessment |
| Key Result | Classification accuracy validated at 96.3% against 10,000 manually labelled records. First external HIPAA audit post-deployment: no significant findings. |
Case Study 3: Retail Enterprise — AI-Ready Data Mesh with Fabric & Purview (2026)
A FTSE 100 retailer with 8 data domains implemented a Data Mesh on Microsoft Fabric. The governance challenge evolved in 2025: beyond interoperability and trust, they needed to govern Copilot-powered analytics agents accessing domain-owned data products.
- Deployed DSPM AI Observability to inventory 47 AI agents accessing the Fabric estate — 12 were flagged as high-risk due to oversharing patterns.
- Applied DLP policies to Fabric Warehouse and KQL DBs (newly GA) to prevent sensitive data leakage through Copilot agent responses.
- Insider Risk Management extended to Fabric lakehouses with built-in risk indicators for potential data exfiltration by agents.
| KPI | Target | Achieved (Month 12) |
| Data products certified | 80% of published products | 84% |
| Cross-domain data access time | < 3 business days | 1.2 days average |
| AI agent data risk incidents resolved | < 5/month | 2.1/month average |
| Time to identify root cause of cross-domain data issue | < 4 hours | 47 minutes average |
Chapter 11: 90-Day Implementation Roadmap
Based on 20+ Purview deployments, the following 90-day roadmap represents the optimal sequencing for enterprise governance programs in 2026 — incorporating AI governance activation alongside traditional catalog and classification work.
Phase 1: Foundation (Days 1–30)
| Week | Activity | Owner | Success Criteria |
| 1 | Purview account provisioning, network design (MVNet vs SHIR), Microsoft Entra ID group creation, collection hierarchy design | Data Engineer + Architect | Purview account live; network connectivity validated; collection hierarchy approved |
| 1–2 | Source inventory: document all data sources including AI systems and Copilot deployments | Data Governance Lead | Complete source inventory; sources prioritized by regulatory risk; AI systems catalogued |
| 2 | Glossary foundation: identify 50–100 core business terms per domain with definitions, owners, and related terms | Data Governance Lead + Domain Owners | Core glossary terms in Draft status; term owners assigned |
| 2–3 | Register and scan Tier 1 sources (PCI scope, PHI scope, GDPR-critical sources) | Data Engineer | Tier 1 sources scanned; classification results reviewed; false positive rate <5% |
| 3–4 | Classification review and tuning: build custom rules for organizational patterns | Data Steward + Data Engineer | Custom classification rules deployed; accuracy >90% on validation set |
| 4 | RBAC configuration: assign roles to domain teams using Microsoft Entra ID groups | Data Governance Lead | All roles assigned; domain teams can access catalog; engineers can register sources |
Phase 2: Activation (Days 31–60)
| Week | Activity | Owner | Success Criteria |
| 5–6 | Register and scan all remaining data sources; configure incremental scan schedules | Data Engineer | >90% of data estate registered and scanned; scan schedule operational |
| 6 | Quick win: publish Governance Maturity Score dashboard in Power BI; present to leadership | Data Governance Lead | Dashboard live; leadership briefing completed; program funded for Phase 3 |
| 6–7 | Sensitivity label deployment: configure taxonomy; deploy auto-labelling policies in simulation mode | Security + Data Governance | Labels published; simulation mode running; simulation report reviewed |
| 7–8 | Enable DSPM: activate AI Observability; inventory all AI apps and agents; assign initial risk levels | Security + AI Governance Analyst | AI agent inventory complete; high-risk agents identified; initial DSPM posture established |
| 7–8 | Lineage validation: verify ADF, Synapse, Fabric lineage; build custom lineage via Atlas API for non-native sources | Data Engineer | End-to-end lineage visible for >3 critical pipelines; column-level lineage for Power BI reports |
| 8 | Compliance Manager setup: create GDPR, HIPAA, or relevant regulatory assessments | Compliance Officer | At least one regulatory assessment active; initial compliance score baseline established |
Phase 3: Optimization (Days 61–90)
| Week | Activity | Owner | Success Criteria |
| 9–10 | Enable sensitivity labels in production (exit simulation mode); deploy DLP policies including Fabric Warehouse DLP | Security + Data Governance | Labels applying to new content; DLP incidents in dashboard; <10% false positive rate |
| 10–11 | AI governance policies: configure IRM Risky Agents template; deploy agent-specific DLP; establish AI data access policies | AI Governance Analyst + Security | Risky agent policies active; agent DLP blocking tested; AI governance posture report generated |
| 11 | Glossary completion: approve Tier 1 glossary terms; link to classified assets via bulk assignment | Data Steward | >80% of Tier 1 assets linked to at least one approved glossary term |
| 12 | Program review: measure Governance Maturity Score vs. Week 1 baseline; document lessons; plan 90–180 day roadmap | CDO + Data Governance Lead | Governance score improvement documented; 90–180 day roadmap approved; operating model confirmed |
| Critical Success Factor: Governance programs that fail typically do so in Days 31–60 — the ‘activation phase.’ Quick wins must be demonstrated by Day 45 to maintain organizational momentum. The Power BI Governance Maturity Score dashboard is designed as this early value demonstration. In 2026, demonstrating AI agent governance to leadership is an equally powerful motivator for program continuation. |
Appendix: Governance Maturity Model & Quick Reference
A. Governance Maturity Model
| Level | Name | Characteristics | Target Score | Typical Timeline |
| L1 | Initial | Ad hoc governance; no systematic catalog; manual compliance; governance by tribal knowledge | 0–20 | Starting point |
| L2 | Managed | Data sources registered and scanned; basic classification applied; glossary under development; ownership partially assigned | 20–50 | 0–6 months post-deployment |
| L3 | Defined | Full estate classification; glossary approved and linked; lineage documented; compliance assessments active | 50–70 | 6–12 months post-deployment |
| L4 | Quantitatively Governed | Governance Maturity Score tracked weekly; stewardship SLAs enforced; access policies active; DLP protecting sensitive data; AI agent inventory established | 70–85 | 12–24 months |
| L5 | Optimizing | Automated certification; continuous compliance; AI-assisted stewardship; full AI governance with DSPM; governance embedded in CI/CD pipelines | 85–100 | 24–36 months |
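For teams automating the weekly score tracking described at L4, the maturity bands above can be expressed as a simple lookup. Treating each lower bound as inclusive is an assumption, since the table's ranges share endpoints:

```python
# Sketch: mapping a Governance Maturity Score (0-100) to the levels in the
# maturity model table. Band boundaries follow the "Target Score" column.

def maturity_level(score: float) -> str:
    """Return the maturity level (L1-L5) for a given governance score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 20:
        return "L1 Initial"
    if score < 50:
        return "L2 Managed"
    if score < 70:
        return "L3 Defined"
    if score < 85:
        return "L4 Quantitatively Governed"
    return "L5 Optimizing"
```

A function like this can feed the Power BI Governance Maturity Score dashboard referenced in the 90-day roadmap, so the level shown to leadership always matches the published model.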
B. Key Purview REST API Reference
| Operation | Method | Endpoint | Use Case |
| List collections | GET | /account/collections | Audit collection hierarchy; governance reports |
| Get asset by qualified name | GET | /catalog/api/atlas/v2/entity/uniqueAttribute/type/{typeName}?attr:qualifiedName={qualifiedName} | Look up specific asset metadata in automation |
| Update asset contacts | PUT | /catalog/api/atlas/v2/entity/guid/{guid}/businessattribute/Contacts | Bulk owner assignment in onboarding automation |
| Submit lineage | POST | /catalog/api/atlas/v2/entity/bulk | Custom lineage for non-native sources (dbt, custom ETL) |
| Run scan | POST | /scan/datasources/{dsName}/scans/{scanName}/runs | Trigger scan on-demand from CI/CD on schema change |
| Create glossary term | POST | /catalog/api/atlas/v2/glossary/term | Bulk glossary population from existing business dictionaries |
| Get DSPM agent inventory (New) | GET | /security/dspm/agents | List all AI agents with risk levels and sensitive interaction counts |
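The scan-trigger row above can be wired into CI/CD as follows. This sketch only composes the request; the account, data source, and scan names are placeholders, and it assumes an API version that expects a client-generated run GUID and an api-version query parameter:

```python
# Sketch: composing an on-demand scan-run call against a Purview account,
# suitable for triggering from a CI/CD pipeline on schema change.
# Account, data source, and scan names are illustrative placeholders.
import uuid

def scan_run_request(account: str, datasource: str, scan: str,
                     api_version: str = "2022-02-01-preview") -> tuple[str, str]:
    """Return (run_id, url) for the scan-run endpoint on a Purview account.

    Each run is identified by a client-generated GUID appended to the path.
    """
    run_id = str(uuid.uuid4())
    url = (f"https://{account}.purview.azure.com"
           f"/scan/datasources/{datasource}/scans/{scan}"
           f"/runs/{run_id}?api-version={api_version}")
    return run_id, url

# Example (not executed here): acquire a token with azure-identity's
# DefaultAzureCredential().get_token("https://purview.azure.net/.default"),
# then send the request with an "Authorization: Bearer <token>" header and
# poll the same runs endpoint for status.
```

Using DefaultAzureCredential lets the same code run locally (developer sign-in) and in the pipeline (managed identity or service principal) without code changes.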
C. Glossary of Key Terms
| Term | Definition |
| Apache Atlas | Open-source metadata management framework; the foundational metadata model underlying Purview’s Data Map |
| Business Glossary | Curated vocabulary of business terms linked to data assets; provides semantic context and shared language |
| Collection | Hierarchical container in Purview that scopes metadata, access control, and policy enforcement |
| Data Lineage | Documentation of data origin, movement, and transformation — tracing how data flows from source to consumption |
| DSPM | Data Security Posture Management — Purview’s unified plane for discovering, protecting, and investigating data risks across traditional and AI workloads |
| AI Observability | DSPM capability providing an inventory of all AI apps and agents, their risk levels, and sensitive data interactions |
| Microsoft Entra ID | The current name for Azure Active Directory (rebranded October 2023). All Purview documentation and configurations should use this term. |
| Unified Catalog | GA feature (2025) consolidating data discovery, data quality, automated access workflows, and glossary management into a single experience |
| OpenLineage | Open standard for data lineage metadata; used by Purview Spark connector to emit lineage from Spark jobs |
| Sensitivity Label | Classification tag applied to data assets and documents that drives downstream protection actions across the entire Microsoft ecosystem |
