MICROSOFT PURVIEW
Data Governance at Scale
Enterprise Architecture, Implementation Patterns & Operational Excellence
| Data Platform Practice | Microsoft Ecosystem Veratas | 2025 Edition |
Classification: Public | Version 1.0 | 2025
Executive Summary
Data is the most valuable asset enterprises possess — yet for most organizations, it remains ungoverned, undiscovered, and untrustworthy. Regulatory scrutiny has never been higher. GDPR fines exceeded €4.2 billion in 2023. The average cost of a data breach reached $4.45 million in 2024 (IBM). And yet, surveys consistently find that fewer than 30% of enterprise data assets are formally catalogued or classified.
Microsoft Purview addresses this challenge head-on. Released as a unified governance platform in 2022 (consolidating Azure Purview and Microsoft 365 Compliance Center), Purview provides organizations with a single control plane for data discovery, classification, cataloguing, lineage tracking, policy enforcement, information protection, and regulatory compliance — spanning on-premises, multi-cloud, and SaaS environments.
This white paper is written for data architects, Chief Data Officers, compliance leaders, and senior engineers who need to move beyond theoretical governance frameworks toward a production-grade, measurable, and operational governance program. It draws on real-world deployments across financial services, healthcare, retail, and public sector organizations.
Key Findings
| Dimension | Without Purview | With Purview (Measured Outcomes) |
| Data Discovery | Manual inventory; 40-60% of assets undocumented | 95%+ automated classification across 200+ source types |
| Time to Compliance Audit | 6-12 weeks manual preparation | 2-3 days with automated evidence collection |
| Data Breach Detection | Mean time 197 days (IBM 2024) | Near-real-time DLP alerts, sensitivity label enforcement |
| Data Consumer Productivity | Avg 4.2 hours/week searching for trusted data | 68% reduction in data search time (customer benchmark) |
| Regulatory Fine Risk | Unquantified exposure | Documented control framework mapped to GDPR, HIPAA, ISO 27001 |
| Governance TCO (3-year) | Distributed tools: $2.1M-$4.8M | Purview consolidation: $800K-$1.6M (45-65% reduction) |
| Key Insight: Microsoft Purview is not merely a data catalog — it is an integrated governance, risk, and compliance (GRC) platform that creates a closed-loop governance operating model. Organizations that treat it as only a catalog leave 60% of its capability untapped. |
CHAPTER 1
The Data Governance Imperative
1.1 Why Governance Fails Without Architecture
Most governance programs fail not because of lack of intent, but because of architectural sprawl. Governance teams operate spreadsheets, data stewards work in isolation, and lineage is tracked manually if at all. The root causes are structural:
- Federated data ownership without centralized metadata: Business units own data but metadata lives nowhere. No single system answers “where does this sales figure come from?”
- Tool fragmentation: Organizations accumulate Collibra, Informatica, Alation, Apache Atlas, and custom wiki pages — each partial, none authoritative.
- Classification as a one-time project: Data classification efforts produce point-in-time inventories that decay immediately as new data arrives and pipelines evolve.
- Compliance as reactive audit response: Evidence collection happens at audit time, not continuously — creating gaps, inconsistencies, and audit fatigue.
- Data access governed by infrastructure, not policy: Access control lives in storage ACLs, database permissions, and firewall rules — not in business-level policies aligned to data sensitivity.
1.2 The Regulatory Landscape
The regulatory environment demands systematic, demonstrable data governance. The following table maps key regulations to specific Purview capabilities:
| Regulation | Key Requirement | Purview Capability | Implementation Evidence |
| GDPR (EU) | Data subject rights (Art. 15-22), consent tracking, cross-border transfer controls | Data Map classification, Subject Rights Requests module, sensitivity labels | Automated PII detection; SRR workflow audit trail |
| HIPAA (US) | PHI identification, access controls, audit logging, minimum necessary standard | Custom classification rules for PHI patterns, policy enforcement, access reviews | Classification report; access policy audit log export |
| CCPA (California) | Consumer data inventory, opt-out rights, data sale disclosure | Data estate inventory export, lineage documentation | Asset inventory report; consent metadata tagging |
| PCI-DSS v4.0 | Cardholder data environment scoping, encryption, access logging | Sensitive info type: Credit Card, encryption insights, DLP policies | Data map export; DLP incident reports |
| ISO 27001:2022 | Information classification, asset inventory, supplier management | Full data catalog, sensitivity labels, third-party scanner integration | Control mapping export from Compliance Manager |
| SOC 2 Type II | Continuous monitoring, change management, logical access | Policy enforcement, continuous compliance score, activity logs | Evidence packages from Compliance Manager |
1.3 The Microsoft Purview Value Proposition
Unlike point solutions, Microsoft Purview delivers governance across the full data estate — from structured SQL databases to unstructured SharePoint documents, from on-premises SQL Server to Google BigQuery. Its value derives from three architectural properties:
Unified metadata fabric: A single metadata layer connects every data source, creating one authoritative catalog regardless of where data lives.
Automated classification at scale: Machine learning-based scanning continuously discovers and classifies data without manual tagging — critical when estates contain billions of assets.
Policy-to-enforcement bridge: Governance policies defined in Purview propagate to enforcement points in Azure Storage, SQL, Fabric, Power BI, and Microsoft 365 — eliminating the gap between policy documents and actual access control.
| War Story: A global bank with 47 data sources and 12 governance tools spent 11 weeks preparing for a GDPR audit. After deploying Purview with automated scanning across all sources, the same audit was prepared in 4 days — with higher confidence and full lineage documentation. Annual compliance preparation cost reduced from $1.8M to $340K. |
CHAPTER 2
Microsoft Purview: Platform Architecture
2.1 Architectural Overview
Microsoft Purview is delivered as a cloud-native SaaS service with no infrastructure to manage. Its architecture comprises four primary planes:
| Plane | Components | Primary Function |
| Data Map Plane | Automated scanning, classification engine, lineage collector, Atlas-compatible metadata store | Discovery, classification, and relationship mapping across all data sources |
| Data Catalog Plane | Search index, glossary engine, data products, collections framework, stewardship workflows | Self-service discovery, business context, and data access for consumers |
| Governance Insights Plane | Estate health dashboards, sensitivity coverage reports, stewardship metrics, Data Estate Insights | Measurement, reporting, and continuous improvement of governance program |
| Compliance & Protection Plane | Information Protection (sensitivity labels), DLP engine, Compliance Manager, eDiscovery, Audit | Regulatory compliance, data protection, legal hold, and investigation |
2.2 The Metadata Architecture
At the core of Purview is an Apache Atlas-compatible metadata store, extended significantly by Microsoft. Every entity in Purview — whether a SQL table, Power BI report, ADF pipeline, or SharePoint document — is represented as a metadata entity with:
- Entity type definition: Structural schema (e.g., azure_sql_table has columns, a database parent, and connection properties)
- Attribute payload: Business metadata, technical metadata, system metadata, and user-defined custom attributes
- Relationships: Parent-child containment (server → database → schema → table → column), lineage edges (source → transformation → target), and semantic links (term → asset)
- Labels and classifications: Applied automatically by the classification engine or manually by stewards
- Contacts: Owners and experts assigned to assets for stewardship accountability
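The five building blocks above can be seen together on a single entity. The sketch below uses Atlas-style field names; exact attribute sets vary by entity type and API version, and all values are hypothetical:

```python
# Hypothetical azure_sql_table entity in Atlas-style form. Each of the five
# building blocks above appears as a top-level field.
entity = {
    "typeName": "azure_sql_table",                        # 1. entity type definition
    "attributes": {                                       # 2. attribute payload
        "name": "orders",
        "qualifiedName": "mssql://server/db/sales/orders",
        "description": "Raw order transactions",
    },
    "relationshipAttributes": {                           # 3. relationships (containment)
        "dbSchema": {
            "typeName": "azure_sql_schema",
            "uniqueAttributes": {"qualifiedName": "mssql://server/db/sales"},
        },
    },
    "classifications": [                                  # 4. labels and classifications
        {"typeName": "MICROSOFT.PERSONAL.EMAIL"},
    ],
    "contacts": {                                         # 5. owners and experts
        "Owner": [{"id": "aad-object-id-of-owner"}],
        "Expert": [{"id": "aad-object-id-of-expert"}],
    },
}
```

Lineage edges and glossary links attach to the same entity through additional relationship attributes, which is why one metadata model can serve discovery, lineage, and stewardship at once.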
Entity Relationship Model
| Entity Tier | Example Entities | Relationship Type | Governance Action |
| Collection | Root Collection, Business Unit Collections | Hierarchical container | Access control scope, policy boundary |
| Data Source | Azure SQL Server, ADLS Gen2 Account, Fabric Workspace | Registered source in Data Map | Scanning target, credential binding |
| Dataset | SQL Table, Parquet file, Power BI dataset, Synapse Dedicated Pool | Owned by data source | Classification, lineage, ownership, sensitivity |
| Column/Field | SQL column, file schema field, report column | Child of dataset | Column-level classification, PII detection |
| Process | ADF pipeline, Spark job, Power Query transformation | Lineage edge between datasets | Data flow tracking, impact analysis |
| Glossary Term | Customer ID, Revenue, Churn Rate | Semantic link to datasets/columns | Business context, certified definitions |
2.3 Multi-Tenant & Collections Architecture
Purview’s Collections framework provides a hierarchical organizational structure that governs both metadata access and policy scope. The collection hierarchy is a critical architectural decision that cannot be changed without significant rework; it must be designed upfront.
Collection Design Patterns
Pattern 1 — Organizational Hierarchy: Root → Business Unit → Domain → Subdomain. Best for large enterprises with clear organizational governance. Enables business unit autonomy with enterprise-level roll-up reporting.
Pattern 2 — Data Zone Architecture: Root → Raw Zone → Curated Zone → Certified Zone → Restricted Zone. Best for data platform teams managing lakehouse tiers. Aligns collections to data quality and trust levels.
Pattern 3 — Regulatory Scope: Root → PCI Scope → PHI Scope → PII Scope → Internal. Best for compliance-driven organizations. Enables precise policy enforcement by regulatory classification.
Pattern 4 — Hybrid (Recommended for Enterprise): Combines organizational hierarchy at top level with regulatory scope sub-collections and data zone sub-collections. Most flexible but requires careful RBAC planning.
| Design Principle: Collections cannot be deleted if they contain assets. Design your collection hierarchy before onboarding any data sources. A flat hierarchy is always easier to expand than a complex hierarchy is to flatten. Start with no more than 3 levels for organizations under 10,000 employees. |
2.4 Identity & RBAC Architecture
Purview uses Microsoft Entra ID (formerly Azure Active Directory) for authentication and implements its own RBAC model on top. The key roles and their operational implications:
| Purview Role | Scope | Permissions | Recommended Assignment |
| Collection Admin | Per-collection | Manage sub-collections, assign roles within scope | Business unit data governance leads |
| Data Source Admin | Per-collection | Register and manage data sources, create scan rule sets | Data engineering team leads |
| Data Curator | Per-collection | Edit metadata, apply glossary terms, manage classifications | Data stewards, domain data owners |
| Data Reader | Per-collection | Read-only access to catalog, lineage, and classifications | Data consumers, analysts, report developers |
| Insights Reader | Account-level | Access Data Estate Insights dashboards | CDO, governance program manager |
| Policy Author | Account-level | Create and publish data access policies | Security architects, data governance lead |
CHAPTER 3
Data Map: Discovery & Classification at Scale
3.1 Scanning Architecture
The Data Map’s scanning engine is the foundation of Purview governance. Scans extract metadata, apply classification rules, collect lineage, and populate the catalog — continuously and at scale. Understanding scan architecture is critical to building a reliable governance program.
Scan Execution Models
Managed Virtual Network (MVNet) — Recommended: Purview manages the integration runtime within a Microsoft-managed VNet. No infrastructure to deploy. Supports private endpoint connectivity to sources. Best for most Azure-native deployments.
Self-Hosted Integration Runtime (SHIR): Customer-deployed VM running the Purview runtime agent. Required for: on-premises sources, private network sources without private endpoint support, and non-Azure cloud sources with strict network controls.
Azure Integration Runtime (AIR): Used for public-endpoint Azure sources. Simplest deployment model. Not recommended for sensitive environments where data should not traverse public internet, even transiently.
Self-Hosted Integration Runtime: Sizing Guide
| Data Volume (Assets) | CPU Recommendation | RAM Recommendation | Network Bandwidth | Node Count |
| < 100K assets | 4 vCPU | 8 GB | 100 Mbps | 1 (no HA) |
| 100K – 1M assets | 8 vCPU | 16 GB | 1 Gbps | 2 (active-active HA) |
| 1M – 10M assets | 16 vCPU | 32 GB | 10 Gbps | 4 (2 + 2 failover) |
| > 10M assets | 32 vCPU | 64 GB | 10 Gbps dedicated | 8+ (scale-out cluster) |
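For capacity planning, the sizing table can be encoded as a simple lookup. A sketch follows; the table leaves tier boundaries approximate, so this version treats each upper bound as inclusive:

```python
# SHIR sizing lookup transcribed from the table above. Tier boundaries are
# approximate in the source table; upper bounds are treated as inclusive here.
def shir_sizing(asset_count: int) -> dict:
    tiers = [
        (100_000,      {"vcpu": 4,  "ram_gb": 8,  "bandwidth": "100 Mbps",          "nodes": 1}),
        (1_000_000,    {"vcpu": 8,  "ram_gb": 16, "bandwidth": "1 Gbps",            "nodes": 2}),
        (10_000_000,   {"vcpu": 16, "ram_gb": 32, "bandwidth": "10 Gbps",           "nodes": 4}),
        (float("inf"), {"vcpu": 32, "ram_gb": 64, "bandwidth": "10 Gbps dedicated", "nodes": 8}),
    ]
    for upper_bound, spec in tiers:
        if asset_count <= upper_bound:
            return spec
```

In practice, asset counts grow as scanning coverage expands, so size for the estate you expect after full onboarding, not the assets catalogued today.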
3.2 Classification Engine Deep Dive
Classification in Purview is applied through a layered engine that combines pattern matching, machine learning, context analysis, and custom rules. Understanding the engine’s precedence model prevents misclassification at scale.
Classification Execution Order
- System classification rules (200+ built-in): Microsoft-maintained patterns for global PII types, financial data, health information, credentials, and national identifiers. Examples: Credit Card Number, SSN, IBAN, NHS Number, Email Address.
- Custom classification rules: Organization-specific patterns defined via regex, dictionary matching, or column name pattern. Applied after system rules. Override behavior configurable per rule.
- ML-based classification: Trained models for complex patterns that resist simple regex (e.g., detecting salary ranges, internally-coded identifiers). Runs after rule-based classification.
- Propagated classifications: Column-level classifications propagate upward to table level, and table-level classifications contribute to dataset-level sensitivity scoring.
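The propagation step (the last item above) reduces to a set union: a table carries every classification found on its columns. A minimal sketch, with illustrative classification names:

```python
# Classification propagation: a table's classifications are the union of its
# columns' classifications. Column and classification names are illustrative.
def table_classifications(columns: dict[str, set[str]]) -> set[str]:
    """Column-level classifications propagate upward to the table level."""
    return set().union(*columns.values()) if columns else set()

orders_columns = {
    "email":    {"MICROSOFT.PERSONAL.EMAIL"},
    "card_pan": {"MICROSOFT.FINANCIAL.CREDIT_CARD_NUMBER"},
    "order_id": set(),
}
```

This is why one misfiring column rule pollutes classification reporting at every level above it, and why threshold tuning (next section) matters.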
Custom Classification Rule Design
Custom rules are the most common source of governance debt. The following patterns prevent the most frequent issues:
- Example: Custom classification for internal Employee ID format (EMP-XXXXXXXX)
- Rule Type: Regular Expression
- Pattern: \bEMP-[0-9]{8}\b
- Minimum Match Threshold: 60% of sampled values must match
- Column Name Pattern: employee_id, emp_id, staff_id (case-insensitive)
- Data sampling behavior:
  - Purview samples 128 rows per column by default
  - Increase to 1000 rows for low-density sensitive data (e.g., executive records)
  - Use "Full scan" for regulatory-critical sources (performance tradeoff: 3-8x slower)
- Anti-pattern: Setting threshold to 1% causes false positive explosion
- Anti-pattern: Using [0-9]{8} without word boundary causes substring matches
- Best practice: Test regex against 1000 representative values before deployment
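The rule above can be tested offline before deployment. The sketch below checks the pattern against sampled values and applies the 60% minimum match threshold; the sample values are hypothetical:

```python
import re

# Offline test of the EMP-XXXXXXXX rule: does the pattern clear the 60%
# minimum match threshold against a sample of column values?
EMP_ID = re.compile(r"\bEMP-[0-9]{8}\b")

def match_rate(samples: list[str]) -> float:
    """Fraction of sampled values the rule's regex matches."""
    if not samples:
        return 0.0
    return sum(1 for v in samples if EMP_ID.search(v)) / len(samples)

samples = ["EMP-00012345", "EMP-99887766", "n/a", "EMP-1234"]  # hypothetical
rate = match_rate(samples)            # 2 of 4 values match here
would_classify = rate >= 0.60         # below threshold: column not classified
```

Note how the word boundaries pay off: `EMP-123456789` (nine digits) and `IDEMP-12345678` (embedded in a longer token) are both rejected, which is exactly the substring anti-pattern called out above.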
3.3 Supported Data Sources: Complete Matrix
| Source Category | Supported Sources | Metadata Extracted | Lineage Support | Classification |
| Azure Data | ADLS Gen1/Gen2, Azure Blob, Azure SQL DB, Azure SQL MI, Synapse Analytics, Cosmos DB, Azure Database for PostgreSQL/MySQL | Schema, statistics, partitions, resource metadata | Yes (native connectors) | Full (all classification types) |
| Microsoft Fabric | Fabric Lakehouse, Fabric Warehouse, Fabric Dataflows, Power BI datasets/reports | Schemas, measures, relationships, sensitivity labels | Yes (deep integration, column-level) | Full, bidirectional label sync |
| Power BI | Workspaces, datasets, reports, dataflows, dashboards | Dataset schema, report lineage, refresh history, endorsements | Yes (full report-to-source lineage) | Label push/pull from M365 Information Protection |
| Azure Data Factory | Pipelines, datasets, linked services, data flows | Pipeline structure, activity configuration | Yes (automated lineage extraction) | Via dataset classification |
| On-Premises | SQL Server 2012+, Oracle 12c+, SAP HANA, Teradata, HDFS | Full schema extraction | SQL Server: Yes. Others: Limited | Full classification on all |
| Multi-Cloud | AWS S3, AWS RDS, Google BigQuery, Google Cloud Storage, Snowflake | Schema and storage metadata | Limited (no native lineage; requires custom lineage API) | Full classification |
| SaaS Applications | Salesforce, SAP ECC, Erwin, Looker | Configured via connectors | Limited | Classification on connected datasets |
| Office 365 | SharePoint Online, Exchange Online, Teams, OneDrive | Document metadata, content classification | N/A (unstructured) | Full M365 sensitivity label integration |
3.4 Scan Rule Sets: Enterprise Design Patterns
Scan Rule Sets define which file types and classification rules apply to each scan. Enterprise-grade Scan Rule Sets require careful design to balance coverage, performance, and noise reduction.
// Recommended Scan Rule Set Strategy
//
// Rule Set 1: PCI Scope (used for financial data sources)
//   - File types: Parquet, Delta, CSV, ORC, JSON
//   - Classifications: Credit Card, CVV, Account Number, IBAN, SWIFT Code
//   - Sampling: Full scan (not sampled) for audit compliance
//   - Custom rules: Internal account number format regex
//
// Rule Set 2: PHI Scope (used for healthcare sources)
//   - File types: HL7, CSV, Parquet, JSON, XML
//   - Classifications: NHS Number, SSN, Date of Birth, Medical Record Number,
//     Diagnosis Code (custom), Prescription Data (custom)
//   - Sampling: Full scan
//
// Rule Set 3: Standard PII (all other sources)
//   - File types: All common formats
//   - Classifications: Email, Phone, Address, Name (ML-based)
//   - Sampling: 128 rows (default)
//
// Rule Set 4: Technical Metadata Only (dev/test environments)
//   - File types: All
//   - Classifications: NONE (disable all classification rules)
//   - Purpose: Capture lineage and schema without false positive noise in non-prod
CHAPTER 4
Data Catalog: Enterprise Search & Lineage
4.1 Catalog Architecture
The Purview Data Catalog is a faceted search index built on Azure Cognitive Search, surfacing all metadata entities discovered by the Data Map. Its power comes not from indexing but from curation: the governance workflows, glossary integration, certification processes, and ownership models that turn raw metadata into trusted, consumable data assets.
4.2 Business Glossary Design
The Business Glossary is the semantic layer of Purview — the bridge between technical metadata and business meaning. A poorly designed glossary becomes a maintenance burden; a well-designed glossary becomes the authoritative vocabulary of the organization.
Glossary Governance Model
| Term Status | Meaning | Who Sets It | Catalog Behavior |
| Draft | Term being developed; not yet authoritative | Term authors (data stewards) | Discoverable but not recommended for use |
| Approved | Reviewed and endorsed by domain owner | Domain data owner | Shown as authoritative in search results |
| Deprecated | Term being replaced; avoid new usage | Governance team | Shown with deprecation warning; redirects to replacement term |
| Expired | Term no longer valid; historical reference only | Governance program manager | Hidden from default search; accessible via filter |
Glossary Hierarchy Best Practices
Flat glossaries fail at scale. A 2,000-term flat list is unsearchable. Structure terms in a parent-child hierarchy reflecting business domains:
- L1 — Domain: Customer, Product, Finance, Risk, Operations, HR
- L2 — Subdomain: Customer → Prospect, Customer → Active, Customer → Churned
- L3 — Concept: Customer → Active → Customer Lifetime Value, Customer → Active → Net Promoter Score
- L4 — Attribute: Customer → Active → CLV → Predicted CLV (12-month), Actual CLV (trailing 12-month)
Terms should carry: formal definition, example values, calculation formula (where applicable), related terms, steward contact, and linked certified data assets. Incomplete terms erode trust faster than no glossary at all.
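The required-fields rule above can be enforced mechanically before a term is promoted to Approved. A minimal sketch, with illustrative field names loosely modelled on Atlas glossary term attributes:

```python
# Completeness gate for glossary terms: the attributes the text requires,
# expressed as a validator. Field names and the sample term are illustrative.
REQUIRED_FIELDS = {"name", "definition", "examples", "steward", "linkedAssets"}

def is_complete(term: dict) -> bool:
    """A term is fit to publish only when every required field is populated."""
    return all(term.get(field) for field in REQUIRED_FIELDS)

clv_term = {
    "name": "Customer Lifetime Value",
    "definition": "Projected net revenue attributable to a customer relationship.",
    "examples": ["1450.00", "310.75"],
    "formula": "sum(margin) over the predicted relationship horizon",  # where applicable
    "relatedTerms": ["Net Promoter Score"],
    "steward": "finance-data-stewards@contoso.example",
    "linkedAssets": ["mssql://server/dwh/gold/customer_ltv"],
}
```

Wiring a check like this into the term approval workflow prevents the half-filled terms that, as noted above, erode trust faster than having no glossary at all.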
4.3 Lineage Architecture & Extraction
Data lineage is Purview’s most strategically valuable capability — and the most complex to implement correctly. Lineage answers the questions that matter most: “Where does this metric come from?”, “What would break if we changed this table?”, and “How was this data transformed before I received it?”
Lineage Extraction Mechanisms
Automated lineage (preferred): Purview automatically extracts lineage from Azure Data Factory (pipeline activities), Synapse Spark (PySpark/Scala), Azure Synapse Pipelines, Microsoft Fabric dataflows, and Power BI (report-to-dataset-to-source chain). Zero code required — enabled at scan time.
SQL-based lineage parsing: Purview parses stored procedures, views, and CREATE TABLE AS SELECT statements to extract column-level lineage from SQL-based transformations. Supports: Azure SQL Database, Synapse Dedicated Pool, SQL Server (via SHIR).
Custom lineage via Atlas API: For sources without native connectors (custom Spark jobs, dbt, Informatica, Talend), lineage is submitted programmatically via the Apache Atlas REST API. This is the integration path for third-party ETL tools.
Lineage from Microsoft Fabric (recommended 2024+): Fabric’s native integration with Purview provides the most granular lineage available: column-level lineage through Lakehouse, Warehouse, Dataflows, and Power BI reports in a single unbroken chain.
Lineage API Integration Example
# Submit custom lineage to Purview Atlas API (Python)
# Use case: dbt transformation lineage
import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://purview.azure.net/.default").token

# Replace {account_name} with your Purview account name
purview_endpoint = "https://{account_name}.purview.azure.com"

lineage_payload = {
    "entities": [
        {
            "typeName": "Process",
            "attributes": {
                "name": "dbt_model_customer_lifetime_value",
                "qualifiedName": "dbt://project/model/customer_lifetime_value",
                "inputs": [
                    {"guid": "-1", "typeName": "azure_sql_table",
                     "uniqueAttributes": {"qualifiedName": "mssql://server/db/sales/orders"}},
                    {"guid": "-2", "typeName": "azure_sql_table",
                     "uniqueAttributes": {"qualifiedName": "mssql://server/db/crm/customers"}}
                ],
                "outputs": [
                    {"guid": "-3", "typeName": "azure_sql_table",
                     "uniqueAttributes": {"qualifiedName": "mssql://server/dwh/gold/customer_ltv"}}
                ]
            }
        }
    ]
}

response = requests.post(
    f"{purview_endpoint}/catalog/api/atlas/v2/entity/bulk",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=lineage_payload
)
# Response: {"mutatedEntities": {"CREATE": [{"guid": "…"}]}}
4.4 Search Optimization & Data Product Design
As catalogs scale past 100,000 assets, findability degrades without intentional design. Enterprise search optimization requires four capabilities:
Endorsed and certified assets: Purview endorsement levels (Promoted, Certified) surface trusted assets first. Certification should require: data quality SLA documentation, ownership assignment, glossary linkage, and lineage completeness. Certified assets appear with a visual badge in search results and are prioritized in the search ranking.
Custom attributes for domain context: Extend asset metadata with business-critical attributes: Data Domain, Business Owner, SLA Tier (Gold/Silver/Bronze), Refresh Frequency, Data Quality Score, GDPR Lawful Basis. These become facets in the search experience.
Data Products (Fabric integration): In organizations using Microsoft Fabric, Data Products group related datasets, reports, and pipelines into a consumable unit with a single access request, SLA, and ownership model. This is the recommended abstraction for data mesh architectures on the Microsoft stack.
Term-to-asset linking at ingestion: Automate glossary term linking during scan using column name pattern matching. A column named “customer_id” should automatically receive the “Customer ID” glossary term link as a suggestion — reducing manual stewardship burden by 40-70% in mature estates.
CHAPTER 5
Data Estate Insights & Health Metrics
5.1 Insights Architecture
Data Estate Insights is Purview’s built-in analytics layer — pre-built dashboards that measure governance program health across five dimensions. Unlike custom reporting, Insights uses pre-aggregated metrics from the Data Map, making it near-real-time with no additional cost.
| Insight Category | Key Metrics | Governance Question Answered |
| Asset Insights | Total assets by source type, collection, asset type; new assets per week; scan success rate | “How many data assets do we govern? Are we growing coverage?” |
| Classification Insights | % classified, top classifications by frequency, unclassified assets by source | “What percentage of our data estate has been classified? Where are our gaps?” |
| Sensitivity Insights | Sensitivity label distribution, label coverage %, cross-source sensitive asset count | “How much sensitive data do we have and where does it live?” |
| Glossary Insights | Term coverage (% of assets with linked terms), incomplete terms, term usage by domain | “Is our business glossary actually being used to annotate data?” |
| Stewardship Insights | Assets without owners, assets without expert contacts, overdue reviews | “Do we have accountability for our data assets?” |
5.2 Governance Maturity Scoring
The most operationally valuable use of Insights is building a Governance Maturity Score — a composite metric that tracks program progress over time. This is the single number a CDO should report to the board.
// Governance Maturity Score Calculation (KQL, run in Log Analytics)
// Score range: 0-100. Target: >75 for "Managed" maturity level.
let totalAssets = toscalar(PurviewInsights_AssetCounts_CL
    | summarize sum(AssetCount_d));
let classifiedAssets = toscalar(PurviewInsights_ClassificationCoverage_CL
    | summarize sum(ClassifiedCount_d));
let labelledAssets = toscalar(PurviewInsights_SensitivityCoverage_CL
    | summarize sum(LabelledCount_d));
let ownedAssets = toscalar(PurviewInsights_StewardshipCoverage_CL
    | summarize sum(OwnedCount_d));
let glossaryLinked = toscalar(PurviewInsights_GlossaryCoverage_CL
    | summarize sum(TermLinkedCount_d));
// Weighted composite score:
//   Classification coverage: 30%
//   Sensitivity labelling:   25%
//   Asset ownership:         25%
//   Glossary linkage:        20%
print GovernanceMaturityScore =
      (classifiedAssets / totalAssets * 30)
    + (labelledAssets / totalAssets * 25)
    + (ownedAssets / totalAssets * 25)
    + (glossaryLinked / totalAssets * 20),
    CalculatedOn = now()
5.3 Integrating Purview Insights with Power BI
For executive reporting, Purview Insights data should be surfaced in Power BI via the Purview REST API. The recommended approach: nightly pipeline (ADF or Fabric Data Factory) calls the Purview Insights REST API endpoints, writes results to a Fabric Lakehouse, and a certified Power BI report visualizes governance KPIs with week-over-week trending.
Key metrics to visualize: Governance Maturity Score trend, Classification coverage by data domain, Sensitive asset growth rate, Stewardship SLA compliance (% of assets reviewed within policy window), and Open stewardship actions by owner.
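A sketch of the snapshot record such a pipeline might append each night follows. The fetch itself (the Insights REST calls) is deployment-specific and omitted; the metric names are illustrative:

```python
import datetime

# Shape of one nightly governance snapshot written to the Lakehouse.
# 'metrics' holds absolute counts pulled from the Insights API (not shown);
# pre-computed coverage ratios make week-over-week trending trivial in Power BI.
def build_snapshot(metrics: dict, total_assets: int) -> dict:
    return {
        "snapshotDate": datetime.date.today().isoformat(),
        "totalAssets": total_assets,
        "coverage": {name: round(count / total_assets, 4)
                     for name, count in metrics.items()},
    }

snap = build_snapshot({"classified": 8200, "labelled": 6100, "owned": 7400},
                      total_assets=10_000)
```

Storing ratios alongside raw counts lets the report trend coverage correctly even as the denominator (total governed assets) grows.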
CHAPTER 6
Data Policy & Access Governance
6.1 Purview Data Policy Architecture
Purview’s Data Policy capability represents a fundamental shift in how data access is governed: from infrastructure-level ACLs managed by engineers to business-level policies managed by data owners and governance teams. This capability is often the least understood and most transformative feature in the platform.
A Purview data access policy states: “Users in group X can perform action Y on data assets matching classification Z.” This policy is then automatically enforced at the storage layer — without requiring changes to Azure RBAC, storage ACLs, or database permissions.
6.2 Policy Types
Data owner policies: Grant read or read/modify access to specific data assets in Azure Storage, ADLS Gen2, Azure SQL, and Fabric. The data owner (assigned in Purview catalog) can self-service grant access without involving the infrastructure team — a governance-controlled self-service model.
DevOps policies: Grant SQL performance monitoring access (VIEW DATABASE STATE, VIEW SERVER STATE) to DevOps engineers without granting data read permissions. Solves a critical principle of least privilege gap in most organizations.
Self-service data access policies: Enable data consumers to request access directly through the Purview catalog. The request triggers an approval workflow routed to the data owner; approved requests are provisioned automatically, rejections are returned with a reason, and a full audit trail is maintained.
Attribute-based access control (ABAC) policies: Grant access based on asset classifications rather than specific named assets. Example: “Allow data scientists in the ML team to read assets classified as Non-Sensitive.” As new assets are classified and onboarded, they automatically inherit the correct access policy without manual intervention.
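The ABAC rule in the example reduces to a set check. A conceptual sketch follows; group and classification names are illustrative, and real enforcement happens inside Purview's policy engine rather than application code:

```python
# "Allow data scientists in the ML team to read assets classified as
# Non-Sensitive": grant iff the requester is in the policy's group and the
# asset carries no sensitive classification. All names are illustrative.
SENSITIVE = {"MICROSOFT.PERSONAL.SSN", "MICROSOFT.FINANCIAL.CREDIT_CARD_NUMBER"}

def abac_allows(user_groups: set[str], asset_classifications: set[str],
                policy_group: str = "ml-team-data-scientists") -> bool:
    return policy_group in user_groups and not (asset_classifications & SENSITIVE)
```

The operational benefit is in the second operand: because the decision keys on classifications rather than asset names, newly scanned assets inherit the right access posture with no policy edit at all.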
6.3 Policy Enforcement Architecture
| Data Source | Policy Enforcement Point | Latency to Enforce | Granularity |
| ADLS Gen2 | Azure Storage RBAC (via Purview policy propagation) | < 5 minutes | Container, folder, file level |
| Azure Blob Storage | Azure Storage RBAC | < 5 minutes | Container level |
| Azure SQL Database | SQL permissions (system-managed) | < 2 minutes | Database, schema, table level |
| Azure SQL Managed Instance | SQL permissions (system-managed) | < 2 minutes | Database, schema, table level |
| Microsoft Fabric | Fabric workspace and item permissions | < 10 minutes | Workspace, lakehouse, table level |
| Azure Synapse Analytics | Synapse workspace RBAC | < 5 minutes | Workspace, pool level |
| Architectural Constraint: Purview data policies do NOT replace row-level security (RLS), column masking, or Dynamic Data Masking (DDM) in SQL databases. Purview policies govern access to data assets — who can connect and query. RLS/DDM governs what data they see within an allowed connection. Both layers are required for complete access governance. |
6.4 Access Workflow Implementation
# Purview Self-Service Access Request -- Backend Configuration
# This Logic App workflow handles access request notifications and provisioning

# Step 1: Enable event publishing on the Purview account so that access
# request events flow to Event Grid. This is configured on the account
# itself (portal or ARM template, depending on your deployment).

# Step 2: Event Grid subscription -- route access requests to Logic App
az eventgrid event-subscription create \
    --name "purview-access-requests" \
    --source-resource-id "/subscriptions/…/purview/accounts/contoso-purview" \
    --endpoint "https://prod-logic-app.azurewebsites.net:443/api/access-request/triggers/manual/invoke" \
    --included-event-types "Microsoft.Purview.DataAccessRequestCreated"

# Step 3: Logic App action -- Teams notification to data owner
# {
#   "type": "message",
#   "to": "@{triggerBody()?['dataOwnerEmail']}",
#   "body": "Access request from @{triggerBody()?['requestorEmail']} for @{triggerBody()?['assetName']}.",
#   "attachments": [{"approve_url": "@{triggerBody()?['approvalUrl']}"}]
# }

# Step 4: On approval -- Purview API call to grant policy
# POST /policyStore/dataPlane/policies/{policyId}/approve
CHAPTER 7
Information Protection & DLP Integration
7.1 Sensitivity Labels: Architecture & Design
Microsoft Purview Information Protection (formerly Azure Information Protection and Microsoft Information Protection in Microsoft 365) provides the sensitivity label framework that connects data classification in Purview to protection enforcement across the Microsoft ecosystem. Sensitivity labels are the governance primitive that spans cloud storage, databases, Office documents, Teams messages, and third-party applications.
Sensitivity Label Taxonomy Design
The label taxonomy must balance business usability with enforcement precision. The most common failure mode: too many labels (>10) that data users cannot distinguish between, leading to incorrect labelling or label fatigue.
| Label | Definition | Protection Actions | Auto-labelling Trigger |
| Public | Approved for external publication. No restrictions. | None | No sensitive classifications detected |
| Internal | Business information for employee use. Not for public sharing. | Watermark on documents | Default label applied to all unlabelled items |
| Confidential | Sensitive business data. External sharing requires approval. | Encryption, external sharing DLP block, watermark | PII classification, financial data patterns |
| Highly Confidential | Regulated data, trade secrets, executive communications. | Encryption, download restrictions, MFA for access, audit logging | PHI, PCI data, credentials, classified IP |
| Restricted | Legal hold, regulatory investigation, M&A sensitive. | Encryption, access list restricted to named individuals, no forwarding | Legal trigger or manual assignment only |
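For illustration, the taxonomy above can be reduced to a deterministic label-selection rule, which is useful for testing auto-labelling behaviour offline before deploying policies. The sketch below is illustrative Python, not a Purview API; the classification names are placeholders standing in for Purview's built-in sensitive information types.

```python
# Illustrative sketch of the label taxonomy above. Classification names are
# placeholders, not official Purview sensitive information type names.
HIGHLY_CONFIDENTIAL = {"PHI", "PCI", "Credentials", "ClassifiedIP"}
CONFIDENTIAL = {"PII", "FinancialData"}

def select_label(detected: set, manual_restricted: bool = False) -> str:
    """Return the most restrictive applicable label per the taxonomy table."""
    if manual_restricted:          # Restricted: legal trigger or manual only
        return "Restricted"
    if detected & HIGHLY_CONFIDENTIAL:
        return "Highly Confidential"
    if detected & CONFIDENTIAL:
        return "Confidential"
    # No sensitive classifications detected: default unlabelled items to
    # Internal; "Public" should require explicit publication approval.
    return "Internal"
```

A mapping of this shape also doubles as documentation for stewards reviewing auto-labelling simulation results.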
7.2 Automatic Labelling: Configuration & Tuning
Auto-labelling policies scan Exchange, SharePoint, OneDrive, Teams, and Azure-native sources to apply labels without user intervention. Configuration requires careful threshold tuning to avoid both under-labelling (missing sensitive content) and over-labelling (creating label fatigue and compliance noise).
| # PowerShell: Configure Auto-labelling Policy for Confidential label |
| # Applies "Confidential" label when content matches any of the PII patterns below |
| Connect-IPPSSession -UserPrincipalName admin@contoso.com |
| # Create the auto-labelling policy |
| New-AutoSensitivityLabelPolicy ` |
| -Name "Confidential-AutoLabel-Policy" ` |
| -ApplySensitivityLabel "Confidential" ` |
| -ExchangeLocation All ` |
| -SharePointLocation All ` |
| -OneDriveLocation All ` |
| -Mode TestWithoutNotifications  # start in simulation mode |
| # Create the labelling rule (trigger conditions; listed patterns match as OR) |
| New-AutoSensitivityLabelRule ` |
| -Name "PII-Detection-Rule" ` |
| -Policy "Confidential-AutoLabel-Policy" ` |
| -ContentContainsSensitiveInformation @( |
| @{Name="Credit Card Number"; minCount="1"}, |
| @{Name="EU National Identification Number"; minCount="1"}, |
| @{Name="Email Address"; minCount="5"} |
| ) |
| # Run in simulation mode first (30 days) — never deploy directly to production |
| # Review simulation report: Security & Compliance Center → Information Protection |
| # → Auto-labelling → Policy Name → Simulation Results |
7.3 Data Loss Prevention Policies
DLP policies in Purview define actions taken when sensitive data violates sharing rules — blocked email, restricted external sharing, user notification, or incident report generation. DLP is the enforcement layer; sensitivity labels are the classification layer. Both are required.
| DLP Scope | Trigger Condition | Action | Business Justification Override |
| Exchange Email | Highly Confidential label; external recipient | Block delivery; notify sender; generate incident | Yes — manager approval workflow |
| SharePoint/OneDrive | Confidential label; public sharing link created | Block link creation; notify user; generate incident | Yes — data owner approval |
| Teams Messages | Credit card number, SSN pattern in message | Block send; notify user; policy tip displayed | No — hard block (financial regulatory) |
| Azure Storage (Defender for Storage) | Highly Confidential label; unusual download volume | Alert to SOC; optional: revoke access token | No — security team investigation required |
| Power BI | Confidential label; export to Excel/CSV | Audit log entry; notify workspace admin | Yes — self-service analytics exception |
CHAPTER 8
Compliance Manager & Regulatory Frameworks
8.1 Compliance Manager Architecture
Microsoft Purview Compliance Manager provides a risk-based compliance assessment framework that maps Microsoft 365 and Azure control implementations to regulatory requirements. It automates evidence collection for controls that Microsoft manages (Microsoft-managed actions) and guides remediation for customer-managed controls.
Compliance Manager’s compliance score is not a certification — it is a risk signal. A score of 85% does not mean an organization is 85% compliant with GDPR; it means 85% of the control objectives mapped in the assessment are in a satisfactory state. External audit remains necessary for formal certification.
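As a concrete illustration of the distinction, a points-based score of this kind is computed roughly as follows. This is an assumed simplification; Compliance Manager's actual weighting of Microsoft-managed versus customer-managed actions is more granular.

```python
def compliance_score(controls) -> float:
    """Achieved points / total assigned points, as a percentage.
    Each control: {"points": int, "achieved": int}. Illustrative model only,
    not the exact Compliance Manager weighting."""
    total = sum(c["points"] for c in controls)
    achieved = sum(c["achieved"] for c in controls)
    return round(100.0 * achieved / total, 1) if total else 0.0

# A score of 85.0 means 85% of mapped control points are satisfied,
# not "85% compliant with GDPR" in any legal sense.
```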
8.2 Regulatory Assessment Configuration
| Regulation | Assessment Type | Control Count (Approx) | Automated Evidence % | Key Gap Areas Typically Found |
| GDPR | Pre-built (Microsoft) | 165 controls | ~45% | Subject Rights Request process, DPIA documentation, third-party processor contracts |
| HIPAA/HITECH | Pre-built | 80 controls | ~55% | PHI access audit logs, workforce training records, BAA documentation |
| ISO 27001:2022 | Pre-built | 93 controls | ~40% | Asset management procedures, incident response testing, supplier risk assessments |
| PCI-DSS v4.0 | Pre-built | 264 controls | ~35% | Network segmentation evidence, penetration test reports, PA-DSS documentation |
| SOC 2 Type II | Pre-built | 64 controls | ~50% | Vendor management evidence, background check procedures, BC/DR testing records |
| Custom Regulatory | Custom assessment builder | User-defined | Varies | Depends on regulation; use for DORA, NIS2, local data residency laws |
8.3 Continuous Compliance Operating Model
The highest-maturity organizations do not prepare for compliance audits — they operate in a state of continuous compliance. This requires three capabilities:
Automated control testing: Use Microsoft Defender for Cloud’s regulatory compliance dashboard linked to Compliance Manager. Defender continuously tests security controls (encryption at rest, network security groups, MFA enforcement) and feeds results directly into Compliance Manager control scores — daily, not quarterly.
Evidence automation: Build evidence packages programmatically. The Compliance Manager API allows: listing all controls in an assessment, retrieving current control status, uploading evidence documents from SharePoint, and triggering evidence refresh. Automate this in ADF or Logic Apps on a monthly schedule.
Audit-ready documentation: Compliance Manager generates exportable evidence packages (Excel workbooks with control evidence, test results, and remediation notes). Configure these to be generated automatically 30 days before known audit windows and delivered to the compliance team SharePoint site via Power Automate.
| # Compliance Manager API: Export assessment evidence package |
| # Use in Power Automate or ADF for automated audit package generation |
| $token = (Get-AzAccessToken -ResourceUrl "https://compliance.microsoft.com").Token |
| # List all assessments |
| $assessments = Invoke-RestMethod ` |
| -Uri "https://compliance.microsoft.com/api/ComplianceManager/assessments" ` |
| -Headers @{Authorization = "Bearer $token"} ` |
| -Method GET |
| # Get controls for GDPR assessment |
| $gdprId = ($assessments.value | Where-Object {$_.name -like "*GDPR*"}).id |
| $controls = Invoke-RestMethod ` |
| -Uri "https://compliance.microsoft.com/api/ComplianceManager/assessments/$gdprId/controls" ` |
| -Headers @{Authorization = "Bearer $token"} ` |
| -Method GET |
| # Upload evidence document to a control |
| $body = @{ |
| controlId = "GDPR-Art32-EncryptionAtRest"; |
| evidenceFile = [Convert]::ToBase64String([IO.File]::ReadAllBytes("encryption_policy.pdf")); |
| evidenceType = "PolicyDocument" |
| } | ConvertTo-Json |
| Invoke-RestMethod ` |
| -Uri "https://compliance.microsoft.com/api/ComplianceManager/assessments/$gdprId/evidence" ` |
| -Headers @{Authorization = "Bearer $token"; "Content-Type" = "application/json"} ` |
| -Method POST -Body $body |
CHAPTER 9
Integration Patterns: Fabric, Synapse & ADF
9.1 Microsoft Fabric Integration (Deep Dive)
Microsoft Fabric’s native integration with Purview is the most complete governance integration in the Microsoft ecosystem. When fully configured, every Fabric artifact (Lakehouse table, Warehouse table, Dataflow, Pipeline, Power BI dataset, report) is automatically discoverable in Purview with column-level lineage, sensitivity labels, and endorsement status synchronized bidirectionally.
Enabling Fabric-Purview Integration
| # Step 1: Connect Fabric tenant to Purview account |
| # Fabric Admin Portal → Governance → Microsoft Purview hub |
| # Enter Purview account name and subscription |
| # Step 2: Configure Purview scanning of Fabric workspaces |
| # In Purview Studio: Data Map → Sources → Register |
| # Source type: Microsoft Fabric |
| # Select workspace(s) to scan |
| # Step 3: Enable sensitivity label sync (bidirectional) |
| # In Fabric Admin Portal: Tenant Settings → Information Protection |
| # Toggle ON: "Apply sensitivity labels from data sources to their data in Power BI" |
| # Toggle ON: "Allow workspace admins to override automatically applied sensitivity labels" |
| # Step 4: Verify lineage (check in Purview catalog after first scan completes) |
| # Expected lineage chain: |
| # Source System → ADF/Dataflow ingestion → Bronze Lakehouse → |
| # Silver Lakehouse → Gold Lakehouse → Warehouse → Power BI Dataset → Report |
| # Troubleshooting: If lineage gaps appear between Lakehouse and Warehouse, |
| # ensure Fabric Warehouse is using shortcuts to Lakehouse (not COPY INTO) |
| # COPY INTO breaks automated lineage; use Lakehouse shortcuts or Dataflows instead |
9.2 Azure Data Factory Lineage Integration
ADF lineage extraction in Purview is one of the most powerful automated capabilities — transforming opaque ETL pipelines into transparent, traceable data flows. When configured correctly, every ADF pipeline run generates lineage edges in the Purview Data Map automatically.
| ADF Activity Type | Lineage Extracted | Granularity | Configuration Required |
| Copy Activity | Source dataset → Copy activity → Sink dataset | Dataset level | Auto (no config required once Purview connected) |
| Data Flow Activity | Source → Transformation stages → Sink | Column level (via data flow schema) | Enable “Lineage reporting” in Data Flow settings |
| Execute Pipeline Activity | Parent pipeline → Child pipeline linkage | Pipeline level | Auto |
| Stored Procedure Activity | Database → SP execution → Database | Dataset level (column via SQL parsing) | SQL Server: auto. Others: manual Atlas API |
| Lookup Activity | Source dataset read (no lineage to target) | Dataset level (read only) | Auto |
| Azure Function Activity | No native lineage | None | Custom Atlas API call from within the Azure Function |
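For the Azure Function row above, custom lineage is submitted as an Atlas process entity to the bulk endpoint listed in Appendix B. The following Python sketch only constructs the request payload; the `Process` and `DataSet` type names follow the generic Atlas convention and should be checked against the actual asset types in your Data Map.

```python
def build_lineage_process(process_name: str, input_qns: list, output_qns: list) -> dict:
    """Build an Atlas 'Process' entity payload linking input assets to outputs.
    Qualified names (qns) must identify assets already in the Data Map."""
    def ref(qn: str) -> dict:
        # Reference an existing entity by unique attribute. "DataSet" is the
        # generic supertype; real calls should use the concrete asset type.
        return {"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": qn}}
    return {
        "entities": [{
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"custom-etl://{process_name}",
                "name": process_name,
                "inputs": [ref(q) for q in input_qns],
                "outputs": [ref(q) for q in output_qns],
            },
        }]
    }

# POST the payload to /catalog/api/atlas/v2/entity/bulk (see Appendix B) with
# an AAD bearer token, from inside the Azure Function after each run.
```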
9.3 Azure Synapse Analytics Integration
Synapse Analytics integration with Purview provides lineage and classification for dedicated SQL pools, serverless SQL, and Spark pools. Key integration points:
Dedicated SQL Pool lineage: Purview scans DDL and DML to extract lineage from stored procedures, views, and CTAS statements. For complex multi-step transformations in stored procedures, Purview parses the SQL AST (abstract syntax tree) to identify source-to-target column mappings.
Spark pool lineage: PySpark and Scala notebooks emit lineage events via the OpenLineage connector. Install the Purview Spark connector package in the Synapse Spark pool configuration to enable automatic lineage emission from all Spark jobs.
| # Install Purview Spark Connector in Synapse Spark Pool |
| # Synapse Studio → Manage → Apache Spark pools → [Your Pool] → Packages |
| # requirements.txt: |
| # azure-purview-lineage-spark==1.0.0 |
| # Spark configuration (add to Spark pool properties): |
| # spark.extraListeners: com.microsoft.azure.purview.spark.PurviewSparkListener |
| # spark.purview.account.name: contoso-purview |
| # spark.purview.tenant.id: {tenant-id} |
| # After configuration, all Spark DataFrame read/write operations emit lineage: |
| # df = spark.read.parquet("abfss://bronze@storage.dfs.core.windows.net/sales/") |
| # df_transformed = df.groupBy("region").agg(sum("revenue").alias("total_revenue")) |
| # df_transformed.write.format("delta").save("abfss://silver@storage…/sales_by_region/") |
| # This generates: bronze/sales → Spark job → silver/sales_by_region (column-level) |
CHAPTER 10
Deployment Architecture & Operating Model
10.1 Deployment Architecture Patterns
Pattern 1: Centralized Governance (Single Purview Account)
Best for: Organizations with <50,000 data assets, single-geography operation, or strong central governance team. One Purview account, one data map, one catalog. Simple operations, lower cost, but single point of governance control.
Pattern 2: Federated Governance (Hub-and-Spoke)
Best for: Large multi-geography organizations with autonomous business units. A central “governance hub” Purview account maintained by the enterprise CDO office. Each business unit has its own “spoke” Purview account for local governance. Metadata is synchronized to the hub via the Purview metadata API on a scheduled basis. Hub provides enterprise-wide search and reporting; spokes provide local stewardship autonomy.
Pattern 3: Domain-Aligned (Data Mesh Architecture)
Best for: Organizations implementing a Data Mesh operating model. Each data domain (Customer, Product, Finance, Risk) owns and operates its own Purview account and is responsible for its governance outcomes. The enterprise governance team sets standards and taxonomy but does not operate the domain catalogs. Federated computational governance via Purview’s shared glossary and common classification taxonomy ensures interoperability.
10.2 Network Architecture
| Component | Network Option | Recommended For | Tradeoffs |
| Purview Studio (UI) | Public endpoint (default) or Private Endpoint | Private EP for regulated industries | Private EP requires VPN/ExpressRoute for analyst access |
| Scanning runtime | Managed VNet (preferred) or SHIR | Managed VNet for Azure sources; SHIR for on-prem | SHIR adds VM management overhead; MVNet is zero-ops |
| Purview API | Public with AAD auth or Private EP | Private EP for automation in private VNets | Private EP: no public API calls from outside VNet |
| Kafka (event streaming) | Kafka endpoint for event notification | Event-driven metadata workflows | Requires Event Hubs consumption for real-time events |
10.3 Governance Operating Model
Technology deployment without an operating model produces an expensive, unused catalog. The governance operating model defines who does what, when, and how — continuously.
| Role | Responsibilities | Time Commitment | Purview Role |
| Chief Data Officer (CDO) | Set governance strategy, approve glossary, report metrics to board | 2-4 hrs/week | Insights Reader, Collection Admin (Root) |
| Data Governance Lead | Operate Purview program, manage stewards, evolve policies | Full time | Collection Admin, Policy Author |
| Domain Data Owner | Own data quality for domain, approve certifications, approve access requests | 4-8 hrs/week | Data Curator (domain collection) |
| Data Steward | Enrich metadata, link glossary terms, resolve classification issues, review flagged assets | 50-100% FTE | Data Curator |
| Data Engineer | Register data sources, configure scans, build custom lineage integration | 20% allocation | Data Source Admin |
| Data Consumer | Search catalog, request access, rate asset quality, report metadata issues | Ad hoc | Data Reader |
| Operating Model Insight: A Purview deployment without assigned Data Stewards is a catalog that fills with metadata but never becomes trusted. The steward-to-asset ratio matters: for a 50,000 asset estate, plan for 2-3 full-time stewards in the first year. Automation (bulk classification, bulk term linking) can increase effective steward capacity to 1 FTE per 100,000 assets at maturity. |
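The staffing ratios in the insight above translate into a simple planning calculation. The ratios are taken from the text; the linear scaling between them is an assumption.

```python
def stewards_required(asset_count: int, mature_automation: bool = False) -> float:
    """Estimate steward FTEs from estate size, per the ratios above:
    roughly 1 FTE per 20,000 assets in year one (2-3 FTE for a 50K estate),
    rising to 1 FTE per 100,000 assets once bulk automation is in place."""
    assets_per_fte = 100_000 if mature_automation else 20_000
    return round(asset_count / assets_per_fte, 1)
```

For the 50,000-asset estate cited above, this yields 2.5 FTE in year one.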
CHAPTER 11
Performance, Scale & Cost Optimization
11.1 Scan Performance Optimization
| Optimization Lever | Description | Performance Impact | Tradeoff |
| Incremental scanning | Scan only new/modified assets since last scan using watermark-based detection | 60-80% reduction in scan time for stable sources | May miss classification changes on unmodified assets |
| Targeted scanning | Scope scans to specific folders, schemas, or file patterns | 40-70% faster | Requires good source naming conventions |
| Scan concurrency tuning | Increase concurrent threads on SHIR from default 4 to 8-16 | 2-4x throughput increase | Higher CPU/RAM on SHIR VMs required |
| Classification rule optimization | Reduce classification rules per scan rule set; use targeted rule sets per source type | 30-50% faster per scanned asset | Requires maintaining multiple scan rule sets |
| Off-hours scheduling | Schedule large scans for 2-6 AM to avoid competing with production workloads | No throughput gain but avoids source contention | Delayed freshness; not suitable for compliance triggers |
11.2 Capacity Planning for Large Estates
Purview’s Data Map has published scale limits that define the upper bounds of single-account deployments. Planning against these limits prevents mid-program architectural pivots:
| Resource | Scale Limit (as of 2024) | Recommendation |
| Assets in Data Map | 100 million assets per account | For >80M assets, begin planning federated architecture |
| Registered sources | 3,000 sources per account | Consolidate similar source types into single registered sources where possible |
| Concurrent scans | 100 concurrent scan runs | Use scan scheduling to avoid peak concurrency; prioritize by source criticality |
| Glossary terms | 100,000 terms per account | Maintain term hygiene; deprecate unused terms quarterly |
| Collections | 256 collections per account | Design flat-ish hierarchies; max 4-5 levels for most organizations |
| Custom classification rules | 500 per account | Consolidate similar patterns; use regex groups over multiple single patterns |
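A guardrail check against these published limits can be automated. The sketch below copies the limit values from the table; the 80% warning threshold is an assumption to tune per organization.

```python
# Single-account Data Map limits from the table above (as of 2024).
SCALE_LIMITS = {
    "assets": 100_000_000,
    "registered_sources": 3_000,
    "concurrent_scans": 100,
    "glossary_terms": 100_000,
    "collections": 256,
    "custom_classification_rules": 500,
}

def capacity_warnings(current: dict, threshold: float = 0.8) -> list:
    """Flag every dimension at or above `threshold` of its account limit."""
    warnings = []
    for key, limit in SCALE_LIMITS.items():
        used = current.get(key, 0)
        if used / limit >= threshold:
            warnings.append(f"{key}: {used}/{limit} ({100 * used / limit:.0f}%)")
    return warnings
```

Run monthly against Data Map metrics, a check like this gives early warning of the >80M-asset point where federated architecture planning should begin.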
11.3 Cost Optimization
Purview pricing is based on Data Map capacity units (CUs), scan compute, and Microsoft 365 Compliance licensing. Understanding the cost model prevents surprise bills in large-scale deployments.
| Cost Component | Billing Model | Optimization Strategy |
| Data Map capacity units | $0.496/CU/hour (1 CU = 1GB metadata storage + processing capacity) | Audit scan frequency; incremental scans reduce CU consumption by 60-75% |
| Scan compute | Billed per vCore-hour for SHIR; Managed VNet included in CUs | Right-size SHIR VMs; schedule to minimize runtime; use MVNet where possible |
| Data insights compute | Included in CU pricing up to 1M assets/day | No separate cost; do not over-provision CUs speculatively |
| M365 Compliance (DLP, Labels) | Included in M365 E5 or E5 Compliance add-on | Audit license assignments; unused Compliance seats are common waste |
| Defender for Cloud integration | Charged per resource per hour for Defender plans | Enable Defender Storage/SQL only for in-scope regulated workloads |
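For budgeting, the Data Map line above translates into a rough monthly estimator. The rate and the 60-75% incremental-scan saving are the figures from the table; applying the saving to the whole capacity bill is a simplifying assumption.

```python
CU_RATE_PER_HOUR = 0.496  # $/CU/hour, per the cost table above
HOURS_PER_MONTH = 730

def datamap_monthly_cost(capacity_units: int, incremental_saving: float = 0.0) -> float:
    """Rough monthly Data Map cost in USD. incremental_saving of 0.60-0.75
    models the table's reduction from moving full scans to incremental."""
    base = capacity_units * CU_RATE_PER_HOUR * HOURS_PER_MONTH
    return round(base * (1 - incremental_saving), 2)
```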
CHAPTER 12
Real-World Case Studies
Case Study 1: Global Financial Services Firm — GDPR & PCI Compliance Transformation
| Dimension | Details |
| Organization | Pan-European retail bank, 12,000 employees, 85 data sources |
| Challenge | GDPR audit failed in 2022 due to inability to demonstrate PII data inventory. €2.3M fine issued. Compliance team spent 14 weeks per audit cycle manually documenting data assets. |
| Purview Scope | Data Map (85 sources), full estate classification, GDPR & PCI assessments in Compliance Manager, sensitivity labels across M365 and Azure Storage, DLP policies for credit card and IBAN patterns |
| Timeline | 16 weeks to full production deployment across all 85 sources |
Technical Implementation Highlights
- Deployed Managed VNet scanning for 67 Azure-native sources; SHIR cluster (4 nodes, 16 vCPU each) for 18 on-premises SQL Server and Oracle sources
- Custom classification rules developed for: IBAN variants (23 country formats), internal account number format, and proprietary customer reference codes — regex validated against 500K live records before deployment
- Column-level lineage from core banking Oracle DB → ADF pipelines → Azure SQL DWH → Power BI reports established within 6 weeks of scan completion
- GDPR Subject Rights Request module deployed: automated data discovery across all 85 sources on SRR submission, reducing SRR response time from 28 days to 4 days
| Metric | Before Purview | After Purview (12 months) |
| Compliance audit preparation time | 14 weeks manual | 6 days automated |
| PII coverage (classified assets) | 23% (manual inventory) | 94% (automated) |
| Sensitivity label coverage | 8% (M365 only) | 87% across Azure + M365 |
| GDPR SRR response time | 28 days | 4 days |
| Annual compliance staffing cost | £1.2M (12 FTE) | £380K (3 FTE + automation) |
| Purview TCO (Year 1) | — | £420K (all-in) |
Case Study 2: Healthcare Network — PHI Governance & HIPAA Continuous Compliance
| Dimension | Details |
| Organization | US regional hospital network, 22 hospitals, 6,500 clinical staff, 140TB of health data across Azure and on-premises |
| Challenge | Inability to demonstrate minimum-necessary access principle for PHI (HIPAA §164.514). Multiple breaches of PHI to non-clinical staff through misconfigured Power BI reports. No systematic tracking of PHI data flows. |
| Purview Scope | Healthcare-specific classification rules (PHI types), Data Map across Epic EHR integration layer + Azure SQL + ADLS Gen2, Purview policy for PHI access restriction, DLP to block PHI in Teams/email, Compliance Manager HIPAA assessment |
| Timeline | 20 weeks (regulatory urgency drove accelerated timeline) |
Key Technical Decisions
Custom PHI classification taxonomy: Standard Purview PHI rules (SSN, DOB) were insufficient. Built 34 custom classification rules for: Epic patient MRN format, ICD-10 diagnosis codes in free-text, medication names from formulary dictionary match, insurance member ID patterns, and clinical note markers. Classification accuracy validated at 96.3% against 10,000 manually labelled records.
Power BI PHI governance: Deployed sensitivity label policy preventing download/export of reports containing PHI unless user holds HIPAA-authorized role (Entra ID security group). Power BI lineage in Purview enabled identification of 147 reports containing PHI columns — 23 of which had no sensitivity label applied. All remediated within 4 weeks.
Continuous HIPAA monitoring: Compliance Manager HIPAA assessment connected to Defender for Cloud. 42 automated control tests run daily. Compliance score reported weekly to CISO and Privacy Officer. First external HIPAA audit post-deployment: no significant findings.
Case Study 3: Retail Enterprise — Data Mesh Governance with Microsoft Fabric
A FTSE 100 retailer with 8 data domains (Customer, Product, Supply Chain, Finance, Marketing, Store Operations, Loyalty, Digital) implemented a Data Mesh architecture on Microsoft Fabric. The governance challenge: ensure interoperability and trust across domain-owned data products without a central data team bottleneck.
Governance Architecture
Federated Purview (one account per domain): 8 Purview accounts, one per domain. Each domain team operates their own catalog with full autonomy. Enterprise glossary terms synchronized from a central “governance hub” account via nightly API sync.
Cross-domain lineage: Custom lineage federation solution: each domain emits lineage events to a central Azure Event Hub. A Fabric Data Factory pipeline consumes events and writes cross-domain lineage to a central Purview account. CDO can view full end-to-end customer journey lineage from Customer domain through to Finance domain.
Data Product certification: A data product is certified (Purview “Certified” endorsement) only when: data quality SLA is documented, owner is assigned, sensitivity label is applied, and a data contract (schema + SLA) is published to the enterprise API registry. Automated certification check runs weekly via Purview API.
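The certification gate described above reduces to four boolean checks. The sketch below shows the core of such a weekly check; the field names are assumptions for illustration, not Purview API attributes.

```python
def is_certifiable(product: dict):
    """A data product earns the 'Certified' endorsement only when all four
    conditions from the governance model hold. Returns (ok, missing)."""
    checks = {
        "data quality SLA documented": bool(product.get("quality_sla")),
        "owner assigned": bool(product.get("owner")),
        "sensitivity label applied": bool(product.get("sensitivity_label")),
        "data contract published": bool(product.get("contract_url")),
    }
    missing = [name for name, ok in checks.items() if not ok]
    return (not missing, missing)
```

A check of this shape is what a weekly Purview API job would evaluate before applying or revoking the endorsement.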
| KPI | Target | Achieved (Month 12) |
| Data products certified | 80% of published products | 84% |
| Cross-domain data access time (request to provision) | < 3 business days | 1.2 days average |
| Data quality incidents crossing domain boundary | < 5/month | 2.1/month average |
| Time to identify root cause of cross-domain data issue | < 4 hours | 47 minutes average |
CHAPTER 13
90-Day Implementation Roadmap
Based on 20+ Purview deployments, the following 90-day roadmap represents the optimal sequencing for enterprise governance programs. It balances quick wins (demonstrable value in Week 6) with foundation-setting activities required for long-term scale.
Phase 1: Foundation (Days 1-30)
| Week | Activity | Owner | Success Criteria |
| 1 | Purview account provisioning, network design (MVNet vs SHIR), AAD groups creation, collection hierarchy design | Data Engineer + Architect | Purview account live; network connectivity validated; collection hierarchy documented and approved |
| 1-2 | Source inventory: document all data sources (type, location, sensitivity level, volume, business owner) | Data Governance Lead | Complete source inventory spreadsheet; sources prioritized by regulatory risk |
| 2 | Glossary foundation: identify 50-100 core business terms per domain with definitions, owners, and related terms | Data Governance Lead + Domain Owners | Core glossary terms in Draft status; term owners assigned |
| 2-3 | Register and scan Tier 1 sources (highest regulatory risk: PCI scope, PHI scope, GDPR-critical sources) | Data Engineer | Tier 1 sources scanned; classification results reviewed; false positive rate < 5% |
| 3-4 | Classification review and tuning: review auto-classification results; build custom rules for organizational patterns; set thresholds | Data Steward + Data Engineer | Custom classification rules deployed; classification accuracy > 90% on sampled validation set |
| 4 | RBAC configuration: assign Data Curator, Data Reader roles to domain teams; configure collection-level permissions | Data Governance Lead | All roles assigned; domain teams can access catalog; data engineers can register sources |
Phase 2: Activation (Days 31-60)
| Week | Activity | Owner | Success Criteria |
| 5-6 | Register and scan all remaining data sources; configure incremental scan schedules | Data Engineer | >90% of data estate registered and scanned; scan schedule operational |
| 6 | Quick win: publish Governance Maturity Score dashboard in Power BI; present Week 6 scorecard to leadership | Data Governance Lead | Dashboard live; leadership briefing completed; program funded for Phase 3 |
| 6-7 | Sensitivity label deployment: configure label taxonomy; deploy auto-labelling policies in simulation mode for M365 and Azure | Security + Data Governance | Labels published; simulation mode running; simulation report reviewed |
| 7-8 | Lineage validation: verify ADF, Synapse, Fabric lineage; build custom lineage for non-native sources via Atlas API | Data Engineer | End-to-end lineage visible for >3 critical data pipelines; column-level lineage for Power BI reports |
| 7-8 | Stewardship workflows: configure ownership assignment workflows; assign owners to all Tier 1 assets; begin Tier 2 | Data Steward | >95% of Tier 1 assets have assigned owner and expert contact |
| 8 | Compliance Manager setup: create GDPR, HIPAA, or relevant regulatory assessments; map controls to organizational evidence | Compliance Officer + Data Governance | At least one regulatory assessment active; initial compliance score baseline established |
Phase 3: Optimization (Days 61-90)
| Week | Activity | Owner | Success Criteria |
| 9-10 | Enable sensitivity labels in production (exit simulation mode); deploy DLP policies; monitor DLP incidents for first 2 weeks before tuning | Security + Data Governance | Labels applying to new content; DLP incidents appearing in dashboard; < 10% false positive rate on DLP |
| 10-11 | Data access policy deployment: configure self-service access request workflow; pilot with 2-3 data domains | Data Governance Lead + Data Engineer | Self-service access workflow live; first access requests processed through Purview; data owner satisfaction confirmed |
| 11 | Glossary completion: approve Tier 1 glossary terms; link certified terms to classified assets via bulk assignment | Data Steward | >80% of Tier 1 assets linked to at least one approved glossary term |
| 12 | Program review: measure Governance Maturity Score vs. Week 1 baseline; document lessons learned; plan 90-180 day roadmap | CDO + Data Governance Lead | Governance score improvement documented; 90-180 day roadmap approved; ongoing operating model confirmed |
| Critical Success Factor: Governance programs that fail typically do so in Days 31-60 — the “activation phase.” Quick wins must be demonstrated by Day 45 to maintain organizational momentum and leadership confidence. The Power BI Governance Maturity Score dashboard is specifically designed as this early value demonstration. |
APPENDIX
Appendix: Reference Materials
A. Essential KQL Queries for Purview Operations
A1. Assets Without Owners (Stewardship Gap Report)
| // Find all assets in Purview Data Map without assigned owners |
| // Run in Log Analytics workspace linked to Purview |
| PurviewAssetMetadata_CL |
| | where TimeGenerated > ago(7d) |
| | where isnull(Owner_s) or Owner_s == "" |
| | summarize UnownedAssets = count() by SourceType_s, Collection_s |
| | order by UnownedAssets desc |
| | project Collection_s, SourceType_s, UnownedAssets |
A2. Classification Coverage by Data Domain
| // Sensitivity classification coverage heat map by collection |
| PurviewAssetMetadata_CL |
| | where TimeGenerated > ago(1d) |
| | summarize |
| TotalAssets = count(), |
| ClassifiedAssets = countif(isnotempty(Classifications_s)), |
| SensitiveAssets = countif(SensitivityLabel_s in ("Confidential","HighlyConfidential")) |
| by Collection_s |
| | extend |
| ClassificationCoverage = round(100.0 * ClassifiedAssets / TotalAssets, 1), |
| SensitiveCoverage = round(100.0 * SensitiveAssets / TotalAssets, 1) |
| | order by ClassificationCoverage asc // Surface lowest coverage domains first |
A3. Scan Failure Detection & Alerting
| // Alert when scan success rate drops below 95% in any 24-hour window |
| PurviewScanLogs_CL |
| | where TimeGenerated > ago(24h) |
| | summarize |
| TotalScans = count(), |
| FailedScans = countif(Status_s == "Failed"), |
| SuccessRate = round(100.0 * countif(Status_s == "Succeeded") / count(), 1) |
| by SourceName_s |
| | where SuccessRate < 95 |
| | project SourceName_s, TotalScans, FailedScans, SuccessRate |
| | order by SuccessRate asc |
| // Use this query as basis for Azure Monitor alert rule |
| // Alert when count() > 0 (any source below 95% success) |
B. Purview REST API Quick Reference
| Operation | Method | Endpoint | Use Case |
| List collections | GET | /account/collections | Audit collection hierarchy; generate governance reports |
| Get asset by qualified name | GET | /catalog/api/atlas/v2/entity/uniqueAttribute/type/{typeName}?attr:qualifiedName={qn} | Look up specific asset metadata in automation |
| Update asset contacts | PUT | /catalog/api/atlas/v2/entity/guid/{guid}/businessattribute/Contacts | Bulk owner assignment in onboarding automation |
| Submit lineage | POST | /catalog/api/atlas/v2/entity/bulk | Custom lineage for non-native sources (dbt, custom ETL) |
| Run scan | POST | /scan/datasources/{dsName}/scans/{scanName}/runs | Trigger scan on-demand from CI/CD pipeline on schema change |
| Get scan status | GET | /scan/datasources/{dsName}/scans/{scanName}/runs/{runId} | Poll scan completion in automation workflow |
| Create glossary term | POST | /catalog/api/atlas/v2/glossary/term | Bulk glossary population from existing business dictionaries |
| Assign term to asset | POST | /catalog/api/atlas/v2/glossary/terms/{termGuid}/assignedEntities | Automated term linking after scan completion |
C. Governance Maturity Model
| Level | Name | Characteristics | Target Score | Typical Timeline |
| L1 | Initial | Ad hoc governance; no systematic catalog; manual compliance preparation; governance by tribal knowledge | 0-20 | Starting point for most organizations |
| L2 | Managed | Data sources registered and scanned; basic classification applied; glossary under development; ownership partially assigned | 20-50 | 0-6 months post-Purview deployment |
| L3 | Defined | Full estate classification; glossary approved and linked; lineage documented for critical pipelines; compliance assessments active | 50-70 | 6-12 months post-deployment |
| L4 | Quantitatively Governed | Governance Maturity Score tracked weekly; stewardship SLAs enforced; access policies active; DLP protecting sensitive data; self-service access working | 70-85 | 12-24 months |
| L5 | Optimizing | Automated certification, continuous compliance, AI-assisted stewardship, domain-level governance ownership, governance embedded in CI/CD pipelines | 85-100 | 24-36 months |
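Teams that track the Governance Maturity Score in dashboards can map a score to its level with a simple lookup, per the bands above. Boundary handling (a score of exactly 20 counting as L2, and so on) is our assumption, since the table expresses the bands as open ranges.

```python
# Score floors per the maturity model table, highest first.
LEVELS = [(85, "L5"), (70, "L4"), (50, "L3"), (20, "L2"), (0, "L1")]

def maturity_level(score):
    """Return the maturity level (L1-L5) for a 0-100 governance score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, level in LEVELS:
        if score >= floor:
            return level
```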
D. Glossary of Key Terms
| Term | Definition |
| Apache Atlas | Open-source metadata management and governance framework; the foundational metadata model underlying Purview’s Data Map |
| Business Glossary | Curated vocabulary of business terms linked to data assets; provides semantic context and shared language across data consumers |
| Collection | Hierarchical container in Purview that scopes metadata, access control, and policy enforcement; the primary organizational unit of the Data Map |
| Data Lineage | Documentation of data origin, movement, and transformation — tracing how data flows from source systems through processing layers to consumption |
| Data Map | The live, continuously updated inventory of all registered data assets in Purview, including metadata, classifications, lineage, and relationships |
| DLP (Data Loss Prevention) | Policies that detect and block unauthorized movement or sharing of sensitive data across email, documents, cloud storage, and messaging platforms |
| Endorsement | Trust signal applied to catalog assets: “Promoted” (recommended by workspace member) or “Certified” (validated by designated authority) |
| Glossary Term | A business concept formally defined in Purview with name, definition, steward, status, and asset linkages |
| Managed Virtual Network | Microsoft-managed network infrastructure for Purview scanning; eliminates need for customer-managed integration runtime VMs |
| OpenLineage | Open standard for data lineage metadata; used by Purview Spark connector to emit lineage from Spark jobs |
| Policy Author | Purview RBAC role with permission to create and publish data access policies that enforce at the storage layer |
| Scan Rule Set | Configuration defining which file types and classification rules apply when scanning a data source |
| Self-Hosted Integration Runtime (SHIR) | Customer-managed agent VM that enables Purview scanning of on-premises or private network data sources |
| Sensitivity Label | Classification tag applied to data assets and documents (e.g., Confidential, Highly Confidential) that drives downstream protection actions |
