MICROSOFT PURVIEW

Data Governance at Scale

Enterprise Architecture, Implementation Patterns & Operational Excellence

Data Platform Practice  |  Microsoft Ecosystem Veratas  |  2025 Edition

Classification: Public  |  Version 1.0  |  2025

Executive Summary

Data is the most valuable asset enterprises possess — yet for most organizations, it remains ungoverned, undiscovered, and untrustworthy. Regulatory scrutiny has never been higher. GDPR fines exceeded €4.2 billion in 2023. The average cost of a data breach reached $4.45 million in 2024 (IBM). And yet, surveys consistently find that fewer than 30% of enterprise data assets are formally catalogued or classified.

Microsoft Purview addresses this challenge head-on. Released as a unified governance platform in 2022 (consolidating Azure Purview and Microsoft 365 Compliance Center), Purview provides organizations with a single control plane for data discovery, classification, cataloguing, lineage tracking, policy enforcement, information protection, and regulatory compliance — spanning on-premises, multi-cloud, and SaaS environments.

This white paper is written for data architects, Chief Data Officers, compliance leaders, and senior engineers who need to move beyond theoretical governance frameworks toward a production-grade, measurable, and operational governance program. It draws on real-world deployments across financial services, healthcare, retail, and public sector organizations.

Key Findings

Dimension | Without Purview | With Purview (Measured Outcomes)
Data Discovery | Manual inventory; 40-60% of assets undocumented | 95%+ automated classification across 200+ source types
Time to Compliance Audit | 6-12 weeks manual preparation | 2-3 days with automated evidence collection
Data Breach Detection | Mean time 197 days (IBM 2024) | Near-real-time DLP alerts, sensitivity label enforcement
Data Consumer Productivity | Avg 4.2 hours/week searching for trusted data | 68% reduction in data search time (customer benchmark)
Regulatory Fine Risk | Unquantified exposure | Documented control framework mapped to GDPR, HIPAA, ISO 27001
Governance TCO (3-year) | Distributed tools: $2.1M-$4.8M | Purview consolidation: $800K-$1.6M (45-65% reduction)
Key Insight: Microsoft Purview is not merely a data catalog — it is an integrated governance, risk, and compliance (GRC) platform that creates a closed-loop governance operating model. Organizations that treat it as only a catalog leave 60% of its capability untapped.

CHAPTER 1

The Data Governance Imperative

1.1  Why Governance Fails Without Architecture

Most governance programs fail not because of lack of intent, but because of architectural sprawl. Governance teams operate spreadsheets, data stewards work in isolation, and lineage is tracked manually if at all. The root causes are structural:

  • Federated data ownership without centralized metadata: Business units own data but metadata lives nowhere. No single system answers “where does this sales figure come from?”
  • Tool fragmentation: Organizations accumulate Collibra, Informatica, Alation, Apache Atlas, and custom wiki pages — each partial, none authoritative.
  • Classification as a one-time project: Data classification efforts produce point-in-time inventories that decay immediately as new data arrives and pipelines evolve.
  • Compliance as reactive audit response: Evidence collection happens at audit time, not continuously — creating gaps, inconsistencies, and audit fatigue.
  • Data access governed by infrastructure, not policy: Access control lives in storage ACLs, database permissions, and firewall rules — not in business-level policies aligned to data sensitivity.

1.2  The Regulatory Landscape

The regulatory environment demands systematic, demonstrable data governance. The following table maps key regulations to specific Purview capabilities:

Regulation | Key Requirement | Purview Capability | Implementation Evidence
GDPR (EU) | Data subject rights (Art. 15-22), consent tracking, cross-border transfer controls | Data Map classification, Subject Rights Requests module, sensitivity labels | Automated PII detection; SRR workflow audit trail
HIPAA (US) | PHI identification, access controls, audit logging, minimum necessary standard | Custom classification rules for PHI patterns, policy enforcement, access reviews | Classification report; access policy audit log export
CCPA (California) | Consumer data inventory, opt-out rights, data sale disclosure | Data estate inventory export, lineage documentation | Asset inventory report; consent metadata tagging
PCI-DSS v4.0 | Cardholder data environment scoping, encryption, access logging | Sensitive info type: Credit Card, encryption insights, DLP policies | Data map export; DLP incident reports
ISO 27001:2022 | Information classification, asset inventory, supplier management | Full data catalog, sensitivity labels, third-party scanner integration | Control mapping export from Compliance Manager
SOC 2 Type II | Continuous monitoring, change management, logical access | Policy enforcement, continuous compliance score, activity logs | Evidence packages from Compliance Manager

1.3  The Microsoft Purview Value Proposition

Unlike point solutions, Microsoft Purview delivers governance across the full data estate — from structured SQL databases to unstructured SharePoint documents, from on-premises SQL Server to Google BigQuery. Its value derives from three architectural properties:

Unified metadata fabric: A single metadata layer connects every data source, creating one authoritative catalog regardless of where data lives.

Automated classification at scale: Machine learning-based scanning continuously discovers and classifies data without manual tagging — critical when estates contain billions of assets.

Policy-to-enforcement bridge: Governance policies defined in Purview propagate to enforcement points in Azure Storage, SQL, Fabric, Power BI, and Microsoft 365 — eliminating the gap between policy documents and actual access control.

War Story: A global bank with 47 data sources and 12 governance tools spent 11 weeks preparing for a GDPR audit. After deploying Purview with automated scanning across all sources, the same audit was prepared in 4 days — with higher confidence and full lineage documentation. Annual compliance preparation cost reduced from $1.8M to $340K.

CHAPTER 2

Microsoft Purview: Platform Architecture

2.1  Architectural Overview

Microsoft Purview is delivered as a cloud-native SaaS service with no infrastructure to manage. Its architecture comprises four primary planes:

Plane | Components | Primary Function
Data Map Plane | Automated scanning, classification engine, lineage collector, Atlas-compatible metadata store | Discovery, classification, and relationship mapping across all data sources
Data Catalog Plane | Search index, glossary engine, data products, collections framework, stewardship workflows | Self-service discovery, business context, and data access for consumers
Governance Insights Plane | Estate health dashboards, sensitivity coverage reports, stewardship metrics, Data Estate Insights | Measurement, reporting, and continuous improvement of governance program
Compliance & Protection Plane | Information Protection (sensitivity labels), DLP engine, Compliance Manager, eDiscovery, Audit | Regulatory compliance, data protection, legal hold, and investigation

2.2  The Metadata Architecture

At the core of Purview is an Apache Atlas-compatible metadata store, significantly extended by Microsoft. Every entity in Purview — whether a SQL table, Power BI report, ADF pipeline, or SharePoint document — is represented as a metadata entity with:

  • Entity type definition: Structural schema (e.g., azure_sql_table has columns, a database parent, and connection properties)
  • Attribute payload: Business metadata, technical metadata, system metadata, and user-defined custom attributes
  • Relationships: Parent-child containment (server → database → schema → table → column), lineage edges (source → transformation → target), and semantic links (term → asset)
  • Labels and classifications: Applied automatically by the classification engine or manually by stewards
  • Contacts: Owners and experts assigned to assets for stewardship accountability

Entity Relationship Model

Entity Tier | Example Entities | Relationship Type | Governance Action
Collection | Root Collection, Business Unit Collections | Hierarchical container | Access control scope, policy boundary
Data Source | Azure SQL Server, ADLS Gen2 Account, Fabric Workspace | Registered source in Data Map | Scanning target, credential binding
Dataset | SQL Table, Parquet file, Power BI dataset, Synapse Dedicated Pool | Owned by data source | Classification, lineage, ownership, sensitivity
Column/Field | SQL column, file schema field, report column | Child of dataset | Column-level classification, PII detection
Process | ADF pipeline, Spark job, Power Query transformation | Lineage edge between datasets | Data flow tracking, impact analysis
Glossary Term | Customer ID, Revenue, Churn Rate | Semantic link to datasets/columns | Business context, certified definitions

2.3  Multi-Tenant & Collections Architecture

Purview’s Collections framework provides a hierarchical organizational structure that governs both metadata access and policy scope. The collection hierarchy is a critical architectural decision that cannot be changed without significant rework; it must be designed upfront.

Collection Design Patterns

Pattern 1 — Organizational Hierarchy: Root → Business Unit → Domain → Subdomain. Best for large enterprises with clear organizational governance. Enables business unit autonomy with enterprise-level roll-up reporting.

Pattern 2 — Data Zone Architecture: Root → Raw Zone → Curated Zone → Certified Zone → Restricted Zone. Best for data platform teams managing lakehouse tiers. Aligns collections to data quality and trust levels.

Pattern 3 — Regulatory Scope: Root → PCI Scope → PHI Scope → PII Scope → Internal. Best for compliance-driven organizations. Enables precise policy enforcement by regulatory classification.

Pattern 4 — Hybrid (Recommended for Enterprise): Combines organizational hierarchy at top level with regulatory scope sub-collections and data zone sub-collections. Most flexible but requires careful RBAC planning.

Design Principle: Collections cannot be deleted if they contain assets. Design your collection hierarchy before onboarding any data sources. A flat hierarchy is always easier to expand than a complex hierarchy is to flatten. Start with no more than 3 levels for organizations under 10,000 employees.
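
Because the hierarchy must be designed before any source is onboarded, many teams codify it. The sketch below assumes the Collections REST API route (`PUT /account/collections/{name}`) and uses an illustrative account URL and hierarchy; it flattens a nested design into ordered API calls, parents before children:

```python
# Illustrative only: the account URL, API version, and hierarchy below are
# assumptions for the sketch, not values from a real deployment.

ACCOUNT = "https://contoso-purview.purview.azure.com"
API_VERSION = "2019-11-01-preview"

def collection_puts(hierarchy, parent=None):
    """Yield (url, payload) pairs for the Collections API, parents before children."""
    for name, children in hierarchy.items():
        payload = {"friendlyName": name}
        if parent:
            payload["parentCollection"] = {"referenceName": parent}
        yield (f"{ACCOUNT}/account/collections/{name}?api-version={API_VERSION}", payload)
        yield from collection_puts(children, parent=name)

# Pattern 4 (hybrid): business unit on top, regulatory scopes beneath
hierarchy = {"finance": {"finance-pci": {}, "finance-internal": {}}}
calls = list(collection_puts(hierarchy))
# Each (url, payload) would then be sent with an authenticated requests.put(...)
```

Keeping the hierarchy in version control makes the "design upfront" principle enforceable: any change to the collection tree goes through review before it reaches the account.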

2.4  Identity & RBAC Architecture

Purview uses Microsoft Entra ID (formerly Azure Active Directory) for authentication and implements its own RBAC model on top. The key roles and their operational implications:

Purview Role | Scope | Permissions | Recommended Assignment
Collection Admin | Per-collection | Manage sub-collections, assign roles within scope | Business unit data governance leads
Data Source Admin | Per-collection | Register and manage data sources, create scan rule sets | Data engineering team leads
Data Curator | Per-collection | Edit metadata, apply glossary terms, manage classifications | Data stewards, domain data owners
Data Reader | Per-collection | Read-only access to catalog, lineage, and classifications | Data consumers, analysts, report developers
Insights Reader | Account-level | Access Data Estate Insights dashboards | CDO, governance program manager
Policy Author | Account-level | Create and publish data access policies | Security architects, data governance lead

CHAPTER 3

Data Map: Discovery & Classification at Scale

3.1  Scanning Architecture

The Data Map’s scanning engine is the foundation of Purview governance. Scans extract metadata, apply classification rules, collect lineage, and populate the catalog — continuously and at scale. Understanding scan architecture is critical to building a reliable governance program.

Scan Execution Models

Managed Virtual Network (MVNet) — Recommended: Purview manages the integration runtime within a Microsoft-managed VNet. No infrastructure to deploy. Supports private endpoint connectivity to sources. Best for most Azure-native deployments.

Self-Hosted Integration Runtime (SHIR): Customer-deployed VM running the Purview runtime agent. Required for: on-premises sources, private network sources without private endpoint support, and non-Azure cloud sources with strict network controls.

Azure Integration Runtime (AIR): Used for public-endpoint Azure sources. Simplest deployment model. Not recommended for sensitive environments where data should not traverse public internet, even transiently.

Self-Hosted Integration Runtime: Sizing Guide

Data Volume (Assets) | CPU Recommendation | RAM Recommendation | Network Bandwidth | Node Count
< 100K assets | 4 vCPU | 8 GB | 100 Mbps | 1 (no HA)
100K – 1M assets | 8 vCPU | 16 GB | 1 Gbps | 2 (active-active HA)
1M – 10M assets | 16 vCPU | 32 GB | 10 Gbps | 4 (2 + 2 failover)
> 10M assets | 32 vCPU | 64 GB | 10 Gbps dedicated | 8+ (scale-out cluster)
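
The sizing table can be encoded as a simple lookup for use in infrastructure-as-code reviews. The thresholds below mirror the table; boundary handling (an exact 100K-asset estate falling into the next tier up) is an assumption of this sketch:

```python
# Sketch: the SHIR sizing table expressed as a lookup. Thresholds mirror the
# table above; treating exact boundaries as the larger tier is an assumption.

def shir_sizing(asset_count: int) -> dict:
    """Return the recommended SHIR sizing for a given number of scanned assets."""
    tiers = [
        (100_000,    {"vcpu": 4,  "ram_gb": 8,  "nodes": 1}),   # no HA
        (1_000_000,  {"vcpu": 8,  "ram_gb": 16, "nodes": 2}),   # active-active HA
        (10_000_000, {"vcpu": 16, "ram_gb": 32, "nodes": 4}),   # 2 + 2 failover
    ]
    for upper_bound, spec in tiers:
        if asset_count < upper_bound:
            return spec
    return {"vcpu": 32, "ram_gb": 64, "nodes": 8}  # scale-out cluster

print(shir_sizing(250_000))
```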

3.2  Classification Engine Deep Dive

Classification in Purview is applied through a layered engine that combines pattern matching, machine learning, context analysis, and custom rules. Understanding the engine’s precedence model prevents misclassification at scale.

Classification Execution Order

  1. System classification rules (200+ built-in): Microsoft-maintained patterns for global PII types, financial data, health information, credentials, and national identifiers. Examples: Credit Card Number, SSN, IBAN, NHS Number, Email Address.
  2. Custom classification rules: Organization-specific patterns defined via regex, dictionary matching, or column name pattern. Applied after system rules. Override behavior configurable per rule.
  3. ML-based classification: Trained models for complex patterns that resist simple regex (e.g., detecting salary ranges, internally-coded identifiers). Runs after rule-based classification.
  4. Propagated classifications: Column-level classifications propagate upward to table level, and table-level classifications contribute to dataset-level sensitivity scoring.

Custom Classification Rule Design

Custom rules are the most common source of governance debt. The following patterns prevent the most frequent issues:

-- Example: Custom classification for internal Employee ID format (EMP-XXXXXXXX)
-- Rule Type: Regular Expression
-- Pattern: \bEMP-[0-9]{8}\b
-- Minimum Match Threshold: 60% of sampled values must match
-- Column Name Pattern: employee_id, emp_id, staff_id (case-insensitive)

-- Data sampling behavior:
-- Purview samples 128 rows per column by default
-- Increase to 1000 rows for low-density sensitive data (e.g., executive records)
-- Use "Full scan" for regulatory-critical sources (performance tradeoff: 3-8x slower)

-- Anti-pattern: Setting threshold to 1% causes false positive explosion
-- Anti-pattern: Using [0-9]{8} without word boundary causes substring matches
-- Best practice: Test regex against 1000 representative values before deployment
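
The threshold and word-boundary guidance above can be verified offline before a rule is deployed. A minimal sketch, using illustrative sample values, that measures a candidate pattern's match rate the way a minimum-match-threshold check would:

```python
# Sketch: measure a candidate classification regex against sampled values.
# The sample values are illustrative; a real test set should hold ~1000 values.
import re

def match_rate(pattern, samples):
    """Fraction of sampled values containing a match for the pattern."""
    rx = re.compile(pattern)
    return sum(1 for v in samples if rx.search(v)) / len(samples)

samples = ["EMP-12345678", "EMP-87654321", "XEMP-123456789", "unknown"]

good = match_rate(r"\bEMP-[0-9]{8}\b", samples)  # word boundaries: no substring hits
bad = match_rate(r"EMP-[0-9]{8}", samples)       # anti-pattern: matches inside XEMP-...
assert good < bad  # the unbounded pattern over-matches
```

Running a candidate pattern through a harness like this surfaces the substring-match anti-pattern immediately: the unbounded regex fires on `XEMP-123456789`, inflating the match rate with false positives.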

3.3  Supported Data Sources: Complete Matrix

Source Category | Supported Sources | Metadata Extracted | Lineage Support | Classification
Azure Data | ADLS Gen1/Gen2, Azure Blob, Azure SQL DB, Azure SQL MI, Synapse Analytics, Cosmos DB, Azure Database for PostgreSQL/MySQL | Schema, statistics, partitions, resource metadata | Yes (native connectors) | Full (all classification types)
Microsoft Fabric | Fabric Lakehouse, Fabric Warehouse, Fabric Dataflows, Power BI datasets/reports | Schemas, measures, relationships, sensitivity labels | Yes (deep integration, column-level) | Full, bidirectional label sync
Power BI | Workspaces, datasets, reports, dataflows, dashboards | Dataset schema, report lineage, refresh history, endorsements | Yes (full report-to-source lineage) | Label push/pull from M365 Information Protection
Azure Data Factory | Pipelines, datasets, linked services, data flows | Pipeline structure, activity configuration | Yes (automated lineage extraction) | Via dataset classification
On-Premises | SQL Server 2012+, Oracle 12c+, SAP HANA, Teradata, HDFS | Full schema extraction | SQL Server: Yes. Others: Limited | Full classification on all
Multi-Cloud | AWS S3, AWS RDS, Google BigQuery, Google Cloud Storage, Snowflake | Schema and storage metadata | Limited (no native lineage; requires custom lineage API) | Full classification
SaaS Applications | Salesforce, SAP ECC, Erwin, Looker | Configured via connectors | Limited | Classification on connected datasets
Office 365 | SharePoint Online, Exchange Online, Teams, OneDrive | Document metadata, content classification | N/A (unstructured) | Full M365 sensitivity label integration

3.4  Scan Rule Sets: Enterprise Design Patterns

Scan Rule Sets define which file types and classification rules apply to each scan. Enterprise-grade Scan Rule Sets require careful design to balance coverage, performance, and noise reduction.

// Recommended Scan Rule Set Strategy
//
// Rule Set 1: PCI Scope (used for financial data sources)
// - File types: Parquet, Delta, CSV, ORC, JSON
// - Classifications: Credit Card, CVV, Account Number, IBAN, SWIFT Code
// - Sampling: Full scan (not sampled) for audit compliance
// - Custom rules: Internal account number format regex

// Rule Set 2: PHI Scope (used for healthcare sources)
// - File types: HL7, CSV, Parquet, JSON, XML
// - Classifications: NHS Number, SSN, Date of Birth, Medical Record Number,
//   Diagnosis Code (custom), Prescription Data (custom)
// - Sampling: Full scan

// Rule Set 3: Standard PII (all other sources)
// - File types: All common formats
// - Classifications: Email, Phone, Address, Name (ML-based)
// - Sampling: 128 rows (default)

// Rule Set 4: Technical Metadata Only (dev/test environments)
// - File types: All
// - Classifications: NONE (disable all classification rules)
// - Purpose: Capture lineage and schema without false positive noise in non-prod

CHAPTER 4

Data Catalog: Enterprise Search & Lineage

4.1  Catalog Architecture

The Purview Data Catalog is a faceted search index built on Azure Cognitive Search, surfacing all metadata entities discovered by the Data Map. Its power comes not from indexing but from curation: the governance workflows, glossary integration, certification processes, and ownership models that turn raw metadata into trusted, consumable data assets.

4.2  Business Glossary Design

The Business Glossary is the semantic layer of Purview — the bridge between technical metadata and business meaning. A poorly designed glossary becomes a maintenance burden; a well-designed glossary becomes the authoritative vocabulary of the organization.

Glossary Governance Model

Term Status | Meaning | Who Sets It | Catalog Behavior
Draft | Term being developed; not yet authoritative | Term authors (data stewards) | Discoverable but not recommended for use
Approved | Reviewed and endorsed by domain owner | Domain data owner | Shown as authoritative in search results
Deprecated | Term being replaced; avoid new usage | Governance team | Shown with deprecation warning; redirects to replacement term
Expired | Term no longer valid; historical reference only | Governance program manager | Hidden from default search; accessible via filter

Glossary Hierarchy Best Practices

Flat glossaries fail at scale. A 2,000-term flat list is unsearchable. Structure terms in a parent-child hierarchy reflecting business domains:

  • L1 — Domain: Customer, Product, Finance, Risk, Operations, HR
  • L2 — Subdomain: Customer → Prospect, Customer → Active, Customer → Churned
  • L3 — Concept: Customer → Active → Customer Lifetime Value, Customer → Active → Net Promoter Score
  • L4 — Attribute: Customer → Active → CLV → Predicted CLV (12-month), Actual CLV (trailing 12-month)

Terms should carry: formal definition, example values, calculation formula (where applicable), related terms, steward contact, and linked certified data assets. Incomplete terms erode trust faster than no glossary at all.
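
A completeness check along these lines can gate the Draft-to-Approved transition so that incomplete terms never reach consumers. The field names below are illustrative, not Purview's glossary schema; the conditional fields (calculation formula, linked assets) are deliberately excluded from the hard requirement:

```python
# Sketch: pre-publication completeness check for glossary terms. Field names
# are assumptions for the sketch, not Purview's actual glossary schema.

REQUIRED_FIELDS = ["definition", "examples", "related_terms", "steward_contact"]

def term_gaps(term: dict) -> list:
    """Return the required fields that are missing or empty on a glossary term."""
    return [f for f in REQUIRED_FIELDS if not term.get(f)]

draft = {
    "name": "Customer Lifetime Value",
    "definition": "Predicted net revenue from a customer relationship.",
    "examples": ["1,240.50 EUR"],
    "steward_contact": "data-stewards@contoso.com",
}
print(term_gaps(draft))
```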

4.3  Lineage Architecture & Extraction

Data lineage is Purview’s most strategically valuable capability — and the most complex to implement correctly. Lineage answers the questions that matter most: “Where does this metric come from?”, “What would break if we changed this table?”, and “How was this data transformed before I received it?”

Lineage Extraction Mechanisms

Automated lineage (preferred): Purview automatically extracts lineage from Azure Data Factory (pipeline activities), Synapse Spark (PySpark/Scala), Azure Synapse Pipelines, Microsoft Fabric dataflows, and Power BI (report-to-dataset-to-source chain). Zero code required — enabled at scan time.

SQL-based lineage parsing: Purview parses stored procedures, views, and CREATE TABLE AS SELECT statements to extract column-level lineage from SQL-based transformations. Supports: Azure SQL Database, Synapse Dedicated Pool, SQL Server (via SHIR).

Custom lineage via Atlas API: For sources without native connectors (custom Spark jobs, dbt, Informatica, Talend), lineage is submitted programmatically via the Apache Atlas REST API. This is the integration path for third-party ETL tools.

Lineage from Microsoft Fabric (recommended 2024+): Fabric’s native integration with Purview provides the most granular lineage available: column-level lineage through Lakehouse, Warehouse, Dataflows, and Power BI reports in a single unbroken chain.

Lineage API Integration Example

# Submit custom lineage to Purview Atlas API (Python)
# Use case: dbt transformation lineage

import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://purview.azure.net/.default").token

purview_endpoint = "https://{account_name}.purview.azure.com"

lineage_payload = {
    "entities": [
        {
            "typeName": "Process",
            "attributes": {
                "name": "dbt_model_customer_lifetime_value",
                "qualifiedName": "dbt://project/model/customer_lifetime_value",
                "inputs": [
                    {"guid": "-1", "typeName": "azure_sql_table",
                     "uniqueAttributes": {"qualifiedName": "mssql://server/db/sales/orders"}},
                    {"guid": "-2", "typeName": "azure_sql_table",
                     "uniqueAttributes": {"qualifiedName": "mssql://server/db/crm/customers"}}
                ],
                "outputs": [
                    {"guid": "-3", "typeName": "azure_sql_table",
                     "uniqueAttributes": {"qualifiedName": "mssql://server/dwh/gold/customer_ltv"}}
                ]
            }
        }
    ]
}

response = requests.post(
    f"{purview_endpoint}/catalog/api/atlas/v2/entity/bulk",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=lineage_payload
)
response.raise_for_status()
# Response: {"mutatedEntities": {"CREATE": [{"guid": "…"}]}}

4.4  Search Optimization & Data Product Design

As catalogs scale past 100,000 assets, findability degrades without intentional design. Enterprise search optimization requires four capabilities:

Endorsed and certified assets: Purview endorsement levels (Promoted, Certified) surface trusted assets first. Certification should require: data quality SLA documentation, ownership assignment, glossary linkage, and lineage completeness. Certified assets appear with a visual badge in search results and are prioritized in the search ranking.

Custom attributes for domain context: Extend asset metadata with business-critical attributes: Data Domain, Business Owner, SLA Tier (Gold/Silver/Bronze), Refresh Frequency, Data Quality Score, GDPR Lawful Basis. These become facets in the search experience.

Data Products (Fabric integration): In organizations using Microsoft Fabric, Data Products group related datasets, reports, and pipelines into a consumable unit with a single access request, SLA, and ownership model. This is the recommended abstraction for data mesh architectures on the Microsoft stack.

Term-to-asset linking at ingestion: Automate glossary term linking during scan using column name pattern matching. A column named “customer_id” should automatically receive the “Customer ID” glossary term link as a suggestion — reducing manual stewardship burden by 40-70% in mature estates.
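
A minimal sketch of such suggestion logic follows; the pattern map is illustrative, and in production the patterns would be derived from glossary metadata rather than hard-coded:

```python
# Sketch: suggest glossary-term links from column names at scan time.
# TERM_PATTERNS is an illustrative, hard-coded map for the example only.
import re

TERM_PATTERNS = {
    "Customer ID": re.compile(r"^(customer|cust)_?id$", re.IGNORECASE),
    "Email Address": re.compile(r"email", re.IGNORECASE),
}

def suggest_terms(column_name: str) -> list:
    """Return glossary terms whose pattern matches the column name."""
    return [term for term, rx in TERM_PATTERNS.items() if rx.search(column_name)]

print(suggest_terms("customer_id"))
```

Emitting these matches as suggestions for steward confirmation, rather than applying them silently, keeps the human in the loop while still removing the bulk of the manual linking work.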

CHAPTER 5

Data Estate Insights & Health Metrics

5.1  Insights Architecture

Data Estate Insights is Purview’s built-in analytics layer — pre-built dashboards that measure governance program health across five dimensions. Unlike custom reporting, Insights uses pre-aggregated metrics from the Data Map, making it near-real-time with no additional cost.

Insight Category | Key Metrics | Governance Question Answered
Asset Insights | Total assets by source type, collection, asset type; new assets per week; scan success rate | "How many data assets do we govern? Are we growing coverage?"
Classification Insights | % classified, top classifications by frequency, unclassified assets by source | "What percentage of our data estate has been classified? Where are our gaps?"
Sensitivity Insights | Sensitivity label distribution, label coverage %, cross-source sensitive asset count | "How much sensitive data do we have and where does it live?"
Glossary Insights | Term coverage (% of assets with linked terms), incomplete terms, term usage by domain | "Is our business glossary actually being used to annotate data?"
Stewardship Insights | Assets without owners, assets without expert contacts, overdue reviews | "Do we have accountability for our data assets?"

5.2  Governance Maturity Scoring

The most operationally valuable use of Insights is building a Governance Maturity Score — a composite metric that tracks program progress over time. This is the single number a CDO should report to the board.

// Governance Maturity Score Calculation (KQL, run in Log Analytics)
// Score range: 0-100. Target: >75 for "Managed" maturity level.

let totalAssets = toscalar(PurviewInsights_AssetCounts_CL
    | summarize sum(AssetCount_d));
let classifiedAssets = toscalar(PurviewInsights_ClassificationCoverage_CL
    | summarize sum(ClassifiedCount_d));
let labelledAssets = toscalar(PurviewInsights_SensitivityCoverage_CL
    | summarize sum(LabelledCount_d));
let ownedAssets = toscalar(PurviewInsights_StewardshipCoverage_CL
    | summarize sum(OwnedCount_d));
let glossaryLinked = toscalar(PurviewInsights_GlossaryCoverage_CL
    | summarize sum(TermLinkedCount_d));

// Weighted composite score:
// Classification coverage: 30%
// Sensitivity labelling:   25%
// Asset ownership:         25%
// Glossary linkage:        20%
print GovernanceMaturityScore = (classifiedAssets / totalAssets * 30)
    + (labelledAssets / totalAssets * 25)
    + (ownedAssets / totalAssets * 25)
    + (glossaryLinked / totalAssets * 20),
    CalculatedOn = now()
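
For pipelines that compute the score outside Log Analytics, the same weighting reduces to a small function (weights exactly as defined above; the zero-asset guard is an addition for safety):

```python
# Sketch: the weighted composite maturity score in plain Python, using the
# same weights as the KQL version (30/25/25/20).

def maturity_score(total, classified, labelled, owned, glossary_linked):
    """Composite 0-100 governance maturity score."""
    if total == 0:
        return 0.0  # guard: an empty estate scores zero rather than dividing by zero
    return (classified / total * 30
            + labelled / total * 25
            + owned / total * 25
            + glossary_linked / total * 20)

print(maturity_score(1000, 900, 800, 700, 500))
```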

5.3  Integrating Purview Insights with Power BI

For executive reporting, Purview Insights data should be surfaced in Power BI via the Purview REST API. The recommended approach: nightly pipeline (ADF or Fabric Data Factory) calls the Purview Insights REST API endpoints, writes results to a Fabric Lakehouse, and a certified Power BI report visualizes governance KPIs with week-over-week trending.

Key metrics to visualize: Governance Maturity Score trend, Classification coverage by data domain, Sensitive asset growth rate, Stewardship SLA compliance (% of assets reviewed within policy window), and Open stewardship actions by owner.

CHAPTER 6

Data Policy & Access Governance

6.1  Purview Data Policy Architecture

Purview’s Data Policy capability represents a fundamental shift in how data access is governed: from infrastructure-level ACLs managed by engineers to business-level policies managed by data owners and governance teams. This capability is often the least understood and most transformative feature in the platform.

A Purview data access policy states: “Users in group X can perform action Y on data assets matching classification Z.” This policy is then automatically enforced at the storage layer — without requiring changes to Azure RBAC, storage ACLs, or database permissions.

6.2  Policy Types

Data owner policies: Grant read or read/modify access to specific data assets in Azure Storage, ADLS Gen2, Azure SQL, and Fabric. The data owner (assigned in Purview catalog) can self-service grant access without involving the infrastructure team — a governance-controlled self-service model.

DevOps policies: Grant SQL performance monitoring access (VIEW DATABASE STATE, VIEW SERVER STATE) to DevOps engineers without granting data read permissions. Solves a critical principle of least privilege gap in most organizations.

Self-service data access policies: Enable data consumers to request access directly through the Purview catalog. The request triggers an approval workflow routed to the data owner. Approved requests are provisioned automatically; rejected requests return a reason to the requestor. A full audit trail is maintained throughout.

Attribute-based access control (ABAC) policies: Grant access based on asset classifications rather than specific named assets. Example: “Allow data scientists in the ML team to read assets classified as Non-Sensitive.” As new assets are classified and onboarded, they automatically inherit the correct access policy without manual intervention.
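
The ABAC evaluation model can be illustrated in miniature. The policy shape below is a simplification for exposition, not Purview's actual policy schema:

```python
# Sketch: ABAC decision logic in miniature. The policy dictionary shape is an
# illustrative simplification, not Purview's policy schema.

def abac_allows(policy, user_groups, action, asset_classifications):
    """True when the user's group may perform the action on an asset whose
    classifications satisfy the policy's attribute condition."""
    return (policy["group"] in user_groups
            and action in policy["actions"]
            and policy["classification"] in asset_classifications)

# "Allow data scientists in the ML team to read assets classified as Non-Sensitive."
policy = {"group": "ml-team", "actions": {"read"}, "classification": "Non-Sensitive"}

print(abac_allows(policy, {"ml-team"}, "read", {"Non-Sensitive"}))
```

The key property shown here is that the policy never names an asset: any newly classified asset satisfying the attribute condition is covered automatically, which is precisely what removes the manual onboarding step.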

6.3  Policy Enforcement Architecture

Data Source | Policy Enforcement Point | Latency to Enforce | Granularity
ADLS Gen2 | Azure Storage RBAC (via Purview policy propagation) | < 5 minutes | Container, folder, file level
Azure Blob Storage | Azure Storage RBAC | < 5 minutes | Container level
Azure SQL Database | SQL permissions (system-managed) | < 2 minutes | Database, schema, table level
Azure SQL Managed Instance | SQL permissions (system-managed) | < 2 minutes | Database, schema, table level
Microsoft Fabric | Fabric workspace and item permissions | < 10 minutes | Workspace, lakehouse, table level
Azure Synapse Analytics | Synapse workspace RBAC | < 5 minutes | Workspace, pool level
Architectural Constraint: Purview data policies do NOT replace row-level security (RLS), column masking, or Dynamic Data Masking (DDM) in SQL databases. Purview policies govern access to data assets — who can connect and query. RLS/DDM governs what data they see within an allowed connection. Both layers are required for complete access governance.

6.4  Access Workflow Implementation

# Purview Self-Service Access Request: Backend Configuration
# This Logic App workflow handles access request notifications and provisioning

# Step 1: Verify the Purview account that will emit access request events
az purview account show \
  --name "contoso-purview" \
  --resource-group "governance-rg"

# Step 2: Event Grid subscription -- route access requests to Logic App
az eventgrid event-subscription create \
  --name "purview-access-requests" \
  --source-resource-id "/subscriptions/…/purview/accounts/contoso-purview" \
  --endpoint "https://prod-logic-app.azurewebsites.net:443/api/access-request/triggers/manual/invoke" \
  --included-event-types "Microsoft.Purview.DataAccessRequestCreated"

# Step 3: Logic App action -- Teams notification to data owner
# {
#   "type": "message",
#   "to": "@{triggerBody()?['dataOwnerEmail']}",
#   "body": "Access request from @{triggerBody()?['requestorEmail']} for @{triggerBody()?['assetName']}.",
#   "attachments": [{"approve_url": "@{triggerBody()?['approvalUrl']}"}]
# }

# Step 4: On approval -- Purview API call to grant policy
# POST /policyStore/dataPlane/policies/{policyId}/approve

CHAPTER 7

Information Protection & DLP Integration

7.1  Sensitivity Labels: Architecture & Design

Microsoft Purview Information Protection (formerly Azure Information Protection + Microsoft 365 MIP) provides the sensitivity label framework that connects data classification in Purview to protection enforcement across the Microsoft ecosystem. Sensitivity labels are the governance primitive that spans cloud storage, databases, Office documents, Teams messages, and third-party applications.

Sensitivity Label Taxonomy Design

The label taxonomy must balance business usability with enforcement precision. The most common failure mode: too many labels (>10) that data users cannot distinguish between, leading to incorrect labelling or label fatigue.

Label | Definition | Protection Actions | Auto-labelling Trigger
Public | Approved for external publication. No restrictions. | None | No sensitive classifications detected
Internal | Business information for employee use. Not for public sharing. | Watermark on documents | Default label applied to all unlabelled items
Confidential | Sensitive business data. External sharing requires approval. | Encryption, external sharing DLP block, watermark | PII classification, financial data patterns
Highly Confidential | Regulated data, trade secrets, executive communications. | Encryption, download restrictions, MFA for access, audit logging | PHI, PCI data, credentials, classified IP
Restricted | Legal hold, regulatory investigation, M&A sensitive. | Encryption, access list restricted to named individuals, no forwarding | Legal trigger or manual assignment only

7.2  Automatic Labelling: Configuration & Tuning

Auto-labelling policies scan Exchange, SharePoint, OneDrive, Teams, and Azure-native sources to apply labels without user intervention. Configuration requires careful threshold tuning to avoid both under-labelling (missing sensitive content) and over-labelling (creating label fatigue and compliance noise).
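The trade-off between under- and over-labelling comes down to per-pattern minimum-match thresholds. Here is a minimal sketch of that decision logic in Python, using illustrative pattern names and counts; this is an offline tuning aid, not the auto-labelling engine itself:

```python
# Hypothetical sketch: evaluating auto-labelling thresholds offline before
# enabling a policy. Pattern names and counts are illustrative, not a real API.

# Trigger conditions: the label applies when ANY rule's minimum count is met.
RULES = {
    "Credit Card Number": 1,
    "EU National Identification Number": 1,
    "Email Address": 5,  # higher threshold: isolated emails are rarely sensitive
}

def should_label(match_counts: dict) -> bool:
    """Return True if detected pattern counts satisfy any trigger rule."""
    return any(match_counts.get(name, 0) >= min_count
               for name, min_count in RULES.items())

# Tuning check against a sample of scanned documents:
docs = [
    {"Email Address": 2},                           # below threshold: unlabelled
    {"Email Address": 7},                           # over threshold: labelled
    {"Credit Card Number": 1, "Email Address": 1},  # labelled
]
labelled = sum(should_label(d) for d in docs)
print(f"{labelled}/{len(docs)} sample docs would receive the label")
```

The usual tuning pattern: raise the minimum count for high-frequency, low-risk classifications (email addresses) while keeping it at 1 for high-risk ones (credit card numbers), then validate the hit rate against a labelled sample before enforcement.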

# PowerShell: Configure Auto-labelling Policy for Confidential label
# Applies the "Confidential" label when content matches any PII rule below

Connect-IPPSSession -UserPrincipalName admin@contoso.com

# Create the auto-labelling policy in simulation mode
New-AutoSensitivityLabelPolicy `
    -Name "Confidential-AutoLabel-Policy" `
    -ApplySensitivityLabel "Confidential" `
    -ExchangeLocation All `
    -SharePointLocation All `
    -OneDriveLocation All `
    -Mode TestWithoutNotifications

# Create the labelling rule (trigger conditions; any match applies the label)
New-AutoSensitivityLabelRule `
    -Name "PII-Detection-Rule" `
    -Policy "Confidential-AutoLabel-Policy" `
    -ContentContainsSensitiveInformation @(
        @{Name="Credit Card Number"; minCount="1"},
        @{Name="EU National Identification Number"; minCount="1"},
        @{Name="Email Address"; minCount="5"}
    )

# Run in simulation mode first (30 days) — never deploy directly to production.
# Switch to enforcement afterwards with:
#   Set-AutoSensitivityLabelPolicy -Identity "Confidential-AutoLabel-Policy" -Mode Enable
# Review simulation report: Security & Compliance Center → Information Protection
# → Auto-labelling → Policy Name → Simulation Results

7.3  Data Loss Prevention Policies

DLP policies in Purview define actions taken when sensitive data violates sharing rules — blocked email, restricted external sharing, user notification, or incident report generation. DLP is the enforcement layer; sensitivity labels are the classification layer. Both are required.

| DLP Scope | Trigger Condition | Action | Business Justification Override |
| --- | --- | --- | --- |
| Exchange Email | Highly Confidential label; external recipient | Block delivery; notify sender; generate incident | Yes — manager approval workflow |
| SharePoint/OneDrive | Confidential label; public sharing link created | Block link creation; notify user; generate incident | Yes — data owner approval |
| Teams Messages | Credit card number, SSN pattern in message | Block send; notify user; policy tip displayed | No — hard block (financial regulatory) |
| Azure Storage (Defender for Storage) | Highly Confidential label; unusual download volume | Alert to SOC; optional: revoke access token | No — security team investigation required |
| Power BI | Confidential label; export to Excel/CSV | Audit log entry; notify workspace admin | Yes — self-service analytics exception |

CHAPTER 8

Compliance Manager & Regulatory Frameworks

8.1  Compliance Manager Architecture

Microsoft Purview Compliance Manager provides a risk-based compliance assessment framework that maps Microsoft 365 and Azure control implementations to regulatory requirements. It automates evidence collection for controls that Microsoft manages (Microsoft-managed actions) and guides remediation for customer-managed controls.

Compliance Manager’s compliance score is not a certification — it is a risk signal. A score of 85% does not mean an organization is 85% compliant with GDPR; it means 85% of the control objectives mapped in the assessment are in a satisfactory state. External audit remains necessary for formal certification.
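The scoring arithmetic behind that reading can be made explicit: points achieved over points available across the mapped control objectives. In this sketch the control names and point values are invented for illustration; Compliance Manager assigns its own point weights:

```python
# Minimal sketch of how a percentage compliance score is computed: points
# achieved over points available. Control names and weights are illustrative.
controls = [
    {"name": "Encryption at rest", "points": 27, "status": "passed"},
    {"name": "MFA enforcement",    "points": 27, "status": "passed"},
    {"name": "DPIA documentation", "points": 8,  "status": "failed"},
]

achieved = sum(c["points"] for c in controls if c["status"] == "passed")
total = sum(c["points"] for c in controls)
score = round(100 * achieved / total, 1)
print(f"score: {score}%")  # a risk signal, not a certification
```

Note how heavily-weighted passing controls can mask a failed control that an auditor would treat as material, which is exactly why the score is a signal and not a certification.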

8.2  Regulatory Assessment Configuration

| Regulation | Assessment Type | Control Count (Approx) | Automated Evidence % | Key Gap Areas Typically Found |
| --- | --- | --- | --- | --- |
| GDPR | Pre-built (Microsoft) | 165 controls | ~45% | Subject Rights Request process, DPIA documentation, third-party processor contracts |
| HIPAA/HITECH | Pre-built | 80 controls | ~55% | PHI access audit logs, workforce training records, BAA documentation |
| ISO 27001:2022 | Pre-built | 93 controls | ~40% | Asset management procedures, incident response testing, supplier risk assessments |
| PCI-DSS v4.0 | Pre-built | 264 controls | ~35% | Network segmentation evidence, penetration test reports, PA-DSS documentation |
| SOC 2 Type II | Pre-built | 64 controls | ~50% | Vendor management evidence, background check procedures, BC/DR testing records |
| Custom Regulatory | Custom assessment builder | User-defined | Varies | Depends on regulation; use for DORA, NIS2, local data residency laws |

8.3  Continuous Compliance Operating Model

The highest-maturity organizations do not prepare for compliance audits — they operate in a state of continuous compliance. This requires three capabilities:

Automated control testing: Use Microsoft Defender for Cloud’s regulatory compliance dashboard linked to Compliance Manager. Defender continuously tests security controls (encryption at rest, network security groups, MFA enforcement) and feeds results directly into Compliance Manager control scores — daily, not quarterly.

Evidence automation: Build evidence packages programmatically. The Compliance Manager API allows: listing all controls in an assessment, retrieving current control status, uploading evidence documents from SharePoint, and triggering evidence refresh. Automate this in ADF or Logic Apps on a monthly schedule.

Audit-ready documentation: Compliance Manager generates exportable evidence packages (Excel workbooks with control evidence, test results, and remediation notes). Configure these to be generated automatically 30 days before known audit windows and delivered to the compliance team SharePoint site via Power Automate.

# Compliance Manager API: Export assessment evidence package
# Use in Power Automate or ADF for automated audit package generation
 
$token = (Get-AzAccessToken -ResourceUrl "https://compliance.microsoft.com").Token

# List all assessments
$assessments = Invoke-RestMethod `
    -Uri "https://compliance.microsoft.com/api/ComplianceManager/assessments" `
    -Headers @{Authorization = "Bearer $token"} `
    -Method GET

# Get controls for GDPR assessment
$gdprId = ($assessments.value | Where-Object {$_.name -like "*GDPR*"}).id
$controls = Invoke-RestMethod `
    -Uri "https://compliance.microsoft.com/api/ComplianceManager/assessments/$gdprId/controls" `
    -Headers @{Authorization = "Bearer $token"} `
    -Method GET

# Upload evidence document to a control
$body = @{
    controlId = "GDPR-Art32-EncryptionAtRest";
    evidenceFile = [Convert]::ToBase64String([IO.File]::ReadAllBytes("encryption_policy.pdf"));
    evidenceType = "PolicyDocument"
} | ConvertTo-Json

Invoke-RestMethod `
    -Uri "https://compliance.microsoft.com/api/ComplianceManager/assessments/$gdprId/evidence" `
    -Headers @{Authorization = "Bearer $token"; "Content-Type" = "application/json"} `
    -Method POST -Body $body

CHAPTER 9

Integration Patterns: Fabric, Synapse & ADF

9.1  Microsoft Fabric Integration (Deep Dive)

Microsoft Fabric’s native integration with Purview is the most complete governance integration in the Microsoft ecosystem. When fully configured, every Fabric artifact (Lakehouse table, Warehouse table, Dataflow, Pipeline, Power BI dataset, report) is automatically discoverable in Purview with column-level lineage, sensitivity labels, and endorsement status synchronized bidirectionally.

Enabling Fabric-Purview Integration

# Step 1: Connect Fabric tenant to Purview account
# Fabric Admin Portal → Governance → Microsoft Purview hub
# Enter Purview account name and subscription
 
# Step 2: Configure Purview scanning of Fabric workspaces
# In Purview Studio: Data Map → Sources → Register
# Source type: Microsoft Fabric
# Select workspace(s) to scan
 
# Step 3: Enable sensitivity label sync (bidirectional)
# In Fabric Admin Portal: Tenant Settings → Information Protection
# Toggle ON: "Apply sensitivity labels from data sources to their data in Power BI"
# Toggle ON: "Allow workspace admins to override automatically applied sensitivity labels"
 
# Step 4: Verify lineage (check in Purview catalog after first scan completes)
# Expected lineage chain:
# Source System → ADF/Dataflow ingestion → Bronze Lakehouse →
# Silver Lakehouse → Gold Lakehouse → Warehouse → Power BI Dataset → Report
 
# Troubleshooting: If lineage gaps appear between Lakehouse and Warehouse,
# ensure Fabric Warehouse is using shortcuts to Lakehouse (not COPY INTO)
# COPY INTO breaks automated lineage; use Lakehouse shortcuts or Dataflows instead

9.2  Azure Data Factory Lineage Integration

ADF lineage extraction in Purview is one of the most powerful automated capabilities — transforming opaque ETL pipelines into transparent, traceable data flows. When configured correctly, every ADF pipeline run generates lineage edges in the Purview Data Map automatically.

| ADF Activity Type | Lineage Extracted | Granularity | Configuration Required |
| --- | --- | --- | --- |
| Copy Activity | Source dataset → Copy activity → Sink dataset | Dataset level | Auto (no config required once Purview connected) |
| Data Flow Activity | Source → Transformation stages → Sink | Column level (via data flow schema) | Enable "Lineage reporting" in Data Flow settings |
| Execute Pipeline Activity | Parent pipeline → Child pipeline linkage | Pipeline level | Auto |
| Stored Procedure Activity | Database → SP execution → Database | Dataset level (column via SQL parsing) | SQL Server: auto. Others: manual Atlas API |
| Lookup Activity | Source dataset read (no lineage to target) | Dataset level (read only) | Auto |
| Azure Function Activity | No native lineage | None | Custom Atlas API call from within the Azure Function |
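For the "no native lineage" case, a custom process entity can be pushed from inside the Azure Function. Here is a hedged sketch of building such a payload; the shape follows the Apache Atlas v2 bulk-entity format, but the `typeName` values and qualified names are illustrative assumptions, and the POST itself (to `/catalog/api/atlas/v2/entity/bulk` with an AAD token) is omitted:

```python
# Sketch: constructing an Atlas "Process" entity payload to record lineage that
# an Azure Function activity cannot emit natively. Names are hypothetical.
import json

def lineage_process(name: str, inputs: list[str], outputs: list[str]) -> dict:
    def ref(qn: str) -> dict:
        # Reference an existing catalog asset by its unique qualified name
        return {"typeName": "azure_datalake_gen2_path",
                "uniqueAttributes": {"qualifiedName": qn}}
    return {
        "entities": [{
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"custom-fn://{name}",
                "name": name,
                "inputs":  [ref(qn) for qn in inputs],
                "outputs": [ref(qn) for qn in outputs],
            },
        }]
    }

payload = lineage_process(
    "enrich-sales-fn",
    inputs=["https://storage.dfs.core.windows.net/bronze/sales"],
    outputs=["https://storage.dfs.core.windows.net/silver/sales_enriched"],
)
print(json.dumps(payload, indent=2)[:80])  # POST as JSON to the Atlas bulk endpoint
```

Emitting this at the end of each Function invocation gives the same input → process → output edge in the Data Map that native connectors produce automatically.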

9.3  Azure Synapse Analytics Integration

Synapse Analytics integration with Purview provides lineage and classification for dedicated SQL pools, serverless SQL, and Spark pools. Key integration points:

Dedicated SQL Pool lineage: Purview scans DDL and DML to extract lineage from stored procedures, views, and CTAS statements. For complex multi-step transformations in stored procedures, Purview parses the SQL AST (abstract syntax tree) to identify source-to-target column mappings.

Spark pool lineage: PySpark and Scala notebooks emit lineage events via the OpenLineage connector. Add the Purview Spark connector package to your Synapse Spark pool configuration to enable automatic lineage emission from all Spark jobs.

# Install Purview Spark Connector in Synapse Spark Pool
# Synapse Studio → Manage → Apache Spark pools → [Your Pool] → Packages
 
# requirements.txt:
# azure-purview-lineage-spark==1.0.0
 
# Spark configuration (add to Spark pool properties):
# spark.extraListeners: com.microsoft.azure.purview.spark.PurviewSparkListener
# spark.purview.account.name: contoso-purview
# spark.purview.tenant.id: {tenant-id}
 
# After configuration, all Spark DataFrame read/write operations emit lineage:
# df = spark.read.parquet("abfss://bronze@storage.dfs.core.windows.net/sales/")
# df_transformed = df.groupBy("region").agg(sum("revenue").alias("total_revenue"))
# df_transformed.write.format("delta").save("abfss://silver@storage…/sales_by_region/")
# This generates: bronze/sales → Spark job → silver/sales_by_region (column-level)

CHAPTER 10

Deployment Architecture & Operating Model

10.1  Deployment Architecture Patterns

Pattern 1: Centralized Governance (Single Purview Account)

Best for: Organizations with <50,000 data assets, single-geography operation, or strong central governance team. One Purview account, one data map, one catalog. Simple operations, lower cost, but single point of governance control.

Pattern 2: Federated Governance (Hub-and-Spoke)

Best for: Large multi-geography organizations with autonomous business units. A central “governance hub” Purview account maintained by the enterprise CDO office. Each business unit has its own “spoke” Purview account for local governance. Metadata is synchronized to the hub via the Purview metadata API on a scheduled basis. Hub provides enterprise-wide search and reporting; spokes provide local stewardship autonomy.
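The spoke-to-hub synchronization can be sketched as a merge keyed on term name. This is a conceptual illustration only: the dictionary stands in for the hub catalog, and a production sync would call the Purview glossary APIs with conflict-resolution rules agreed by the CDO office:

```python
# Conceptual sketch of hub-and-spoke glossary sync: spokes push term summaries
# to the hub on a schedule. Data structures here are illustrative stand-ins.
hub_catalog: dict[str, dict] = {}

def sync_spoke_to_hub(spoke_name: str, spoke_terms: list[dict]) -> int:
    """Merge spoke glossary terms into the hub (last writer wins per term name)."""
    for term in spoke_terms:
        hub_catalog[term["name"]] = {**term, "source_account": spoke_name}
    return len(spoke_terms)

sync_spoke_to_hub("customer-spoke", [{"name": "Churn Rate", "status": "Approved"}])
sync_spoke_to_hub("finance-spoke",  [{"name": "Gross Margin", "status": "Draft"}])
print(sorted(hub_catalog))  # enterprise-wide search sees both domains' terms
```

Last-writer-wins is the simplest merge policy; federated programs usually refine it so that hub-approved terms cannot be overwritten by spoke drafts.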

Pattern 3: Domain-Aligned (Data Mesh Architecture)

Best for: Organizations implementing a Data Mesh operating model. Each data domain (Customer, Product, Finance, Risk) owns and operates its own Purview account and is responsible for its governance outcomes. The enterprise governance team sets standards and taxonomy but does not operate the domain catalogs. Federated computational governance via Purview’s shared glossary and common classification taxonomy ensures interoperability.

10.2  Network Architecture

| Component | Network Option | Recommended For | Tradeoffs |
| --- | --- | --- | --- |
| Purview Studio (UI) | Public endpoint (default) or Private Endpoint | Private EP for regulated industries | Private EP requires VPN/ExpressRoute for analyst access |
| Scanning runtime | Managed VNet (preferred) or SHIR | Managed VNet for Azure sources; SHIR for on-prem | SHIR adds VM management overhead; MVNet is zero-ops |
| Purview API | Public with AAD auth or Private EP | Private EP for automation in private VNets | Private EP: no public API calls from outside VNet |
| Kafka (event streaming) | Kafka endpoint for event notification | Event-driven metadata workflows | Requires Event Hubs consumption for real-time events |

10.3  Governance Operating Model

Technology deployment without an operating model produces an expensive, unused catalog. The governance operating model defines who does what, when, and how — continuously.

| Role | Responsibilities | Time Commitment | Purview Role |
| --- | --- | --- | --- |
| Chief Data Officer (CDO) | Set governance strategy, approve glossary, report metrics to board | 2-4 hrs/week | Insights Reader, Collection Admin (Root) |
| Data Governance Lead | Operate Purview program, manage stewards, evolve policies | Full time | Collection Admin, Policy Author |
| Domain Data Owner | Own data quality for domain, approve certifications, approve access requests | 4-8 hrs/week | Data Curator (domain collection) |
| Data Steward | Enrich metadata, link glossary terms, resolve classification issues, review flagged assets | 50-100% FTE | Data Curator |
| Data Engineer | Register data sources, configure scans, build custom lineage integration | 20% allocation | Data Source Admin |
| Data Consumer | Search catalog, request access, rate asset quality, report metadata issues | Ad hoc | Data Reader |
Operating Model Insight: A Purview deployment without assigned Data Stewards is a catalog that fills with metadata but never becomes trusted. The steward-to-asset ratio matters: for a 50,000 asset estate, plan for 2-3 full-time stewards in the first year. Automation (bulk classification, bulk term linking) can increase effective steward capacity to 1 FTE per 100,000 assets at maturity.
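The staffing guidance above is ceiling arithmetic over an assumed assets-per-steward ratio:

```python
# Quick arithmetic behind the steward-to-asset guidance in this section.
def stewards_needed(assets: int, assets_per_steward: int) -> int:
    """Ceiling division: full-time stewards required for an asset estate."""
    return -(-assets // assets_per_steward)

print(stewards_needed(50_000, 25_000))   # year-1 ratio: 2 stewards for 50K assets
print(stewards_needed(50_000, 100_000))  # at maturity, with automation: 1 steward
```

Re-running the calculation as the estate grows, and as bulk classification and bulk term linking raise the per-steward capacity, keeps the operating model honest about stewardship headcount.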

CHAPTER 11

Performance, Scale & Cost Optimization

11.1  Scan Performance Optimization

| Optimization Lever | Description | Performance Impact | Tradeoff |
| --- | --- | --- | --- |
| Incremental scanning | Scan only new/modified assets since last scan using watermark-based detection | 60-80% reduction in scan time for stable sources | May miss classification changes on unmodified assets |
| Targeted scanning | Scope scans to specific folders, schemas, or file patterns | 40-70% faster | Requires good source naming conventions |
| Scan concurrency tuning | Increase concurrent threads on SHIR from default 4 to 8-16 | 2-4x throughput increase | Higher CPU/RAM on SHIR VMs required |
| Classification rule optimization | Reduce classification rules per scan rule set; use targeted rule sets per source type | 30-50% faster per scanned asset | Requires maintaining multiple scan rule sets |
| Off-hours scheduling | Schedule large scans for 2-6 AM to avoid competing with production workloads | No throughput gain but avoids source contention | Delayed freshness; not suitable for compliance triggers |
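The incremental-scanning lever relies on a modification watermark. A minimal sketch of the selection logic, with illustrative asset records:

```python
# Sketch of watermark-based incremental selection, the mechanism behind
# incremental scanning: only assets modified after the watermark are rescanned.
from datetime import datetime, timezone

def assets_to_rescan(assets: list[dict], watermark: datetime) -> list[str]:
    """Return names of assets modified since the last successful scan."""
    return [a["name"] for a in assets if a["modified"] > watermark]

watermark = datetime(2024, 6, 1, tzinfo=timezone.utc)   # last successful scan
assets = [
    {"name": "sales.parquet", "modified": datetime(2024, 6, 3, tzinfo=timezone.utc)},
    {"name": "ref_codes.csv", "modified": datetime(2024, 1, 9, tzinfo=timezone.utc)},
]
print(assets_to_rescan(assets, watermark))  # only sales.parquet is rescanned
```

The tradeoff in the table falls directly out of this logic: an unmodified asset is never re-evaluated, so a new classification rule will not reach it until the next full scan.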

11.2  Capacity Planning for Large Estates

Purview’s Data Map has published scale limits that define the upper bounds of single-account deployments. Planning against these limits prevents mid-program architectural pivots:

| Resource | Scale Limit (as of 2024) | Recommendation |
| --- | --- | --- |
| Assets in Data Map | 100 million assets per account | For >80M assets, begin planning federated architecture |
| Registered sources | 3,000 sources per account | Consolidate similar source types into single registered sources where possible |
| Concurrent scans | 100 concurrent scan runs | Use scan scheduling to avoid peak concurrency; prioritize by source criticality |
| Glossary terms | 100,000 terms per account | Maintain term hygiene; deprecate unused terms quarterly |
| Collections | 256 collections per account | Design flat-ish hierarchies; max 4-5 levels for most organizations |
| Custom classification rules | 500 per account | Consolidate similar patterns; use regex groups over multiple single patterns |

11.3  Cost Optimization

Purview pricing is based on Data Map capacity units (CUs), scan compute, and Microsoft 365 Compliance licensing. Understanding the cost model prevents surprise bills in large-scale deployments.

| Cost Component | Billing Model | Optimization Strategy |
| --- | --- | --- |
| Data Map capacity units | $0.496/CU/hour (1 CU = 1GB metadata storage + processing capacity) | Audit scan frequency; incremental scans reduce CU consumption by 60-75% |
| Scan compute | Billed per vCore-hour for SHIR; Managed VNet included in CUs | Right-size SHIR VMs; schedule to minimize runtime; use MVNet where possible |
| Data insights compute | Included in CU pricing up to 1M assets/day | No separate cost; do not over-provision CUs speculatively |
| M365 Compliance (DLP, Labels) | Included in M365 E5 or E5 Compliance add-on | Audit license assignments; unused Compliance seats are common waste |
| Defender for Cloud integration | Charged per resource per hour for Defender plans | Enable Defender Storage/SQL only for in-scope regulated workloads |
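A back-of-envelope check against the CU rate quoted above, assuming capacity is billed for every hour of a 730-hour month (real bills vary with elastic scaling):

```python
# Back-of-envelope monthly Data Map cost using the CU rate from the cost table.
CU_RATE_PER_HOUR = 0.496  # USD per capacity unit per hour

def monthly_data_map_cost(capacity_units: int, hours: int = 730) -> float:
    """Estimated monthly Data Map spend for a steady capacity-unit count."""
    return round(capacity_units * CU_RATE_PER_HOUR * hours, 2)

print(monthly_data_map_cost(1))   # baseline single-CU account
print(monthly_data_map_cost(10))  # larger estate before incremental-scan savings
```

This is where the incremental-scanning lever pays off twice: fewer scan hours means both faster freshness and fewer provisioned capacity units.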

CHAPTER 12

Real-World Case Studies

Case Study 1: Global Financial Services Firm — GDPR & PCI Compliance Transformation

| Dimension | Details |
| --- | --- |
| Organization | Pan-European retail bank, 12,000 employees, 85 data sources |
| Challenge | GDPR audit failed in 2022 due to inability to demonstrate PII data inventory. €2.3M fine issued. Compliance team spent 14 weeks per audit cycle manually documenting data assets. |
| Purview Scope | Data Map (85 sources), full estate classification, GDPR & PCI assessments in Compliance Manager, sensitivity labels across M365 and Azure Storage, DLP policies for credit card and IBAN patterns |
| Timeline | 16 weeks to full production deployment across all 85 sources |

Technical Implementation Highlights

  • Deployed Managed VNet scanning for 67 Azure-native sources; SHIR cluster (4 nodes, 16 vCPU each) for 18 on-premises SQL Server and Oracle sources
  • Custom classification rules developed for: IBAN variants (23 country formats), internal account number format, and proprietary customer reference codes — regex validated against 500K live records before deployment
  • Column-level lineage from core banking Oracle DB → ADF pipelines → Azure SQL DWH → Power BI reports established within 6 weeks of scan completion
  • GDPR Subject Rights Request module deployed: automated data discovery across all 85 sources on SRR submission, reducing SRR response time from 28 days to 4 days
| Metric | Before Purview | After Purview (12 months) |
| --- | --- | --- |
| Compliance audit preparation time | 14 weeks manual | 6 days automated |
| PII coverage (classified assets) | 23% (manual inventory) | 94% (automated) |
| Sensitivity label coverage | 8% (M365 only) | 87% across Azure + M365 |
| GDPR SRR response time | 28 days | 4 days |
| Annual compliance staffing cost | £1.2M (12 FTE) | £380K (3 FTE + automation) |
| Purview TCO (Year 1) | n/a | £420K (all-in) |

Case Study 2: Healthcare Network — PHI Governance & HIPAA Continuous Compliance

| Dimension | Details |
| --- | --- |
| Organization | US regional hospital network, 22 hospitals, 6,500 clinical staff, 140TB of health data across Azure and on-premises |
| Challenge | Inability to demonstrate minimum-necessary access principle for PHI (HIPAA §164.514). Multiple breaches of PHI to non-clinical staff through misconfigured Power BI reports. No systematic tracking of PHI data flows. |
| Purview Scope | Healthcare-specific classification rules (PHI types), Data Map across Epic EHR integration layer + Azure SQL + ADLS Gen2, Purview policy for PHI access restriction, DLP to block PHI in Teams/email, Compliance Manager HIPAA assessment |
| Timeline | 20 weeks (regulatory urgency drove accelerated timeline) |

Key Technical Decisions

Custom PHI classification taxonomy: Standard Purview PHI rules (SSN, DOB) were insufficient. Built 34 custom classification rules for: Epic patient MRN format, ICD-10 diagnosis codes in free-text, medication names from formulary dictionary match, insurance member ID patterns, and clinical note markers. Classification accuracy validated at 96.3% against 10,000 manually labelled records.

Power BI PHI governance: Deployed sensitivity label policy preventing download/export of reports containing PHI unless user holds HIPAA-authorized role (Entra ID security group). Power BI lineage in Purview enabled identification of 147 reports containing PHI columns — 23 of which had no sensitivity label applied. All remediated within 4 weeks.

Continuous HIPAA monitoring: Compliance Manager HIPAA assessment connected to Defender for Cloud. 42 automated control tests run daily. Compliance score reported weekly to CISO and Privacy Officer. First external HIPAA audit post-deployment: no significant findings.

Case Study 3: Retail Enterprise — Data Mesh Governance with Microsoft Fabric

A FTSE 100 retailer with 8 data domains (Customer, Product, Supply Chain, Finance, Marketing, Store Operations, Loyalty, Digital) implemented a Data Mesh architecture on Microsoft Fabric. The governance challenge: ensure interoperability and trust across domain-owned data products without a central data team bottleneck.

Governance Architecture

Federated Purview (one account per domain): 8 Purview accounts, one per domain. Each domain team operates their own catalog with full autonomy. Enterprise glossary terms synchronized from a central “governance hub” account via nightly API sync.

Cross-domain lineage: Custom lineage federation solution: each domain emits lineage events to a central Azure Event Hub. A Fabric Data Factory pipeline consumes events and writes cross-domain lineage to a central Purview account. CDO can view full end-to-end customer journey lineage from Customer domain through to Finance domain.

Data Product certification: A data product is certified (Purview “Certified” endorsement) only when: data quality SLA is documented, owner is assigned, sensitivity label is applied, and a data contract (schema + SLA) is published to the enterprise API registry. Automated certification check runs weekly via Purview API.
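The weekly certification check reduces to an all-criteria-met test. Here is a sketch, with hypothetical field names standing in for metadata that the real job would fetch via the Purview API:

```python
# Sketch of the automated certification check: a data product is certified
# only when every criterion is satisfied. Field names are hypothetical.
CRITERIA = ("quality_sla_documented", "owner_assigned",
            "sensitivity_label_applied", "data_contract_published")

def certification_status(product: dict) -> tuple[bool, list[str]]:
    """Return (certifiable, list of unmet criteria) for one data product."""
    missing = [c for c in CRITERIA if not product.get(c)]
    return (not missing, missing)

ok, gaps = certification_status({
    "quality_sla_documented": True,
    "owner_assigned": True,
    "sensitivity_label_applied": True,
    "data_contract_published": False,
})
print(ok, gaps)  # False ['data_contract_published']
```

Returning the unmet criteria, rather than a bare pass/fail, is what makes the weekly run actionable: the gap list routes straight to the owning domain team.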

| KPI | Target | Achieved (Month 12) |
| --- | --- | --- |
| Data products certified | 80% of published products | 84% |
| Cross-domain data access time (request to provision) | < 3 business days | 1.2 days average |
| Data quality incidents crossing domain boundary | < 5/month | 2.1/month average |
| Time to identify root cause of cross-domain data issue | < 4 hours | 47 minutes average |

CHAPTER 13

90-Day Implementation Roadmap

Based on 20+ Purview deployments, the following 90-day roadmap represents the optimal sequencing for enterprise governance programs. It balances quick wins (demonstrable value in Week 6) with foundation-setting activities required for long-term scale.

Phase 1: Foundation (Days 1-30)

| Week | Activity | Owner | Success Criteria |
| --- | --- | --- | --- |
| 1 | Purview account provisioning, network design (MVNet vs SHIR), AAD groups creation, collection hierarchy design | Data Engineer + Architect | Purview account live; network connectivity validated; collection hierarchy documented and approved |
| 1-2 | Source inventory: document all data sources (type, location, sensitivity level, volume, business owner) | Data Governance Lead | Complete source inventory spreadsheet; sources prioritized by regulatory risk |
| 2 | Glossary foundation: identify 50-100 core business terms per domain with definitions, owners, and related terms | Data Governance Lead + Domain Owners | Core glossary terms in Draft status; term owners assigned |
| 2-3 | Register and scan Tier 1 sources (highest regulatory risk: PCI scope, PHI scope, GDPR-critical sources) | Data Engineer | Tier 1 sources scanned; classification results reviewed; false positive rate < 5% |
| 3-4 | Classification review and tuning: review auto-classification results; build custom rules for organizational patterns; set thresholds | Data Steward + Data Engineer | Custom classification rules deployed; classification accuracy > 90% on sampled validation set |
| 4 | RBAC configuration: assign Data Curator, Data Reader roles to domain teams; configure collection-level permissions | Data Governance Lead | All roles assigned; domain teams can access catalog; data engineers can register sources |

Phase 2: Activation (Days 31-60)

| Week | Activity | Owner | Success Criteria |
| --- | --- | --- | --- |
| 5-6 | Register and scan all remaining data sources; configure incremental scan schedules | Data Engineer | >90% of data estate registered and scanned; scan schedule operational |
| 6 | Quick win: publish Governance Maturity Score dashboard in Power BI; present Week 6 scorecard to leadership | Data Governance Lead | Dashboard live; leadership briefing completed; program funded for Phase 3 |
| 6-7 | Sensitivity label deployment: configure label taxonomy; deploy auto-labelling policies in simulation mode for M365 and Azure | Security + Data Governance | Labels published; simulation mode running; simulation report reviewed |
| 7-8 | Lineage validation: verify ADF, Synapse, Fabric lineage; build custom lineage for non-native sources via Atlas API | Data Engineer | End-to-end lineage visible for >3 critical data pipelines; column-level lineage for Power BI reports |
| 7-8 | Stewardship workflows: configure ownership assignment workflows; assign owners to all Tier 1 assets; begin Tier 2 | Data Steward | >95% of Tier 1 assets have assigned owner and expert contact |
| 8 | Compliance Manager setup: create GDPR, HIPAA, or relevant regulatory assessments; map controls to organizational evidence | Compliance Officer + Data Governance | At least one regulatory assessment active; initial compliance score baseline established |

Phase 3: Optimization (Days 61-90)

| Week | Activity | Owner | Success Criteria |
| --- | --- | --- | --- |
| 9-10 | Enable sensitivity labels in production (exit simulation mode); deploy DLP policies; monitor DLP incidents for first 2 weeks before tuning | Security + Data Governance | Labels applying to new content; DLP incidents appearing in dashboard; < 10% false positive rate on DLP |
| 10-11 | Data access policy deployment: configure self-service access request workflow; pilot with 2-3 data domains | Data Governance Lead + Data Engineer | Self-service access workflow live; first access requests processed through Purview; data owner satisfaction confirmed |
| 11 | Glossary completion: approve Tier 1 glossary terms; link certified terms to classified assets via bulk assignment | Data Steward | >80% of Tier 1 assets linked to at least one approved glossary term |
| 12 | Program review: measure Governance Maturity Score vs. Week 1 baseline; document lessons learned; plan 90-180 day roadmap | CDO + Data Governance Lead | Governance score improvement documented; 90-180 day roadmap approved; ongoing operating model confirmed |

Critical Success Factor: Governance programs that fail typically do so in Days 31-60 — the "activation phase." Quick wins must be demonstrated by Day 45 to maintain organizational momentum and leadership confidence. The Power BI Governance Maturity Score dashboard is specifically designed as this early value demonstration.

APPENDIX

Appendix: Reference Materials

A. Essential KQL Queries for Purview Operations

A1. Assets Without Owners (Stewardship Gap Report)

// Find all assets in Purview Data Map without assigned owners
// Run in Log Analytics workspace linked to Purview
PurviewAssetMetadata_CL
| where TimeGenerated > ago(7d)
| where isnull(Owner_s) or Owner_s == ""
| summarize UnownedAssets = count() by SourceType_s, Collection_s
| order by UnownedAssets desc
| project Collection_s, SourceType_s, UnownedAssets

A2. Classification Coverage by Data Domain

// Sensitivity classification coverage heat map by collection
PurviewAssetMetadata_CL
| where TimeGenerated > ago(1d)
| summarize
    TotalAssets = count(),
    ClassifiedAssets = countif(isnotempty(Classifications_s)),
    SensitiveAssets = countif(SensitivityLabel_s in ("Confidential", "HighlyConfidential"))
  by Collection_s
| extend
    ClassificationCoverage = round(100.0 * ClassifiedAssets / TotalAssets, 1),
    SensitiveCoverage = round(100.0 * SensitiveAssets / TotalAssets, 1)
| order by ClassificationCoverage asc  // Surface lowest coverage domains first

A3. Scan Failure Detection & Alerting

// Alert when scan success rate drops below 95% in any 24-hour window
PurviewScanLogs_CL
| where TimeGenerated > ago(24h)
| summarize
    TotalScans = count(),
    FailedScans = countif(Status_s == "Failed"),
    SuccessRate = round(100.0 * countif(Status_s == "Succeeded") / count(), 1)
  by SourceName_s
| where SuccessRate < 95
| project SourceName_s, TotalScans, FailedScans, SuccessRate
| order by SuccessRate asc
// Use this query as basis for Azure Monitor alert rule
// Alert when count() > 0 (any source below 95% success)

B. Purview REST API Quick Reference

| Operation | Method | Endpoint | Use Case |
| --- | --- | --- | --- |
| List collections | GET | /account/collections | Audit collection hierarchy; generate governance reports |
| Get asset by qualified name | GET | /catalog/api/atlas/v2/entity/uniqueAttribute/type/{typeName}?attr:qualifiedName={qn} | Look up specific asset metadata in automation |
| Update asset contacts | PUT | /catalog/api/atlas/v2/entity/guid/{guid}/businessattribute/Contacts | Bulk owner assignment in onboarding automation |
| Submit lineage | POST | /catalog/api/atlas/v2/entity/bulk | Custom lineage for non-native sources (dbt, custom ETL) |
| Run scan | POST | /scan/datasources/{dsName}/scans/{scanName}/runs | Trigger scan on-demand from CI/CD pipeline on schema change |
| Get scan status | GET | /scan/datasources/{dsName}/scans/{scanName}/runs/{runId} | Poll scan completion in automation workflow |
| Create glossary term | POST | /catalog/api/atlas/v2/glossary/term | Bulk glossary population from existing business dictionaries |
| Assign term to asset | POST | /catalog/api/atlas/v2/entity/guid/{guid}/classifications | Automated term linking after scan completion |
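The two scan operations compose into a simple trigger-then-poll automation. A sketch that builds the URL shapes from the table; the account, data source, and scan names are placeholders, and authentication (an Entra ID bearer token on each request) is omitted:

```python
# Sketch: URL construction for trigger-then-poll scan automation in CI/CD.
# Account, data source, and scan names below are placeholders.
BASE = "https://{account}.purview.azure.com/scan/datasources/{ds}/scans/{scan}/runs"

def trigger_url(account: str, ds: str, scan: str) -> str:
    """POST here to trigger an on-demand scan run."""
    return BASE.format(account=account, ds=ds, scan=scan)

def status_url(account: str, ds: str, scan: str, run_id: str) -> str:
    """GET here to poll the run until it reports Succeeded or Failed."""
    return trigger_url(account, ds, scan) + f"/{run_id}"

print(trigger_url("contoso-purview", "sql-warehouse", "weekly-full"))
```

Wiring the trigger into a CI/CD pipeline on schema change, then polling the status URL before the next deployment stage, keeps catalog metadata in step with the schema it describes.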

C. Governance Maturity Model

| Level | Name | Characteristics | Target Score | Typical Timeline |
| --- | --- | --- | --- | --- |
| L1 | Initial | Ad hoc governance; no systematic catalog; manual compliance preparation; governance by tribal knowledge | 0-20 | Starting point for most organizations |
| L2 | Managed | Data sources registered and scanned; basic classification applied; glossary under development; ownership partially assigned | 20-50 | 0-6 months post-Purview deployment |
| L3 | Defined | Full estate classification; glossary approved and linked; lineage documented for critical pipelines; compliance assessments active | 50-70 | 6-12 months post-deployment |
| L4 | Quantitatively Governed | Governance Maturity Score tracked weekly; stewardship SLAs enforced; access policies active; DLP protecting sensitive data; self-service access working | 70-85 | 12-24 months |
| L5 | Optimizing | Automated certification, continuous compliance, AI-assisted stewardship, domain-level governance ownership, governance embedded in CI/CD pipelines | 85-100 | 24-36 months |
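Mapping a measured Governance Maturity Score onto these levels is a threshold lookup, which is how the score can be turned into a level indicator on a Power BI dashboard:

```python
# Threshold lookup from Governance Maturity Score to maturity level,
# using the score bands defined in the maturity model table.
LEVELS = [(85, "L5 Optimizing"), (70, "L4 Quantitatively Governed"),
          (50, "L3 Defined"), (20, "L2 Managed"), (0, "L1 Initial")]

def maturity_level(score: float) -> str:
    """Return the maturity level whose band contains the given score."""
    for threshold, name in LEVELS:
        if score >= threshold:
            return name
    return "L1 Initial"

print(maturity_level(63))  # L3 Defined
```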

D. Glossary of Key Terms

| Term | Definition |
| --- | --- |
| Apache Atlas | Open-source metadata management and governance framework; the foundational metadata model underlying Purview’s Data Map |
| Business Glossary | Curated vocabulary of business terms linked to data assets; provides semantic context and shared language across data consumers |
| Collection | Hierarchical container in Purview that scopes metadata, access control, and policy enforcement; the primary organizational unit of the Data Map |
| Data Lineage | Documentation of data origin, movement, and transformation — tracing how data flows from source systems through processing layers to consumption |
| Data Map | The live, continuously updated inventory of all registered data assets in Purview, including metadata, classifications, lineage, and relationships |
| DLP (Data Loss Prevention) | Policies that detect and block unauthorized movement or sharing of sensitive data across email, documents, cloud storage, and messaging platforms |
| Endorsement | Trust signal applied to catalog assets: “Promoted” (recommended by workspace member) or “Certified” (validated by designated authority) |
| Glossary Term | A business concept formally defined in Purview with name, definition, steward, status, and asset linkages |
| Managed Virtual Network | Microsoft-managed network infrastructure for Purview scanning; eliminates need for customer-managed integration runtime VMs |
| OpenLineage | Open standard for data lineage metadata; used by Purview Spark connector to emit lineage from Spark jobs |
| Policy Author | Purview RBAC role with permission to create and publish data access policies that enforce at the storage layer |
| Scan Rule Set | Configuration defining which file types and classification rules apply when scanning a data source |
| Self-Hosted Integration Runtime (SHIR) | Customer-managed agent VM that enables Purview scanning of on-premises or private network data sources |
| Sensitivity Label | Classification tag applied to data assets and documents (e.g., Confidential, Highly Confidential) that drives downstream protection actions |
