Marthe Moengen

Gal in a Cube

Architecture best practices in Fabric that enable your data governance journey — 24. Jun 2025

Architecture best practices in Fabric that enable your data governance journey

With its low-code approach, Microsoft Fabric enables anyone to take on tasks that once required a data engineering background. It accelerates development, supercharges workflows, and integrates seamlessly with AI, both enabling AI and using AI to make you even more productive. Definitely super cool. 😎

But with this new speed and power comes a new level of responsibility. As AI becomes deeply embedded in our tools and decisions, the old adage still holds true: garbage in, garbage out. That’s why the architecture of your Microsoft Fabric environment matters more than ever.

Why? Because with the ease and speed of things in Fabric today, it is SO SIMPLE to create things, which also means it is just as fast to create a mess for yourself. Anyone been using Power BI for a couple of years with the self-serve approach? Then you know what I am talking about.

So, a strong foundation ensures compliance, security, and data integrity, so that you never lose control, end up with duplicates, or drift into low-quality data. When AI acts on bad data or a flawed setup, the consequences can scale just as fast as the benefits.

Let’s take a look at the initial steps you should consider for your Fabric architecture and why!

Jump to:

  1. How should you structure your items (Lakehouses/Warehouses) in Microsoft Fabric?
    1. ✅ Pros of Item Separation in Microsoft Fabric
    2. ⚠️ Considerations of Item Separation in Microsoft Fabric
  2. How should you structure your workspaces in Microsoft Fabric?
    1. ✅ Pros of Workspace Separation in Microsoft Fabric
    2. ⚠️ Cons of Workspace Separation in Microsoft Fabric
  3. How should you structure your Domains in Microsoft Fabric?
    1. ✅ Pros of Domain Separation in Microsoft Fabric
    2. ⚠️ Cons of Domain Separation in Microsoft Fabric
  4. How should you structure your Capacities in Microsoft Fabric?
    1. ✅ Pros of Capacity Separation in Microsoft Fabric
    2. ⚠️ Cons of Capacity Separation in Microsoft Fabric

How should you structure your items (Lakehouses/Warehouses) in Microsoft Fabric?

I like to think of Fabric in this order when making the first decisions on HOW we are going to set things up. Items define your options for workspaces, and workspaces define your options for domains and capacities. So, the first thing you need to think about is item separation.

Let’s use the medallion architecture as an example throughout this blog post to have something many are familiar with.

Would you like to separate the bronze, silver, and gold layers into separate items – or do you want to group them into one lakehouse or warehouse? Or a mix?

✅ Pros of Item Separation in Microsoft Fabric

Clear Layer Boundaries
  • Enforces architectural clarity between Bronze, Silver, and Gold layers.
  • Minimizes accidental data leakage between stages.
Enhanced Security & Governance
  • Enables more granular control over access (e.g., only data engineers consume Bronze; analysts consume Gold).
Improved Discoverability
  • Easier for consumers to find the right data at the right stage.
  • Promotes documentation and ownership via dedicated spaces, e.g. if you want to separate ownership of the bronze/silver layers for source-aligned data products, while the gold layer provides consumer-aligned data products.
  • Improves discoverability (and lineage) in Purview, as items are best supported today.
Better Modularity & Scalability
  • Each layer can evolve independently (e.g., switching ingestion logic in Bronze without touching Gold).
  • Encourages a microservice-style approach where each layer is self-contained.
Supports Interoperability
  • Enables integration with various tools and personas by decoupling processing stages.

⚠️ Considerations of Item Separation in Microsoft Fabric

Increased Complexity
  • More items to manage.
  • Requires well-defined conventions and documentation.
Operational Overhead
  • May lead to duplication of effort (e.g., repeated metadata or pipeline setup across layers).
  • Monitoring and orchestration across items become more complex.
Risk of Over-Engineering
  • Not all projects need full item separation; using it universally can slow down small teams.
  • Risks “compliance theater” without real added value if not paired with strong practices.
Dependency Management
  • Inter-layer dependencies may become fragile if naming, versioning, or schema tracking isn’t standardized.

Use it when: You need strong governance, multiple teams, or enterprise-scale structure.
Skip it when: You’re working fast, solo, or on smaller, agile projects.
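
To make the item separation concrete, here is a minimal PySpark sketch of a notebook promoting data from a bronze lakehouse item to a silver lakehouse item. The lakehouse and table names (lh_bronze, lh_silver, sales_raw) are hypothetical, and it assumes a Fabric notebook where both lakehouses are attached and the `spark` session is predefined.

```python
from pyspark.sql import functions as F

# Read raw sales data from the bronze lakehouse item
raw = spark.read.table("lh_bronze.sales_raw")

# Light cleansing on the way to silver: deduplicate and standardise types
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date"))
       .filter(F.col("amount").isNotNull())
)

# Write to the silver lakehouse item, keeping the layers physically separate
cleaned.write.mode("overwrite").saveAsTable("lh_silver.sales_cleaned")
```

Because each layer lives in its own item, access to the bronze lakehouse can be restricted to data engineers while analysts only ever see the downstream items.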

How should you structure your workspaces in Microsoft Fabric?

When you have made your choices on item separation, you are ready to consider your workspace separation, as the item separation also (naturally) enables workspace separation.

Let’s use the medallion architecture as an example again.

Do you want to have all your layers in one workspace, or separate them across workspaces, or a mix?

✅ Pros of Workspace Separation in Microsoft Fabric

1. Self-Contained Environments
  • Encapsulation of logic and data for each team.
  • Reduced risk of accidental interference across unrelated areas.
  • Easier testing and deployment of updates in isolation.
2. Improved Discoverability
  • Easier to navigate than a massive, centralized workspace.
  • Reduces cognitive load for analysts and consumers.
  • Improves discoverability in Purview.
3. Stronger Governance & Access Control
  • Define permissions on a need-to-know basis using the workspace for different development teams, with more granular access control on the item level if needed.
  • Ensure compliance by segmenting sensitive data (e.g. some bronze data might be sensitive compared to the gold layer).
4. Domain-Oriented Ownership
  • Teams can own, maintain, and evolve their domain-specific workspaces independently.
  • Reduces bottlenecks by avoiding centralized gatekeeping.
  • Encourages accountability and autonomy.
5. Better Observability
  • Errors, performance, and usage can be scoped per workspace.
  • Easier to trace lineage and operational issues within contained environments.

⚠️ Cons of Workspace Separation in Microsoft Fabric

1. Cross-Workspace Dependencies Can Be Painful
  • Sharing datasets between workspaces can involve more manual effort or pipeline complexity.
  • Lack of strong cross-workspace lineage tracking increases the risk of versioning issues.
2. Coordination Overhead
  • Schema changes or upstream updates must be communicated across teams. (Should you consider data product contracts?)
  • Governance, naming conventions, and SLAs must be actively enforced.
3. Risk of Fragmentation
  • Workspaces can become inconsistent in structure, naming, and metadata practices.
  • Onboarding new users becomes harder if standards vary widely.
4. Initial Barrier to Entry
  • Setting up multiple workspaces might feel like overkill.
  • Single-workspace setups may be better for rapid prototyping or agile development.

Use when: You have multiple domains or teams, need tight access control, or want to scale governance.
Avoid when: You’re prototyping, working with a small team, or need fast iteration across datasets.

*A consideration for workspace separation not discussed in this article is CI/CD.
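
To illustrate the cross-workspace dependency point above: a notebook can read a table from another workspace either through a shortcut or by addressing OneLake directly. Below is a hedged sketch using a fully qualified OneLake path in a Fabric notebook; the workspace, lakehouse, and table names are made up for the example.

```python
# Fully qualified OneLake path: workspace, then item, then table
gold_path = (
    "abfss://ws-gold@onelake.dfs.fabric.microsoft.com"
    "/lh_gold.Lakehouse/Tables/dim_customer"
)

# Tables in OneLake are stored in the Delta format
dim_customer = spark.read.format("delta").load(gold_path)
dim_customer.show(5)
```

An internal shortcut is usually the friendlier option, since it keeps lineage visible and avoids hard-coded paths scattered across notebooks.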

How should you structure your Domains in Microsoft Fabric?

When you have your workspace plan ready, you can take a look at domains.

Do you want to separate your domains on business use case alone, on technical teams, on data source, or a mix?

If you use a data mesh approach, you might want each domain to own the entire data flow from bronze to gold.

Suppose you want to enable your business domains, but still want to take advantage of some centralization in making the different data layers available. In that case, you might want to look at a mixed domain separation along those lines.

✅ Pros of Domain Separation in Microsoft Fabric

1. Reflects Business Structure
  • Organizing data by domain mirrors your org chart.
  • This reduces confusion and aligns data strategy with business operations.
2. Clear Ownership and Accountability
  • Each domain owns its data products. This fosters a culture of accountability and ensures data is maintained by those who understand it best.
3. Decentralized Policy Enforcement
  • Domains can enforce their own data quality, security, and compliance rules within their boundary.
  • This enables scalability without relying solely on a central team.
4. Improved Governance and Observability
  • Smaller, domain-focused scopes are easier to govern.
  • Monitoring usage, managing permissions, and auditing access becomes simpler and more meaningful.
5. Autonomy and Speed
  • Teams can build and release data products at their own pace.
  • They don’t need to wait on a centralized team to deploy pipelines or models.

⚠️ Cons of Domain Separation in Microsoft Fabric

1. Risk of Silos
  • If domains don’t collaborate or share standards, data silos can (re-)emerge inside of Fabric.
  • Interoperability must be intentionally designed.
2. Duplication of Effort
  • Multiple teams might build similar models or transformations independently. Without coordination, this wastes time and creates inconsistency.
3. Tooling and Training Overhead
  • Each domain team needs enough skill and support to manage its own pipelines, models, and compliance needs. This requires investment.

Use it when: Your org has distinct teams/domains and you want scalable ownership.
Avoid it when: You’re early in your journey or lack governance maturity.
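
If you land on domain separation, the assignment of workspaces to domains can be scripted. The sketch below uses the Fabric Admin REST API as I understand it today; verify the endpoints and payload shapes against the current API reference, and note that the token acquisition and all IDs are placeholders.

```python
import requests

token = "<admin-access-token>"  # acquire via MSAL in practice
headers = {"Authorization": f"Bearer {token}"}
base = "https://api.fabric.microsoft.com/v1/admin"

# Create a domain (the display name is illustrative)
resp = requests.post(f"{base}/domains", headers=headers,
                     json={"displayName": "Sales"})
domain_id = resp.json()["id"]

# Assign existing workspaces to the domain by their IDs
requests.post(f"{base}/domains/{domain_id}/assignWorkspaces",
              headers=headers,
              json={"workspacesIds": ["<workspace-guid-1>", "<workspace-guid-2>"]})
```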

How should you structure your Capacities in Microsoft Fabric?

Then finally, let’s take a look at your choices when it comes to Fabric capacities.

Do you want to use capacity separation to mirror your business domains, technical teams, environments or a mix?

If your organization requires separate cost management across business domains, you probably want to mirror the capacities and the domains.

Another separation you might consider instead of or in combination with the domain separation is to separate the capacities for the different environments. This helps protect production performance. If you are taking advantage of federated development teams, you run a higher risk of someone creating a crazy dataflow that kills the entire capacity. Separating development and production can therefore be wise. This is also a way to maximise cost savings, as the development capacity does not need to be on 24/7 and can be scaled up and down as needed.
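
Since a development capacity does not need to run 24/7, pausing it outside working hours is an easy win. Here is a hedged sketch of what that automation could look like against the Azure Resource Manager endpoint for Fabric capacities (check the current api-version and endpoint shape in the Azure docs; all IDs are placeholders):

```python
import requests

token = "<azure-arm-access-token>"  # e.g. via azure-identity's DefaultAzureCredential
sub, rg, cap = "<subscription-id>", "<resource-group>", "<dev-capacity-name>"

url = (f"https://management.azure.com/subscriptions/{sub}"
       f"/resourceGroups/{rg}/providers/Microsoft.Fabric"
       f"/capacities/{cap}/suspend?api-version=2023-11-01")

# Pause the dev capacity in the evening; a matching /resume call
# in the morning completes the schedule
requests.post(url, headers={"Authorization": f"Bearer {token}"})
```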

If your organisation exists across regions, you might also want to consider separating your environments based on different capacity regions. Be aware that it is currently not possible to move Fabric items across regions without a support ticket to Microsoft. Take some time to consider your needs and use cases before splitting.

✅ Pros of Capacity Separation in Microsoft Fabric

1. Performance Isolation
  • High-demand domains won’t be bottlenecked by low-priority processes elsewhere.
  • Development efforts won’t throttle production environments.
2. Cost Transparency & Accountability
  • Clearer tracking of compute and storage consumption per business domain/unit or team.
  • Easier chargeback/showback models for budgeting or internal billing.
  • Data-driven capacity planning (who needs more/less and why).
3. Optimized Scaling
  • Critical business domains can be scaled up.
  • Lightweight domains can be throttled or moved to shared capacity.

⚠️ Cons of Capacity Separation in Microsoft Fabric

1. Potential Resource Waste
  • Small or inactive domains may not fully utilize their assigned capacity; wasted potential if workloads don’t justify a dedicated capacity.
  • Teams may leave unused resources running (e.g., long-lived Spark jobs) that go unnoticed within their separate domains.
2. More Complex Governance
  • Domain-level cost and performance management requires clear policies for scaling, shutting down idle jobs, prioritisation, and governance around assigning capacity (shared vs dedicated).
  • Increased administrative overhead to right-size environments.

Use it when: you need performance isolation between teams or layers, want cost tracking per domain or department, domains have high or variable workloads, or you have governance in place for managing capacity.

Avoid it when: workloads are small or early-stage, teams lack cost or performance monitoring maturity, shared capacity meets your needs, or you want to minimize setup and management overhead.


Hope you found this article useful!

Stay updated on new blog posts and videos by subscribing to @GuyInACube on YouTube, following me on LinkedIn, or subscribing to the newsletter for this blog below to get the newest updates!

Start Your Data Governance Journey with Microsoft Purview: The complete guide [videos & descriptions] — 11. Jun 2025

Start Your Data Governance Journey with Microsoft Purview: The complete guide [videos & descriptions]

Feeling unsure about how to begin with Microsoft Purview? You’re in the right place! 💙 🩵

This blog post will be regularly updated as I record new videos and as new features are released. Here, you’ll find a step-by-step guide along with a feature overview, with videos and text for each feature in a logical sequence. This will hopefully make it easier for you to discover exactly what you need! 🤩

I previously created a mini-series on my YouTube Channel, @DataAscend, but I have since joined Adam and Patrick on @GuyInACube! So, this article will contain a mix of videos from both channels.

Stay updated on new videos by subscribing to @GuyInACube on YouTube, follow me on LinkedIn or subscribe to the newsletter for this blog below to get the newest updates!

Jump to:

  1. Purview Course: Get started with Purview Step-by-Step
  2. What is Microsoft Purview? An introduction!
  3. Create your first Purview instance!
  4. Upgrade to New Microsoft Purview
  5. How to Connect and Scan Your Fabric Data in Purview?
  6. How do you structure your Data Map?
  7. What is the difference between the Data Map and the Unified Catalog?
  8. How to Create a Business Domain/Governance Domain in Microsoft Purview
  9. What is the concept of a Data Product, and why should you care?
  10. How to Create a Data Product in Microsoft Purview?
  11. Set up Data Quality on Your Fabric Data Products in Purview

Purview Course: Get started with Purview Step-by-Step

Check out the new Purview Course on Guy in a Cube!

What is Microsoft Purview? An introduction!

Microsoft Purview is a data governance and compliance tool that helps organizations discover, classify, manage, and protect data across cloud, on-premises, and SaaS environments.

It is divided into three main areas: Governance 🏛️, Compliance ✅, and Security 🔒.

From the data perspective, we have historically focused on the governance solutions: the Data Map and the Unified Catalog. But now Purview Security and Compliance also support data (and not only the more traditional information management). So, you will probably want to take advantage of all the solutions to truly ensure quality, trust, and compliance for your data!

Create your first Purview instance!

🥇 The very first step if you do not already have a Purview instance in your tenant. Let’s set it up together!

Upgrade to New Microsoft Purview

Already have an existing Purview account? The “old” one? This video shows how to upgrade to the latest Microsoft Purview solution and access its new features.

How to Connect and Scan Your Fabric Data in Purview?

Learn how to register your Fabric data in Microsoft Purview by creating collections, connections, and scans.

For more details on what to think about when choosing the structure of your Data Map, check out the video below.

How do you structure your Data Map?

Creating a well-organised Data Map in Microsoft Purview isn’t just about setting it up – it’s about making the right decisions on Domains and Collections structure. But how do you get it right?

Here’s what you need to consider when planning your structure:

✅ Access Levels & Control – Ensure the right people have the right permissions.
🔒 Separation – Maintain clear boundaries for better management.
🛠️ Development, Test & Production Environments – Keep your workflows organised and efficient.
💡 And more!

What is the difference between the Data Map and the Unified Catalog?

Both are essential for data governance (!), organising your data, and ensuring compliance within your organisation. But how do they differ, and how should you approach structuring them effectively?

I like to divide the data catalog part of Purview into two:

  1. Physical data estate with your Data Map and Data Assets
  2. Your logical data estate with Governance Domains and Data Products

But how should you structure them? Take a look at the video below:

How to Create a Business Domain/Governance Domain in Microsoft Purview

Overview: This video explains how to set up a governance domain for better data organization and governance. You can then group your data products into business domains later.

Topics Covered:

  • Step-by-step guide to creating a business domain.

In this video I call it a “Business Domain”, but Purview has since renamed it to Governance Domain, which I think is more fitting. You can then decide for yourself whether you want to separate your domains into Business Domains, Governance Domains, Data Domains, Technical Domains, etc. This will depend on your organizational setup.

What is the concept of a Data Product, and why should you care?

Before we dive into the Data Product concept in Purview – what is a Data Product?

How to Create a Data Product in Microsoft Purview?

Overview: Discover how to create data products within Microsoft Purview to manage and catalog data more effectively.

Topics Covered:

  • 🧩 Defining a Data Product and linking it to a 📁 Business Domain.
  • 🔗 Connecting your physical Data Assets to your 🛍️ Data Product.
  • 📃 Setting up terms of use for your Data Product and Data Assets.
  • 🔐 Setting up Request Access Policies for your Data Product.

The Data Assets that we link to the Data Product are the physical data assets that we scanned in the previous step.

Set up Data Quality on Your Fabric Data Products in Purview

This video covers how to monitor data quality on your Fabric data products within Microsoft Purview.

Note: The scan set up earlier in this guide used the Managed Identity for authentication. That will not work if you want to do DQ runs on your Fabric sub-level items, like the tables in a lakehouse. To do this, you must use a service principal for authentication when you run the ordinary scan, and the SP needs Contributor access to the workspace. See the Microsoft documentation on how to set up SP authentication, and the sketch after the topic list below for one way to grant that access.

Topics Covered:

  • 🔗 Setting up data quality connection for your data governance domain.
  • 🛠️ Setting up data quality rules and profiling for your data assets.
  • ▶️ Running the data quality and profiling rules, and 📊 monitoring the outcome.
  • 📌 Looking into actions of your resulting Data Quality and Profiling runs, ✅ assigning tasks and actions to Data Stewards or other roles in your organization to improve the 🧹 Data Quality
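
As the note above mentions, the scanning service principal needs Contributor access to the workspace. One way to grant it, besides the workspace UI, is the Power BI REST API; here is a rough sketch with placeholder IDs (verify the payload against the current docs):

```python
import requests

token = "<admin-access-token>"
workspace_id = "<workspace-guid>"
sp_object_id = "<service-principal-object-id>"

# Add the service principal as a Contributor on the workspace
requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/users",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "identifier": sp_object_id,
        "principalType": "App",
        "groupUserAccessRight": "Contributor",
    },
)
```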

A new and updated video is on the way. Subscribe to @GuyInACube on YouTube, follow me on LinkedIn, or subscribe to the newsletter for this blog below to get the newest updates!

Hope you found this article helpful!

Start Your Data Governance Journey with the new Microsoft Purview: A Step-by-step guide — 1. Oct 2024

Start Your Data Governance Journey with the new Microsoft Purview: A Step-by-step guide

https://data-ascend.com/2025/06/11/start-your-data-governance-journey-with-microsoft-purview-the-complete-guide-videos-descriptions/


Microsoft Purview has gotten a serious makeover. It is not only a Data Catalog anymore; it is a data governance tool that includes data security, data cataloging, metadata management, data quality, data estate health monitoring, and more.

I have created a mini-series on my YouTube channel on how to get started with building your data governance domains and data products by scanning your Fabric data in Purview. This blog post summarizes the mini-series with some added descriptions.

Stay updated on new videos by subscribing to my YouTube Channel:

There are still features in Purview that are in preview, and there is a lot of ongoing development, which is exciting! But that also means that some buttons and names may have changed by the time you read this tutorial.

Jump to:

  1. Upgrade to New Microsoft Purview
  2. How to Register Your Fabric Data in Purview
    1. Scope your Fabric scan in Microsoft Purview
  3. How to Create a Business Domain/Governance Domain in Microsoft Purview
  4. How to Create a Data Product in Microsoft Purview
  5. Set up Data Quality on Your Fabric Data Products in Purview

1. Upgrade to New Microsoft Purview

Overview: This video shows how to upgrade to the latest Microsoft Purview solution and access its new features.

2. How to Register Your Fabric Data in Purview

Overview: Learn how to register your Fabric data in Microsoft Purview by creating collections, connections, and scans.

I like to divide the data catalog part of Purview into two:

  1. Physical data estate with your Data Map and Data Assets
  2. Your logical data estate with Governance Domains and Data Products

In this video I look at how you can set up your Data Map and scan your physical data assets in Fabric.

Topics Covered:

  • Creating a new collection.
  • Setting up a connection to data sources.
  • Running scans to discover and register data assets.

Also check out the “Scope your scan” video below. This feature was released after I created the video above. Now you don’t have to scan your entire Fabric ecosystem; you can choose to scan based on workspaces.

Scope your Fabric scan in Microsoft Purview

Learn how to scope your data scans by workspaces to make your Purview scans more targeted and efficient.

3. How to Create a Business Domain/Governance Domain in Microsoft Purview

Overview: This video explains how to set up a governance domain for better data organization and governance. You can then group your data products into business domains later.

Topics Covered:

  • Step-by-step guide to creating a business domain.

In this video I call it a “Business Domain”, but Purview has since renamed it to Governance Domain, which I think is more fitting. You can then decide for yourself whether you want to separate your domains into Business Domains, Governance Domains, Data Domains, Technical Domains, etc. This will depend on your organizational setup.

4. How to Create a Data Product in Microsoft Purview

Overview: Discover how to create data products within Microsoft Purview to manage and catalog data more effectively.

Topics Covered:

  • Defining a Data Product and linking it to a Business Domain.
  • Connecting your physical Data Assets to your Data Product
  • Setting up terms of use for your Data Product and Data Assets
  • Setting up Request Access Policies for your Data Product

The Data Assets that we link to the Data Product are the physical data assets that we scanned in the previous video.

5. Set up Data Quality on Your Fabric Data Products in Purview

Overview: This video covers how to monitor data quality on your Fabric data products within Microsoft Purview.

Topics Covered:

  • Setting up data quality connection for your data governance domain.
  • Setting up data quality rules and profiling for your data assets.
  • Running the data quality and profiling rules, and monitoring the outcome.
  • Looking into actions of your resulting Data Quality and Profiling runs, assigning tasks and actions to Data Stewards or other roles in your organization to improve the Data Quality.

Note: For Purview to be able to scan the data in your workspace, the Purview service principal needs to be assigned Contributor access to the workspace.

Hope you found this article helpful!

Useful links:

https://learn.microsoft.com/en-us/purview/purview-portal

https://learn.microsoft.com/en-us/purview/whats-new

How do you set up your Data Mesh in Microsoft Fabric? — 8. Jan 2024

How do you set up your Data Mesh in Microsoft Fabric?

Coming from the world of Microsoft analytics, I got curious as to why Microsoft chose to go with Fabric as the name of its newly released analytical solution, Microsoft Fabric. Looking into this, I came across the architectural concept of Data Fabric. I had been working with Data Mesh for a while, but the Fabric architecture was something I hadn’t really heard about before. Which, of course, meant I needed to know more.

I was going to write a short blog about the differences between Fabric Architecture and Data Mesh and then see how these two architectures look inside Microsoft Fabric. Turns out there is too much to say, so I had to turn this into a mini-series.

The second out is Data Mesh in Microsoft Fabric! So, let’s have a look at how you implement a data mesh architecture in Fabric.

Stay tuned for more content on Fabric Architecture, and how Data Mesh and Fabric Architecture can be set up in Microsoft Fabric!

  1. What is Data Mesh Architecture?
  2. How do you handle your data products in Microsoft Fabric?
    1. Accessibility
    2. Interoperability
    3. Trusted
    4. Reusability
    5. Findable
    6. Limitations
  3. How do you handle your data domains in Microsoft Fabric?
    1. What are Domains in Microsoft Fabric?
    2. Limitations
  4. How do you handle your self-serve data platform with Microsoft Fabric?
  5. How do you set up data governance with data mesh in Microsoft Fabric?
    1. Federated Governance
    2. Centralized Governance
    3. Limitations
  6. Summary

What is Data Mesh Architecture?

The first post in this mini-series answers this question, looking into what data mesh is, different topologies, challenges and benefits. If you want to read up on the data mesh approach before looking into how this applies to Microsoft Fabric, you can take a look at that blog post here:

In short, the data mesh is an approach to how you manage your data. Data Mesh is not only an architectural and technical approach. To be successful, you also need to change your organisational processes and structure. Having both IT and the business onboard is therefore a crucial success factor when implementing data mesh in your organisation.

The data mesh approach comes from the domain-driven design and bounded context way of thinking. We find these concepts in the 4 components that make up data mesh:

  • Domains
  • Products
  • Federated Governance
  • Self-Service Data Platform

How do you handle your data products in Microsoft Fabric?

By introducing product thinking into the world of data, you get Data Products. A data product is a data asset that should be:

  • Accessible
  • Interoperable
  • Trusted
  • Reusable
  • Findable

The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers.

So, what would a data product look like in Fabric? Sorry to disappoint, but that depends. It depends on how you define a data product in your organisation. Is it a table, a SQL view, a Power BI semantic model, a dataflow, or even a Power BI report? Or can it be all of these things?

Let’s have a look at how you can set up your data product in Fabric:

Accessibility

A data product is made accessible through one or more output ports for data consumers. In Fabric there are multiple ways of distributing your data products, but again – it depends on what your data product looks like.

For accessibility outside of Fabric, you can use a SQL endpoint or the underlying ADLS Gen 2 connection that your OneLake is based on.

For internal accessibility inside of Microsoft Fabric, you can use a dataflow endpoint, a semantic model connection, or an internal shortcut. Or it can just be accessible inside a workspace within one domain, where other domains can connect to it and load it using their preferred integration tool within their own domain.
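
As a small illustration of the SQL endpoint route, here is a hedged sketch of querying a data product from outside Fabric with pyodbc. The server name comes from the item’s SQL connection string in the Fabric UI; the lakehouse and table names are hypothetical.

```python
import pyodbc

# Connection string copied from the lakehouse's SQL endpoint settings
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=lh_gold;"  # hypothetical lakehouse name
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

# Query the data product like any other SQL source
for row in conn.execute("SELECT TOP 5 * FROM dbo.dim_customer"):
    print(row)
```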

Interoperability

A data product is interoperable through its metadata, which also holds some of the standardization of the data product, such as the schema and semantics. Below is a screenshot from Fabric of a dataset with a location, labelled endorsement, refresh date, and sensitivity label.

Trusted

The metadata also enforces some of the governance over the data product, with ownership, security, and rights of use ensuring that our data product is trusted. In addition, the observability of the data product provides us with information about the SLA, timeliness, and quality of the data product. This is all part of how we can trust our data product.

In Microsoft Fabric, the refresh log provides us with the observability of the data product. For the SLA and data quality, there is no documentation possibility inside of Microsoft Fabric, unless you buy the data catalog Purview as an additional tool that integrates with Microsoft Fabric. There you can document your data products. Purview could also help ensure that a data product is findable through search, a point I touch on further below. Still, as Purview is an external tool that requires an additional licence, it is not considered further in this blog.

Reusability

In Fabric, you can reuse a table, dataflow, or dataset as needed to develop other data products. An example is the semantic link to a semantic model, which can now be queried inside a notebook in your lakehouse.
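
Here is a short sketch of that semantic link in practice, using the sempy library in a Fabric notebook; the model, table, measure, and column names are made up for the example.

```python
import sempy.fabric as fabric

# Read a table from a semantic model into a pandas-style DataFrame
orders = fabric.read_table("Sales Model", "Orders")

# Evaluate a measure grouped by a column of the model
revenue = fabric.evaluate_measure(
    "Sales Model",
    measure="Total Revenue",
    groupby_columns=["Orders[Region]"],
)
print(revenue.head())
```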

Findable

One way to better manage a data product in Microsoft Fabric could be to take advantage of endorsement labelling, where you can mark items as “Promoted” or “Certified”. For instance, to determine whether an item is a data product, you can rely on the “Certified” label. However, it’s important to note that this labelling only applies to items within Fabric. Additionally, you need to ensure that the label is reserved specifically for this purpose.

Limitations

In terms of data products, there are a few features that could enhance the Fabric platform:

  1. Tagging and Categorization: Introducing a capability to tag or categorize data items within Fabric would allow users to easily label their data as a certified Data Product. This would enable efficient organization and retrieval of specific datasets.
  2. Journey Tracking and Discoverability: It would be beneficial to have a feature in Fabric that tracks the journey of a data product throughout its lifecycle. This way, users can easily monitor and trace the movement of their data items within the platform.
  3. Documentation and Restrictions: Providing more comprehensive documentation for data products is crucial. Users should have access to clear instructions on how to utilize and connect to the data, as well as any associated restrictions on usage. This information will help users leverage the data effectively and ensure compliance with any contractual obligations.
  4. Data Product Contract Specification: Introducing a data product contract specification feature in Fabric would be advantageous. This would allow users to define contractual terms for their data products. The contract could specify details such as permitted usage, data access restrictions, and any specific requirements for utilizing the data.

By incorporating these features, Fabric could offer a more robust and user-friendly experience for managing data products.

How do you handle your data domains in Microsoft Fabric?

A domain is a grouping of data, technology, teams, and people that work within the same analytical realm, usually within a business area. Examples of domains could be the organizational structure, like Sales, Marketing, Finance, and HR. It can also be more fine-grained or connected to a value chain, like Orders, Production, and Distribution. All of this depends on your organization and how the domains would serve you and your business in the best way. The key thing here is that each domain can be autonomous and have ownership over the data products that naturally belong to their domain.

Since each domain is autonomous, they can develop their data products and govern these as desired. Still, the governance should be aligned with the centralized data governance. I will come back to this. The ETL and data modelling are also handled locally by the domain while taking advantage of the self-service data platform provided through the central organization.

What are Domains in Microsoft Fabric?

To understand how you can set up your governance in Fabric, let us first take a look at the new functionality that was added to Fabric, Domains.

The domains introduced in Fabric are a way to support the data mesh architecture. You can assign the workspaces owned by a business area to one domain. So all your sales-related workspaces could be in your Sales domain, while your marketing-related workspaces could be inside the Marketing domain.

Microsoft Fabric is built from the Power BI developer perspective, and the specialist tools that Microsoft has brought into Fabric, such as Synapse and ADF, are now more available to the generalist. This enables domains to become more independent of technical competence and self-sufficient. This drives the efficiency of each domain.

Limitations

The management of each domain could have been more detailed inside the admin portal of Microsoft Fabric. Today you can only control which workspaces are placed in which domains, and who can be the admin of them. It would have been interesting to be able to set up policies for each domain and have more possibilities to govern them.

How do you handle your self-serve data platform with Microsoft Fabric?

The next component of the data mesh approach is the self-serve Data Platform. The self-serve data platform is a centralized service that serves the domains with their need for infrastructure, storage, security, access management, ETL pipelines, and more. Some call this “data infrastructure as a platform” to highlight that the self-serve platform should serve the infrastructure, i.e. all the technical components and their integrations required to build data products.

This simplicity of distributing Microsoft Fabric as a data platform as a service is one of its biggest strengths. As it is a SaaS, it provides you with all the necessary infrastructure and integrations to start building your data products.

By designating a Microsoft Fabric domain for each data mesh domain, organizations effortlessly extend the self-serve capabilities to every corner of their ecosystem. This inclusivity means that every Fabric capacity holder gains unfettered access to the diverse experiences offered by Fabric, empowering them to develop their individualized data products.

How do you set up data governance with data mesh in Microsoft Fabric?

To make each domain autonomous in its data product development and management, a federated governance model is used. The federation of governance is a way of deferring responsibilities to enable scalability. It promotes independence and accountability. This way, the domains can govern their data products in a way that is effective and makes sense to them.

Still, there should be some centralized governance providing a set of standards, and best practices, setting the necessary boundaries, and being a center of excellence, providing expertise to the federated domains.

In this article, I will focus on how data governance fits into the data mesh architecture. For those interested in the specific governance features of Microsoft Fabric, a blog post on setting up data governance within this framework is available.

How do you set up your Data Governance in Microsoft Fabric?

Federated Governance

Domains enable federated governance in Microsoft Fabric. There is also a new role that comes with this: the domain admin, who can delegate responsibilities to contributors.

Through your workspace roles, you manage who can work with what from a technical developer’s perspective. To see the different capabilities that the different roles have, refer to the Microsoft documentation here.

Centralized Governance

Fabric in itself enforces some standardisation and control, as there is a finite number of ways to develop your products and make them accessible.

The Purview hub within Microsoft Fabric emerges as a cornerstone for centralized computational governance. This hub offers a level of centralization that enables a comprehensive overview of domains and data products, allowing stakeholders to assess the status of each domain. It serves as a control centre, facilitating both a holistic perspective and the ability to drill down into individual domains for detailed evaluations.

In Microsoft Fabric you can also take advantage of some built-in policies such as the Data Loss Prevention policies and labelling that is further described in the blog linked above.

Limitations

While Microsoft Fabric inherently provides a level of standardization and control due to its finite number of development approaches and accessibility options, there are limitations. Notably, the platform currently lacks the capability to establish standards and patterns for solution development. More possibilities and granular control levels to set up access policies and development policies would be interesting.

Another example where Microsoft Fabric falls short is Master Data Management. There is no integrated solution enabling functionalities such as survivorship and the creation of a golden record, necessitating reliance on external tools.

Summary

In summary, while there are limitations in the Microsoft Fabric setup when implementing a data mesh architecture, I believe that Microsoft Fabric significantly enables some of the most crucial features of data mesh, particularly infrastructure as a service from a central team and the inherent enforcement of central governance through the limited number of methods for the domains to develop their products. While additional levels of control and options to monitor the data product journey would have been desirable, I am currently of the opinion that Microsoft Fabric comes pretty close.

Hope you found this article helpful!

Is Data Mesh your enabler, or is it just creating a data mess? — 31. Oct 2023

Is Data Mesh your enabler, or is it just creating a data mess?

Coming from the world of Microsoft analytics, I got curious as to why Microsoft chose to go with “Fabric” as the name of the newly released analytical solution, Microsoft Fabric. Looking into this, the architectural concept of Data Fabric became more relevant. I had been working with Data Mesh for a while, but the Fabric architecture was something I had not heard that much about before. That, of course, meant I needed to know more.

The plan was to write a short and easy blog about the differences between Fabric Architecture and Data Mesh and then see how these two architectures look inside Microsoft Fabric. Turns out there is too much to say, so I had to turn this into a mini-series. And first out, we have the Data Mesh architecture!

So, let’s have a look at what the Data Mesh is. What are the benefits of this architecture, and are there any limitations?

Stay tuned for more content on Fabric Architecture, and how Data Mesh and Fabric Architecture can be set up in Microsoft Fabric!

  1. What is Data Mesh Architecture?
    1. Domains
    2. Data Products
    3. Federated Governance
    4. Self-serve Data Platform
  2. Why is the Data Mesh approach gaining traction?
  3. What are the different approaches you can have for Data Mesh?
    1. Fully Federated Domain Topology
    2. Governed Domain Topology
    3. Partially Federated Domain Topology
  4. What are the main benefits of a Data Mesh Architecture?
    1. Autonomy
    2. Scalability
    3. Closer collaboration between business and technology
  5. What are the main challenges of Data Mesh?
    1. Risk of creating isolated data hubs
    2. Not serving the organization’s data model
    3. Missing a harmonized strategy
  6. So, is Data Mesh your enabler, or is it just creating a data mess?

What is Data Mesh Architecture?

Data mesh is both a data management approach and a data architecture. To be successful, it is not enough to only think about the architecture and technology, you might also need to change your organizational processes and structure. Having both IT and the business onboard is therefore a crucial success factor when implementing data mesh in your organization. In some ways, the data mesh approach brings data management and software architecture together.

Multiple great articles describe and define the data mesh approach. I will link these below, but also try to explain them here in my own words.

So, let’s do it! The data mesh approach comes from the domain-driven design and bounded context way of thinking. And we find these concepts in the 4 components that make up Data Mesh:

  • Domains
  • Products
  • Federated Governance
  • Self-Service Data Platform

Domains

A domain is a grouping of data, technology, teams, and people that work within the same analytical realm, usually within a business area. Examples of domains could be the organizational structure, like Sales, Marketing, Finance, and HR. It can also be more fine-grained or connected to a value chain, like Orders, Production, and Distribution. All of this depends on your organization and how the domains would serve you and your business in the best way. The key thing here is that each domain can be autonomous and have ownership over the data products that naturally belong to their domain.

Since each domain is autonomous, they can develop their data products and govern these as desired. Still, the governance should be aligned with the centralized data governance. I will come back to this. The ETL and data modelling are also handled locally by the domain while taking advantage of the self-service data platform provided through the central organization.

Data Products

By introducing product thinking into the world of data, you get Data Products. A data product is a data asset that should be trusted, reusable, and accessible. The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers.

This means that a domain can use data products from other domains to build their own data products. It is these links between the domains and the data products that create the mesh in a data mesh.

In practice, a data product can be many things. Examples could be a Power BI Dataset, a parquet file, an SQL table, a Power BI report, etc. The modelling of the data product and the needed ETL process to build the data products are handled by the domains.

Federated Governance

To make each domain autonomous in its data product development and management, a federated governance model is used. The federation of governance is a way of deferring responsibilities to enable scalability. It promotes independence and accountability. This way, the domains can govern their data products in a way that is effective and makes sense to them.

Still, there should be some centralized governance providing a set of standards, and best practices, setting the necessary boundaries, and being a center of excellence, providing expertise to the federated domains.

Self-serve Data Platform

The last component of the data mesh approach is the self-serve Data Platform. The self-serve data platform is a centralized service that serves the domains with their need for infrastructure, storage, security, access management, ETL pipelines and more. Some call this “data infrastructure as a platform” to highlight that the self-serve platform should serve the infrastructure, i.e. all the technical components and their integrations required to build data products.

The centralization of this service also enables some of the standardization in the centralized governance.

Still, what each organization puts inside the “Self-serve data platform” box might vary, and some open up for flexibility in technology choices.

Why is the Data Mesh approach gaining traction?

To understand this, it can help to have a look at what we are trying to move away from in the data and analytics world. Previously we have seen siloed proprietary enterprise data warehouses. They are complex, with long development processes, low scalability and high cost.

But it is not only the old data warehouse that is the challenge of organizations today. Also, the more modern concept of data lake has proven to be a challenge. The data lake has for some organisations become this big data box that holds all of the organization’s data, operated by a centralized and specialised team of data engineers. For many, this results in a siloed data lake that works as a bottleneck for developing new solutions and gaining new insights.

The monolithic data warehouse or data lake and the centralised operating model become the bottleneck for development and turning data into insights.

What are the different approaches you can have for Data Mesh?

There are also differences in how an organization might choose to interpret the data mesh approach. These are well described in the book Data Management at Scale (Strengholt, 2023), where Strengholt highlights different degrees of data mesh interpretation, or different topologies, as he calls them. I will briefly summarize the ones I find most interesting below, as I think they highlight some of the complexity of the data mesh.

Fully Federated Domain Topology

This solution has no central orchestration, with a strong emphasis on federation. Here you have fine-grained decoupling and high reusability. The domains themselves are independent and serve data products to other domains. You don’t have any central authority, and compliance is enforced through the centralized platform.

The benefit of this approach is the high degree of flexibility with few dependencies. It also promotes the reuse of data products, as there would naturally be a large production of data products.

The challenge with this fully federated approach is that the independence of the domains makes the critical harmonization of data interoperability, governance, security standards, and metadata difficult. Alignment can be challenging. The fine-grained federation also promotes more separation in the architectural setup, meaning that the integration job of making all these fine-grained products talk to each other can be difficult. And if you need to pull data from multiple data products and domains to solve analytical needs, you will probably struggle to meet the need for high data quality, performance, and interoperability.

Decentralization also requires a significant level of independence and technical expertise within each domain. It requires a substantial pool of highly skilled data professionals who understand the intricacies of the data mesh methodology. To successfully adopt the data mesh approach, an organization must have sufficient traction and a wide array of data products that demonstrate the value of embracing a data product mindset. However, building and sustaining such teams of data professionals can be a substantial investment for any organization.

Governed Domain Topology

A step away from the most refined theoretical data mesh approach is to centralize parts of your mesh components. In the Governed Domain Topology, the data product architecture is centralized. This way, the consumption of data products is also centralized, making them more discoverable. Integrations become less complex, and standardization on metadata, distribution, or consumption is easier to implement and enforce.

Despite the numerous advantages, the central distribution of your data can sometimes become a bottleneck. Moreover, if your data landscape consists of various cloud providers or technologies, integrating them into a centralized data distribution can present a significant challenge.

Partially Federated Domain Topology

Other organizations might want to go all in on the data mesh approach, but due to their technical setup and/or lack of data engineers and resources, need to go with a more centralized solution with some federation.

You can have partly centralized data on the source system side, while the consumption side is more distributed. I like to think of this as more centralization closer to the sources. Your first data transformation steps, such as the bronze layer of a medallion architecture, or your landing and transformation zone, are centralized, while your distribution layer or gold layer adopts the data mesh topology.

The challenge with this is the possible bottleneck on the source side, as bringing new sources into the solution requires a centralized team of data engineers. It also means less autonomy for the data-product-consuming domains.

What are the main benefits of a Data Mesh Architecture?

Autonomy

The federation and domain-driven design bring about a paradigm shift in the way organizations approach software development. By advocating for independence and accountability, these methodologies empower teams to take ownership of their domain and develop autonomous solutions. This decentralization fosters a culture of innovation and agility, allowing teams to adapt quickly to changing requirements and market demands.

Scalability

The scalability of these methodologies is another key benefit. With each team operating independently and focusing on their specific domain, the overall system becomes highly scalable. This means that as the organization grows and new functionalities are required, additional teams can be introduced seamlessly without disrupting the existing ones. This modular approach enables organizations to effectively manage complex projects and easily accommodate future growth.

Closer collaboration between business and technology

One of the notable advantages of the data mesh approach, which is closely aligned with domain-driven design, is the closer collaboration between business and technology. By placing ownership of data in the hands of the business, this approach enables better alignment between data management strategies and overall business goals. It encourages cross-functional communication and enhances the understanding of data within the organization. This alignment fosters a shared vision and empowers decision-makers to make informed choices based on business objectives.

What are the main challenges of Data Mesh?

Risk of creating isolated data hubs

The decentralization certainly enables scalability and high productivity, but it can also lead to chaos. Without a certain level of centralization where the organization as a whole can establish boundaries, standards, and best practices, there is a risk that each team will develop its own architecture and choose its own technologies, standards, data formats, and more. This may result in each team being solely responsible for their data, creating isolated data hubs that cannot be combined or integrated with other domains. Consequently, the overall value proposition of a data mesh can be compromised.

Not serving the organization’s data model

The concept of the data product approach challenges the notion that there is a single data model that applies to the entire organization. While this may hold true to some extent, in reality, there are interconnected relationships between domains and data products within an organization that should be standardized and governed. These relationships play a crucial role in maintaining the integrity and quality of your data model.

Missing a harmonized strategy

More decentralization makes it more difficult to harmonize around a strategy and set centralized governance, boundaries, and standards. You can end up with siloed data domains or multiple fragmented data warehouses, ultimately blocking one of our initial justifications for implementing the data mesh approach: organizational scalability.

So, is Data Mesh your enabler, or is it just creating a data mess?

Short answer, it depends.

Even though there is some great literature out there explaining and even defining the how-to’s of a data mesh approach, the reality is that organizations quickly interpret the approach to fit their organizational structure.

That can play out as a differing emphasis on the centralised components of the data mesh structure, opening up for too much autonomy, with the ultimate consequence of siloed data domains that create data products that are not interoperable or consumable for the organization as a whole. It can also lead to data products with different interpretations of the data model, which can ultimately result in different truths.

However, the data mesh approach is the result of the need to move away from the monolithic data warehouse or data lake and the centralised data engineering team. It does enable autonomy and scalability with its federation.
A key enabler for data mesh will therefore be, despite the decentralised focus, a centralised plan: a data strategy containing some standards, an architecture, overall governance, and best practices or rules. This will help you ensure that your data mesh doesn’t become a mess.

Hope you found this article helpful!

How do you set up your Data Governance in Microsoft Fabric? — 11. Oct 2023

How do you set up your Data Governance in Microsoft Fabric?


What is Data Governance in Microsoft Fabric?

So, what is data governance in Fabric? The governance domain contains many capabilities. If you follow the DAMA approach, you know that they place Data Governance at the center of 10 data management capabilities, covering architecture, data warehousing, operations, security, quality, and everything you do with your data from the source to delivered insights.

In Fabric, obviously, everything regarding Data Governance is still important. That said, in this article, I will focus on the specific Fabric components and features that help you govern your data in Fabric.

Let’s take a look at the new Domains feature in Fabric, how Data Lineage is implemented, roles and access management, policies and processes, and the Purview hub.

I have previously written a blog on what your Power BI governance should contain. As Fabric makes up more of your data ecosystem, you will additionally need to focus on other governance capabilities. Still, if you want to look into Power BI-specific governance, you can have a look at that one here:

What are Domains in Fabric?

To understand how you can set up your governance in Fabric, let us first take a look at the new functionality that was added to Fabric, Domains.

Today, the distributed and federated model is becoming more and more popular for organizations. The data mesh architecture is gaining traction, where you decentralize data architecture and have each business domain govern their own data.

The domains introduced in Fabric are a way to support this data mesh architecture. You can assign the workspaces owned by a business area to one domain. So all your sales-related workspaces could be in your Sales domain, while your marketing-related workspaces could be inside the Marketing domain.

Which Roles do we have in Fabric?

In Microsoft Fabric, you can divide your roles into three areas. You have your domain roles, tenant roles, and workspace roles. The Capacity admin and domain admin can delegate some of their responsibilities to contributors.

In the world of data governance, your domain admin could be your data owner or a technical resource working on behalf of the data owner, while the domain contributor could be your data steward. You could also give the domain admin role to your data stewards, depending on the role definitions in your organization.

The capacity admin and capacity contributor, as well as the overall Fabric admin, would normally be assigned to technical roles in your organization.

Through your workspace roles, you manage who can work with what from a technical developer’s perspective. To see the different capabilities that the different roles have, refer to the Microsoft documentation here.

Access Management in Fabric

There are four permission levels in Fabric:

  • Workspace Permission
  • Item Permission
  • Compute Permission
  • Row and column level permission

Workspace permission in Fabric

Workspace permission provides access to all the items that are inside a workspace. That means you get access to all lakehouses, warehouses, data pipelines, dataflows, datasets, reports, etc. in the workspace.

In the workspace, there are also two access types you can give:

  • Read-only
  • Read-write

The read-only role is a viewer role: it can view the content of the workspace and query data through SQL or in Power BI reports, but it cannot create new items or make changes to existing ones.

Admin, Member, and Contributor are the read-write roles. They can view data directly in OneLake, write data to OneLake, and create and manage items.
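Workspace roles can be assigned in the portal, but they can also be scripted. Here is a minimal sketch using the Power BI REST API’s add-group-user endpoint to grant a user the read-only Viewer role; the token, workspace GUID, and user principal name are placeholders.

```python
import requests

# Minimal sketch: add a user to a workspace in the Viewer (read-only) role.
# Token, workspace GUID, and UPN below are placeholders.
TOKEN = "<access-token>"
WORKSPACE_ID = "<workspace-guid>"

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/users",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "identifier": "analyst@contoso.com",  # who gets access
        "principalType": "User",
        "groupUserAccessRight": "Viewer",     # read-only workspace role
    },
)
resp.raise_for_status()
```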

Item permission in Fabric

Item permission makes it possible to grant access to one specific item in the workspace directly, without granting access to the workspace and all the other items in it.

This can be done through two methods:

Give access through Sharing

This feature can be configured to give connect-only permissions, full SQL access, or access to OneLake and Apache Spark.

On the Microsoft documentation page you can read the details of what the different sharing permissions provide access to.

Give access through Manage Permissions

Here you can grant direct access to items, or manage the accesses you have already provided.

Compute permission in Fabric

You can also provide access through the SQL endpoint in Fabric.

As an example, if you want to provide viewer-only access to a lakehouse, you can grant the user SELECT through the SQL endpoint.

Or, if you want to provide granular access to specific objects within the warehouse, share the warehouse with no additional permissions, then grant access to specific objects using T-SQL GRANT statements.
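As a sketch of what that can look like in practice, the snippet below connects to the SQL endpoint with pyodbc and grants SELECT on a single object. The server, database, table, and user names are placeholders, and the connection details assume ODBC Driver 18 with Entra ID authentication.

```python
import pyodbc

# Minimal sketch: grant read access to one table through the SQL endpoint.
# Server, database, table, and user are placeholders.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<your-warehouse>;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)
cursor = conn.cursor()

# Viewer-style access: the user can read this one object and nothing else.
cursor.execute("GRANT SELECT ON OBJECT::dbo.SalesGold TO [analyst@contoso.com];")
```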

Column-Level & Row-Level Security for Fabric Warehouse & SQL Endpoint in Fabric

On October 3rd, Microsoft announced the public preview of Column-level and Row-level security for the Fabric warehouse and SQL endpoint.

Read the full announcement here: https://blog.fabric.microsoft.com/en-us/blog/announcing-column-level-row-level-security-for-fabric-warehouse-sql-endpoint

Row-level security allows you to control access to specific rows in your table for certain users or groups. This means you don’t have to create separate tables or reports to provide access to only certain parts of your data for specific users. For example, you can give a store manager access to only the sick leave data of their employees.

Column-level security works similarly, but it operates at the column level. This means you can restrict access to specific columns of a table, such as GDPR-related data like a customer’s full name, while allowing more users to access the remaining data in that table.
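To make this concrete, here is a minimal sketch of both features, executed against the SQL endpoint with the same pyodbc connection pattern as above. The schema, table, column, and user names are made up for illustration: row-level security is a predicate function bound to a table by a security policy, while column-level security is simply a column-scoped GRANT.

```python
import pyodbc

# Same connection pattern as the GRANT example above (placeholders omitted).
conn = pyodbc.connect("<connection-string-as-above>", autocommit=True)
cur = conn.cursor()

# Row-level security: each store manager only sees their own employees' rows.
cur.execute("""
CREATE FUNCTION dbo.fn_SickLeaveFilter(@ManagerEmail AS varchar(128))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS allowed WHERE @ManagerEmail = USER_NAME();
""")
cur.execute("""
CREATE SECURITY POLICY dbo.SickLeavePolicy
ADD FILTER PREDICATE dbo.fn_SickLeaveFilter(ManagerEmail) ON dbo.SickLeave
WITH (STATE = ON);
""")

# Column-level security: grant SELECT on everything except the GDPR-sensitive
# FullName column by listing only the allowed columns.
cur.execute(
    "GRANT SELECT ON dbo.Customers (CustomerId, City, Segment) "
    "TO [analyst@contoso.com];"
)
```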

These ways of providing access can help you simplify management, reduce duplication, and increase the security of your data assets.

Best practices for access in Fabric

Microsoft has provided some general advice on which access type to use when providing access to workspaces and specific items in Fabric. The following advice was found in the documentation here.

Write access: To have write access, users must be in a workspace role that allows writing. This applies to all data items, so limit workspaces to a single team of data engineers.

Lake access: To allow users to read data in OneLake, add them as Admin, Member, or Contributor, or share the item with ReadAll access.

General data access: Users with Viewer permissions can access data through the SQL endpoint for warehouses, lakehouses, and datasets.

Object level security: To keep sensitive data safe, grant users access to a warehouse or lakehouse SQL endpoint using the Viewer role. Then, use SQL DENY statements to limit access to specific tables.
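A minimal sketch of that last pattern, under the same connection assumptions as the earlier snippets (the table and user names are placeholders). Because DENY overrides GRANT, the user keeps broad Viewer read access while the sensitive table stays off-limits.

```python
import pyodbc

# Same connection pattern as earlier; names are placeholders.
conn = pyodbc.connect("<connection-string-as-above>", autocommit=True)
cur = conn.cursor()

# The Viewer role gives broad read access through the SQL endpoint;
# an explicit DENY carves out the sensitive table (DENY beats GRANT).
cur.execute("DENY SELECT ON OBJECT::dbo.EmployeeSalaries TO [analyst@contoso.com];")
```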

Processes and Policies in Fabric

Information Protection in Fabric

Information Protection in Microsoft Fabric is based on labeling your data. You set up sensitivity labels on your data in Fabric in order to monitor it and ensure that it stays protected, even if it is exported out of Fabric.

These sensitivity labels are set up through the Microsoft Purview portal.

On Microsoft’s documentation pages you can see which types of labeling are possible, which scenarios each is suited for, and whether it is currently supported in Fabric. See the full documentation here: https://learn.microsoft.com/en-us/fabric/governance/information-protection


Data Loss Prevention in Fabric

You can also set up Data Loss Prevention (DLP) policies in Fabric. So far, DLP is only supported for datasets. You configure these policies inside the Microsoft Purview compliance portal.

When setting this up in the Microsoft Purview portal, it looks like the only policy category currently supported for Power BI/Fabric is “Custom”.

For the DLP policy you can set up a set of actions that will happen if the policy detects a dataset containing sensitive data. You can set up:

  • User notifications
  • Alerts sent by email to administrators and users

The DLP policy will run every time a dataset is:

  • Published
  • Republished
  • Refreshed through an on-demand refresh
  • Refreshed through a scheduled refresh

When using the DLP feature, it is important to remember that a premium license is required. It is also worth noting that the DLP evaluation workload uses the premium capacity associated with the workspace where the evaluated dataset is located. The CPU consumption of the DLP evaluation is calculated as 30 % of the CPU consumed by the action that triggered it – for example, a refresh that consumes 100 CPU seconds adds roughly 30 CPU seconds of DLP evaluation. If you use a Premium Per User (PPU) license, the cost of DLP is covered up front by the license cost.

Endorsement in Fabric

In Fabric you can endorse all items except Power BI dashboards. Endorsement is a label you can put on your items to tell your Fabric users that the item holds a certain level of quality.

There are two endorsements you can give an item:

  • Promoted
    • What is it?
      • Users can label items as promoted if they think the item holds a high standard and could be valuable for others. Someone thinks the item is ready to use inside the organisation and valuable to share!
    • Who can promote?
      • Content owners and members with write permissions to the item can promote.
  • Certified
    • What is it?
      • Users can label items as certified if the item meets organizational quality standards and is reliable, authoritative, and ready to use across the organization. Items with this label hold a higher quality than the promoted ones.
    • Who can certify?
      • Fabric administrators can authorize selected users to assign the certified label. Domain administrators can be delegated the enablement and configuration of certification, including specifying reviewers within each domain.

Data Activator in Fabric

Microsoft released Data Activator in public preview on October 5th. It’s a tool that helps you automate alerts and actions based on your Fabric data. Using Data Activator, you can avoid the need to constantly monitor operational dashboards manually, helping you govern your data assets in Fabric.

Data Activator deserves its own blog post, so I will just mention it here as a component to take advantage of in your data governance setup.

Read the full announcement here: https://blog.fabric.microsoft.com/en-us/blog/announcing-the-data-activator-public-preview/

Data Lineage and Metadata in Fabric

Data lineage and metadata management are important enablers for governance. They help you get an overview of your data assets and can also be used to enable observability features for your data assets.

Metadata scanning in Fabric

In Fabric you can take advantage of metadata scanning through the Admin REST APIs to get information such as item name, owner, sensitivity label, and endorsement. For datasets you can also get more detailed information, such as table and column names, DAX expressions, and measures. Using this metadata scanning is beneficial both for data consumers looking up existing data assets and for the administrators and governance roles managing those assets.

There are four scanner APIs that can be used to catalog your data assets:

  • GetModifiedWorkspaces – lists the workspaces that have changed since a given time
  • PostWorkspaceInfo – triggers a scan of the specified workspaces
  • GetScanStatus – polls the status of a triggered scan
  • GetScanResult – returns the metadata collected by the scan
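Here is a minimal sketch of the scan flow: trigger a scan, poll until it finishes, then fetch the result (GetModifiedWorkspaces would tell you which workspaces to rescan incrementally). The token and workspace GUID are placeholders.

```python
import time
import requests

# Minimal sketch of the scanner API flow; token and workspace GUID are
# placeholders.
TOKEN = "<admin-access-token>"
BASE = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1) Trigger a scan, asking for schema, lineage, and datasource detail.
resp = requests.post(
    f"{BASE}/getInfo"
    "?lineage=True&datasourceDetails=True"
    "&datasetSchema=True&datasetExpressions=True",
    headers=HEADERS,
    json={"workspaces": ["<workspace-guid>"]},
)
resp.raise_for_status()
scan_id = resp.json()["id"]

# 2) Poll until the scan has finished.
while requests.get(f"{BASE}/scanStatus/{scan_id}",
                   headers=HEADERS).json()["status"] != "Succeeded":
    time.sleep(5)

# 3) Fetch the result: items with name, owner, sensitivity label, endorsement,
#    and (for datasets) tables, columns, and measures.
result = requests.get(f"{BASE}/scanResult/{scan_id}", headers=HEADERS).json()
for ws in result["workspaces"]:
    print(ws["name"], "-", len(ws.get("datasets", [])), "datasets")
```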

Lineage in Fabric

Each workspace in Fabric has a lineage view that can be accessed by anyone with the Admin, Member, or Contributor role for that workspace.

Lineage provides an overview of how data flows through your items in Fabric, and can be a great way to answer questions such as “What is the source for this report?” and “If I change this table, will any other data assets be affected?”

To view the lineage view for a workspace, click the lineage icon at the top right of the workspace. To see the lineage focused on one specific item, click the “Show lineage” symbol on the right side of that item. To see the impact analysis, click the “Show impact across workspaces” symbol.

Microsoft Purview Hub in Fabric

In Fabric, administrators have access to the Microsoft Purview hub, a centralized page in Fabric that provides insight into the current state of their data assets.

The Microsoft Purview hub consists of two main components:

  • A portal link that sends you to Microsoft Purview. You need to have purchased Microsoft Purview to take advantage of this.
  • A Microsoft Fabric data report that gives you insights on all your Fabric items. You can open the full report to view more detailed information in the following report pages:
    • Overview report
    • Endorse report
    • Sensitivity report
    • Items page
    • Sensitivity page

This report gives you insight into how many of your items are labeled with endorsement or sensitivity, by item type and workspace. It also provides an overview of how your admins and contributors are working with labeling your Fabric items. If your organization has defined data stewards, this is where you can see how your data stewards are governing the Fabric items.

Why is Data Governance in Fabric important?

Fabric is a great tool in the way it lowers the barrier to start developing new data assets. Since it is built from the business-user perspective, starting from the Power BI user interface, it also lowers the technical barrier for many Power BI report developers and business analysts to do more with their data assets.

This is a great advantage, but it also opens up some new challenges. Anyone who has been governing BI assets in the past knows the struggle of making sure reports are developed, managed, and governed in the right way. With Fabric lowering the technical barrier to do more with your data, and moving further back in your development process, it also becomes easier to do things the wrong way.

Therefore, I think governance in Fabric is more important than ever.

Hope you found this article helpful!

Source Control in Power BI – What are your options? — 1. Feb 2023

Source Control in Power BI – What are your options?

Updated 21.06: With the release of Microsoft Fabric, built-in Git integration is now part of your workspace solution inside Fabric (!). More information here: https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration

The rest of this article refers to the possibilities before this built-in Git feature was released.

So, version control in Power BI has been a challenge from the start. What do you do if the changes you made messed everything up and you want to roll back to an older version? Or if the file you saved on your computer is (unintentionally) deleted? What if parts of the changes you made were really good and you would like to update other reports with the same changes? Then what?

No built-in solution for this exists in Power BI today. If you have a Premium licence, Deployment Pipelines are an option. They will help you get better control of your content and provide an overview of the differences between workspaces. However, they do not solve all the challenges mentioned above. I have written an article about that feature here:

We are still missing source control! This is a feature many have wished for from the Power BI team. The idea has been voted on at ideas.powerbi.com, and in March 2022 an administrator updated it to: “We are working on this item but no timeline can be shared yet. We appreciate your patience.”

So, while we wait for this new feature to come, let’s take a look at your options!


Version control your .pbix files using OneDrive or SharePoint

The .pbix file is a binary file, which means it is not possible to track changes within it. So, what can we do?

OneDrive and SharePoint have built-in version control, which means you can get access to previous versions of the file.

Pros:

  • Easy to start using!

Cons:

  • You have to make all changes in Power BI Desktop in order to get the version history. This means you lose some of the flexibility of the Power BI Service.
  • As you are working with a binary file you cannot:
    • Do diffs or schema compares between versions
    • Merge the files or have multiple developers working at the same time

Considerations:

  • Make sure there is a process in place so that all developers collaborating on the same reports use the same OneDrive/SharePoint folder.

Version control your .pbix files using Git

You could also use GitHub or Azure DevOps to store and track versions of your .pbix file, committing the file to Git after making changes just as you would with any other file.

Pros:

  • Can be incorporated into an existing development process that uses Git

Cons:

  • Same as for using a OneDrive or SharePoint folder:
    • You have to make all changes in Power BI Desktop in order to get the version history. This means you lose some of the flexibility of the Power BI Service.
    • As you are working with a binary file you cannot:
      • Do diffs or schema compares between versions
      • Merge the files or have multiple developers working at the same time

Considerations:

  • As always, it is important to keep the development process for Power BI reports in mind. You need to make sure all developers use this setup for it to work.
  • Also consider using Git Large File Storage (LFS). Read more on that topic here: https://git-lfs.com/

Marc Lelijveld has written an article on how you can automate this process using Azure DevOps: https://data-marc.com/2021/06/04/integrate-power-bi-deployment-pipelines-with-azure-devops/

Tabular editor to source control your Power BI data model

When taking advantage of external tools like Tabular Editor, you can save your data model as a .bim file.

Before we look into the .bim file, we need to take a quick look at how a Power BI report is built up. It is divided into two components: the Report and the Dataset.

The Dataset is the data model that holds your data and the changes you have made to it, such as transformations or measures.

The Report contains all the report pages and visualizations you have set up. It is the visual part of your Power BI report. All of the visualizations in a report come from a single dataset.

The fact that you can differentiate these two affects how you can enable source control in Power BI.

A .bim file is essentially the metadata of your Power BI data model. As this is a JSON file, it works well with source control, and we can use it to track changes in our Power BI data model.
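As a small illustration of why this works, here is a sketch that reads a .bim file (the file name is a placeholder) and lists the tables and measures – exactly the kind of content you would see change in a Git diff. It assumes the standard TMSL layout that Tabular Editor produces.

```python
import json

# Minimal sketch: list tables and measures from a TMSL-style .bim file.
# The file name is a placeholder; the layout assumes Tabular Editor output.
with open("MyModel.bim", encoding="utf-8") as f:
    bim = json.load(f)

for table in bim["model"]["tables"]:
    print(f"Table: {table['name']}")
    for measure in table.get("measures", []):
        expr = measure["expression"]
        if isinstance(expr, list):  # multi-line DAX is stored as a list of lines
            expr = "\n".join(expr)
        print(f"  Measure: {measure['name']} = {expr}")
```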

Pros:

  • Possible to track changes made to your datamodel.

Cons:

  • We are missing tracking of the visual part of your report. For instance, if you make changes to your visuals or report pages, that is not part of the .bim file.
  • Extracting the .bim file requires an external tool and adds a manual step to your process.

Considerations:

  • To automate this you need to take advantage of the XMLA endpoint. However, this requires Premium capacity or a Premium Per User license.
  • You cannot deploy these changes back into Power BI Desktop. If you have a Premium license, you could use tools like ALM Toolkit to deploy changes through the XMLA endpoint.

Gerhard Brueckl has written an article on how you can automate some of the manual steps in this article: https://blog.gbrueckl.at/2022/02/automating-the-extraction-of-bim-metadata-from-pbix-files-using-ci-cd-pipelines/

What, How, When and Why on Power BI Governance — 21. Dec 2022

What, How, When and Why on Power BI Governance

The What, How, When and Why of Power BI Governance!

  1. What is Power BI Governance?
  2. How can you set up Power BI Governance?
    1. Power BI Governance Kick-start!
      1. Roles and Access management
      2. Processes
      3. Styleguide
      4. Monitoring
      5. Training
    2. Microsoft Power BI Implementation Plan
    3. Ásgeir Gunnarsson’s Power BI Governance Series
  3. When should you use Power BI Governance?
  4. Why should you use Power BI Governance?

What is Power BI Governance?

Power BI Governance is supposed to help you leverage the value of, and insights from, your Power BI reports.

Let’s start with what is Governance.

Governance is the process of interactions through the laws, norms, power or language of an organized society[1] over a social system (family, tribe, formal or informal organization, a territory or across territories). (Wikipedia)

Then what is Power BI Governance?

Power BI Governance is the policies, processes, roles, rules and guidelines to ensure a level of control and management to leverage the value of Power BI. (from Marthe’s head)

How can you set up Power BI Governance?

This is not an easy question to answer within a short hill sprint. A more detailed article might be needed here, but – let’s start with the most important.

Power BI Governance will be set up differently in every organisation. I therefore think a set of open questions to start your journey makes the most sense.

Below I have listed a set of domains with questions; if you can answer these, I think you have a good starting point for your Power BI Governance! You might not need all the domains, or maybe some of them can be scaled down for you – it depends on how your organization uses Power BI.

Further down in this post I have listed the documentation from Microsoft as well as from my friend Ásgeir Gunnarsson, which provides two different approaches to taking on Power BI Governance.

Power BI Governance Kick-start!

Roles and Access management
  • What roles do you need?
  • Do you need any hands-on roles defined?
    • Report Developer Role
    • Report Consumer
    • Power BI Business Analyst
    • Other?
  • Do you need any Power BI Management roles?
    • Power BI Administrator
    • Report Owner
    • Workspace/App owner
    • Data Owner
    • Other?
  • What are the scope and responsibilities of these roles?
  • What accesses should they have?
  • What are they allowed to do and not to do?
  • What is the level of access for these roles?
  • What processes do these roles need to follow?
Processes
  • Development Process
    • Who can develop Power BI Reports?
    • What sources can you use for your reports?
    • Is there a best practice for reusing dataflows, datasets or datamarts?
    • Where do you store your reports? Is there, for instance, a pipeline in DevOps set up for version control? A common OneDrive folder to save your work? Or should you use the Deployment Pipelines in the Power BI Service?
    • Should there be any guidelines on Import vs Direct Query?
  • Publishing Process
    • Who can publish and to what workspaces?
    • Who can set up a workspace in Power BI Service?
    • What are the access guidelines to the workspace and who manages these?
    • What are the accessibility guidelines for the app and who manages these?
    • Should you use Deployment Pipelines or a DevOps pipeline when publishing?
    • Are there any sign-offs that need to be done before publishing?
    • Any checklist that should be considered before publishing?
  • Quality sign-off process
    • This quality process could be a part of the publishing process or something that is repeated in defined intervals to ensure the quality of existing reports in Power BI Service.
    • What should be evaluated in this process? What ensures good Power BI quality?
    • Data modelling?
    • DAX code?
    • Should there be a checklist of best practices on how to improve the performance of the report itself?
  • Security process
    • Should comply with the security rules that already exist for the overall company
    • Is it ok to connect to all data sources? Are there any limitations here?
    • How do you secure:
      • App
      • Workspace
      • Report
      • Dataset
      • Gateway connections
    • When and how to use:
      • AD groups
      • Row-level security
      • Data source security
  • Sharing Process
    • What reports/Datasets can be shared?
    • Are there any limitations on sharing content across business areas, data domains or to another organization?
    • How can you share content for the different Power BI objects (Apps, Workspaces, Reports, Datasets, Dataflows, Datamarts, Deployment Pipelines etc.)?
  • Administration Process
    • How should the Power BI tenant be managed?
    • What settings in the tenant should be fixed?
    • How are these settings and the reasoning behind these documented?
    • When can you change tenant settings?
    • How are the users of Power BI informed when such changes are made?
    • How can a new administrator be added?
Styleguide
  • Do you want to provide strict rules or guidelines?
  • Should you have a Power BI template?
  • Are there any colour standards, or themes that should be used?
  • Any guidelines on the usage of logos?
  • Should there be standardization on where you place different types of visuals on a report page?
  • Any best practices on the type of visuals that should be used?
  • Should there be different templates for different levels of reporting (Management Report, Operational Report, Trend report, etc.)?
Monitoring
  • I don’t think there should be a question on whether or not you should perform monitoring, but it could be good to decide on:
    • What role is responsible for following up on monitoring?
  • There are different ways of monitoring your Power BI objects.
    • Track user activities in Power BI. Read more HERE. (See the scripted sketch after this list.)
    • Power BI Premium capacities: an app delivered by Microsoft. Read more HERE.
Training
  • Should there be any training?
    • (I think the answer here should be Yes. I mean, think of all the amazing governance you have just set up. People need to know that this exists and how to use it. )
  • Should there be different training for different roles?
  • Is the training mandatory?
  • Should some roles get the training with a set interval? (Every year, Every other year)
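For the user-activity tracking mentioned under Monitoring above, here is a minimal sketch using the Power BI admin activity events API. The token is a placeholder, and note that each request must stay within a single UTC day, with the timestamps wrapped in single quotes.

```python
from collections import Counter

import requests

# Minimal sketch: pull one UTC day of audit events and count report views
# per user. The token is a placeholder.
TOKEN = "<admin-access-token>"
url = (
    "https://api.powerbi.com/v1.0/myorg/admin/activityevents"
    "?startDateTime='2024-01-01T00:00:00Z'"
    "&endDateTime='2024-01-01T23:59:59Z'"
)
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

events = []
while url:
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    body = resp.json()
    events.extend(body.get("activityEventEntities", []))
    url = body.get("continuationUri")  # page until there is no next link

views = Counter(e.get("UserId") for e in events
                if e.get("Activity") == "ViewReport")
print(views.most_common(10))
```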

Microsoft Power BI Implementation Plan

Microsoft has a Power BI Adoption Roadmap and a Power BI Implementation Plan that provide a step-by-step roadmap you can take advantage of.

Power BI Adoption Roadmap steps:

Key takeaways:

  • The Power BI Adoption Roadmap provides a broader view of how you should move forward when implementing Power BI. You start out by evaluating maturity, data culture, and how you should govern your solution.
  • By going through this roadmap you end up with an overview of the current state of each point mentioned above, and you should be able to pinpoint the next steps toward your future state.

Power BI Implementation Guideline steps:

  • BI strategy
  • User needs and opportunities
  • Authoring tools and user machines
  • Tenant setup
  • Subscriptions, licenses, and trials
  • Roles and responsibilities
  • Power BI service oversight
  • Workspaces
  • Data management
  • Content distribution and sharing
  • Change management and deployment
  • Security
  • Information protection and data loss prevention
  • Power BI Premium
  • Gateways
  • Integration with other services
  • Auditing and monitoring
  • Adoption tracking
  • Scaling and growing

Key takeaways:

  • The implementation guideline also focuses on technical implementation, not only governance. In that sense it gets a bit technical, which is nice!
  • The Power BI implementation guideline is under construction, meaning that not all articles have been created yet.

Ásgeir Gunnarsson’s Power BI Governance Series

Also, my friend Ásgeir Gunnarsson has written a great Power BI Governance series.

I think it is a great overview of what to do when implementing Power BI Governance without making it too big. In contrast to the Microsoft documentation, it is a bit easier to wrap your head around (I would argue).

Key takeaways:

  • Ásgeir focuses on the non-technical aspects of data governance and why this is so important.
  • He describes a Power BI Governance Strategy by defining five pillars:
    • People
    • Processes and framework 
    • Training and support
    • Monitoring
    • Settings and external tools
  • My approach to data governance is inspired by Ásgeir’s work. You should check it out!

When should you use Power BI Governance?

Well, you should always use Power BI Governance, but here are some specific points for when you REALLY should use Power BI Governance.

Also, how much governance you apply, and which domains you choose to focus on, should vary based on the size and complexity of your organization.

When there are multiple report developers and Power BI resources who are not working in the same team (and hence do not necessarily have a common way of working with Power BI).

When you have a large organization gathering data from multiple data sources, and delivering reports for multiple business areas.

When there is a need to control access, source connections, and improve overall control and quality of your Power BI Reports, Dataflows, Datasets, Datamarts and usage. Have your reports gotten out of control?

Why should you use Power BI Governance?

In need of some points to add to your presentation when trying to convince the sponsors or IT department that Governance is needed for your Power BI?

  • Improves trust in your entire analytical database solution. If the reports continuously deliver the right insights and are findable and trustworthy, the overall perception of your analytical platform will improve. Why spend a lot of time governing your database when your Power BI reports are all over the place?
  • Increased report quality – both on the user experience side and data quality.
  • Accelerates your organization’s journey toward becoming data-driven. When your reports are both trustworthy and deliver high-value insights, making business decisions based on data becomes a lot easier.
  • Competitive advantage in the market.
  • You get more out of your Power BI resources, as they are no longer needed to answer ad-hoc questions or dig into duplicate reports or KPIs, errors, access control that went wrong, etc. They can focus on creating insight from the data!

How did I prepare for the Certified Data Management Professional (CDMP) DAMA exam? — 1. Nov 2022

How did I prepare for the Certified Data Management Professional (CDMP) DAMA exam?

In my recent projects, Data Governance and master and reference data were among the main deliverables. That meant the need to dig into the domain of Data Governance! It is a challenging and somewhat daunting task, but also very motivating, as the more I read on the topic, the more I see its value.

In this post, I explain how I prepared for the Data Management Fundamentals Exam to gain the CDMP Associate Certification. My next goal is to get the CDMP Practitioner certification.

  1. So, why should you take the Certified Data Management Professional (CDMP) certification?
  2. What are the DAMA certifications?
  3. How did I prepare for the Data Management Fundamentals Exam?
    1. 1. Give yourself a deadline
    2. 2. Purchase the exam
    3. 3. Read the DMBOK2
    4. 4. Do the practice exam over and over (and over) again

So, why should you take the Certified Data Management Professional (CDMP) certification?

With the ever-growing need for insights from data, in combination with the exponential growth of data itself, the need to control this data has never been more relevant. Still, people tend to either frown or fall asleep when the topic of Data Governance is raised. My experience is that it is often associated with less freedom, more bureaucracy, and slower progress.

But that is not true!

In order to serve the insights, consistency, quality, availability, and security of the data within an organization, data governance is fundamental. As data grows and self-service solutions become more available, organizations are increasingly at risk of security breaches when data is not where it should be, and of ambiguity and distrust in reports and databases when KPIs and master and reference data deviate – to mention a few challenges.

So, how can we build the necessary trust in our data?

When preparing for the certification, I learned that governance really is about much more than just making rules and policies – it is about understanding the bigger picture. We cannot only care about the roles, policies, and rules. We also need to consider the architectural approach, data modelling, storage solution, quality processes and security.

Data Governance is relevant to anyone who works with the bigger picture, such as architects, business managers, and IT managers.

Therefore, I think the DMBOK2 is relevant for anyone in a role that works to understand the bigger picture, whether technical or business – not only the data governance-specific roles.

What are the DAMA certifications?

There are four certification levels you can take. In this blog I explain how I prepared for the Data Management Fundamentals Exam to gain the CDMP Associate certification.

  1. CDMP Associate
    • 60 % pass on the Data Management Fundamentals Exam
  2. CDMP Practitioner
    • 70 % pass on the Data Management Fundamentals Exam
    • 70 % pass on two of the Specialist Exams
    • 2-10 years of industry experience
  3. CDMP Master
    • 80 % pass on the Data Management Fundamentals Exam
    • 80 % pass on two of the Specialist Exams
    • 10+ years of industry experience
  4. CDMP Fellow
    • 25+ years of industry experience
    • Globally recognised & respected thought leadership
    • A significant contribution to the Data Management profession 
    • CDMP Master
    • Contribution to CDMP & DMBOK
    • By nomination

How did I prepare for the Data Management Fundamentals Exam?

I did the following four things:

  1. Give yourself a deadline
  2. Purchase the Exam
  3. Read the DMBOK2
  4. Do the practice exam over and over (and over) again

1. Give yourself a deadline

Start by making a master plan.

What week are you planning to take the exam? Block your calendar right away. You need approximately 2-3 hours to finish the exam.

DAMA recommends 3 weeks (with some level of focus and time set aside) to prepare for the exam. I personally used 5 weeks from when I made the commitment until I took the exam. I think this was a bit too long, as I was not stressed enough to prioritize preparation for the first couple of weeks.

Set a deadline and block an exam slot in your calendar from the start

tip #1

Anyways, we are all different! The most important thing is that you make a plan and set a deadline for yourself.

2. Purchase the exam

In order to stay true to my deadline, I purchased the exam. The exam does not have an expiration date or scheduled time; you can take it whenever it suits you. Still, I would recommend purchasing the exam once you have given yourself a deadline, as the commitment is stronger and you get access to a practice exam that I found super useful!

Purchase the exam right away – this way you have committed from the start

tip #2

The cost of the exam is USD$311.

See more on how to purchase the exam here. Choose “Purchase” under the Data Management Fundamentals Exam box.

3. Read the DMBOK2

You can purchase the DMBOK2 here.

The DMBOK2 consists of 17 chapters and the first 15 chapters are the ones you will be tested on for the Data Management Fundamentals Exam.

I started by reading the book cover to cover, but then realised that I did not have time to do this for all chapters and still stay true to my deadline. The chapters are weighted differently in the exam, so I prioritized the chapters with the highest representation and skimmed the rest, focusing on headlines, sections, figures, and definitions.

Focus on the chapters with the highest weight

Tip #3

No   Chapter                                      Percentage
1    Data Management Process                      2 %
2    Data Ethics                                  2 %
3    Data Governance                              11 %
4    Data Architecture                            6 %
5    Data Modelling and Design                    11 %
6    Data Storage and Operations                  6 %
7    Data Security                                6 %
8    Data Integration and Interoperability        6 %
9    Document and Content Management              6 %
10   Master and Reference Data Management         10 %
11   Data Warehousing and Business Intelligence   10 %
12   Metadata Management                          11 %
13   Data Quality                                 11 %
14   Big Data                                     2 %

4. Do the practice exam over and over (and over) again

When you purchase the exam you also get a practice exam.

The practice exam has the same visual format as the actual exam. You get 40 questions and 30 minutes to answer them – the same relative time (45 seconds per question) you will have for the 100 questions on the actual exam. This way, you also get to practice answering questions within the given time.

Personally, I find it demotivating to just read an entire book without a “task” at hand. Therefore, I used the practice exam questions to help me work with the different chapters and sections.

Whenever I answered a question incorrectly, I looked it up in the book and made sure I understood the topic behind that question. This was a great way to consume the book.

Use the practice exam actively to look up relevant topics in the book

tip #4

Let me know how it goes! Good luck!