Marthe Moengen

Gal in a Cube

Architecture best practices in Fabric that enable your data governance journey — 24. Jun 2025

Architecture best practices in Fabric that enable your data governance journey

With its low-code approach, Microsoft Fabric enables anyone to take on tasks that once required a data engineering background. It accelerates development, supercharges workflows, and integrates seamlessly with AI, both enabling AI and using AI to make you even more productive. Definitely super cool. 😎

But with this new speed and power comes a new level of responsibility. As AI becomes deeply embedded in our tools and decisions, the old adage still holds true: garbage in, garbage out. That’s why the architecture of your Microsoft Fabric environment matters more than ever.

Why? Because with the ease and speed of Fabric today, it is SO SIMPLE to create things – so how fast can you create a mess for yourself? Been using Power BI for a couple of years with the self-serve approach? Then you know what I am talking about.

So, a strong foundation ensures compliance, security, and data integrity, so that you never lose control, end up with duplicates, or sit with low-quality data. Because when AI acts on bad data or a flawed setup, the consequences can scale just as fast as the benefits.

Let’s take a look at what initial steps you should consider for your Fabric architecture, and why!

Jump to:

  1. How should you structure your items (Lakehouses/Warehouses) in Microsoft Fabric?
    1. ✅ Pros of Item Separation in Microsoft Fabric
    2. ⚠️ Considerations of Item Separation in Microsoft Fabric
  2. How should you structure your workspaces in Microsoft Fabric?
    1. ✅ Pros of Workspace Separation in Microsoft Fabric
    2. ⚠️ Cons of Workspace Separation in Microsoft Fabric
  3. How should you structure your Domains in Microsoft Fabric?
    1. ✅ Pros of Domain Separation in Microsoft Fabric
    2. ⚠️ Cons of Domain Separation in Microsoft Fabric
  4. How should you structure your Capacities in Microsoft Fabric?
    1. ✅ Pros of Capacity Separation in Microsoft Fabric
    2. ⚠️ Cons of Capacity Separation in Microsoft Fabric

How should you structure your items (Lakehouses/Warehouses) in Microsoft Fabric?

I like to think of Fabric in this order when making the first decisions on HOW we are going to set things up: items define your options on workspaces, and workspaces define your options on domains and capacities. So, the first thing you need to think about is item separation.

Let’s use the medallion architecture as an example throughout this blog post to have something many are familiar with.

Would you like to separate the bronze, silver and gold layers into separate items – or do you want to group them into one lakehouse or warehouse? Or a mix?

✅ Pros of Item Separation in Microsoft Fabric

  • Clear Layer Boundaries – Enforces architectural clarity between Bronze, Silver, and Gold layers. Minimizes accidental data leakage between stages.
  • Enhanced Security & Governance – Enables more granular control over access (e.g., only data engineers consume Bronze; analysts consume Gold).
  • Improved Discoverability – Easier for consumers to find the right data at the right stage. Promotes documentation and ownership via dedicated spaces, e.g. if you want to separate ownership of the bronze/silver layers for source-aligned data products, while the gold layer provides consumer-aligned data products. Improves discoverability (and lineage) in Purview, as items are best supported there today.
  • Better Modularity & Scalability – Each layer can evolve independently (e.g., switching ingestion logic in Bronze without touching Gold). Encourages a microservice-style approach where each layer is self-contained.
  • Supports Interoperability – Enables integration with various tools and personas by decoupling processing stages.

⚠️ Considerations of Item Separation in Microsoft Fabric

  • Increased Complexity – More items to manage. Requires well-defined conventions and documentation.
  • Operational Overhead – May lead to duplication of effort (e.g., repeated metadata or pipeline setup across layers). Monitoring and orchestration across items become more complex.
  • Risk of Over-Engineering – Not all projects need full item separation; using it universally can slow down small teams. Risks “compliance theater” without real added value if not paired with strong practices.
  • Dependency Management – Inter-layer dependencies may become fragile if naming, versioning, or schema tracking isn’t standardized.

Use it when: You need strong governance, multiple teams, or enterprise-scale structure.
Skip it when: You’re working fast, solo, or on smaller, agile projects.
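
To make the medallion example concrete: if Bronze and Silver live in separate lakehouse items, promotion between them is simply a read from one item and a write to another. Below is a minimal sketch of what that could look like in a Fabric notebook; the workspace, lakehouse and table names are placeholders I made up, so adapt the OneLake paths to your own setup.

```python
# Minimal sketch: promoting data from a separate Bronze item to a separate
# Silver item in a Fabric notebook. `spark` is the session the notebook
# provides; all workspace/lakehouse/table names below are placeholders.
from pyspark.sql import functions as F

bronze_path = "abfss://Sales@onelake.dfs.fabric.microsoft.com/LH_Bronze.Lakehouse/Tables/orders_raw"
silver_path = "abfss://Sales@onelake.dfs.fabric.microsoft.com/LH_Silver.Lakehouse/Tables/orders"

raw = spark.read.format("delta").load(bronze_path)    # read from the Bronze lakehouse

clean = (raw.filter(F.col("order_id").isNotNull())    # light cleansing on the way to Silver
            .dropDuplicates(["order_id"])
            .withColumn("processed_at", F.current_timestamp()))

clean.write.format("delta").mode("overwrite").save(silver_path)  # write to the Silver lakehouse
```

The nice part of this pattern is that the Silver notebook only needs read access to the Bronze item, which keeps the layer boundary explicit.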

How should you structure your workspaces in Microsoft Fabric?

When you have made your choices on item separation, you are ready to consider your workspace separation, as the item separation also (naturally) enables workspace separation.

Let’s use the medallion architecture as an example again.

Do you want to have all your layers in one workspace, or separate them across workspaces, or a mix?

✅ Pros of Workspace Separation in Microsoft Fabric

1. Self-Contained Environments – Encapsulation of logic and data for each team. Reduced risk of accidental interference across unrelated areas. Easier testing and deployment of updates in isolation.
2. Improved Discoverability – Easier to navigate than a massive, centralized workspace. Reduces cognitive load for analysts and consumers. Improves discoverability in Purview.
3. Stronger Governance & Access Control – Define permissions on a need-to-know basis using the workspace for different development teams, with a more granular option for access control on the item level as well if needed. Ensure compliance by segmenting sensitive data (e.g. some bronze data might be sensitive compared to the gold layer).
4. Domain-Oriented Ownership – Teams can own, maintain, and evolve their domain-specific workspaces independently. Reduces bottlenecks by avoiding centralized gatekeeping. Encourages accountability and autonomy.
5. Better Observability – Errors, performance, and usage can be scoped per workspace. Easier to trace lineage and operational issues within contained environments.

⚠️ Cons of Workspace Separation in Microsoft Fabric

1. Cross-Workspace Dependencies Can Be Painful – Sharing datasets between workspaces can involve more manual effort or pipeline complexity. Lack of strong cross-workspace lineage tracking increases the risk of versioning issues.
2. Coordination Overhead – Schema changes or upstream updates must be communicated across teams. (Should you consider data product contracts?) Governance, naming conventions, and SLAs must be actively enforced.
3. Risk of Fragmentation – Workspaces can become inconsistent in structure, naming, and metadata practices. Onboarding new users becomes harder if standards vary widely.
4. Initial Barrier to Entry – Setting up multiple workspaces might feel like overkill. Single-workspace setups may be better for rapid prototyping or agile development.

Use when: You have multiple domains or teams, need tight access control, or want to scale governance.
Avoid when: You’re prototyping, working with a small team, or need fast iteration across datasets.

*A consideration for workspace separation not discussed in this article is CI/CD.
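
If you do land on a workspace-per-layer setup, the creation itself can be scripted so your naming conventions are applied consistently. Here is a hedged sketch against the Fabric REST API’s create-workspace endpoint; the token acquisition, workspace names and capacity id are placeholders, not a finished tool.

```python
# Hedged sketch: creating one workspace per medallion layer via the Fabric
# REST API. Token, display names and the capacity id are placeholders.
import requests

TOKEN = "<Entra ID bearer token with a Fabric scope>"   # e.g. acquired via azure-identity
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

for layer in ("Bronze", "Silver", "Gold"):
    body = {
        "displayName": f"Sales - {layer}",   # your naming convention goes here
        "capacityId": "<capacity-guid>",     # optional: assign a capacity up front
    }
    resp = requests.post("https://api.fabric.microsoft.com/v1/workspaces",
                         headers=HEADERS, json=body)
    resp.raise_for_status()
    print(layer, "->", resp.json().get("id"))
```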

How should you structure your Domains in Microsoft Fabric?

When you have your workspace plan ready, you can take a look at domains.

Do you want to separate your domains on business use case alone, on technical teams, on data source, or a mix?

If you use a data mesh approach, you might want each domain to own the entire data flow from bronze to gold.

Suppose you want to enable your business domains, but still want to take advantage of some centralization in making the different data layers available. In that case, you might want to look at a domain separation as shown above.

✅ Pros of Domain Separation in Microsoft Fabric

1. Reflects Business Structure – Organizing data by domain mirrors your org chart. This reduces confusion and aligns data strategy with business operations.
2. Clear Ownership and Accountability – Each domain owns its data products. This fosters a culture of accountability and ensures data is maintained by those who understand it best.
3. Decentralized Policy Enforcement – Domains can enforce their own data quality, security, and compliance rules within their boundary. This enables scalability without relying solely on a central team.
4. Improved Governance and Observability – Smaller, domain-focused scopes are easier to govern. Monitoring usage, managing permissions, and auditing access becomes simpler and more meaningful.
5. Autonomy and Speed – Teams can build and release data products at their own pace. They don’t need to wait on a centralized team to deploy pipelines or models.

⚠️ Cons of Domain Separation in Microsoft Fabric

1. Risk of Silos – If domains don’t collaborate or share standards, data silos can (re-)emerge inside of Fabric. Interoperability must be intentionally designed.
2. Duplication of Effort – Multiple teams might build similar models or transformations independently. Without coordination, this wastes time and creates inconsistency.
3. Tooling and Training Overhead – Each domain team needs enough skill and support to manage its own pipelines, models, and compliance needs. This requires investment.

Use it when: Your org has distinct teams/domains and you want scalable ownership.
Avoid it when: You’re early in your journey or lack governance maturity.
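
Domains can be managed programmatically too, which helps keep the domain-to-workspace mapping consistent as you scale. The sketch below uses the Fabric admin domain endpoints as I understand them from the public docs; treat the ids, names and the `workspacesIds` payload field as assumptions to verify against the current API reference.

```python
# Hedged sketch: creating a domain and assigning workspaces to it via the
# Fabric admin REST API. All ids, names and the token are placeholders.
import requests

TOKEN = "<Entra ID bearer token with Fabric admin permissions>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
BASE = "https://api.fabric.microsoft.com/v1/admin/domains"

# 1) Create a domain mirroring a business area
resp = requests.post(BASE, headers=HEADERS, json={"displayName": "Sales"})
resp.raise_for_status()
domain_id = resp.json()["id"]

# 2) Assign the business area's workspaces to the new domain
assign = requests.post(
    f"{BASE}/{domain_id}/assignWorkspaces",
    headers=HEADERS,
    json={"workspacesIds": ["<bronze-ws-guid>", "<silver-ws-guid>", "<gold-ws-guid>"]},
)
assign.raise_for_status()
```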

How should you structure your Capacities in Microsoft Fabric?

Then finally, let’s take a look at your choices when it comes to Fabric capacities.

Do you want to use capacity separation to mirror your business domains, technical teams, environments or a mix?

If your organization requires separate cost management across business domains, you probably want to mirror the capacities and the domains.

Another separation you might consider, instead of or in combination with the domain separation, is to separate capacities for the different environments. This can help protect performance: if you are taking advantage of federated development teams, you run a higher risk of someone creating a crazy dataflow that kills the entire capacity. Separating development and production can therefore be wise. It is also a way to maximise cost savings, as the development capacity does not need to be on 24/7 and can be scaled up and down as needed.
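
Since F-SKU capacities are Azure resources, pausing the development capacity outside working hours is scriptable. Below is a hedged sketch against the Azure Resource Manager suspend action for the Microsoft.Fabric provider; the subscription, resource group, capacity name and api-version are placeholders to verify against the current docs.

```python
# Hedged sketch: pausing a development capacity through Azure Resource
# Manager. Subscription, resource group, name and api-version are placeholders.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

capacity = ("https://management.azure.com/subscriptions/<sub-id>"
            "/resourceGroups/<rg>/providers/Microsoft.Fabric/capacities/<dev-capacity>")

resp = requests.post(f"{capacity}/suspend?api-version=2023-11-01",
                     headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()   # call the /resume action when the workday starts again
```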

If your organisation exists across regions, you might also want to consider separating your environments based on different capacity regions. Be aware that it is currently not possible to move Fabric items across regions without a support ticket to Microsoft. Take some time to consider your needs and use cases before splitting.

✅ Pros of Capacity Separation in Microsoft Fabric

1. Performance Isolation – High-demand domains won’t be bottlenecked by low-priority processes elsewhere. Development efforts won’t throttle production environments.
2. Cost Transparency & Accountability – Clearer tracking of compute and storage consumption per business domain/unit or team. Easier chargeback/showback models for budgeting or internal billing. Data-driven capacity planning (who needs more/less and why).
3. Optimized Scaling – Critical business domains can be scaled up. Lightweight domains can be throttled or moved to shared capacity.

⚠️ Cons of Capacity Separation in Microsoft Fabric

1. Potential Resource Waste – Small or inactive domains may not fully utilize their assigned capacity; wasted potential if workloads don’t justify a dedicated capacity. Teams may leave unused resources running (e.g., long-lived Spark jobs) that go undiscovered within the separate domains.
2. More Complex Governance – Domain-level cost and performance management requires clear policies for scaling, shutting down idle jobs, prioritisation and governance around assigning capacity (shared vs dedicated). Increased administrative overhead to right-size environments.

Use it when: you need performance isolation between teams or layers, want cost tracking per domain or department, domains have high or variable workloads, or you have governance in place for managing capacity.

Avoid it when: workloads are small or early-stage, teams lack cost or performance monitoring maturity, shared capacity meets your needs, or you want to minimize setup and management overhead.


Hope you found this article useful!

Stay updated on new blog posts and videos by subscribing to @GuyInACube on YouTube, following me on LinkedIn, or subscribing to the newsletter for this blog below to get the newest updates!

Start Your Data Governance Journey with Microsoft Purview: The complete guide [videos & descriptions] — 11. Jun 2025

Start Your Data Governance Journey with Microsoft Purview: The complete guide [videos & descriptions]

Feeling unsure about how to begin with Microsoft Purview? You’re in the right place! 💙 🩵

This blog post will be regularly updated as I record new videos and as new features are released. Here, you’ll find a step-by-step guide along with a feature overview, with videos and text for each feature in a logical sequence. This will hopefully make it easier for you to discover exactly what you need! 🤩

I previously created a mini-series on my YouTube Channel, @DataAscend, but I have since joined Adam and Patrick on @GuyInACube! So, this article will contain a mix of videos from both channels.

Stay updated on new videos by subscribing to @GuyInACube on YouTube, following me on LinkedIn, or subscribing to the newsletter for this blog below to get the newest updates!

Jump to:

  1. Purview Course: Get started with Purview Step-by-Step
  2. What is Microsoft Purview? An introduction!
  3. Create your first Purview instance!
  4. Upgrade to New Microsoft Purview
  5. How to Connect and Scan Your Fabric Data in Purview?
  6. How do you structure your Data Map?
  7. What is the difference between the Data Map and the Unified Catalog?
  8. How to Create a Business Domain/Governance Domain in Microsoft Purview
  9. What is the concept of a Data Product, and why should you care?
  10. How to Create a Data Product in Microsoft Purview?
  11. Set up Data Quality on Your Fabric Data Products in Purview

Purview Course: Get started with Purview Step-by-Step

Check out the new Purview Course on Guy in a Cube!

What is Microsoft Purview? An introduction!

Microsoft Purview is a data governance and compliance tool that helps organizations discover, classify, manage, and protect data across cloud, on-premises, and SaaS environments.

It is divided into three main areas: Governance 🏛️, Compliance ✅, and Security 🔒.

From the data perspective, we have previously focused mostly on the governance solutions: Data Map and Unified Catalog – but now Purview Security and Compliance also support data (and not only the more traditional information management). So, you will probably want to take advantage of all the solutions to truly ensure quality, trust and compliance for your data!

Create your first Purview instance!

🥇 The very first step if you do not already have a Purview instance in your tenant. Let’s set it up together!

Upgrade to New Microsoft Purview

Already have an existing Purview account? The “old” one? This video shows how to upgrade to the latest Microsoft Purview solution and access its new features.

How to Connect and Scan Your Fabric Data in Purview?

Learn how to register your Fabric data in Microsoft Purview by creating collections, connections, and scans.

For more details on what to think about when choosing the structure of your Data Map, check out the video below.

How do you structure your Data Map?

Creating a well-organised Data Map in Microsoft Purview isn’t just about setting it up – it’s about making the right decisions on Domains and Collections structure. But how do you get it right?

Here’s what you need to consider when planning your structure:

✅ Access Levels & Control – Ensure the right people have the right permissions.
🔒 Separation – Maintain clear boundaries for better management.
🛠️ Development, Test & Production Environments – Keep your workflows organised and efficient.
💡 And more!

What is the difference between the Data Map and the Unified Catalog?

Both are essential for data governance (!), organising your data, and ensuring compliance within your organisation. But how do they differ, and how should you approach structuring them effectively?

I like to divide the data catalog part of Purview into two:

  1. Physical data estate with your Data Map and Data Assets
  2. Your logical data estate with Governance Domains and Data Products

But how should you structure them? Take a look at the video below:

How to Create a Business Domain/Governance Domain in Microsoft Purview

Overview: This video explains how to set up a governance domain for better data organization and governance. You can then group your data products into business domains later.

Topics Covered:

  • Step-by-step guide to creating a business domain.

In this video I call it a “Business Domain”, but Purview has since renamed it to Governance Domain, which I think is more fitting. You can then decide for yourself whether you want to separate your domains into Business Domains, Governance Domains, Data Domains, Technical Domains, etc. This will depend on your organizational setup.

What is the concept of a Data Product, and why should you care?

Before we dive into the Data Product concept in Purview – what is a Data Product?

How to Create a Data Product in Microsoft Purview?

Overview: Discover how to create data products within Microsoft Purview to manage and catalog data more effectively.

Topics Covered:

  • 🧩 Defining a Data Product and linking it to a 📁 Business Domain.
  • 🔗 Connecting your physical Data Assets to your 🛍️ Data Product.
  • 📃 Setting up terms of use for your Data Product and Data Assets.
  • 🔐 Setting up Request Access Policies for your Data Product.

The Data Assets that we link to the Data Product are the physical data assets that we scanned in the previous step.

Set up Data Quality on Your Fabric Data Products in Purview

This video covers how to monitor data quality on your Fabric data products within Microsoft Purview.

Note: This video shows scanning with the Managed Identity as the authentication for the scan set up earlier. That will not work if you want to run DQ on your Fabric sub-level items, like the tables in a lakehouse. To do this, you must use a service principal as the authentication when you run the ordinary scan, and the SP needs Contributor access to the workspace for this to work. See the Microsoft documentation on how to set up SP authentication.
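
If you want to script that access grant, a hedged sketch against the Fabric workspace role-assignment endpoint could look like the following; the workspace id, the service principal’s object id and the token are placeholders, so verify the request shape against the current API reference.

```python
# Hedged sketch: granting the scanning service principal Contributor on a
# workspace via the Fabric REST API. All ids and the token are placeholders.
import requests

TOKEN = "<Entra ID bearer token with a Fabric scope>"
WORKSPACE_ID = "<workspace-guid>"

body = {
    "principal": {"id": "<sp-object-id>", "type": "ServicePrincipal"},
    "role": "Contributor",   # required for DQ runs on sub-level items
}
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/roleAssignments",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
resp.raise_for_status()
```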

Topics Covered:

  • 🔗 Setting up data quality connection for your data governance domain.
  • 🛠️ Setting up data quality rules and profiling for your data assets.
  • ▶️ Running the data quality and profiling rules, and 📊 monitoring the outcome.
  • 📌 Looking into actions of your resulting Data Quality and Profiling runs, ✅ assigning tasks and actions to Data Stewards or other roles in your organization to improve the 🧹 Data Quality

A new and updated video is on the way. Subscribe to @GuyInACube on YouTube, follow me on LinkedIn or subscribe to the newsletter for this blog below to get the newest updates!

Hope you found this article helpful!

How to set up FUAM – Fabric Unified Admin Monitoring — 22. Apr 2025

How to set up FUAM – Fabric Unified Admin Monitoring

FabCon Vegas showcased fantastic new features! One noteworthy solution, although not an official Microsoft product, is the Fabric Unified Admin Monitoring (FUAM) solution accelerator. This accelerator has been available for a couple of months, but it has flown under the radar, I feel. So let’s give it some attention!

  1. A Special Thanks
  2. What is FUAM?
  3. Why use FUAM?
  4. Watch the Video
  5. Where to start?
  6. Now what?

A Special Thanks

I want to extend a big shoutout to Gellert Gintli and Kevin Thomas for their work in bringing this tool to the Fabric community as part of the Fabric toolbox. This means that anyone can take advantage of it and even provide input and feedback on how to evolve it. Love. It. 😎

What is FUAM?

Super-duper-short: FUAM is a solution that enables holistic monitoring on top of Microsoft Fabric, built completely with Fabric capabilities, extracting data on (and giving you insights into):

  • Tenant Settings
  • Delegated Tenant Settings
  • Activities
  • Workspaces
  • Capacities
  • Capacity Metrics
  • Tenant metadata (Scanner API)
  • Capacity Refreshables
  • Git Connections

You can get an overview or a deep dive into specific artefacts. It comes with prebuilt reports, but you can also customise these and combine them with other data to better serve your needs – obviously, since you get access to all the Fabric items used to build the solution.

Why use FUAM?

Built-in solutions like the Capacity Metrics App in Fabric definitely give you some great insights into how your capacities are performing. FUAM takes it a step further, helping you with things like:

  • Capacity optimisation – identify outliers and improve them
  • Admin settings management – what settings are enabled in your Fabric Admin Portal, and when did they change?
  • Identify unused workspaces and items – clean up unused workspaces and items!
  • Report activities – what reports are most used and on what capacities?
  • Best practice analyser – are your developers following the best practices?
  • Monitor users and their activities
  • And more!

Watch the Video

As always, I couldn’t resist creating a video to showcase this feature. It’s just that cool! Below the video, I added some more input/details that can be of use when you implement FUAM for your tenant.

Head over to YouTube and follow @GuyInACube for more insights!

Where to start?

It is SO EASY, as a great how-to has been created where you can read up on the underlying architecture and the step-by-step implementation process:

In the video below, I started out prepping the prerequisites with the following (a smoke test for the service principal follows after the list):

  • Created a new workspace (FUAM)
  • Created a new service principal by setting up an Enterprise App Registration
  • Created a new client secret for that service principal
  • Saved the secret in my Key Vault. You don’t have to save it in a Key Vault, but you must save it somewhere safe. Learn more about Key Vault and Fabric in Patrick’s video:
  • Added the service principal to a security group
  • Added that security group to these two tenant settings inside the Fabric Admin Portal:
    • Service Principals can use Fabric APIs
    • Service Principals can access read-only admin APIs
  • Created a Microsoft Fabric Capacity Metrics App (Apps → Get apps → Microsoft Fabric Capacity Metrics)
  • Renamed the workspace connected to the Metrics App to “FUAM Capacity Metrics”
  • Connected the workspace to a Fabric or Premium capacity
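
Once those two tenant settings are enabled, a quick smoke test confirms the service principal can actually reach the read-only admin APIs. This is a hedged sketch using the client-credentials flow against the Power BI admin endpoint; the tenant id, app id and secret (which you would fetch from your Key Vault) are placeholders.

```python
# Hedged smoke test: can the FUAM service principal call a read-only admin
# API? Tenant/app ids and the client secret are placeholders.
import msal
import requests

app = msal.ConfidentialClientApplication(
    client_id="<app-id>",
    client_credential="<client-secret>",   # fetch from Key Vault in a real setup
    authority="https://login.microsoftonline.com/<tenant-id>",
)
token = app.acquire_token_for_client(
    scopes=["https://analysis.windows.net/powerbi/api/.default"])

resp = requests.get("https://api.powerbi.com/v1.0/myorg/admin/groups?$top=1",
                    headers={"Authorization": f"Bearer {token['access_token']}"})
print(resp.status_code)   # 200 means the admin API accepted the principal
```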

Now what?

DIVE into all the awesome insights and consider expanding, customising, and optimising the FUAM framework to better fit your needs! Since everything is inside Fabric, you have control. 😎

Start Your Data Governance Journey with the new Microsoft Purview: A Step-by-step guide — 1. Oct 2024

Start Your Data Governance Journey with the new Microsoft Purview: A Step-by-step guide

An updated version of this guide is available here:

https://data-ascend.com/2025/06/11/start-your-data-governance-journey-with-microsoft-purview-the-complete-guide-videos-descriptions/


Microsoft Purview has gotten a serious makeover. It is not only a Data Catalog anymore; it is a data governance tool that includes data security, data cataloging, metadata management, data quality, data estate health monitoring and more.

I have created a mini-series on how to get started with building your data governance domains, and data products by scanning your Fabric data in Purview on my YouTube channel. This blog post summarizes the mini-series with some added descriptions.

Stay updated on new videos by subscribing to my YouTube Channel:

There are still features in Purview that are in preview, and there is a lot of development ongoing – exciting! But that also means that some buttons and names may have changed by the time you read this tutorial.

Jump to:

  1. Upgrade to New Microsoft Purview
  2. How to Register Your Fabric Data in Purview
    1. Scope your Fabric scan in Microsoft Purview
  3. How to Create a Business Domain/Governance Domain in Microsoft Purview
  4. How to Create a Data Product in Microsoft Purview
  5. Set up Data Quality on Your Fabric Data Products in Purview

1. Upgrade to New Microsoft Purview

Overview: This video shows how to upgrade to the latest Microsoft Purview solution and access its new features.

2. How to Register Your Fabric Data in Purview

Overview: Learn how to register your Fabric data in Microsoft Purview by creating collections, connections, and scans.

I like to divide the data catalog part of Purview into two:

  1. Physical data estate with your Data Map and Data Assets
  2. Your logical data estate with Governance Domains and Data Products

In this video I look at how you can set up your Data Map and scan your physical data assets in Fabric.

Topics Covered:

  • Creating a new collection.
  • Setting up a connection to data sources.
  • Running scans to discover and register data assets.

Also check out the “Scope your scan” video below. This feature was released after I created the video. Now you don’t have to scan your entire Fabric Ecosystem, but can choose to scan based on workspaces.

Scope your Fabric scan in Microsoft Purview

Learn how to scope your data scans by workspaces to make your Purview scans more targeted and efficient.

3. How to Create a Business Domain/Governance Domain in Microsoft Purview

Overview: This video explains how to set up a governance domain for better data organization and governance. You can then group your data products into business domains later.

Topics Covered:

  • Step-by-step guide to creating a business domain.

In this video I call it a “Business Domain”, but Purview has since renamed it to Governance Domain, which I think is more fitting. You can then decide for yourself whether you want to separate your domains into Business Domains, Governance Domains, Data Domains, Technical Domains, etc. This will depend on your organizational setup.

4. How to Create a Data Product in Microsoft Purview

Overview: Discover how to create data products within Microsoft Purview to manage and catalog data more effectively.

Topics Covered:

  • Defining a Data Product and linking it to a Business Domain.
  • Connecting your physical Data Assets to your Data Product.
  • Setting up terms of use for your Data Product and Data Assets.
  • Setting up Request Access Policies for your Data Product.

The Data Assets that we link to the Data Product are the physical data assets that we scanned in the previous video.

5. Set up Data Quality on Your Fabric Data Products in Purview

Overview: This video covers how to monitor data quality on your Fabric data products within Microsoft Purview.

Topics Covered:

  • Setting up data quality connection for your data governance domain.
  • Setting up data quality rules and profiling for your data assets.
  • Running the data quality and profiling rules, and monitoring the outcome.
  • Looking into actions of your resulting Data Quality and Profiling runs, assigning tasks and actions to Data Stewards or other roles in your organization to improve the Data Quality.

Note: For Purview to be able to scan the data in your workspace, the Purview service principal needs to be assigned Contributor access to run the data scan.

Hope you found this article helpful!

Useful links:

https://learn.microsoft.com/en-us/purview/purview-portal

https://learn.microsoft.com/en-us/purview/whats-new

How do you set up your Data Mesh in Microsoft Fabric? — 8. Jan 2024

How do you set up your Data Mesh in Microsoft Fabric?

Coming from the world of Microsoft analytics, I got curious as to why Microsoft chose to go with Fabric as the name of the newly released analytical solution Microsoft Fabric. Looking into this, I came across the architectural concept of Data Fabric. I had been working with Data Mesh for a while, but the Fabric architecture was something I hadn’t really heard about before – meaning, of course, that I needed to know more.

I was going to write a short blog about the differences between Fabric Architecture and Data Mesh and then see how these two architectures look inside Microsoft Fabric. Turns out there is too much to say, so I had to turn this into a mini-series.

The second out is Data Mesh in Microsoft Fabric!
So, let’s have a look at how you implement a data mesh architecture in Fabric.

Stay tuned for more content on Fabric Architecture, and how Data Mesh and Fabric Architecture can be set up in Microsoft Fabric!

  1. What is Data Mesh Architecture?
  2. How do you handle your data products in Microsoft Fabric?
    1. Accessibility
    2. Interoperability
    3. Trusted
    4. Reusability
    5. Findable
    6. Limitations
  3. How do you handle your data domains in Microsoft Fabric?
    1. What are Domains in Microsoft Fabric?
    2. Limitations
  4. How do you handle your self-serve data platform with Microsoft Fabric?
  5. How do you set up data governance with data mesh in Microsoft Fabric?
    1. Federated Governance
    2. Centralized Governance
    3. Limitations
  6. Summary

What is Data Mesh Architecture?

The first post in this mini-series answers this question, looking into what data mesh is, different topologies, challenges and benefits. If you want to read up on the data mesh approach before looking into how this applies to Microsoft Fabric, you can take a look at that blog post here:

In short, the data mesh is an approach to how you manage your data. Data Mesh is not only an architectural and technical approach. To be successful, you also need to change your organisational processes and structure. Having both IT and the business onboard is therefore a crucial success factor when implementing data mesh in your organisation.

The data mesh approach comes from the domain-driven design and bounded context way of thinking. We find these concepts in the 4 components that make up data mesh:

  • Domains
  • Products
  • Federated Governance
  • Self-Service Data Platform

How do you handle your data products in Microsoft Fabric?

By introducing product thinking into the world of data, you get Data Products. A data product is a data asset that should be

  • Accessible
  • Interoperable
  • Trusted
  • Reusable
  • Findable

The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers.

So, what would a data product look like in Fabric? Sorry to disappoint, but that depends. It depends on how you define a data product in your organisation. Is it a table, a SQL view, a Power BI semantic model, a dataflow or even a Power BI report? Or can it be all of these things?

Let’s have a look at how you can set up your data product in Fabric:

Accessibility

A data product is made accessible through one or more output ports for data consumers. In Fabric there are multiple ways of distributing your data products, but again – it depends on what your data product looks like.

For accessibility outside of Fabric, you can use a SQL endpoint or the underlying ADLS Gen 2 connection that your OneLake is based on.

For internal accessibility inside of Microsoft Fabric you can use a dataflow endpoint, a semantic model connection or an internal shortcut. Or it can simply be accessible inside a workspace within one domain, where other domains can connect to it and load it using their preferred integration tool within their domain.
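
As a concrete illustration of the external route, here is a hedged sketch that lists the files behind a data product over OneLake’s ADLS Gen2-compatible endpoint; the workspace, lakehouse and table names are made up for the example.

```python
# Hedged sketch: reaching a data product from outside Fabric through OneLake's
# ADLS Gen2-compatible endpoint. Workspace/lakehouse/table names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("Sales")   # the workspace acts as the filesystem

# List the files backing the gold-layer Delta table that is our data product
for path in fs.get_paths("LH_Gold.Lakehouse/Tables/orders"):
    print(path.name)
```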

Interoperability

A data product is interoperable through its metadata, which also holds some of the standardization of the data product, such as the schema and semantics. Below is a screenshot from Fabric of a dataset with a location, labelled endorsement, refresh date and sensitivity label.

Trusted

The metadata also enforces some of the governance over the data product through its ownership and the security or rights of use that ensure our data product is trusted. In addition, the observability of the data product provides us with information about the SLA, timeliness and quality of the data product. This is all part of how we can trust our data product.

In Microsoft Fabric, the refresh log provides us with the observability of the data product. For the SLA and data quality, there is no documentation possibility inside of Microsoft Fabric, unless you buy the data catalog Purview as an additional tool that integrates with Microsoft Fabric. There you can document your data products. Purview could also help ensure that a data product is findable through search, a point I return to below. Still, as Purview is an external tool that requires an additional licence, it is not considered further in this blog.

Reusability

In Fabric, you can reuse a table, dataflow or dataset as needed to develop other data products. An example is the semantic link, through which a semantic model can now be queried inside a notebook in your data lakehouse.
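
As a hedged illustration of that reuse, the Semantic Link library (sempy) lets a notebook read tables and evaluate measures from a semantic model; the model, table and measure names below are invented for the example.

```python
# Hedged sketch: reusing a semantic model inside a Fabric notebook via
# Semantic Link. Model, table and measure names are placeholders.
import sempy.fabric as fabric

# Pull a model table into a dataframe for further transformation
orders = fabric.read_table("Sales Model", "Orders")

# Evaluate an existing measure, grouped by a column from the model
revenue = fabric.evaluate_measure("Sales Model", "Total Revenue",
                                  groupby_columns=["Customer[Country]"])
print(revenue.head())
```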

Findable

One way to better manage a data product in Microsoft Fabric could be to take advantage of endorsement labelling, where you can mark items as “Endorsed” or “Certified”. For instance, to determine whether an item is a data product, you can rely on the “Certified” label in Microsoft Fabric. However, it’s important to note that this restricts the label’s use for other items in Fabric, so it is essential to ensure that the label is retained specifically for this purpose.

Limitations

In terms of data products, there are a few features that could enhance the Fabric platform:

  1. Tagging and Categorization: Introducing a capability to tag or categorize data items within Fabric would allow users to easily label their data as a certified Data Product. This would enable efficient organization and retrieval of specific datasets.
  2. Journey Tracking and discoverability: It would be beneficial to have a feature in Fabric that tracks the journey of a data product throughout its lifecycle. This way, users can easily monitor and trace the movement of their data items within the platform.
  3. Documentation and Restrictions: Providing more comprehensive documentation for data products is crucial. Users should have access to clear instructions on how to utilize and connect to the data, as well as any associated restrictions on usage. This information will help users leverage the data effectively and ensure compliance with any contractual obligations.
  4. Data Product Contract Specification: Introducing a data product contract specification feature in Fabric would be advantageous. This would allow users to define contractual terms for their data products. The contract could specify details such as permitted usage, data access restrictions, and any specific requirements for utilizing the data.

By incorporating these features, Fabric could offer a more robust and user-friendly experience for managing data products.

How do you handle your data domains in Microsoft Fabric?

A domain is a grouping of data, technology, teams, and people that work within the same analytical realm, usually within a business area. Examples of domains could be the organizational structure, like Sales, Marketing, Finance, and HR. It can also be more fine-grained or connected to a value chain, like Orders, Production, and Distribution. All of this depends on your organization and how the domains would serve you and your business in the best way. The key thing here is that each domain can be autonomous and have ownership over the data products that naturally belong to their domain.

Since each domain is autonomous, they can develop their data products and govern these as desired. Still, the governance should be aligned with the centralized data governance. I will come back to this. The ETL and data modelling are also handled locally by the domain while taking advantage of the self-service data platform provided through the central organization.

What are Domains in Microsoft Fabric?

To understand how you can set up your governance in Fabric, let us first take a look at the new functionality that was added to Fabric, Domains.

The domains introduced in Fabric are a way to support the data mesh architecture. You can assign the workspaces owned by a business area to one domain. So all your sales-related workspaces could be in your Sales domain, while your marketing-related workspaces could be inside the Marketing domain.

Microsoft Fabric is built from the Power BI developer perspective, and the specialist tools that Microsoft has brought into Fabric, such as Synapse and ADF, are now more available to the generalist. This enables domains to become more independent of technical competence and more self-sufficient, which drives the efficiency of each domain.

Limitations

The management of each domain could have been more detailed inside the admin portal of Microsoft Fabric. Today you can only control which workspaces are placed in which domains and who can be the admin of these. It would have been interesting if you could set up policies for each domain and have more possibilities to govern this.

How do you handle your self-serve data platform with Microsoft Fabric?

The next component of the data mesh approach is the self-serve Data Platform. The self-serve data platform is a centralized service that serves the domains with their need for infrastructure, storage, security, access management, ETL pipelines and more. Some call this data infrastructure as a platform, to highlight that the self-serve platform should serve the infrastructure, i.e. all the technical components and their integrations required to build data products.

This simplicity in distributing Microsoft Fabric as a data platform as a service is one of the biggest strengths. As it is a SaaS it provides you with all the necessary infrastructure and integrations to start building your data products.

By designating a Microsoft Fabric domain for each data mesh domain, organizations effortlessly extend the self-serve capabilities to every corner of their ecosystem. This inclusivity means that every Fabric capacity holder gains unfettered access to the diverse experiences offered by Fabric, empowering them to develop their individualized data products.

How do you set up data governance with data mesh in Microsoft Fabric?

To make each domain autonomous in its data product development and management, a federated governance model is used. The federation of governance is a way of deferring responsibilities to enable scalability. It promotes independence and accountability. This way, the domains can govern their data products in a way that is effective and makes sense to them.

Still, there should be some centralized governance providing a set of standards, and best practices, setting the necessary boundaries, and being a center of excellence, providing expertise to the federated domains.

In this article, I will focus on how data governance fits into the data mesh architecture. For those interested in the specific governance features of Microsoft Fabric, a blog post on setting up data governance within this framework is available.

How do you set up your Data Governance in Microsoft Fabric?

Federated Governance

Domains enable federated governance in Microsoft Fabric. A new role has also been created with this: the domain admin, who can delegate responsibilities to contributors.

Through your workspace roles, you manage who can work with what from a technical developer’s perspective. To see the different capabilities that the different roles have, refer to the Microsoft documentation here.

Centralized Governance

Fabric in itself will enable and enforce some standardisation and control, as there is a finite number of ways to develop your products and make them accessible.

The Purview hub within Microsoft Fabric emerges as a cornerstone for centralized computational governance. This hub offers a level of centralization that enables a comprehensive overview of domains and data products, allowing stakeholders to assess the status of each domain. It serves as a control centre, facilitating both a holistic perspective and the ability to drill down into individual domains for detailed evaluations.

In Microsoft Fabric you can also take advantage of some built-in policies such as the Data Loss Prevention policies and labelling that is further described in the blog linked above.

Limitations

While Microsoft Fabric inherently provides a level of standardization and control due to its finite number of development approaches and accessibility options, there are limitations. Notably, the platform currently lacks the capability to establish standards and patterns for solution development. More possibilities and granular control levels to set up access policies and development policies would be interesting.

Another example where Microsoft Fabric falls short is Master Data Management. There is no integrated solution enabling functionalities such as survivorship and the creation of a golden record, necessitating reliance on external tools.

Summary

In summary, while there are limitations in the Microsoft Fabric setup when implementing a data mesh architecture, I believe that Microsoft Fabric significantly enables some of the most crucial features of data mesh, particularly infrastructure as a service from a central team and the inherent enforcement of central governance through the limited number of methods for the domains to develop their products. While additional levels of control and options to monitor the data product journey would have been desirable, I am currently of the opinion that Microsoft Fabric comes pretty close.

Hope you found this article helpful!


Is Data Mesh your enabler, or is it just creating a data mess? — 31. Oct 2023

Is Data Mesh your enabler, or is it just creating a data mess?

Coming from the world of Microsoft analytics, I got curious as to why Microsoft chose to go with “Fabric” as the name of the newly released analytical solution, Microsoft Fabric. Looking into this, the architectural concept of Data Fabric became more relevant. I had been working with Data Mesh for a while, but the Fabric architecture was something I had not heard that much about before. That, of course, meant I needed to know more.

The plan was to write a short and easy blog about the differences between Fabric Architecture and Data Mesh and then see how these two architectures look inside Microsoft Fabric. Turns out there is too much to say, so I had to turn this into a mini-series. And first out, we have the Data Mesh architecture!

So, let’s have a look at what the Data Mesh is. What are the benefits of this architecture, and are there any limitations?

Stay tuned for more content on Fabric Architecture, and how Data Mesh and Fabric Architecture can be set up in Microsoft Fabric!

  1. What is Data Mesh Architecture?
    1. Domains
    2. Data Products
    3. Federated Governance
    4. Self-serve Data Platform
  2. Why is the Data Mesh approach gaining traction?
  3. What are the different approaches you can have for Data Mesh?
    1. Fully Federated Domain Topology
    2. Governed Domain Topology
    3. Partially Federated Domain Topology
  4. What are the main benefits of a Data Mesh Architecture?
    1. Autonomy
    2. Scalability
    3. Closer collaboration between business and technology
  5. What are the main challenges of Data Mesh?
    1. Risk of creating isolated data hubs
    2. Not serving the organization’s data model
    3. Missing a harmonized strategy
  6. So, is Data Mesh your enabler, or is it just creating a data mess?

What is Data Mesh Architecture?

Data mesh is both a data management approach and a data architecture. To be successful, it is not enough to only think about the architecture and technology, you might also need to change your organizational processes and structure. Having both IT and the business onboard is therefore a crucial success factor when implementing data mesh in your organization. In some ways, the data mesh approach brings data management and software architecture together.

Multiple great articles describe and define the data mesh approach. I will link these below, but also try to explain them here in my own words.

So, let’s do it! The data mesh approach comes from the domain-driven design and bounded context way of thinking. And we find these concepts in the 4 components that make up Data Mesh:

  • Domains
  • Products
  • Federated Governance
  • Self-Service Data Platform

Domains

A domain is a grouping of data, technology, teams, and people that work within the same analytical realm, usually within a business area. Examples of domains could be the organizational structure, like Sales, Marketing, Finance, and HR. It can also be more fine-grained or connected to a value chain, like Orders, Production, and Distribution. All of this depends on your organization and how the domains would serve you and your business in the best way. The key thing here is that each domain can be autonomous and have ownership over the data products that naturally belong to their domain.

Since each domain is autonomous, they can develop their data products and govern these as desired. Still, the governance should be aligned with the centralized data governance. I will come back to this. The ETL and data modelling are also handled locally by the domain while taking advantage of the self-service data platform provided through the central organization.

Data Products

By introducing product thinking into the world of data, you get Data Products. A data product is a data asset that should be trusted, reusable, and accessible. The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers.

This means that a domain can use data products from other domains to build their data products. It is these links that give the mesh of a data mesh with all the connections between the domains and the data products.

In practice, a data product can be many things. Examples could be a Power BI Dataset, a parquet file, an SQL table, a Power BI report, etc. The modelling of the data product and the needed ETL process to build the data products are handled by the domains.

Federated Governance

To make each domain autonomous in its data product development and management, a federated governance model is used. The federation of governance is a way of deferring responsibilities to enable scalability. It promotes independence and accountability. This way, the domains can govern their data products in a way that is effective and makes sense to them.

Still, there should be some centralized governance providing a set of standards, and best practices, setting the necessary boundaries, and being a center of excellence, providing expertise to the federated domains.

Self-serve Data Platform

The last component of the data mesh approach is the self-serve Data Platform. The self-serve data platform is a centralized service that serves the domains with their need for infrastructure, storage, security, access management, ETL pipelines and more. Some call this data infrastructure as a platform to highlight that the self-serve platform should serve the infrastructure, i.e. all the technical components and their integrations required to build data products.

The centralization of this service also enables some of the standardization in the centralized governance.

Still, what each organization puts inside the “Self-serve data platform” box might vary, and some open up for flexibility in technology choices.

Why is the Data Mesh approach gaining traction?

To understand this, it can help to have a look at what we are trying to move away from in the data and analytics world. Previously we have seen siloed proprietary enterprise data warehouses. They are complex, with long development processes, low scalability and high cost.

But it is not only the old data warehouse that is the challenge of organizations today. Also, the more modern concept of data lake has proven to be a challenge. The data lake has for some organisations become this big data box that holds all of the organization’s data, operated by a centralized and specialised team of data engineers. For many, this results in a siloed data lake that works as a bottleneck for developing new solutions and gaining new insights.

The monolithic data warehouse or data lake and the centralised operating model become the bottleneck for development and turning data into insights.

What are the different approaches you can have for Data Mesh?

There are also differences in how an organization might choose to interpret the data mesh approach. These are well described in the book Data Management at Scale (Strengholt, 2023), where Strengholt highlights different degrees of data mesh interpretation, or different topologies as he calls them. I will briefly summarize the ones I find most interesting below, as I think they highlight some of the complexity of the data mesh.

Fully Federated Domain Topology

This topology has no central orchestration, with a strong emphasis on federation. Here you have fine-grained decoupling and high reusability. The domains themselves are independent and serve data products to other domains. You don’t have any central authority, and compliance is enforced through the centralized platform.

The benefit of this approach is the high degree of flexibility with few dependencies. It also promotes the reuse of data products, as there would naturally be a large production of data products.

The challenge with this fully federated approach is that the independence of the domains makes the critically needed harmonization of data interoperability, governance, security standards and metadata difficult. Alignment can be challenging. The nature of fine-grained federation also promotes more separation in the architectural setup, meaning you might find it difficult to make all these fine-grained products talk to each other. And if you need to pull data from multiple data products and domains to solve analytical needs, you will probably struggle to meet the need for high data quality, performance and interoperability.

Decentralization also requires a significant level of independence and technical expertise within each domain. It requires a substantial pool of highly skilled data professionals who understand the intricacies of the data mesh methodology. To successfully adopt the data mesh approach, an organization must have sufficient traction and a wide array of data products that demonstrate the value of embracing a data product mindset. However, building and sustaining such teams of data professionals can be a substantial investment for any organization.

Governed Domain Topology

A step away from the purest theoretical data mesh approach is to centralize parts of your mesh components. In the Governed Domain Topology, the data product architecture is centralized. This way, the consumption of data products is also centralized, making them more discoverable. Integrations become less complex, and standardization of metadata, distribution or consumption is easier to implement and enforce.

Despite the numerous advantages, the central distribution of your data can sometimes become a bottleneck. Moreover, if your data landscape consists of various cloud providers or technologies, integrating them into a centralized data distribution can present a significant challenge.

Partially Federated Domain Topology

Other organizations might want to go all in on the data mesh approach, but due to their technical setup and/or lack of data engineers and resources, need to go with a more centralized solution with some federation.

You can have partly centralized data on the source system side, while the consumption side is more distributed. I like to think of this as more centralization closer to the sources. Your first data transformation steps, as in the bronze layer of a medallion architecture, or your landing and transformation zone is centralized. While your distribution layer or gold layer adopts the data mesh topology.

The challenge with this is the possible bottleneck on the source side, as bringing new sources into the solution requires a centralized team of data engineers. It also means less autonomy for the consuming data product domains.

What are the main benefits of a Data Mesh Architecture?

Autonomy

The federation and domain-driven design bring about a paradigm shift in the way organizations approach software development. By advocating for independence and accountability, these methodologies empower teams to take ownership of their domain and develop autonomous solutions. This decentralization fosters a culture of innovation and agility, allowing teams to adapt quickly to changing requirements and market demands.

Scalability

The scalability of these methodologies is another key benefit. With each team operating independently and focusing on their specific domain, the overall system becomes highly scalable. This means that as the organization grows and new functionalities are required, additional teams can be introduced seamlessly without disrupting the existing ones. This modular approach enables organizations to effectively manage complex projects and easily accommodate future growth.

Closer collaboration between business and technology

One of the notable advantages of the data mesh approach, which is closely aligned with domain-driven design, is the closer collaboration between business and technology. By placing ownership of data in the hands of the business, this approach enables better alignment between data management strategies and overall business goals. It encourages cross-functional communication and enhances the understanding of data within the organization. This alignment fosters a shared vision and empowers decision-makers to make informed choices based on business objectives.

What are the main challenges of Data Mesh?

Risk of creating isolated data hubs

The decentralization certainly enables scalability and high productivity, but it can also lead to chaos. Without a certain level of centralization where the organization as a whole can establish boundaries, standards, and best practices, there is a risk that each team will develop its own architecture and choose its own technologies, standards, data formats, and more. This may result in each team being solely responsible for its data, creating isolated data hubs that cannot be combined or integrated with other domains. Consequently, the overall value proposition of a data mesh can be compromised.

Not serving the organization’s data model

The concept of the data product approach challenges the notion that there is a single data model that applies to the entire organization. While this may hold true to some extent, in reality, there are interconnected relationships between domains and data products within an organization that should be standardized and governed. These relationships play a crucial role in maintaining the integrity and quality of your data model.

Missing a harmonized strategy

More decentralization makes it more difficult to harmonize around a strategy and set centralized governance, boundaries and standards. You can end up with siloed data domains or multiple fragmented data warehouses, ultimately blocking one of our initial justifications for implementing the data mesh approach, the organizational scalability.

So, is Data Mesh your enabler, or is it just creating a data mess?

Short answer, it depends.

Even though there is some great literature out there explaining and even defining the how-to’s of a data mesh approach, the reality is that organizations quickly interpret the approach to fit their organizational structure.

That can play out as differing emphasis on the centralised components of the data mesh structure, opening up for too much autonomy and, as the ultimate consequence, creating siloed data domains whose data products are not interoperable and consumable for the organization as a whole. It can also lead to data products with different interpretations of the data model, which can ultimately result in different truths.

However, the data mesh approach is the result of the need to move away from the monolithic data warehouse or data lake and the centralised data engineering team. It does enable autonomy and scalability with its federation.
A key enabler for data mesh will therefore be, despite the decentralised focus, a centralised plan: a data strategy containing some standards, an architecture, overall governance, and best practices or rules. This will help you ensure that your data mesh doesn’t become a mess.

Hope you found this article helpful!

Useful links:

How do you set up your Data Governance in Microsoft Fabric? — 11. Oct 2023

How do you set up your Data Governance in Microsoft Fabric?


What is Data Governance in Microsoft Fabric?

So, what is data governance in Fabric? The governance domain contains many capabilities. If you follow the DAMA approach, you know that they have divided Data Governance into 10 capabilities, looking into architecture, data warehousing, operations, security, quality, and everything you do with your data from the source to delivered insights.

In Fabric, obviously, everything regarding Data Governance is still important. Despite that, in this article, I will focus on the specific Fabric components and features that help you govern your data in Fabric.

Let’s take a look at the new Domains feature in Fabric, how data lineage is implemented, roles and access management, policies and processes, and the Purview hub.

I have previously written a blog on what your Power BI governance should contain. As Fabric makes up more of your data ecosystem, you will additionally need to focus on other governance capabilities. Still, if you want to look into Power BI-specific governance, you can have a look at that one here:

What are Domains in Fabric?

To understand how you can set up your governance in Fabric, let us first take a look at the new functionality that was added to Fabric, Domains.

Today, the distributed and federated model is becoming more and more popular for organizations. The data mesh architecture is gaining traction, where you decentralize data architecture and have each business domain govern their own data.

The domains introduced in Fabric are a way to support this data mesh architecture. You can assign the workspaces owned by a business area to one domain. So all your sales-related workspaces could be in your Sales domain, while your marketing-related workspaces could be inside the Marketing domain.

Which Roles do we have in Fabric?

In Microsoft Fabric, you can divide your roles into three areas. You have your domain roles, tenant roles, and workspace roles. The Capacity admin and domain admin can delegate some of their responsibilities to contributors.

In the world of data governance, your domain admin could be your data owner or a technical resource working on behalf of the data owner, while the domain contributor could be your data steward. You could also give the domain admin role to your data stewards, depending on the role definitions in your organization.

The capacity admin and capacity contributor, as well as the overall Fabric admin, would normally be assigned to technical roles in your organization.

Through your workspace roles, you manage who can work with what from a technical developer’s perspective. To see the different capabilities that the different roles have, refer to the Microsoft documentation here.

Access Management in Fabric

There are four permission levels in Fabric:

  • Workspace Permission
  • Item Permission
  • Compute Permission
  • Row and column level permission

Workspace permission in Fabric

Workspace permission provides access to all the items that are inside a workspace. That means you get access to all datalakes, data warehouses, data pipelines, dataflows, datasets, reports, etc. in the workspace.

In the workspace, there are also two access types you can give:

  • Read-only
  • Read-write

The read-only role is a viewer role that can view the content of the workspace, but not make changes. The role can also query data from SQL or Power BI reports, but not create new items or make changes to items.

Read-write is the Admin, Member, and Contributor role. They can view data directly in OneLake, write data to OneLake, and create and manage items.

Item permission in Fabric

Item permission makes it possible to provide access to one specific item in the workspace directly, without granting access to the workspace and all the items in it.

This can be done through two methods:

Give access through Sharing

This feature can be configured to give connect-only permissions, full SQL access, or access to OneLake and Apache Spark.

On the Microsoft documentation page, you can read the details of what the different sharing permissions provide access to.

Give access through Manage Permissions

Here you can give direct access to items, or manage the access you have already provided.

Compute permission in Fabric

You can also provide access through the SQL endpoint in Fabric.

As an example, if you want to provide viewer-only access to the lakehouse, you can grant the user SELECT through the SQL endpoint.

Or, if you want to provide granular access to specific objects within the Warehouse, share the Warehouse with no additional permissions, then provide granular access to the specific objects using T-SQL GRANT statements.
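
To make that concrete, here is a minimal sketch, assuming pyodbc and ODBC Driver 18; the server, database, object, and user names are all placeholders:

```python
import pyodbc

# Connect to the SQL endpoint (the connection string is found in the
# settings of the Warehouse or Lakehouse item in Fabric)
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<sql-endpoint-from-item-settings>;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# The Warehouse was shared with no additional permissions, so the user
# sees nothing until access is granted object by object
cursor.execute("GRANT SELECT ON dbo.DimCustomer TO [user@contoso.com];")
conn.commit()
```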

Column-Level & Row-Level Security for Fabric Warehouse & SQL Endpoint in Fabric

On October 3rd, Microsoft announced the public preview of Column-level and Row-level security for the Fabric warehouse and SQL endpoint.

Read the full announcement here: https://blog.fabric.microsoft.com/en-us/blog/announcing-column-level-row-level-security-for-fabric-warehouse-sql-endpoint

Row-level security allows you to control access to specific rows in your table for certain users or groups. This means you don’t have to create separate tables or reports to provide access to only certain parts of your data for specific users. For example, you can give a store manager access to only the sick leave data of their employees.

Column-level security works similarly, but it operates at the column level. This means you can restrict access to specific columns of a table, such as GDPR-related data like a customer’s full name, while allowing more users to access the remaining data in that table.

These ways of providing access can help you simplify management, reduce duplication, and increase the security of your data assets.
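
To illustrate, here is a hedged T-SQL sketch of both features, run over the same kind of pyodbc connection as in the previous example; the schema, tables, columns, and users are all invented:

```python
# Row-level security: only the store manager listed on a row can see it.
# Assumes a schema named `security` already exists in the warehouse.
cursor.execute("""
CREATE FUNCTION security.fn_manager_filter(@ManagerEmail AS varchar(128))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS result WHERE @ManagerEmail = USER_NAME();
""")
cursor.execute("""
CREATE SECURITY POLICY SickLeaveFilter
    ADD FILTER PREDICATE security.fn_manager_filter(ManagerEmail)
    ON dbo.SickLeave
    WITH (STATE = ON);
""")

# Column-level security: grant SELECT on only the non-sensitive columns,
# here hiding the customer's full name from the analyst
cursor.execute(
    "GRANT SELECT ON dbo.Customer (CustomerId, City, Segment) TO [analyst@contoso.com];"
)
conn.commit()
```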

Best practices for access in Fabric

Microsoft has provided some general advice on which access type you should use when providing access to workspaces and specific items in Fabric. The following advice was found in the documentation here.

Write access: To have write access, users must be in a workspace role that allows writing. This applies to all data items, so limit workspaces to a single team of data engineers.

Lake access: To allow users to read data in OneLake, add them as Admin, Member, or Contributor, or share the item with ReadAll access.

General data access: Users with Viewer permissions can access data through the SQL endpoint for warehouses, lakehouses, and datasets.

Object level security: To keep sensitive data safe, grant users access to a warehouse or lakehouse SQL endpoint using the Viewer role. Then, use SQL DENY statements to limit access to specific tables.
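
As a small sketch of that last pattern, with the same placeholder connection and names as before: the user keeps broad read access through the Viewer role, and DENY carves out the sensitive table:

```python
# The Viewer role already grants SELECT through the SQL endpoint;
# DENY overrides it for the specific sensitive table
cursor.execute("DENY SELECT ON dbo.SickLeave TO [analyst@contoso.com];")
conn.commit()
```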

Processes and Policies in Fabric

Information Protection in Fabric

Information Protection in Microsoft Fabric is based on labeling your data. This way you can set up sensitivity labels on your data in Fabric in order to monitor it and ensure that data is protected, even if it is exported out of Fabric.

These sensitivity labels are set up through the Microsoft Purview portal.

On Microsoft’s documentation pages you can see what types of labeling are possible, which scenarios you should use each in, and whether it is currently supported in Fabric. See the full documentation here: https://learn.microsoft.com/en-us/fabric/governance/information-protection

Below I have pasted the label overview from that documentation:

Data Loss Prevention in Fabric

You can also set up Data Loss Prevention (DLP) policies in Fabric. So far it is only supported for datasets. You set up these DLP policies inside the Microsoft Purview compliance portal.

When setting this up in the Microsoft Purview portal it looks like the only policy category supported for Power BI/Fabric now is “Custom”.

For the DLP policy you can set up a set of actions that will happen if the policy detects a dataset that contains sensitive data. You can either set up:

  • User Notification
  • Alerts sent by email to administrators and users

The DLP Policy will run every time a dataset is:

  • Published
  • Republished
  • Refreshed through an on-demand refresh
  • Refreshed through a scheduled refresh

When using the DLP feature, it is important to remember that a premium license is required. Additionally, it is worth noting that the DLP evaluation workload utilizes the premium capacity associated with the workspace where the evaluated dataset is located. The CPU consumption of the DLP evaluation is calculated as 30 % of the CPU consumed by the action that triggered the evaluation. For example, if the triggering refresh consumed 100 CPU seconds, the DLP evaluation adds another 30. If you use a Premium Per User (PPU) license, the cost of the DLP evaluation is covered up front by the license cost.

Endorsement in Fabric

In Fabric you can endorse all items except Power BI dashboards. Endorsement is a label you can use on your items to tell your Fabric users that an item holds some level of quality.

There are two endorsements you can give an item:

  • Promoted
    • What is it?
      • Users can label items as promoted if they think the item holds a high standard and could be valuable for others. Someone thinks the item is ready to use inside the organisation and valuable to share!
    • Who can promote?
      • Content owners and members with write permissions to items can promote.
  • Certified
    • What is it?
      • Users can label items as certified if the item meets organizational quality standards, and is reliable, authoritative, and ready to use across the organization. Items with this label hold a higher quality than the promoted ones.
    • Who can certify?
      • Fabric administrators can authorize selected users to assign the certified label. Domain administrators can be delegated the enablement and configuration of specifying reviewers within each domain.

Data Activator in Fabric

Microsoft released Data Activator in public preview on October 5th. It’s a tool that helps you automate alerts and actions based on your Fabric data. Using Data Activator, you can avoid the need to constantly monitor operational dashboards manually, helping you govern your data assets in Fabric.

Data Activator deserves its own blog post, so I will just mention it here as a component to take advantage of in your Data Governance setup.

Read the full announcement here: https://blog.fabric.microsoft.com/en-us/blog/announcing-the-data-activator-public-preview/

Data Lineage and Metadata in Fabric

Data Lineage and Metadata management are important enablers for governance. They help you get an overview of your data assets and can also be used to enable observability features for your data assets.

Metadata scanning in Fabric

In Fabric you can take advantage of metadata scanning through the Admin REST APIs to get information such as item name, owner, sensitivity label, and endorsement. For datasets, you can also get more detailed information such as table and column names, DAX expressions, and measures. Using this metadata scanning is beneficial both to collect information so your data consumers can look up existing data assets, and so administrators and governance roles can manage those assets.

There are four scanner APIs that can be used to catalog your data assets:

  • GetModifiedWorkspaces – returns the workspaces that have been modified since a given point in time
  • PostWorkspaceInfo – triggers a scan of the metadata for the workspaces you specify
  • GetScanStatus – lets you poll whether a triggered scan has completed
  • GetScanResult – returns the scanned metadata once the scan has succeeded
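
Below is a minimal sketch of that flow using Python and the requests library. It assumes you already have an Azure AD access token with the required admin consent, and the workspace ID is a placeholder:

```python
import time
import requests

BASE = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"
headers = {"Authorization": "Bearer <access-token>"}

# 1. Trigger a scan of the workspaces you are interested in
scan = requests.post(
    f"{BASE}/getInfo",
    headers=headers,
    params={"lineage": True, "datasetSchema": True, "datasetExpressions": True},
    json={"workspaces": ["<workspace-id>"]},
).json()

# 2. Poll the scan status until it has completed
while requests.get(f"{BASE}/scanStatus/{scan['id']}", headers=headers).json()["status"] != "Succeeded":
    time.sleep(5)

# 3. Fetch the result: items with owner, sensitivity label, endorsement, etc.
result = requests.get(f"{BASE}/scanResult/{scan['id']}", headers=headers).json()
for ws in result["workspaces"]:
    print(ws["name"], [report["name"] for report in ws.get("reports", [])])
```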

Lineage in Fabric

Each workspace in Fabric has a lineage view that can be accessed by anyone with the Admin, Member, or Contributor role for that workspace.

Lineage provides an overview of how data flows through your items in Fabric, and can be a great way to answer questions such as “What is my source for this report?” or “If I change this table, will any other data assets be affected?”

To view the lineage view for a workspace, click the lineage icon on the top right of the workspace. To see the lineage focused on one specific item, click the “Show lineage” symbol on the right side of that object. To see the impact analysis, click the “Show impact across workspaces” symbol.

Microsoft Purview Hub in Fabric

In Fabric, administrators have access to the Microsoft Purview hub, a centralized page in Fabric that provides insight into the current state of their data assets.

The Microsoft Purview hub consists of two main components:

  • A portal that will send you to Microsoft Purview. You need to have purchased Microsoft Purview to take advantage of this.
  • A Microsoft Fabric data report that gives you insights on all your Fabric items. You can open the full report to view more detailed information in the following report pages:
    • Overview Report
    • Endorse Report
    • Sensitivity Report
    • Helo
    • Items page
    • Sensitivity page

This report gives you insights into how many of your items are labeled with endorsement or sensitivity, by item type and workspace. It also provides an overview of how your admins or contributors are working with labeling your Fabric items. If your organization has defined data stewards, this is where you could see an overview of how your data stewards are governing the Fabric items.

Why is Data Governance in Fabric important?

Fabric is a great tool in the way it lowers the barrier to start developing new data assets. As it is built from the business-user perspective, starting from the Power BI user interface, it also lowers the technical barrier for many Power BI report developers and business analysts to do more with their data assets.

This is a great advantage, but it also opens up some new challenges. Anyone who has been governing BI assets in the past knows the struggle of making sure the reports are developed, managed, and governed in the right way. With Fabric lowering the technical barrier to do more with your data, and moving further back in your development process, it also becomes easier to do things the wrong way.

Therefore, I think governance in Fabric is more important than ever.

Hope you found this article helpful!

Useful links:

What is OneLake in Microsoft Fabric? — 4. Oct 2023

What is OneLake in Microsoft Fabric?

OneLake in Fabric is a Data Lake as a Service solution that provides one data lake for your entire organization, and one copy of data that multiple analytical engines can process.

Microsoft Fabric is the new and shiny tool that Microsoft released May 23rd during Build. There are multiple very interesting features and opportunities that follow with Fabric, but as there already exist some great articles that give a nice overview, I want to dig into some of the specific Fabric components in more detail.

So, let’s start with the feature that to me is one of the most game changing ones. A fundamental part of Microsoft Fabric, the OneLake.

Content

  1. What is OneLake in Microsoft Fabric?
  2. How can you use File Explorer with your OneLake in Microsoft Fabric?
  3. File Structure in OneLake
  4. Access and Permissions in OneLake
  5. What are the benefits of OneLake?

What is OneLake in Microsoft Fabric?

First, let’s start with an introduction. In short:

OneLake = OneDrive for data

The OneLake works as a foundation layer for your Microsoft Fabric setup. The idea is that you have one single data lake solution that you can use for your entire organisation. That drives some benefits and reduces complexity:

  • Unified governance
  • Unified storage
  • Unified transformations
  • Unified discovery

Per tenant in Microsoft Fabric you have ONE OneLake that is fully integrated for you. You do not have to provision it or set it up as you would with your previous data lakes in Azure.

OneLake is the storage layer for all your Fabric experiences, but also for other external tools. In addition, you can virtually bring data you have in other storage locations into your OneLake using shortcuts, without copying it. Shortcuts are objects in OneLake that point to other storage locations. This feature deserves its own blog post, so for now, let’s just summarize what OneLake is with the following:

OneLake in Fabric is a Data Lake as a Service solution that provides one data lake for your entire organization, and one copy of data that multiple analytical engines can process.
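
Because OneLake exposes the same APIs as ADLS Gen2, existing lake tooling can simply point at the OneLake endpoint. Here is a minimal sketch, assuming the azure-identity and azure-storage-file-datalake packages; the workspace and lakehouse names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake speaks the ADLS Gen2 API, with one fixed endpoint per tenant
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# The workspace acts as the container, and each item is a top-level folder
fs = service.get_file_system_client(file_system="MyWorkspace")
for path in fs.get_paths(path="SalesLakehouse.Lakehouse/Files"):
    print(path.name)
```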

How can you use File Explorer with your OneLake in Microsoft Fabric?

So, as OneLake is your OneDrive for data, you can now explore your data in File Explorer. To set this up, you need to download the OneLake file explorer application, which integrates Microsoft OneLake with Windows File Explorer. This can be done through this link: https://www.microsoft.com/en-us/download/details.aspx?id=105222

After downloading the application, you log in with the same user you use when logging into Fabric.

You can now view your workspaces as folders in your File Explorer.

You can then open the workspaces you want to explore and drill down to specific tables. Below I have opened my Sales Management workspace, then the data warehouse I created in that workspace, and then the tables in my data warehouse.

This also means that you can drag and drop data from your File Explorer into your desired Fabric folders – but not all folders. This works if you want to drag and drop files into the Files folder in your data lake instead of uploading the files directly inside Fabric.

Below, I dragged my winequality-red.csv file from my regular folder to my Files folder inside a DataLake in OneLake.

It will then appear inside the DataLake explorer view in Fabric:

File Structure in OneLake

You can structure your data in OneLake using the Workspaces in Fabric. Workspaces will be familiar to anyone who has been using Power BI Service.

The workspaces create the top folder structure in your OneLake. They work as both storage areas and collaborative environments where data engineers, data analysts, and business users can work together on data assets within their domain.

The lakehouses and data warehouses that you might have created in your workspace create the next level in your folder structure, as shown below. This shows the folder view of your workspaces.

Access and Permissions in OneLake

How do you grant access to the OneLake?

Inside a workspace there are two access types you can give:

  • Read-only
  • Read-write

The read-only role is a viewer role that can view the content of the workspace, but not make changes. The role can also query data from SQL or Power BI reports, but not create new items or make changes to items.

Read-write is the Admin, Member and Contributor role. They can view data directly in OneLake, write data to OneLake and create and manage items.

To grant users direct read access to data in OneLake, you have a few options:

  1. Assign them one of the following workspace roles:

    • Admin: This role provides users with full control over the workspace, including read access to all data within it.
    • Member: Members can view and interact with the data in the workspace but do not have administrative privileges.
    • Contributor: Contributors can access and contribute to the data in the workspace, but they have limited control over workspace settings.
  2. Share the specific item(s) in OneLake with the users, granting them ReadAll access. This allows them to read the content of the shared item without providing them with broader access to the entire workspace.

By utilizing these methods, you can ensure that users have the necessary read access to the desired data in OneLake.

What are the benefits of OneLake?

So to conclude, let’s try to summarise some of the benefits you get with OneLake in Fabric.

  • OneLake = One version of your data
    • No need to copy data to use it with another tool or to analyze it alongside other data sources. The shortcuts and DirectLake features are important enablers for this. These features deserve a separate blog.
  • Datalake as a Service
    • For each tenant you have a fully integrated OneLake. No need to spend time provisioning or handling infrastructure. It works as a Datalake as a Service.
  • Multiple Lakehouses
    • OneLake allows for the creation of multiple lakehouses within one workspace or across different workspaces. Each lakehouse has its own data and access control, providing security benefits.
  • Supports Delta Format
    • The OneLake supports the delta file format, which optimizes data storage for data engineering workflows. It offers efficient storage, versioning, schema enforcement, ACID transactions, and streaming support. It is well integrated with Apache Spark, making it suitable for large-scale data processing applications (see the sketch after this list).
  • Flexible Storage Format
    • You can store any type of file, structured or unstructured. That means that data scientists can work with raw data formats, while data analysts can work with structured data inside the same OneLake.
  • OneLake Explorer
    • Easy access to your OneLake with the OneLake Explorer to get a quick overview of your data assets, or upload files to your lakehouse.
  • Familiar UI
    • For any Power BI developer, the touch and feel of Fabric will be familiar.
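
As a small illustration of the Delta point above, here is a hedged sketch of how this might look in a Fabric notebook, where a Spark session (spark) is preconfigured; the file and table names are made up:

```python
# Read a raw CSV from the Files area of the default lakehouse
df = spark.read.csv("Files/winequality-red.csv", header=True, inferSchema=True)

# Writing it as a Delta table adds versioning, schema enforcement and
# ACID transactions on top of the files in OneLake
df.write.format("delta").mode("overwrite").saveAsTable("wine_quality")

# Any engine that speaks Delta can now query the same single copy of the data
spark.sql("SELECT COUNT(*) AS rows FROM wine_quality").show()
```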

Hope you found this blog useful! Let me know if you have started using OneLake!

Useful links:

Join WITs Who Lunch on Meetup! — 2. Oct 2023

Join WITs Who Lunch on Meetup!

Hurray!

I’m pleased to announce the launch of WITs Who Lunch’s very own Meetup group! This is an exciting milestone in our journey, as it allows us to streamline group management and enhance accessibility for our members. With automatic notifications and calendar events, staying updated and connected will be a breeze. Moreover, our reach will expand, hopefully making it possible for us to connect with even more awesome WITs.

I want to thank all the awesome women who have contributed to our events so far!

To sign up for our meetup group, scroll down to How can I join? in this article.

If you want to learn more about this group, continue to read ❤

  1. Who are we?
  2. Who can join?
  3. How can you join?
  4. How can you contribute?
    1. Sponsor
    2. Come share your knowledge with us
    3. Spread the word!

Who are we?

Ok, so here are some stats!

Since May we have hosted three meetups, and the next one is planned in October.

  • May: Lunch – Get to know each other!
  • June: Evening Seminar – Imposter Syndrome
  • August: Evening Seminar – Power Imbalance

Safe to say, we do not only lunch!

We are currently 70 members in our closed LinkedIn group. Let me know if you want to be part of this group. The group is for sharing tips, events, insights, opinions, etc. in a closed and safe environment.

Every month we decide on the next topic, either together when we meet or by voting in our LinkedIn group. Topics we are looking into are How to master LinkedIn, Failure Feast, and Executive Presence.

Below you can see the wordcloud from the first survey that was sent out before we met, asking for expectations of our attendees.

Who can join?

  • You identify as a woman
  • You are working in tech
  • You are close enough to Oslo to join an evening seminar, lunch or breakfast once a month
  • You are looking for other awesome techies that you can have great discussions with, learn from and motivate!

How can you join?

Now this is easy! Just create a free membership on Meetup if you do not have one already and join our group “WITs Who Lunch” here: https://www.meetup.com/wits-who-lunch/

If you want to join our closed group on LinkedIn to contribute to discussions and share content, contact me on LinkedIn. You can find my LinkedIn profile HERE.

How can you contribute?

Moving this concept from a “let’s just meet” to an organization opens up some sponsorship opportunities. We therefore welcome sponsors who would like to contribute. We don’t have many costs, but we would be grateful for help with:

  • Meetup membership
  • Coffee at events
  • Snacks at events

If you are working for a company that could sponsor us please contact me through my LinkedIn profile HERE.

Come share your knowledge with us

Are you a woman who has already made it? Maybe you are a leader or an acknowledged technical expert within your field?

We would love to invite you to share your knowledge with us! If you want to contribute, or know someone who might want to, contact me through my LinkedIn profile HERE.

Spread the word!

If you are not able to join yourself but know other women you think might want to join, please share this blog post with them.

Why am I doing this?

After starting out in the workforce not that long ago I was a bit disappointed. There are so many amazing initiatives out there. And I am sure we have come a long way! Still, I think we can go a bit further.

Picture creds: Rodney Kidd

I am also lucky enough to be part of organizing Data Saturday Oslo and the Microsoft Data Platform User Group Norway. Here I see that there are more male speakers and attendees than female. Why is that?

  • Anyone who works closely with me knows that I always nag about speaking opportunities and meetups. Still, I don’t see the changes I want happening as fast as I want them to. Is it because I am not reaching broad enough? Hopefully, this is a way to reach more of you and inspire you to join the data community.
  • I get a lot of my technical input from social media such as LinkedIn and Twitter. This is also where I often discover events, meetups and opportunities to present and meet other techies. So what if I did not follow the “right people” on social media? Would I not get all these opportunities then? Let’s therefore connect, so we can give each other tips on tech updates and events to attend!
  • I see that there are more men attending than women at meetups and conferences. I wonder if that might be because we are missing someone to attend with. I rarely see multiple women attending together. I, therefore, hope this group will connect us so we can attend together!
  • Also, I see that men in general are great at letting each other know that they are performing well. They are so good at saying “You are doing a great job, Buddy”. Loud. At the coffee station. Or in the comments on social media. Let’s build a group where we can do more of this! I think we as Norwegian women have something to learn from our male colleagues here. Let’s try and be a bit louder in general, and also when we cheer each other on.
  • Let’s be each other’s Kitchen cabinet! I got this advice from a WIT lunch at PASS in 2022, and I love it. “Kitchen cabinet” refers to any group of trusted friends and associates, particularly in reference to a president’s or presidential candidate’s closest unofficial advisers (Wikipedia). I am hoping this could be an arena to build kitchen cabinets of trusted advisors who can give advice, help and support when needed. Hopefully, it will help you keep on going, and gain confidence and strength when needed.
  • And, I want to give a shout-out to Deborah Melkin and her blog A Woman in SQL 2023 where she digs into the numbers and basically says We need to be doing more. Reading that blog post was the last nudge I needed to – just do this! Thank you!

What are Dataflows Gen 2 in Fabric? — 16. Jun 2023

What are Dataflows Gen 2 in Fabric?

I have previously written a post on Power BI Dataflows explaining what it is, how you can set it up, when you should use it, and why you should use it. I am a big fan of the Gen 1 Power BI Dataflows. So now, with the introduction of Dataflows Gen 2 in Fabric, I had to take a deeper look.

In this article, we will look at the new features that separate Dataflows Gen 2 from Dataflows Gen 1. Then we’ll have a look at how you can set it up inside Fabric before I try to answer the when and why to use it. What new possibilities do we have with dataflows Gen 2?

After digging into the new Dataflows Gen 2, I still have some unanswered questions. Hopefully, in the weeks to come, new documentation and viewpoints will become available to answer some of these.

To learn the basics of a dataflow you can have a look at my previous article regarding dataflows gen 1.

  1. What are Dataflows Gen 2 in Fabric?
  2. What is the difference between Dataflows Gen 1 and Gen2 in Fabric?
    1. Output destination
    2. Integration with data pipelines
    3. Improved monitoring and refresh history
    4. Auto-save and background publishing
    5. High-scale compute
  3. How can you set up Dataflows Gen 2 in Fabric?
  4. When should you use Dataflows Gen 2 in Fabric?
    1. Limitations
  5. Why should you use Dataflows Gen 2?

What are Dataflows Gen 2 in Fabric?

To start, Dataflows Gen 2 in Fabric is a development from the original Power BI Dataflows Gen 1. It is still Power Query Online that provides a self-service data integration tool.

As previously, you can create reusable transformation logic and build tables that multiple reports can take advantage of.

What is the difference between Dataflows Gen 1 and Gen2 in Fabric?

So, what is new with Dataflows Gen 2 in Fabric?

There is a set of differences and new features listed in the Microsoft documentation here. They provide the following table.

Feature                                  | Dataflow Gen2 | Dataflow Gen1
Author dataflows with Power Query        | ✓             | ✓
Shorter authoring flow                   | ✓             |
Auto-Save and background publishing      | ✓             |
Output destinations                      | ✓             |
Improved monitoring and refresh history  | ✓             |
Integration with data pipelines          | ✓             |
High-scale compute                       | ✓             |
Get Data via Dataflows connector         | ✓             | ✓
Direct Query via Dataflows connector     |               | ✓
Incremental refresh                      |               | ✓
AI Insights support                      |               | ✓

But what is different, and what does that mean? I would say the features output destination and integration with data pipelines are the most exciting changes and improvements from Gen 1. Let’s have a look.

Output destination

You can now set an output destination for your tables inside your dataflow. That is, for each table, you can decide whether running the dataflow should load the data into a separate destination. Previously, the only destination for a dataflow would be a Power BI report or another dataflow.

The current output destinations available are:

  • Azure SQL database
  • Lakehouse
  • Azure Data Explorer
  • Azure Synapse Analytics

And Microsoft says “many more are coming soon”.

Integration with data pipelines

Another big change is that you can now use your dataflow as an activity in a data pipeline. This can be useful when you need to perform additional operations on the transformed data, and it also opens up reuse of the transformation logic you have set up in a dataflow.

Improved monitoring and refresh history

In Gen 1 the refresh history is quite plain and basic, as seen in the screenshot below.

In Gen 2, there have been some upgrades on the visual representations, as well as the level of detail you can look into.

Now you can more easily see which refreshes succeeded and which ones failed, with the green and red icons.

In addition, you can go one step deeper and look at each refresh separately. Here you get details on request ID, session ID and dataflow ID, as well as seeing for each table whether it succeeded or not. This makes debugging easier.

Auto-save and background publishing

Now, Fabric will autosave your dataflow. This is a nice feature if you, for whatever reason, suddenly close your dataflow. The new dataflow will be saved with a generic name, “Dataflow x”, that you can change later.

High-scale compute

I have not found much documentation on this, but in short, Dataflows Gen 2 also get an enhanced compute engine to improve performance, similar to Gen 1. Dataflow Gen 2 will create both Lakehouse and Warehouse items in your workspace and use these to store and access data to improve the performance of your dataflows.

How can you set up Dataflows Gen 2 in Fabric?

You can create a Dataflow Gen 2 inside Data Factory in Fabric, either through “New” in the workspace, or through the start page for Data Factory in Fabric.

Here you can choose which source you want to get data from, whether you want to build on an existing dataflow, or whether you want to import a Power Query template.

If you have existing dataflows you want to reuse, you can export them as a template and upload it as a starting point for your new dataflow.

When should you use Dataflows Gen 2 in Fabric?

In general, Dataflows Gen 2 can be used for the same purposes as Dataflows Gen 1. But what is special about Dataflows Gen 2?

The new data destination feature, combined with the integration with data pipelines, provides some new opportunities:

  • You can use the dataflow to extract the data and then transform the data. After that, you now have two options:
    • The dataflow can be used as a curated dataset for data analysts to develop reports.
    • You can choose a destination for your transformed tables for consumption from that destination.
  • You can use your dataflow as a step in your data pipeline. Here there are multiple options, but one could be:
    • Use a dataflow to both extract and transform/clean your data. Then, invoked by your data pipeline, use your preferred coding language for more advanced modelling and to build business logic, as in the sketch after this list.
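
Here is a hedged sketch of that last pattern, as it might look in a Fabric notebook invoked by a pipeline activity after the dataflow has run; all table and column names are invented:

```python
# The Dataflow Gen 2 has already extracted and cleaned the data
clean = spark.read.table("orders_cleaned")
products = spark.read.table("dim_product")

# Business logic that is more comfortable in code than in Power Query,
# e.g. a margin calculation joined against a master data table
enriched = (
    clean.join(products, "product_id")
         .withColumn("margin", clean["revenue"] - products["unit_cost"] * clean["quantity"])
)

# Write the result back for downstream consumption
enriched.write.format("delta").mode("overwrite").saveAsTable("orders_enriched")
```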

The same use cases that we had for dataflows Gen 1 also apply to dataflows Gen 2:

Dataflows are particularly great if you are dealing with tables that you know will be reused a lot in your organization, e.g. dimension tables, master data tables or reference tables.

If you want to take advantage of Azure Machine Learning and Azure Cognitive Services in Power BI, this is available to you through Power BI Dataflows. Power BI Dataflows integrate with these services and offer an easy self-service drag-and-drop solution for non-technical users. You do not need an Azure subscription to use this, but it requires a Premium license. Read more about ML and Cognitive Services in Power BI Dataflows here.

In addition, Power BI Dataflows provide the possibility to incrementally refresh your data based on parameters that specify a date range. This is great if you are working with large datasets that consume all your memory – but you need a Premium license to use this feature.

Limitations

But there are also some limitations with Dataflows Gen 2, as stated by Microsoft:

  • Not a replacement for a data warehouse.
  • Row-level security isn’t supported.
  • Fabric capacity workspace is required.

Why should you use Dataflows Gen 2?

As with the Gen 1 dataflows, Gen 2 can help us solve a range of challenges with self-service BI.

  • Improved access control
  • One source of truth for business logic and definitions
  • Provides a tool for standardization on the ETL process
  • Enables self-service BI for non-technical users
  • Enables reusability

But there are still some unanswered questions

Even though the new additions to Dataflows Gen 2 are exciting, there are still some questions that remain unanswered.

As I read more documentation and get more time to play around with the tool, I hope to be able to update this article with answers.

  • What about version control? If you edit a dataflow used as a transformation activity in your data pipeline, it is important to be able to track changes and roll back to previous versions. How would that work?
  • What are the best practices? Is it best to use Power BI dataflows as the main ETL tool now, or should we use pipelines? Should dataflows mainly be used for simple transformations such as cleansing, or should we perform as much transformation and logic development in them as possible?
    • My first guess would be to mainly use dataflows for simple clean-up transformations and then use a notebook in a pipeline for more advanced transformations. But then the question of what provides the best performance comes up.

So, to conclude, the new Dataflow Gen 2 features are awesome. They open up some very exciting new opportunities for your ETL process. The question now is when those opportunities are something you should take advantage of, and when they are not.