How do you set up your Data Mesh in Microsoft Fabric?

Coming from the world of Microsoft analytics, I got curious about why Microsoft chose Fabric as the name of its newly released analytics solution, Microsoft Fabric. Looking into this, I came across the architectural concept of Data Fabric. I had been working with Data Mesh for a while, but the Fabric architecture was something I hadn’t really heard about before, which of course meant that I needed to know more.

I was going to write a short blog about the differences between Fabric Architecture and Data Mesh and then see how these two architectures look inside Microsoft Fabric. Turns out there is too much to say, so I had to turn this into a mini-series.

Second up is Data Mesh in Microsoft Fabric!
So, let’s have a look at how you implement a data mesh architecture in Fabric.

Stay tuned for more content on Fabric Architecture, and how Data Mesh and Fabric Architecture can be set up in Microsoft Fabric!

  1. What is Data Mesh Architecture?
  2. How do you handle your data products in Microsoft Fabric?
    1. Accessibility
    2. Interoperability
    3. Trusted
    4. Reusability
    5. Findable
    6. Limitations
  3. How do you handle your data domains in Microsoft Fabric?
    1. What are Domains in Microsoft Fabric?
    2. Limitations
  4. How do you handle your self-serve data platform with Microsoft Fabric?
  5. How do you set up data governance with data mesh in Microsoft Fabric?
    1. Federated Governance
    2. Centralized Governance
    3. Limitations
  6. Summary

What is Data Mesh Architecture?

The first post in this mini-series answers this question, looking into what data mesh is, different topologies, challenges and benefits. If you want to read up on the data mesh approach before looking into how this applies to Microsoft Fabric, you can take a look at that blog post here:

In short, the data mesh is an approach to how you manage your data. Data Mesh is not only an architectural and technical approach. To be successful, you also need to change your organisational processes and structure. Having both IT and the business onboard is therefore a crucial success factor when implementing data mesh in your organisation.

The data mesh approach comes from the domain-driven design and bounded context way of thinking. We find these concepts in the 4 components that make up data mesh:

  • Domains
  • Products
  • Federated Governance
  • Self-Service Data Platform

How do you handle your data products in Microsoft Fabric?

By introducing product thinking into the world of data, you get Data Products. A data product is a data asset that should be:

  • Accessible
  • Interoperable
  • Trusted
  • Reusable
  • Findable

The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers.

So, what would a data product look like in Fabric? Sorry to disappoint, but that depends. It depends on how you define a data product in your organisation. Is it a table, a SQL view, a Power BI semantic model, a dataflow or even a Power BI report? Or can it be all of these things?

Let’s have a look at how you can set up your data product in Fabric:

Accessibility

A data product is made accessible through one or more output ports for data consumers. In Fabric there are multiple ways of distributing your data products, but again, it depends on what your data product looks like.

For accessibility outside of Fabric, you can use a SQL endpoint or the underlying ADLS Gen 2 connection that your OneLake is based on.
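As a rough sketch of the OneLake route, the helper below builds the ABFS-style URI that ADLS Gen2-compatible tools can use to address a table in a lakehouse. The function name and the workspace, lakehouse and table names are made up for illustration, and the sketch assumes names without spaces or special characters (item GUIDs are a common alternative).

```python
# Hypothetical helper, not part of any Fabric SDK.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// URI for a table stored in a Fabric lakehouse."""
    for part in (workspace, lakehouse, table):
        # Assumption: plain names only; reject empty names and whitespace.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"unsupported name: {part!r}")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


# Example with invented names
uri = onelake_table_uri("SalesWorkspace", "SalesLakehouse", "orders")
print(uri)
```

An external consumer could hand a URI like this to any tool that speaks the ADLS Gen2 protocol, provided it has been granted access to the workspace.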

For internal accessibility inside of Microsoft Fabric, you can use a dataflow endpoint, a semantic model connection or an internal shortcut. Or it can just be accessible inside a workspace within one domain, where other domains can connect to it and load it using their preferred integration tool within their own domain.

Interoperability

A data product is interoperable through its metadata, which also holds some of the standardization of the data product, such as the schema and semantics. Below is a screenshot from Fabric of a dataset with a location, labelled endorsement, refresh date and sensitivity label.

Trusted

The metadata also enforces some of the governance over the data product through its ownership and its security or rights to use, which help ensure that our data product is trusted. In addition, the observability of the data products provides us with information about the SLA, timeliness and quality of the data product. This is all part of how we can trust our data product.

In Microsoft Fabric, the refresh log provides us with the observability of the data product. For the SLA and data quality, there is no way to document these inside Microsoft Fabric unless you buy the data catalog Microsoft Purview as an additional tool that integrates with Microsoft Fabric. There you can document your data products. Purview could also help make a data product findable through search, a point covered further below. Still, as Purview is an external tool that requires an additional licence, it is not considered further in this blog.
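To illustrate the timeliness part of trust, here is a minimal sketch of a freshness check you could run against the last refresh time taken from a refresh log. The function name and the 24-hour threshold are assumptions for the example, not something Fabric provides or prescribes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_refresh: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True if the last successful refresh is within the agreed max age."""
    now = now or datetime.now(timezone.utc)
    return now - last_refresh <= max_age


# Example: an assumed SLA saying the product must be refreshed every 24 hours
checked_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_ok = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
print(is_fresh(last_ok, timedelta(hours=24), now=checked_at))
```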

Reusability

In Fabric, you can reuse a table, dataflow or dataset as needed to develop other data products. An example is semantic link, which lets you query a semantic model from inside a notebook in your data lakehouse.

Findable

One way to better manage a data product in Microsoft Fabric could be to take advantage of endorsement, where you can mark items as “Promoted” or “Certified”. For instance, to determine if an item is a data product, you can rely on the “Certified” label in Microsoft Fabric. However, it’s important to note that using the label this way limits what you can use it for on other items in Fabric, so you need to make sure that it is reserved specifically for this purpose.
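The convention of treating “Certified” items as data products can be sketched as a simple filter over item metadata. The Item class and the sample items are invented for the example; in practice the metadata would come from wherever you list your Fabric items.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    name: str
    item_type: str
    endorsement: str  # e.g. "Certified"; any other value means "not a data product"


def find_data_products(items: List[Item]) -> List[Item]:
    """Keep only the items that carry the 'Certified' label."""
    return [item for item in items if item.endorsement == "Certified"]


# Invented sample catalogue
catalogue = [
    Item("Sales model", "SemanticModel", "Certified"),
    Item("Scratch table", "Lakehouse", "None"),
    Item("Orders dataflow", "Dataflow", "Certified"),
]
for product in find_data_products(catalogue):
    print(product.name)
```

The filter only works as long as the label is kept for this one purpose, which is exactly the trade-off described above.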

Limitations

In terms of data products, there are a few features that could enhance the Fabric platform:

  1. Tagging and Categorization: Introducing a capability to tag or categorize data items within Fabric would allow users to easily label their data as a certified Data Product. This would enable efficient organization and retrieval of specific datasets.
  2. Journey Tracking and discoverability: It would be beneficial to have a feature in Fabric that tracks the journey of a data product throughout its lifecycle. This way, users can easily monitor and trace the movement of their data items within the platform.
  3. Documentation and Restrictions: Providing more comprehensive documentation for data products is crucial. Users should have access to clear instructions on how to utilize and connect to the data, as well as any associated restrictions on usage. This information will help users leverage the data effectively and ensure compliance with any contractual obligations.
  4. Data Product Contract Specification: Introducing a data product contract specification feature in Fabric would be advantageous. This would allow users to define contractual terms for their data products. The contract could specify details such as permitted usage, data access restrictions, and any specific requirements for utilizing the data.

By incorporating these features, Fabric could offer a more robust and user-friendly experience for managing data products.
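Until something like this exists in the platform, a domain could keep a lightweight contract next to each data product. The sketch below is one possible shape, covering the details mentioned above (permitted usage, access restrictions and requirements). The class and all field names are hypothetical and not a Fabric feature.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DataProductContract:
    product: str
    owner_domain: str
    permitted_usage: FrozenSet[str]          # purposes, stored in lower case
    access_restrictions: Tuple[str, ...] = ()
    requirements: Tuple[str, ...] = ()

    def permits(self, purpose: str) -> bool:
        """Check whether a purpose is covered by the contract."""
        return purpose.strip().lower() in self.permitted_usage


# Invented example contract
contract = DataProductContract(
    product="Orders",
    owner_domain="Sales",
    permitted_usage=frozenset({"reporting", "forecasting"}),
    access_restrictions=("internal use only",),
    requirements=("cite the Sales domain as the source",),
)
print(contract.permits("Reporting"))
print(contract.permits("marketing campaigns"))
```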

How do you handle your data domains in Microsoft Fabric?

A domain is a grouping of data, technology, teams, and people that work within the same analytical realm, usually within a business area. Examples of domains could be the organizational structure, like Sales, Marketing, Finance, and HR. It can also be more fine-grained or connected to a value chain, like Orders, Production, and Distribution. All of this depends on your organization and how the domains would serve you and your business in the best way. The key thing here is that each domain can be autonomous and have ownership over the data products that naturally belong to their domain.

Since each domain is autonomous, they can develop their data products and govern these as desired. Still, the governance should be aligned with the centralized data governance. I will come back to this. The ETL and data modelling are also handled locally by the domain while taking advantage of the self-service data platform provided through the central organization.

What are Domains in Microsoft Fabric?

To understand how you can set up your governance in Fabric, let us first take a look at the new functionality that was added to Fabric, Domains.

The domains introduced in Fabric are a way to support the data mesh architecture. You can assign the workspaces owned by a business area to one domain. So all your sales-related workspaces could be in your Sales domain, while your marketing-related workspaces could be inside the Marketing domain.
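If your workspaces follow a naming convention, the mapping from workspace to domain can be derived automatically. The sketch below assumes a made-up convention where the workspace name starts with a domain prefix followed by a hyphen. The function and the names are illustrative only, and the actual assignment would still be done in Fabric.

```python
from typing import Dict, Iterable


def assign_domains(workspaces: Iterable[str],
                   prefix_to_domain: Dict[str, str]) -> Dict[str, str]:
    """Map each workspace to a domain based on its name prefix."""
    assignment = {}
    for workspace in workspaces:
        # Assumed convention: "<prefix>-<anything>", e.g. "sales-reporting"
        prefix = workspace.split("-", 1)[0].lower()
        assignment[workspace] = prefix_to_domain.get(prefix, "Unassigned")
    return assignment


# Invented workspaces and domains
domains = {"sales": "Sales", "mkt": "Marketing"}
result = assign_domains(["sales-reporting", "mkt-campaigns", "sandbox"], domains)
print(result)
```

Workspaces that do not match any prefix end up as "Unassigned", which makes them easy to spot and place manually.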

Microsoft Fabric is built from the Power BI developer perspective, and the specialist tools that Microsoft has brought into Fabric, such as Synapse and Azure Data Factory (ADF), are now more available to the generalist. This makes domains less dependent on specialist technical competence and more self-sufficient, which drives the efficiency of each domain.

Limitations

The management of each domain could have been more detailed inside the admin portal of Microsoft Fabric. Today you can only automate which workspaces are placed in which domains and who can be the admin of each domain. It would have been interesting if you could set up policies for each domain and have more possibilities to govern this.

How do you handle your self-serve data platform with Microsoft Fabric?

The next component of the data mesh approach is the self-serve data platform. The self-serve data platform is a centralized service that serves the domains with their need for infrastructure, storage, security, access management, ETL pipelines and more. Some call this a data infrastructure platform to highlight that the self-serve platform should serve the infrastructure, i.e. all the technical components and their integrations required to build data products.

The simplicity of distributing Microsoft Fabric as a data platform as a service is one of its biggest strengths. As it is a SaaS offering, it provides you with all the necessary infrastructure and integrations to start building your data products.

By designating a Microsoft Fabric domain for each data mesh domain, organizations effortlessly extend the self-serve capabilities to every corner of their ecosystem. This inclusivity means that every Fabric capacity holder gains unfettered access to the diverse experiences offered by Fabric, empowering them to develop their individualized data products.

How do you set up data governance with data mesh in Microsoft Fabric?

To make each domain autonomous in its data product development and management, a federated governance model is used. The federation of governance is a way of deferring responsibilities to enable scalability. It promotes independence and accountability. This way, the domains can govern their data products in a way that is effective and makes sense to them.

Still, there should be some centralized governance providing a set of standards and best practices, setting the necessary boundaries, and acting as a center of excellence that provides expertise to the federated domains.

In this article, I will focus on how data governance fits into the data mesh architecture. For those interested in the specific governance features of Microsoft Fabric, a blog post on setting up data governance within this framework is available.

How do you set up your Data Governance in Microsoft Fabric?

Federated Governance

Domains enable federated governance in Microsoft Fabric. A new role was also created with this feature: the domain admin, who can delegate responsibilities to contributors.

Through your workspace roles, you manage who can work with what from a technical developer’s perspective. To see the different capabilities that the different roles have, refer to the Microsoft documentation here.

Centralized Governance

Fabric in itself will enforce some standardization and control, as there is a finite number of ways to develop your products and make them accessible.

The Purview hub within Microsoft Fabric emerges as a cornerstone for centralized computational governance. This hub offers a level of centralization that enables a comprehensive overview of domains and data products, allowing stakeholders to assess the status of each domain. It serves as a control centre, facilitating both a holistic perspective and the ability to drill down into individual domains for detailed evaluations.

In Microsoft Fabric you can also take advantage of some built-in policies, such as the Data Loss Prevention policies and labelling, which are further described in the blog linked above.

Limitations

While Microsoft Fabric inherently provides a level of standardization and control due to its finite number of development approaches and accessibility options, there are limitations. Notably, the platform currently lacks the capability to establish standards and patterns for solution development. More possibilities and granular control levels to set up access policies and development policies would be interesting.

Another example where Microsoft Fabric falls short is Master Data Management. There is no integrated solution enabling functionalities such as survivorship and the creation of a golden record, necessitating reliance on external tools.

Summary

In summary, while there are limitations in the Microsoft Fabric setup when implementing a data mesh architecture, I believe that Microsoft Fabric significantly enables some of the most crucial features of data mesh. In particular, it offers infrastructure as a service from a central team and inherently enforces central governance through the limited number of methods the domains have to develop their products. While additional levels of control and options to monitor the data product journey would have been desirable, I am currently of the opinion that Microsoft Fabric comes pretty close.

Hope you found this article helpful!

Useful links: