How do you set up your Data Governance in Microsoft Fabric?


What is Data Governance in Microsoft Fabric?

So, what is data governance in Fabric? The governance domain contains many capabilities. If you follow the DAMA approach, you know that they have divided Data Governance into 10 capabilities, looking into architecture, data warehousing, operations, security, quality, and everything you do with your data from the source to delivered insights.

In Fabric, obviously, everything regarding Data Governance is still important. Despite that, in this article, I will focus on the specific Fabric components and features that help you govern your data in Fabric.

Let’s take a look at the new Domains feature in Fabric, how data lineage is implemented, roles and access management, policies and processes, and the Microsoft Purview hub.

I have previously written a blog post on what your Power BI governance should contain. As Fabric makes up more of your data ecosystem, you will also need to focus on other governance capabilities. Still, if you want to look into Power BI-specific governance, you can have a look at that one here:

What are Domains in Fabric?

To understand how you can set up your governance in Fabric, let us first take a look at the new functionality that was added to Fabric, Domains.

Today, the distributed and federated model is becoming more and more popular for organizations. The data mesh architecture is gaining traction: you decentralize the data architecture and let each business domain govern its own data.

The domains introduced in Fabric are a way to support this data mesh architecture. You can assign the workspaces owned by a business area to one domain. So all your sales-related workspaces could be in your Sales domain, while your marketing-related workspaces could be inside the Marketing domain.
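To make the grouping concrete, here is a minimal Python sketch of the workspace-to-domain assignment described above. The workspace names and the `group_by_domain` helper are hypothetical and only model the idea; this is not a call to any Fabric API.

```python
from collections import defaultdict


def group_by_domain(assignments):
    """Group workspace names under the domain they are assigned to."""
    domains = defaultdict(list)
    for workspace, domain in assignments.items():
        domains[domain].append(workspace)
    return dict(domains)


# Hypothetical workspace names, each assigned to exactly one domain.
assignments = {
    "Sales Reporting": "Sales",
    "Sales Forecasting": "Sales",
    "Campaign Analytics": "Marketing",
}

domains = group_by_domain(assignments)
for domain, workspaces in domains.items():
    print(domain, "->", workspaces)
```

The point of the model is that a workspace belongs to one domain, while a domain collects many workspaces owned by the same business area.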

Which Roles do we have in Fabric?

In Microsoft Fabric, you can divide your roles into three areas: domain roles, tenant roles, and workspace roles. The capacity admin and domain admin can delegate some of their responsibilities to contributors.

In the world of data governance, your domain admin could be your data owner or a technical resource working on behalf of the data owner, while the domain contributor could be your data steward. You could also give the domain admin role to your data stewards, depending on the role definitions in your organization.

The capacity admin and capacity contributor, as well as the overall Fabric admin, would normally be assigned to technical roles in your organization.

Through your workspace roles, you manage who can work with what from a technical developer’s perspective. To see the different capabilities that the different roles have, refer to the Microsoft documentation here.

Access Management in Fabric

There are four permission levels in Fabric:

  • Workspace Permission
  • Item Permission
  • Compute Permission
  • Row and column level permission

Workspace permission in Fabric

Workspace permission provides access to all the items that are inside a workspace. That means you get access to all lakehouses, data warehouses, data pipelines, dataflows, datasets, reports, etc. in the workspace.

In the workspace, there are also two access types you can give:

  • Read-only
  • Read-write

The read-only role is a viewer role that can view the content of the workspace, but not make changes. The role can also query data from SQL or Power BI reports, but not create new items or make changes to items.

Read-write covers the Admin, Member, and Contributor roles. They can view data directly in OneLake, write data to OneLake, and create and manage items.
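As a quick illustration, the split between the two access types can be written as a small lookup. This is only a sketch of the rule described above; the function names are my own and nothing here talks to Fabric.

```python
# Workspace roles grouped by the access type they give, as described above.
READ_WRITE_ROLES = {"Admin", "Member", "Contributor"}
READ_ONLY_ROLES = {"Viewer"}


def access_type(role):
    """Return 'read-write' or 'read-only' for a workspace role name."""
    if role in READ_WRITE_ROLES:
        return "read-write"
    if role in READ_ONLY_ROLES:
        return "read-only"
    raise ValueError(f"Unknown workspace role: {role}")


def can_create_items(role):
    """Only read-write roles can create or change items in the workspace."""
    return access_type(role) == "read-write"


print(access_type("Viewer"))
print(can_create_items("Contributor"))
```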

Item permission in Fabric

Item permission makes it possible to provide access to one specific item in the workspace directly, without granting access to the workspace and all the items in it.

This can be done through two methods:

Give access through Sharing

This feature can be configured to give connect-only permissions, full SQL access, or access to OneLake and Apache Spark.

In the Microsoft documentation you can read the details on what the different sharing permissions provide access to.

Give access through Manage Permissions

Here you can give direct access to items, or manage access you have already granted.

Compute permission in Fabric

You can also provide access through the SQL endpoint in Fabric.

As an example, if you want to provide viewer-only access to the lakehouse, you can grant the user SELECT through the SQL endpoint.

Or, if you want to provide granular access to specific objects within the Warehouse, share the Warehouse with no additional permissions, then grant access to the specific objects using T-SQL GRANT statements.
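To show what such a statement could look like, here is a small Python sketch that builds a T-SQL GRANT for one table. The schema, table, and user names are made up for the example, and the helper functions are my own. You would run the resulting statement yourself against the Warehouse or SQL endpoint.

```python
def quote_name(name):
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def grant_select(schema, table, principal):
    """Build a GRANT SELECT statement for a single table."""
    target = f"{quote_name(schema)}.{quote_name(table)}"
    return f"GRANT SELECT ON OBJECT::{target} TO {quote_name(principal)};"


# Hypothetical object and user names.
statement = grant_select("dbo", "Sales", "analyst@contoso.com")
print(statement)
```

Generating the statements from a list of users and tables keeps the grants reviewable, which is handy when a data steward has to approve them.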

Column-Level & Row-Level Security for Fabric Warehouse & SQL Endpoint in Fabric

On October 3rd, Microsoft announced the public preview of Column-level and Row-level security for the Fabric warehouse and SQL endpoint.

Read the full announcement here: https://blog.fabric.microsoft.com/en-us/blog/announcing-column-level-row-level-security-for-fabric-warehouse-sql-endpoint

Row-level security allows you to control access to specific rows in your table for certain users or groups. This means you don’t have to create separate tables or reports to provide access to only certain parts of your data for specific users. For example, you can give a store manager access to only the sick leave data of their employees.

Column-level security works similarly, but it operates at the column level. This means you can restrict access to specific columns of a table, such as GDPR-related data like a customer’s full name, while allowing more users to access the remaining data in that table.

These ways of providing access can help you simplify management, reduce duplication, and increase the security of your data assets.
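The effect of the two features can be modelled in a few lines of plain Python. This is only a conceptual sketch: in Fabric the rules are defined in the warehouse or SQL endpoint, not in application code, and the table, column names, and `apply_security` helper below are invented for the example.

```python
def apply_security(rows, user, hidden_columns, owner_column="manager"):
    """Return only the rows owned by `user`, without the hidden columns."""
    visible = []
    for row in rows:
        if row[owner_column] != user:
            continue  # row-level rule: skip rows that belong to other managers
        # column-level rule: drop the columns this user may not see
        visible.append({k: v for k, v in row.items() if k not in hidden_columns})
    return visible


# Hypothetical sick leave table.
sick_leave = [
    {"employee": "Ola", "manager": "kari", "days": 3, "diagnosis": "flu"},
    {"employee": "Per", "manager": "nils", "days": 1, "diagnosis": "cold"},
]

result = apply_security(sick_leave, user="kari", hidden_columns={"diagnosis"})
print(result)
```

The same table serves both managers, and neither of them sees the sensitive column, which is why you no longer need separate copies of the data per audience.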

Best practices for access in Fabric

Microsoft has provided some general advice on which access type you should use when providing access to workspaces and specific items in Fabric. The following advice was found in the documentation here.

Write access: To have write access, users must be in a workspace role that allows writing. This applies to all data items, so limit workspaces to a single team of data engineers.

Lake access: To allow users to read data in OneLake, add them as Admin, Member, or Contributor, or share the item with ReadAll access.

General data access: Users with Viewer permissions can access data through the SQL endpoint for warehouses, lakehouses, and datasets.

Object level security: To keep sensitive data safe, grant users access to a warehouse or lakehouse SQL endpoint using the Viewer role. Then, use SQL DENY statements to limit access to specific tables.

Processes and Policies in Fabric

Information Protection in Fabric

Information Protection in Microsoft Fabric is based on labeling your data. This way you can set up sensitivity labels on your data in Fabric in order to monitor it and ensure that data is protected, even if it is exported out of Fabric.

These sensitivity labels are set up through the Microsoft Purview portal.

On Microsoft’s documentation pages you can see what types of labeling are possible, which one to use in which scenario, and whether it is currently supported in Fabric. See the full documentation here: https://learn.microsoft.com/en-us/fabric/governance/information-protection

Below I have pasted the label overview from that documentation:

Data Loss Prevention in Fabric

You can also set up Data Loss Prevention (DLP) policies in Fabric. So far it is only supported for datasets. You set up these DLP policies inside the Microsoft Purview compliance portal.

When setting this up in the Microsoft Purview portal it looks like the only policy category supported for Power BI/Fabric now is “Custom”.

For the DLP policy you can set up a set of actions that will happen if the policy detects a dataset that contains sensitive data. You can either set up:

  • User Notification
  • Alerts sent by email to administrators and users

The DLP Policy will run every time a dataset is:

  • Published
  • Republished
  • Refreshed through an on-demand refresh
  • Refreshed through a scheduled refresh

When using the DLP feature, remember that a Premium license is required. The DLP evaluation workload also uses the Premium capacity associated with the workspace where the evaluated dataset is located. The CPU consumption of a DLP evaluation is calculated as 30% of the CPU consumed by the action that triggered the evaluation. If you use a Premium Per User (PPU) license, the cost of DLP is covered up front by the license cost.
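As a worked example of the 30% rule, here is a tiny Python sketch. The 200 CPU seconds for the refresh is an invented number, and measuring in CPU seconds is my own assumption for the illustration.

```python
DLP_CPU_PERCENT = 30  # share of the triggering action's CPU, per the rule above


def dlp_cpu_seconds(trigger_cpu_seconds):
    """Estimate the CPU used by a DLP evaluation from the triggering action."""
    return trigger_cpu_seconds * DLP_CPU_PERCENT / 100


# Hypothetical: a scheduled refresh that consumed 200 CPU seconds.
refresh_cpu = 200
dlp_cpu = dlp_cpu_seconds(refresh_cpu)
print(f"Refresh: {refresh_cpu} CPU s, DLP evaluation: {dlp_cpu} CPU s")
```

So the heavier the refresh, the heavier the DLP evaluation that follows it, which is worth keeping in mind when you plan capacity.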

Endorsement in Fabric

In Fabric you can endorse all items except Power BI dashboards. Endorsement is a label you can put on your items to tell your Fabric users that the item holds a certain level of quality.

There are two endorsements you can give an item:

  • Promoted
    • What is it?
      • Users can label items as promoted if they think the item holds a high standard and could be valuable for others. It signals that someone thinks the item is ready to use inside the organization and valuable to share.
    • Who can promote?
      • Content owners and members with write permissions to items can promote.
  • Certified
    • What is it?
      • Users can label items as certified if the item meets organizational quality standards and is reliable, authoritative, and ready to use across the organization. Items with this label hold a higher quality than the promoted ones.
    • Who can certify?
      • Fabric administrators can authorize selected users to assign the certified label. Domain administrators can be delegated the enablement and configuration of certification reviewers within each domain.

Data Activator in Fabric

Microsoft released Data Activator in public preview on October 5th. It’s a tool that helps you automate alerts and actions based on your Fabric data. Using Data Activator, you can avoid the need to constantly monitor operational dashboards manually, helping you govern your data assets in Fabric.

Data Activator deserves its own blog post, so I will just mention it here as a component to take advantage of in your Data Governance setup.

Read the full announcement here: https://blog.fabric.microsoft.com/en-us/blog/announcing-the-data-activator-public-preview/

Data Lineage and Metadata in Fabric

Data lineage and metadata management are important enablers for governance. They help you get an overview of your data assets and can also be used to enable observability features for those assets.

Metadata scanning in Fabric

In Fabric you can take advantage of metadata scanning through the Admin REST APIs to get information such as item name, owner, sensitivity label, and endorsement. For datasets you can also get more detailed information, such as table and column names, DAX expressions, and measures. Collecting this metadata is beneficial both for data consumers, who can look up existing data assets, and for administrators and governance roles, who manage those assets.

There are four scanner APIs that can be used to catalog your data assets:
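As I recall them from the Power BI admin REST documentation, the four calls are: get the modified workspaces, start a scan with `getInfo`, poll `scanStatus`, and fetch `scanResult`. The Python sketch below only builds the URLs and makes no network calls. The endpoint paths and the two query options are written from memory, so please verify them against the official reference. The scan ID is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base path of the admin workspace endpoints; verify before use.
ADMIN_BASE = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"


def scanner_urls(scan_id):
    """Return the four scanner endpoints in the order they are typically called."""
    # Options asking for table/column names and DAX expressions on datasets.
    options = urlencode({"datasetSchema": "True", "datasetExpressions": "True"})
    return {
        "modified": f"{ADMIN_BASE}/modified",              # GET: workspaces to scan
        "getInfo": f"{ADMIN_BASE}/getInfo?{options}",      # POST: start a scan
        "scanStatus": f"{ADMIN_BASE}/scanStatus/{scan_id}",  # GET: poll until done
        "scanResult": f"{ADMIN_BASE}/scanResult/{scan_id}",  # GET: fetch metadata
    }


urls = scanner_urls("00000000-0000-0000-0000-000000000000")  # placeholder scan ID
for step, url in urls.items():
    print(step, url)
```

In a real setup you would send these requests with an access token for an account or service principal that is allowed to use the admin APIs, and store the scan result in your data catalog.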

Lineage in Fabric

Each workspace in Fabric has a lineage view that can be accessed by anyone with the Admin, Member, or Contributor role for that workspace.

Lineage provides an overview of how data flows through your items in Fabric, and can be a great way to answer questions such as “What is my source for this report?”, “If I change this table, will any other data assets be affected?” and so on.

To open the lineage view for a workspace, click the lineage icon at the top right of the workspace. To see the lineage focused on one specific item, click the “Show lineage” symbol on the right side of that item. To see the impact analysis, click the “Show impact across workspace” symbol.
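Conceptually, impact analysis is a walk downstream through the lineage graph. The Python sketch below shows that idea on a tiny made-up graph. It is my own illustration of the concept, not how Fabric implements the lineage view, and the item names are hypothetical.

```python
from collections import deque


def downstream(graph, start):
    """Return every item that depends, directly or indirectly, on `start`."""
    affected = []
    queue = deque(graph.get(start, []))
    while queue:
        item = queue.popleft()
        if item in affected:
            continue  # already visited through another path
        affected.append(item)
        queue.extend(graph.get(item, []))
    return affected


# Hypothetical lineage: each item maps to the items that consume it.
lineage = {
    "sales_lakehouse": ["sales_dataset"],
    "sales_dataset": ["sales_report", "exec_report"],
}

affected = downstream(lineage, "sales_lakehouse")
print(affected)
```

Reversing the edges of the same graph answers the other question, “What is my source for this report?”.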

Microsoft Purview Hub in Fabric

In Fabric, administrators have access to the Microsoft Purview hub, a centralized page in Fabric that provides insight into the current state of their data assets.

The Microsoft Purview hub consists of two main components:

  • A portal that will send you to Microsoft Purview. You need to have purchased Microsoft Purview to take advantage of this.
  • A Microsoft Fabric data report that gives you insights on all your Fabric items. You can open the full report to view more detailed information in the following report pages:
    • Overview Report
    • Endorse Report
    • Sensitivity Report
    • Help
    • Items page
    • Sensitivity page

This report gives you insights into how many of your items are labeled with endorsement or sensitivity, by item type and workspace. It also gives you an overview of how your admins or contributors are working with labeling your Fabric items. If your organization has defined data stewards, this is where you can see how they are governing the Fabric items.
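If you want to reason about the same kind of numbers outside the report, label coverage per item type is a simple count. The Python sketch below uses an invented item list and is only meant to show the calculation; it is not how the Purview hub report is built.

```python
from collections import Counter


def label_coverage(items, label_key):
    """Return the share of items per type that carry the given label."""
    total = Counter(item["type"] for item in items)
    labeled = Counter(item["type"] for item in items if item.get(label_key))
    return {item_type: labeled[item_type] / total[item_type] for item_type in total}


# Hypothetical inventory of Fabric items.
items = [
    {"type": "Report", "endorsement": "Promoted", "sensitivity": "Confidential"},
    {"type": "Report", "endorsement": None, "sensitivity": None},
    {"type": "Lakehouse", "endorsement": "Certified", "sensitivity": None},
]

endorsement_coverage = label_coverage(items, "endorsement")
sensitivity_coverage = label_coverage(items, "sensitivity")
print(endorsement_coverage)
print(sensitivity_coverage)
```

A low share for a given item type is a good starting point for a conversation with the data stewards responsible for it.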

Why is Data Governance in Fabric important?

Fabric is a great tool because it lowers the barrier to start developing new data assets. Because it is built from the business user perspective, starting from the Power BI user interface, it also lowers the technical barrier for many Power BI report developers and business analysts to do more with their data assets.

This is a great advantage, but it also opens up some new challenges. Anyone who has governed BI assets in the past knows the struggle of making sure reports are developed, managed, and governed in the right way. Because Fabric lowers the technical barrier and lets users work further back in the development process, it also becomes easier to do things the wrong way.

Therefore, I think governance in Fabric is more important than ever.

Hope you found this article helpful!

Useful links: