By: Koen Verbeeck | Updated: 2023-12-15 | Comments (1) | Related: > Microsoft Fabric
Problem
I've been reading some tips about Microsoft Fabric on this website with great interest, and the term "OneLake" was used in some of them. Is this a data lake or something? What is its purpose?
Solution
Microsoft Fabric is a new end-to-end unified analytics platform in the cloud. It can handle different workloads, such as Spark or warehousing with T-SQL. One of its key features is the separation of storage and compute, allowing you to run those different compute workloads on the same sets of data. The driving force behind this is OneLake. When we represent Fabric in a diagram, we can see OneLake is the storage layer for every possible compute in Fabric:
OneLake is a single logical data lake for the entire Fabric platform in your organization. Microsoft commonly compares OneLake with OneDrive: "OneLake is the OneDrive for your data." Just like OneDrive, OneLake is a single place to store all your analytics data. It's a logical lake because, from the user's perspective, it looks like there's only one storage account, but there may be more than one Azure Data Lake storage account involved. However, OneLake acts as an interface between you and the actual storage.
At the time of writing, Microsoft Fabric was in public preview, where features or user interfaces may change.
There are several distinct features associated with OneLake, including:
- Open
- OneCopy
- Shortcuts
- OneLake File Explorer for Windows
- Connect Azure Storage Explorer to OneLake
OneLake is Open
All tables (both in the lakehouse and the warehouse) are stored behind the scenes as delta tables (which use a transactional layer above Parquet files). KQL tables are an exception as they use the proprietary format of Azure Data Explorer. But you can expose the tables as delta tables if you want. Both delta and Parquet are open-source, which means the risk of vendor lock-in is reduced. Any tool can be used to get data into Fabric if it writes delta tables that follow the standard. You can use any BI tool that can read from delta tables to do your reporting. Fabric provides you with all the options, but you are not obligated to use them all.
To clarify, like any data lake, you can store files of any format in OneLake. The lakehouse and the warehouse automatically create delta tables whenever you create a table. For example, in a lakehouse, you can see the delta tables at the top of the explorer and the actual files at the bottom:
Fabric uses a more efficient compression method for the Parquet files called V-ordering, but the resulting Parquet and delta table still follow the standard so that they can be read by other tools.
OneCopy
Since all tables are delta tables, if someone creates a table in a lakehouse and another developer uses the warehouse to create a table, they contribute to the same data lake. A great benefit of this strategy is that each analytical engine can read data from the other.
For example, you can write a T-SQL query in a warehouse that joins a table to a table from a lakehouse using the 4-part naming convention. This eliminates the need to copy data between two services. The following screenshot shows an example of a T-SQL query in a Fabric warehouse reading data from a Lakehouse table:
It is important to note that a warehouse and a lakehouse can currently read each other's delta tables, but they cannot write data to each other.
Shortcuts
Shortcuts are also a part of the OneCopy paradigm and play a pivotal role. Shortcuts reference data that is stored in other file locations. This can be a file in the same workspace or another Fabric workspace, or it can even be a folder in an external location, such as a blob container in Azure Blob Storage or an S3 bucket in AWS. You can create a shortcut in a Lakehouse or a KQL database.
At the time of writing, internal and external shortcuts to Azure Data Lake Storage and Amazon S3 are supported.
When creating an internal shortcut, you can choose between the available objects in your different workspaces:
In the case of a warehouse as the source, you can choose to create a shortcut for one or more tables or even an entire schema:
In the Lakehouse explorer, you can tell a table (or file/folder) is a shortcut when it has a little link icon:
Keep in mind there's a difference between creating a shortcut to files and a table. In the previous screenshots, I created a shortcut to a warehouse table from within the files section of a lakehouse. This results in a shortcut to the delta files and not to the "logical" table:
Shortcuts are an easy way to minimize the amount of data that needs to be copied between services. It can also play an important role in the data mesh architecture. If your organization has built data products, you can create shortcuts between the domains to make the data accessible.
OneLake File Explorer for Windows
Like OneDrive, you can view your OneLake files in Windows on your machine. You can download the tool here. After installing the tool and logging in, a new section will be available in your Windows Explorer:
It's almost exactly like OneDrive: you can view all the folders and files, but to open them, you need to sync them to your local machine first. In the explorer on the left, you can view all the Fabric workspaces in your tenant, and inside a workspace, you can view the different analytical services along with their folders and tables.
Connect Azure Storage Explorer to OneLake
You can also use the free Azure Storage Explorer tool to connect to your OneLake. In the documentation, you can learn more about connecting to your account. Once connected, you can view your files, folders, and tables (as the individual Parquet files and the delta logs):
Even here, OneLake acts as an interface. When you browse through your OneLake folders, there's no way of telling whether there are multiple storage accounts involved. Also, when you create a clone of a table, it will look like the Parquet files are copied, but in reality, they are just pointers to the original files, as explained in this blog post.
Next Steps
- You can find all Fabric tips in this overview.
- Download OneLake File Explorer and Azure Storage Explorer.
- Get more information about data mesh.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-12-15