Using Calculated Tables in Analysis Services Tabular 2016 for Aggregate Tables – Part 3

By:   |   Updated: 2017-07-04   |   Comments (4)   |   Related: 1 | 2 | 3 | 4 | > Analysis Services Development


Problem

One of the big improvements for DAX in Analysis Services (SSAS) 2016 is the ability to create calculated tables from DAX queries. In this final tip of the series, we’ll take a look at using a calculated table to create an aggregate table.

Solution

With the release of Analysis Services 2016, you can create custom calculated tables using the DAX query language. The tip How to create calculated tables in a tabular SQL Server Analysis Services SSAS model explains how you can create a calculated table in SQL Server Data Tools (Visual Studio) for a SSAS Tabular project. The same is possible in a Power BI Desktop model.

In the series of tips, we’ll cover the following topics:

  • Creating small lookup tables
  • Generating a Date table
  • Calculating an Aggregate table, which is the subject of this tip

An aggregate table is used when you have for example a particular query you want to use on your model, but it’s execution might take too long. With a calculated table, you can execute the query during processing of the model. After processing, the result is instantly available in the model and the table can be queried with much higher performance.

Before we start, we need to create a model in SSAS Tabular first.

Test Set-up

We’re going to create a Tabular model on top of the Wide World Importers data warehouse, which can be downloaded from Github.

Let’s start with importing a bunch of tables from SQL Server:

import from data source

Select SQL Server as the source:

sql server source

Enter the connection details:

connection string

We’re going to use the service account for impersonation. Make sure this account has read access on the WideWorldImportersDW database.

impersonation

Select the option to select from a list:

select from list

We’re going to model the Order star schema. Select the Order fact table and click on Select Related Tables. This will result in a total of 5 related dimensions.

select related tables

When a table is selected in the user interface, you can click on Preview & Filter to inspect the data. In most tables, you can get rid of unnecessary columns, such as LineageIDs or the primary key column of the fact table (which we don’t need for analysis).

remove unnecessary columns

Removing unnecessary columns is a good practice, since those columns only lead to confusion and bloat the model.

When we try to preview the city dimension, you might get the following error:

geography error

This is caused by the Location column, which has the geography data type. Unfortunately, we cannot remove the column in the wizard since the preview screen doesn’t load. We can solve this by creating a view on top of the city dimension:

USE [WideWorldImportersDW]
GO
 
CREATE VIEW [dbo].[VW_City] AS
SELECT
 [City Key]
,[City]
,[State Province]
,[Country]
,[Continent]
,[Sales Territory]
,[Region]
,[Subregion]
,[Latest Recorded Population]
FROM [Dimension].[City];
   

Now we can load the tables into our model:

load data into model

SSAS Tabular will auto-detect the relationships for us:

model

There are some role-playing dimensions in the model: Employee (SalesPerson and Picker) and Date (Order Date and Picked Date). We’re going to leave this for now, but in a real model you probably want to separate those using views (or maybe calculated tables).

To make our model more elegant, we hide all key columns:

hide key columns

The last step is to mark the Date dimension as the Date table for our model:

mark as date table

The model is now finished and ready to be deployed and queried.

Creating the Aggregate Table

As an example, we’re going to create a list of the 10 customers who placed the most orders. Let’s pretend this is a computational intensive query and we want to pre-calculate the result of this query.

The DAX query is as follows:

= TOPN(10 -- number of customers we want to retrieve
            ,SUMMARIZE( -- basically means 'do a group by over these columns and aggregate this measure'
                    'Order' -- source table
                    ,Customer[Customer] -- column to group over
                    ,"Total Amount",SUM('Order'[Total Excluding Tax]) -- measure, in the format of <name, expression>
                )
            ,[Total Amount] -- take the top 10 according to this measure (name is given in summarize clause)
        )
   

The result:

customer

We now have a table in our model with the pre-calculated results. We can get rid of the Unknown customer by including a filter in our DAX query:

= TOPN(10 -- number of customers we want to retrieve
            ,SUMMARIZE( -- basically means 'do a group by over these columns and aggregate this measure'
                    FILTER('Order',RELATED(Customer[Customer]) <> "Unknown") -- source table
                    ,Customer[Customer] -- column to group over
                    ,"Total Amount",SUM('Order'[Total Excluding Tax]) -- measure, in the format of <name, expression>
                )
            ,[Total Amount] -- take the top 10 according to this measure (name is given in summarize clause)
        )
   

This will get rid of that particular value (the Total Amount column is sorted descending and its format is set to decimal values.):

customer

The filter function in DAX is actually an iterator, so this might actually be computationally expensive for very large tables.

The downside of this approach of calculated tables is of course that it’s not related to any other tables in the model (which might also be an advantage in some use cases). Filtering on the dimensions will not affect the results of this table. Using aggregate tables does have its merits though. Imagine a large fact table on the daily level (or even with a smaller grain). You can create a “second-level” fact table on the monthly level (for example to compare the numbers against budgets) using an aggregate table. With this method, you have a new fact table which you can connect to your dimensions and performance will be improved.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-07-04

Comments For This Article




Friday, January 10, 2020 - 2:43:54 AM - Koen Verbeeck Back To Top (83666)

Hi Dhriti,

personally I'm not a fan of creating a "month date" dimension. I typically link my monthly data to the first or the last day of the month. It's a bit awkward when you drill down to the day level with monthly data, but the big advantage is that you can combine your data with daily data. The daily data is aggregated to the month level and then you can easily compare it with your monthly data. For example: daily sales data and monthly budget data.


Thursday, January 9, 2020 - 12:33:42 PM - Dhriti Roy Back To Top (83661)

Hi Koen,

Thank you for this article. This was really helpful.

Just one query. I have a fact table consisting of monthly data and a date dimension table. I want to make relationship between these two tables. I was thinking to create one calculated table which will be considered as Month dimension so that I can create relationship between the calculated month dimension and the fact table. 

Is this a right approach? If yes, then how to create that calculated table?

Thanks,

Dhriti


Monday, November 4, 2019 - 7:14:57 AM - Koen Verbeeck Back To Top (82966)

Hi Andreas,

personally I would pre-calculate aggregate tables in the layer before SSAS, so either another SSAS instance or the data warehouse.
If you don't need the detail data, I would just create my fact tables on the chosen grain. However, once you need the detail data you need to load it into your model.

If you want to avoid having too much historical data, you could create one pre-calculated fact table on the chosen grain with data for the previous years, and one fact table on the detail data for the current year. This way you can have very detailed reporting on the current situation and historical reporting on the previous years. You create the same measure on both fact tables and to have a grand total you just sum the two together.

Regards,
Koen


Friday, November 1, 2019 - 12:43:16 PM - Andreas Schlüter Back To Top (82953)

Hi Koen,

Thanks for this very insightful article. I liked the idea of a second-level fact table.

I am considering whether we can use this to avoid having to load All historical Detail data into the tabular model , instead only loading second Level facts produced in a separate ssas instance acting as a data  source for historical comparisons on the Aggregate level we use in reporting.

This way, we can reuse the aggregation measures instead of preaggregation in our etl, while at the same time avoiding to keep hundreds of million Original transaction records (calls in our call centers) in memory for years.

What is your view on this?

Thanks,

Andreas















get free sql tips
agree to terms