By: Haroon Ashraf | Updated: 2022-01-17 | Comments (2) | Related: More > Data Warehousing
Problem
As a SQL Server Business Intelligence Developer, I want to understand how and when to use a Factless Fact table in a data warehousing solution.
Solution
The solution is to go through the basics of a Factless table from a business intelligence (BI) perspective along with the proper understanding of how to use it considering a simple scenario.
About Factless Fact Table
This tutorial assumes that the reader is familiar with SQL Server database and data warehouse business intelligence concepts. As well as is comfortable with the common terms used and applied in this field (of business intelligence).
Let us try cover what a Factless Fact table is before we dive into a scenario that requires such a table in the building of a data warehouse business intelligence solution commonly known as a DWBI solution.
What is a dimension table?
A dimension table is just like a reference table in a data warehouse business intelligence solution. For example, Employee, City, Department are all dimensions that help us understand a fact properly.
What is a fact table?
A Fact table is typically a table created in a data warehouse which contains facts such as total number of employees in an organization or average sales figures for all the products and so on.
How a dimension is related to a fact table?
Dimensions are just like reference tables that are referenced in a Fact table along with the calculated or computed facts it contains. For example, you may find EmployeeKey (data warehouse generated id) from Employee dimension, DepartmentKey from Department dimension being passed to the FactSalary table.
How a dimension is related to a Factless table?
Traditionally, a dimension is related to a Factless table in the same way it is related to Fact table. However, the only difference is that a Factless table may contain multiple dimensions, but no fact (calculated column).
Can a dimension be linked with a fact and Factless table at the same time?
Yes, a dimension can be linked with both a Fact and Factless table at the same time.
When to use Factless FACT Tables
In order to understand the use of Factless FACT table we have to refer to, Kimball Group, one of the earliest pioneers in the field of Data Warehouse. According to Kimball Group, "It is possible that the event merely records a set of dimensional entities coming together at a moment in time. For example, an event of a student attending a class on a given day may not have a recorded numeric fact, but a fact row with foreign keys for calendar day, student, teacher, location, and class is well-defined."
In other words, we can easily capture numeric values known as facts (to be put into Fact table) in a business process, but when we want to capture out-of-the-box information focused on dimensions itself to extract interesting but beneficial information we turn towards Factless tables.
There may be many business-focused legitimate reasons that require you to build a Factless FACT table as an essential part of your data warehouse business intelligence solution architecture, but let us discuss the common ones to get an idea of this interesting DWBI concept.
Missing Factor
A Factless table can help your business to understand "missing factors" often overlooked or not considered. The simplest example that I can think of is related to product sales. Now, a data warehouse is smart enough to show the business the total sales per year, per month or per week. However, what about the products that were not sold but were on display and it may not be easy to find out what products were not sold per week or per month while the others were quickly going off the shelf.
Negative Analysis
The term "Missing Factor" mentioned above is more broadly accepted as Negative Analysis and so the Factless table actively provides information about the things, processes or events which did not occur.
Now, another example of using a Factless table is if we think of a company providing services to the customers and was approached by a customer who decided not to buy the service ultimately although the customer was assigned a sales person who booked a product demo with him and the customer was shown the product demo but then for some reason the customer chose not to complete the business transaction. Please remember this information is crucial as well because we are in an era where organizations race to have a competitive edge over their competitors in business while on the other side they are very keen to know what is not happening and why.
Identifying an Activity or Event
Factless tables can be also used for information extraction regarding an activity or event not predefined in the system. For example, we require a Factless tables to tell us which students did not attend the exam or simply which students were absent from a class presentation and so on.
Coverage
Now this simply means that one set of information is not enough so a paired set of the information is formed and then finding the odd ones out tell us the required information. For example, we need a Fact table that contains the information about total leaves taken by all students and then another set (Factless table) is required that stores the information about total students registered in a particular class. Then if we subtract those students who took a leave from the total number of registered students, we can understand which students did not bother to take leave.
How to use FactLess FACT Table
Let us now look at a common example of how to use a Factless Fact table.
Student-Exam Scenario
Think of a student-exam scenario where many students are registered but not all of them appear at the examination. Now, a Factless table can help us identify which students did not appear for the examination. However, there can be many other ways to use this approach.
Setup FactlessFactDW Database
Let us build a sample database called FactlessFactDW based on the following dimensions:
- Student
- Exam
- Date
- Subject
The data warehouse database centers around a Factless table called FactlessExam.
The database design is illustrated as follows:
Please setup a FactlessDW database with the above objects (tables) by running the following T-SQL code against your SQL instance:
-- (1) Setup FactlessFactDW database CREATE DATABASE FactlessFactDW; GO USE [FactlessFactDW] GO /****** Object: Table [DimDate] Script Date: 14-Dec-21 9:30:38 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [DimDate]( [DateId] [datetime] NOT NULL, [YearNumber] [int] NULL, [MonthNumber] [int] NULL, [DayNumber] [int] NULL, PRIMARY KEY CLUSTERED ( [DateId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [DimExam] Script Date: 14-Dec-21 9:30:39 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [DimExam]( [DimExamId] [int] NOT NULL, [Name] [nvarchar](50) NOT NULL, PRIMARY KEY CLUSTERED ( [DimExamId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [DimStudent] Script Date: 14-Dec-21 9:30:39 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [DimStudent]( [StudentId] [int] NOT NULL, [Name] [nvarchar](50) NOT NULL, [Age] [int] NOT NULL, PRIMARY KEY CLUSTERED ( [StudentId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [DimSubject] Script Date: 14-Dec-21 9:30:39 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [DimSubject]( [SubjectId] [int] NOT NULL, [Name] [nvarchar](50) NOT NULL, PRIMARY KEY CLUSTERED ( [SubjectId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [FactlessExam] Script Date: 14-Dec-21 9:30:39 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [FactlessExam]( [FactlessExamId] [int] NOT NULL, [ExamId] [int] NOT NULL, [StudentId] [int] NOT NULL, [DateId] [date] NOT NULL, [SubjectId] [int] NOT NULL, PRIMARY KEY CLUSTERED ( [FactlessExamId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-01T00:00:00.000' AS DateTime), 2021, 1, 1) GO INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-02T00:00:00.000' AS DateTime), 2021, 1, 2) GO INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-03T00:00:00.000' AS DateTime), 2021, 1, 3) GO INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-04T00:00:00.000' AS DateTime), 2021, 1, 4) GO INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-05T00:00:00.000' AS DateTime), 2021, 1, 5) GO INSERT [DimExam] ([DimExamId], [Name]) VALUES (1, N'Summer Exam 2021') GO INSERT [DimExam] ([DimExamId], [Name]) VALUES (2, N'Winter Exam 2021') GO INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (1, N'Asif', 35) GO INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (2, N'Mike', 32) GO INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (3, N'Sarah', 27) GO INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (4, N'Peter', 28) GO INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (5, N'Adil', 36) GO INSERT [DimSubject] ([SubjectId], [Name]) VALUES (1, N'Data Warehouse') GO INSERT [DimSubject] ([SubjectId], [Name]) VALUES (2, N'Business Intellignece') GO INSERT [DimSubject] ([SubjectId], [Name]) VALUES (3, N'Data Analysis') GO INSERT [DimSubject] ([SubjectId], [Name]) VALUES (4, N'Databases') GO INSERT [DimSubject] ([SubjectId], [Name]) VALUES (5, N'Database Testing') GO INSERT [FactlessExam] ([FactlessExamId], [ExamId], [StudentId], [DateId], [SubjectId]) VALUES (1, 1, 1, CAST(N'2021-01-01' AS Date), 1) GO INSERT [FactlessExam] ([FactlessExamId], [ExamId], [StudentId], [DateId], [SubjectId]) VALUES (2, 1, 2, CAST(N'2021-01-01' AS Date), 1) GO INSERT [FactlessExam] ([FactlessExamId], [ExamId], [StudentId], [DateId], [SubjectId]) VALUES (3, 1, 3, CAST(N'2021-01-01' AS Date), 1) GO
Please note that to make it simple, I am not using the foreign keys to link dimensions with Factless Fact table but in a standard solution you must use foreign keys to link dimensions with the Fact.
Student Attendance for the Examination
Let us now view all those students who actually sat in the examination first by running the following T-SQL script:
-- View all students who appeared in examination SELECT S.[StudentId] , S.Name, E.ExamId, E.SubjectId FROM [dbo].[FactlessExam] E INNER JOIN DBO.DimStudent S ON S.StudentId=E.StudentId
The output is as follows:
View All Students Who Did Not Appear in the Examination
Now the Factless table can also help us identify the students who were registered but did not appear at the examination by running the following T-SQL script:
-- View all students who registered but did not appear in examination SELECT S.[StudentId] ,S.Name,E.ExamId,E.SubjectId FROM [dbo].[FactlessExam] E FULL JOIN DBO.DimStudent S ON S.StudentID=E.StudentId WHERE ExamId IS NULL
The output can be seen as follows:
Congratulations, you learned about Factless FACT tables in a data warehouse using a simple scenario. See how you can apply this same approach for your BI projects.
Next Steps
- Please go through the tip and setup a data warehouse (database) for clients and services and then use a Factless FACT table to find out the clients who did not purchase any services.
- Please read this tip and see if you can implement a Factless FACT table for the Product-Order scenario mentioned in the tip.
- Check out the following related resources:
- Create a Star Schema Data Model in SQL Server using the Microsoft Toolset
- SQL Server Business Intelligence Dimensional Model
- Creating a date dimension or calendar table in SQL Server
- SSAS Terms and Concepts including Snowflake Schema
- Comparing Data Warehouse Design Methodologies for Microsoft SQL Server
- ETL Tools - SSIS vs. ADF
- All Power BI Tips
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2022-01-17