By: Semjon Terehhov | Updated: 2019-02-18 | Comments (1) | Related: > Database Design
Problem
There are many teams out there that are working with sensor data using Microsoft SQL Server. Flexibility of the relational model can be sometimes overwhelming leading to unnecessary complicated solutions that are near impossible to use in final reports. The purpose of this article is to provide an easy to understand and efficient Kimball (star-schema) data architecture for an IOT (sensor data) projects.
Solution
The IOT DataMart
Following are steps to build an IOT data mart.
Create the SQL Server Database
If you are planning a single node SQL server and do not require High Availability (you can afford a downtime of couple of hours once a month for patching or other issues) use "Simple Recovery Model". This recovery model combined with the TABLOCK insert query hint will minimize the use of the database log file and primarily write directly to the data file. In such a way you can significantly optimize IO performance of data insert operations.
T-SQL code for the database:
CREATE DATABASE DataMartIOT GO ALTER DATABASE DataMartIOT SET RECOVERY SIMPLE GO
Once we have a database in place, we can focus on the Kimball data model. Complicated as it may sound, things are easier than they look. Star architecture or the Kimball model specify that measures (the values that sensors are reading) should go to fact tables. Entries that you use to filter that data with should go to dimension tables. There is a one-to-many relationship between any given dimension and a fact table. Now we need to define the facts (sensor readings) and Dimensions (Sensor names, Date, Time, etc.).
SQL Server DataMart Dimensions for IOT Data Model
Dimensions will be used as filters and drilldowns in Power BI reports. A typical sensor dimension will look as follows:
CREATE SCHEMA Dim GO CREATE TABLE Dim.Sensors ( Id smallint IDENTITY(-32768,1) NOT NULL, -- allows for total of 65536 sensors SensorName varchar(255) NOT NULL, Unit varchar(255), CONSTRAINT PK_Dim_Sensors_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100), CONSTRAINT UX_Dim_Sensors_SensorName UNIQUE (SensorName) ) GO
The reason I have selected a smallint datatype for SensorId is its size of just 2 bytes and the fact that SQL Server has no unsigned integer datatype. Therefore, the seed starts with -32768 to make use of all the dataspace reserved for this column in a page. Remember if 65K+ number of sensors is not enough for your project this needs to be scaled up to INT or BIGINT datatype in both Dim.Sensors and Fact.Readings. Don’t overdo it, this column is used in clustered indexes and will be used in almost any join when you select data, the bigger it is the slower your system will be due to excessive IO.
Let's also add one more dimension called Groups. It will enable the end user to quickly switch between pre-defined sensor configuration.
CREATE TABLE Dim.Groups ( Id tinyint IDENTITY(0,1) NOT NULL, -- allows for total of 256 groups; scale up if needed GroupName varchar(255) NOT NULL, CONSTRAINT PK_Dim_Groups_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100), CONSTRAINT UX_Dim_Groups_GroupName UNIQUE (GroupName) ) GO
In case your IOT project is distributed over multiple geographical locations add a locations dimension:
CREATE TABLE Dim.Locations ( Id tinyint IDENTITY(0,1) NOT NULL, -- allows for total of 256 locations ; scale up if needed LocationName varchar(255) NOT NULL CONSTRAINT PK_Dim_Locations_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100), CONSTRAINT UX_Dim_Locations_LocationName UNIQUE (LocationName) ) GO
To start I have included a simple date dimension below, if you require a more sophisticated version, at the end of the article I have provided a link to a very well detailed date dimension.
CREATE TABLE Dim.Dates ( Id smallint IDENTITY(-32768,1) NOT NULL, -- allows for total of 65536 records or almost 180 years DateValue Date NOT NULL, CONSTRAINT PK_Dim_Dates_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100), CONSTRAINT UX_Dim_Dates_DateValue UNIQUE (DateValue) ) GO -- Populates Date Dimension with dates from 30 days back in time to almost 180 years in the future DECLARE @loop_iterator AS int = -30 --starting 30 days back in time DECLARE @date_var AS Date = DATEADD(DAY,@loop_iterator,GETDATE()) WHILE @loop_iterator < 65506 -- maximum for small int; change if using different datatype BEGIN DECLARE @temp_date AS date = DATEADD(DAY,@loop_iterator,@date_var) INSERT INTO Dim.Dates VALUES(@temp_date); SET @loop_iterator = @loop_iterator + 1 END GO
Followed by the time dimension, that holds an entry for every second in a day:
CREATE TABLE Dim.Times ( Id time(0) NOT NULL, -- time(0) for a nearest second precision. HourValue tinyint NOT NULL, --for drill down reports MinValue tinyint NOT NULL, --for drill down reports SecValue tinyint NOT NULL, --for drill down reports --MSValue smallint NOT NULL, --uncomment this line if you changed the precision to milliseconds CONSTRAINT PK_Dim_Times_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100), CONSTRAINT UX_Dim_Times_Values UNIQUE (HourValue,MinValue,SecValue) –add MSValue column if in use ) GO --Populating Time Dimension with values DECLARE @loop_iterator AS int = 0 DECLARE @time_var AS time(0) = N'00:00:00' WHILE @loop_iterator < 86400 BEGIN DECLARE @temp_id AS time(0) = DATEADD(second,@loop_iterator,@time_var) INSERT INTO Dim.Times VALUES (@temp_id,DATEPART(HOUR,@temp_id),DATEPART(MINUTE,@temp_id),DATEPART(SECOND,@temp_id)); SET @loop_iterator = @loop_iterator + 1 END GO
SQL Server Fact Table for IOT Data Model
Once done with dimensions we can proceed to the fact table. The table that stores actual readings from sensors and references to every dimension using the foreign key constraint. It will be the core of our star schema, where arms are dimensions. The T-SQL table definition is below:
CREATE SCHEMA Fact GO CREATE TABLE Fact.Readings ( DateId smallint NOT NULL, --Must be identical datatype to Date Dim TimeId time(0) NOT NULL, -- Must be identical datatype to Time Dim LocationId tinyint NOT NULL, -- Must be identical datatype to Location Dim GroupId tinyint NOT NULL, -- Must be identical datatype to Group Dim SensorId smallint NOT NULL, -- Must be identical datatype to Sensor Dim SensorValue decimal(9,2), -- this will cover almost all types of numeric data CONSTRAINT PK_Fact_Readings PRIMARY KEY (DateId, TimeId, LocationId, GroupId, SensorId), CONSTRAINT FK_Fact_Readings_Dim_Date_Id FOREIGN KEY (DateId) REFERENCES Dim.Dates(Id), CONSTRAINT FK_Fact_Readings_Dim_Time_Id FOREIGN KEY (TimeId) REFERENCES Dim.Times(Id), CONSTRAINT FK_Fact_Readings_Dim_Sensors_Id FOREIGN KEY (SensorId) REFERENCES Dim.Sensors(Id), CONSTRAINT FK_Fact_Readings_Dim_Groups_Id FOREIGN KEY (GroupId) REFERENCES Dim.Groups(Id), CONSTRAINT FK_Fact_Readings_Dim_Locations_Id FOREIGN KEY (LocationId) REFERENCES Dim.Locations(Id) ) GO
The final database diagram should look like this:
When you insert, modify or delete data SQL Server will automatically check that your data operation does not violate any relevant constrains. As good as it sounds, this requires additional CPU and IO resources, that’s why for best performance we will disable newly created foreign key constraints in the Fact.Readings table. When disabled, these still can be used for documentation purposes in both SSMS and Power BI:
ALTER TABLE Fact.readings NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Date_Id GO ALTER TABLE Fact.readings NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Time_Id GO ALTER TABLE Fact.Readings NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Sensors_Id GO ALTER TABLE Fact.Readings NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Groups_Id GO ALTER TABLE Fact.Readings NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Locations_Id GO
Sample SQL Server Data for IOT Data Model
To test our newly created data mart, use the sample data script provided:
INSERT INTO Dim.Sensors VALUES -- CREATES 10 sensor entries for sensors dimension (N'Temperature Sensor 01',N'Deg.C.'), (N'Humidity Sensor 01',N'%RH'), (N'Temperature Sensor 02',N'Deg.C.'), (N'Humidity Sensor 02',N'%RH'), (N'Temperature Sensor 03',N'Deg.C.'), (N'Humidity Sensor 03',N'%RH'), (N'Temperature Sensor 04',N'Deg.C.'), (N'Humidity Sensor 04',N'%RH'), (N'Temperature Sensor 05',N'Deg.C.'), (N'Humidity Sensor 05',N'%RH') GO INSERT INTO Dim.Locations VALUES -- CREATES 5 location entries for locations dimension (N'Oslo, Norway'), (N'Stockholm, Sweden'), (N'Copenhagen, Denmark'), (N'Helsinki, Finland'), (N'Reykjavik, Iceland') INSERT INTO Dim.Groups VALUES -- CREATES 2 groups for groups dimension (N'Temperature Sensors'), (N'Humidity Sensors') GO -- GENERATING SAMPLE TEMPERATURE AND HUMIDITY DATA FOR THE PAST 7 DAYS PRINT N'Generating Sample sensor data, this will take some minutes' DECLARE @DateId smallint, @DateValue date, @TimeId time(0); DECLARE @RandomSeed int = 12345; DECLARE @SinAngle decimal(9,2) = 0; DECLARE date_time_cursor CURSOR FOR SELECT Dates.Id AS DateId, Dates.DateValue AS DateValue, Times.Id AS TimeId FROM Dim.Dates AS Dates CROSS JOIN Dim.Times AS Times WHERE Dates.DateValue BETWEEN DATEADD(DAY,-7,GETDATE()) AND GETDATE() ORDER BY Dates.Id, Times.Id OPEN date_time_cursor FETCH NEXT FROM date_time_cursor INTO @DateId, @DateValue, @TimeId; WHILE @@FETCH_STATUS = 0 BEGIN SET @SinAngle = @SinAngle + 0.01; DECLARE @Temperature1 decimal (9,2) = 24.5 + RAND() + 5.0 * SIN(@SinAngle); DECLARE @RHValue1 decimal(9,2) = 50 + RAND() + 20.0 * SIN(@SinAngle); DECLARE @Temperature2 decimal (9,2) = 20.5 + RAND() + 6.0 * SIN(@SinAngle + 90); DECLARE @RHValue2 decimal(9,2) = 60 + RAND() + 15.0 * SIN(@SinAngle + 90); DECLARE @Temperature3 decimal (9,2) = 22.5 + RAND() + 3.0 * SIN(@SinAngle + 45); DECLARE @RHValue3 decimal(9,2) = 70 + RAND() + 10.0 * SIN(@SinAngle + 45); DECLARE @Temperature4 decimal (9,2) = 27.5 + RAND() + 4.0 * SIN(@SinAngle + 30); DECLARE @RHValue4 decimal(9,2) = 35 + RAND() + 17.0 * SIN(@SinAngle + 30); DECLARE @Temperature5 decimal (9,2) = 10.0 + RAND() + 7.0 * SIN(@SinAngle + 60); DECLARE @RHValue5 decimal(9,2) = 40 + RAND() + 25.0 * SIN(@SinAngle + 60); INSERT INTO Fact.Readings VALUES (@DateId,@TimeId,0,0,-32768,@Temperature1), (@DateId,@TimeId,0,1,-32767,@RHValue1), (@DateId,@TimeId,1,0,-32766,@Temperature2), (@DateId,@TimeId,1,1,-32765,@RHValue2), (@DateId,@TimeId,2,0,-32764,@Temperature3), (@DateId,@TimeId,2,1,-32763,@RHValue3), (@DateId,@TimeId,3,0,-32762,@Temperature4), (@DateId,@TimeId,3,1,-32761,@RHValue4), (@DateId,@TimeId,4,0,-32760,@Temperature5), (@DateId,@TimeId,4,1,-32759,@RHValue5) FETCH NEXT FROM date_time_cursor INTO @DateId, @DateValue, @TimeId; END; CLOSE date_time_cursor DEALLOCATE date_time_cursor GO
The RAND() function generates a random number between zero and one. In the above script it is used to add noise to the data. The SIN() function is used to add periodic fluctuations with a given amplitude.
Using data with T-SQL from IOT Data Model
The Fact.Readings table has a composite clustered index. For this index to work efficiently we need to specify in our WHERE clause a value for every column that is part of index PK_Fact_Readings (DateId, TimeId, LocationId, GroupId, SensorId). Below is an example of such a T-SQL query:
SELECT DimDates.DateValue, FactReadings.Id, FactReadings.SensorValue FROM Dim.Sensors AS DimSensors JOIN Fact.Readings AS FactReadings ON DimSensors.Id = FactReadings.SensorId JOIN Dim.Dates AS DimDates ON DimDates.Id = FactReadings.DateId JOIN Dim.Times AS DimTimes ON DimTimes.Id = FactReadings.TimeId JOIN Dim.Groups AS DimGroups ON DimGroups.Id = FactReadings.GroupId JOIN Dim.Locations AS DimLocations ON DimLocations.Id = FactReadings.LocationId WHERE DimDates.DateValue = 'Your Date Value' AND DimTimes.Id = 'Your Time Value' AND DimLocations.LocationName = N'Your Location Name' AND DimGroups.GroupName = N'Your Group Name' AND DimSensors.SensorName = N'Your Sensor Name'
Next Steps
- Some projects may require a sophisticated date dimension for advanced reports.
- To increase database concurrency during loading, use a partitioning scheme with a partition swap.
- Read more on minimum logging insert operations.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2019-02-18