By: Dallas Snider | Updated: 2014-12-04 | Comments (3) | Related: > Dates
Problem
In the emerging "Internet of Things", there are multitudes of devices collecting data at differing sampling rates. Integrating this data so the data has a common granularity in time is important to not only allow for accurate analysis and mining, but it will also aid in reducing the amount of data to be stored and processed.
Solution
In this tip, we will demonstrate how to use the T-SQL AVG function and GROUP BY clause to transform data collected from two devices sampling at 100Hz and 40Hz to one row per second. We will use T-SQL to generate simulated data for this example. Let's begin by creating two tables, one for Device A's data and another for Device B's data.
create table dbo.tblDeviceA ( pkDeviceA integer identity(1,1) primary key, recordingTime datetime2, xAxis decimal(5,3), yAxis decimal(5,3), zAxis decimal(5,3) ) create table dbo.tblDeviceB ( pkDeviceB integer identity(1,1) primary key, recordingTime datetime2, temperature decimal(5,2), humidity decimal(5,2) )
Next, we will populate the Device A table with our simulated data from a device sampling at 100Hz. The DATEADD function is incrementing the recording time by 10 milliseconds.
declare @i as integer declare @dateTime as datetime2 set @i=0 set @dateTime='2014-11-15 12:00:00.000'; while @i<1000 begin insert into dbo.tblDeviceA values (dateadd(ms,@i*10,@dateTime), 2.0+rand()-0.5, rand()-0.5, rand()/10) set @i=@i+1 end go
After inserting the rows, we will select the top 10 rows from the table to verify.
Now we will populate the Device B table with our simulated data from a device sampling at 40Hz. The DATEADD function is incrementing the recording time by 25 milliseconds.
declare @i as integer declare @dateTime as datetime2 set @i=0 set @dateTime='2014-11-15 12:00:00.000'; while @i<400 begin insert into dbo.tblDeviceB values (dateadd(ms,@i*25,@dateTime), 72.0+rand()/10, 45.0+rand()/10) set @i=@i+1 end go
After inserting the rows, we will select the top 10 rows from the table to verify.
In these next steps we will roll up the data for each device to one row of data per second. An accepted way of doing this is to average the recorded data values within each second. In the T-SQL below, we are using the CONVERT function to remove the milliseconds from the date/time value. The style value of 120 returns the date in the yyyy-mm-dd hh:mi:ss(24h) format. The same CONVERT function is used in the GROUP BY and ORDER BY clauses. The AVG function is applied to each of the data columns to get an average value per second, and the CAST function is used to limit the returned values to three decimal places.
select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, cast(avg(xAxis) as decimal(5,3)) as averageXAxis, cast(avg(yAxis) as decimal(5,3)) as averageYAxis, cast(avg(zAxis) as decimal(5,3)) as averageZAxis from dbo.tblDeviceA group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) order by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))
The results from this query for Device A are shown below. We have reduced the data from 100 rows per second to 1 row per second.
For Device B's data, we apply the same CONVERT function. Notice how the query does not need the sampling rate.
select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, cast(avg(temperature) as decimal(5,2)) as averageTemperature, cast(avg(humidity) as decimal(5,2)) as averageHumidity from dbo.tblDeviceB group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) order by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))
The results from this query for Device B are shown below. We have reduced the data from 40 rows per second to 1 row per second.
The last step in this tip will be to display the data from both devices together on one row. There are myriad ways to do this, but we will show one way that involves two subqueries with an INNER JOIN on the recording time columns.
select a.recordingTime, a.averageXAxis, a.averageYAxis, a.averageZAxis, b.averageTemperature, b.averageHumidity from ( select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, cast(avg(xAxis) as decimal(5,3)) as averageXAxis, cast(avg(yAxis) as decimal(5,3)) as averageYAxis, cast(avg(zAxis) as decimal(5,3)) as averageZAxis from dbo.tblDeviceA group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) )as A inner join ( select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, cast(avg(temperature) as decimal(5,2)) as averageTemperature, cast(avg(humidity) as decimal(5,2)) as averageHumidity from dbo.tblDeviceB group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) ) as B on A.recordingTime=B.recordingTime order by A.recordingTime
The results from the above query shows the data from both devices integrated on one row.
Next Steps
Temporary tables can be used to stage the data during the integration process. Also, this tip assumes that the clocks on the devices are synchronized. The DATEADD function can be used to adjust the times if the device clocks are not synchronized. Finally, read more about using the functions and clauses in T-SQL that are mentioned within this article in the following tips from MSSQLTips.com.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2014-12-04