By: Harris Amjad | Updated: 2023-07-05 | Comments | Related: > Functions System
Problem
Sometimes, we need to find the first or last n% records from a dataset based on a particular column. For instance, a teacher wants to know which students are in the bottom 25% based on marks. Hence, in such situations, the SQL NTILE() function helps. We will leverage this tip to get first or last n% records using NTILE() in SQL Server.
Solution
SQL Server's NTILE function is a powerful tool that allows you to divide records from a dataset into a specified number of groups based on a specified ordering. This further allows you to analyze your data more efficiently and perform various operations and calculations over the split data.
The NTILE function divides a set into equal ranges, with each division consisting of approximately equal records/rows, which is extremely useful when you want to split the data into percentile-based groups such as quartiles, deciles, or any number of ntiles you require.
This allows you to identify trends/patterns within your data and perform operations on each group separately. Hence, the tool is powerful in creating histograms, analyzing rankings, or even dividing the data to allow parallel processing.
NTILE Syntax
This is the basic syntax for the NTILE function:
--MSSQLTips.com NTILE(Buckets) OVER (PARTITION BY partition_expression ORDER BY order_expression)
- Buckets specifies the number of groups to divide your data into; therefore, buckets should be an integer greater than zero.
- PARTITION BY is an optional clause that allows you to partition based on your dataset into subsets before the NTILE function is applied to the dataset. NTILE will then be applied to each separate partition.
- ORDER BY is a required clause used to specify the column(s) used to rank/order the records within each subset/partition. In essence, it is the criteria upon which the dataset is divided into groups.
- The partition_expression is the expression upon which the partitions will be made. In this expression, you must provide the column(s) for the partition criteria.
- The order_expression is the expression upon which the NTILE ordering will be made. In this expression, like a partition expression, you must provide the column(s) for the ordering criteria.
In this tip, our primary focus will be on using the NTILE function to rank records based on specific criteria. Now that we have given a brief overview of the NTILE function and how it is used, it's time to dive into how to use this SQL function.
Create Tables and Data for Testing
Before we can use the NTILE function, we need a dataset to apply it.
Let's create a new database for testing:
--MSSQLTips.com CREATE DATABASE Information; GO; USE Information;
Next, create two tables within this database, one for employee records and the other for student records.
Since we are using the NTILE function, each table must consist of at least one column where we can rank our records. For ranking, our Employees table can have a Salary column, while our Students table will have a Final_Marks column. We also need a column to partition data into multiple subsets to demonstrate how to use the optional PARTITION BY clause. To accommodate this, let's include a Department column in the Employees table and a Course column in the Students table.
Here is the code to create the Employee_Information table:
--MSSQLTips.com CREATE TABLE Employee_Information ( Employee_ID INT PRIMARY KEY IDENTITY(1,1), Department VARCHAR(255), Salary INT );
Here is the code to create the Student_Information table:
--MSSQLTips.com CREATE TABLE Student_Information ( Student_ID INT PRIMARY KEY IDENTITY(1,1), Course VARCHAR(255), Final_Marks INT );
Now that both tables have been created, let's populate them, starting with the Employee_Information table:
--MSSQLTips.com INSERT INTO Employee_Information (Department, Salary) VALUES ('IT', 60000), ('IT', 50000), ('IT', 36000), ('IT', 29000), ('IT', 72000), ('IT', 52000), ('Sales', 23000), ('Sales', 53000), ('Sales', 45000), ('Sales', 56000), ('Sales', 54000), ('Sales', 46000), ('Sales', 57000), ('Sales', 55000), ('Sales', 47000), ('HR', 62500), ('HR', 58000), ('HR', 49000), ('HR', 61000), ('HR', 81000), ('HR', 59000), ('HR', 30000), ('HR', 62000), ('HR', 64000), ('HR', 60500), ('HR', 51000), ('HR', 63000);
Populate the Student_Information table now:
--MSSQLTips.com INSERT INTO Student_Information (Course, Final_Marks) VALUES ('Math', 92), ('Math', 80), ('Math', 84), ('Math', 66), ('Math', 40), ('Math', 51), ('Math', 83), ('Math', 69), ('Math', 76), ('Math', 40), ('Math', 39), ('Math', 30), ('English', 91), ('English', 89), ('English', 76), ('English', 53), ('English', 96), ('English', 65), ('English', 74), ('English', 69), ('English', 48), ('English', 35), ('English', 77), ('English', 54);
Before we move forward, you can display your tables using the following commands.
To display the Employee_Information table:
--MSSQLTips.com SELECT * FROM Employee_Information
To display the Student_Information table:
--MSSQLTips.com SELECT * FROM Student_Information
Now, let's analyze the data.
Using NTILE Examples on Employee Data
Suppose you are to raise the salaries of the employees whose salaries are in the first quartile and lower the salaries of the employees in the fourth quartile by specific values, irrespective of their departments. To do this, you must fetch the employees in the first and fourth quartiles before you can update the data. And so, first, you have to divide the Employees dataset into four equal groups.
Recall the syntax for NTILE:
--MSSQLTips.com NTILE(Buckets) OVER (PARTITION BY partition_expression ORDER BY order_expression)
In this case, Buckets will be 4 (for 4 quarters). We will not be using the PARTITION BY clause for this problem. Since we are ranking our employees by Salary, we will use ORDER BY and the order_expression will be Salary.
To apply the function on the Employees table, use the following code:
--MSSQLTips.com SELECT *, NTILE(4) OVER (ORDER BY Salary) AS Emp_Ranking FROM Employee_Information;
The code above will display the following result:
As you can see, the employees' salaries are divided into four groups ranked in ascending order. The higher the salary, the higher the number. However, that is because the ORDER BY clause orders the column by ascending order by default. You can add a DESC parameter like ORDER BY order_expression DESC to make the order descending.
You can also filter the results to only get the quartiles you want to update, as follows:
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (ORDER BY Salary) AS Emp_Ranking FROM Employee_Information) AS Temp_Table WHERE Emp_Ranking = 4;
The code above will return the employee records whose salary is in the fourth or highest-paid quartile.
Similarly, the code below will return the employee records whose salary is in the first or lowest-paid quartile.
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (ORDER BY Salary) AS Emp_Ranking FROM Employee_Information) AS Temp_Table WHERE Emp_Ranking = 1;
The result of the first snippet:
The result of the second snippet:
Now that you know how to select records using the NTILE function you can, in the same way, update the records as well, to your liking.
In the previous example, we computed the NTILES irrespective of the employee's department. If you, as the manager or the owner of a company, would like to manipulate the salaries in such a way that the salary of the employee in the first quartile of each department should be raised, and the salary of the employee in the fourth quartile of each department should be lowered, you would require the PARTITION BY clause to carry out your selection.
You can do so as follows:
--MSSQLTips.com SELECT *, NTILE(4) OVER (PARTITION BY Department ORDER BY Salary) AS Emp_Ranking FROM Employee_Information;
The result obtained from the code snippet above is:
As you can see, each department has four separate NTILEs now, and you can select the fourth quartile and the first quartile of each department as follows:
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (PARTITION BY Department ORDER BY Salary) AS Emp_Ranking FROM Employee_Information) AS Temp_Table WHERE Emp_Ranking = 4;
The code above will return the employee records whose salary belongs to the fourth quartile, partitioned by their department, or the highest-paid quartile in each department.
Similarly, the code below returns the employee records whose salary belongs to the first quartile partitioned by their departments or the lowest-paid quartile in each department.
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (PARTITION BY Department ORDER BY Salary) AS Emp_Ranking FROM Employee_Information) AS Temp_Table WHERE Emp_Ranking = 1;
The result of the first snippet:
The result of the second snippet:
Once this is done, you can update the records of the new values similarly. Now that we have shown you some examples of how NTILE works and how you can use it on the Employee_Information table, we will move on to the Student_Information table.
Using NTILE Examples on Student Table
Suppose you have the final student grades and you want to mark them according to a relative grading scheme:
- A – Fourth Quartile
- B – Third Quartile
- C – Second Quartile
- F – First Quartile
First, you need to divide your students' grades into four groups using the NTILE function, with the buckets parameter equal to 4 as shown below:
--MSSQLTips.com SELECT *, NTILE(4) OVER (ORDER BY Final_Marks) AS Student_Grade FROM Student_Information;
The query above returns the following table:
Notice how our dataset consists of two courses; however, we did not use the PARTITION BY clause, so the quartiles were made irrespective of which course marks were entered. This was done to emphasize the importance of the PARTITION BY clause. The data may not always make sense without being partitioned concerning other columns, as shown in the image above, and to correct this problem, we will now use the PARTITION BY clause.
Conventionally, a lower number is used to represent a higher rank, so we will use DESC in the ORDER BY clause:
--MSSQLTips.com SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade FROM Student_Information;
This query returns the result we desire:
Now, the Student_Grade column makes sense, as each course is now being graded separately. Let's filter our results so that we can assign the grades later on.
The students who will be receiving an 'F' grade are:
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade FROM Student_Information) AS Temp_Table WHERE Student_Grade = 4;
The students who will be receiving a 'C' grade are:
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade FROM Student_Information) AS Temp_Table WHERE Student_Grade = 3;
The students who will be receiving a 'B' grade are:
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade FROM Student_Information) AS Temp_Table WHERE Student_Grade = 2;
The students who will be receiving an 'A' grade are:
--MSSQLTips.com SELECT * FROM (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade FROM Student_Information) AS Temp_Table WHERE Student_Grade = 1;
Thus, you can exercise a relative grading system using the NTILE function.
Conclusion
You have learned how the NTILE function groups records and partitions the data. You can now apply this function to your datasets according to your specific issue and achieve your desired results. You can then update and manipulate your data and perform whatever operations you wish on your dataset. You can also analyze the information you retrieve.
Next Steps
- Check out these related tips
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-07-05