Introduction:
In this article,i am going to explain about how to create the ranking functions in sql server 2008.
Main:
Databases hold data. Users sometimes want to perform simple calculations or algorithms on that data to rank the results in a specific order—like gold, silver, and bronze medals in the Olympics or the top 10 customers by region. Starting with SQL Server 2005, functionality is provided for using ranking expressions with your result set. You can select a number of ranking algorithms, which are then applied to a column that you specify and applied in the scope of the executing query. If the data changes, your ranking algorithm will return different data the next time it is run. This comes in handy in Microsoft .NET Framework applications for paging and sorting in a grid, as well as in many other scenarios.
The ROW_NUMBER Function
The most basic ranking function is ROW_NUMBER. It returns a column as an expression that contains the row’s number in the result set. This number is used only in the context of the result set; if the result changes, the ROW_NUMBER changes. The ROW_NUMBER expression takes an ORDER BY statement with the column you want to use for the row count and the OVER operator, which links the ORDER BY to the specific ranking function you are using. The ORDER BY in the OVER clause replaces an ORDER BY at the end of the SQL statement.
The simple example in below, gives a row number to each row in the result set, ordering by SalesOrderID.
Row number ranking:
SELECT
SalesOrderID,
CustomerID,
ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RowNumber
FROM Sales.SalesOrderHeader
The results are shown here:
SalesOrderID CustomerID RowNumber
————— ————- —————
43659 676 1
43660 117 2
43661 442 3
43662 227 4
43663 510 5
43664 397 6
43665 146 7
43666 511 8
43667 646 9
:
ORDER BY Options
The ranking functions order your result set by the fields specified in the ORDER BY statement contained in the OVER clause. Alternatively, you can include an additional ORDER BY statement in your result set; this optional statement is distinct from the ORDER BY clause in the OVER expression. SQL Server allows this, but if you choose this option, the ROW_NUMBER function’s results are displayed in the order in which they are determined in the additional ORDER BY statement, not by the ORDER BY statement contained within the OVER clause. The results can, therefore, be confusing. To illustrate, if we provide an additional ORDER BY CustomerID clause to the very same query we just ran, we get these “jumbled” results:
SalesOrderID CustomerID RowNumber
————— ————- ————–
43860 1 202
44501 1 843
45283 1 1625
46042 1 2384
46976 2 3318
47997 2 4339
49054 2 5396
50216 2 6558
51728 2 8070
57044 2 13386
63198 2 19540
69488 2 25830
65310 3 21652
71889 3 28231
53616 3 9958
:
As you can see, if you expect the results to be sorted by the OVER clause’s ORDER BY statement, you’d expect results ranked by SalesOrderID, when in fact they’re ordered by CustomerID.
If you choose the ROW_NUMBER function to run against a nonunique column that contains multiple copies of the same value (also known as “ties,” such as the same amount of items sold and the same time in a race), ROW_NUMBER breaks the tie and still produces a running count so that no rows have the same number. In below, for example, CustomerID can repeat, which will generate several ties; SQL Server simply increases the running count for each row, regardless of how many ties exist.
Row number ranking with ties
SELECT
SalesOrderID,
CustomerID,
ROW_NUMBER() OVER (ORDER BY CustomerID) AS RowNumber
FROM Sales.SalesOrderHeader
The results are shown here:
SalesOrderID CustomerID RowNumber
————— ————– —————–
43860 1 1
44501 1 2
45283 1 3
46042 1 4
46976 2 5
47997 2 6
49054 2 7
50216 2 8
51728 2 9
57044 2 10
63198 2 11
69488 2 12
44124 3 13
:
Grouping and Filtering with ROW_NUMBER
When you want to include a GROUP BY function in your query, ranking functions do not work. The easy way around this limitation is to create your GROUP BY in a CTE and then perform your ranking on the results, as shown in below,
Grouping by row number
WITH CustomerSum
AS
(
SELECT CustomerID, SUM(TotalDue) AS TotalAmt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
–this appends a row number to the end of the result set
SELECT
*,
ROW_NUMBER() OVER (ORDER BY TotalAmt DESC) AS RowNumber
FROM CustomerSum
Here are the results:
CustomerID TotalAmt RowNumber
————- ————— —————
678 1179857.4657 1
697 1179475.8399 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
155 1045197.0498 6
72 1005539.7181 7
:
To filter by a ROW_NUMBER, you have to put the ROW_NUMBER function in a CTE, as shown in below,
Filtering by row number
WITH NumberedRows AS
(
SELECT
SalesOrderID,
CustomerID,
ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT * FROM NumberedRows
WHERE RowNumber BETWEEN 100 AND 200
Here are the results:
SalesOrderID CustomerID RowNumber
————— ————- ————–
43759 13257 100
43760 16352 101
43761 16493 102
:
43857 533 199
43858 36 200
The RANK Function
The ranking function you will probably use the most is RANK, which ranks the data in the ORDER BY clause in the order you specify. RANK is syntactically exactly like ROW_NUMBER but with true ranking results. It works just like in the Olympics, when two people tie for the gold medal—the next rank is bronze. For example, with the RANK function, if four rows are tied with the value 1, the next row value for the rank column will be 5. Consider the code in below,
The RANK function
SELECT
SalesOrderID,
CustomerID,
RANK() OVER (ORDER BY CustomerID) AS Rank
FROM Sales.SalesOrderHeader
Here are the results:
SalesOrderID CustomerID Rank
————— ————- —————-
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 5
47997 2 5
49054 2 5
50216 2 5
51728 2 5
57044 2 5
63198 2 5
69488 2 5
44124 3 13
:
Just as with the other ranking functions, RANK needs the aid of a CTE to work with aggregates. Consider this query that ranks the customers from highest to lowest by total sales. We have to use a CTE to perform the aggregate first and then rank over the newly created aggregate expression, as shown in below,
Ranked aggregates
WITH CustomerSum AS
(
SELECT CustomerID, SUM(TotalDue) AS TotalAmt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT
*,
RANK() OVER (ORDER BY TotalAmt DESC) AS Rank
FROM CustomerSum
The results are shown here. Notice that customer 678 is in first place:
CustomerID TotalAmt Rank
———– ——————— ——————–
678 1179857.4657 1
697 1179475.8399 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
:
As stated earlier, it is important to remember that the ranking functions provided by SQL Server are valid only for the scope of the running query. If the underlying data changes and then you run the same query again, you will get different results. In below, for example, let’s modify a record from customer 697. By changing one of the detail rows for an order placed by customer 697 and increasing the order quantity, we place customer 697 as our top customer.
Changing RANK results with underlying data changes
UPDATE Sales.SalesOrderDetail
SET OrderQty = 50 — the original value was 2
WHERE SalesOrderDetailID = 535
Now rerun the same query. Notice that customer 697 has now surpassed customer 678 as the top customer:
CustomerID TotalAmt Rank
———– ——————— ——————–
697 1272595.5474 1
678 1179857.4657 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
:
The DENSE_RANK and NTILE Functions
The last two ranking functions we will cover are DENSE_RANK and NTILE. DENSE_RANK works exactly like RANK except that it increments only on distinct rank changes—in other words, unlike in the Olympics, it awards a silver medal when there are two gold medals. below shows an example.
Ranking with DENSE_RANK
SELECT
SalesOrderID,
CustomerID,
DENSE_RANK() OVER (ORDER BY CustomerID) AS DenseRank
FROM Sales.SalesOrderHeader
WHERE CustomerID > 100
The results are shown here:
SalesOrderID CustomerID DenseRank
———— ———– ——————–
46950 101 1
47979 101 1
49048 101 1
50200 101 1
51700 101 1
57022 101 1
63138 101 1
69400 101 1
43855 102 2
44498 102 2
45280 102 2
46038 102 2
46951 102 2
47978 102 2
49103 102 2
50199 102 2
51733 103 3
57058 103 3
:
The following example shows the difference between RANK and DENSE_RANK. We will round the customers’ sales to the nearest hundred (because managers always like to look at whole numbers in their reports!) and look at the difference when we run into a tie. The code is shown in below,
RANK versus DENSE_RANK
WITH CustomerSum AS
(
SELECT
CustomerID,
ROUND(CONVERT(int, SUM(TotalDue)) / 100,
* 100 AS TotalAmt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT *,
RANK() OVER (ORDER BY TotalAmt DESC) AS Rank,
DENSE_RANK() OVER (ORDER BY TotalAmt DESC) AS DenseRank
FROM CustomerSum
And here are the results:
CustomerID TotalAmt Rank DenseRank
———– ———– ——- ——————–
697 1272500 1 1
678 1179800 2 2
170 1134700 3 3
328 1084400 4 4
:
87 213300 170 170
667 210600 171 171
196 207700 172 172
451 206100 173 173
672 206100 173 173
27 205200 175 174
687 205200 175 174
163 204000 177 175
102 203900 178 176
:
Notice that customers 451 and 672 are tied, with the same total sales amount. They are ranked 173 by both the RANK and the DENSE_RANK functions. What happens next is where the difference between the two functions comes into play. Customers 27 and 687 are tied for the next position, and they are both assigned 175 by RANK and 174 by DENSE_RANK. Customer 163 is the next nontie, and it is assigned 177 by RANK and 175 by DENSE_RANK.
NTILE divides the returned rows into approximately evenly sized groups, the number of which you specify as a parameter to the function. It assigns each member of a group the same number in the result set. A perfect example of this is the percentile ranking in a college examination or a road race. below shows an example of using NTILE.
Ranking with NTILE
SELECT
SalesOrderID,
CustomerID,
NTILE(10000) OVER (ORDER BY CustomerID) AS NTile
FROM Sales.SalesOrderHeader
The results are shown here:
SalesOrderID CustomerID NTile
————— ————- —————
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 2
47997 2 2
49054 2 2
50216 2 2
51728 2 3
57044 2 3
63198 2 3
69488 2 3
44124 3 4
:
45024 29475 9998
45199 29476 9998
60449 29477 9998
60955 29478 9999
49617 29479 9999
62341 29480 9999
45427 29481 10000
49746 29482 10000
49665 29483 10000
(31465 row(s) affected)
Using All the Ranking Functions Together
So far, we have looked at the ranking functions in isolation. The ranking functions are just regular SQL Server expressions, so you can have as many of them as you want in a single SELECT statement. We’ll look at one last example in below that brings these all together into one SQL statement and shows the differences between the four ranking functions.
Contrasting SQL Server ranking functions
SELECT
SalesOrderID AS OrderID,
CustomerID,
ROW_NUMBER() OVER (ORDER BY CustomerID) AS RowNumber,
RANK() OVER (ORDER BY CustomerID) AS Rank,
DENSE_RANK() OVER (ORDER BY CustomerID) AS DenseRank,
NTILE(10000) OVER (ORDER BY CustomerID) AS NTile
FROM Sales.SalesOrderHeader
The results are shown here:
OrderID CustomerID RowNumber Rank DenseRank NTile
——– ————- ——— ——- ——— ——–
43860 1 1 1 1 1
44501 1 2 1 1 1
45283 1 3 1 1 1
46042 1 4 1 1 1
46976 2 5 5 2 2
47997 2 6 5 2 2
49054 2 7 5 2 2
50216 2 8 5 2 2
51728 2 9 5 2 3
57044 2 10 5 2 3
63198 2 11 5 2 3
69488 2 12 5 2 3
44124 3 13 13 3 4
44791 3 14 13 3 4
:
Ranking over Groups Using PARTITION BY
The ranking functions can also be combined with windowing functions. A windowing function divides a result set into equal partitions based on the values of your PARTITION BY statement in conjunction with the OVER clause in your ranking function. This is like applying a GROUP BY to your ranking function—you get a separate ranking for each partition. The example in below uses ROW_NUMBER with PARTITION BY to count the number of orders by order date by salesperson. We do this by using PARTITION BY SalesPersonID ORDER BY OrderDate. You can do this with any of the four ranking functions.
Ranking over groups with PARTITION BY
SELECT
SalesOrderID,
SalesPersonID,
OrderDate,
ROW_NUMBER() OVER (PARTITION BY SalesPersonID ORDER BY OrderDate) AS OrderRank
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
The results are shown here. You might find that the order of your rows varies slightly from the results shown.
SalesOrderID SalesPersonID OrderDate OrderRank
————— —————- ———— ————–
:
43659 279 2001-07-01 00:00:00.000 1
43660 279 2001-07-01 00:00:00.000 2
43681 279 2001-07-01 00:00:00.000 3
43684 279 2001-07-01 00:00:00.000 4
43685 279 2001-07-01 00:00:00.000 5
43694 279 2001-07-01 00:00:00.000 6
43695 279 2001-07-01 00:00:00.000 7
43696 279 2001-07-01 00:00:00.000 8
43845 279 2001-08-01 00:00:00.000 9
43861 279 2001-08-01 00:00:00.000 10
:
48079 287 2002-11-01 00:00:00.000 1
48064 287 2002-11-01 00:00:00.000 2
48057 287 2002-11-01 00:00:00.000 3
47998 287 2002-11-01 00:00:00.000 4
48001 287 2002-11-01 00:00:00.000 5
48014 287 2002-11-01 00:00:00.000 6
47982 287 2002-11-01 00:00:00.000 7
47992 287 2002-11-01 00:00:00.000 8
48390 287 2002-12-01 00:00:00.000 9
48308 287 2002-12-01 00:00:00.000 10
:
Let’s partition our ranking function by country, as shown in below 2-33. We’ll create a CTE to aggregate the sales by customer and by country. Then we’ll apply the ranking function over the TotalAmt field and the CustomerID field, partitioned by the CountryName field.
Aggregates with PARTITION BY
WITH CTETerritory AS
(
SELECT
cr.Name AS CountryName,
CustomerID,
SUM(TotalDue) AS TotalAmt
FROM
Sales.SalesOrderHeader AS soh
INNER JOIN Sales.SalesTerritory AS ter ON soh.TerritoryID = ter.TerritoryID
INNER JOIN Person.CountryRegion AS cr ON cr.CountryRegionCode = ter.
CountryRegionCode
GROUP BY
cr.Name, CustomerID
)
SELECT
*,
RANK() OVER(PARTITION BY CountryName ORDER BY TotalAmt, CustomerID DESC) AS Rank
FROM CTETerritory
The results look like this:
CountryName CustomerID TotalAmt Rank
————– ————- ———– ————–
Australia 29083 4.409 1
Australia 29061 4.409 2
Australia 29290 5.514 3
Australia 29287 5.514 4
Australia 28924 5.514 5
:
Canada 29267 5.514 1
Canada 29230 5.514 2
Canada 28248 5.514 3
Canada 27628 5.514 4
Canada 27414 5.514 5
:
France 24538 4.409 1
France 24535 4.409 2
France 23623 4.409 3
France 23611 4.409 4
France 20961 4.409 5
:
PARTITION BY supports other SQL Server aggregate functions, including MIN and MAX as well as your own scalar functions. You can apply your aggregate function in the same way that you apply the ranking functions, with a PARTITION BY statement. Let’s apply this technique to the current sample by adding a column to our result set using the AVG function, as shown in below, We will get the same results but with an additional column showing the average by country.
Using AVG with PARTITION BY
WITH CTETerritory AS
(
SELECT
cr.Name AS CountryName,
CustomerID,
SUM(TotalDue) AS TotalAmt
FROM
Sales.SalesOrderHeader AS soh
INNER JOIN Sales.SalesTerritory AS ter ON soh.TerritoryID = ter.TerritoryID
INNER JOIN Person.CountryRegion AS cr ON cr.CountryRegionCode = ter.
CountryRegionCode
GROUP BY
cr.Name, CustomerID
)
SELECT
*,
RANK() OVER (PARTITION BY CountryName ORDER BY TotalAmt, CustomerID DESC) AS Rank,
AVG(TotalAmt) OVER(PARTITION BY CountryName) AS Average
FROM CTETerritory
Here are the results:
CountryName CustomerID TotalAmt Rank Average
————– ————- ———– ——- ——————
Australia 29083 4.409 1 3364.8318
Australia 29061 4.409 2 3364.8318
Australia 29290 5.514 3 3364.8318
:
Canada 29267 5.514 1 12824.756
Canada 29230 5.514 2 12824.756
Canada 28248 5.514 3 12824.756
:
Conclusion:
Hope this helps,
Happy Coding.