Wednesday, October 16, 2024

How to use ROW_NUMBER() Function in SQL with nondeterministic order? Example Tutorial

The ROW NUMBER window function has a wide range of uses, much beyond the apparent ordering requirements. Most of the time, while computing row numbers, you need to do it in a specified order, which you provide in the function's window order clause. 

However, there are times when you need to compute row numbers in an arbitrary sequence, or in other words, in a nondeterministic manner. This might apply to the full query result or to individual partitions. Assigning unique values to result in rows, deduplicating data, and returning any row per group are all examples.


Computing row numbers with non-deterministic order

It's important to note that assigning row numbers in a non-deterministic sequence differs from assigning them in random order. 

With the former, it doesn't matter in what order they're allocated, or whether or not the same row numbers are assigned to the same rows in subsequent executions of the query. 

With the latter, you may assume that subsequent executions would alter which rows are allocated to which row numbers. 

This article looks at a few alternative methods for computing nondeterministic row numbers. The goal is to develop a technique that is both dependable and efficient.


When row number ordering matters?

Now, Let's see cases when row numbers ordering matters. To visualize this, let's create some data first. Let's use the previous database we used in our articles regarding cricketers and IPL. 

Let's create a table 'cricketers' and insert some data into it.

CREATE TABLE cricketer
(
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(10) NOT NULL,
  franchise VARCHAR(11) NOT NULL,
  price INT NOT NULL
);


Now, let's insert some data into the table.


INSERT INTO cricketer(id, name, franchise, price) VALUES
  (11, 'Virat','rcb', 501),
  ( 3, 'Rohit','mi', 201),
  ( 5, 'Bumrah','mi', 401),
  ( 7, 'Jadeja','rcb', 101),
  ( 2, 'Surya','rcb', 51);


The table looks something like this : 


How to use ROW_NUMBER() Function in SQL



Now, consider the below query and try to execute it on your local system/ online ide.


NOTE: MySQL users would require version 8.0+ for this function. Also, add a few entries yourself. Add some entries so that more than 1 player in a franchise will have a similar price. Be sure to do this exercise as it is absolutely required by our next example below. 

I have added a player dhoni in the 'mi' franchise with a price of 401. you guys can replicate that too.

SELECT id, name, franchise, price,
  ROW_NUMBER() OVER(PARTITION BY franchise ORDER BY price) AS n 
FROM cricketer;


Within each group specified by the column franchise, and sorted by the column price, you want row numbers assigned. I received the following results when I performed this query on my system:

How to use ROW_NUMBER() Function in SQL with nondeterministic order




The order in which row numbers are assigned is somewhat predictable and partially non-deterministic. What I mean is that you can be sure that a row with a higher price value will have a higher row number value inside the same partition. 

The sequence of assignment of row numbers among rows with the same franchise and price values is nondeterministic, however, because the price is not unique inside the franchise partition. The rows with the id values 5 and 8 are an example of this. The franchise value is 'mi', and the pricing value is 401 for both.

When I originally ran this query on my system, the row with id 5 received row number 2, and the row with id 8 received row number 3. Never mind the chance of this happening in SQL Server; if I execute the query again, the row with id 5 might possibly be allocated to row number 3, and the row with id 8 could theoretically be assigned to row number 2.


Complete deterministic order

If you need to assign row numbers in a perfectly predictable order, ensuring reproducible results across query runs as long as the underlying data doesn't change, the window partitioning and ordering clauses must have a unique mix of components. 

In our example, this might be accomplished by using the column id as a tiebreaker in the window order clause. The OVER clause would then be implemented.


SELECT id, name, franchise, price,
  ROW_NUMBER() OVER (PARTITION BY franchise ORDER BY price, id) AS n 
FROM cricketer;



In any case, SQL Server must process the rows sorted by the combination of window partitioning and ordering components when determining row numbers based on any relevant ordering specification, such as in Query 1. 

This may be accomplished by either retrieving data from an index that has been preordered or manually sorting the data. Because there is currently no index on T1 to enable the ROW NUMBER computation in Query 1, SQL Server must sort the data instead.

You may create a covering index using a key list based on the partitioning and ordering components and an include list based on the covering elements to avoid the requirement for sorting. This index is what I like to call a POC index (for partitioning, ordering, and covering).



When the order doesn't matter?

When you need to assign row numbers in a totally non-deterministic
order, things get a little complicated. In this scenario, it's normal to wish to utilize the ROW NUMBER function without supplying a window order requirement. 

First, let's see if the SQL standard permits this. The section of the standard that defines the syntactic rules for window functions is as follows:



How to use ROW_NUMBER() Function in SQL with nondeterministic order? Example Tutorial


Item 6 lists the functions,<ntile function>, <lead or lag function>, <rank function type>, or ROW NUMBER, and item 6a specifies that the window order clause must be present for the functions,, RANK, or DENSE RANK.

Although there is no clear wording specifying whether ROW NUMBER requires a window order clause or not, the functions mentioned in item 6 and absence in 6a may infer that the clause is optional for this function.

It's easy to see why methods like RANK and DENSE RANK would require a window order clause: these functions specialize in dealing with ties, and ties only occur when an ordering specification is specified. However, it's easy to see how an optional window order clause may help the ROW NUMBER function.


SELECT id, name, franchise, price,
  ROW_NUMBER() OVER(ORDER BY (SELECT 'No Order')) AS n 
FROM cricketer;


When to use row_number function in SQL



Conclusion

That's all about ROW_NUMBER function in SQL. A typical requirement is to compute row numbers based on non-deterministic order. It would have been wonderful if T-SQL just made the ROW NUMBER function's window order clause optional, but it doesn't. If not, it would have been wonderful if it allowed for the use of a constant as the ordering element, but it is also not an option.


Thanks for reading this tutorial so far. If you like this SQL Row_number tutorial then please share with your friends and colleagues. If you have any questions or doubts feel free to ask. 

No comments:

Post a Comment