The ROW NUMBER window function has a wide range of uses, much beyond
the apparent ordering requirements. Most of the time, while computing
row numbers, you need to do it in a specified order, which you provide
in the function's window order clause.
However, there are times
when you need to compute row numbers in an arbitrary sequence, or in
other words, in a nondeterministic manner. This might apply to the full
query result or to individual partitions. Assigning unique values to
result in rows, deduplicating data, and returning any row per group are
all examples.
Computing row numbers with non-deterministic order
It's
important to note that assigning row numbers in a non-deterministic
sequence differs from assigning them in random order.
With the former,
it doesn't matter in what order they're allocated, or whether or not the
same row numbers are assigned to the same rows in subsequent executions
of the query.
With the latter, you may assume that subsequent
executions would alter which rows are allocated to which row numbers.
This article looks at a few alternative methods for computing
nondeterministic row numbers. The goal is to develop a technique that is
both dependable and efficient.
When row number ordering matters?
Now,
Let's see cases when row numbers ordering matters. To visualize this,
let's create some data first. Let's use the previous database we used in
our articles regarding cricketers and IPL.
Let's create a table
'cricketers' and insert some data into it.
CREATE TABLE cricketer
(
id INT NOT NULL PRIMARY KEY,
name VARCHAR(10) NOT NULL,
franchise VARCHAR(11) NOT NULL,
price INT NOT NULL
);
Now, let's insert some data into the table.
INSERT INTO cricketer(id, name, franchise, price) VALUES
(11, 'Virat','rcb', 501),
( 3, 'Rohit','mi', 201),
( 5, 'Bumrah','mi', 401),
( 7, 'Jadeja','rcb', 101),
( 2, 'Surya','rcb', 51);
The table looks something like this :
Now, consider the below query and try to execute it on your local system/ online ide.
NOTE:
MySQL users would require version 8.0+ for this function. Also, add a
few entries yourself. Add some entries so that more than 1 player in a
franchise will have a similar price. Be sure to do this exercise as it
is absolutely required by our next example below.
I have added a player
dhoni in the 'mi' franchise with a price of 401. you guys can replicate
that too.
SELECT id, name, franchise, price,
ROW_NUMBER() OVER(PARTITION BY franchise ORDER BY price) AS n
FROM cricketer;
Within
each group specified by the column franchise, and sorted by the column
price, you want row numbers assigned. I received the following results
when I performed this query on my system:
The
order in which row numbers are assigned is somewhat predictable and
partially non-deterministic. What I mean is that you can be sure that a
row with a higher price value will have a higher row number value inside
the same partition.
The sequence of assignment of row numbers among
rows with the same franchise and price values is nondeterministic,
however, because the price is not unique inside the franchise partition.
The rows with the id values 5 and 8 are an example of this. The
franchise value is 'mi', and the pricing value is 401 for both.
When
I originally ran this query on my system, the row with id 5 received
row number 2, and the row with id 8 received row number 3. Never mind
the chance of this happening in SQL Server; if I execute the query
again, the row with id 5 might possibly be allocated to row number 3,
and the row with id 8 could theoretically be assigned to row number 2.
Complete deterministic order
If
you need to assign row numbers in a perfectly predictable order,
ensuring reproducible results across query runs as long as the
underlying data doesn't change, the window partitioning and ordering
clauses must have a unique mix of components.
In our example, this might
be accomplished by using the column id as a tiebreaker in the window
order clause. The OVER clause would then be implemented.
SELECT id, name, franchise, price,
ROW_NUMBER() OVER (PARTITION BY franchise ORDER BY price, id) AS n
FROM cricketer;
In
any case, SQL Server must process the rows sorted by the combination of
window partitioning and ordering components when determining row
numbers based on any relevant ordering specification, such as in Query
1.
This may be accomplished by either retrieving data from an index that
has been preordered or manually sorting the data. Because there is
currently no index on T1 to enable the ROW NUMBER computation in Query
1, SQL Server must sort the data instead.
You
may create a covering index using a key list based on the partitioning
and ordering components and an include list based on the covering
elements to avoid the requirement for sorting. This index is what I like
to call a POC index (for partitioning, ordering, and covering).
When the order doesn't matter?
When you need to assign row numbers in a totally non-deterministic
order, things get a little complicated. In this scenario, it's normal
to wish to utilize the ROW NUMBER function without supplying a window
order requirement.
First, let's see if the SQL standard permits this.
The section of the standard that defines the syntactic rules for window
functions is as follows:
Item 6 lists the functions,<ntile function>, <lead or lag function>, <rank function type>,
or ROW NUMBER, and item 6a specifies that the window order clause must
be present for the functions,, RANK, or DENSE RANK.
Although there is no clear wording specifying whether ROW NUMBER requires a window order clause or not, the functions mentioned in item 6 and absence in 6a may infer that the clause is optional for this function.
It's easy to see why methods like RANK and DENSE RANK would require a window order clause: these functions specialize in dealing with ties, and ties only occur when an ordering specification is specified. However, it's easy to see how an optional window order clause may help the ROW NUMBER function.
SELECT id, name, franchise, price,
ROW_NUMBER() OVER(ORDER BY (SELECT 'No Order')) AS n
FROM cricketer;
Conclusion
That's all about ROW_NUMBER function in SQL. A
typical requirement is to compute row numbers based on
non-deterministic order. It would have been wonderful if T-SQL just made
the ROW NUMBER function's window order clause optional, but it doesn't.
If not, it would have been wonderful if it allowed for the use of a
constant as the ordering element, but it is also not an option.
Other SQL and Database Articles you may like:
Thanks for reading this tutorial so far. If you like this SQL Row_number tutorial then please share with your friends and colleagues. If you have any questions or doubts feel free to ask.
No comments:
Post a Comment