# Xavier Geerinck

My thoughts, tutorials and learnings

June 5, 2015 - coding sql

# SQL - Part 3 - Analytical Functions (Ranking, Rows, Over, ...)

Xavier Geerinck

@XavierGeerinck

## Did you enjoy reading? Or do you want to stay up-to-date of new Articles?

Consider sponsoring me or providing feedback so I can continue creating high-quality articles!

## Analytical Functions

Analytical functions are brought into existence to solve the problems that we have with Aggregate functions. The biggest disadvantage of these Aggregate functions is that they always reduce the number of results, but what if we want to be able to perform functions on all our results based on the results? (think at ranking, row_number, top N problems, ...)

Here I will explain some of the Analytical function tricks that we can use to solve common problems.

Some of the database engine specific analytical functions:

• rank() : Returns a sequential number for the result set, this differs from row_number because this function gives duplicates the same number.
• dense_rank() : Does the same as rank(), only will it put gaps between the next result.
• row_number() : Returns a sequential number for the result set.
• ntile(n) : Distributes the result into groups, this returns the number to which the group belongs.

### The over() function

One of the things I have to explain first before we are able to show special queries is the usage of the over() function. The over() function will execute a reporting function over the complete table without the need of a GROUP BY statement. We can also specify a PARTITION BY to this over function, which is going to allow us to perform a GROUP BY on the OVER function.

More details to this can be found here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm

### MIN MAX problems (also called minimax)

Let's say that we have a table that holds the values of the language usages of a country. What if we want to get the maximum spoken languages from those countries, and then get the language that is spoken the least in that country?

This is a great problem to use a over function in. You just select the maximum usages, and then you can add for example a column that will put an X for the language that is spoken the least. To do this we use a case structure that checks if the max usage of a language is equal to the min usage over the whole result set of the max usages.

Result:

`SELECT iso, max(usage),  CASE    WHEN MAX(usage) = MIN(MAX(usage)) OVER() THEN 'x' ELSE ' '  ENDFROM languageusageWHERE hasc IN('BE', 'DE')GROUP BY iso`

Our result after running the query:

isousagemin(max(usages))
kur0.0049998134397970222
ned0.60320897726154155
tur0.026367175354153453
fra0.33231617285165865
ita0.02480559110148637
sla0.0148750668507394
ara0.015848016537060736
dui0.93813663669266079
poo0.0034948944691118493X
spa0.0049217442661679302
grn0.0045023195651903531

### Creating order based on criteria

We want to be able to assign numbers based on some criteria.

Example: Select races, and assign a number based on:

• rownumber for each resort, sorted by racedate within resort
• rownumber for resort and racedate, sorted by resort, racedate (for all)
• rownumber for racedate, sorted by racedate (for all)
`SELECT resort, racedate, rank() over(partition by resort order by racedate) "rownr / resort", rank() over(order by resort, racedate) "rownr by resort and racedate", rank() over(order by racedate) "rownr by racedate"FROM racesWHERE extract(year from racedate) = 2008GROUP BY resort, racedateORDER BY resort, racedate`

### Clipping

A rarely known technique for databases is called Clipping, clipping allows us to remove an extreme band of results from a result set. (For example, when we measured different points and we want to remove the extreme points from this resultset, then we can use clipping).

This is where ntile(n) comes in handy, we use it to divide our result set in groups, and then we select the groups that lie within boundaries.

Example: Let us select measure points of longitude and latitude, and we will remove the most extreme points.

Then the query becomes:

`SELECT lev2, latitude, longitude,  CASE     WHEN ntile(10) OVER (PARTITION BY lev2 ORDER BY latitude) BETWEEN 2 AND 9     THEN latitude   END x,  CASE     WHEN ntile(10) OVER (PARTITION BY lev2 ORDER BY longitude) BETWEEN 2 AND 9     THEN longitude   END yFROM citiesWHERE iso = 'BE'AND lev2 IS NOT NULLAND latitude IS NOT NULLAND longitude IS NOT NULL;`

### Simulating WIDTH_BUCKET

WIDTH_BUCKET is an oracle specific command that lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. (More here: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions214.htm)

Since this is only in oracle we want to simulate this so we can use it in other database engines. To simulate this we can use the function CEIL together with the OVER functions.

So then this becomes:

`SELECT ceil(row_number() OVER(ORDER BY ... ) / 5.0)`

### Cumulative Total and Gliding Totals

If we want to be able to select the cumulative total of a resultset, then we can use this query:

`SUM(points) OVER(ORDER BY points, name) AS cumul`

However if we want to calculate the gliding total, then we need to use this:

`SUM(points) OVER(ORDER BY date RANGE BETWEEN 90 PRECEDING AND CURRENT ROW) AS mov_average`

## Combined Select Queries

With combined select queries we mean the joining of 2 different select queries with the use of special operators such as union, union all, intersect, except, minus, ...

>One thing to note is that even though this is explained in this article, it is not recommended to be used. Combined Select Queries are a feature of the database engines but are mostly slower then CTE's (Common Table Expressions) that is why I recommend using those rather then the combined select queries. This is also the reason why this portion of the article is relatively slow compared to the rest.

### Union and Union ALL

The union operator will merge tables row by row but it will remove the duplicate rows. When we use union all then we will do the same as union, but this time we will include the duplicate rows.

Example:

`SELECT hasc2, lengthFROM boundariesWHERE hasc1 = 'DE'UNION ALLSELECT hasc1, lengthFROM boundariesWHERE hasc2 = 'DE'ORDER BY length`

> Special Note: If you are using a database engine that does not support the rollup and cube functions, then you can simulate those using the union operator.

### Intersect

The intersect operator will only keep the rows that the different queries have in common.

Example:

`SELECT name,  CASE    WHEN MAX(SUM(points)) OVER(PARTITION BY gender, season) = SUM(points) THEN 'x'    ELSE ' '  END "result"FROM RankingWHERE discipline IS NOT NULLGROUP BY gender, season, name  INTERSECTSELECT name, 'X'FROM competitorsWHERE nation = 'SWE'`

### Except of minus (difference)

The except of minus operators will take the first query and they will remove every row that is the result from the other queries.

Example:

`SELECT hasc, population, areaFROM regiosWHERE SUBSTR(hasc, 1, 2) = 'BE' AND level = 3  EXCEPTSELECT parent, SUM(population), SUM(area)FROM regiosWHERE SUBSTR(hasc, 1, 2) = 'BE' AND level = 4GROUP BY parent`

## Did you enjoy reading? Or do you want to stay up-to-date of new Articles?

Consider sponsoring me or providing feedback so I can continue creating high-quality articles!