June 5, 2015 - coding sql
SQL - Part 4 - JOIN + Simulating Functions

Xavier Geerinck
When creating JOIN queries in SQL we have to realize that we are going to combine every result from the result set with the result set itself. This means that we are going to create a new result set that is the size of the power of 2 of our original result set. (So if we have 1468 records, and we join it with itself, then we will get 2155024 records = 1468^2)
This is why it is a bad idea to use JOIN queries, however sometimes we have no other choice or our database engine does not support the better alternatives, and then we fall back to JOIN queries.
A better alternative when possible is to use CTE's (Common Table Expressions), this is supported in both MSSQL and Oracle SQL
Get the number of alternatives for a specific result
Use case: We want to get alternatives for a specific result that we found, but we can not use a ranking function and we want to solve this with a join.
Example: Our result contains the name, nation and the altitude of SKI resorts, we want to find other ski resorts in the same nation but that have an altitude that is equal to or higher then the current one.
Solution: To solve this, we just need to join and add our restrictions in that join. Which means that we have a restriction that the other resorts should be higher or equal, they should have another name but the nation should be the same
Query:
SELECT r1.name, r1.nation, COUNT(1) amountFROM resorts r1JOIN resorts r2 ONr1.name <> r2.nameAND r1.nation = r2.nationAND r1.finishaltitude <= r2.finishaltitudeGROUP BY r1.name, r1.nationORDER BY r1.nation, amount
Produce combinations of 2 results based on criteria
Use case: When you want to combine 2 result with each other based on several criteria, then we can use a JOIN query to.
Example: We have competitors and a ranking for the competitors. Our question is, how do we create a result that combines 2 competitors with each other with the following criteria:
- They have points in a different discipline (table ranking, column discipline)
- They are born in the same year (table competitors, column birthdate)
- Their age difference is not more than 1 month (table competitors, column birthdate)
- They are both born before 1974 (table competitors, column birthdate)
- They both have a ranking in season 2006 (table ranking, column season))
- We have a combination of a women and a man (table competitors, gender and table ranking, column gender
Solution: We first need 2 rankings, so we join the rankings with each other and make sure that the disciplines are different, the gender of the ranking table is different and the CID is smaller then.
Once we have that we can say, join competitor c1 on the first ranking result, just by the cid column. And join competitors c2 on the second ranking result with the criteria that it has the same birthdate as competitor 1.
In our where we then check if the season is correct, that the months between are also smaller then 1 and that they are born before 1974.
Query:
SELECT distinct c1.name, to_char(c1.birthdate, 'DD-MON-YYYY') bday1, c2.name, to_char(c2.birthdate, 'DD-MON-YYYY') bday2FROM ranking r1JOIN ranking r2 ON r1.cid < r2.cid AND r1.gender <> r2.gender AND r1.discipline <> r2.disciplineJOIN competitors c1 ON r1.cid = c1.cidJOIN competitors c2 ON c2.cid = r2.cidAND EXTRACT(year FROM c1.birthdate) = EXTRACT(year FROM c2.birthdate) -- Same year bornWHERE r1.season = 2006AND r2.season = 2006AND ABS(MONTHS_BETWEEN(c1.birthdate, c2.birthdate)) <= 1AND EXTRACT(year FROM c1.birthdate) < 1974ORDER BY c1.name, c2.name;
TOP N problem
Use case: Let's say you want to get the TOP N of a specific result set, this can be done by using the ranking function, but again this function is not available in the database engine that you are using. This is where we again will use a JOIN query to solve this problem.
Example: We need to get the top 5 heaviest men and women from a list of competitors, and this in separate columns.
Solution: To create our solution, we have to combine the competitors on their own, but only select the competitors that weigh less then our current competitor. You can see this as a ranking simulation where we give a number 1 to the heaviest person and then decrease the number. Then on the end we just add a HAVING clause that checks for 5 results.
Query:
SELECT c1.name, (CASE WHEN c1.gender = 'M' THEN TO_CHAR(c1.weight) ELSE ' ' END) MEN, (CASE WHEN c1.gender = 'L' THEN TO_CHAR(c1.weight) ELSE ' ' END) WOMENFROM competitors c1JOIN competitors c2 ON c1.gender = c2.gender AND c1.weight <= c2.weightWHERE c1.weight IS NOT NULLGROUP BY c1.name, c1.gender, c1.weightHAVING COUNT(CASE WHEN c1.weight < c2.weight THEN 1 END) < 5ORDER BY c1.weight DESC;
Simulating the MAX() function
Use Case: Our database engine does not support the MAX function, how can we simulate this when we have access to the JOIN functions?
Example: Let's select the most used languages for each country.
Solution: Since we do not have access to the MAX function we need to use our ranking simulation again that we used in the previous examples. So we assign a rank to each result, and then select where the rank is 1 and we return those results.
Query:
SELECT x.hasc, x.iso, x.gebruikFROM taalgebruik xJOIN taalgebruik y ON x.hasc = y.hasc AND x.gebruik <= y.gebruik GROUP BY x.hasc, x.iso, x.gebruik HAVING COUNT(1) = 1 ORDER BY x.hasc;
Calculating the Median
Use Case: You want to know the median of a result set.
Example: Let's say we have competitors from a nation with their weight, how do we get the median of the weight for each nation?
Solution: We know that we get the median by subtracting the values bigger then with the values lower then our value and then comparing these to the values equal to. This way we get the median. So we allso apply this on our join queries to find it.
Query:
SELECT x.nation, x.weightFROM competitors xJOIN competitors y ON x.nation = y.nationWHERE x.weight IS NOT NULL AND y.weight IS NOT NULL GROUP BY x.nation, x.weightHAVINGABS(COUNT(CASE WHEN y.weight > x.weight THEN 1 END) -COUNT(CASE WHEN y.weight < x.weight THEN 1 END)) <= COUNT(CASE WHEN y.weight = x.weight THEN 1 END)ORDER BY 1;
Getting results that are not in another table (without using set operators)
Use Case: You want to get the results that are not in another table, but you can't use the set operator. Think about getting countries that do not belong to an international organization.
Example: We want to find the countries that are not connected to an international organization.
Solution: We select every country, and then we check if the left join of the international organisation returns a NULL value on one of the columns. We do this in the having clausule and if it returns NULL, then we return 1.
Query:
SELECT r.nameFROM regios rLEFT JOIN members m on m.hasc = r.hascWHERE level = 0GROUP BY r.nameHAVING COUNT(CASE WHEN shortname IS NULL THEN 1 END) = 1;
Get intervals of missing numbers
Use Case: You want to be able to view all the missing numbers as an interval, for example: In a cinema, which seats are empty?
Example: Find the missing elevations for the country Iceland (IS).
Solution: For this problem it is better if we draw it. Imagine having a table like this:
result |
---|
1 |
2 |
3 |
6 |
7 |
10 |
11 |
12 |
14 |
We can see that we got gaps here, numbers 5, 8, 9, 13 are missing. Now how can we get these numbers into a list?
For this we need to join the original query twice, once by selecting the values bigger then, and once to select the values between. Once we have this we can then create a result set that will return this:
result | _ |
---|---|
4 | 5 |
8 | 9 |
13 |
Which is the result we want.
Query:
SELECT x.elevation + 1 elevation, CASE WHEN (y.elevation - x.elevation) <> 2 THEN y.elevation - 1 END " "FROM cities xJOIN cities y ON y.iso = x.iso AND y.elevation > x.elevation + 1LEFT JOIN cities z ON z.iso = x.iso AND z.elevation BETWEEN x.elevation + 1 AND y.elevation - 1WHERE x.iso = 'IS'AND x.elevation IS NOT NULLAND z.elevation IS NULLGROUP BY x.elevation, y.elevationORDER BY x.elevation;
Get intervals of connecting numbers
Use Case: You want to be able to view all the numbers that are connected to each other, for example: In a cinema, which seats are taken?
Example: Find the connecting numbers as an interval for the country Iceland.
Solution: We use the same thinking method as we did for the problem with the missing numbers, instead we now have all the numbers there and need to find the connecting numbers, which means the left and right boundary.
For this we will also use 2 extra joins, 1 join will select all the results that are bigger then the current result, and the second will then find the result set between the non join and the join.
Once we did that we need to have a HAVING clausule that will give us the correct result.
Query:
SELECT x.elevation elevation, CASE WHEN y.elevation <> x.elevation THEN y.elevation END " "FROM cities xJOIN cities y ON y.iso = x.iso AND y.elevation >= x.elevationJOIN cities z ON z.iso = x.iso AND z.elevation BETWEEN x.elevation - 1 AND y.elevation + 1WHERE x.iso = 'IS'AND x.elevation IS NOT NULLGROUP BY x.elevation, y.elevationHAVING COUNT(DISTINCT CASE WHEN z.elevation BETWEEN x.elevation AND y.elevation THEN z.elevation END) = y.elevation - x.elevation + 1AND COUNT(CASE WHEN z.elevation = x.elevation - 1 THEN 1 END) = 0AND COUNT(CASE WHEN z.elevation = y.elevation + 1 THEN 1 END) = 0ORDER BY x.elevation;