postgres: tasks from postgres interviews
Task 1: Emulation of rank()
The essence of the task, for each value denote its number in the ranking by a given field, here it is better to show an example. Column rank, gives the necessary result. And the actual query that solves our problem
SELECT user,
value,
rank() OVER (ORDER BY value DESC)
FROM table
+------+-------+------+
| user | value | rank |
+------+-------+------+
| 3 | 50 | 1 |
| 1 | 10 | 2 |
| 4 | 10 | 2 |
| 5 | 6 | 4 |
| 2 | 5 | 5 |
+------+-------+------+
Let’s try to solve it without window functions, the first time I was asked to do it I was very dumb, and I started to move in the right direction, but fatigue after about an hour of interview overcame me :)
SELECT t1.user, count(1) as rank
FROM public.test t1
cross join public.test t2
where t1.value <= t2.value
group by t1.user
+------+------+
| user | rank |
+------+------+
| 3 | 1 |
| 4 | 3 |
| 1 | 3 |
| 5 | 4 |
| 2 | 5 |
+------+------+
Here is the solution, it is very simple in fact, the idea is that when jojing and filtering we will get the number of rows equal to the place in the rating for a given value. Overall the solution is not difficult, like the problem, you just need to think a little bit. But there is a subtle point, if you have repeating values then you get missing rows depending on how many repeats you have, in our case it is 2 repeats, on the other hand, it is logical to think that they share n and n-1 place.
Task 2: Counting user sessions
I don’t know why every 2nd interview likes to give this task in different variations, nevertheless the essence is the same. There are also many solutions to this task, but we will try to solve it through window functions, as my favourite way of solving it. Let’s take one of the simplest variants of the problem, for each user we have a log of his data that we take every 5 minutes, if the difference between the two intervals is greater, then it is a new session, we need to mark each session.
+------+----------------------------+
| user | login |
+------+----------------------------+
| 1 | 2020-10-12 20:05:00.000000 |
| 1 | 2020-10-12 20:10:00.000000 |
| 1 | 2020-10-12 20:15:00.000000 |
| 1 | 2020-10-12 20:20:00.000000 |
| 1 | 2020-10-12 20:25:00.000000 |
| 1 | 2020-10-12 10:05:00.000000 |
| 1 | 2020-10-12 20:40:00.000000 |
| 1 | 2020-10-12 20:30:00.000000 |
| 1 | 2020-10-12 10:20:00.000000 |
| 1 | 2020-10-12 10:10:00.000000 |
| 1 | 2020-10-12 20:05:00.000000 |
| 2 | 2020-10-12 20:10:00.000000 |
| 2 | 2020-10-12 20:15:00.000000 |
| 2 | 2020-10-12 20:20:00.000000 |
| 2 | 2020-10-12 20:25:00.000000 |
| 2 | 2020-10-12 10:05:00.000000 |
| 2 | 2020-10-12 20:40:00.000000 |
| 2 | 2020-10-12 20:30:00.000000 |
| 2 | 2020-10-12 10:20:00.000000 |
| 2 | 2020-10-12 10:10:00.000000 |
+------+----------------------------+
First of all, let’s sort the data and apply the lag function to shift the data by 1 value downwards
SELECT t1.user,
login,
lag(login, 1, NULL) OVER
(PARTITION BY
t1.user ORDER BY t1.user,
login) next_log
FROM table t1
Now let’s subtract the offset and see the difference between the 2 repeating values and then apply the cumulative sum function, this is exactly what will give us the result we need.
select *,
sum(ind) over
(PARTITION BY t3.user
ORDER BY t3.user rows
between unbounded preceding and current row) as sessions
from (
select t2.user,
login,
case
when (login-prev_log) > '5 mins' then 1
else 0
end ind
from (
SELECT t1.user,
login,
lag(login, 1, NULL)
OVER (PARTITION BY t1.user
ORDER BY t1.user, login) prev_log
FROM table t1) t2
) t3
+------+----------------------------+-----+----------+
| user | login | ind | sessions |
+------+----------------------------+-----+----------+
| 1 | 2020-10-12 10:05:00.000000 | 0 | 0 |
| 1 | 2020-10-12 10:10:00.000000 | 0 | 0 |
| 1 | 2020-10-12 10:15:00.000000 | 0 | 0 |
| 1 | 2020-10-12 10:20:00.000000 | 0 | 0 |
| 1 | 2020-10-12 20:05:00.000000 | 1 | 1 |
| 1 | 2020-10-12 20:10:00.000000 | 0 | 1 |
| 1 | 2020-10-12 20:15:00.000000 | 0 | 1 |
| 1 | 2020-10-12 20:20:00.000000 | 0 | 1 |
| 1 | 2020-10-12 20:25:00.000000 | 0 | 1 |
| 1 | 2020-10-12 20:30:00.000000 | 0 | 1 |
| 2 | 2020-10-12 10:05:00.000000 | 0 | 0 |
| 2 | 2020-10-12 10:10:00.000000 | 0 | 0 |
| 2 | 2020-10-12 10:15:00.000000 | 0 | 0 |
| 2 | 2020-10-12 10:20:00.000000 | 0 | 0 |
| 2 | 2020-10-12 20:05:00.000000 | 1 | 1 |
| 2 | 2020-10-12 20:10:00.000000 | 0 | 1 |
| 2 | 2020-10-12 20:15:00.000000 | 0 | 1 |
| 2 | 2020-10-12 20:20:00.000000 | 0 | 1 |
| 2 | 2020-10-12 20:25:00.000000 | 0 | 1 |
| 2 | 2020-10-12 20:30:00.000000 | 0 | 1 |
+------+----------------------------+-----+----------+