If you’ve been writing SQL for a while, you’ve probably used aggregate functions like SUM
,
AVG
, or COUNT
to get summary statistics. But then someone mentions window functions and
you wonder why use those if I can already aggregate my data? If this question ever crossed
your mind this article is for you. We will go through the similarities, the differences and some
examples.
Note: I will be using the Snowflake TPCH_SF1 sample data for the examples below so you can recreate the results.
Although they look similar, aggregate functions and window functions behave very differently, and knowing when to use each will save you from writing overly complex queries.
Aggregate Functions: The Blender
Aggregate functions, such as SUM
, AVG
, COUNT
, MAX
, and MIN
, group rows together and
return a single value per group. They help answer questions like what is the average order price
per region? Or how many orders each customer makes per Q?
If the answer to your question is a single aggregated value in general or per group then an aggregate function is the way to go.
You can think of it as a blender, you give it multiple data points and it summarises it down to one result that represents all the input.
Example:
sql1SELECT
2 O_CUSTKEY,
3 AVG(O_TOTALPRICE) AS AVERAGE_ORDER
4FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
5GROUP BY O_CUSTKEY;
Results Snippet:
O_CUSTKEY | AVERAGE_ORDER |
---|---|
1910 | 129346.8600000000 |
74321 | 166403.0200000000 |
22052 | 183089.4550000000 |
86840 | 243148.4400000000 |
40787 | 164023.1266666667 |
141908 | 256498.7666666667 |
18455 | 137463.4500000000 |
81878 | 118406.8150000000 |
48365 | 196978.3700000000 |
86269 | 137881.3233333333 |
As you can see, this makes aggregate functions the best solution when you need a quick overview of your data without going into details. We can see the average order value for each customer, but we have no idea about the details that went into that number.
This brings us to where aggregate functions lack behind. Now if we go back to our blender example, if I were to give you a smoothie and ask you to compare its taste to each ingredient that went into making it that would be near impossible if you don’t know the ingredients to start with right? That’s where window functions come to use.
Window Functions: The Chopper
Same as aggregate functions, window functions perform calculations across rows, but they leave the rows intact. Let’s say you want to compare customers’ order values to the average order value. You will need to keep all the rows containing the order values and add a column containing the overall average.
This is exactly what a window function does. Instead of GROUP BY
, it used an OVER
clause to
know which groups to aggregate on while keeping the original rows.
Now going back to the kitchen, you can think of it as a chopper you use to make a salad. You do get the main result of combining all your ingredients but still you can distinguish them apart.
Example:
sql1SELECT
2 O_CUSTKEY,
3 O_TOTALPRICE,
4 AVG(O_TOTALPRICE) OVER(PARTITION BY O_CUSTKEY) AS AVERAGE_ORDER
5FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
6ORDER BY O_CUSTKEY;
Results Snippet:
O_CUSTKEY | O_TOTALPRICE | AVERAGE_ORDER |
---|---|---|
1 | 123076.84 | 97960.48500 |
1 | 74602.81 | 97960.48500 |
1 | 174645.94 | 97960.48500 |
1 | 95911.01 | 97960.48500 |
1 | 54048.26 | 97960.48500 |
1 | 65478.05 | 97960.48500 |
2 | 167016.61 | 146896.20428 |
2 | 174291.41 | 146896.20428 |
2 | 221397.35 | 146896.20428 |
2 | 33082.83 | 146896.20428 |
Now, we can see each individual order price along the average for that customer. This allows us to do deeper analyses and comparisons.
One Final Note: Filtering
Now it is important to note that filtering for each type of those functions is different.
For aggregate functions we use a HAVING
clause while repeating the function expression:
1SELECT
2 O_CUSTKEY,
3 AVG(O_TOTALPRICE) AS AVERAGE_ORDER
4FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
5GROUP BY O_CUSTKEY
6HAVING AVG(O_TOTALPRICE) > 10000;
And for window functions we can use a CTE or, when using snowflake, we can use a QUALIFY
clause:
CTE
sql 1WITH ORDER_AVG AS (
2 SELECT
3 O_CUSTKEY,
4 O_TOTALPRICE,
5 AVG(O_TOTALPRICE) OVER (PARTITION BY O_CUSTKEY) AS AVERAGE_ORDER
6 FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
7)
8SELECT *
9FROM order_avg
10WHERE AVERAGE_ORDER > 150000
11ORDER BY O_CUSTKEY;
QUALIFY
sql1SELECT
2 O_CUSTKEY,
3 O_TOTALPRICE,
4 AVG(O_TOTALPRICE) OVER (PARTITION BY O_CUSTKEY) AS AVERAGE_ORDER
5FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
6QUALIFY AVERAGE_ORDER > 150000
7ORDER BY O_CUSTKEY;
Wrapping Up: Knowing Which Tool to Use
Aggregate and Window functions are one of the most used tools in querying data, and understanding when to use each helps us write more efficient and readable queries.
So next time you’re writing SQL and trying to decide between GROUP BY
and OVER
, just ask
yourself: Do I want a smoothie or a salad?