
500x improvement using Group By Cube and Having (Oracle)


Understanding how the group by and having clauses work can help us write more efficient SQL. I want to demonstrate this with an example based on a real situation where rewriting a query resulted in a 500x performance improvement.

A scheduled job ran a complex query daily to truncate and reload a table used for reporting. The job used to run for about an hour and was identified for investigation. The query aggregated data from several tables, of which we will use simplified versions below.

We have two lookup tables for countries and categories along with an orders table:

select * from sts_countries;

COUNTRY_CODE   COUNTRY_NAME
US             USA
CA             CANADA
MX             MEXICO
CH             CHINA
AR             ARGENTINA
BR             BRAZIL

select * from sts_categories;

CATEGORY_CODE   CATEGORY_DESCR
C1              Category 1
C2              Category 2
C3              Category 3
C4              Category 4

select * from sts_orders fetch first 8 rows only;

ORDER_ID   COUNTRY_CODE   CATEGORY_CODE   ORDER_AMT
1          AR             C1              58788
2          BR             C1              726424
3          CA             C1              725327
4          CH             C1              508595
5          MX             C1              133405
6          US             C1              992442
7          AR             C2              348913
8          BR             C2              257835

The query in the job was aggregating order data (sum of order_amt) for these combinations:

  • one row for each combination of country, year and category, excluding the data for USA
  • one row for each combination of country and category, excluding the data for USA
  • one row for each combination of country and year, excluding the data for USA
  • one row for each combination of year and category, including the data for USA
  • one row for each country for all years and all categories, excluding the data for USA
  • one row for each category for all countries and all years, including the data for USA
  • one row for each year for all countries and all categories, including the data for USA
  • one row for grand totals (all countries, all years and all categories, including USA)

The task could have been accomplished with fewer queries, but for unknown reasons there were 8 select statements combined with UNION operations, as shown below.

-- country, year, category totals excluding USA
select c.country_name, o.order_year, cat.category_descr,
       sum(o.order_amt) as tot_amt
  from sts_countries c
 inner join sts_orders o on (o.country_code = c.country_code)
 inner join sts_categories cat on (cat.category_code = o.category_code)
 where c.country_name != 'USA'
 group by c.country_name, o.order_year, cat.category_descr
UNION
-- country, category totals excluding USA (all years)
select c.country_name, null as order_year, cat.category_descr,
       sum(o.order_amt) as tot_amt
  from sts_countries c
 inner join sts_orders o on (o.country_code = c.country_code)
 inner join sts_categories cat on (cat.category_code = o.category_code)
 where c.country_name != 'USA'
 group by c.country_name, cat.category_descr
UNION
-- country, year totals excluding USA (all categories)
select c.country_name, o.order_year, 'All' as category_descr,
       sum(o.order_amt) as tot_amt
  from sts_countries c
 inner join sts_orders o on (o.country_code = c.country_code)
 where c.country_name != 'USA'
 group by c.country_name, o.order_year
UNION
-- year, category totals for all countries (including USA)
select 'All' as country_name, o.order_year, cat.category_descr,
       sum(o.order_amt) as tot_amt
  from sts_orders o
 inner join sts_categories cat on (cat.category_code = o.category_code)
 group by o.order_year, cat.category_descr
UNION
-- country totals excluding USA (all years and all categories)
select c.country_name, null as order_year, 'All' as category_descr,
       sum(o.order_amt) as tot_amt
  from sts_countries c
 inner join sts_orders o on (o.country_code = c.country_code)
 where c.country_name != 'USA'
 group by c.country_name
UNION
-- category totals for all countries and all years (including USA)
select 'All' as country_name, null as order_year, cat.category_descr,
       sum(o.order_amt) as tot_amt
  from sts_orders o
 inner join sts_categories cat on (cat.category_code = o.category_code)
 group by cat.category_descr
UNION
-- year totals for all countries and all categories (including USA)
select 'All' as country_name, o.order_year, 'All' as category_descr,
       sum(o.order_amt) as tot_amt
  from sts_orders o
 group by o.order_year
UNION
-- grand totals for all countries, all years and all categories (including USA)
select 'All' as country_name, null as order_year, 'All' as category_descr,
       sum(o.order_amt) as tot_amt
  from sts_orders o
;
COUNTRY_NAME   ORDER_YEAR   CATEGORY_DESCR   TOT_AMT
All            2018         All              22859852
All            2018         Category 1       5864425
All            2018         Category 2       4116212
All            2018         Category 3       5611821
All            2018         Category 4       7267394
All            2019         All              21410101
All            2019         Category 1       5749186
All            2019         Category 2       4929132
All            2019         Category 3       5706655
All            2019         Category 4       5025128
All            2020         All              23525939
...

120 rows selected

Looking at the individual queries, we can see that USA data is excluded only when the group by clause includes the country_name column. Setting that filter aside, we are generating all possible combinations of country name, order year and category description. Note that this query reads the sts_orders data 8 times.
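One way to see the repeated reads is through the execution plan. The sketch below uses just two of the branches to keep it short (it assumes access to dbms_xplan); the plan should show sts_orders accessed once per branch, and the full query repeats this pattern 8 times.

-- minimal two-branch example to illustrate the repeated scans
explain plan for
select country_code, sum(order_amt) as tot_amt
  from sts_orders
 group by country_code
union
select 'All' as country_code, sum(order_amt) as tot_amt
  from sts_orders
;

select * from table(dbms_xplan.display);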

With a better understanding of what group by and having can do, we can write a simpler and more efficient query. Just as a where clause lets us filter raw data, having lets us filter the result set of a group by operation. Typical examples of having use aggregate functions to filter grouped results, but you can also filter on the columns the group by is operating on. Our main enhancement is to use the cube extension of group by to generate aggregates for all combinations of country name, order year and category description. Once we have aggregates for all combinations of these three columns, we can use having to filter out the USA summary rows. The following query is equivalent to the one above but reads the orders data only once.

select coalesce(country_name, 'All') as country_name,
       order_year,
       coalesce(category_descr, 'All') as category_descr,
       sum(order_amt) as tot_amt
  from sts_orders o
 inner join sts_countries c on (c.country_code = o.country_code)
 inner join sts_categories cat on (cat.category_code = o.category_code)
 group by cube(country_name, category_descr, order_year)
having (grouping(country_name) = 1 -- all countries
        or
        (grouping(country_name) = 0 and country_name != 'USA') -- country level summary
       )
;

What is this query doing? It uses cube with group by to generate all 8 combinations of the country, year and category columns (single column at a time, two columns at a time, three columns at a time and grand totals). It then excludes the rows where the country_name is USA using the having clause.
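For reference, cube(country_name, category_descr, order_year) is shorthand for listing all 8 grouping sets explicitly. A sketch of that expansion (same tables and the same having filter) should return the same rows as the cube version:

select coalesce(country_name, 'All') as country_name,
       order_year,
       coalesce(category_descr, 'All') as category_descr,
       sum(order_amt) as tot_amt
  from sts_orders o
 inner join sts_countries c on (c.country_code = o.country_code)
 inner join sts_categories cat on (cat.category_code = o.category_code)
 group by grouping sets (
            (country_name, category_descr, order_year),
            (country_name, category_descr),
            (country_name, order_year),
            (category_descr, order_year),
            (country_name),
            (category_descr),
            (order_year),
            ()  -- grand total
          )
having (grouping(country_name) = 1
        or (grouping(country_name) = 0 and country_name != 'USA'))
;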

The grouping() function returns 0 if the summary row uses the column passed to it for grouping, and 1 if it does not. The "all countries" summary rows will have a grouping value of 1 for country_name (since those rows are not tied to a specific country name value). To achieve our goal, we exclude the rows where country_name is 'USA' only when that grouping value is 0.
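To see grouping() in isolation, here is a minimal sketch against the same sample tables; grp_flag is 0 on each per-country row and 1 on the single "all countries" row:

select coalesce(c.country_name, 'All') as country_name,
       grouping(c.country_name)        as grp_flag,  -- 0 = specific country, 1 = all countries
       sum(o.order_amt)                as tot_amt
  from sts_orders o
 inner join sts_countries c on (c.country_code = o.country_code)
 group by cube(c.country_name)
;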

By eliminating the unnecessary UNIONs and reducing the number of full scans on the orders table from 8 to 1, the query is much simpler and far more efficient.

Group by supports extensions such as rollup and grouping sets in addition to cube. These extensions can be extremely useful and are worth spending time to understand better. A future post will go into this in more detail. Stay tuned!
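As a quick preview, here is a minimal sketch of rollup on the same tables. Unlike cube, rollup produces only the hierarchical combinations, so this returns year-and-category totals, year totals and the grand total (but no category-only totals):

select o.order_year,
       coalesce(cat.category_descr, 'All') as category_descr,
       sum(o.order_amt) as tot_amt
  from sts_orders o
 inner join sts_categories cat on (cat.category_code = o.category_code)
 group by rollup(o.order_year, cat.category_descr)
;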


The actual query this example is based on had 24 small select statements combined with UNION, each joining several large tables. The rewritten query was about 10 lines long versus over 200 lines for the original, and ran in a few seconds versus about an hour.


To run the code yourself, use the script below to create the data. It inserts 6 rows into the sts_countries table, 4 rows into the sts_categories table, and two rows for each combination of country, category and year (using 2018, 2019 and 2020 as order years) into sts_orders. As a result, we have 6 x 4 x 3 x 2, or 144 rows in the sts_orders table.

--- Script to create tables and add sample data

drop table sts_orders purge;
drop table sts_countries purge;
drop table sts_categories purge;

create table sts_countries (
    country_code  varchar2(2)  not null primary key,
    country_name  varchar2(30) not null
);
create table sts_categories (
    category_code   varchar2(2)  not null primary key,
    category_descr  varchar2(30) not null
);
create table sts_orders (
    order_id       number(10,0) not null,
    country_code   varchar2(2)  not null,
    category_code  varchar2(2)  not null,
    order_year     number(4)    not null,
    order_amt      number(18,2) not null,
    constraint orders_pk primary key (order_id),
    constraint ctry_fk foreign key (country_code)
        references sts_countries (country_code),
    constraint cat_fk foreign key (category_code)
        references sts_categories (category_code)
);

-- optional: look up a built-in varchar2 collection type to use for the inserts below
select * from all_types
where typecode = 'COLLECTION'
and type_name like '%CHAR%ARRAY%'
;

insert into sts_countries
with t1 as (
    select mdsys.t_varchar2_array(
               'US,USA',
               'CA,CANADA',
               'MX,MEXICO',
               'CH,CHINA',
               'AR,ARGENTINA',
               'BR,BRAZIL'
           ) as ctry
    from dual
)
select substr(column_value,1,2) as ctry_cd, substr(column_value,4) as ctry_nm
from t1, table(ctry)
;
commit
;

insert into sts_categories
with t2 as (
    select mdsys.t_varchar2_array(
               'C1,Category 1',
               'C2,Category 2',
               'C3,Category 3',
               'C4,Category 4'
           ) as cat
    from dual
)
select substr(column_value,1,2) as cat_cd, substr(column_value,4) as cat_descr
from t2, table(cat)
;
commit
;

exec dbms_random.seed('some random seed text');

insert into sts_orders
    (order_id, country_code, category_code, order_year, order_amt)
--
with yrs as (
    select 2018 + mod(lvl1,3) as yr
    from (select level as lvl1 from dual connect by level < 7)
), order_dta as (
    select country_code, category_code, yr
    from yrs
    cross join sts_countries cn
    cross join sts_categories ct
)
select rownum as order_id,
       order_dta.country_code, order_dta.category_code, order_dta.yr,
       round(dbms_random.value(10000, 1000000)) as order_amt
from order_dta
;
commit
;
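After running the script, a quick sanity check (just a sketch) to confirm the row counts described above:

select (select count(*) from sts_countries)  as countries,   -- expect 6
       (select count(*) from sts_categories) as categories,  -- expect 4
       (select count(*) from sts_orders)     as orders       -- expect 144 (6 x 4 x 3 x 2)
  from dual
;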
Thanks for reading! Feel free to reach out if you have any interesting SQL or data projects that I can help with.