500x improvement using Group By Cube and Having (Oracle)
— 3 min read
Understanding how the group by
and having
clauses work can help us write more efficient SQL. I want to demonstrate this with an example based on a real situation where rewriting a query resulted in a 500x performance improvement.
A scheduled job ran a complex query daily to truncate and reload a table used for reporting. The job used to run for about an hour and was identified for investigation. The query aggregated data from several tables, of which we will use simplified versions below.
We have two lookup tables for countries and categories along with an orders table:
1select * from sts_countries2;
COUNTRY_CODE COUNTRY_NAME US USA CA CANADA MX MEXICO CH CHINA AR ARGENTINA BR BRAZIL
1select * from sts_categories 2;
CATEGORY_CODE CATEGORY_DESCR C1 Category 1 C2 Category 2 C3 Category 3 C4 Category 4
1select * from sts_orders fetch first 8 rows only2;
ORDER_ID COUNTRY_CODE CATEGORY_CODE ORDER_AMT 1 AR C1 58788 2 BR C1 726424 3 CA C1 725327 4 CH C1 508595 5 MX C1 133405 6 US C1 992442 7 AR C2 348913 8 BR C2 257835
The query in the job was aggregating order data (sum of order_amt
) for these combinations:
- one row for each combination of country, year and category, excluding the data for USA
- one row for each combination of country and category, excluding the data for USA
- one row for each combination of country and year, excluding the data for USA
- one row for each combination of year and category, including the data for USA
- one row for each country for all years and all categories, excluding the data for USA
- one row for each category for all countries and all years, including the data for USA
- one row for each year for all countries and all categories, including the data for USA
- one row for grand totals (all countries, all years and all categories, including USA)
The task could have been accomplished using fewer queries but for unknown reasons, there were 8 select statements that were combined using UNION
operations as shown below.
1-- country, year, category totals excluding USA2select c.country_name, o.order_year, cat.category_descr,3 sum(o.order_amt) as tot_amt4 from sts_countries c5 inner join sts_orders o on (o.country_code = c.country_code)6 inner join sts_categories cat on (cat.category_code = o.category_code) 7 where c.country_name != 'USA'8 group by c.country_name, o.order_year, cat.category_descr9UNION10-- country, category totals excluding USA (all years)11select c.country_name, null as order_year, cat.category_descr,12 sum(o.order_amt) as tot_amt13 from sts_countries c14 inner join sts_orders o on (o.country_code = c.country_code)15 inner join sts_categories cat on (cat.category_code = o.category_code) 16 where c.country_name != 'USA'17 group by c.country_name, cat.category_descr18UNION19-- country, year totals excluding USA (all categories)20select c.country_name, o.order_year, 'All' as category_descr,21 sum(o.order_amt) as tot_amt22 from sts_countries c23 inner join sts_orders o on (o.country_code = c.country_code)24 where c.country_name != 'USA'25 group by c.country_name, o.order_year26UNION27-- year, category totals for all countries (including USA)28select 'All' as country_name, o.order_year, cat.category_descr,29 sum(o.order_amt) as tot_amt30 from sts_orders o31 inner join sts_categories cat on (cat.category_code = o.category_code) 32 group by o.order_year, cat.category_descr33UNION34-- country totals excluding USA (all years and all categories)35select c.country_name, null as order_year, 'All' as category_descr,36 sum(o.order_amt) as tot_amt37 from sts_countries c38 inner join sts_orders o on (o.country_code = c.country_code)39 where c.country_name != 'USA'40 group by c.country_name41UNION42-- category totals for all countries and all years (including USA)43select 'All' as country_name, null as order_year, cat.category_descr,44 sum(o.order_amt) as tot_amt45 from sts_orders o46 inner join sts_categories cat on (cat.category_code = o.category_code) 47 group by cat.category_descr48UNION49-- year totals for all countries and all categories (including USA)50select 'All' as country_name, o.order_year, 'All' as category_descr,51 sum(o.order_amt) as tot_amt52 from sts_orders o53 group by o.order_year54UNION55-- grand totals for all countries, all years and all categories (including USA)56select 'All' as country_name, null as order_year, 'All' as category_descr,57 sum(o.order_amt) as tot_amt58 from sts_orders o59;
COUNTRY_NAME ORDER_YEAR CATEGORY_DESCR TOT_AMT All 2018 All 22859852 All 2018 Category 1 5864425 All 2018 Category 2 4116212 All 2018 Category 3 5611821 All 2018 Category 4 7267394 All 2019 All 21410101 All 2019 Category 1 5749186 All 2019 Category 2 4929132 All 2019 Category 3 5706655 All 2019 Category 4 5025128 All 2020 All 23525939 .. .. .. .. 120 rows selected
If you look at what the individual queries are doing, we can see that USA
country data is excluded only when the group by clause includes the country_name
column. If you ignore that filter, we are generating all possible combinations of country name
, order year
and category description
. Note that we are fetching the sts_orders
data 8 times in this query.
We can write a simpler and more efficient query with a better understanding of the capabilities of group by
and having
. Just as a where
clause allows us to filter raw data, having
lets us filter the resultset from a group by
operation. Typical examples you would see for having
clause use aggregate functions for filtering grouped results. But you can also use the columns on which the group by
is operating for filtering. Our main enhancement is to use the cube
extension of group by
to generate aggregates for all combinations of country name
, order year
and category description
. Once we have aggregates for all combinations of these three columns, we can filter out the summary rows for USA with having
. The following query is equivalent to the above but reads the orders data only once.
1select coalesce(country_name,'All') as country_name,2 order_year,3 coalesce(category_descr,'All') as category_descr,4 sum(order_amt) as tot_amt5from sts_orders o6inner join sts_countries c on (c.country_code = o.country_code)7inner join sts_categories cat on (cat.category_code = o.category_code)8group by cube(country_name, category_descr, order_year)9having (grouping(country_name) = 1 -- all countries10 or11 (grouping(country_name) = 0 and country_name != 'USA') -- country level summary12 )13;
What is this query doing? It uses cube
with group by
to generate all 8 combinations of the country, year and category columns (single column at a time, two columns at a time, three columns at a time and grand totals). It then excludes the rows where the country_name
is USA
using the having
clause.
The grouping()
function returns 0 if the summary row is using the column passed to it for grouping and 1 if not. Rows for all countries' summary data will have a value of 1 for country name
(since those rows are not actually using a specific country name
value). To achieve our goal, we can exclude rows with the value of USA
for the country name
when the value is 0.
By eliminating the unncecessary UNIONs and reducing the number of full scans on the orders table from 8 to 1, this query is much simpler and is far more efficient.
Group by supports extensions such as rollup
and grouping sets
in addition to cube
. These extensions can be extremely useful and are worth spending time to understand better. A future post will go into this in more detail. Stay tuned!
The actual query this example is based on had 24 small select statements that were combined with union
and had several large tables joined in each select statement. The rewritten query was about 10 lines long compared to over 200 lines for the original code and ran in a few seconds compared to about an hour for the original query.
To run the code yourself, use the script below to create the data. It inserts 6 rows into sts_country
table, 4 rows into sts_category
table and two rows for each combination of country, category and year (using 2018, 2019 and 2020 for order years) into sts_orders
. As a result, we have 6x4x3x2, or 144 rows in sts_orders
table.
1--- Script to create tables and add sample data2
3drop table sts_orders purge;4drop table sts_countries purge;5drop table sts_categories purge;6
7create table sts_countries (8 country_code varchar2(2) not null primary key,9 country_name varchar2(30) not null10);11create table sts_categories (12 category_code varchar2(2) not null primary key,13 category_descr varchar2(30) not null14);15create table sts_orders (16 order_id number(10,0) not null,17 country_code varchar2(2) not null,18 category_code varchar2(2) not null,19 order_year number(4) not null,20 order_amt number(18,2) not null,21 constraints orders_pk primary key (order_id),22 constraint ctry_fk foreign key (country_code)23 references sts_countries (country_code),24 constraint cat_fk foreign key (category_code)25 references sts_categories (category_code)26);27
28select * from all_types29where typecode = 'COLLECTION'30and type_name like '%CHAR%ARRAY%'31;32
33insert into sts_countries34with t1 as (35 select mdsys.t_varchar2_array(36 'US,USA',37 'CA,CANADA',38 'MX,MEXICO',39 'CH,CHINA',40 'AR,ARGENTINA',41 'BR,BRAZIL'42 ) as ctry43 from dual44)45select substr(column_value,1,2) as ctry_cd, substr(column_value,4) as ctry_nm46from t1, table(ctry)47;48commit49;50
51insert into sts_categories52with t2 as (53 select mdsys.t_varchar2_array(54 'C1,Category 1',55 'C2,Category 2',56 'C3,Category 3',57 'C4,Category 4'58 ) as cat59 from dual60)61select substr(column_value,1,2) as cat_cd, substr(column_value,4) as cat_descr62from t2, table(cat)63;64commit65;66
67exec dbms_random.seed('some random seed text');68
69insert into sts_orders 70 (order_id, country_code, category_code, order_year, order_amt)71--72with yrs as (73 select 2018 + mod(lvl1,3) as yr74 from (select level as lvl1 from dual connect by level < 7)75), order_dta as (76 select country_code, category_code, yr77 from yrs78 cross join sts_countries cn79 cross join sts_categories ct80)81select rownum as order_id,82 order_dta.country_code, order_dta.category_code, order_dta.yr,83 round(dbms_random.value(10000, 1000000)) as order_amt84from order_dta85;86commit87;