Playing with MySQL: GROUP BY
SQL function “Group by” would be primarily used to grouping the result values. Its use may be larger, and the results may be most usefull.
What is the function “GROUP BY”
Anyone who knows the SQL language, knows function “group by”, but even so … (little theory):
The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
Test table and data
We’ll prepare a test table on which we test our SQL queries. The table will look like this:
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE TABLE `table` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `help_id` INT(5) UNSIGNED ZEROFILL DEFAULT NULL, `sum` INT(5) UNSIGNED ZEROFILL DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci; INSERT INTO `table`(`help_id`,`sum`) VALUES (1,5); INSERT INTO `table`(`help_id`,`sum`) VALUES (1,4); INSERT INTO `table`(`help_id`,`sum`) VALUES (1,8); INSERT INTO `table`(`help_id`,`sum`) VALUES (2,6); INSERT INTO `table`(`help_id`,`sum`) VALUES (2,10); |
If we do a simple grouping query above the table, it would look like this:
1 2 3 |
SELECT help_id, SUM(`sum`) FROM `table` GROUP BY `help_id`; |
In this case, we get the result of unique column value `help_id` a sum of his values:
1 2 3 4 5 6 7 |
+---------+------------+ | help_id | SUM(`sum`) | +---------+------------+ | 00001 | 17 | | 00002 | 16 | +---------+------------+ 2 rows in set (0.05 sec) |
First task: the number of records after grouping
We will do a simple query like the previous one, and now we need to find the number of rows in the result.
Use FOUND_ROWS
The first option is to use the FOUND_ROWS, which returns the number of rows from the last query. Query will look like this:
1 2 3 4 5 6 7 8 |
#standart select quuery with SQL_CALC_FOUND_ROWS syntax SELECT SQL_CALC_FOUND_ROWS COUNT(*) AS `count` FROM `table` GROUP BY help_id; #second query get num of lines from previous query SELECT FOUND_ROWS() AS `count_of_count`; |
The only rule is that the second query we must call immediately after the first query. The result of both queries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
+-------+ | count | +-------+ | 3 | | 2 | +-------+ 2 rows in set (0.01 sec) +----------------+ | count_of_count | +----------------+ | 2 | +----------------+ 1 row in set (0.12 sec) |
First query returns a standard result of the grouped result. In second result we can see num of lines from first query.
Solution with subSelect
If you like subselect, this solution is very simple:
1 2 3 4 5 6 7 8 |
SELECT COUNT(*) AS `count_unique` FROM ( SELECT COUNT(*) AS `count`, help_id FROM `table` GROUP BY help_id ) AS own; |
In this case, we obtain as a result of only the number of rows:
1 2 3 4 5 6 |
+--------------+ | count_unique | +--------------+ | 2 | +--------------+ 1 row in set (0.00 sec) |
At Exaplain of query we can see, that query is veri simple, but server makes two queries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Select tables optimized away *************************** 2. row *************************** id: 2 select_type: DERIVED table: table type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using temporary; Using filesort |
Shortest solution – use DISTINCT function
Probably the easiest solution to this problem is to use DISTINCT (though not use grouping):
1 2 |
SELECT COUNT(DISTINCT help_id) AS `count_unique` FROM `table`; |
At the result we can again,see only one value
1 2 3 4 5 6 |
+--------------+ | count_unique | +--------------+ | 2 | +--------------+ 1 row in set (0.00 sec) |
If we look at EXPLAIN query, that is again very simple
1 2 3 4 5 6 7 8 9 10 11 |
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: table type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: |
The second task: the total sum after grouping
We have a result that contains the grouped values, now we need to get the total sum from values.
Using “WITH ROLLUP” modifier
The GROUP BY clause permits a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus enables you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.
We will show you how it works on this example:
1 2 3 4 5 6 |
SELECT `help_id`, SUM(`sum`) AS `sum` FROM `table` GROUP BY help_id WITH ROLLUP; |
After executing this query we can see that the result contains one more line results. This line contain sum from individual lines:
1 2 3 4 5 6 7 |
+---------+------+ | help_id | sum | +---------+------+ | 00001 | 17 | | 00002 | 16 | | NULL | 33 | +---------+------+ |
sql server automatically calculates the sum of the values if we use any afregate function. Finally, let’s explain of this query
1 2 3 4 5 6 7 8 9 10 11 |
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: table type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using filesort |
Use “WITH ROLLUP” with HAVING
If you need to use the HAVING clause, the result will be the same as without it:
1 2 3 4 5 6 |
SELECT help_id, SUM(`sum`) AS `sum` FROM `table` GROUP BY help_id WITH ROLLUP HAVING SUM > 16; |
Last row with total sum is again sum from all rows. HAVING is not taken into total result.
1 2 3 4 5 6 7 |
+---------+------+ | help_id | sum | +---------+------+ | 00001 | 17 | | NULL | 33 | +---------+------+ 2 rows in set (0.00 sec) |
Third task: average of the results
We have the result after grouping, and now we need to average values. For this we need count of lines in result and sum of values.
Solution with subquery
First solution is with subquery, which contain standart query with group, and main query contan from this value averages:
1 2 3 4 5 6 7 8 9 |
SELECT COUNT(help_id) AS `lines`, SUM(csum) AS `sum_of_all`, SUM(csum)/COUNT(help_id) AS `average` FROM ( SELECT help_id, SUM(`sum`) AS `csum` FROM `table` GROUP BY help_id ) AS `own`; |
The result shows the mean value (sum of all values / count of lines ):
1 2 3 4 5 6 |
+-------+------------+---------+ | lines | sum_of_all | average | +-------+------------+---------+ | 2 | 33 | 16.5000 | +-------+------------+---------+ 1 row in set (0.00 sec) |
Again, let’s look at EXPLAIN:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2 Extra: *************************** 2. row *************************** id: 2 select_type: DERIVED table: table type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using temporary; Using filesort |
Using “WITH ROLLUP”
Possibility to get results is to use the WITH ROLLUP modifier. In this case, we do not do subquery:
1 2 3 4 5 6 7 8 |
SELECT COUNT(DISTINCT help_id), COUNT(*), SUM(`sum`) AS `sum`, SUM(`sum`)/COUNT(DISTINCT help_id) AS 'average' FROM `table` GROUP BY help_id WITH ROLLUP; |
Compared to the previous solution contains this result subtotals of individual lines. At the last line is again our search results:
1 2 3 4 5 6 7 8 |
+-------------------------+----------+------+---------+ | COUNT(DISTINCT help_id) | COUNT(*) | sum | average | +-------------------------+----------+------+---------+ | 1 | 3 | 17 | 17.0000 | | 1 | 2 | 16 | 16.0000 | | 2 | 5 | 33 | 16.5000 | +-------------------------+----------+------+---------+ 3 rows in set (0.00 sec) |
Usage is quite simple and contains more data (for someone redundant data). Explain from this query:
1 2 3 4 5 6 7 8 9 10 11 |
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: table type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using filesort |
Fourth task: all combination of all values
A nice example of the use the group by is to get all combinations of two columns. SQL query will look like this:
1 2 3 4 5 |
SELECT `table`.help_id, `temp`.`sum` FROM `table` JOIN `table` `temp` GROUP BY `table`.`help_id`, `temp`.`sum` ORDER BY `table`.`help_id`, `temp`.`sum`; |
The result is a combination of all values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
+---------+-------+ | help_id | sum | +---------+-------+ | 00001 | 00004 | | 00001 | 00005 | | 00001 | 00006 | | 00001 | 00008 | | 00001 | 00010 | | 00002 | 00004 | | 00002 | 00005 | | 00002 | 00006 | | 00002 | 00008 | | 00002 | 00010 | +---------+-------+ 10 rows in set (0.00 sec) |
We are merging data by two columns. Let go to EXPLAIN:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: table type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: temp type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 |
Evaluation
At this the time of ORM is nice to occasionally play with SQL queries.
Thank you. interesting