Playing with MySQL: GROUP BY

SQL function “Group by” would be primarily used to grouping the result values. Its use may be larger, and the results may be most usefull.

What is the function “GROUP BY”

Anyone who knows the SQL language, knows function “group by”, but even so … (little theory):

The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

Test table and data

We’ll prepare a test table on which we test our SQL queries. The table will look like this:

CREATE TABLE `table` (
          `id` INT(11) NOT NULL AUTO_INCREMENT,
          `help_id` INT(5) UNSIGNED ZEROFILL DEFAULT NULL,
          `sum` INT(5) UNSIGNED ZEROFILL DEFAULT NULL,
          PRIMARY KEY (`id`)  
        ) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;

INSERT INTO `table`(`help_id`,`sum`) VALUES (1,5);
INSERT INTO `table`(`help_id`,`sum`) VALUES (1,4);
INSERT INTO `table`(`help_id`,`sum`) VALUES (1,8);
INSERT INTO `table`(`help_id`,`sum`) VALUES (2,6);
INSERT INTO `table`(`help_id`,`sum`) VALUES (2,10);

CREATE TABLE `table` (

`id` INT(11) NOT NULL AUTO_INCREMENT,

`help_id` INT(5) UNSIGNED ZEROFILL DEFAULT NULL,

`sum` INT(5) UNSIGNED ZEROFILL DEFAULT NULL,

PRIMARY KEY (`id`)

) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;

INSERT INTO `table`(`help_id`,`sum`) VALUES (1,5);

INSERT INTO `table`(`help_id`,`sum`) VALUES (1,4);

INSERT INTO `table`(`help_id`,`sum`) VALUES (1,8);

INSERT INTO `table`(`help_id`,`sum`) VALUES (2,6);

INSERT INTO `table`(`help_id`,`sum`) VALUES (2,10);

If we do a simple grouping query above the table, it would look like this:

SELECT help_id, SUM(`sum`)
FROM `table`
GROUP BY `help_id`;

SELECT help_id, SUM(`sum`)

FROM `table`

GROUP BY `help_id`;

In this case, we get the result of unique column value `help_id` a sum of his values:

+---------+------------+
| help_id | SUM(`sum`) |
+---------+------------+
|   00001 |         17 |
|   00002 |         16 |
+---------+------------+
2 rows in set (0.05 sec)

+---------+------------+

| help_id | SUM(`sum`) |

+---------+------------+

| 00001 | 17 |

| 00002 | 16 |

+---------+------------+

2 rows in set (0.05 sec)

First task: the number of records after grouping

We will do a simple query like the previous one, and now we need to find the number of rows in the result.

Use FOUND_ROWS

The first option is to use the FOUND_ROWS, which returns the number of rows from the last query. Query will look like this:

#standart select quuery with SQL_CALC_FOUND_ROWS syntax 
SELECT 
SQL_CALC_FOUND_ROWS  COUNT(*) AS `count`
FROM `table`
GROUP BY help_id;

#second query get num of lines from previous query
SELECT FOUND_ROWS() AS `count_of_count`;

#standart select quuery with SQL_CALC_FOUND_ROWS syntax

SELECT

SQL_CALC_FOUND_ROWS COUNT(*) AS `count`

FROM `table`

GROUP BY help_id;

#second query get num of lines from previous query

SELECT FOUND_ROWS() AS `count_of_count`;

The only rule is that the second query we must call immediately after the first query. The result of both queries:

+-------+
| count |
+-------+
|     3 |
|     2 |
+-------+
2 rows in set (0.01 sec)

+----------------+
| count_of_count |
+----------------+
|              2 |
+----------------+
1 row in set (0.12 sec)

+-------+

| count |

+-------+

| 3 |

| 2 |

+-------+

2 rows in set (0.01 sec)

+----------------+

| count_of_count |

+----------------+

| 2 |

+----------------+

1 row in set (0.12 sec)

First query returns a standard result of the grouped result. In second result we can see num of lines from first query.

Solution with subSelect

If you like subselect, this solution is very simple:

SELECT COUNT(*) AS `count_unique`
FROM (
	SELECT 
	      COUNT(*) AS `count`,
	      help_id
	FROM `table`
	GROUP BY help_id
) AS own;

SELECT COUNT(*) AS `count_unique`

FROM (

SELECT

COUNT(*) AS `count`,

help_id

FROM `table`

GROUP BY help_id

) AS own;

In this case, we obtain as a result of only the number of rows:

+--------------+
| count_unique |
+--------------+
|            2 |
+--------------+
1 row in set (0.00 sec)

+--------------+

| count_unique |

+--------------+

| 2 |

+--------------+

1 row in set (0.00 sec)

At Exaplain of query we can see, that query is veri simple, but server makes two queries:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using temporary; Using filesort

*************************** 1. row ***************************

id: 1

select_type: PRIMARY

table: NULL

type: NULL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: NULL

Extra: Select tables optimized away

*************************** 2. row ***************************

id: 2

select_type: DERIVED

table: table

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Extra: Using temporary; Using filesort

Shortest solution – use DISTINCT function

Probably the easiest solution to this problem is to use DISTINCT (though not use grouping):

SELECT COUNT(DISTINCT help_id) AS `count_unique`
FROM `table`;

1 2	SELECT COUNT(DISTINCT help_id) AS `count_unique` FROM `table`;

At the result we can again,see only one value

+--------------+
| count_unique |
+--------------+
|            2 |
+--------------+
1 row in set (0.00 sec)

+--------------+

| count_unique |

+--------------+

| 2 |

+--------------+

1 row in set (0.00 sec)

If we look at EXPLAIN query, that is again very simple

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra:

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: table

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Extra:

The second task: the total sum after grouping

We have a result that contains the grouped values, now we need to get the total sum from values.

Using “WITH ROLLUP” modifier

The GROUP BY clause permits a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus enables you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

We will show you how it works on this example:

SELECT 
      `help_id`, 
      SUM(`sum`) AS `sum`
FROM `table`
GROUP BY help_id
WITH ROLLUP;

SELECT

`help_id`,

SUM(`sum`) AS `sum`

FROM `table`

GROUP BY help_id

WITH ROLLUP;

After executing this query we can see that the result contains one more line results. This line contain sum from individual lines:

+---------+------+
| help_id | sum  |
+---------+------+
|   00001 |   17 |
|   00002 |   16 |
|    NULL |   33 |
+---------+------+

+---------+------+

| help_id | sum |

+---------+------+

| 00001 | 17 |

| 00002 | 16 |

| NULL | 33 |

+---------+------+

sql server automatically calculates the sum of the values if we use any afregate function. Finally, let’s explain of this query

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using filesort

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: table

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Extra: Using filesort

Use “WITH ROLLUP” with HAVING

If you need to use the HAVING clause, the result will be the same as without it:

SELECT 
	help_id,
	SUM(`sum`) AS `sum`
FROM `table`
GROUP BY help_id
WITH ROLLUP HAVING SUM > 16;

SELECT

help_id,

SUM(`sum`) AS `sum`

FROM `table`

GROUP BY help_id

WITH ROLLUP HAVING SUM > 16;

Last row with total sum is again sum from all rows. HAVING is not taken into total result.

+---------+------+
| help_id | sum  |
+---------+------+
|   00001 |   17 |
|    NULL |   33 |
+---------+------+
2 rows in set (0.00 sec)

+---------+------+

| help_id | sum |

+---------+------+

| 00001 | 17 |

| NULL | 33 |

+---------+------+

2 rows in set (0.00 sec)

Third task: average of the results

We have the result after grouping, and now we need to average values. For this we need count of lines in result and sum of values.

Solution with subquery

First solution is with subquery, which contain standart query with group, and main query contan from this value averages:

SELECT 	COUNT(help_id) AS `lines`, 
	SUM(csum) AS `sum_of_all`, 
	SUM(csum)/COUNT(help_id) AS `average`
FROM (
	SELECT help_id, SUM(`sum`) AS `csum`
	FROM `table`
	GROUP BY help_id
)
 AS `own`;

SELECT COUNT(help_id) AS `lines`,

SUM(csum) AS `sum_of_all`,

SUM(csum)/COUNT(help_id) AS `average`

FROM (

SELECT help_id, SUM(`sum`) AS `csum`

FROM `table`

GROUP BY help_id

)

AS `own`;

The result shows the mean value (sum of all values / count of lines ):

+-------+------------+---------+
| lines | sum_of_all | average |
+-------+------------+---------+
|     2 |         33 | 16.5000 |
+-------+------------+---------+
1 row in set (0.00 sec)

+-------+------------+---------+

| lines | sum_of_all | average |

+-------+------------+---------+

| 2 | 33 | 16.5000 |

+-------+------------+---------+

1 row in set (0.00 sec)

Again, let’s look at EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra:
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using temporary; Using filesort

*************************** 1. row ***************************

id: 1

select_type: PRIMARY

table: <derived2>

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 2

Extra:

*************************** 2. row ***************************

id: 2

select_type: DERIVED

table: table

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Extra: Using temporary; Using filesort

Using “WITH ROLLUP”

Possibility to get results is to use the WITH ROLLUP modifier. In this case, we do not do subquery:

SELECT 
	COUNT(DISTINCT help_id),
	COUNT(*),
	SUM(`sum`) AS `sum`, 
	SUM(`sum`)/COUNT(DISTINCT help_id) AS 'average'
FROM `table`
GROUP BY help_id
WITH ROLLUP;

SELECT

COUNT(DISTINCT help_id),

COUNT(*),

SUM(`sum`) AS `sum`,

SUM(`sum`)/COUNT(DISTINCT help_id) AS 'average'

FROM `table`

GROUP BY help_id

WITH ROLLUP;

Compared to the previous solution contains this result subtotals of individual lines. At the last line is again our search results:

+-------------------------+----------+------+---------+
| COUNT(DISTINCT help_id) | COUNT(*) | sum  | average |
+-------------------------+----------+------+---------+
|                       1 |        3 |   17 | 17.0000 |
|                       1 |        2 |   16 | 16.0000 |
|                       2 |        5 |   33 | 16.5000 |
+-------------------------+----------+------+---------+
3 rows in set (0.00 sec)

+-------------------------+----------+------+---------+

+-------------------------+----------+------+---------+

| 1 | 3 | 17 | 17.0000 |

| 1 | 2 | 16 | 16.0000 |

| 2 | 5 | 33 | 16.5000 |

+-------------------------+----------+------+---------+

3 rows in set (0.00 sec)

Usage is quite simple and contains more data (for someone redundant data). Explain from this query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using filesort

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: table

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Extra: Using filesort

Fourth task: all combination of all values

A nice example of the use the group by is to get all combinations of two columns. SQL query will look like this:

SELECT `table`.help_id, `temp`.`sum`
FROM `table`
JOIN `table` `temp`
GROUP BY `table`.`help_id`, `temp`.`sum`
ORDER BY `table`.`help_id`, `temp`.`sum`;

SELECT `table`.help_id, `temp`.`sum`

FROM `table`

JOIN `table` `temp`

GROUP BY `table`.`help_id`, `temp`.`sum`

ORDER BY `table`.`help_id`, `temp`.`sum`;

The result is a combination of all values:

+---------+-------+
| help_id | sum   |
+---------+-------+
|   00001 | 00004 |
|   00001 | 00005 |
|   00001 | 00006 |
|   00001 | 00008 |
|   00001 | 00010 |
|   00002 | 00004 |
|   00002 | 00005 |
|   00002 | 00006 |
|   00002 | 00008 |
|   00002 | 00010 |
+---------+-------+
10 rows in set (0.00 sec)

+---------+-------+

| help_id | sum |

+---------+-------+

| 00001 | 00004 |

| 00001 | 00005 |

| 00001 | 00006 |

| 00001 | 00008 |

| 00001 | 00010 |

| 00002 | 00004 |

| 00002 | 00005 |

| 00002 | 00006 |

| 00002 | 00008 |

| 00002 | 00010 |

+---------+-------+

10 rows in set (0.00 sec)

We are merging data by two columns. Let go to EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: temp
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: table

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Extra: Using temporary; Using filesort

*************************** 2. row ***************************

id: 1

select_type: SIMPLE

table: temp

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

Evaluation

At this the time of ORM is nice to occasionally play with SQL queries.

Comments are closed for this page

max

04/07/2013

Thank you. interesting