[DATABASE] MySQL 关于 count() 不同用法的效率问题


* 本文基于 MySQL 5.7 + InnoDB 引擎


count(1) vs count(id) vs count(colmun) vs count(*) : Who is faster?

对于这几种查询写法的效率问题,经常会被面试官问到。

先说结论:count(*) = count(1) > count(id) > count(colmun)

解析

count() 是一个聚合函数,对于返回的结果集,server 层会维护一个变量 count,然后一行行地循环判断,如果 count() 的参数不是 NULL,count 就加 1,否则不加。最后返回 count 给客户端。

而 InnoDB 在循环的时候,是循环的索引。在官方文档(MySQL 5.7 Reference Manual - 12.20.1 Aggregate Function Descriptions)中 count() 函数说明有这么一段:

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

在 MySQL 5.7.18 之前,InnoDB 通过扫描聚簇索引来处理 SELECT COUNT(*) 语句。从 MySQL 5.7.18 开始,除非有一个索引或优化器提示指示优化器使用一个不同的索引,否则 InnoDB 会通过遍历可用的最小二级索引来处理 SELECT COUNT(*) 语句。如果不存在二级索引,则扫描聚簇索引。

虽然文档是写的 SELECT COUNT(*),事实上,不光是 count(*)count(1)count(colmun) (colmun 为任何索引列,包括主键索引) 也都是一样的。

没有二级索引:

mysql> explain select count(id) from test_tb1;
+----+-------------+----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test_tb1 | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(1) from test_tb1;
+----+-------------+----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test_tb1 | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

key 列为 PRIMARY,走聚簇索引。

有二级索引:

mysql> explain select count(id) from test_tb;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test_tb | NULL       | index | NULL          | idx1 | 109     | NULL |   10 |   100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select count(1) from test_tb;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test_tb | NULL       | index | NULL          | idx1 | 109     | NULL |   10 |   100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(col1) from test_tb;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test_tb | NULL       | index | NULL          | idx1 | 109     | NULL |   10 |   100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

key 列为 idx1,走二级索引 idx1

为什么选择的是二级索引而不是聚簇索引呢?应该是因为相同数量的二级索引记录可以比聚簇索引记录占用更少的存储空间,所以二级索引树比聚簇索引树小,这样遍历二级索引的 I/O 成本比遍历聚簇索引的 I/O 成本小。

count(colmun)

先从最慢的说起,count(colmun) 有以下几种情况:

  1. 如果”colmun”定义为 not null 的话,一行行地从记录里面读出这个字段,判断不能为 null,按行累加。
  2. 如果”colmun”定义允许为 null,那么执行的时候,判断到有可能是 null,还要把值取出来再判断一下,不是 null 才累加。
  3. 如果”colmun”不是索引列的话,还要执行全表扫描。
    mysql> explain select count(col1) from test_tb;
    +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------+
    | id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
    +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------+
    |  1 | SIMPLE      | test_tb | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |   100.00 | NULL  |
    +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------+
    1 row in set, 1 warning (0.01 sec)
    

count(id)

遍历表索引(二级索引,没有就是聚簇索引),把每一行的 id 值都取出来,返回给 server 层。server 层拿到 id 后,判断是不可能为空的,就按行累加。

count(1)

遍历表索引(二级索引,没有就是聚簇索引),但不取值。server 层对于返回的每一行,参数“1”很明显不可能为空的,按行累加。

可以看到 count(1)count(id) 少了一步取值,所以通常来说 count(1) 的执行效率会比 count(id) 稍微高一点。

count(*)

还是这篇官方文档(MySQL 5.7 Reference Manual - 12.20.1 Aggregate Function Descriptions),这段写的很清楚:

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

从执行过程建议也可以看出来:

mysql> explain select count(*) from test_tb; show warnings;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test_tb | NULL       | index | NULL          | idx1 | 109     | NULL |   10 |   100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

+-------+------+-----------------------------------------------------------------------+
| Level | Code | Message                                                               |
+-------+------+-----------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `count(*)` from `test_db`.`test_tb` |
+-------+------+-----------------------------------------------------------------------+
1 row in set (0.01 sec)

执行 SELECT COUNT(*) 其实就是执行 SELECT COUNT(0),执行逻辑和 SELECT COUNT(1) 相同,所以 count(*) = count(1)