文章目录

1 SQL处理流程图
2 索引介绍
3 种类及查找算法演变
4 BTREE查找算法演变
5 MySQL的BTREE索引
— 5.1 介绍
—5.2 聚簇（区）索引（集群索引，聚集索引）
—5.3 辅助索引
6 使用索引考虑的事项
—6.1 回表是什么？回表会带来什么问题？怎么减少回表？
—6.2 索引树高度的影响因素？如何解决？
—6.3 如何计算一个索引的高度
7 索引的操作应用
—7.1 压测
—7.2 查询表的索引
—7.3 创建索引
—7.4 联合索引创建
—7.5 前缀索引创建
—7.6 删除索引
8 执行计划获取和分析
—8.1 命令介绍
—8.2 使用方法
—8.3 执行计划信息介绍
—8.4 type信息详解
—8.5 key_len
—8.6 联合索引应用细节
— 8.7 extra
—8.8 应用场景
—8.9 扩展
9 索引应用规范
—9.1 建立索引的原则（DBA运维规范）
—9.2 不走索引的情况（开发规范）
—9.3 优化器针对索引的算法
——9.3.1 MySQL索引的自优化-AHI（自适应HASH索引）
——9.3.2 MySQL-Change buffer
10 可选的优化器算法-索引
—10.1 优化器算法查询
—10.2 如何修改
—10.3 index_condition_pushdown （ICP）
—10.4 MRR : Multi Range Read
—10.5 关于连接算法
—10.6 SNLJ 普通嵌套循环连接
—10.7 BNLJ
—10.8 BKA
—10.9 MySQL8.0 版本索引的新特性
11 hash join
—11.1 工作过程
—11.2 probe 探测阶段
——11.2.1 构建阶段
——11.2.2 探测阶段
——11.2.3 溢出到磁盘

1 SQL处理流程图

2 索引介绍

索引相当于一本书的目录，可以优化查询。会影响到加锁的申请及应用过程。

3 种类及查找算法演变

BTREE : 99.9% ----> InnoDB
RTREE : no ----> MongoDB
HASH : MEM引擎 ----> Redis
FULLTEXT ：TEXT类型 ---> ES

1000W user 查找 ： 张三
遍历
二分法 ---> 二叉树 ---> 红黑树 ---> Balance Ttree(平衡多叉树，简称为BTREE)

4 BTREE查找算法演变

B-TREE ： 普通 BTREE
B+TREE ： 叶子节点双向指针
B++TREE（B*TREE）： 枝节点的双向指针

5 MySQL的BTREE索引

— 5.1 介绍

MySQL Btree 两类结构
1.Clustered ： 聚簇索引
   簇就是extent区的意思
   extent 区 ： 连续的64个数据页（page），默认1M。区是聚簇索引分配叶子节点空间的最小单元。
   IOT组织表 ： 会按照聚簇索引组织方式，存储表中的数据行。聚簇索引组织表。表的数据行都是（逻辑）有序的存储到聚簇索引中     的。按照聚簇索引组织存储（叶子节点）。 

2. Secondary ： 辅助索引
普通单列索引
普通联合索引
唯一索引
前缀索引

—5.2 聚簇（区）索引（集群索引，聚集索引）

前提：
When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered
index. Define a primary key for each table that you create. If there is no
logical unique and non-null column or set of columns, add a new auto-increment
column, whose values are filled in automatically.
If you do not define a PRIMARY KEY for your table, MySQL locates the first
UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the
clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally
generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column
containing row ID values. The rows are ordered by the ID that InnoDB assigns to
the rows in such a table. The row ID is a 6-byte field that increases
monotonically as new rows are inserted. Thus, the rows ordered by the row ID are
physically in insertion order.

1、 如果表中设置了主键（例如ID列），自动根据ID列生成聚簇索引
2、 如果没有设置主键，自动选择第一个UN唯一键的列作为聚簇索引
3、 自动生成隐藏（6字节row_id）的聚簇索引。
InnoDB表中一定是有聚簇索引。
建议： 在建表时，显示的创建主键，最好是数字自增列。

功能：
聚簇索引组织表。
将逻辑上连续的数据，在磁盘存储时也是物理（同一个区内）上连续的。
1. 录入数据时，按照聚簇索引组织存储数据，在磁盘上有序存储数据行。
2. 加速查询。基于ID作为条件的判断查询。

btree构建过程:
a. 叶子节点： 存储数据行时就是有序的，直接将数据行的page作为叶子节点（相邻的叶子结点，有双向指针）
b. 枝节点 ： 提取叶子节点ID的范围+指针，构建枝节点（相邻枝节点，有双向指针）
c. 根节点 ： 提取枝节点的ID的范围+指针，构建根节点

—5.3 辅助索引

存的是什么？
辅助索引列值+ID主键值+指针

前提：
需要人为创建辅助索引，将经常作为查询条件的列创建辅助索引，起到加速查询的效果。

btree 构建过程
a. 叶子节点：提取主键（ID）+辅助索引列，按照辅助索引列进行从小到大排序后，生成叶子节点。（相邻的叶子结点，有双向指针）
b. 枝节点 ：提取叶子节点辅助索引列的范围+指针，构建枝节点（相邻枝节点，有双向指针）
c. 根节点 ：提取枝节点的辅助索引列的范围+指针，构建根节点

功能 ：
按照辅助索引列，作为查询条件时。
1. 查找辅助索引树，得到ID值
2. 拿着ID值回表（聚簇索引）查询
联合索引把重复值少的列作为最左列。

聚簇索引和辅助索引关系参考：
All indexes other than the clustered index are known as secondary indexes. In
InnoDB, each record in a secondary index contains the primary key columns for
the row, as well as the columns specified for the secondary index. InnoDB uses
this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is
advantageous to have a short primary key.
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html

6 使用索引考虑的事项

—6.1 回表是什么？回表会带来什么问题？怎么减少回表？

a. 辅助索引列作为查询条件时，先查找辅助索引树得到ID，再到聚簇索引树查找数据行的过程。
b. IO量多、IO次数多、随机IO会增多、SQL层和engine交互多次。IO偏高 ，CPU 偏高。
c. 减少回表建议：
1. 辅助索引能够完全覆盖查询结果，可以使用联合索引。
2. 尽量让查询条件精细化，尽量使用唯一值多的列作为查询条件
3. 优化器：MRR（Multi-Range-Read），ICP（index condition pushdown） ，锦上添花的功能。
mysql> select @@optimizer_switch;
mysql> set global optimizer_switch='mrr=on';
功能：
1. 辅助索引查找后得到ID值，进行自动排序
2. 一次性回表，很有可能受到B+TREE中的双向指针的优化查找。

—6.2 索引树高度的影响因素？如何解决？

a. 高度越低越好

b.数据行越多，高度越高，一般3层20以内的列一般可以存2000w数据。。
  1. 分区表。一个实例里管理（把大表按照时间,list,hash月份等进行分区,oralce较多使用）
  2. 按照数据特点，进行归档表。pt-archiver（分离冷热数据归档）
  3. 分布式中间件架构。针对海量数据、高并发业务主流方案。
  4. 分布式数据库
  5. 在设计方面，满足三大范式。
  

c.主键规划：长度过长。所以主键尽量使用自增数字列。

d. 列值长度越长，数据量大的话，会影响到高度。建议使用前缀索引，比如100字符只取前10个字符，构建索引树。

e. 数据类型的选择，选择合适的、简短的数据类性。
例如：
1.存储人的年龄 ，使用 tinyint比char(3)好一些
2.存储人名，char(20)和varchar(20)的选择哪一个好？
  a. 站在数据插入性能角度思考，应该选：char
  b. 从节省空间角度思考,应该选：varchar
  c. 从索引树高度的角度思考，应该选：varchar。
综述所述，建议使用varchar类型存储变长列值。

—6.3 如何计算一个索引的高度

以test.t100w为例： idx(num) ---> 高度
t100w ---> t100w.ibd ---> page? ---> 在数据页的某个位置？

1. 确认idx起始的数据页
mysql> alter table t100w add index idx_num(num);

mysql> desc t100w;
+-------+-----------+------+-----+-------------------+-----------------------------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+-------------------+-----------------------------------------------+
| id | int | YES | | NULL | |
| num | int | YES | MUL | NULL | |
| k1 | char(2) | YES | | NULL | |
| k2 | char(4) | YES | | NULL | |
| dt | timestamp | NO | | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
+-------+-----------+------+-----+-------------------+-----------------------------------------------+
5 rows in set (0.00 sec)

mysql> select * from information_schema.innodb_indexes;
+----------+-----------------+----------+------+----------+---------+-------+-----------------+
| INDEX_ID | NAME | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE | MERGE_THRESHOLD |
+----------+-----------------+----------+------+----------+---------+-------+-----------------+
| 179 | idx_num | 1078 | 0 | 2 | 41 | 19 | 50 |

mysql> select * from information_schema.innodb_tables;
| TABLE_ID | NAME | FLAG | N_COLS | SPACE | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE | INSTANT_COLS |
+----------+------------------------------------------------------------+------+--------+------------+-
| 1078 | test/t100w | 33 | 8 | 19 | Dynamic | 0 | Single | 0 |
+----------+------------------------------------------------------------+------+--------+------------+-

mysql> select b.NAME, a.NAME, a.INDEX_ID, a.PAGE_NO from information_schema.innodb_indexes as a join information_schema.innodb_tables as b on a.table_id=b.table_id where b.name='test/t100w';
+------------+-----------------+----------+---------+
| NAME | NAME | INDEX_ID | PAGE_NO |
+------------+-----------------+----------+---------+
| test/t100w | GEN_CLUST_INDEX | 178 | 4 |
| test/t100w | idx_num | 179 | 41 |
+------------+-----------------+----------+---------+
2 rows in set (0.00 sec)

test/t100w 索引存储在哪里？
[root@localhost test]# ls -l /data/3306/data/test/t100w.ibd
-rw-r----- 1 mysql mysql 75497472 Jan 30 15:04 /data/3306/data/test/t100w.ibd

2. 获取索引树高度的信息
hexdump -s(offset跳过多少字节) xxx -n（显示后续的10个值，类似limit,跳过多少显示多少） 10 xxx.ibd
计算需要跳过的数值：offset(-s) = 41*16*1024+64
[root@db01 test]# hexdump -s 671808 -n 10 /data/3306/data/test/t100w.ibd
00a4040(开头标记) 0200(index的level，即索引树高度3层) 0000 0000 0000 b300(index的ID,即179)
00a404a(结束标记)

7 索引的操作应用

—7.1 压测

source /root/t100w.sql
mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k1='aa'" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose

--concurrency=100 ： 模拟同时100会话连接
--create-schema='test' ： 操作的库是谁
--query="select * from test.t100w where k2='780P'" ：做了什么操作
--number-of-queries=2000 ： 一共做了多少次查询

Benchmark
    Running for engine rbose
    Average number of seconds to run all queries: 734.948 seconds
    Minimum number of seconds to run all queries: 734.948 seconds
    Maximum number of seconds to run all queries: 734.948 seconds
    Number of clients running queries: 100
    Average number of queries per client: 20
alter table test.t100w add index idx_k2(k2);
再次测试：
mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k2='780P'" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose
Benchmark
    Running for engine rbose
    Average number of seconds to run all queries: 0.404 seconds
    Minimum number of seconds to run all queries: 0.404 seconds
    Maximum number of seconds to run all queries: 0.404 seconds
    Number of clients running queries: 100
    Average number of queries per client: 20

—7.2 查询表的索引

mysql> desc t100w;
mysql> show index from t100w;
Key
PK --> 主键（聚簇索引）
MUL --> 辅助索引
UNI --> 唯一索引

—7.3 创建索引

单列辅助索引
select * from test.t100w where k2='780P'
优化方式：
语法：
alter table 表名 add index 索引名(列名);
alter table t100w add index idx_k2(k2);

—7.4 联合索引创建

mysql> alter table t100w add index idx_k1_num(k1,num);

—7.5 前缀索引创建

判断前缀长度多少合适：
select count(distinct(left(name,5))) from city ;
select count(distinct name) from city ;
创建前缀索引
mysql> alter table city add index idx_n(name(5));

—7.6 删除索引

alter table city drop index idx_n;

8 执行计划获取和分析

—8.1 命令介绍

获取语句的执行计划工具，只针对索引应用和优化器算法应用部分信息。
explain
desc

—8.2 使用方法

mysql> desc select * from city where countrycode='CHN';
mysql> explain select * from city where countrycode='CHN';
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key |key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | city | NULL | ref | CountryCode | CountryCode |3 | const | 363 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+

—8.3 执行计划信息介绍

table ：此次查询访问的表
type ：索引查询的类型（ALL、index、range、ref、eq_ref、const(system)、NULL）
possible_keys ：可能会应用的索引
key : 最终选择的索引
key_len ：索引覆盖长度，主要是用来判断联合索引应用长度。
rows ：需要扫描的行数
Extra ：额外信息

—8.4 type信息详解

# ALL 没有使用到索引

a. 查询条件没建立索引
mysql> desc select * from city where name='paking';
b. 有索引不走
mysql> alter table city add index idx_name(name);
mysql> desc select * from city where name='%pak%';
mysql> desc select * from city where name!= 'paking';
mysql> desc select * from city where name not in ( 'paking');
c.索引统计信息失效，过旧
d.查询结果集在25%以上，优化器有可能会走全表扫描。
e.隐式转换
mysql> create table test(ID int not null primary key,num char(10));
mysql> insert into test values(1,'110'),(2,'1210'),(3,'12210'),(4,'12219'),(5,'219'),(6,'21119');
mysql> alter table test add index idx(num);
mysql> desc select * from test where num='110';
+----+-------------+-------+------------+
| id | select_type | table | partitions | type
+----+-------------+-------+------------+----
| 1 | SIMPLE | test | NULL | ref 
mysql> desc select * from test where num=110;
+----+-------------+-------+------------+-------+
| id | select_type | table | partitions | type
+----+-------------+-------+------------+-------+
| 1 | SIMPLE | test | NULL | index |

# index 全索引扫描

mysql> create table test(ID int not null primary key,num char(10));
mysql> insert into test values(1,'110'),(2,'1210'),(3,'12210'),(4,'12219'),(5,'219'),(6,'21119');
mysql> alter table test add index idx(num);
mysql> desc select num from test;
+----+-------------+-------+------------+-----
| id | select_type | table | partitions | type 
+----+-------------+-------+------------+------
| 1 | SIMPLE | test | NULL | index 
注意： 对索引树全索引扫描，建议不要出现。特别是聚簇索引INDEX，相当于全表扫描。

# range 索引范围扫描

会受到B+TREE额外优化，叶子节点双向指针
mysql> use world;
mysql> desc select * from city where id<100;
+----+-------------+-------+------------+---
| id | select_type | table | partitions | type 
+----+-------------+-------+------------+----
| 1 | SIMPLE | city | NULL | range
mysql> desc select * from city where countrycode like 'CH%';
以下两种查询，大几率受不到叶子节点双向指针优化。 
mysql> desc select * from city where countrycode in ('CHN','USA'); 
mysql> desc select * from city where countrycode='CHN' or countrycode='USA';
注意： 范围查找最好缩小查询范围。比如> < 要有上限和下限，或者可以使用limit进行限制。
特别是在做delete 和 update。在索引设计不合理时，使用limit有可能会出现主从数据不一致。
建议： 如果查询列重复值少的话，我们建议改写为 union all
desc
select * from city where countrycode='CHN'
union all
select * from city where countrycode='USA';

改写前压测：
mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create schema='world' --query="select * from world.city where countrycode='CHN' or
countrycode='USA';" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose

改写后压测：
mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create schema='world' --query="select * from world.city where countrycode='CHN' union
all select * from city where countrycode='USA';" engine=innodb --number-ofqueries=2000 -uroot -p123 -verbose

# ref 辅助索引等值查询

辅助索引的等值查询
desc select * from city where countrycode='CHN';

#eq_ref

多表连接查询中，非驱动表的连接条件关联列是主键或唯一键时。
如下city是驱动表，coyntry是非驱动表。
mysql> alter table city add index idx_p(population);
mysql> desc select city.name,country.name from city join country on city.countrycode=country.code where city.population<100;
+----+-------------+---------+------------+--------+-----------
| id | select_type | table | partitions | type | possible_keys | key 
+----+-------------+---------+----------+---------+-----------------
| 1 | SIMPLE | city | NULL | range | CountryCode,idx_p | idx_p 
| 1 | SIMPLE | country | NULL | eq_ref | PRIMARY | PRIMARY

# const(system)

主键或唯一键等值查询
mysql> desc select * from city where id=1;
+----+-------------+-------+------------+-------+---
| id | select_type | table | partitions | type | possible_keys 
+----+-------------+-------+------------+-------+----
| 1 | SIMPLE | city | NULL | const | PRIMARY

# NULL

mysql> desc select * from city where id=1000000000000000;
+----+-------------+-------+------------+------+-------------
| id | select_type | table | partitions | type | possible_keys | key 
+----+-------------+-------+------------+------+-------
| 1 | SIMPLE | NULL | NULL | NULL | NULL

—8.5 key_len

5.7版本之前，可以用来判断联合索引应用的长度部分。
例如：
idx(a,b,c)
我们希望应用联合索引的部分越多越好
select * from t1 where a and b and c

2. 如何计算
key_len=a+b+c
列的key_len长度，按照每列的最大预留长度来做的计算。

建表
use test;
create table test ( 
id int not null primary key auto_increment, 
a int not null ,  #4
b int , #5
c char(10) not null ,#40
d varchar(10),#43
e varchar(10) not null #42
)engine=innodb charset=utf8mb4;


插入数据：
mysql> insert into test values(1,1,1,'a','a','a'),(2,2,2,'b','b','b'),(22,22,22,'bb','bb','bb'),(11,11,11,'1a','1a','1a'),(111,111,111,'11a','11a','11a');
建立索引： 
alter table test add index idx(a,b,c,d,e);

测试：
mysql> desc select * from test where a=1 and b=1 and c='a' and d='a' and e='a';
+----+-------------+-------+------------+------+---------------+------+----
| id | select_type | table | partitions | type | possible_keys | key | key_len 
+----+-------------+-------+------------+------+---------------+------+---
| 1 | SIMPLE | test | NULL | ref | idx | idx | 134 |
mysql> desc select * from test where a=10 and b=10 and c='a' and d='a';
+----+-------------+-------+------------+------+---------------+------+-
| id | select_type | table | partitions | type | possible_keys | key| key_len |
+----+-------------+-------+------------+------+---------------+------+-
| 1 | SIMPLE | test | NULL | ref | idx | idx | 92 
mysql> desc select * from test where a=10 and b=10 and c='a';
+----+-------------+-------+------------+------+---------------+------+---
| id | select_type | table | partitions | type | possible_keys | key | key_len 
+----+-------------+-------+------------+------+---------------+--
| 1 | SIMPLE | test | NULL | ref | idx | idx | 49 
mysql> desc select * from test where a=10 and b=10;
+----+-------------+-------+------------+------+---------------+------+---
| id | select_type | table | partitions | type | possible_keys | key | key_len
+----+-------------+-------+------------+------+---------------+------+---
| 1 | SIMPLE | test | NULL | ref | idx | idx | 9 

5个列覆盖： 4+5+40+43+42=134 
4个列覆盖： 4+5+40+43=92 
3个列覆盖： 4+5+40=49 
2个列覆盖： 4+5=9 
1个列覆盖： 4

新的办法：
mysql> desc format=json select * from t100w where num=771751 and k1='47' and k2='ghLM';
"key": "idx_n_k1_k2",
"used_key_parts": [
        "num",
        "k1",
        "k2"
]

mysql> desc format=json select * from t100w where num=771751 and k1 like '4%' and k2='ghLM';
mysql> desc format=json select * from t100w where k1='47' and num=771751 and k2='ghLM';
mysql> desc format=json select * from t100w where k1='47' and k2='ghLM'; 
mysql> desc format=json select * from t100w where num=771751 and k2='ghLM';
mysql> desc format=json select * from t100w where k1='47' and num=771751 and k2 like 'g%';

—8.6 联合索引应用细节

1. 条件：
联合索引应用要满足最左原则
a. 建立联合索引时，选择重复值较少的列作为最左列。
b. 使用联合索引时，查询条件中，必须包含最左列，才有可能应用到联合索引。
联合索引不同覆盖场景
mysql> alter table t100w add index idx(num,k1,k2);

2. 全部覆盖 (key_len:31)
mysql> desc select * from t100w where num=913759 and k1='ej' and k2='EFfg';
mysql> desc select * from t100w where k1='ej' and k2='EFfg' and num=913759 ;
mysql> desc select * from t100w where num=913759 and k1='ej' and k2 in('EFfg','abcd');
mysql> desc select * from t100w where num=913759 and k1='ej' and k2 like 'EF%'

说明：
(a,b,c)
a= and b= and c=
b= and c= and a=

3. 部分覆盖
idx(a,b,c)

where a = and b =
where b = and a =
where a =
where a = and b like '%' and c=

例如：
mysql> desc select * from t100w where num=913759 and k1>'zz' and k2='EFfg';

如何优化？
(num,k1,k2) ----> (num,k2,k1)
mysql> desc select * from t100w where num=913759 and k2='EFfg' and k1>'zz' ;

4. 完全不覆盖
idx(a,b,c)
where b c
where b
where c

冗余索引：
mysql> select * from sys.schema_redundant_indexes where table_schema='world'\G

一次没使用的索引
mysql> select * from sys.schema_unused_indexes where object_schema='world';

— 8.7 extra

# 即额外的信息，参数值包括以下几种：
using filesort排序较多
 ---> group by \ order by \distinct \ union all
mysql> desc select * from world.city where countrycode='CHN' order by population;
+----+-------------+-------+------------+------+---------------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra 
| 1 | SIMPLE | city | NULL | ref | CountryCode | CountryCode | 3 | const | 363 | 100.00 | Using filesort |

优化
mysql> alter table city add index idx_c_p(countrycode,population);
mysql> desc select * from world.city where countrycode='CHN' order by population; 
+----+-------------+-------+------------+------+---------------------+------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------------+-----
| 1 | SIMPLE | city | NULL | ref | CountryCode,idx_c_p | idx_c_p | 3 | const | 363 | 100.00 | NULL |
+----+-------------+-------+------------+------+--

using tmp 
内存临时表（join，子查询，结果集较大时会造成内存压力），可以从拆分子查询的方式来优化。

优化器算法的选择（ICP SNLJ BNL BKA HJ）

—8.8 应用场景

数据库慢：
a. 应急性的慢。
系统层面top -Hp MysqlPID;
数据库层面show full processlist;information_schema.processlist;找到慢语句，然后kill select/DDL，分析语句explain SQL ---> 优化索引、改写语句
b. 间歇性慢。
slowlog ----> 慢语句 ---> explain SQL ---> 优化索引、改写语句

—8.9 扩展

explain format=json ,tree.....做深层次语句代价排查
真实执行语句获取真实的执行计划:
mysql> desc analyze select * from t100w where num=913759 and k1>'zz' and k2='EFfg';
查看语句在cpu，内存，IO的代价情况：
mysql> set profiling=1;开启功能(会话级别)
mysql> select * from t100w where num=913759 ;
mysql> show profiles;总延时情况
mysql> show profile for query 1; 详细延时情况
mysql> show profile cpu for query 1;CPU方面的延时情况
mysql> show profile block io for query 31;磁盘IO方面的延时情况（磁盘到内存Block_ops_in）
mysql> show profile memory for query 3;内存方面的延时

optimizer_trace

9 索引应用规范

—9.1 建立索引的原则（DBA运维规范）

说明
为了使索引的使用效率更高，在创建索引时，必须考虑在哪些字段上创建索引和创建什么类型的索引。
总结： 建索引原则
(1) 必须要有主键,最好数字自增列(业务无关列，不经常更新的列)。
(2) 经常做为where条件列 order by group by join on, distinct 的条件(业务:产品功能+用户行为)
(3) 联合索引最左原则。
(4) 列值长度较长的索引列,我们建议使用前缀索引.
(5) 降低索引条目,一方面不要创建没用索引,不常使用的索引清理
sys.schema_unused_indexes
sys.schema_redundant_indexes
idx(a,b,c) a ab abc
(6) 索引维护要避开业务繁忙期，建议用pt-osc\gh-ost
(7) 大量数据导入的时候，先把索引禁用。

—9.2 不走索引的情况（开发规范）

(1)没有查询条件，或者查询条件没有建立索引
(2)查询的结果集，超过了总数行数25%，优化器觉得就没有必要走索引了。
(3)索引本身失效，统计信息不真实（过旧） analyze table t1;
(4)查询条件使用函数在索引列上，或者对索引列进行运算，运算包括(+，-，*，/，! 等)
(5)隐式转换导致索引失效.这一点应当引起重视.也是开发中经常会犯的错误.
   telnum char(11)
   select * from t1 where telnum = '110'; √
   select * from t1 where telnum = 110; X
(6)<> ，not in 不走索引（辅助索引）
(7)单独的>,<,in 有可能走，也有可能不走，和结果集大小有关，尽量结合业务添加limit
(8)like "%_" 百分号在最前面不走

—9.3 优化器针对索引的算法

——9.3.1 MySQL索引的自优化-AHI（自适应HASH索引）

MySQL的InnoDB引擎，能够创建只有Btree。
AHI作用：
自动评估"热"的内存索引page,生成HASH索引表。
帮助InnoDB快速读取索引页。加快索引读取的效果。
相当与索引的索引。

——9.3.2 MySQL-Change buffer

比如insert，update，delete数据。
对于聚簇索引会立即更新。
对于辅助索引，不是实时更新的。
在InnoDB内存结构中，加入了insertbuffer（会话），现在版本叫changebuffer。
Changebuffer功能是临时缓冲辅助索引需要的数据更新。
当我们需要查询新insert的数据，会在内存中进行merge(合并)操作，此时辅助索引就是最新的。

10 可选的优化器算法-索引

—10.1 优化器算法查询

select @@optimizer_switch;
index_merge=on,
index_merge_union=on,
index_merge_sort_union=on,
index_merge_intersection=on,
engine_condition_pushdown=on,
index_condition_pushdown=on,
mrr=on,mrr_cost_based=on,
block_nested_loop=on,
batched_key_access=off,
materialization=on,
semijoin=on,
loosescan=on,
firstmatch=on,
duplicateweedout=on,
subquery_materialization_cost_based=on,
use_index_extensions=on,
condition_fanout_filter=on,
derived_merge=on

—10.2 如何修改

1. my.cnf
optimizer_switch='batched_key_access=on'

2. set global optimizer_switch='batched_key_access=on';

3. hints
SELECT /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */ f1
FROM t3 WHERE f1 > 30 AND f1 < 33;

SELECT /*+ BKA(t1) NO_BKA(t2) */ * FROM t1 INNER JOIN t2 WHERE ...;
SELECT /*+ NO_ICP(t1, t2) */ * FROM t1 INNER JOIN t2 WHERE ...;
SELECT /*+ SEMIJOIN(FIRSTMATCH, LOOSESCAN) */ * FROM t1 ...;
EXPLAIN SELECT /*+ NO_ICP(t1) */ * FROM t1 WHERE ...;

SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;

https://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html

—10.3 index_condition_pushdown （ICP）

介绍： 索引下推 ，5.6+ 加入的特性 
idx(a,b,c) 
where a = and b > and c = 

作用： SQL层做完过滤后，只能用a,b的部分辅助索引，将c列条件的过滤下推到engine层，进行再次过滤。 
排除无用的数据页。 
最终去磁盘上拿数据页。 
大大减少无用IO的访问。

测试1: ICP开启时
idx(k1,k2)
mysql> SET global optimizer_switch='index_condition_pushdown=ON'

[root@db01 ~]# mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k1 > 'Za' and k2='rsEF'" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose

测试2:ICP关闭时：
idx(k1,k2)
mysql> SET global optimizer_switch='index_condition_pushdown=OFF'

[root@db01 ~]# mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k1 > 'Za' and k2='rsEF'" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose

测试3:优化索引
idx(k2,k1)
[root@db01 ~]# mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k1 > 'Za' and k2='rsEF'" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose

具体参考 ：
https://dev.mysql.com/doc/refman/5.7/en/index-condition-pushdownoptimization.html
https://mariadb.com/kb/en/index-condition-pushdown/

—10.4 MRR : Multi Range Read

1 作用： 减少回表。
2 开关方法：
mysql> set global optimizer_switch='mrr=on,mrr_cost_based=off';
3 区别
具体参考 ：
https://dev.mysql.com/doc/refman/5.7/en/mrr-optimization.html
https://mariadb.com/kb/en/multi-range-read-optimization/

压力测试：
mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --reateschema='world' --query="select * from world.city where name in ('Aachen','Aalborg','Aba','Abadan','Abaetetuba')" engine=innodb --number-ofqueries=20000 -uroot -p123 -verbose

—10.5 关于连接算法

SNLJ: 普通的嵌套循环连接。

a join b on a.id=b.id
如何选择驱动表？
1. 优化器会自己选，选择结果集小的（数据量少的）
2. 人工也可以干预。left join ，原则上可以强制左表驱动表

伪代码：
for each row in A matching range {
    block
    for each row in B {
      A.xx = B.yy ，send to client
    }
}

INLJ ===> 普通索引 ===> 唯一索引 eq_ref
BNL ===> block nest loop join ===> 一般是在被驱动表连接条件索引不走的时候。
BNL + MRR ===> BKA

8.0.18+版本之后，加入了 hash join
8.0.20+版本，将原来使用BNL算法的查询，转换为HASH join

—10.6 SNLJ 普通嵌套循环连接

例子：
A join B
on A.xx = B.yy

伪代码：
for each row in A matching range {
    block
    for each row in B {
      A.xx = B.yy ，send to client
    }
}

例子：
mysql> desc select /*+ BNL(teacher,course) */ * from teacher join course on teacher.tno=course.tno;

关于驱动表选择的优化思路：
理论支撑：
mysql> desc select * from city join country on city.countrycode=country.code;
mysql> desc select * from city left join country on city.countrycode=country.code;

查询语句执行代价：
mysql> desc format=json select * from city join country on city.countrycode=country.code ;
mysql> desc format=json select * from city left join country on city.countrycode=country.code;

实践检验：
[root@db01 ~]# mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='world' --query="select * from city left join country on city.countrycode=country.code ;" engine=innodb --number-ofqueries=2000 -uroot -p123 -verbose
[root@db01 ~]# mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='world' --query="select * from city join country on city.countrycode=country.code ;" engine=innodb --number-of-queries=2000 -uroot -p123 -verbose

注： 可以通过 left join 强制驱动表。

—10.7 BNLJ

在 A和B关联条件匹配时，不再一次一次进行循环。
而是采用一次性将驱动表的关联值和非驱动表匹配.一次性返回结果
主要优化了， CPU消耗，减少了IO次数

In EXPLAIN output,
use of BNL for a table is signified
when the Extra value contains Using join buffer (Block Nested Loop)

for block row in a matching range {
    for each row in b {
      a.x = b.y ，send to client
    }
}

—10.8 BKA

主要作用，使用来优化非驱动表的关联列有辅助索引。
BNL+ MRR的功能。
开启方式：
mysql> set global optimizer_switch='mrr=on,mrr_cost_based=off';
mysql> set global optimizer_switch='batched_key_access=on';
重新登陆生效。

—10.9 MySQL8.0 版本索引的新特性

a. 不可见索引。invisable/visable index
针对优化器不可见。但是索引还在磁盘存在，还会自动维护。
对于索引维护时，不确定索引是否还有用。这时可以临时设定为invisable。

ALTER TABLE t1 ALTER INDEX i_idx INVISIBLE;
ALTER TABLE t1 ALTER INDEX i_idx VISIBLE;

b. 倒序索引。
select * from t1 where c = order by a , b desc ;
idx(c,a, b desc )

c. 8.0.18+ hash join
join 操作用不上索引优化的情况下。
8.0.20 后 BNL完全替代为了Hash Join

hash join ：
1. 被驱动表的关联条件，索引应用不好。
2. 全表关联，没有太多谓词条件的时候。
3. ash Join 必须包含等值条件关联

11 hash join

—11.1 工作过程

hash join 工作过程分为2个阶段：
select a.name,a.age ,b.addr,b.tel from a join b on a.id=b.id ;
1. 选驱动表
2. 优化器算法 ---> Hash join

build 阶段：
a表数据进行构建hash表
key: hash(id) ===> x001
value:name%age

probe 探测阶段:
b表数据处理：
对b表关联列进行hash
a.id 扫描一遍 b表

—11.2 probe 探测阶段

——11.2.1 构建阶段

从参与join的2个表中选一个，选择占空间小的那个表，不是行数少的，这里假设选择了countries 表。
对 countries 表中每行的 join 字段值进行 hash 计算：
hash(countries.country_id)
计算后放入内存中 hash table 的相应位置。
所有行都存放到 hash table 之后，构建阶段完成。

——11.2.2 探测阶段

对 persons 表中每行中的 join 字段的值进行 hash 计算：
hash(persons.country_id)
拿着计算结果到内存 hash table 中进行查找匹配，找到一行就发给 client。
这样就完成了整个 join 操作，每个表只扫描一次就可以了，扫描匹配时间也是恒定的，非常高效。

这个例子中， countries 表顺利的全部放入了内存，可用内存的大小是由 join_buffer_size 控制
的。
实际环境中，肯定会有比较大的表，那么超过了可用内存时怎么办呢？
需要溢出到磁盘了。

——11.2.3 溢出到磁盘

在构建阶段过程中，如果内存满了，会把表中剩余数据写到磁盘上。
不会只写入一个文件，会分成多个块文件。
MySQL 会保证每个块文件的大小都是适合可用内存的。
怎么决定某一行记录写入哪个块文件呢？也是通过hash计算join字段决定的：

可以看到，对于大表，构建阶段分为了2步：
  写入内存 hash table
  写入块文件
然后是探测阶段，首先还是会走一遍和之前一样的流程，就是扫描一遍 persons 表的每一行，和内存中的 hash table 进行匹配。

但因为内存中的 hash table 不是全部数据，所以需要额外的处理：
把 persons 表的数据也写入多个块文件中。
怎么决定某一行记录写入哪个块文件呢？和构建阶段写入块文件的思路相同，这样，构建阶段的块文件和此处的块文件就是一一对应的关系了。

在正常的探测流程走完之后，开始处理这些块文件中的内容了。
逐一加载构建阶段的块文件到内存中，加载过程和正常的构建过程一致，对块文件中的每行数据进行hash 计算，放入内存的 hash table 中。
构建好一个块文件之后，选择与其对应的探测块文件开始探测。
例如构建的是第0个构建块文件，那么就选择第0个探测块文件。
就这样一对一对的块文件进行处理，直到全部完成。