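I have a table with roughly 100,000 blog postings, linked through a 1:n relationship to a table with 50 feeds. When I query both tables with a select statement ordered by a datetime field of the postings table, MySQL always uses filesort, which makes the query very slow (> 1 second). Here is the schema of the postings table (simplified):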
+---------------------+--------------+------+-----+---------+----------------+
| Field               | Type         | Null | Key | Default | Extra          |
+---------------------+--------------+------+-----+---------+----------------+
| id                  | int(11)      | NO   | PRI | NULL    | auto_increment |
| feed_id             | int(11)      | NO   | MUL | NULL    |                |
| crawl_date          | datetime     | NO   |     | NULL    |                |
| is_active           | tinyint(1)   | NO   | MUL | 0       |                |
| link                | varchar(255) | NO   | MUL | NULL    |                |
| author              | varchar(255) | NO   |     | NULL    |                |
| title               | varchar(255) | NO   |     | NULL    |                |
| excerpt             | text         | NO   |     | NULL    |                |
| long_excerpt        | text         | NO   |     | NULL    |                |
| user_offtopic_count | int(11)      | NO   | MUL | 0       |                |
+---------------------+--------------+------+-----+---------+----------------+
Here is the feed table:
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| type        | int(11)      | NO   | MUL | 0       |                |
| title       | varchar(255) | NO   |     | NULL    |                |
| website     | varchar(255) | NO   |     | NULL    |                |
| url         | varchar(255) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
And here is the query that takes more than 1 second to execute. Note that the post_date field has an index, but MySQL is not using it to sort the postings table:
SELECT
    `postings`.`id`,
    UNIX_TIMESTAMP(postings.post_date) as post_date,
    `postings`.`link`,
    `postings`.`title`,
    `postings`.`author`,
    `postings`.`excerpt`,
    `postings`.`long_excerpt`,
    `feeds`.`title` AS feed_title,
    `feeds`.`website` AS feed_website
FROM
    (`postings`)
JOIN
    `feeds`
ON
    `feeds`.`id` = `postings`.`feed_id`
WHERE
    `feeds`.`type` = 1 AND
    `postings`.`user_offtopic_count` < 10 AND
    `postings`.`is_active` = 1
ORDER BY
    `postings`.`post_date` desc
LIMIT
    15
The result of the explain extended command for this query shows that MySQL is using filesort:
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
|  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort |
|  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
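When I remove the order by part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query so that MySQL sorts and selects the data using indexes. I have already tried a few things, such as creating a combined index on all where/order by fields, as suggested by a few blog posts, but that did not work either.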
Create a composite index on postings (is_active, post_date) (in that order).
It will be used both for filtering on is_active and for ordering by post_date.
MySQL should show the REF access method over this index in EXPLAIN EXTENDED.
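One way to check this is to run the query under EXPLAIN once the index exists. The following is a minimal sketch, assuming the index was created under the name ix_active_date (the name used in the script at the end of this answer) and with the select list shortened for readability. Plain EXPLAIN is used here because EXPLAIN EXTENDED is deprecated in newer MySQL versions.

```sql
-- Sketch: inspect the plan after creating ix_active_date on (is_active, post_date).
-- The shortened select list is an assumption made for brevity; the joins and
-- filters are the ones from the question.
EXPLAIN
SELECT postings.id, postings.post_date
FROM postings
JOIN feeds
  ON feeds.id = postings.feed_id
WHERE feeds.type = 1
  AND postings.user_offtopic_count < 10
  AND postings.is_active = 1
ORDER BY postings.post_date DESC
LIMIT 15;
```

In the postings row of the output, look for type = ref, key = ix_active_date, and an Extra column without "Using filesort". The optimizer may still prefer another index or join order, in which case the plan will differ.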
Note that you have a RANGE filtering condition on user_offtopic_count, which is why a single index cannot be used both for filtering on this field and for sorting by another field.
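Depending on how selective the condition on user_offtopic_count is (i.e. how many rows satisfy user_offtopic_count < 10), it may be more useful to use an index for filtering on user_offtopic_count and let the post_dates be sorted with a filesort.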
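A rough way to gauge that selectivity is to count how many active rows pass the range condition. This is only a sketch; the column aliases are made up for illustration.

```sql
-- Sketch: compare the number of active rows with the number of active rows
-- that also satisfy the range condition. In MySQL a comparison evaluates
-- to 1 or 0, so SUM() counts the rows where it is true.
SELECT
    COUNT(*)                      AS active_rows,
    SUM(user_offtopic_count < 10) AS active_low_offtopic_rows
FROM postings
WHERE is_active = 1;
```

If active_low_offtopic_rows is a small fraction of active_rows, filtering by index and sorting the few remaining rows is likely cheap. If most active rows qualify, reading in post_date order and stopping after 15 matches is likely the better plan.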
To do this, create a composite index on postings (is_active, user_offtopic_count) and make sure the RANGE access method over this index is used.
Which index will be faster depends on your data distribution. Create both indexes, FORCE each one in turn, and see which is faster:
CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);

SELECT
    `postings`.`id`,
    UNIX_TIMESTAMP(postings.post_date) as post_date,
    `postings`.`link`,
    `postings`.`title`,
    `postings`.`author`,
    `postings`.`excerpt`,
    `postings`.`long_excerpt`,
    `feeds`.`title` AS feed_title,
    `feeds`.`website` AS feed_website
FROM
    `postings` FORCE INDEX (ix_active_offtopic)
JOIN
    `feeds`
ON
    `feeds`.`id` = `postings`.`feed_id`
WHERE
    `feeds`.`type` = 1 AND
    `postings`.`user_offtopic_count` < 10 AND
    `postings`.`is_active` = 1
ORDER BY
    `postings`.`post_date` desc
LIMIT
    15;

/* This should show RANGE access with few rows and keep the FILESORT */

SELECT
    `postings`.`id`,
    UNIX_TIMESTAMP(postings.post_date) as post_date,
    `postings`.`link`,
    `postings`.`title`,
    `postings`.`author`,
    `postings`.`excerpt`,
    `postings`.`long_excerpt`,
    `feeds`.`title` AS feed_title,
    `feeds`.`website` AS feed_website
FROM
    `postings` FORCE INDEX (ix_active_date)
JOIN
    `feeds`
ON
    `feeds`.`id` = `postings`.`feed_id`
WHERE
    `feeds`.`type` = 1 AND
    `postings`.`user_offtopic_count` < 10 AND
    `postings`.`is_active` = 1
ORDER BY
    `postings`.`post_date` desc
LIMIT
    15;

/* This should show REF access with lots of rows and no FILESORT */
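Once the comparison is done, the index that lost can be dropped so it does not add write overhead. The sketch below assumes, purely for illustration, that ix_active_date turned out faster; swap the index name if your measurements say otherwise.

```sql
-- Hypothetical outcome: ix_active_date won, so the other index is removed.
DROP INDEX ix_active_offtopic ON postings;

-- List the remaining indexes on the table to confirm the result.
SHOW INDEX FROM postings;
```

After that, it is worth re-running the query without the FORCE INDEX hint to see whether the optimizer picks the remaining index on its own.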