What is the "correct" query to get a cumulative total in MySQL?
I have a table holding information about files; one of the columns contains each file's size in bytes. (The actual files are stored somewhere on disk.)
I would like to get the cumulative file size like this:
+------------+---------+--------+----------------+
| fileInfoId | groupId | size   | cumulativeSize |
+------------+---------+--------+----------------+
|          1 |       1 | 522120 |         522120 |
|          2 |       2 | 316042 |         316042 |
|          4 |       2 | 711084 |        1027126 |
|          5 |       2 | 697002 |        1724128 |
|          6 |       2 | 663425 |        2387553 |
|          7 |       2 | 739553 |        3127106 |
|          8 |       2 | 700938 |        3828044 |
|          9 |       2 | 695614 |        4523658 |
|         10 |       2 | 744204 |        5267862 |
|         11 |       2 | 609022 |        5876884 |
|        ... |     ... |    ... |            ... |
+------------+---------+--------+----------------+
20000 rows in set (19.2161 sec.)
At the moment, I use the following query to get the above result:
SELECT a.fileInfoId
     , a.groupId
     , a.size
     , SUM(b.size) AS cumulativeSize
FROM fileInfo AS a
LEFT JOIN fileInfo AS b USING(groupId)
WHERE a.fileInfoId >= b.fileInfoId
GROUP BY a.fileInfoId
ORDER BY a.groupId, a.fileInfoId
However, my solution is extremely slow (about 19 seconds, uncached).
EXPLAIN gives the following execution details:
+----+-------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| id | select_type | table | type  | possible_keys     | key       | key_len | ref            | rows  | Extra       |
+----+-------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
|  1 | SIMPLE      | a     | index | PRIMARY,foreignId | PRIMARY   | 4       | NULL           | 14905 |             |
|  1 | SIMPLE      | b     | ref   | PRIMARY,foreignId | foreignId | 4       | db.a.foreignId |    36 | Using where |
+----+-------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
How can I optimize the above query?
Update: I have updated the question to provide the table structure and a procedure that fills the table with 20,000 rows of test data.
CREATE TABLE `fileInfo` (
  `fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
, `groupId` int(10) unsigned NOT NULL
, `name` varchar(128) NOT NULL
, `size` int(10) unsigned NOT NULL
, PRIMARY KEY (`fileInfoId`)
, KEY `groupId` (`groupId`)
) ENGINE=InnoDB;

delimiter $$
DROP PROCEDURE IF EXISTS autofill$$
CREATE PROCEDURE autofill()
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE gid INT DEFAULT 0;
    DECLARE nam char(20);
    DECLARE siz INT DEFAULT 0;
    WHILE i < 20000 DO
        SET gid = FLOOR(RAND() * 250);
        SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
        SET siz = FLOOR((RAND() * 1024 * 1024));
        INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES (gid, nam, siz);
        SET i = i + 1;
    END WHILE;
END;$$
delimiter ;

CALL autofill();
You can use a variable; it's far faster than any join:
SELECT id, size, @total := @total + size AS cumulativeSize
FROM myTable, (SELECT @total := 0) AS t;
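As an aside, on MySQL 8.0 and later the same running total can be expressed with a window function instead of a user variable. A minimal sketch, assuming the same generic table with id and size columns:

SELECT id,
       size,
       -- running total over all rows, ordered by id
       SUM(size) OVER (ORDER BY id) AS cumulativeSize
FROM myTable;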
Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:
Create the table:
DROP TABLE IF EXISTS `table1`;

CREATE TABLE `table1` (
    `id` int(11) NOT NULL auto_increment,
    `size` int(11) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Fill it with 20,000 random numbers:
DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < 20000 DO
        INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
        SET i = i + 1;
    END WHILE;
END;
//
DELIMITER ;

CALL autofill();
Verify the row count:
SELECT COUNT(*) FROM table1;

+----------+
| COUNT(*) |
+----------+
|    20000 |
+----------+
Run the cumulative total query:
SELECT id, size, @total := @total + size AS cumulativeSize
FROM table1, (SELECT @total := 0) AS t;

+-------+------+----------------+
| id    | size | cumulativeSize |
+-------+------+----------------+
|     1 |  226 |            226 |
|     2 |  869 |           1095 |
|     3 |  668 |           1763 |
|     4 |  733 |           2496 |
...
| 19997 |  966 |       10004741 |
| 19998 |  522 |       10005263 |
| 19999 |  713 |       10005976 |
| 20000 |    0 |       10005976 |
+-------+------+----------------+
20000 rows in set (0.07 sec)
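One caveat worth noting: MySQL does not guarantee the order in which user-variable assignments are evaluated, so the running total is only trustworthy if the rows are read in the intended order. A defensive variant (the same trick the grouped query below relies on) forces the ordering inside a derived table first:

SELECT id, size, @total := @total + size AS cumulativeSize
FROM (SELECT id, size FROM table1 ORDER BY id) AS ordered,
     (SELECT @total := 0) AS t;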
Update
I missed that the original question grouped by groupId, and that certainly made things a little trickier. I then wrote a solution which used a temporary table, but I didn't like it: it was messy and overly complicated. I went away, did some more research, and came up with something far simpler and faster.
I can't claim all the credit for this; in fact, I can barely claim any at all, as it is just a modified version of Emulate Row Number from Common MySQL Queries.
It's beautifully simple, elegant, and very fast:
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
        @prev_groupId := groupId AS prev_groupId
    FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
    ORDER BY groupId
) AS tmp;
You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupId column being returned. I found that it ran marginally faster without it.
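For reference, that stripped-down version is just the inner derived query run on its own, with prev_groupId coming back as an extra column:

SELECT
    fileInfoId, groupId, name, size,
    @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
    @prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
ORDER BY groupId;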
Here's a simple test case:
INSERT INTO `fileInfo` VALUES
    ( 1, 3, 'name0', '10'),
    ( 5, 3, 'name1', '10'),
    ( 7, 3, 'name2', '10'),
    ( 8, 1, 'name3', '10'),
    ( 9, 1, 'name4', '10'),
    (10, 2, 'name5', '10'),
    (12, 4, 'name6', '10'),
    (20, 4, 'name7', '10'),
    (21, 4, 'name8', '10'),
    (25, 5, 'name9', '10');

SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
        @prev_groupId := groupId AS prev_groupId
    FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
    ORDER BY groupId
) AS tmp;

+------------+---------+-------+------+----------------+
| fileInfoId | groupId | name  | size | cumulativeSize |
+------------+---------+-------+------+----------------+
|          8 |       1 | name3 |   10 |             10 |
|          9 |       1 | name4 |   10 |             20 |
|         10 |       2 | name5 |   10 |             10 |
|          1 |       3 | name0 |   10 |             10 |
|          5 |       3 | name1 |   10 |             20 |
|          7 |       3 | name2 |   10 |             30 |
|         12 |       4 | name6 |   10 |             10 |
|         20 |       4 | name7 |   10 |             20 |
|         21 |       4 | name8 |   10 |             30 |
|         25 |       5 | name9 |   10 |             10 |
+------------+---------+-------+------+----------------+
And here's a sample of the last few rows from a 20,000 row table:
|      19481 |     248 | 8CSLJX22RCO | 1037469 |       51270389 |
|      19486 |     248 | 1IYGJ1UVCQE |  937150 |       52207539 |
|      19817 |     248 | 3FBU3EUSE1G |  616614 |       52824153 |
|      19871 |     248 | 4N19QB7PYT  |  153031 |       52977184 |
|        132 |     249 | 3NP9UGMTRTD |  828073 |         828073 |
|        275 |     249 | 86RJM39K72K |  860323 |        1688396 |
|        802 |     249 | 16Z9XADLBFI |  623030 |        2311426 |
...
|      19661 |     249 | ADZXKQUI0O3 |  837213 |       39856277 |
|      19870 |     249 | 9AVRTI3QK6I |  331342 |       40187619 |
|      19972 |     249 | 1MTAEE3LLEM | 1027714 |       41215333 |
+------------+---------+-------------+---------+----------------+
20000 rows in set (0.31 sec)
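Finally, as with the ungrouped case above, MySQL 8.0 and later can replace the user-variable approach entirely with a window function partitioned by group. A sketch against the same fileInfo table:

SELECT
    fileInfoId, groupId, name, size,
    -- per-group running total: the sum restarts for each groupId
    SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId) AS cumulativeSize
FROM fileInfo
ORDER BY groupId, fileInfoId;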