我有一个关于MySql的问题。我有一个带有7.479.194记录的表。有些记录是重复的。我想这样做:
insert into new_table select * from old_table group by old_table.a, old_table.b
所以我会取出重复的条目…但是问题是这是大量数据。该表是MyIsam。
这是示例数据-我想按城市,short_ccode进行分组…
id city post_code short_ccode ---------------------------------------------------- 4732875 Celje 3502 si 4733306 Celje 3502 si 4734250 Celje 3502 si
我想我必须为my.ini文件修改一些内存,以便按语句分组……哪些设置对此负责?
我有一台具有3bg RAM和2Ghz处理器的机器。
我的ini文件:
# aaaMySQL Server Instance Configuration File # ---------------------------------------------------------------------- # Generated by the MySQL Server Instance Configuration Wizard # # # Installation Instructions # ---------------------------------------------------------------------- # # On Linux you can copy this file to /etc/my.cnf to set global options, # mysql-data-dir/my.cnf to set server-specific options # (@localstatedir@ for this installation) or to # ~/.my.cnf to set user-specific options. # # On Windows you should keep this file in the installation directory # of your server (e.g. C:\Program Files\MySQL\MySQL Server 4.1). To # make sure the server reads the config file use the startup option # "--defaults-file". # # To run run the server from the command line, execute this in a # command line shell, e.g. # mysqld --defaults-file="C:\Program Files\MySQL\MySQL Server 4.1\my.ini" # # To install the server as a Windows service manually, execute this in a # command line shell, e.g. # mysqld --install MySQL41 --defaults-file="C:\Program Files\MySQL\MySQL Server 4.1\my.ini" # # And then execute this in a command line shell to start the server, e.g. # net start MySQL41 # # # Guildlines for editing this file # ---------------------------------------------------------------------- # # In this file, you can use all long options that the program supports. # If you want to know the options a program supports, start the program # with the "--help" option. # # More detailed information about the individual options can also be # found in the manual. # # # CLIENT SECTION # ---------------------------------------------------------------------- # # The following options will be read by MySQL client applications. # Note that only client applications shipped by MySQL are guaranteed # to read this section. If you want your own MySQL client program to # honor these values, you need to specify it as an option during the # MySQL client library initialization. # [client] port=3306 # SERVER SECTION # ---------------------------------------------------------------------- # # The following options will be read by the MySQL Server. Make sure that # you have installed the server correctly (see above) so it reads this # file. # [wampmysqld] # The TCP/IP Port the MySQL Server will listen on port=3306 #Path to installation directory. All paths are usually resolved relative to this. basedir=d:/wamp/bin/mysql/mysql5.0.45 #log file log-error=d:/wamp/logs/mysql.log #Path to the database root datadir=d:/wamp/bin/mysql/mysql5.0.45/data # The default character set that will be used when a new schema or table is # created and no character set is defined default-character-set=utf8 # The default storage engine that will be used when create new tables when default-storage-engine=MyISAM # The maximum amount of concurrent sessions the MySQL server will # allow. One of these connections will be reserved for a user with # SUPER privileges to allow the administrator to login even if the # connection limit has been reached. max_connections=1000 # Query cache is used to cache SELECT results and later return them # without actual executing the same query once again. Having the query # cache enabled may result in significant speed improvements, if your # have a lot of identical queries and rarely changing tables. See the # "Qcache_lowmem_prunes" status variable to check if the current value # is high enough for your load. # Note: In case your tables change very often or if your queries are # textually different every time, the query cache may result in a # slowdown instead of a performance improvement. query_cache_size=16M # The number of open tables for all threads. Increasing this value # increases the number of file descriptors that mysqld requires. # Therefore you have to make sure to set the amount of open files # allowed to at least 4096 in the variable "open-files-limit" in # section [mysqld_safe] table_cache=500 # Maximum size for internal (in-memory) temporary tables. If a table # grows larger than this value, it is automatically converted to disk # based table This limitation is for a single table. There can be many # of them. tmp_table_size=32M # How many threads we should keep in a cache for reuse. When a client # disconnects, the client's threads are put in the cache if there aren't # more than thread_cache_size threads from before. This greatly reduces # the amount of thread creations needed if you have a lot of new # connections. (Normally this doesn't give a notable performance # improvement if you have a good thread implementation.) thread_cache_size=12 #*** MyISAM Specific options # The maximum size of the temporary file MySQL is allowed to use while # recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE. # If the file-size would be bigger than this, the index will be created # through the key cache (which is slower). myisam_max_sort_file_size=100G # If the temporary file used for fast index creation would be bigger # than using the key cache by the amount specified here, then prefer the # key cache method. This is mainly used to force long character keys in # large tables to use the slower key cache method to create the index. myisam_max_extra_sort_file_size=100G # If the temporary file used for fast index creation would be bigger # than using the key cache by the amount specified here, then prefer the # key cache method. This is mainly used to force long character keys in # large tables to use the slower key cache method to create the index. myisam_sort_buffer_size=32M # Size of the Key Buffer, used to cache index blocks for MyISAM tables. # Do not set it larger than 30% of your available memory, as some memory # is also required by the OS to cache rows. Even if you're not using # MyISAM tables, you should still set it to 8-64M as it will also be # used for internal temporary disk tables. key_buffer_size=64M # Size of the buffer used for doing full table scans of MyISAM tables. # Allocated per thread, if a full scan is needed. read_buffer_size=2M read_rnd_buffer_size=8MK # This buffer is allocated when MySQL needs to rebuild the index in # REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE # into an empty table. It is allocated per thread so be careful with # large settings. sort_buffer_size=256M #*** INNODB Specific options *** # Use this option if you have a MySQL server with InnoDB support enabled # but you do not plan to use it. This will save memory and disk space # and speed up some things. #skip-innodb # Additional memory pool that is used by InnoDB to store metadata # information. If InnoDB requires more memory for this purpose it will # start to allocate it from the OS. As this is fast enough on most # recent operating systems, you normally do not need to change this # value. SHOW INNODB STATUS will display the current amount used. innodb_additional_mem_pool_size=20M # If set to 1, InnoDB will flush (fsync) the transaction logs to the # disk at each commit, which offers full ACID behavior. If you are # willing to compromise this safety, and you are running small # transactions, you may set this to 0 or 2 to reduce disk I/O to the # logs. Value 0 means that the log is only written to the log file and # the log file flushed to disk approximately once per second. Value 2 # means the log is written to the log file at each commit, but the log # file is only flushed to disk approximately once per second. innodb_flush_log_at_trx_commit=1 # The size of the buffer InnoDB uses for buffering log data. As soon as # it is full, InnoDB will have to flush it to disk. As it is flushed # once per second anyway, it does not make sense to have it very large # (even with long transactions). innodb_log_buffer_size=8M # InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and # row data. The bigger you set this the less disk I/O is needed to # access data in tables. On a dedicated database server you may set this # parameter up to 80% of the machine physical memory size. Do not set it # too large, though, because competition of the physical memory may # cause paging in the operating system. Note that on 32bit systems you # might be limited to 2-3.5G of user level memory per process, so do not # set it too high. innodb_buffer_pool_size=512M # Size of each log file in a log group. You should set the combined size # of log files to about 25%-100% of your buffer pool size to avoid # unneeded buffer pool flush activity on log file overwrite. However, # note that a larger logfile size will increase the time needed for the # recovery process. innodb_log_file_size=10M # Number of threads allowed inside the InnoDB kernel. The optimal value # depends highly on the application, hardware as well as the OS # scheduler properties. A too high value may lead to thread thrashing. innodb_thread_concurrency=8 [mysqld] port=3306
这将填充NEW_TABLE唯一值,该id值是该串的第一个ID:
NEW_TABLE
id
INSERT INTO NEW_TABLE SELECT MIN(ot.id), ot.city, ot.post_code, ot.short_ccode FROM OLD_TABLE ot GROUP BY ot.city, ot.post_code, ot.short_ccode
如果您想要每束最高的id值:
INSERT INTO NEW_TABLE SELECT MAX(ot.id), ot.city, ot.post_code, ot.short_ccode FROM OLD_TABLE ot GROUP BY ot.city, ot.post_code, ot.short_ccode