How to speed up mysql update join query between 2 tables?

249    Asked by DavidEdmunds in SQL Server , Asked on Mar 14, 2023

I have a powerful dedicated server.


Intel  I7-6700K -
64GB DDR4 2400 MHz
1x480GB   SSD
running mysql server along with nginx,php
innodb-ft-min-token-size = 1
innodb-ft-enable-stopword = 0
innodb_buffer_pool_size = 40G
max_connections = 2000
[deploy@ns540545 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         45G         11G        107M        6.4G         16G
Swap:          2.0G        1.4G        640M
it was expensive so i got another dedicated server for cost cutting let's call it not-so powerful dedicated server
Intel  i3-2130  
8GB DDR3 1333 MHz   
2TB 
running mysql server along with nginx,php
innodb-ft-min-token-size = 1
innodb-ft-enable-stopword = 0
innodb_buffer_pool_size = 4G
max_connections = 2000
[root@privateserver deploy]# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.7G        7.5G         73M         24M        150M         79M
Swap:           39G        7.8G         32G

I moved the database from a powerful server to a not-so-powerful server. I can feel slight performance degradation while running simple queries which is fine, but this one query which used to take 2 minutes on a powerful server now it takes around 26.6525 hours and counting on a not-so powerful server.


UPDATE content a JOIN peers_data b ON a.hash = b.hash SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated
More info about tables which are exactly same on both the dedicated server


CREATE TABLE `peers_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hash` char(40) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `seeders` int(11) NOT NULL DEFAULT '0',
  `leechers` int(11) NOT NULL DEFAULT '0',
  `is_updated` int(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`hash`),
  UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `content` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content_id` int(11) unsigned NOT NULL DEFAULT '0',
  `hash` char(40) CHARACTER SET ascii NOT NULL DEFAULT '',
  `title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `tags` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `category` smallint(3) unsigned NOT NULL DEFAULT '0',
  `category_name` varchar(50) CHARACTER SET ascii COLLATE ascii_bin DEFAULT '',
  `sub_category` smallint(3) unsigned NOT NULL DEFAULT '0',
  `sub_category_name` varchar(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `size` bigint(20) unsigned NOT NULL DEFAULT '0',
  `seeders` int(11) unsigned NOT NULL DEFAULT '0',
  `leechers` int(11) unsigned NOT NULL DEFAULT '0',
  `upload_date` datetime DEFAULT NULL,
  `uploader` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '0',
  `uploader_level` varchar(10) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
  `comments_count` int(11) unsigned NOT NULL DEFAULT '0',
  `is_updated` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`content_id`),
  UNIQUE KEY `unique` (`id`) USING BTREE,
  KEY `hash` (`hash`),
  KEY `uploader` (`uploader`),
  KEY `sub_category` (`sub_category`),
  KEY `category` (`category`),
  KEY `title_index` (`title`),
  KEY `category_sub_category` (`category`,`sub_category`),
  KEY `seeders` (`seeders`),
  KEY `uploader_sub_category` (`uploader`,`sub_category`),
  KEY `upload_date` (`upload_date`),
  KEY `uploader_upload_date` (`uploader`,`upload_date`),
  KEY `leechers` (`leechers`),
  KEY `size` (`size`),
  KEY `uploader_seeders` (`uploader`,`seeders`),
  KEY `uploader_size` (`uploader`,`size`),
  FULLTEXT KEY `title` (`title`),
  FULLTEXT KEY `tags` (`tags`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
mysql> explain UPDATE content a JOIN peers_data b ON a.hash = b.hash SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated ;
+----+-------------+-------+------------+--------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | UPDATE      | a     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL | 4236260 |   100.00 | NULL        |
|  1 | SIMPLE      | b     | NULL       | eq_ref | PRIMARY       | PRIMARY | 160     | func |       1 |   100.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+------+---------+----------+-------------+
2 rows in set (0.00 sec)
records in peers_data 6,367,417
records in content 4,236,268

How can i speed up the above update join query ? i was expecting about 1 hour on not-so powerful server, but 26 hours+ is too much.

what am I doing wrong ? or missing here ?

I have tried to compensate for RAM on a not-so-powerful server by setting 32 GB + swap space. Is innodb buffer pool 4 gb too much ?


Answered by David Piper

Your query should be taking a long time. Why ??? Look at your query:


UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
Please note the columns begin mysql update join by the query in the content table:
seeders
leechers
is_updated
Which of these columns are indexed ?
seeders
leechers
What indexes do you have in the content table involving those columns ?
KEY `seeders` (`seeders`), <<<<-------------------------------- THIS ONE !!!
KEY `uploader_sub_category` (`uploader`,`sub_category`),
KEY `upload_date` (`upload_date`),
KEY `uploader_upload_date` (`uploader`,`upload_date`),
KEY `leechers` (`leechers`), <<<<------------------------------ THIS ONE !!!
KEY `size` (`size`),
KEY `uploader_seeders` (`uploader`,`seeders`), <<<<------------ THIS ONE !!!
What is going on under the hood ?

If you are changing the values for seeders and leeches, these three(3) indexes are having their leaf nodes reshuffled. Even if the majority of the values are not being changed, rows are being locked and copies of your data are being stockpiled in your undo logs (for the sake of MVCC). This results is additional disk I/O (ibdata1 should be growing)

#INNODB

The InnoDB Buffer Pool will be going through the "Perfect Storm". Why ?

Please note the InnoDB Architecture (Picture from Percona CTO Vadim Tkachenko)

Please note the Insert Buffer. Here is what the MySQL Documentation says about it:

The change buffer is a special data structure that caches changes to secondary index pages when affected pages are not in the buffer pool. The buffered changes, which may result from INSERT, UPDATE, or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.

Unlike clustered indexes, secondary indexes are usually non unique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacently located in an index tree. Merging cached changes at a later time, when affected pages are read into the buffer pool by other operations, avoids substantial random access I/O that would be required to read-in secondary index pages from disk.

Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.

Change buffer merging may take several hours when there are numerous secondary indexes to update and many affected rows. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue to occur after a transaction is committed. In fact, change buffer merging may continue to occur after a server shutdown and restart (see Section 14.21.2, “Forcing InnoDB Recovery” for more information).

In memory, the change buffer occupies part of the InnoDB buffer pool. On disk, the change buffer is part of the system tablespace, so that index changes remain buffered across database restarts.

Here is where the "Perfect Storm" comes in: By default, InnoDB storage engine reserves up to 25% of the Buffer Pool for Change Buffering. All changes to your three(3) secondary indexes have to pile up in the InnoDB Buffer Pool's Insert Buffer. When the actual index pages land in the Buffer Pool, the merge process will subsequently be pushed to disk (Note the Insert Buffer inside ibdata1), producing more disk I/O.

#SUGGESTIONS
###SUGGESTION #1
Get rid of those indexes. Why ? This will eliminate the change buffering needed for managing those indexes during your mass UPDATE.
Run the following query:
SELECT
    COUNT(1) rowcount,
    COUNT(DISTINCT seeders) seeders_count,
    COUNT(DISTINCT leechers) leechers_count
FROM content;
If seeders_count is less than 5% of rowcount, then the cardinality of seeders_count can disqualify the seeders from being of any use. Same going with leechers.
For that index uploader_seeders, run this query:
SELECT COUNT(1) uploader_seeders_count FROM
(SELECT DISTINCT uploader,seeders FROM content) A;
If uploader_seeders_count is less than 5% of rowcount (from previous query), then get rid of uploader_seeders index.
To get rid of those indexes, run this:
ALTER TABLE content DROP INDEX uploader_seeders,DROP INDEX seeders,DROP INDEX leechers;
###SUGGESTION #2
Change the Insert Buffer size to the max value, the run the query:
SET GLOBAL innodb_change_buffer_max_size = 50;
UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
SET GLOBAL innodb_change_buffer_max_size = 25;
Also, set the Buffer Pool Size (innodb_buffer_pool_size) to 20G. Change that value in my.cnf.
If you have MySQL 5.7, simply run
mysql> SET GLOBAL innodb_buffer_pool_size = 1024 * 1024 * 1024 * 20;
If you have MySQL 5.6 and prior, you must restart mysqld.
###SUGGESTION #3 (RISKY)
If you do not want to drop the indexes, you could disable change buffering during the mass UPDATE, then enable it afterwards:
Run the following:
SET GLOBAL innodb_change_buffering = 'none';
UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
FLUSH TABLES;
SET GLOBAL innodb_change_buffering = 'all';

This is risky because this speeds up changes to indexes in favour of not having buffering to recover in the event of a crash or reboot.

  ###SUGGESTION #4 (RISKY)

Another cavalier approach would be to disable the Double Write Buffer. Since a restart is required, do this:

service mysql restart --skip-innodb_doublewrite
Do your mass UPDATE
service mysql restart
This is not recommended for Production. Dev and Staging only please !!!

Your Answer

Interviews

Parent Categories