Thursday, 18 December 2014

Drop Table slow

A few days back we were given a MySQL database hosted in EC2 with around 5TB data to move some of the tables to a new server.  The intention was to share the tables between 2 EC2 instances. Since AWS had the option to take online snapshots, the plan was to take a snapshot, create a new machine with that snapshot and drop the unwanted tables.
So everything went as planned until creating a new machine with the snapshot.  The real challenge was dropping the unwanted tables. 
It took around 4 minutes to Drop a table whose size is 20GB.  It took 20 minutes to drop a 100GB table. The time kept on increasing for larger tables. MySQL even went to “defunct” when we killed the drop query and at times crashed.
To track down this issue we executed drop table in one session and checked the processlist from another session that gave the below output.

mysql> show processlist \G
*************************** 1. row ***************************
     Id: 2352
   User: dbuser
   Host: localhost
     db: db
Command: Query
   Time: 2573
  State: checking permissions
   Info: DROP TABLE `test`

I wondered if it is a MySQL user related permission or a OS related one.
Now I went on to check the “InnoDB status”. Found something interesting in the “ROW OPERATIONS”. Printing it below

0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 6117, id 47405873539392, state: doing insert buffer merge
Number of rows inserted 167872705, updated 1018100, deleted 55251906, read 19531441141
71.93 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

And Innodb status output also gave me lots of semaphore waits on data dictionary

InnoDB Change Buffering :

From MySQL reference manual “Index changes resulting from SQL statements, which could normally involve random I/O operations, are held back and performed periodically by a background thread. This sequence of operations can write the disk blocks for a series of index values more efficiently than if each value were written to disk immediately."

So here is what had really happened. The given DB server is big in data and used to get frequent inserts and didn't have enough ram to accumulate all the pages in “InnoDB buffer pool”. So it had to cache all the secondary index changes to “Insert Buffer”. These cached changes are flushed to disk only when the pages are loaded to the “buffer pool” or when the server is idle. When we took a snapshot and mounted it in a new server the database was idle and so the InnoDB main thread started merging all the changes cached in “insert buffer” to the disk. So it was the “Innodb main thread” that held the lock on those tables and the drop statement that we executed has waited for the InnoDB main thread to complete the insert buffer merge process.

So we waited for the buffer merge process to complete and then we executed the drop statements.
All those tables got dropped in seconds. Even a 700GB table got dropped in 5 to 10 seconds.
Also setting “innodb_fast_shutdown=0” and shutting down MySQL does the buffer merge operation.
But what I feel is “buffer merging by keeping the server idle or setting “innodb_fast_shutdown=0” and restarting results in same amount of time”.

Also the status “Checking permissions” that we got in the processlist output is very misleading. It is natural for everybody to thing that the drop statement is waiting for MySQL user permission or OS related permission. But actually it was waiting for permission from the InnoDB main thread.

I still wonder why we need to buffer merges for a table that is to be dropped. I need to think further.

Note : There might some some other reasons too for drop table slowness. But this is the one I have faced till date.

--Aravinth C