A database undergoes series of Schema/DDL changes during its tenure. Frequency of these changes might be less compared to normal DML workload but they are crucial and complex part of any DB workload. MySQL provides 3 inherent approaches to achieve this viz. COPY (create shadow table, copy, swap), INPLACE (online DDL that allow parallel workload while DDL is progressing), INSTANT (starting MySQL 8.x that allows instant DDL execution).
All these approaches achieve the same goal but differ in the way they allow parallel workload to execute like COPY blocks parallel workload, INPLACE/INSTANT allows parallel workload. One of the most widely used default approach for most of the command in MySQL is INPLACE. In this article we cover ‘MySQL INPLACE alter’ – looking at aspects like steps involved, locking model, intermediate objects created, etc…
NOTE: This study is based on our research on specific MySQL versions . With recent versions there might be some differences
INPLACE or ONLINE DDL was introduced in MySQL-5.6 as a better alternative to improve over the existing COPY approach. COPY approach creates shadow table and then copies over data from the old table to the new table. While the copy operation is in progress, parallel DML workload on the said table is not allowed to progress. INPLACE changed this paradigm by allowing parallel DML workload to progress while DDL is in progress.
INPLACE has multiple variants.
● INPLACE that updates only metadata. (like renaming of column).
● INPLACE that adds new object without touching existing old objects (like add index).
● INPLACE that causes table rebuild (changing existing column definition or adding new column).
Main challenge with INPLACE (or ONLINE DDL) is how to track the ongoing DML changes to the table and how to incorporate them to the main table.
Before we get into details of specific variants let’s understand the steps that ALTER execute at generic level. Once we get hold of them we would then check which of these steps are applicable to each of the variants.
Generic Steps for ALTER execution:
- Step-1: Parse ALTER statement and create the ALTER context to execute the flow.
- Step-2: Obtain the needed lock on the object to modify.
- Following locks are obtained:
- object-type: GLOBAL (assume complete db), lock-type: Intention Exclusive (denote intention to modify the object), duration: statement
- object-type: SCHEMA (db that host the table), lock-type: Intention Exclusive (denote intention to modify the object), duration: transaction
- Following locks are obtained:
- object-type: TABLE (table to modify), lock-type: Shared Upgradable (shared for now may be upgraded in the future as needed), duration: transaction
- Step-3: Setup/Populate internal object structure that would also aid in creating a new table (with altered definition) and act as in-memory representation of the same. This step also enforces validation check like say column to add is duplicate, indexed column doesn’t exist, etc…
- Step-4: Next is to create a temporary/intermediate table with new definition. Till 5.7, MySQL has 2 sub-systems for hosting same table definition. MySQL sub-system and InnoDB SE sub-system. Both of these sub-system has their own table definition/object cached in their respective Data-Dictionary. In this step, table is created in MySQL sub-system only, thereby resulting in creation of frm file but no entry is yet registered inside InnoDB SE maintained data-dictionary. This is step when MySQL claims to have a table but InnoDB doesn’t and one of the major reasons that causes DDL inconsistency. (Solved using unified Data-Dictionary in MySQL-8.x).
- Step-5: Check if operation can be executed as INPLACE. Not all operations can be executed as INPLACE. Each statement has different lock requirement especially tenure. Some statement may need exclusive lock for complete alter duration (including copying of the old data) while some may need it for a shorter time. All this get established as part of this step. If user has requested to execute alter under a specific lock mode feasibility of the same is cross-validated. User is allowed to request stronger than needed lock but not the weaker (say statement needs exclusive lock but user requested to operate with shared lock).
- Step-6: Upgrade to EXCLUSIVE lock as evaluated by the flow.
- Step-2 obtained SHARED_UPGRADABLE lock which is now upgraded to an EXCLUSIVE lock. This would block other threads/sessions from accessing the said table and there-by providing exclusive access to the alter session to make critical changes to the table structure.
- Lock notification is also propagated to SE. Ideally, it was for SE to lock the file, hinting that MySQL sub-system plans to read/write from the said table. Over period of time semantics has been tweaked to also indicate the start of the statement.
- Step-7 (Prepare Phase of ALTER table):
- Create altered table (table with new definition) at SE level. Note: This table is also referred as a temporary/intermediate table as it would eventually replace the original table. Step-4 created table in MySQL sub-system, this step would do the same at storage-engine level including adding new table object to the data-dictionary and creation of low-level ibd file for storing the table data.
- Old table may get parallel updates. Same needs to be recorded in temporary staging area. This staging area is then attached to the old table to log the needed updates.
- Next is to assign read-view. ALTER needs to copy data from the old table to the new table. In order to read consistent data ALTER needs a read snapshot at the said timestamp.
- Step-8: Downgrade back to SHARED_UPGRADABLE.
- Downgrade lock back to SHARED UPGRADABLE on completion of the prepare phase. Critical operation to change Data-Dictionary has been done now. Upcoming operations will not get blocked if they tend to perform conflicting operations as changes has been now registered in DD. (Some of the operation may demand to carry the EXCLUSIVE lock even for next phases).
- Step-9 (Copy Phase of ALTER table):
- Copy data from old table to new table. There are multiple aspects here.
- InnoDB stores data in form of indexes so practically copy operation is rebuilding all indexes including CLUSTERED INDEX.
- In InnoDB, clustered index has all columns so instead of re-building each index from its respective index it is better to scan/parse clustered index and builds new clustered index and other secondary indexes. This helps complete the operation through single read pass.
- Clustered-Index stores entries in sorted fashion. InnoDB will use this property and read the entries in sorted fashion that would aid in rebuilding clustered index using sorted build algorithm (a.k.a Bulk Btree Build). This sorted build also help achieve optimal structure of tree post re-build.
- Entries for other indexes are also derived from clustered index entries. These entries may not be in sorted fashion with respect to their keys. Once enough entries are read (to fill the temporary staging buffer limited by innodb_sort_buffer_size) these entries are sorted and written to an intermediate file (a.k.a merge-file). N such intermediate buffers are then re-merged using merge-sort to create a continuous stream of sorted data that can then be used to again created secondary indexes using sorted-build algorithm.
- Step-9a (first phase apply operation):
- Step-9 copies over data and so could be time consuming. This is time when table may see parallel DML action. Before the COPY phase ends, flow would try to fold the entries that it has captured through parallel DML into the newly created index. [Note: This is not the final fold operation as table may still get modified but since COPY is longest phase most of the parallel DML changes will be captured during this phase and will get folded in the first phase apply operation].
- Step-10 (Upgrade to EXCLUSIVE lock):
- Before starting last phase (COMMIT), lock is again upgraded to EXCLUSIVE this would ensure no DML is operational on the said table there-by allowing alter session to complete the final phase of apply operation too. (With FK involve to block CASCADE operation locks are also established on parent tables).
- Also, obtain EXCLUSIVE lock at storage-engine/InnoDB level. While MDL locks are good during recovery there could be a flow-path that would skip MDL locks and directly obtain InnoDB locks.
- Step-11 (COMMIT phase of ALTER table):
- Stop background thread from accessing the table to modify.
- Complete final/second phase of apply operations. This will fold all parallel DML changes done to the table during ALTER flow to the newly created altered table. (Note: session has exclusive lock so no more parallel DML changes are allowed).
- Rename the ibd file at storage engine level with old table name. This rename operation is done using SWAP algorithm.
- old-table-name-ibd -> temporary-table-name-ibd
- new/intermediate-table-name-ibd -> old-table-name-ibd
- Drop the old table which was renamed to temporary-table-name-ibd. (Removes the copy of old-table object from data-dictionary).
- Needed changes are also made to the system tablespace (or persistent copy of InnoDB data-dictionary).
- Re-enable stats re-creation.
- Finally rename the table in MySQL sub-system that would replace the frm files.
- Altered table is re-opened. This time it opens altered newly created table tagged under same table-name.
- Step-12 (Trailing action):
- binlog ALTER statement, invalidate query cache, release mdl locks, etc…
Let’s answer some common questions ?
DML flow detects if there is an active online DDL running on the said table. If yes, then DML actions are logged to a temporary buffer. This buffer is bounded by innodb_sort_buffer_size (default = 1M). If the buffer is full, it is written to a temporary file. Maximum size of this file is bounded by innodb_online_alter_log_max_size (default = 128M). File is created inside mysql temporary directory (defined by either TMPDIR environment variable or –tmpdir. Normally defaults to /tmp). Being a temporary file it may not be visible as it is unlinked immediately on creation maintaining only alter session reference.
This helps protect against the generic action like FLUSH TABLES WITH READ LOCK while ALTER is in progress.
All DDL commands, need to obtain needed set of the locks in given order. MDL_SHARED_UPGRADABLE lock is not compatible with other lock of the same type from different session that causes said session to wait till existing session is complete. Parallel DML often requests MDL_SHARED_WRITE which is compatible with MDL_SHARED_UPGRADABLE lock.
performance_schema.metadata_locks has then details.
+-------------+--------------------+----------------+-----------------------+---------------------+---------------+-------------+--------+-----------------+----------------+ | OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID | +-------------+--------------------+----------------+-----------------------+---------------------+---------------+-------------+--------+-----------------+----------------+ | GLOBAL | NULL | NULL | 140735743667376 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | | 27 | 17 | | SCHEMA | test | NULL | 140735743667568 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | | 27 | 17 | | TABLE | test | sbtest2 | 140735743590240 | SHARED_UPGRADABLE | TRANSACTION | GRANTED | | 27 | 17 | | GLOBAL | NULL | NULL | 140734938277440 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | | 29 | 11 | | SCHEMA | test | NULL | 140734938279232 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | | 29 | 11 | | TABLE | test | sbtest2 | 140734938277344 | SHARED_UPGRADABLE | TRANSACTION | PENDING | | 29 | 11 | | GLOBAL | NULL | NULL | 140735139619904 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | | 28 | 15 | | TABLE | test | sbtest2 | 140735139619808 | SHARED_WRITE | TRANSACTION | GRANTED | | 28 | 15 | | TABLE | performance_schema | metadata_locks | 140735005384768 | SHARED_READ | TRANSACTION | GRANTED | | 32 | 3 |
In the above example, 2 independent alter statements are fired on the same table. As you can see GLOBAL/SCHEMA level I_X locks are granted but SHARED_UPGRADABLE lock is blocked. At the same time SHARED_WRITE lock originating from parallel DML update command has been granted too.
Note: Remember to set performance-schema-instrument=’wait/lock/metadata/sql/mdl=ON’ before using metadata_locks table.
Lot depends on what stage ALTER is while DB restarts but it most cases it is easy to put the table in an inconsistent state.
In the example below we stimulate restart after the table is renamed in InnoDB but before it is renamed in MySQL sub-system. Post restart table is not accessible.
mysql> select * from sbtest1 limit 10; ERROR 1146 (42S02): Table 'test.sbtest1' doesn't exist 2019-11-27T12:58:09.327136Z 2 [Warning] InnoDB: Table test/sbtest1 contains 5 user defined columns in InnoDB, but 4 columns in MySQL. Please check INFORMATION_SCHEMA.INNODB_SYS_COLUMNS and http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html for how to resolve the issue.
Restoring from backup is only safest way out there are some short-cut that may work depending on stage of ALTER crash. Like the case above we manually copied over new altered frm file in place of old table frm file and the table was restored. Of course, options like these are more risky and may or may not work so consulting expert is advisable and ensure you take backup to avoid further ruining the stage.
Answer lies in the rename operation of MySQL. Re-check Step-11.
sbtest1.ibd (table to alter ibd file) #sql-ib43-2591602943.ibd (intermediate/temporary ibd file that represent table rebuild with new definition)
During commit, flow would rename, sbtest1.ibd -> X.ibd (another temporary name) and then would rename #sql-ib44-2591602944.ibd -> sbtest1.ibd So for the moment while the 2nd step is yet to execute and 1st step is executed user may see 2 temporary file name but not the original-table.ibd file.
ls -l *.ibd #sql-ib43-2591602943.ibd #sql-ib44-2591602944.ibd (renamed version of sbtest1.ibd).
DML would detect this and would stop appending any further data marking the condition accordingly. ALTER detects this condition and report failure to end-user. Note: DML action doesn’t fail due to this reason.
No. All DML sessions would append to the same file.
ALTER flow applies of DML entries has 2 phases: first phase is as part of copy-phase where-in most of the parallel cached DML operations are folded and trailing/pending operations (if any) are left-out for folding during COMMIT phase. Since COMMIT establishes EXCLUSIVE lock, view that commit sees for the table is final and so parallel DML can’t continue to keep adding more entries to the list till ALTER is done.
ALTER apply these DML entries in batches. It reads N entries that fits the block (again bounded by innodb_sort_buffer_size) and apply them before reading the next set of entries. Before moving to the next block ALTER take a momentary EXCLUSIVE internal clustered index level locks. This helps ALTER get a consistent view of the list that is being parallel modified by DML operations. With this batch stop-gap approach, ALTER eventually will surely catch up with the DML.
Understanding INPLACE variation
Now let’s see steps involved with each of these variants.
Variant-1: INPLACE that updates only metadata
- Under this variant only metadata is updated so there is no need to rebuild table there-by skipping copy operation. Given limited operation it almost happens instantaneously unless other conflicting lock causes ALTER to delay.
- As far as steps are concerned, again all steps are executed as listed above except the copy step. This also means, ibd file is not re-created and so the space-id of the table continue to retain as original.
example: alter table sbtest1 change column col1 col2 int not null default 1000;
Variant-2: INPLACE that adds new object without touching existing one
- Operation involved addition of new object (say index) that would not cause existing table to re-build. (Note some of the object addition like column will cause complete table rebuild).
- Again, ALTER continue to execute all the steps above but this time COPY phase is limited to addition of new object. Algorithm for COPY continue to remain the same viz. Read entries from cluster index, construct secondary index entries and cache them in temporary/intermediate merge file, sort these entries and insert them using Bulk Build or sorted build. Given main clustered index is not re-build operation is pretty quick and same table ibd (in turn space-id) can be reused.
- Given the original table is not being touched parallel DML changes continue to modify the original table and so there is no need to re-merge/apply the parallel DML operations.
example: alter table sbtest1 add index k_1(k);
Variant-3: INPLACE that causes table rebuild
- This operation involves complete table re-build and follows all the steps described above including re-build of the clustered index.
example: alter table sbtest1 add column col int not null default 1000;
Unified Data-Dictionary starting MySQL-8.x
Till 5.7, there were 2 copies of the table metadata one in MySQL (in form of frm) and other in InnoDB SE (in form of data-dictionary). As we saw above, it was easy to get a table in an inconsistent state if both copies fail to agree. MySQL-8.x introduces unified data-dictionary (getting rid of frm file) there-by ensuring single consistent copy of the table metadata and also help fix multiple inconsistency issues associated with the table. MySQL-8.x also makes DDL atomic so changes can be rolled back if the things fail during execution.
Let’s understand how this change updates execution flow of ALTER table. MySQL-8.x continue to execute all steps outlined in generic flow (like 5.7) except there are few actions that are different in MySQL-8.x. Let’s see what these changes/differences are:
● MySQL-8.x now takes following 2 additional locks as part of Step-2.
● object-type: TABLESPACE (tablespace that host the table), lock-type: Intention Exclusive (denote intention to modify the object), duration: transaction
● object-type: BACKUP LOCK (assume global scope), lock-type: Intention Exclusive (denote intention to modify the object), duration: transaction
● BACKUP LOCK (new in MySQL-8.x) is meant to take active BACKUP of a running instance. ALTER statement obtains BACKUP LOCK that avoids backup action from taking inconsistent backup since table is about to undergo a change. TABLESPACE lock are present in MySQL-5.7 too but seems like involvement of frm files didn’t help establish their context correctly. Anyway, TABLESPACE lock ensure no other tablespace level operation is allowed while alter is progress.
● Next major change is with Step-4 that creates a new altered table (in turn persist frm files). With MySQL-8.x, there are no frm file so 8.x flow will continue to create an in-memory altered table object but will persist it to a unified data-dictionary (persist is delayed if the storage engine support atomic DDL like InnoDB).
● As part of Step-7, in-memory table object is created in InnoDB SE along with entries to data-dictionary. This step continue to remain crucial even with MySQL-8.x. In-memory table object (compatible with InnoDB) is created followed by creation of an intermediate table (+ tablespace) and said entries are then persisted (note: step-4 delayed it) to unified data-dictionary. Existing entries are removed and new entries corresponding to new tablespace with new space id are added. (Querying information_schema.innodb_tablespace post this step will make new space-id visible).
● Rest of the steps of copy, apply (parallel DML operations) and commit (including rename) continue to remain the same.
● 8.x also takes EXCLUSIVE locks on the intermediate table that are created.
● Given the unified data dictionary tables are atomic (cached in a transactional storage engine) if something fails during execution of ALTER, complete transaction is rolled back there-by ensuring unified data dictionary tables are restored to their original state.
● Reboot post crash during ALTER table will restore the original state of the table and will also get rid of the intermediate file (tablespace file) from the disk.
Summing it up
ALTER table is pretty complex operation and one must plan its execution carefully considering multiple factors like time to alter, effect of parallel DML and also effect on parallel DML, blocking of conflicting operations, etc…