Bootstrapping
Maxwell allows you to "bootstrap" data into your stream. This will perform a
select * from table
and output the results into your stream, allowing you
to recreate your entire dataset by playing the stream from the start.
Using the maxwell-bootstrap utility
You can use the maxwell-bootstrap
utility to begin boostrap operations from the command-line.
option | description |
---|---|
--log_level LOG_LEVEL | log level (DEBUG, INFO, WARN or ERROR) |
--user USER | mysql username |
--password PASSWORD | mysql password |
--host HOST | mysql host |
--port PORT | mysql port |
--database DATABASE | mysql database containing the table to bootstrap |
--table TABLE | mysql table to bootstrap |
--where WHERE_CLAUSE | where clause to restrict the rows bootstrapped from the specified table |
--client_id CLIENT_ID | specify which maxwell instance should perform the bootstrap operation |
--comment COMMENT | arbitrary comment to be added to every bootstrap row record |
Starting a table bootstrap
You can start a bootstrap using:
bin/maxwell-bootstrap --database fooDB --table barTable
Optionally, you can include a where clause to replay part of the data.
bin/maxwell-bootstrap --database fooDB --table barTable --where "my_date >= '2017-01-07 00:00:00'"
Alternatively you can insert a row in the maxwell.bootstrap
table to trigger a bootstrap.
mysql> insert into maxwell.bootstrap (database_name, table_name) values ('fooDB', 'barTable');
Note that if a Maxwell client_id has been set you should specify the client id.
mysql> insert into maxwell.bootstrap (database_name, table_name, client_id) values ('fooDB', 'barTable', 'custom_maxwell_client_id');
You can schedule bootstrap tasks to be run in the future by setting the started_at column. Maxwell will wait until this time to start the bootstrap.
mysql> insert into maxwell.bootstrap (database_name, table_name, client_id, started_at) values ('fooDB', 'barTable', 'custom_maxwell_client_id', '2020-05-18 12:30:00');
Async vs Sync bootstrapping
The Maxwell replicator is single threaded; events are captured by one thread from the binlog and replicated to Kafka one message at a time.
When running Maxwell with --bootstrapper=sync
, the same thread is used to do bootstrapping, meaning that all binlog events are blocked until bootstrapping is complete.
Running Maxwell with --bootstrapper=async
however, will make Maxwell spawn a separate thread for bootstrapping.
In this async mode, non-bootstrapped tables are replicated as normal by the main thread, while the binlog events for bootstrapped tables are queued and sent to the replication stream at the end of the bootstrap process.
Bootstrapping Data Format
- a bootstrap starts with an event of
type = "bootstrap-start"
- then events with
type = "bootstrap-insert"
(one per row in the table) - then one event per
INSERT
,UPDATE
orDELETE
with standard event types i.e.type = "insert"
,type = "update"
ortype = "delete"
that occurred since the beginning of bootstrap - finally an event with
type = "bootstrap-complete"
Here's a complete example:
mysql> create table fooDB.barTable(txt varchar(255));
mysql> insert into fooDB.barTable (txt) values ("hello"), ("bootstrap!");
mysql> insert into maxwell.bootstrap (database_name, table_name) values ("fooDB", "barTable");
Corresponding replication stream output of table fooDB.barTable
:
{"database":"fooDB","table":"barTable","type":"insert","ts":1450557598,"xid":13,"data":{"txt":"hello"}}
{"database":"fooDB","table":"barTable","type":"insert","ts":1450557598,"xid":13,"data":{"txt":"bootstrap!"}}
{"database":"fooDB","table":"barTable","type":"bootstrap-start","ts":1450557744,"data":{}}
{"database":"fooDB","table":"barTable","type":"bootstrap-insert","ts":1450557744,"data":{"txt":"hello"}}
{"database":"fooDB","table":"barTable","type":"bootstrap-insert","ts":1450557744,"data":{"txt":"bootstrap!"}}
{"database":"fooDB","table":"barTable","type":"bootstrap-complete","ts":1450557744,"data":{}}
Failure Scenarios
If Maxwell crashes during bootstrapping the next time it runs it will rerun the bootstrap in its entirety - regardless of previous progress.
If this behavior is not desired, manual updates to the bootstrap
table are required.
Specifically, marking the unfinished bootstrap row as 'complete' (is_complete
= 1) or deleting the row.