# Reference
At a minimum, you will need to specify `host`, `user`, `password`, and `producer`. The kafka producer additionally requires `kafka.bootstrap.servers`; the kinesis producer requires `kinesis_stream`.
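A minimal config.properties along these lines might look like the following sketch (host, credentials, and broker addresses are placeholders):

```properties
# placeholder values; substitute your own
host=localhost
user=maxwell
password=maxwell_password
producer=kafka
kafka.bootstrap.servers=broker1:9092,broker2:9092
```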
## general

option | argument | description | default |
---|---|---|---|
config | STRING | location of config.properties file | $PWD/config.properties |
log_level | LOG_LEVEL | log level | info |
daemon | | run maxwell as a daemon | |
env_config_prefix | STRING | env vars matching prefix are treated as config values | |
## mysql
option | argument | description | default |
---|---|---|---|
host | STRING | mysql host | localhost |
user | STRING | mysql username | |
password | STRING | mysql password | (no password) |
port | INT | mysql port | 3306 |
jdbc_options | STRING | mysql jdbc connection options | DEFAULT_JDBC_OPTS |
ssl | SSL_OPT | SSL behavior for mysql cx | DISABLED |
schema_database | STRING | database to store schema and position in | maxwell |
client_id | STRING | unique text identifier for maxwell instance | maxwell |
replica_server_id | LONG | unique numeric identifier for this maxwell instance | 6379 (see notes) |
master_recovery | BOOLEAN | enable experimental master recovery code | false |
gtid_mode | BOOLEAN | enable GTID-based replication | false |
recapture_schema | BOOLEAN | recapture the latest schema. Not available in config.properties. | false |
max_schemas | LONG | how many schema deltas to keep before triggering compaction operation | unlimited |
replication_host | STRING | server to replicate from. See split server roles | schema-store host |
replication_password | STRING | password on replication server | (none) |
replication_port | INT | port on replication server | 3306 |
replication_user | STRING | user on replication server | |
replication_ssl | SSL_OPT | SSL behavior for replication cx | DISABLED |
replication_jdbc_options | STRING | mysql jdbc connection options for replication server | DEFAULT_JDBC_OPTS |
schema_host | STRING | server to capture schema from. See split server roles | schema-store host |
schema_password | STRING | password on schema-capture server | (none) |
schema_port | INT | port on schema-capture server | 3306 |
schema_user | STRING | user on schema-capture server | |
schema_ssl | SSL_OPT | SSL behavior for schema-capture server | DISABLED |
schema_jdbc_options | STRING | mysql jdbc connection options for schema server | DEFAULT_JDBC_OPTS |
## producer options
option | argument | description | default |
---|---|---|---|
producer | PRODUCER_TYPE | type of producer to use | stdout |
custom_producer.factory | CLASS_NAME | fully qualified custom producer factory class, see example | |
producer_ack_timeout | PRODUCER_ACK_TIMEOUT | time in milliseconds before async producers consider a message lost | |
producer_partition_by | PARTITION_BY | input to kafka/kinesis partition function | database |
producer_partition_columns | STRING | if partitioning by 'column', a comma separated list of columns | |
producer_partition_by_fallback | PARTITION_BY_FALLBACK | required when producer_partition_by=column. Used when the column is missing | |
ignore_producer_error | BOOLEAN | When false, Maxwell will terminate on kafka/kinesis/pubsub publish errors (aside from RecordTooLargeException). When true, errors are only logged. See also dead_letter_topic | true |
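As a sketch, partitioning by a column could be configured like this (the column name user_id is hypothetical):

```properties
producer_partition_by=column
producer_partition_columns=user_id
# used for rows that lack a user_id column
producer_partition_by_fallback=database
```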
## file producer

option | argument | description | default |
---|---|---|---|
output_file | STRING | output file for file producer | |
javascript | STRING | file containing javascript filters | |
## kafka producer

option | argument | description | default |
---|---|---|---|
kafka.bootstrap.servers | STRING | kafka brokers, given as HOST:PORT[,HOST:PORT] | |
kafka_topic | STRING | kafka topic to write to | maxwell |
dead_letter_topic | STRING | topic to write a "skeleton row" (a row whose data includes only primary key columns) to when there's an error publishing a row. When ignore_producer_error is false, only RecordTooLargeException causes a fallback record to be published, since other errors cause termination. Currently only supported by the kafka producer. | |
kafka_version | KAFKA_VERSION | run maxwell with the specified kafka producer version. Not available in config.properties. | 0.11.0.1 |
kafka_partition_hash | [ default \| murmur3 ] | hash function to use when choosing the kafka partition | default |
kafka_key_format | [ array \| hash ] | how maxwell outputs kafka keys, either a hash or an array of hashes | hash |
ddl_kafka_topic | STRING | if output_ddl is true, kafka topic to write DDL changes to | kafka_topic |
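A hedged example pulling the kafka options together (broker addresses are placeholders):

```properties
producer=kafka
kafka.bootstrap.servers=broker1:9092,broker2:9092
kafka_topic=maxwell
kafka_partition_hash=murmur3
kafka_key_format=hash
```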
## kinesis producer

option | argument | description | default |
---|---|---|---|
kinesis_stream | STRING | kinesis stream name | |
## sqs producer

option | argument | description | default |
---|---|---|---|
sqs_queue_uri | STRING | SQS Queue URI | |
## pubsub producer
option | argument | description | default |
---|---|---|---|
pubsub_topic | STRING | Google Cloud pub-sub topic | |
pubsub_platform_id | STRING | Google Cloud platform id associated with topic | |
ddl_pubsub_topic | STRING | Google Cloud pub-sub topic to send DDL events to | |
pubsub_request_bytes_threshold | LONG | number of bytes until the batch is sent | 1 |
pubsub_message_count_batch_size | LONG | number of messages until the batch is sent | 1 |
pubsub_publish_delay_threshold | LONG | time elapsed in millis until the batch is sent | 1 |
pubsub_retry_delay | LONG | Controls the delay in millis before sending the first retry message | 100 |
pubsub_retry_delay_multiplier | FLOAT | Controls the increase in retry delay per retry | 1.3 |
pubsub_max_retry_delay | LONG | Puts a limit on the value in seconds of the retry delay | 60 |
pubsub_initial_rpc_timeout | LONG | Controls the timeout in seconds for the initial RPC | 5 |
pubsub_rpc_timeout_multiplier | FLOAT | Controls the change in RPC timeout | 1.0 |
pubsub_max_rpc_timeout | LONG | Puts a limit on the value in seconds of the RPC timeout | 600 |
pubsub_total_timeout | LONG | Puts a limit on the value in seconds of the retry delay, so that the RetryDelayMultiplier can't increase the retry delay higher than this amount | 600 |
## rabbitmq producer
option | argument | description | default |
---|---|---|---|
rabbitmq_user | STRING | Username of Rabbitmq connection | guest |
rabbitmq_pass | STRING | Password of Rabbitmq connection | guest |
rabbitmq_host | STRING | Host of Rabbitmq machine | |
rabbitmq_port | INT | Port of Rabbitmq machine | |
rabbitmq_virtual_host | STRING | Virtual Host of Rabbitmq | |
rabbitmq_exchange | STRING | Name of exchange for rabbitmq publisher | |
rabbitmq_exchange_type | STRING | Exchange type for rabbitmq | |
rabbitmq_exchange_durable | BOOLEAN | Exchange durability. | false |
rabbitmq_exchange_autodelete | BOOLEAN | If set, the exchange is deleted when all queues have finished using it. | false |
rabbitmq_routing_key_template | STRING | A string template for the routing key; %db% and %table% will be substituted. | %db%.%table% |
rabbitmq_message_persistent | BOOLEAN | Enable message persistence. | false |
rabbitmq_declare_exchange | BOOLEAN | Should declare the exchange for rabbitmq publisher | true |
## redis producer
option | argument | description | default |
---|---|---|---|
redis_host | STRING | Host of Redis server | localhost |
redis_port | INT | Port of Redis server | 6379 |
redis_auth | STRING | Authentication key for a password-protected Redis server | |
redis_database | INT | Database of Redis server | 0 |
redis_type | [ pubsub \| xadd \| lpush \| rpush ] | Selects either Redis Pub/Sub, Stream, or List. | pubsub |
redis_key | STRING | Redis channel/key for Pub/Sub, XADD or LPUSH/RPUSH | maxwell |
redis_stream_json_key | STRING | Redis XADD Stream Message Field Name | message |
redis_sentinels | STRING | Redis sentinels list in format host1:port1,host2:port2,host3:port3... Must only be used with redis_sentinel_master_name | |
redis_sentinel_master_name | STRING | Redis sentinel master name. Must only be used with redis_sentinels | |
## formatting
option | argument | description | default |
---|---|---|---|
output_binlog_position | BOOLEAN | records include binlog position | false |
output_gtid_position | BOOLEAN | records include gtid position, if available | false |
output_commit_info | BOOLEAN | records include commit and xid | true |
output_xoffset | BOOLEAN | records include virtual tx-row offset | false |
output_nulls | BOOLEAN | records include fields with NULL values | true |
output_server_id | BOOLEAN | records include server_id | false |
output_thread_id | BOOLEAN | records include thread_id | false |
output_schema_id | BOOLEAN | records include schema_id, schema_id is the id of the latest schema tracked by maxwell and doesn't relate to any mysql tracked value | false |
output_row_query | BOOLEAN | records include INSERT/UPDATE/DELETE statement. Mysql option "binlog_rows_query_log_events" must be enabled | false |
output_primary_keys | BOOLEAN | DML records include list of values that make up a row's primary key | false |
output_primary_key_columns | BOOLEAN | DML records include list of columns that make up a row's primary key | false |
output_ddl | BOOLEAN | output DDL (table-alter, table-create, etc) events | false |
output_null_zerodates | BOOLEAN | should we transform '0000-00-00' to null? | false |
output_naming_strategy | STRING | naming strategy for JSON field names. Can be underscore_to_camelcase | none |
## filtering

option | argument | description | default |
---|---|---|---|
filter | STRING | filter rules, e.g. exclude: db.*, include: *.tbl, include: *./bar(bar)?/, exclude: foo.bar.col=val | |
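For example, a rule set that drops everything in db foo except foo.users might look like this (database and table names are illustrative):

```properties
filter=exclude: foo.*, include: foo.users
```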
## encryption

option | argument | description | default |
---|---|---|---|
encrypt | [ none \| data \| all ] | encrypt mode: none = no encryption, data = encrypt the data field only, all = encrypt the entire maxwell message | none |
secret_key | string | specify the encryption key to be used | null |
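A sketch of encrypting only the data field (the key shown is a placeholder, not a real secret):

```properties
encrypt=data
# placeholder; use your own secret
secret_key=replace_with_your_own_key
```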
## high availability
option | argument | description | default |
---|---|---|---|
ha | | enable maxwell client HA | |
jgroups_config | string | location of xml configuration file for jGroups | $PWD/raft.xml |
raft_member_id | string | uniquely identify this node within the jgroups-raft cluster | |
## monitoring / metrics
option | argument | description | default |
---|---|---|---|
metrics_prefix | STRING | the prefix maxwell will apply to all metrics | MaxwellMetrics |
metrics_type | [ slf4j \| jmx \| http \| datadog ] | how maxwell metrics will be reported | |
metrics_jvm | BOOLEAN | enable jvm metrics: memory usage, GC stats, etc. | false |
metrics_slf4j_interval | SECONDS | the frequency metrics are emitted to the log, in seconds, when slf4j reporting is configured | 60 |
http_port | INT | the port the server will bind to when http reporting is configured | 8080 |
http_path_prefix | STRING | http path prefix for the server | / |
http_bind_address | STRING | the address the server will bind to when http reporting is configured | all addresses |
http_diagnostic | BOOLEAN | enable http diagnostic endpoint | false |
http_diagnostic_timeout | MILLISECONDS | the http diagnostic response timeout | 10000 |
metrics_datadog_type | [ udp \| http ] | when metrics_type includes datadog, how metrics will be reported | udp |
metrics_datadog_tags | STRING | datadog tags that should be supplied, e.g. tag1:value1,tag2:value2 | |
metrics_age_slo | INT | latency service level objective threshold in seconds (optional). When set, a message.publish.age.slo_violation metric is emitted to Datadog if latency exceeds the threshold | |
metrics_datadog_interval | INT | the frequency metrics are pushed to datadog, in seconds | 60 |
metrics_datadog_apikey | STRING | the datadog api key to use when metrics_datadog_type = http | |
metrics_datadog_site | STRING | the site to publish metrics to when metrics_datadog_type = http | us |
metrics_datadog_host | STRING | the host to publish metrics to when metrics_datadog_type = udp | localhost |
metrics_datadog_port | INT | the port to publish metrics to when metrics_datadog_type = udp | 8125 |
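For instance, exposing metrics over http might be configured as follows (the port and path are arbitrary examples):

```properties
metrics_type=http
metrics_jvm=true
http_port=8081
http_path_prefix=/maxwell
```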
## misc
option | argument | description | default |
---|---|---|---|
bootstrapper | [ async \| sync \| none ] | bootstrapper type. See bootstrapping docs. | async |
init_position | FILE:POSITION[:HEARTBEAT] | ignore the information in maxwell.positions and start at the given binlog position. Not available in config.properties. | |
replay | BOOLEAN | enable maxwell's read-only "replay" mode: don't store a binlog position or schema changes. Not available in config.properties. | |
buffer_memory_usage | FLOAT | how much of the JVM max memory the Maxwell event buffer will use. Size of the buffer is: buffer_memory_usage * -Xmx | 0.25 |
- LOG_LEVEL: [ debug | info | warn | error ]
- SSL_OPT: [ DISABLED | PREFERRED | REQUIRED | VERIFY_CA | VERIFY_IDENTITY ]
- PRODUCER_TYPE: [ stdout | file | kafka | kinesis | pubsub | sqs | rabbitmq | redis ]
- DEFAULT_JDBC_OPTS: zeroDateTimeBehavior=convertToNull&connectTimeout=5000
- PARTITION_BY: [ database | table | primary_key | transaction_id | column | random ]
- PARTITION_BY_FALLBACK: [ database | table | primary_key | transaction_id ]
- KAFKA_VERSION: [ 0.8.2.2 | 0.9.0.1 | 0.10.0.1 | 0.10.2.1 | 0.11.0.1 ]
- PRODUCER_ACK_TIMEOUT: in certain failure modes, async producers (kafka, kinesis, pubsub, sqs) may silently drop a message, never notifying maxwell of success or failure. This timeout can be set as a heuristic: after this many milliseconds, maxwell will consider an outstanding message lost and fail it.
# Configuration methods
Maxwell is configurable via the command-line, a properties file, or the environment. The configuration priority is:
command line options > scoped env vars > properties file > default values
## config.properties

Maxwell can be configured via a Java properties file, specified via `--config` or named `config.properties` in the current working directory. Any command line options (except `init_position`, `replay`, `kafka_version` and `daemon`) may be specified as "key=value" pairs.
## via environment

If `env_config_prefix` is given via the command line or in `config.properties`, Maxwell will configure itself with all environment variables that match the prefix. The environment variable names are case-insensitive. For example, if maxwell is started with `--env_config_prefix=FOO_` and the environment contains `FOO_USER=auser`, this would be equivalent to passing `--user=auser`.
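The prefix matching can be sketched as follows; this is an illustration of the documented behavior, not Maxwell's actual implementation (which is Java), and it assumes matched keys are lowercased after the prefix is stripped:

```python
def env_config(prefix, environ):
    """Collect config values from environment variables whose names match
    the given prefix, case-insensitively. The prefix is stripped and the
    remainder lowercased to form the config key."""
    prefix = prefix.lower()
    return {
        name.lower()[len(prefix):]: value
        for name, value in environ.items()
        if name.lower().startswith(prefix)
    }

# --env_config_prefix=FOO_ with FOO_USER=auser acts like --user=auser:
print(env_config("FOO_", {"FOO_USER": "auser", "PATH": "/usr/bin"}))
# → {'user': 'auser'}
```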
# Deployment scenarios
At a minimum, Maxwell needs row-level replication turned on in order to operate:
```
[mysqld]
server_id=1
log-bin=master
binlog_format=row
```
## GTID support

As of 1.8.0, Maxwell contains support for GTID-based replication. Enable it with the `--gtid_mode` configuration param.

Here's how you might configure your mysql server for GTID mode:
```
$ vi my.cnf

[mysqld]
server_id=1
log-bin=master
binlog_format=row
gtid-mode=ON
log-slave-updates=ON
enforce-gtid-consistency=true
```
When in GTID-mode, Maxwell will transparently pick up a new replication position after a master change. Note that you will still have to re-point maxwell to the new master.
GTID support in Maxwell is considered beta-quality at the moment; notably, Maxwell is unable to transparently upgrade from a traditional-replication scenario to a GTID-replication scenario; currently, when you enable gtid mode Maxwell will recapture the schema and GTID-position from "wherever the master is at".
## RDS configuration
To run Maxwell against RDS (either Aurora or Mysql), you will need to do the following:
- set binlog_format to "ROW". Do this in the "parameter groups" section. For a Mysql-RDS instance this parameter will be in a "DB Parameter Group", for Aurora it will be in a "DB Cluster Parameter Group".
- setup RDS binlog retention as described here. The tl;dr is to execute `call mysql.rds_set_configuration('binlog retention hours', 24)` on the server.
## Split server roles

Maxwell uses MySQL for 3 different functions:

- A host to store the captured schema in (`--host`).
- A host to replicate from (`--replication_host`).
- A host to capture the schema from (`--schema_host`).
Often, all three hosts are the same. `host` and `replication_host` should be different if maxwell is chained off a replica. `schema_host` should only be used when using the maxscale replication proxy.
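A split-role setup chained off a replica might look like this (hostnames and credentials are placeholders):

```properties
# where maxwell stores its schema and position (the maxwell database)
host=mysql-primary.example.com
user=maxwell
password=maxwell_password
# binlog events are read from a replica instead
replication_host=mysql-replica.example.com
replication_user=maxwell
replication_password=maxwell_password
```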
## Multiple Maxwell Instances

Maxwell can operate with multiple instances running against a single master, in different configurations. This can be useful if you wish to have producers running in different configurations, for example producing different groups of tables to different topics. Each instance of Maxwell must be configured with a unique `client_id`, in order to store unique binlog positions.
With MySQL 5.5 and below, each replicator (be it mysql, maxwell, whatever) must also be configured with a unique `replica_server_id`. This is a 32-bit integer that corresponds to mysql's `server_id` parameter. The value you configure should be unique across all mysql and maxwell instances.
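As an illustration, two instances sharing one master could use configs like these (the ids are arbitrary examples; each block belongs in its own config.properties):

```properties
# instance A
client_id=maxwell_orders
replica_server_id=6379

# instance B, in a separate config file
client_id=maxwell_users
replica_server_id=6380
```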