Uncanny Recursions that happen every now and then!

Apache Kafka Producer Benchmarks - Java vs. Jython vs. Python

v1.0

Note: This post is open to suggestions that can help achieve fairer results with these benchmarks. It is a versioned post and I will increment the version if anything related to these benchmarks changes. Changes can be tracked here as well.

In my previous post, I wrote about how we can interface Jython with Kafka 0.8.x and use Java consumer clients directly with Python code. As a follow-up to that, I got curious about the performance difference between Java, Jython and Python clients. In this post, I am publishing some of the benchmarks that I have been running with Java, Jython and Python producers.

Acknowledgement: I was further inspired by Jay Kreps’s amazing blog post on Kafka 0.8.1 benchmarks, and a lot of the context around my own benchmarks has been borrowed from his post. It would be a good idea to read his post first before going through the rest of this one.

All of these benchmarks were run on AWS EC2 instances with the following specifications:

  • Three m4.large shared tenancy Kafka broker servers as a cluster with 128 GiB EBS Magnetic storage each.
  • One m4.xlarge shared tenancy Zookeeper server for the whole Kafka cluster with 128 GiB EBS Magnetic storage.

I am not aware of any reason why a Zookeeper cluster would perform significantly better than a single Zookeeper server for this particular configuration, so I went with a single server instead. For these benchmarks, the Zookeeper server was also used to run the producers for each language choice.

I kept the Kafka and Zookeeper server configurations mostly at the defaults defined in server.properties and zookeeper.properties respectively. In particular, none of the server configurations were tweaked for these benchmarks.

You can see the server configurations here.

For each of these benchmarks, message size was kept small at 100 bytes for exactly the same reason as cited by Jay Kreps.

Following are the build versions for the tools used for running these benchmarks:

  • Kafka: 0.9.0.1
  • Java: Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
  • Jython: 2.7.0
  • Python: 2.7.6
  • kafka-python: 1.0.2
  • Python futures module: 3.0.5

I divided the benchmarks into three different cases, with each case recording results for all the runs. Before going into the results, let’s be clear about the following:

  • Case: Case is the configuration for running the benchmarks. For example, producing on a topic with no replication is a case.
  • Run: The choice of language for running a benchmark is a run. For example, a Java producer benchmark is a run.
  • Pure Java: Java code running the official Apache Kafka producer API. See implementation.
  • Java interfaced Jython: Jython code is only used to call the producer code. The actual implementation is still in Java. See implementation.
  • Pure Jython: Where possible, everything is written in Python using Python standard libraries, except the actual internal producer implementation, which is imported from the official Kafka libraries. See implementation.
  • Pure Python: Pure Python code that uses the kafka-python client for Kafka (a minimal sketch follows this list). See implementation.
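
For reference, here is a minimal sketch of what a Pure Python run boils down to with kafka-python (the topic name and bootstrap servers are illustrative placeholders, not the exact benchmark code):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
payload = b'x' * 100  # 100-byte messages, matching the benchmark message size

for _ in xrange(50000000):
    producer.send('benchmark-topic', payload)  # returns a future; records are batched and sent asynchronously

producer.flush()  # block until all buffered records have been sent
producer.close()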

These benchmarks were run for the following cases:

  1. Single producer (single thread) producing on a topic with no replication and 6 partitions.
  2. Single producer (single thread) producing asynchronously on a topic with a replication factor of 3 and 6 partitions.
  3. Single producer (single thread) producing synchronously on a topic with a replication factor of 3 and 6 partitions.

In this context, a producer running in async mode means that the number of acknowledgements the producer requires the leader to have received from its followers before considering a request complete is 0. Sync mode is where the leader waits for all the followers to acknowledge before acknowledging the request back to the producer.
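
In producer-configuration terms, and using kafka-python for illustration, the difference comes down to the acks setting (a hedged sketch; the actual benchmark runs configure more than just this):

from kafka import KafkaProducer

# async mode: fire-and-forget, the producer does not wait for any acknowledgement (acks=0)
async_producer = KafkaProducer(bootstrap_servers='localhost:9092', acks=0)

# sync mode: the leader waits for the full set of followers to acknowledge (acks='all')
sync_producer = KafkaProducer(bootstrap_servers='localhost:9092', acks='all')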

Single producer producing on a topic with no replication and 6 partitions.

Results are tabulated below for this case. I was shocked to see such a stark difference between kafka-python and the official Java producer client. And as expected, the Java and Java interfaced Jython producers showed very similar results.

Results:

| Run | No. of records sent | records/sec | MB/sec | avg latency (ms) | max latency (ms) | 50th (ms) | 95th (ms) | 99th (ms) | 99.9th (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pure Java | 50000000 | 747417.671943 | 71.28 | 429.34 | 2828.00 | 265 | 1251 | 1817 | 2497 |
| Java interfaced Jython | 50000000 | 753046.071359 | 71.82 | 310.54 | 2189.00 | 1 | 1416 | 1798 | 1798 |
| Pure Jython | 50000000 | 117115.217951 | 11.1689775421 | 53.95983326 | 4088.0 | 2 | 328 | 1108 | 3262 |
| Pure Python | 50000000 | 9707.5761892 | 0.92578660862 | 25.82151426 | 1334.0 | 26 | 44 | 54 | 226 |

Single producer producing asynchronously on a topic with a replication factor of 3 and 6 partitions.

Not surprisingly, the Java and Java interfaced Jython producers fared similarly here as well. It is also interesting to note that the results for the Pure Jython and Pure Python producers are not much different from their respective previous runs. My assumption is that in this case, they are mostly limited by the speed of code execution rather than by producing to a topic with 3x replication. What surprises me is the fact that the Java producer in this case produced half as many records/sec compared to the run in the previous case, even though it was run in async mode. This is in contrast with the results for Kafka 0.8.1 in Jay Kreps’s post.

Results:

| Run | No. of records sent | records/sec | MB/sec | avg latency (ms) | max latency (ms) | 50th (ms) | 95th (ms) | 99th (ms) | 99.9th (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pure Java | 50000000 | 427051.126561 | 40.73 | 1160.60 | 7428.00 | 459 | 4282 | 6288 | 7266 |
| Java interfaced Jython | 50000000 | 446791.589595 | 42.61 | 1032.90 | 5925.00 | 219 | 4602 | 5595 | 5852 |
| Pure Jython | 50000000 | 118026.409589 | 11.2558755483 | 231.39137682 | 5179.0 | 6 | 982 | 1983 | 4590 |
| Pure Python | 50000000 | 9845.42251444 | 0.938932658619 | 26.58918002 | 1138.0 | 25 | 44 | 59 | 539 |

Single producer producing synchronously on a topic with a replication factor of 3 and 6 partitions.

For this case, I increased batch.size from 8196 to 64000 for each run to accommodate sync mode.
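
In kafka-python terms, batch.size corresponds to the batch_size constructor argument; a hedged sketch of the adjusted configuration (other settings omitted):

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    acks='all',        # sync mode for this case
    batch_size=64000,  # raised from 8196, as described above
)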

Again, the Pure Python and Pure Jython producers showed similar throughput, although max latencies are slightly higher than in the previous run. It does seem that throughput in this case is mostly limited by code execution or by the kafka-python producer client implementation. Throughput for Pure Java and Java interfaced Jython has almost halved compared to the previous case, which makes sense as we ran these producers in sync mode.

Results:

| Run | No. of records sent | records/sec | MB/sec | avg latency (ms) | max latency (ms) | 50th (ms) | 95th (ms) | 99th (ms) | 99.9th (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pure Java | 50000000 | 205475.511429 | 19.60 | 2502.79 | 11165.00 | 1769 | 6741 | 9492 | 10640 |
| Java interfaced Jython | 50000000 | 192588.426976 | 18.37 | 2659.39 | 19267.00 | 751 | 9236 | 14518 | 18843 |
| Pure Jython | 50000000 | 110700.535126 | 10.5572257162 | 1220.20796466 | 7115.0 | 861 | 4770 | 6052 | 6747 |
| Pure Python | 49999231 | 9844.45746764 | 0.938840624584 | 39.2650510365 | 2359.0 | 28 | 57 | 452 | 1108 |

You can find everything related to these benchmarks here. I would love to hear some feedback on this while I am working on publishing results for consumer benchmarks in my upcoming post.


Interfacing Jython with Kafka 0.8.x

Note: A complementing repository for this post can be found at: https://github.com/mrafayaleem/kafka-jython

With the release of Kafka 0.9.0, the consumer API was redesigned to remove the dependency between the consumer and Zookeeper. Prior to the 0.9.0 release, the Kafka consumer was dependent on Zookeeper for storing its offsets, and the complex rebalancing logic was built right into the “high level” consumer. This was causing a couple of issues which are discussed here. Due to the issues involving the complex rebalancing algorithm within the consumer, some Kafka clients had no support for “coordinated consumption”. Essentially, it means that without “coordinated consumption”, you cannot have N consumers within the same consumer group consuming from the same topic, which defeats one of the purposes of Kafka: to balance load across multiple consumers. I ran into this problem while using the kafka-python 0.9.2 client library with Kafka 0.8.x. While I was trying to set up multiple consumers within the same consumer group, I noticed that every message on that topic would be delivered to each and every consumer within that same group, defeating the purpose of using Kafka as a traditional queuing system with a pool of consumers. The issue has been discussed on Github.

One of the solutions I managed to figure out without moving away from Python was to use Jython for writing consumers for Kafka 0.8.x. The other solution was to use PyKafka for Kafka 0.8.x, as it supports coordinated consumption.

In this post, I discuss what you can do with Jython to resolve the Kafka 0.8.x issue and why it actually works. In a more traditional sense, this post may also work as a tutorial for interfacing Jython with Java.

Setup

You need to have the following installed on your system. The tutorial assumes that you are familiar with these tools.

  • Java SDK
  • Python
  • virtualenv and virtualenvwrapper
  • Kafka 0.9.x or Kafka 0.8.x (assuming you know how to set up and run Kafka)

I have set up a bare-bones repository for working with this tutorial here. To keep things at a minimum, the Kafka 0.9.1 binaries and the Jython 2.7.0 installer are included within this repo. This means that you can run Kafka and Zookeeper directly after cloning it.

Setting up a virtualenv for Jython

Jython is fully compatible with virtualenv and tools such as pip, and setting up a virtualenv with Jython as the interpreter is pretty straightforward.

Jython can be installed using the GUI or console. For GUI, execute the jar and follow the steps. For installing it via console, you can use the following command to start with:

java -jar jython_installer-2.7.0.jar --console

Make a note of the location where you have installed Jython.

cd into the directory where you cloned the repo and create a virtualenv using Jython as your interpreter with the following:

mkvirtualenv -p /jython-installation-path/jython2.7.0/bin/jython -a kafka-jython kafka-jython

You should be in the repo directory right now with your virtualenv already activated.

The project layout is as follows:

.
├── bin
│   └── windows
├── build
├── config
├── examples
│   └── src
│       └── main
│           ├── java
│           │   └── kafkajython
│           └── python
│               └── consumers
├── libs
└── requirements
  • bin: Contains helper scripts from Kafka and other binaries.
  • build: This is where your compiled files would go.
  • config: Various kafka configs.
  • examples: Jython and Java code for this tutorial.
  • libs: Kafka jars which we will use as dependencies.
  • requirements: Python library dependencies.

Installing Python dependencies

Once in the repo directory, install all Python dependencies using:

pip install -r requirements/development.txt

Compiling source code

Since one of our examples depends on calling a Java class directly from Jython, we need to compile it first using:

javac -cp ".:/your-directory/kafka-jython/libs/*" -d build examples/src/main/java/kafkajython/Consumer*

We tell the Java compiler to include all the dependencies in the libs directory while compiling and to put the compiled files in the build directory.

A bit about interfacing Jython with Java

Essentially, there are two ways that you can write consumers for this case.

  1. Write everything in Java and call it directly from Jython.
  2. Write everything in Jython by importing from Java standard library and Kafka directly in your source code.

We will cover both in a while.

1 - Write everything in Java and call it directly from Jython

Let’s look at some of the code for the consumers. This “high level” consumer example has been borrowed directly from here, so do check it out for a more elaborate explanation.

Note: For backwards compatibility, Kafka 0.9.1 still supports the high level consumer API.

The ConsumerTest class is a runnable that consumes messages from a Kafka stream and waits (blocks) for new messages. ConsumerGroupExample is the entry point where we specify the number of concurrent consumers (within the same consumer group) to use when consuming a topic.

The main method of ConsumerGroupExample accepts an array of strings. Later on, you will see that we actually pass this array of strings from Jython.

ConsumerGroupExample.java

package kafkajython;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConsumerGroupExample {
    private final ConsumerConnector consumer;
    private final String topic;
    private  ExecutorService executor;

    public ConsumerGroupExample(String a_zookeeper, String a_groupId, String a_topic) {
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                createConsumerConfig(a_zookeeper, a_groupId));
        this.topic = a_topic;
    }

    public void shutdown() {
        if (consumer != null) consumer.shutdown();
        if (executor != null) executor.shutdown();
        try {
            if (!executor.awaitTermination(5000, TimeUnit.MILLISECONDS)) {
                System.out.println("Timed out waiting for consumer threads to shut down, exiting uncleanly");
            }
        } catch (InterruptedException e) {
            System.out.println("Interrupted during shutdown, exiting uncleanly");
        }
    }

    public void run(int a_numThreads) {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(a_numThreads));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        // now launch all the threads
        //
        executor = Executors.newFixedThreadPool(a_numThreads);

        // now create an object to consume the messages
        //
        int threadNumber = 0;
        for (final KafkaStream stream : streams) {
            executor.submit(new ConsumerTest(stream, threadNumber));
            threadNumber++;
        }
    }

    private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", a_zookeeper);
        props.put("group.id", a_groupId);
        props.put("zookeeper.session.timeout.ms", "400");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");

        return new ConsumerConfig(props);
    }

    public static void main(String[] args) {
        String zooKeeper = args[0];
        String groupId = args[1];
        String topic = args[2];
        int threads = Integer.parseInt(args[3]);

        ConsumerGroupExample example = new ConsumerGroupExample(zooKeeper, groupId, topic);
        example.run(threads);

        try {
            Thread.sleep(90000);
        } catch (InterruptedException ie) {

        }
        example.shutdown();
    }
}

ConsumerTest.java

package kafkajython;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

public class ConsumerTest implements Runnable {
    private KafkaStream m_stream;
    private int m_threadNumber;

    public ConsumerTest(KafkaStream a_stream, int a_threadNumber) {
        m_threadNumber = a_threadNumber;
        m_stream = a_stream;
    }

    public void run() {
        ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
        while (it.hasNext())
            System.out.println("Thread " + m_threadNumber + ": " + new String(it.next().message()));
        System.out.println("Shutting down Thread: " + m_threadNumber);
    }
}

Execute this “high level” Java example using Jython:

jython -J-cp "/your-directory/Projects/kafka-jython/libs/*:/your-directory/Projects/kafka-jython/build:." examples/src/main/python/java_interfaced_jython_consumer.py

Note that we need to tell Jython about both dependencies: the Kafka jars and the .class files that we compiled earlier in this tutorial.

Now, when you produce messages on Kafka, you should see them printing in your shell.

Let’s go through the Jython code in java_interfaced_jython_consumer.py now.

java_interfaced_jython_consumer.py

from kafkajython import ConsumerGroupExample


def run():
    # List of arguments for initializing the consumer
    args = [
        "localhost:2181",
        "test-group",
        "another-replicated-topic",
        "3"
    ]
    ConsumerGroupExample.main(args)

if __name__ == '__main__':
    run()

Notice how the Java classes that we compiled earlier can be directly imported here. Since those Java classes were packaged in the kafkajython namespace, it is necessary to do the import from that package.

In the run function here, we pass a list of strings to the main method of ConsumerGroupExample that is written in Java.

Let’s see how we can write this example purely in Jython.

2 - Write everything in Jython by importing from Java standard library and Kafka directly in your source code

Following is the code for “high level” consumer written entirely in Jython:

group.py

from concurrent.futures import ThreadPoolExecutor

from java.util import Properties
from java.util import HashMap
from java.lang import String

from kafka.consumer import ConsumerConfig
from kafka.consumer import Consumer


class HighLevelConsumer(object):

    def __init__(self, zookeeper, group_id, topic, thread_count=1, callback=lambda x, y: (x, y)):
        self.consumer = Consumer.createJavaConsumerConnector(
            self._create_consumer_config(zookeeper, group_id)
        )
        self.topic = topic
        self.thread_count = thread_count
        self.callback = callback

    def consume(self):
        topic_count_map = HashMap()
        topic_count_map.put(self.topic, self.thread_count)
        consumer_map = self.consumer.createMessageStreams(topic_count_map)
        streams = consumer_map.get(self.topic)

        with ThreadPoolExecutor(max_workers=self.thread_count) as executor:
            futures = []
            for i, stream in enumerate(streams):
                futures.append(executor.submit(self._decorate(self.callback, i, stream)))

            for future in futures:
                future.result()

    @staticmethod
    def _decorate(callback, thread, stream):
        def decorated():
            it = stream.iterator()
            while it.hasNext():
                callback(thread, String(it.next().message()))

        return decorated

    @staticmethod
    def _create_consumer_config(zookeeper, group_id):
        props = Properties()
        props.put("zookeeper.connect", zookeeper)
        props.put("group.id", group_id)
        props.put("zookeeper.session.timeout.ms", "400")
        props.put("zookeeper.sync.time.ms", "200")
        props.put("auto.commit.interval.ms", "1000")

        return ConsumerConfig(props)

You can see how trivial it is to use Java libraries and other Java-specific language constructs in Jython. Moreover, the code doesn’t look any different from traditional Python. However, notice the explicit Java String import: from java.lang import String. In Java, you never need to do this because java.lang is auto-imported. This is not the case with Jython.
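
For example, this trivial Jython snippet (illustrative only) uses java.lang classes directly once they are imported:

from java.lang import String, System

s = String('hello from Jython')      # an actual java.lang.String instance
System.out.println(s.toUpperCase())  # prints HELLO FROM JYTHON via Java's stdout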

Following is the entry point for the Jython consumer.

pure_jython_consumer.py

from consumers.group import HighLevelConsumer


def process_message(thread, message):
    print str(thread) + ': ' + str(message)


def run():
    consumer = HighLevelConsumer(
            'localhost:2181', 'unknown', 'another-replicated-topic', 3, callback=process_message)
    consumer.consume()

if __name__ == '__main__':
    run()

The Jython implementation is more or less the same as the one written in Java. One difference is that I have used a backport of the Python 3.2 futures package to create the thread pool instead of using the concurrency utilities from Java. The other difference is that the Jython consumer allows you to pass functions as callbacks so you can manipulate the incoming messages.

Let’s run this using:

jython -J-cp "/Users/rafay/Projects/kafka-jython/libs/*" examples/src/main/python/pure_jython_consumer.py

You should notice incoming messages from the Kafka producer.

Important note: One very important point in the context of this post is that Jython threads are always mapped to Java threads. Jython actually lacks the global interpreter lock (GIL), which is an implementation detail of CPython. This means that Jython can actually give you better performance on multi-threaded compute-intensive tasks written in Python. You can read more about it here.
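
As a minimal sketch of what this buys you, CPU-bound work submitted to a thread pool can run on multiple cores under Jython, whereas CPython would serialize it on the GIL (the workload here is purely illustrative):

from concurrent.futures import ThreadPoolExecutor

def crunch(n):
    # pure-Python, CPU-bound work
    total = 0
    for i in xrange(n):
        total += i * i
    return total

with ThreadPoolExecutor(max_workers=4) as executor:
    # on Jython, these four tasks can execute in parallel on real Java threads
    results = list(executor.map(crunch, [10 ** 6] * 4))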

Conclusion

I think using Jython for coordinated consumption with Kafka 0.8.x is a good idea when:

  • You cannot move away from Python because of library dependencies but you still want the coordinated consumers.
  • Your Kafka infrastructure cannot migrate to Kafka 0.9.x (which is a requirement if you want to use new Kafka consumer clients) and you still want coordinated consumers written in Python.

In the long run, it would be better to just update your Kafka infrastructure to 0.9.x. You would definitely get better support and more features, such as the fact that the latest version of kafka-python implements the new Kafka consumer client, which supports coordinated consumers.
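
With the new consumer API, coordinated consumption in pure Python boils down to giving each consumer the same group_id. A hedged example against a recent kafka-python (the topic and group names are placeholders):

from kafka import KafkaConsumer

# Start N processes with this same group_id and the topic's partitions
# are balanced across them by the group coordinator.
consumer = KafkaConsumer(
    'another-replicated-topic',
    group_id='my-consumer-group',
    bootstrap_servers='localhost:9092',
)

for message in consumer:
    print '%d: %s' % (message.partition, message.value)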

To conclude, Jython has worked well for this problem. However, I am not aware of how well it would perform in a large-scale production environment with several consumers, consumer groups, topics and partitions.

Note: This post might contain some edits which can be tracked here.


Live MySQL Schema Changes on Amazon RDS with Percona - A Walkthrough

At dubizzle, we recently hit a roadblock on a project where we had to change the schema of around 130 tables in our production database with master/slave replication. The simplest possible solution would have been to recreate the same tables with the new schemas, copy the data over and update our application to point to the new ones. This couldn’t have been done without significant downtime for the whole website, which wasn’t acceptable. So we leveraged Percona Toolkit for MySQL to do the job, and we would like to share our learnings in this post.

What is Percona Toolkit for MySQL?

Percona Toolkit is a collection of open source command-line tools for performing a variety of MySQL database tasks. Consider again the problem we had in question and how we could have solved it manually:

  • Recreate around 130 tables with updated schema.
  • Take the website down or put it in read only mode to ensure zero data discrepancy between old and new tables.
  • Copy everything from old tables to the new ones.
  • Rebuild foreign keys.
  • Rename new and old tables or update the app to point to the new ones.
  • Redeploy the website.

The other solution was to set up a complex set of triggers and copy/read/write operations and have a rollback plan. All of this sounds very risky to manage on your own without expecting any downtime, and this is where Percona Toolkit fits in to do the job for you.

Percona Toolkit has a bunch of tools for various tasks. The one we are going to discuss here in this post is pt-online-schema-change.

Preparing for the big day

Simply testing the tool on your local machine before doing it on production is no plan at all. You have to have a close replica of production at your disposal to test how the change would perform. For this reason, we experimented with pt-online-schema-change on a production-like setup which we call the staging environment. Making sure that each and every parameter/configuration on the staging database is exactly the same as production is a very crucial part of testing how the migration would perform in production.

A note about AWS RDS MySQL instances: AWS RDS doesn’t give you SUPER privileges on your MySQL instances, so you will have to play around with and tweak your DB Parameter Group. The most important parameter in the context of this post is log_bin_trust_function_creators. The default value for this in most cases is 0 (disabled).

Quoting from the MySQL docs here: This variable applies when binary logging is enabled. It controls whether stored function creators can be trusted not to create stored functions that will cause unsafe events to be written to the binary log. If set to 0 (the default), users are not permitted to create or alter stored functions unless they have the SUPER privilege in addition to the CREATE ROUTINE or ALTER ROUTINE privilege. A setting of 0 also enforces the restriction that a function must be declared with the DETERMINISTIC characteristic, or with the READS SQL DATA or NO SQL characteristic. If the variable is set to 1, MySQL does not enforce these restrictions on stored function creation. This variable also applies to trigger creation.

Since pt-online-schema-change cannot get the SUPER privilege on RDS instances, log_bin_trust_function_creators has to be enabled for trigger creation and other routines. You can read more about it in the MySQL docs.
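
If you manage your DB Parameter Group programmatically, the change can be applied with boto3, for example (a hedged sketch; the parameter group name is a placeholder and the group must already be attached to your RDS instance):

import boto3

rds = boto3.client('rds')
rds.modify_db_parameter_group(
    DBParameterGroupName='my-mysql-params',  # hypothetical parameter group name
    Parameters=[{
        'ParameterName': 'log_bin_trust_function_creators',
        'ParameterValue': '1',
        'ApplyMethod': 'immediate',  # applied without a reboot for dynamic parameters
    }],
)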

What you need to know about pt-online-schema-change

Earlier, we discussed how one of the solutions could be to set up a complex set of triggers and copy/read/write operations to perform schema changes. Internally, the pt-online-schema-change tool works in pretty much the same way. Let’s look at an overview of what it does:

  • Creates an empty copy of the old table with the new schema.
  • Creates triggers on the old table to update corresponding rows in the new one.
  • Copies all the records from the old table to the new one.
  • Rebuilds foreign key relationships.
  • Swaps the old/new tables with an atomic RENAME TABLE operation.
  • Drops the old table (default behaviour).

Note: The operations listed are not necessarily performed in the same order.

A typical pt-online-schema-change command for performing schema change would look something like this:

pt-online-schema-change --dry-run --nocheck-replication-filters --recursion-method="dsn=D=<database>,t=dsns" --chunk-size=2000 --alter-foreign-keys-method=rebuild_constraints --alter 'add column other_id INT DEFAULT NULL, add column item_hash VARCHAR(255) DEFAULT NULL, add column json_data TEXT DEFAULT NULL, add column item_id VARCHAR(255) DEFAULT NULL, add column from_other_item TINYINT DEFAULT NULL' h=<dbhost>,D=<database>,t=<table_name>

Command options and explanations:

Note: Some of the descriptions below have been copied verbatim from the pt-online-schema-change docs for brevity.

dry-run: Create and alter the new table, but do not create triggers, copy data, or replace the original table. A safe mechanism to see what the actual migration might look like. Note that it cannot reproduce an exact production-like migration scenario. Think of it as a mock test.

nocheck-replication-filters: Abort if any replication filter is set on any server. The tool looks for server options that filter replication, such as binlog_ignore_db and replicate_do_db. If it finds any such filters, it aborts with an error.

If the replicas are configured with any filtering options, you should be careful not to modify any databases or tables that exist on the master and not the replicas, because it could cause replication to fail. For more information on replication rules, see http://dev.mysql.com/doc/en/replication-rules.html.

The default value for this option is yes, so if you don’t intend to use it, make sure that there are no replication filters on your slave. Replication filters are rules for deciding whether to execute or ignore statements. For your slave, these rules define which statements received from the master need to be executed or ignored. Now consider a case where you have a filter on your slave that says don’t execute any ALTER statement for table_a. When you do the schema change for table_a on the master, the slave would never see that change. Eventually, this could cause replication to fail after the RENAME operation. For this reason, the default value for this is yes. If you decide to change it to no, make yourself aware of the replication filters that you have on your slaves before you get into a situation where replication starts failing for one of your tables.

recursion-method: This specifies the methods that the tool uses to find slave hosts. Methods are as follows:

METHOD       USES
===========  ==================
processlist  SHOW PROCESSLIST
hosts        SHOW SLAVE HOSTS
dsn=DSN      DSNs from a table
none         Do not find slaves

However, for various reasons, your RDS instance might be configured not to give up the correct slave host information to pt-online-schema-change, as was the case with our setup. The most concrete way around this is to create a DSN (Data Source Name) table for pt-online-schema-change to read the information from. For this, create a table with the following structure:

CREATE TABLE `dsns` (`id` int(11) NOT NULL AUTO_INCREMENT, `parent_id` int(11) DEFAULT NULL, `dsn` varchar(255) NOT NULL, PRIMARY KEY (`id`));

After that, you can specify your slave hosts by creating entries for them like this:

INSERT INTO dsns(dsn) VALUES('h=<slave_host>,P=3306');

By specifying recursion-method="dsn=D=mydatabase,t=dsns", you are telling Percona to find slaves in a table called dsns in the database mydatabase.

alter-foreign-keys-method: This is only required if you have child tables that reference the tables that are going to be changed as part of the schema change. The recommended method is rebuild_constraints, which uses ALTER TABLE on the child tables to drop and re-add the foreign key constraints that reference the new table. The other two, riskier, options are drop_swap and none; if you happen to use them, please make sure that you know the intricate details. You can read about them in the pt-online-schema-change docs.

chunk-size: The tool performs copy operation in small chunks of data (a chunk is a collection of rows). This option governs the number of rows that are selected for the copy operation and overriding the default value would disable the dynamic chunk size adjustment behaviour. This option does not work with tables without primary keys or unique indexes because these are used for chunking the tables.

During the schema change, there could be a situation where the number of rows in a chunk on one of your replicas is more than what is there on master. This could happen because of replica lag and you would have to adjust your chunk-size for that. Note that a larger chunk size means more load on your database.

Following are some of the resources to look into how Percona Toolkit handles chunking.

Related options are: chunk-size-limit, chunk-time and alter-foreign-keys-method.

Additional Resources:

alter: The schema modification that you want to apply.

You can read more on rest of the options here: pt-online-schema-change docs

The Migration Plan:

  • Prepare a full-fledged replica (call it staging) of your production environment with exact master/slave replication. Make sure that your staging is sufficiently populated with data.
  • Creating artificial load on your staging databases when testing live schema changes is highly recommended. You can use tools such as Selenium and JMeter to do that.
  • Make sure that the DB Parameter Groups in AWS RDS on production and staging are the same and that log_bin_trust_function_creators is set to 1.
  • Prepare a set of scripts to do dry runs and the actual schema changes. This may include a list of tables and a wrapper script which runs pt-online-schema-change and stores the logs for each run (a sketch of such a wrapper follows this list).
  • Make sure that you store logs for each and every run (staging and production both).
  • Do dry runs on staging and production to see the expected outcome.
  • Have someone from your DevOps team help you out in case things start falling apart during the production migration.
  • Choose an appropriate time to do the migration. Ideally, this should be the least busy time for your website.
  • Take backups of production before the migration in case you have to restore.
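
As an illustration, a wrapper along these lines can drive the runs and keep a log per table. This is a hedged sketch in Python rather than shell; the table list, DSN placeholders and alter statement are illustrative:

import subprocess

TABLES = ['test_table_a', 'test_table_b']  # hypothetical list of tables to alter
ALTER = 'add column other_id INT DEFAULT NULL'

for table in TABLES:
    log_path = 'pt-osc-%s.log' % table
    cmd = [
        'pt-online-schema-change',
        '--dry-run',  # switch to --execute for the real migration
        '--nocheck-replication-filters',
        '--chunk-size=2000',
        '--alter-foreign-keys-method=rebuild_constraints',
        '--alter', ALTER,
        'h=<dbhost>,D=<database>,t=%s' % table,
    ]
    with open(log_path, 'w') as log:
        subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)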

Log Sample Output:

Here is a sample log from one of our migrations.

Found 3 slaves:
  ip-some-number
  ip-some-number
  ip-some-number
Will check slave lag on:
  ip-some-number
  ip-some-number
  ip-some-number
Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `some_database`.`some_table`...
Creating new table...
Created new table some_database._some_table_new OK.
Altering new table...
Altered `some_database`.`_some_table_new` OK.
2015-10-21T02:35:03 Creating triggers...
2015-10-21T02:35:03 Created triggers OK.
2015-10-21T02:35:03 Copying approximately 10000 rows...
2015-10-21T02:35:05 Copied rows OK.
2015-10-21T02:35:05 Swapping tables...
2015-10-21T02:35:06 Swapped original and new tables OK.
2015-10-21T02:35:06 Dropping old table...
2015-10-21T02:35:06 Dropped old table `some_database`.`_some_table_old` OK.
2015-10-21T02:35:06 Dropping triggers...
2015-10-21T02:35:06 Dropped triggers OK.
Successfully altered `some_database`.`some_table`.

Errors seen on Production:

2015-10-19T12:24:52 Error copying rows from `some_database`.`some_table` to `some_database`.`_some_table_new`: 2015-10-19T12:24:52 Cannot copy table `some_database`.`some_table` because on the master it would be checksummed in one chunk but on these replicas it has too many rows:
  4130 rows on ip-some-number
The current chunk size limit is 4000 rows (chunk size=1000 * chunk size limit=4.0).

Resolution: Increase chunk size

Conclusion and Results:

pt-online-schema-change from Percona worked exceptionally well for us. The migration was tested very thoroughly on our staging environment and it worked in pretty much the same way on production.


Helpful commands:

  • To get list of tables with particular column:
SELECT DISTINCT TABLE_NAME  FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME NOT REGEXP "^test_.*(_archive)|(_new)|(_old)$" AND TABLE_NAME REGEXP "^test_.*" AND COLUMN_NAME = 'feed_id' AND TABLE_SCHEMA ='some_schema';
  • To find number of connections:
SHOW STATUS WHERE `variable_name` = 'Threads_connected';
  • To find if some table is locked:
SHOW OPEN TABLES WHERE `Table` LIKE '%[TABLE_NAME]%' AND `Database` LIKE '[DBNAME]' AND In_use > 0;
  • To find all foreign keys on a table:
SELECT TABLE_NAME,COLUMN_NAME,CONSTRAINT_NAME, REFERENCED_TABLE_NAME,REFERENCED_COLUMN_NAME from INFORMATION_SCHEMA.KEY_COLUMN_USAGE where REFERENCED_TABLE_NAME = '<table_name>';

Percona toolkit docs (2.2+)

Other resources:

Note: This post might contain some edits which can be tracked here.


Introducing Uncanny Recursions

My very first attempt at setting up a personal blog cum website which I am hoping to maintain for a really long time. This is set up using Github pages and a Jekyll theme called Hyde.

To know more about me and my work, you can follow me at:

You can also reach me via email: mrafayaleem[at]gmail.com

Thanks!