Learning the nuances of GCP and TCP keepalive

I started a new job a little while ago and I’m learning technologies that I was not exposed to at my previous employer. In fact, I took this job specifically knowing that I would be working with unfamiliar technologies.

At my previous job I supported MySQL and saw the myriad issues that customer encountered while running MySQL in their environments. One relatively recent issue came with MySQL 8.0.27 and Innodb ClusterSets. This is a new feature that allows you to create interconnected Innodb Clusters, and one of the steps in doing that is to CLONE a new primary host in the child cluster. The mysqlsh tool is used for this purpose, but when executing the CLONE from the primary side on GCP the operation would fail.

The problem was that the connection being used by mysqlsh was severed, but mysqlsh was never “informed” about this. The underlying cause seems to be the software defined nature of cloud computing wearing through the thin veneer. It was found that the connection would not be terminated if mysqlsh was run on the replica server (where data was being CLONEd to). I wasn’t directly handling this particular issue, simply advising another engineer and reviewing data.

Today I encountered this problem in my new job when a benchmark was creating indexes on a large table it would fail with a Lost connection...2013 error. After scrutinizing the MySQL source code for a while, and performing some strace operations of my own, I concluded that MySQL was not at fault and was not doing anything to cause a client connection to timeout.

I decided to adjust the Linux kernel sysctl variable net.ipv4.tcp_keepalive_time to 300 seconds and see if it had an effect, it did and the outcome was exactly what I’d hoped for. I did some further testing with select sleep(900) and found that GCP silently evicts the idle TCP connection right around 900 seconds since the command was issued.

Why does adjusting the tcp_keepalive_time make a difference? The MySQL client marks the socket connection as “keepalive” by setting the option SO_KEEPALIVE, this causes the Linux kernel to start sending “keepalive” packets ater the tcp_keepalive_time expires. The default value for tcp_keepalive_time is 7200 seconds and after that expires the kernel sends keepalives every 75 seconds to keep the channel open. Setting the tcp_keepalive_time to a value lower than the GCP eviction timeout will prevent connections that are waiting on a long running task from closing, such as index creation on a large table.

Using Docker to Create Pop-Up MySQL Instances

Pop-Up shops are those short lived stores at malls and other places, often times they are kiosks. They serve to satisfy temporal demands like nano quadcopters or engraved keychains. In this context you can create MySQL instances that are short lived, easily provisioned, and easily disposed of.

Imagine you are a developer, or the DBA who has to tell a developer when their code breaks, and you would like an easy way to validate code against the production schema, but not impact your production systems?

This recipe makes some assumptions:

  • You have a MySQL slave or a secondary Innodb Cluster instance to CLONE from
  • You are using MySQL 8.0.17 or later
  • You don’t have hundreds of gigabytes to terabytes of data

If you have a lot of data in your production environment, this won’t be a viable solution, but if your data is in the 10s of gigabytes, this could work for you.

I’m going to present 2 options: 1) A completely standalone transient instance of MySQL 2) A semi-persistent instance of MySQL that can live on an external encrypted SSD or other secured storage.

Continue reading “Using Docker to Create Pop-Up MySQL Instances”

MySQL 8 Network Backup Using Docker and CLONE

One of the shortcomings of MySQL GPL is that it does not come with a first party online backup solution. With the release of MySQL 8.0.17 the CLONE plugin was introduced, this essentially integrated online backup as a plugin to the MySQL Server.

The MySQL 8.0 Reference Manual describes how to use the CLONE plugin to perform local clones (backups) here: https://dev.mysql.com/doc/refman/8.0/en/clone-plugin-local.html

Doing local clones is incredibly useful and a really fast way of making an image backup. I would argue that the CLONE plugin is better for local image backups than competing solutions simply because the syntax is more brief and efforts were made to integrate CLONE into the server, thereby reducing the impact of performing CLONE operations.

The CLONE plugin can either clone to the server’s default data directory or to another directory specified in the CLONE command. I will demonstrate the latter usage for making online remote backups without modifying the data directory of the container.

Continue reading “MySQL 8 Network Backup Using Docker and CLONE”

Reducing the Impact of YouTube’s API Quota

I started redesigning my website a several weeks ago, my objective was to create a centralized hub for sharing written information, code, video, and photography. It was rather easy to solve most of those problems, and sharing my latest YouTube video was simple at first.

I had this niggling feeling that my new website was on the heavyweight side, after all it’s WordPress based and I had a few plugins. The annoying reCaptcha logo was popping up everywhere, even when it wasn’t used. After using the Coverage tab in Chrome and installing yet more WordPress plugins to trim the fat, I tried get it down to as small a footprint as I could. Then came the Google PageSpeed Insights. Sometimes we are blissfully unaware of our problems and go through life with blinders on, PageSpeed Insights simultaneously woke me and gave me yet another obsession to chase.

Continue reading “Reducing the Impact of YouTube’s API Quota”