Asynchronous MySQL queries with non-blocking readiness checks

by Oliver on Sunday, February 17th, 2013.

Well, despite my best intentions, here I am again writing Ruby. I decided to automate a small part of some data analysis I’ve had to do a few times, starting with the database queries themselves. Unfortunately the data is spread over several hosts and databases, and the first implementation simply queried them serially. The next iteration used the mysql2 gem’s asynchronous query functionality but still naively blocked on results retrieval rather than polling the IOs to see when they could be read from.

It doesn’t actually add anything to my script to do this, but it seemed like a small learning opportunity and was somewhat interesting, so here are the guts of that code:

The code is pretty simple and the comments should reveal the intent of any confusing lines. The only part that was slightly irritating was receiving file descriptor numbers from Mysql2::Client#socket rather than the IO itself, hence having to re-open the same file descriptor.
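As a stand-in for the missing snippet, here is a minimal sketch of the readiness-check mechanism. Plain pipes stand in for live MySQL connections so it runs without a server; the fd re-opening via IO.for_fd and the IO.select polling are the parts described above, while the pipe setup is purely illustrative.

```ruby
# Mysql2::Client#socket hands back a bare file descriptor number rather
# than an IO, so you re-open it with IO.for_fd before handing it to
# IO.select. Pipes stand in for the MySQL connections here.
readers, writers = 3.times.map { IO.pipe }.transpose

# Re-open each reader from its descriptor number, as you would with the
# fd returned by Mysql2::Client#socket.
ios = readers.map { |r| IO.for_fd(r.fileno, autoclose: false) }

# Simulate asynchronous results arriving on each connection.
writers.each_with_index { |w, i| w.write("result #{i}"); w.close }

results = []
until ios.empty?
  ready, = IO.select(ios)   # block only until at least one fd is readable
  ready.each do |io|
    results << io.read      # safe to read: data is already waiting
    ios.delete(io)
  end
end
results.sort  # => ["result 0", "result 1", "result 2"]
```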

In this case I haven’t done anything fancy after checking when the results are ready, but you can see how this could be trivially turned into a system for querying multiple backends for the same data and returning the fastest result which is a quite popular pattern at the moment.


MySQL replication and the idempotence myth

by Oliver on Tuesday, January 11th, 2011.

I had a fairly frustrating time today trying to explain to someone that indeed the order of actions matters very much when trying to recover failed replication on a MySQL server. We had a good dump from the master taken with --single-transaction --master-data=2 (yes, all InnoDB tables here), and had imported it successfully. However, at the critical moment of reading the binary log name and position from the comments at the start of the dump file, the operator in charge had mistakenly entered an old CHANGE MASTER statement from the history and started the replication at an earlier position than it should have been.

The completely expected happened, and a short while later replication failed again on a duplicate insertion error. At this point there was some disagreement between myself and the operators nursing the system back to health. Their position was that we can simply start the replication from the correct point and let it continue safely from there. My position was that once you have restarted replication from an incorrect point in the binary logs, you can no longer be assured that your data is consistent with what is on the master.

Consider if you will a brief timeline of events:

Timeline 1                       Timeline 2
insert into foo values 1,2       delete from foo where bar > 2
insert into foo values 3,4       insert into foo values 1,2
delete from foo where bar > 2    insert into foo values 3,4

OK, so that’s fairly nasty pseudo-SQL, but you get the idea. Regardless of which (contrived) timeline you judge to be the “correct” one, reordering them will lead to a different set of data at the end. In Timeline 1 you will have only values 1 and 2 in your table, whereas in Timeline 2 you will have 1 through 4. Relying on the fact that you can skip past duplicate insertion errors will not save you from the inevitable updates or deletes that can also change the dataset in non-idempotent ways.
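The two timelines above can be replayed as a quick Ruby sketch, with an array standing in for the table and the `bar > 2` predicate applied to the values themselves; purely illustrative, but it shows the same statements producing different data in different orders.

```ruby
# Each "statement" is a function from one table state to the next.
insert_12  = ->(t) { t | [1, 2] }
insert_34  = ->(t) { t | [3, 4] }
delete_gt2 = ->(t) { t.reject { |v| v > 2 } }

timeline1 = [insert_12, insert_34, delete_gt2]
timeline2 = [delete_gt2, insert_12, insert_34]

# Replay a timeline against an initially empty table.
run = ->(ops) { ops.reduce([]) { |table, op| op.call(table) } }

run.call(timeline1)  # => [1, 2]
run.call(timeline2)  # => [1, 2, 3, 4]
```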

“But that’s a completely contrived example,” you might very well say. “It’s unlikely something like that would happen on the real dataset” may be another phrase used. My answer to that is: as a sysadmin, do you know 100% how the application works? Do you know every SQL query that will occur?

Consider Marty going back to 1955 in Back to the Future. The past is changed, and he has to spend a bunch of effort fixing things up so that he is eventually born, and still alive at least until 1985. He gets back to 1985 and a lot of things are not as he remembers. In order to ensure that it was just as he left it, he would have to ensure that every event from 1955 to 1985 happened just as it did in his own timeline. Similarly, to have any surety that your dataset will end up just “similar” to how it should be, you would have to trace through the entire binary log, fixing problems as you go (and even then the outcome would be doubtful).

There are far better uses of our time than making mistakes and pretending they don’t exist.


Reasons to love PostgreSQL

by Oliver on Sunday, November 21st, 2010.

I’ve been hacking on a small Ruby on Rails project to summarise and then prettily display pmacct traffic data. It’s not an original idea by any stretch and in fact is directly inspired by a system we had at my last job. For obvious reasons I have to at least reimplement the idea myself, and while the original was written in Python using PostgreSQL, mine will be using Ruby and MySQL.

One small pain point I’ve already encountered is the lack of IP address functions in MySQL. PostgreSQL has supported IP address functions at least as far back as 7.4, and when you need to compare addresses and summarise traffic by local subnet, they are sorely missed. It really makes me sad, and wonder why even the latest MySQL does not have any support for these functions.

Granted, having built-in support for certain functions does not always help you. It is well known that you can often gain performance for your application by sorting records in the code rather than in the database. However, without even having the option there to allow you to make a comparison, it is impossible to say which way would be better. I know that in terms of convenience and LOC savings, I’d prefer the functions to be in the database engine.
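In the meantime, the comparison ends up in application code. Here is a minimal Ruby sketch of the subnet-membership half using the standard IPAddr library; the subnet and addresses are hypothetical, and in PostgreSQL this would be a one-liner with the inet type and the <<= operator.

```ruby
require 'ipaddr'

# Hypothetical local subnet and sampled traffic addresses.
local_net = IPAddr.new('10.1.0.0/16')
addresses = %w[10.1.2.3 192.168.0.5 10.1.255.1]

# Partition the addresses into local and remote by subnet membership.
local, remote = addresses.partition { |a| local_net.include?(IPAddr.new(a)) }

local   # => ["10.1.2.3", "10.1.255.1"]
remote  # => ["192.168.0.5"]
```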


Of sparse files and men

by Oliver on Friday, September 24th, 2010.

One of my favourite interview questions is about sparse files. I’m constantly surprised at how few people know about them, especially in this day when thin-provisioning of storage is so pervasive you can run into it just about anywhere. Most commonly though, candidates will respond back with something to the effect of “a file with holes in it” which is more or less correct. I think out of everyone I have interviewed this year, only one person was able to tell me how to create a sparse file on the command line, and nobody was able to tell me how you can find out the space they actually occupy.

For the record, here are the two commands. Firstly, to create a 2GB sparse file:

$ dd if=/dev/zero of=test2G count=0 bs=1M seek=2048
0+0 records in
0+0 records out
0 bytes (0 B) copied, 6.6628e-05 s, 0.0 kB/s

And secondly, to display how much space the file actually takes up:

$ ls -lahs test2G
0 -rw-r--r-- 1 ohookins ohookins 2.0G 2010-09-21 09:40 test2G

That’s the actual size on disk in the left-most column. I always take slight pleasure in showing “actual terminal output” to candidates where I’ve created a 10TB file on my 128GB hard drive, and watch their faces as they try to figure out what is going on, although this is usually followed by disappointment as I realise it’s another negative point for them on the interview. What can I say, I’m a BOFH interviewer.
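You can also check allocation programmatically; here is a small Ruby sketch using File::Stat, whose st_blocks field POSIX defines in 512-byte units. The file name and size are arbitrary, and it assumes a sparse-aware filesystem.

```ruby
path = 'sparse_demo'
File.open(path, 'w') do |f|
  f.seek(10 * 1024 * 1024 - 1)  # leave a ~10MB hole
  f.write("\0")                  # a single real byte at the end
end

st = File.stat(path)
apparent  = st.size          # 10485760: the size ls -l reports
allocated = st.blocks * 512  # st_blocks is counted in 512-byte units
File.delete(path)
# On a sparse-aware filesystem, allocated is a tiny fraction of apparent.
```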

A frequently painful item in the lives of MySQL administrators everywhere is the shared InnoDB tablespace. This usually lives at /var/lib/mysql/ibdata1 and can be anywhere from 10MB to many GB in size depending on your dataset and configuration. The sadly-default option of keeping ALL InnoDB databases inside this file can make it grow without bound, and that space cannot be reclaimed: rows can be deleted and their space reused for new insertions, but the file itself cannot be shrunk on disk. I was thinking about this the other day and I started wondering whether InnoDB might have enough smarts in it to give some of this disk space back, via file “holes”.

One database server I manage does in fact have innodb_file_per_table enabled, but sadly the common tablespace has still grown. We take binary tarballs of the entire MySQL data directory from this machine for a variety of purposes, so the tarball is steadily growing. I decided to run ls -ls inside /var/lib/mysql and saw this:

# ls -ls ibdata1
6137716 -rw-rw---- 1 mysql mysql 6278873088 Sep 21 09:04 ibdata1

That first number is the count of blocks actually allocated on disk for the file. You can quite easily find out the filesystem block size:

$ dumpe2fs -h /dev/mapper/localhost-root 2>/dev/null | grep '^Block size'
Block size:               4096

Oh. It really doesn’t seem like ls is using the filesystem block size. It must be using 1024 bytes. Multiplying the block count by 1024 we get 6285021184 bytes, which is actually MORE than it reported the file size to be (by a whole 6148096 bytes… almost 6MB). That’s a WTF.
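For the record, the arithmetic checks out, assuming GNU ls -s reports in its default 1024-byte display units:

```ruby
# Sanity-check the over-allocation from the ls -ls output above.
blocks_shown  = 6_137_716        # left-most column of ls -ls
apparent_size = 6_278_873_088    # byte size reported for ibdata1

allocated = blocks_shown * 1024  # ls -s default unit is 1024 bytes
excess    = allocated - apparent_size
excess  # => 6148096 bytes, just under 6MB over-allocated
```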

My somewhat educated guess is that due to the behaviour of InnoDB (periodically reaching a low-threshold of free space in the ibdata1 file, and extending it by the innodb_autoextend_increment, usually 8MB) the data file is increased in size periodically, leaving the last “end” block not completely filled. After we have increased the size many hundreds of times we might end up with quite a few 4K blocks only partially filled. This may explain that the allocated blocks total a significantly greater (i.e. megabytes rather than kilobytes) amount than the actual bytes of data in the file are using.

So much for smart sparse file usage, InnoDB.


MySQL talk @ LCA08

by Oliver on Sunday, September 12th, 2010.

Here are some of the files from my MySQL talk at Linux Conf AU 2008, where I presented material on replicated, highly-available, load-balanced MySQL.

