Well, despite my best intentions, here I am again writing Ruby. I decided to automate a small part of some data analysis I’ve had to do a few times, starting with the database queries themselves. Unfortunately the data is spread over several hosts and databases, and the first implementation simply queried them serially. The next iteration used the mysql2 gem’s asynchronous query functionality but still naively blocked on result retrieval rather than polling the IOs to see when they could be read from.
It doesn’t actually add anything to my script to do this, but it seemed like a small learning opportunity and somewhat interesting so here is the guts of that code:
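In outline, assuming the mysql2 async API, the polling logic looks something like this (the hostnames and credentials in the usage comment are placeholders):

```ruby
# Each client object must respond to #socket (returning a raw file
# descriptor number, as Mysql2::Client#socket does) and #async_result.
def drain_async_results(clients)
  # Mysql2::Client#socket hands back a file descriptor number rather
  # than an IO, so we have to re-open the same descriptor ourselves.
  pending = {}
  clients.each do |client|
    io = IO.for_fd(client.socket)
    io.autoclose = false # don't let GC close the underlying connection
    pending[io] = client
  end

  results = []
  until pending.empty?
    # Block until at least one server has results ready for us
    ready, = IO.select(pending.keys)
    ready.each { |io| results << pending.delete(io).async_result }
  end
  results
end

# With mysql2 (hosts and credentials are placeholders):
#   clients = %w[db1 db2].map { |h| Mysql2::Client.new(host: h, username: 'me') }
#   clients.each { |c| c.query('SELECT ...', async: true) }
#   rows = drain_async_results(clients)
```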
The code is pretty simple and the comments should reveal the intent of any confusing lines. The only part that was slightly irritating was receiving file descriptor numbers from Mysql2::Client#socket rather than the IO itself, hence having to re-open the same file descriptor.
In this case I haven’t done anything fancy after checking when the results are ready, but you can see how this could be trivially turned into a system for querying multiple backends for the same data and returning the fastest result which is a quite popular pattern at the moment.
I had a fairly frustrating time today trying to explain to someone that the order of actions matters very much when trying to recover failed replication on a MySQL server. We had a good dump from the master taken with --single-transaction --master-data=2 (yes, all InnoDB tables here), and had imported it successfully. However, at the critical moment of reading the binary log name and position from the comments at the start of the dump file, the operator in charge mistakenly entered an old CHANGE MASTER statement from the shell history and started replication at an earlier position than it should have been.
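For reference, with --master-data=2 the correct coordinates are sitting right there in a comment near the top of the dump, in a statement of this shape (the log file name and position here are made up):

```
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=45678;
```

Copy that, not whatever happens to be in your shell history.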
The completely expected happened, and a short while later replication failed again on a duplicate insertion error. At this point there was some disagreement between myself and the operators nursing the system back to health. Their position was that we can simply start the replication from the correct point and let it continue safely from there. My position was that once you have restarted replication from an incorrect point in the binary logs, you can no longer be assured that your data is consistent with what is on the master.
Consider if you will a brief timeline of events:
Timeline 1                       Timeline 2
insert into foo values 1,2       delete from foo where bar > 2
insert into foo values 3,4       insert into foo values 1,2
delete from foo where bar > 2    insert into foo values 3,4
OK, so that’s fairly nasty pseudo-SQL but you get the idea. Regardless of which (contrived) timeline you judge to be the “correct” one, reordering the statements leads to a different set of data at the end. In Timeline 1 you will have only values 1 and 2 in your table, whereas in Timeline 2 you will have 1 through 4. Relying on being able to skip past duplicate insertion errors will not save you from the inevitable updates and deletes that can also change the dataset in non-idempotent ways.
“But that’s a completely contrived example,” you might very well say, or “it’s unlikely something like that would happen on the real dataset.” My answer: as a sysadmin, do you know 100% how the application works? Do you know every SQL query that will occur?
Consider Marty going back to 1955 in Back to the Future. The past is changed, and he has to spend a bunch of effort fixing things up so that he is eventually born, and still alive at least until 1985. He gets back to 1985 and a lot of things are not as he remembers. In order to ensure that it was just as he left it, he would have to ensure that every event from 1955 to 1985 happened just as it did in his own timeline. Similarly, to have any surety that your dataset will end up just “similar” to how it should be, you would have to trace through the entire binary log, fixing problems as you go (and even then the outcome would be doubtful).
There are far better uses of our time than making mistakes and pretending they don’t exist.
I’ve been hacking on a small Ruby on Rails project to summarise and then prettily display pmacct traffic data. It’s not an original idea by any stretch and in fact is directly inspired by a system we had at my last job. For obvious reasons I have to at least reimplement the idea myself, and while the original was written in Python using PostgreSQL, mine will be using Ruby and MySQL.
One small pain point I’ve already encountered is the lack of proper IP address support in MySQL. PostgreSQL, at least as far back as 7.4, has supported a native inet type and associated functions, and when you need to compare addresses and summarise based on local subnets they are sorely missed. It makes me sad, and makes me wonder why even the latest MySQL still has nothing equivalent.
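In the meantime the subnet arithmetic has to happen in the application. Ruby’s stdlib IPAddr covers the basics (the addresses below are arbitrary examples):

```ruby
require 'ipaddr'

local = IPAddr.new('192.168.1.0/24')   # an example "local" subnet

# Classify traffic records by whether the source address is local.
local.include?(IPAddr.new('192.168.1.42'))  # => true
local.include?(IPAddr.new('10.0.0.1'))      # => false

# IPAddr can also give the integer form, useful for storing and
# comparing addresses in MySQL columns by hand.
IPAddr.new('192.168.1.42').to_i  # => 3232235818
```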
Granted, having built-in support for certain functions does not always help you. It is well known that you can often gain performance for your application by sorting records in the code rather than in the database. However, without even having the option there to allow you to make a comparison, it is impossible to say which way would be better. I know that in terms of convenience and LOC savings, I’d prefer the functions to be in the database engine.
One of my favourite interview questions is about sparse files. I’m constantly surprised at how few people know about them, especially in this day when thin-provisioning of storage is so pervasive you can run into it just about anywhere. Most commonly though, candidates will respond back with something to the effect of “a file with holes in it” which is more or less correct. I think out of everyone I have interviewed this year, only one person was able to tell me how to create a sparse file on the command line, and nobody was able to tell me how you can find out the space they actually occupy.
For the record, here are the two commands. Firstly, to create a 2GB sparse file:
$ dd if=/dev/zero of=test2G count=0 bs=1M seek=2048
0+0 records in
0+0 records out
0 bytes (0 B) copied, 6.6628e-05 s, 0.0 kB/s
And secondly, to display how much space the file actually takes up:
$ ls -lahs test2G
0 -rw-r--r-- 1 ohookins ohookins 2.0G 2010-09-21 09:40 test2G
That’s the actual size on disk in the left-most column. I always take slight pleasure in showing “actual terminal output” to candidates where I’ve created a 10TB file on my 128GB hard drive, and watch their faces as they try to figure out what is going on, although this is usually followed by disappointment as I realise it’s another negative point for them on the interview. What can I say, I’m a BOFH interviewer.
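For what it’s worth, the same trick is easy from code too. A quick Ruby sketch: File#truncate extends the file without allocating any blocks, and File::Stat exposes the allocated size (st_blocks is counted in 512-byte units):

```ruby
path = 'test2G'
File.open(path, 'w') { |f| f.truncate(2 * 1024**3) } # 2GB apparent size

stat = File.stat(path)
apparent  = stat.size          # 2147483648 bytes
allocated = stat.blocks * 512  # st_blocks counts 512-byte units

# On a filesystem that supports sparse files, `allocated` will be
# (close to) zero while `apparent` reports the full 2GB.
```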
A frequently painful item in the lives of MySQL administrators everywhere is the shared InnoDB tablespace. This usually lives at /var/lib/mysql/ibdata1 and can be anywhere from 10MB to many GB in size depending on your dataset and configuration. The sadly default option of keeping ALL InnoDB databases inside this file can make it grow without bound, and that space cannot be claimed back – rows can be deleted and reused for new insertions but the size of the file cannot be shrunk on disk. I was thinking about this the other day and I started wondering whether InnoDB might have enough smarts in it to give some of this disk space back, via file “holes”.
One database server I manage does in fact have innodb_file_per_table enabled, but sadly the common tablespace has still grown. We take binary tarballs of the entire MySQL data directory from this machine for a variety of purposes, so the tarball is steadily growing. I decided to run ls -ls inside /var/lib/mysql and saw this:
# ls -ls ibdata1
6137716 -rw-rw---- 1 mysql mysql 6278873088 Sep 21 09:04 ibdata1
That first number is the number of blocks allocated on disk for the file. You can quite easily find out the filesystem block size:
$ dumpe2fs -h /dev/mapper/localhost-root 2>/dev/null | grep '^Block size'
Block size:               4096
Oh. ls clearly isn’t using the filesystem block size here; GNU ls -s reports allocation in 1024-byte units by default. Multiplying the block count by 1024 we get 6285021184 bytes, which is actually MORE than the reported file size (by a whole 6148096 bytes, almost 6MB). That’s a WTF.
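The arithmetic, for the record:

```ruby
blocks = 6_137_716         # allocated block count from ls -ls
allocated = blocks * 1024  # ls -s uses 1024-byte units by default
size = 6_278_873_088       # actual file size in bytes

allocated         # => 6285021184
allocated - size  # => 6148096 (just under 6MB of extra allocation)
```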
My somewhat educated guess is that due to the behaviour of InnoDB (periodically reaching a low threshold of free space in the ibdata1 file and extending it by innodb_autoextend_increment, usually 8MB), the data file is increased in size periodically, leaving the last “end” block not completely filled. After the file has been extended many hundreds of times we might end up with quite a few 4K blocks that are only partially filled. This may explain why the allocated blocks total a significantly greater amount (megabytes rather than kilobytes) than the actual bytes of data in the file.
So much for smart sparse file usage, InnoDB.