adrift on a cosmic ocean

Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.

Of sparse files and men

Posted by Oliver on the 24th of September, 2010 in category Tech
Tagged with: innodbmysqlsparse files

One of my favourite interview questions is about sparse files. I'm constantly surprised at how few people know about them, especially in this day when thin-provisioning of storage is so pervasive you can run into it just about anywhere. Most commonly though, candidates will respond back with something to the effect of "a file with holes in it" which is more or less correct. I think out of everyone I have interviewed this year, only one person was able to tell me how to create a sparse file on the command line, and nobody was able to tell me how you can find out the space they actually occupy.

For the record, here are the two commands. Firstly, to create a 2GB sparse file:

$ dd if=/dev/zero of=test2G count=0 bs=1M seek=2048
0+0 records in
0+0 records out
0 bytes (0 B) copied, 6.6628e-05 s, 0.0 kB/s

And secondly, to display how much space the file actually takes up:

$ ls -lahs test2G
0 -rw-r--r-- 1 ohookins ohookins 2.0G 2010-09-21 09:40 test2G

That's the actual size on disk in the left-most column. I always take slight pleasure in showing "actual terminal output" to candidates where I've created a 10TB file on my 128GB hard drive, and watch their faces as they try to figure out what is going on, although this is usually followed by disappointment as I realise it's another negative point for them on the interview. What can I say, I'm a BOFH interviewer.

A frequently painful item in the lives of MySQL administrators everywhere is the shared InnoDB tablespace. This usually lives at /var/lib/mysql/ibdata1 and can be anywhere from 10MB to many GB in size depending on your dataset and configuration. The sadly default option of keeping ALL InnoDB databases inside this file can make it grow without bound, and that space cannot be claimed back - rows can be deleted and reused for new insertions but the size of the file cannot be shrunk on disk. I was thinking about this the other day and I started wondering whether InnoDB might have enough smarts in it to give some of this disk space back, via file "holes".

One database server I manage does in fact have innodb_file_per_table enabled by sadly the common tablespace has still grown. We take binary tarballs of the entire MySQL data directory from this machine for a variety of purposes so the tarball is steadily growing. I decided to run ls -ls inside /var/lib/mysql and saw this:

# ls -ls ibdata1
6137716 -rw-rw---- 1 mysql mysql 6278873088 Sep 21 09:04 ibdata1

That first number is how many allocated blocks are on disk from the file. You can quite easily find out the block size:

$ dumpe2fs -h /dev/mapper/localhost-root 2>/dev/null | grep '^Block size'
Block size:               4096

Oh. It really doesn't seem like ls is using the filesystem block size. It must be using 1024 bytes. Multiplying the block count by 1024 we get 6285021184 bytes, which is actually MORE than it reported the file size to be (by a whole 6148096 bytes... almost 6MB). That's a WTF.

My somewhat educated guess is that due to the behaviour of InnoDB (periodically reaching a low-threshold of free space in the ibdata1 file, and extending it by the innodb_autoextend_increment, usually 8MB) the data file is increased in size periodically, leaving the last "end" block not completely filled. After we have increased the size many hundreds of times we might end up with quite a few 4K blocks only partially filled. This may explain that the allocated blocks total a significantly greater (i.e. megabytes rather than kilobytes) amount than the actual bytes of data in the file are using.

So much for smart sparse file usage, InnoDB.

© 2010-2018 Oliver Hookins and Angela Collins