Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.
One of my favourite interview questions is about sparse files. I'm constantly surprised at how few people know about them, especially in this day when thin-provisioning of storage is so pervasive you can run into it just about anywhere. Most commonly though, candidates will respond back with something to the effect of "a file with holes in it" which is more or less correct. I think out of everyone I have interviewed this year, only one person was able to tell me how to create a sparse file on the command line, and nobody was able to tell me how you can find out the space they actually occupy.
For the record, here are the two commands. Firstly, to create a 2GB sparse file:
$ dd if=/dev/zero of=test2G count=0 bs=1M seek=2048 0+0 records in 0+0 records out 0 bytes (0 B) copied, 6.6628e-05 s, 0.0 kB/s
And secondly, to display how much space the file actually takes up:
$ ls -lahs test2G 0 -rw-r--r-- 1 ohookins ohookins 2.0G 2010-09-21 09:40 test2G
That's the actual size on disk in the left-most column. I always take slight pleasure in showing "actual terminal output" to candidates where I've created a 10TB file on my 128GB hard drive, and watch their faces as they try to figure out what is going on, although this is usually followed by disappointment as I realise it's another negative point for them on the interview. What can I say, I'm a BOFH interviewer.
A frequently painful item in the lives of MySQL administrators
everywhere is the shared InnoDB tablespace. This usually lives at
/var/lib/mysql/ibdata1 and can be anywhere from 10MB to many GB in
size depending on your dataset and configuration. The sadly default
option of keeping ALL InnoDB databases inside this file can make it grow
without bound, and that space cannot be claimed back - rows can be
deleted and reused for new insertions but the size of the file cannot be
shrunk on disk. I was thinking about this the other day and I started
wondering whether InnoDB might have enough smarts in it to give some of
this disk space back, via file "holes".
One database server I manage does in fact have
innodb_file_per_table enabled by sadly the common tablespace has
still grown. We take binary tarballs of the entire MySQL data directory
from this machine for a variety of purposes so the tarball is steadily
growing. I decided to run
ls -ls inside
/var/lib/mysql and saw
# ls -ls ibdata1 6137716 -rw-rw---- 1 mysql mysql 6278873088 Sep 21 09:04 ibdata1
That first number is how many allocated blocks are on disk from the file. You can quite easily find out the block size:
$ dumpe2fs -h /dev/mapper/localhost-root 2>/dev/null | grep '^Block size' Block size: 4096
Oh. It really doesn't seem like
ls is using the filesystem block
size. It must be using 1024 bytes. Multiplying the block count by 1024
we get 6285021184 bytes, which is actually MORE than it reported the
file size to be (by a whole 6148096 bytes... almost 6MB). That's a
My somewhat educated guess is that due to the behaviour of InnoDB
(periodically reaching a low-threshold of free space in the
file, and extending it by the
8MB) the data file is increased in size periodically, leaving the last
"end" block not completely filled. After we have increased the size
many hundreds of times we might end up with quite a few 4K blocks only
partially filled. This may explain that the allocated blocks total a
significantly greater (i.e. megabytes rather than kilobytes) amount than
the actual bytes of data in the file are using.
So much for smart sparse file usage, InnoDB.