adrift on a cosmic ocean

Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.

I/O redirection "optimizations"

Posted by Oliver on the 12th of September, 2010 in category Tech
Tagged with: data migrationddlinuxlvmredirectionshell

Quite a while back, I had to migrate a few terabytes of data from one machine to another. Not that special a task, and certainly a few terabytes is not that much but at the time it was a reasonable amount and even over 1Gbps network it can take some time. Fortunately it was not time critical and I could take the server in question down for a while to facilitate the migration. The data in question was a number of discrete filesystems on a bunch of LVM logical volumes, thus I was able to basically just recreate the LVs on the destination and do a straight bit copy.

That all being said, I still wanted it to complete quickly! After eradicating the usual readahead settings being set too low for sequential reads from the source, the copy occurred more or less as expected, and I kept a watchful eye on iostat. This is where things got a bit strange, as I noticed identical read and write values coming back from the destination LV. The basic formula of the copy was as follows:

# source
for i in /dev/VolumeGroup/*; do
    LE=`lvdisplay $i | grep "Current LE" | awk '{print $NF}'`
    NAME=`basename $i`
    echo "${NAME}:${LE}" | nc newmachine 30001
    sleep 10
    dd if=$i bs=4M | nc newmachine 30000
    sleep 10
done
echo "DONE:0" | nc newmachine 30001

# destination
while true
do
    INFO=`nc -l 30001`
    NAME=`echo $INFO | cut -f1 -d:`
    LE=`echo $INFO | cut -f2 -d:`
    if [ $NAME == "DONE" ]
    then
        break
    fi
    lvcreate -l $LE -n $NAME /dev/VolumeGroup
    nc -l 30000 > /dev/VolumeGroup/$NAME
done

Unfortunately I don't have the actual code around, so the above is only an off-the-top-of-my-head approximation, but you should get the idea:

  • Loop over our logical volumes we want to migrate over to the new machine, determining name and number of logical extents (yes, we have to do some extra work if logical extent size differs between source and destination).
  • Pipe the number of LEs and the name of the LV to the destination over a "control" channel so that the new LV can be created, and wait a few seconds for this to take place.
  • Read out the source LV with a reasonable block size, and pipe it over to the destination where it is piped directly into the new LV. I may have added an intermediate stage of dd to ensure an output block size of 4MB as well, but my memory fails me.

So, as I mentioned, at this point I noticed that not only was data being written to the destination LV (as you would expect) but a corresponding amount was being simultaneously read from it. I was not able to resolve this discrepancy at the time, although I suspected perhaps some intelligence in part of the redirection on the output side was trying to determine which blocks actually needed overwriting.

A couple of months ago I spotted this post in Chris Siebenmann's blog which may explain it. He has certainly run into a similar confounding case of system "intelligence".

© 2010-2018 Oliver Hookins and Angela Collins