Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.
I had a fairly frustrating time today trying to explain to someone that
indeed the order of actions matters very much when trying to recover
failed replication on a MySQL server. We had a good dump from the master, taken with --single-transaction --master-data=2 (yes, all InnoDB
tables here), and had imported it successfully. However, at the critical
moment of reading the binary log name and position from the comments at
the start of the dump file, the operator in charge had mistakenly entered an old CHANGE MASTER statement from their shell history and started the replication at an earlier position than it should have been.
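The safer habit is to take the coordinates from the dump itself rather than from shell history. Here is a minimal sketch in Python; the dump header below is fabricated for illustration, but with --master-data=2 mysqldump does record the CHANGE MASTER statement as a comment near the top of the file:

```python
# Sketch: pull the binlog coordinates out of a dump made with
# --master-data=2, instead of retyping a CHANGE MASTER from history.
# The header text here is a fabricated example of that comment.
import re

dump_header = """\
-- MySQL dump 10.13
--
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=107;
"""

match = re.search(
    r"MASTER_LOG_FILE='(?P<logfile>[^']+)', MASTER_LOG_POS=(?P<pos>\d+)",
    dump_header,
)
print(match.group("logfile"), match.group("pos"))  # mysql-bin.000042 107
```

In practice you would read only the first few dozen lines of the real dump file and paste the extracted statement straight into the mysql client, leaving no room for a stale entry from history.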
The completely expected happened, and a short while later replication failed again on a duplicate insertion error. At this point there was some disagreement between myself and the operators nursing the system back to health. Their position was that we could simply restart the replication from the correct point and let it continue safely from there. Mine was that once you have restarted replication from an incorrect point in the binary logs, you can no longer be assured that your data is consistent with what is on the master.
Consider, if you will, two brief timelines of events:
Timeline 1:
    insert into foo values 1,2
    insert into foo values 3,4
    delete from foo where bar > 2

Timeline 2:
    delete from foo where bar > 2
    insert into foo values 1,2
    insert into foo values 3,4
OK, so that's fairly nasty PseudoSQL but you get the idea. Regardless of which (contrived) timeline you judge to be the "correct" one, reordering them will lead to a different set of data at the end. In Timeline 1 you will have only values 1 and 2 in your table, whereas in Timeline 2 you will have 1 through to 4. Relying on the fact that you can skip past duplicate insertion errors will not save you from the inevitable updates or deletes that can also change the dataset in non-idempotent ways.
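The effect is easy to demonstrate without a MySQL server at all. Below is a toy Python simulation of the two timelines (the replay function and event encoding are invented for illustration): replaying the same three statements in a different order leaves different data behind.

```python
# Toy model of replaying binlog events against an initially empty table.
# "insert" adds values; "delete_where_gt" mimics DELETE ... WHERE bar > n.
def replay(statements):
    table = set()
    for op, arg in statements:
        if op == "insert":
            table |= set(arg)
        elif op == "delete_where_gt":
            table = {v for v in table if v <= arg}
    return table

timeline_1 = [("insert", (1, 2)), ("insert", (3, 4)), ("delete_where_gt", 2)]
timeline_2 = [("delete_where_gt", 2), ("insert", (1, 2)), ("insert", (3, 4))]

print(sorted(replay(timeline_1)))  # [1, 2]
print(sorted(replay(timeline_2)))  # [1, 2, 3, 4]
```

Same statements, different order, different table: exactly the hazard of restarting replication from the wrong binlog position and skipping past the errors.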
"But that's a completely contrived example" you might very well say. "It's unlikely something like that would happen on the real dataset" may be another phrase used. My answer to that is - as a sysadmin do you know 100% how the application works? Do you know every SQL query that will occur?
Consider Marty going back to 1955 in Back to the Future. The past is changed, and he has to spend a bunch of effort fixing things up so that he is eventually born, and still alive at least until 1985. He gets back to 1985 and a lot of things are not as he remembers. In order to ensure that it was just as he left it, he would have to ensure that every event from 1955 to 1985 happened just as it did in his own timeline. Similarly, to have any surety that your dataset will end up just "similar" to how it should be, you would have to trace through the entire binary log, fixing problems as you go (and even then the outcome would be doubtful).
There are far better uses of our time than making mistakes and pretending they don't exist.