Archive for November, 2011

Exceptional circumstances

by Oliver on Thursday, November 24th, 2011.

I’m still building up to my article on how to properly mock calls to Rake’s sh to facilitate testing of your Rakefile tasks but I haven’t quite worked up the strength to trace the calls through the libraries and into Eigenclass wonderland. For the moment, I’ve got just enough fodder to write a bit about determinism in testing.

Since Friday, both a coworker and I have been hit by the same testing failure. Not the exact same one mind you as we are working on two different pieces of software but the same failure in that the order of our tests matter. Anyone experienced in TDD will tell you that you need to make sure your tests don’t depend on the order in which they run so that your units of testing don’t build upon previous assumptions and don’t fully test each unit in isolation. Easier said than done unfortunately.

It would be nice if we could test the cartesian product of all of our units in all possible orders to rule out any unplanned dependency but it is simply not feasible for anything but the smallest programs. In any case, our small programs managed to cause a small bit of hell just fine, thankyou! Enough babbling and onto the code:


desc 'Run tests'
RSpec::Core::RakeTask.new('test') do |t|
  t.pattern = 'spec/*_spec.rb'
end

This is a fairly standard task that sets up RSpec tests using RSpec’s Rake helpers. Not much to see here – except for the fact that my two test files run in opposite orders on different machines. I managed to rule out different Ruby versions (exactly the same, installed in both places by RVM), Gem versions (same of Rake, RSpec and all relevant dependencies), even the locale on both machines was identical to rule out sorting order differences in the filenames.

RSpec uses RSpec::Core::RakeTask#pattern to assemble a FileList with the pattern you have set. FileList (defined by Rake) basically uses Dir.glob to get its dirty work done:


# Add matching glob patterns.
    def add_matching(pattern)
      Dir[pattern].each do |fn|
        self << fn unless exclude?(fn)
      end
    end

Dir[] just aliases Dir.glob, and within the source of that you can find the following:


dp = readdir(dirp->dir);

readdir(3) just returns the directory entries in the order they are linked together, which is also not related to inode numbering but as best as I can tell is from outer leaf inwards (since the most recently created file is listed first).

Now I have some idea of why the testing order can be different, but I'm no closer to the cause of the problem - my tests succeed when run in one order and not in the opposite order. The errors look something like this:


  1) Nagios#new raises an exception when the command file is missing
     Failure/Error: expect { Nagios.new(@cmd_file, @status_file) }.to raise_error(NagiosFileError, /not found/)
       expected NagiosFileError with message matching /not found/, got #
     # ./spec/nagios_spec.rb:26

Hmm, a bare Exception object with no information about where it came from. I had a few suspicions:

  • I have no idea how to code Ruby
  • I changed the code in some subtle way and broke the tests legitimately
  • Some Mock object's lifetime is longer than expected and sticking around

This last idea seemed most plausible since I was able to put some debugging code into the constructor of the object and it was not being called at all. Mocha usefully has an unstub method which allows you to remove stubs on an object/class you had previously set up and return it to its previous state, but this seemed to be a no-go:


     Failure/Error: Nagios.unstub(:new)
       The method `new` was not stubbed or was already unstubbed

I installed the very useful ruby-debug and invoked that just before the failing tests started and did some poking around but wasn't able to find much (although that's probably more to do with lack of skill than the debugger lacking functionality). How to proceed? Unfortunately the easiest way forward seemed to be the brute force method - comment out tests in the preceding file until it starts working then narrow the field. Fortunately the problem uncovered itself immediately:


  # Change the existing mock to be something we can pick up in output
  before(:each) do
    Nagios.stubs(:new).returns('surprise !')
  end

Then in the test output:


 NoMethodError:
       undefined method `cmd_file' for "surprise !":String
     # ./spec/nagios_spec.rb:55

Hmm, that's no good. The stubbing is surviving between test files. TL;DR - It's good to read the documentation! RSpec supports several mocking frameworks - FlexMock, Mocha, RR, bring your own framework and of course, RSpec itself. I already knew of this before but since I've been working mostly with Test::Unit up until recently it didn't register in my brain whatsoever and I just reached for Mocha out of habit.

It turns out that if you use a different mocking framework but don't tell RSpec about it, weird stuff happens - that is to say, the kind of behaviour you see above. You add a small block to your test code something like this:


RSpec.configure do |config|
  config.mock_framework = :mocha
end

Once I had confirmed this was the actual problem and my tests were passing, I decided to rewrite all my mocking code to use RSpec mocking anyway - less gems, easier to maintain.

So the main takeaways from this experience are:

  • Read the documentation!
  • Ruby is powerful and frequently easy to use, but can still not be easy to interrogate when there are problems happening that don't seem to make sense.
  • Having source code on hand to see what your software is really doing is an awesome thing. Admittedly, the metaprogramming behind testing and mocking frameworks can be some mind-bending stuff, but the code is there when you want to dive in.
  • Testing can be frustrating, but awesome.
  • Ordering is not always the problem. Try to get your tests order-independent, but just like code coverage it can be a large time sink and the law of diminishing returns applies.

Tags: , , , , , , ,

Thursday, November 24th, 2011 Tech 1 Comment

Making a ‘complete’ service

by Oliver on Friday, November 18th, 2011.

One of the things that is pushed here a lot is the necessity to make all new applications really production-ready before being deployed as a service. There is of course the adage “If it’s not monitored, it doesn’t exist” but beyond that there are a lot of other little fiddly details just to make something run. I’m talking about things like init scripts, logging etc.

It’s very easy to get wrapped up writing code for some new idea you have or perhaps extending some existing codebase, and with most frameworks it is relatively easy to run your app from a terminal on your local workstation. What changes when you want to take it to a real server is generally (and I’m limiting this post to these topics):

  • packaging
  • service lifetime management
  • monitoring
  • logging

This is not an exhaustive list. When making these additions to my Sinagios app I felt more pain than expected, and suddenly it made sense why we tend to see a lot of applications that fall down so much in these areas – it’s really boring and stupid work. Seriously. It’s not fun, and so much of it could be automated away but sadly we don’t live in that kind of a world.

Packaging is largely a non-issue thanks to such tools as FPM (although it has some minor issues still), and it is all too simple to add some basic monitoring to a Sinatra app:


# Health check for monitoring systems
get '/v1/health/?' do
  # Just try to verify the command and status files look ok, rescue the
  # exception and allow the message to propagate to the output with a
  # reasonable error code.
  begin
    nagios = Nagios.new
  rescue Exception => detail
    body detail.message
    status 500
  else
    body 'OK'
    status 200
  end
end

The next pain point was generating the init script. I’d really like to see a somewhat more specific framework than the one currently in place – shell script is ubiquitous but sometimes it is just annoying. Sadly most apps still don’t fit into the generic functions available in /etc/init.d/functions and I found it took an hour or so to craft the file just right to get my app launching nicely with rackup. There should be some convention over placement and naming of entities like PID files, process names, what to do with standard in/out/error, run users, log/run/work directories etc etc. It could be so much easier, but it’s a problem for smarter people than I.

Finally I hit logging and this was a really irritating one, not least because the Rubyforge website was down all day and I couldn’t get to the documentation. In daemon mode, rackup will still output errors to STDERR so you have the option of either redirecting that at invocation time or in the config.ru file (and possibly later as well, I didn’t bother to try). Standard output (which in interactive mode will accept the access logging) ceases to offer anything useful as Rack wraps up logging in its CommonLogger class.

There are a multitude of different ways to skin this cat but the simplest seems to be telling Rack (through Sinatra) which logging “middleware” to use, in our case just a simple instance of Logger (effectively just appending to a file, but it can also add timestamps, priorities, and take care of rotation). Sadly the basic example fails still:


logger = Logger.new('/var/log/sinagios/access.log')
use Rack::CommonLogger, logger

This leads to:

!! Unexpected error while processing request: undefined method `write' for #<Logger:0x7f6f28d033d0>

But only if you are running non-daemonized!

That’s right – if you add this code to your app it will just break it and not leave a trace in any logs anywhere. You will have to run it interactively again to figure out the problem, although fortunately it is staring us right in the face. Rack::CommonLogger expects the logger interface to support the write method, which Logger does not at the moment. Let’s just alias it to the append operator:


Logger.class_eval { alias :write :'<<' }

This will have Logger just append the log entries without any additional formatting or timestamps as Rack::CommonLogger already adds these for us. I also added logrotate fragments to ensure the logs don’t consume the entire disk (although connecting to Scribe would have been better, but is a project for another day). This work has been somewhat enlightening but not exactly thrilling – I can see why it is such a chore for developers and frequently left out altogether. Now it is done though, I can at least reuse it but it would be nice to see some better system frameworks for these common and boring tasks for getting the real stuff running in production.

Tags: , , , , , ,

Friday, November 18th, 2011 Tech No Comments

Sinagios – a very simple RESTful API to Nagios downtime

by Oliver on Monday, November 14th, 2011.

Last week I spent a bit of time scratching an itch I’ve had with doing maintenance. We use Nagios for our monitoring (without any addons, plugins or frontends) and scheduling downtime is a pain. Not only that, but deleting downtime is impossible short of screenscraping or (shock) actually clicking in the user interface. When I’m done doing maintenance I want to delete all the downtime (regardless of if the window will naturally come to an end soon or not) because I want notifications to start sending immediately if there is a real problem.

Naturally I did some diligence beforehand as this is hardly a topic that hasn’t come up before, and sure enough some guy has implemented something similar but in Python. Nevertheless I wanted to do some coding so I present to you Sinagios.

It’s a pretty simple Sinatra app (and thus obviously Ruby), in fact it only does three things:

  • Lists currently scheduled downtime
  • Schedules new downtime
  • Deletes downtime

I also got to dive into a bit of RSpec and Rack-test since I’ve mostly gotten away with Test::Unit before and non-web-facing systems so it was an interesting change. Fairly soon I’ll be uploading a spec file for making RPMs and adding some Puppet recipes to deploy it so that it’s a more complete set of stuff.

Hopefully someone will find it useful!

Tags: , , ,

Monday, November 14th, 2011 Tech 1 Comment

DevOpsDays Göteborg Ignite talk videos up!

by Oliver on Tuesday, November 8th, 2011.

I’m trying to take every opportunity that presents itself to blab about what I’m working on, so I gave an Ignite talk at the recent DevOpsDays in Göteborg, Sweden. The videos are now up, and fortunately for you I spoke first so I’m right at the start of the video.

It was my first Ignite talk and my memory is always hopeless so there was a bit of slide confusion but it resolved itself quite nicely 😉

Enjoy!

Tags: , ,

Tuesday, November 8th, 2011 Tech No Comments