Archive for May, 2011

How we use Vagrant as a throwaway testing environment

by Oliver on Tuesday, May 31st, 2011.

As I mentioned in my presentation at PuppetCamp, one of the goals of our Puppet code testing was to run Puppet Faces-style compile tests on all of our modules. Since this requires a somewhat realistic deployment environment, a clean system, and our modules installed as packages (that’s how we distribute our own modules), it makes sense to run this stage in a throwaway virtual machine.

This is where Vagrant comes in. I won’t dive into what Vagrant is all about since the website describes it well, but instead try to focus on my testing methodology. Using Vagrant VMs is a natural fit for this use case, but there are a few aspects which also make it a bit difficult to do this kind of testing (probably irrespective of the tool you use):

  • VMs are typically set up on a private NATed network on the hypervisor machine. This has the added benefit of not requiring cooperation from the regular network, but means that you cannot straightforwardly connect to each VM as needed by your testing system.
  • Related to the previous point, your test jobs will be running on a Jenkins build master or slave, which may then need to SSH/RPC to the Vagrant/Virtualbox host, which will then need to SSH to the Vagrant VMs. Not very nice.
  • Vagrant has the same VM spin-up lag problems that the cloud has – you want a new instance to become available rapidly so testing completes fast. Alternatively you can pre-build your VMs but this introduces complexities in the testing system.
  • Related to the previous point, bugs/limitations in Vagrant/Virtualbox mean that you can’t spin up several VMs simultaneously due to some race conditions.
  • Your testing system exists outside the VMs – you want to get your code to test inside the VM.

These were probably the hardest problems I had to solve (aside from just writing the Rake tasks to wrap around Puppet Faces). I’ll break these down into sections and describe the solutions. I won’t go into the usual nightmare of how to get an appropriately recent version of Ruby/Rubygems etc onto a CentOS 5.5 machine so that you can actually run Vagrant – that’s surely been solved many times already.

Networking

This was a relatively easy one. Vagrant VMs run on a host-only NATed network, but we need the Jenkins jobs to initiate work on the Vagrant host in the first place. I decided the easiest course of action would be to make the physical machine a Jenkins slave right from the start. So the master connects to this machine as the jenkins user, which is also the user running all of the Vagrant stuff. Jenkins owns the VMs, so to speak, which makes connectivity and permissions on them much easier.

Once a job has started it is possible, by using the vagrant command, to connect to any of the VMs and run commands on them, destroy them, bring them back up again etc.

SSH and jobs within jobs

As I mentioned in a previous post, working with Net::SSH can be fiddly, but it is by far preferable to command line SSH invocation and all of the usual string-manipulation, input/output and error-handling nightmares that entails. The basic principle for running these test jobs is:

  1. Run the job from Jenkins, which runs an initial rake task on the Vagrant hypervisor machine (the test wrapper)
  2. The test wrapper calls the Vagrant libraries and sets up configuration for the VM in question.
  3. The build tools (previously checked out from version control to the Vagrant hypervisor machine) are copied to the Vagrant VM’s shared directory on the hypervisor. They are now available to the VM as well.
  4. A Net::SSH call is made to the VM, which calls the “real” test task from the build tools in the shared folder.
  5. Testing runs in the VM.
  6. Net::SSH call to the VM finishes, and the VM is destroyed.
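
The six steps above can be sketched in Ruby. This is a minimal sketch only: run_remote stands in for the Net::SSH call, and the names here (build tools layout, rake task) are illustrative rather than the real toolchain's API.

```ruby
require 'fileutils'

# Sketch of the wrapper's VM-facing steps (3-6). The vm object and
# run_remote callable are injected; run_remote executes one command in
# the VM and returns its exit status.
def run_vm_test(vm, run_remote, build_tools, shared_dir)
  # Step 3: build tools land in the shared folder, visible inside the VM.
  FileUtils.cp_r(build_tools, shared_dir) if File.directory?(build_tools)
  # Steps 4-5: a single SSH call runs the "real" test task inside the VM.
  rc = run_remote.call("cd #{shared_dir}/build_tools && rake test:compile")
  # Step 6: the VM is recycled regardless of the result.
  vm.destroy
  rc
end
```

The single exit status is all the structured information the wrapper gets back, which is why everything downstream leans on exit codes.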

Now we really only have a single RPC via Net::SSH, which is relatively easy to manage programmatically. Of course, we lose granularity in the output since we are running through an SSH session: we cannot raise exceptions or pass structured data, but have to rely on exit codes and STDOUT/STDERR. So far this doesn’t seem like a big limitation in this use case.

VM Lifetime

My first mental approach was to have a number of VMs constantly running on the machine (roughly equal to the number of physical CPU cores), and a helper script or daemon that handled allocation of VMs to incoming test jobs. After each job completes, the helper would mark the VM as unavailable, destroy and recreate it from the base box image and when ready again return it to the available pool of VMs. This seemed like a nice idea at first but I realised the coding would take some time.

As soon as I had a basic idea for what the testing tasks themselves would look like, I decided to implement them for a single job first as a proof of concept. It was hard-coded to a single VM which made things much simpler, and it just worked. Pretty soon I moved on to creating the rest of my test jobs and had to find a way to handle the parallel builds. The simplest possible solution (when using Jenkins, at least) is to prepare a fixed number of VMs (say 8, to match the physical number of cores) and tie each one to a specific build executor of the Jenkins slave (you obviously need to also configure Jenkins to use no more than this number of executors on the slave).

Thus, as long as your create/destroy sequences work (hint: put the destroy in the ensure section of a begin/ensure block) you will have no problem running jobs continually – each new job run gets a fresh instance of the VM, tied to that executor. The problem I ran into here was the bug/limitation of Vagrant and/or VirtualBox that prevents multiple simultaneous VM creations. If the number of jobs exceeds the number of build executors and the jobs are doing similar things, they will all finish at similar times and attempt to cycle their VMs at the same time, which reliably fails.
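
The executor-tied cycle can be sketched like this. It's a sketch only: the per-executor directory layout mirrors what's described here, and the vagrant commands are the stock CLI.

```ruby
# One Vagrant project directory per Jenkins build executor.
def vm_dir(executor)
  "puppetvm#{executor}"
end

# Bring the executor's VM up, run the block, and always destroy the VM
# afterwards so the next job on this executor gets a fresh instance.
def with_fresh_vm(executor)
  dir = vm_dir(executor)
  system("cd #{dir} && vagrant up") or raise "vagrant up failed for #{dir}"
  yield dir
ensure
  # Recycle even when the tests inside the block failed.
  system("cd #{dir} && vagrant destroy")
end
```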

It seemed like I was back at square one, looking at having a smart helper script which automated VM recycling. I actually implemented locking around what I judged to be the critical code section where the VMs were created but I continued to experience problems. In the end, Patrick Debois of DevOpsDays fame made a suggestion to keep all the VMs running and use his sahara add-on to Vagrant in order to do lightweight snapshots/rollbacks on the VMs rather than a full-blown create/destroy between tests. Now the general run procedure looks like:

  1. Start wrapper task on Vagrant hypervisor machine
  2. Call real test task through Net::SSH to VM
  3. After Net::SSH call completes, run vagrant sandbox rollback on VM

The rollback takes a matter of seconds (the snapshot is generated just once, when I first create each VM at hypervisor boot), and each compile job now takes about 45 seconds in total. This is far better than the sub-10-minute goal I had originally set, and the system is working much better than I had imagined, despite being effectively a “workaround” for my original design.
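
With sahara installed, the wrapper's recycle step collapses to a rollback. A minimal sketch, assuming only that sahara provides the vagrant sandbox subcommands:

```ruby
# Run a test block against an already-running, snapshotted VM, then roll
# it back – seconds of work instead of a full destroy/up cycle.
def with_snapshotted_vm(dir)
  yield dir
ensure
  # sahara's rollback returns the VM to the snapshot taken at boot.
  system("cd #{dir} && vagrant sandbox rollback")
end
```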

Sharing code with the VM

A while back I forcibly separated my build toolchain from my Puppet code, so that our Puppet modules could be stored in any repository – Dev teams maintain their own modules but build them using our common toolchain. This has generally been a success, and as a bonus it also makes the modules pretty easy to share and copy around. In the case of Vagrant, Jenkins automatically checks out the build toolchain at the start of the job and calls the test wrapper task.

The test wrapper task then copies the checked-out build toolchain directory into the Vagrant VM’s shared directory (in my case, /vagrant/puppetvm#{ENV['BUILD_EXECUTOR']}, which is the directory that the VM’s Vagrantfile lives in). Inside the VM, the files are also visible at the same path. The basic mechanism for the test is now as follows:

  1. Copy the build tools to the VM shared directory
  2. Call Net::SSH to the VM and install the Rake gem
  3. Call Net::SSH to the VM and run rake on the “real” test task
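
Steps 2 and 3 boil down to a short command list run over the single SSH session. Sketching just the command construction (the rake task name and build_tools subdirectory are assumptions; the shared folder path matches the layout described above):

```ruby
# Commands run inside the VM via Net::SSH, in order.
def vm_test_commands(executor, mod)
  shared = "/vagrant/puppetvm#{executor}"
  [
    # step 2: the VM is throwaway, so installing the gem fresh is fine
    "gem install --no-ri --no-rdoc rake",
    # step 3: the "real" test task from the shared build toolchain
    "cd #{shared}/build_tools && rake compile MODULE=#{mod}"
  ]
end
```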

Actually running the tests

So what does my “real” test task look like? This will look different for everyone, so I’ll just list the general steps:

  1. Set up Yum repository fragments so we have access to the Puppet packages, and the repository that contains our syntax-checked Puppet modules.
  2. Yum install puppet and supporting packages (puppet-interfaces and interface-utils, pre-release parts of Puppet Faces that we’ve been using for a while) and the package of the Puppet module we wish to test.
  3. Run Puppet Faces compile tests on the module
  4. Exit with a reasonable exit code for the wrapper task to catch and use as the result of the job.
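
As a rough sketch in the same style, the in-VM task reduces to a handful of shell steps. Everything here is illustrative – the repo fragment location and task invocation are assumptions; only the package names come from the text:

```ruby
# Shell steps for the "real" test task, run inside the throwaway VM.
def compile_test_steps(mod, shared)
  [
    # 1. repo fragments: Puppet packages + our syntax-checked modules
    "cp #{shared}/repos/*.repo /etc/yum.repos.d/",
    # 2. puppet, the pre-release Faces pieces, and the module under test
    "yum -y install puppet puppet-interfaces interface-utils #{mod}",
    # 3. the Faces compile test; its exit code becomes the job result (step 4)
    "cd #{shared}/build_tools && rake compile MODULE=#{mod}"
  ]
end
```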

The hardest part of all this is making changes to the testing toolchain, since by the time you want to look at how something failed, your VM has already been rolled back. This is just a sad fact of tying VMs to build executors; we can’t leave failed builds lying around with their VMs (especially in the case of real test failures). If anything, it has strengthened the need to pull core logic out of the Rake tasks and into a separate set of build libraries that can be tested separately with Test::Unit or RSpec – though given how much of the testing relies on shell interaction, it is difficult to test adequately (especially once you resort to mocking).


Tuesday, May 31st, 2011 Tech 7 Comments

Video of talk from Puppet Camp 2011 Amsterdam

by Oliver on Friday, May 20th, 2011.

Enjoy.


Friday, May 20th, 2011 Tech 1 Comment

How we use cucumber-puppet

by Oliver on Monday, May 16th, 2011.

I recently got a question by email, in follow-up to my presentation at Puppet Camp in Amsterdam, about how we use cucumber-puppet. I touched on the subject only briefly in my talk, but what I did say is that it revolutionised my approach to Puppet in general. Don’t get too high an opinion of the tool from that statement! Behaviour-driven development in general was a new thing to me and did change my way of thinking, but my opinion of it in conjunction with Puppet has shifted slightly over the months.

Before I go into too much depth, let’s take a look at the tool and how it is used. To be fair, cucumber-puppet is a good tool (as is cucumber itself and cucumber-puppet’s cousin, cucumber-nagios). Typically you’ll start off by running cucumber-puppet-gen world in your Puppet code repository and let it generate the infrastructure necessary to start writing your own tests. The generated files fall into three main categories:

  • modules – where you actually write your high-level language tests
  • steps – the Ruby-language breakdowns to help cucumber-puppet turn natural-language requests into things to test in the Puppet catalog
  • support – cucumber-puppet settings and globals

As you might have noticed, you direct cucumber-puppet with natural language, which gets translated into native Ruby and applied as tests against various content in the Puppet catalog. It’s actually not that much magic. Let’s look at an example feature:


Feature: Base repositories
  In order to have a system that can install packages
  As a sysadmin
  I want all of the CentOS repositories to be available

  Scenario: CentOS yum repositories
    Given a node of class "yum::base"
    When I compile the catalog
    Then there should be a yum repository "Base"
    And there should be a yum repository "Updates"
    And there should be a yum repository "Extras"

Pretty cool huh? The most important aspect of this is that it is readable by humans. As you go on, though, you’ll realise it is somewhat verbose and prone to much repetition. Anyway, let’s take a look at some of the steps that make this work. You’ll notice we said a node of class "yum::base". It’s not exactly a real node; we are just directing cucumber-puppet to compile a catalog containing a single class – yum::base – and treat it as if that were the entire node.


Given /^a node of class "([^\"]*)"$/ do |klass|
  @klass = klass
end

Then /^there should be a yum repository "([^\"]*)"$/ do |name|
  steps %Q{
    Then there should be a resource "Yum::Repo[#{name}]"
    And it should be "enabled"
  }
end

Then /^it should be "(enabled|disabled)"$/ do |bool|
  if bool == "enabled"
    fail unless @resource["enabled"] == "1"
  else
    fail unless @resource["enabled"] == "0"
  end
end

Then /^there should be a resource "([^\"]*)"$/ do |res|
  @resource = resource(res)
  fail "Resource #{res} was not defined" unless @resource
end

Those are all the steps necessary to make the previous feature work. They should be fairly clear even if you have no idea about Ruby or cucumber-puppet. Some important items to note:

  • Steps can call other steps, so that you have a useful abstraction mechanism between many different things that test resources in similar ways – e.g. for presence in the catalog.
  • Yes, not all the feature text is actually parsed. A lot of it is human-understandable, but computationally-useless fluff.
  • We have replaced the built-in Yum provider with our own (that is just a managed directory and template files in a defined type) – as the Yum provider famously doesn’t support purging.
  • @resource["enabled"] == "1" looks wrong, but that’s how Yum repositories represent settings, so we mirror that here, even if it is not strictly boolean.

You do get a lot of steps for free when you run cucumber-puppet-gen world (like the first and last I’ve quoted), so you don’t have to come up with it all by yourself. This is the general style of testing with cucumber-puppet up to at least version 0.0.6: very verbose, liable to duplicate your Puppet code at almost a 1:1 ratio (or even beyond it), but still a very useful tool for refactors and the like. Starting with 0.1.0 it became possible to create a catalog policy – think of it as a step closer to reality: instead of testing arbitrary aspects of your Puppet code against fake machine definitions (or even real ones with faked facts), you can now test real facts from a real machine and be sure that the machine’s catalog compiles.

We’re not doing this just yet (I’ve been busy working on other aspects of our Puppet systems), but in theory it is not much work to get going. You do, however, need some mechanism for transporting your real machine YAML files from your Puppetmasters to your testing environment for runs through cucumber-puppet. While this is definitely a step forward, it also gets into territory that more people are probably considering: pre-compiling catalogs on your real Puppetmasters and checking for success/failure there. It also gives you the ability to check for differences between catalogs on the same machine when inputs change (i.e. your configuration data changes) – Puppet Faces will give you the ability to do this quite easily.

Another couple of cool things about cucumber-puppet before I sign off. Because it is based on cucumber, it can generate output in many different useful formats. For example, you can output a JUnit report (in XML format). Jenkins supports JUnit reports natively, so you can run your cucumber-puppet tests in a Jenkins job and have the JUnit test results integrated into the build result and history. Very cool.

Finally, since you are testing catalogs with cucumber-puppet you can make tests for just about anything in those catalogs. For example, if you are generating some application configuration using an ERB template and want to check that certain values have been correctly substituted, you can just test what has been generated as the file content:


Scenario: Proxy host and port have sensible defaults
  Given a node of class "mymodule::myapp"
  And we have loaded "test" settings
  And we have unset the fact "proxy_host"
  And we have unset the fact "proxy_port"
  When I compile the catalog
  Then there should be a file "/etc/myapp/config.properties"
  And the file should contain "proxy.port=-1"
  And the file should contain /proxy\.host=$/

----

Then /^the file should contain "(.*)"$/ do |text|
  fail "File parameter 'content' was not specified" if @resource["content"].nil?
  fail "Text content [#{text}] was not found" unless @resource["content"].include?(text)
end

Then /^the file should contain \/([^\"].*)\/$/ do |regex|
  fail "File parameter 'content' was not specified" if @resource["content"].nil?
  fail "Text regex [/#{regex}/] did not match" unless @resource["content"] =~ /#{regex}/
end

The complexity and coverage of your tests are limited only by your inquisitiveness with respect to the Puppet catalog, and your Ruby skills (both of which are easily developed).


Monday, May 16th, 2011 Tech 2 Comments

reblog: Modeling Class Composition With Parameterized Classes

by Oliver on Wednesday, May 11th, 2011.

http://www.puppetlabs.com/blog/modeling-class-composition-with-parameterized-classes/

I’d like to point out a fine blog post by Dan Bode on parametrised class theory and application. No surprises here – I’ve done a lot of work with Dan, and our new Puppet implementations reflect this – but I find it to be a very logical way of working.


Wednesday, May 11th, 2011 Tech No Comments

CI and Vagrant SSH sessions

by Oliver on Sunday, May 8th, 2011.

If you saw my talk in Amsterdam or read my slides (or even work with me, <shudder/>) you would know that one of my stages of Puppet module testing is compile tests using Puppet Faces (at least a pre-release version of it, since we started using it a while ago). I’m just putting the finishing touches on the system so that it uses dynamically provisioned Vagrant VMs, and one of the slightly fiddly bits is actually running stuff inside the VMs – you’d think this would be a pretty large use case in terms of what Vagrant is for, but I’m not sure how much success is being had with it in reality.

Vagrant wraps the Ruby Net::SSH library to provide SSH access into its VMs (which just uses VirtualBox’s port forwarding, since all of the VMs run on a NATed private network on the host). You can easily issue vagrant ssh from within a Vagrant project root and it will just work. You can even issue the secret vagrant ssh -e 'some command' line to run a command within the shell, but it wraps Net::SSH::Connection::Session.exec!, which is completely synchronous (probably what you want) and buffers all output until the process has terminated and the exit code is known (probably not what you want). Especially when the process takes a few minutes and you are watching Jenkins’ job console output, you want to see what is going on.

Unfortunately Vagrant’s libraries don’t expose the full richness of Net::SSH, but even if they did you wouldn’t be much better off. Net::SSH gives you connection sessions in which you can issue multiple commands synchronously (as Vagrant typically does), or multiple commands asynchronously – and basically this equates to “assume they will run in parallel, in no particular order”. There is also no direct handling of output and return codes – you need to set up callbacks for these. What this all amounts to is a bit of hackery just to get line-by-line output for our Jenkins job, and capture the return codes of each command properly.

#!/usr/bin/env ruby
require 'rubygems'
require 'net/ssh'

# Run one command over the session, streaming output as it arrives,
# and return the command's exit status.
def ssh_command(session, cmd)
  session.open_channel do |channel|
    channel.exec(cmd) do |ch, success|
      ch.on_data do |ch2, data|
        $stdout.puts "STDOUT: #{data}"
      end
      ch.on_extended_data do |ch2, type, data|
        $stderr.puts "STDERR: #{data}"
      end
      # Fires when the remote process exits; the non-local return hands
      # the exit status straight back to the caller during session.loop.
      ch.on_request "exit-status" do |ch2, data|
        return data.read_long
      end
    end
  end
  session.loop
end

Net::SSH.start 'localhost', 'vagrant' do |session|
  ['echo foo >&2','sleep 5','echo bar','/bin/false','echo baz'].each do |command|
    # Bail out at the first failing command, propagating its exit code.
    if (rc = ssh_command(session,command)) != 0
      puts "ERROR: #{command}"
      puts "RC: #{rc}"
      session.close
      exit(rc)
    end
  end
end

What this should give you is:

$ ./ssh_test.rb 
STDERR: foo
STDOUT: bar
ERROR: /bin/false
RC: 1
$ echo $?
1

Vagrant 0.7.3 sets up a read-only accessor to the Net::SSH::Connection::Session object contained within the Vagrant::SSH object, so it’s easy to hook into that and use the code above to get slightly more flexible SSH access to the VM for our CI tasks:

commands = ['some','list','of','commands']

env = Vagrant::Environment.new(:cwd => VMDIR)
env.primary_vm.ssh.execute do |ssh|
  commands.each do |c|
    if (rc = ssh_command(ssh.session,c)) != 0
      puts "ERROR: #{c}"
      puts "RC: #{rc}"
      ssh.session.close
      exit(rc)
    end
  end
end


Sunday, May 8th, 2011 Tech No Comments

strace leaving your processes in “stopped” state?

by Oliver on Thursday, May 5th, 2011.

https://bugzilla.redhat.com/show_bug.cgi?id=590172

I can’t tell you how many times this has bitten me, and left me completely confused. Now I know it is just another bug… sigh.


Thursday, May 5th, 2011 Tech No Comments