ruby

Safety and discipline in coding

by Oliver on Tuesday, January 28th, 2014.

A couple of years ago at least, at my previous employer, I was just starting to feel reasonably comfortable with Ruby. I can attribute a large part of that to having had a great boss who let me sink a bunch of time into coding up our IaaS and Config Management systems while still being effectively a Systems Administrator, and another large part to reading Metaprogramming Ruby. A lot of the knowledge from the book has now seeped out of my brain, but it is an excellent read and I thoroughly recommend it if you want to know more about Ruby. I think it was on recommendation from Cody Herriges or Ken Barber of Puppet Labs, maybe both – cheers, whoever it was!

Feeling like Ruby was simply the best thing, I was quite quickly cut down by other coworkers with more development experience than I when topics such as Object#method_missing and Object#send were brought up. To a non-Rubyist I can see how these look like gaping vulnerabilities just waiting to be abused, and to be fair they can be (and are abused a lot). Of course, they are also staples of metaprogramming and require discipline and awareness of how to use them safely in that paradigm.

Specifically around the case of Object#send we were comparing method visibility functionality between Ruby, Python and perhaps Java. Ruby seemed to be quite understandable in this regard but the obvious exception pointed out is that Object#send ignores all visibility rules. Arriving exceptionally late to the party now in 2014 (and in my defence, only as a result of some necessary code-diving and lack of using Ruby much recently) I notice that 1.9 and later includes Object#public_send in its API.
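A minimal sketch of the difference, using a made-up Vault class (the class and its method names are purely illustrative; only Object#send and Object#public_send are real Ruby API):

```ruby
# Hypothetical class for illustration only.
class Vault
  def describe
    "vault with combination #{combination}"
  end

  private

  # Private: should only be reachable from inside the object.
  def combination
    '12-34-56'
  end
end

vault = Vault.new

# Object#send bypasses visibility and happily calls the private method.
leaked = vault.send(:combination)

# Object#public_send (Ruby 1.9+) respects visibility and raises instead.
blocked = begin
  vault.public_send(:combination)
  false
rescue NoMethodError
  true
end

puts "send returned #{leaked.inspect}, public_send blocked: #{blocked}"
```

If the method name comes from outside your own code, public_send is probably the safer default; send is still there for the metaprogramming cases that genuinely need to reach private methods.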

Again, since discipline and awareness should be core developer traits, I don’t think it is entirely necessary but it is nice that the API evolved to address the need for better use of built-in method visibility controls. As for me, hopefully I can reduce my cycle time of finding new language features down from years to just a few months 🙂

Asynchronous MySQL queries with non-blocking readiness checks

by Oliver on Sunday, February 17th, 2013.

Well, despite my best intentions, here I am again writing Ruby. I decided to automate a small part of some data analysis I’ve had to do a few times, starting with the database queries themselves. Unfortunately the data is spread over several hosts and databases and the first implementation simply queried them serially. The next iteration used the mysql2 gem’s asynchronous query functionality but still naively blocked on the results retrieval rather than polling the IOs to see when they could be read from.

It doesn’t actually add anything to my script to do this, but it seemed like a small learning opportunity and somewhat interesting so here is the guts of that code:


The code is pretty simple and the comments should reveal the intent of any confusing lines. The only part that was slightly irritating was receiving file descriptor numbers from Mysql2::Client#socket rather than the IO itself, hence having to re-open the same file descriptor.
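As a rough sketch of the polling idea (not the original script), the following uses pipes as stand-ins for the database sockets so that it runs without a MySQL server. With mysql2, the descriptor numbers would come from Mysql2::Client#socket after issuing a query with :async => true, and the rows would then be collected with Mysql2::Client#async_result. The helper name wait_for_ready is made up for this example.

```ruby
# Hypothetical helper: given raw file descriptor numbers, return them in the
# order they became readable. Descriptors that never become readable within
# `timeout` seconds are left out.
def wait_for_ready(fds, timeout = 5)
  # Re-open each descriptor number as an IO object so IO.select can use it.
  # autoclose: false stops the wrapper from closing a descriptor it doesn't own.
  pending = fds.map { |fd| IO.for_fd(fd, autoclose: false) }
  order = []

  until pending.empty?
    ready, = IO.select(pending, nil, nil, timeout)
    break if ready.nil? # timed out with nothing readable

    ready.each do |io|
      order << io.fileno
      pending.delete(io) # stop polling descriptors that are already readable
    end
  end

  order
end

# Demo: two pipes play the role of two backends; only the second has data.
slow_r, _slow_w = IO.pipe
fast_r, fast_w = IO.pipe
fast_w.write('rows')

ready = wait_for_ready([slow_r.fileno, fast_r.fileno], 0.1)
puts "descriptors ready: #{ready.inspect}"
```

Returning as soon as the first descriptor is readable, instead of looping until all of them are, would give the "fastest backend wins" variant.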

In this case I haven’t done anything fancy after checking when the results are ready, but you can see how this could be trivially turned into a system for querying multiple backends for the same data and returning the fastest result which is a quite popular pattern at the moment.

Another Personal Evolution – From Ruby to Go

by Oliver on Friday, January 4th, 2013.

Almost two years ago now, I wrote a post about how I was fed up with resorting to shell scripting as my knee-jerk reaction to computer problems. At the time, I had been attacking any problem that required more than a couple of commands at the prompt by writing a shell (usually BASH) script and hit major limitations that I really should have been solving with a legitimate programming language. I resolved to only resort to Ruby or Python and in that goal I’ve actually been very successful (although I’ve ended up using Ruby around 90% of the time and Python only 10% of the time, which I wish was a little more evenly distributed).

Now I feel as if there is another evolution happening which I need to apply myself to. As a side-effect of the kind of work I’ve been doing, Ruby is just not cutting it. I love the flexibility of it (even despite the numerous ways you can shoot yourself in the foot), and there are some really great libraries like the AWS Ruby SDK which I’ve been using a lot lately. However, when you start wanting to do highly parallelised or concurrent tasks (and this is an excellent talk on the subject), it all starts getting a bit painful. I dabbled in event-based programming last year with NodeJS but found the spaghetti callbacks a bit mind-bending. Similarly with Ruby and EventMachine the code can be less than perfectly understandable. Goliath makes the task somewhat easier (if you are writing a web-service), and em-synchrony follows a similar pattern with Ruby Fibers but they all fall down if you need to use any libraries which don’t make use of asynchronous IO. I briefly looked at Python’s Twisted framework but didn’t find it much better (although that may be an unfair statement, as I didn’t spend much time on it).

I tried a different approach recently and attempted to use the quite awesome JRuby and solve the problem with native threads and the power of the JVM, but hit similar problems with libraries just not working in JRuby. This seems to be a common problem still, unfortunately. The overall result is having no clear option from a Ruby point of view when attempting to make a high-performance application that is also readable and understandable. It’s a bit of a blanket statement, granted, and if I had more constraints on my development environment I might have persisted with one of the options above (there are certainly workarounds to most of the problems I’ve experienced).

Fortunately for me, I have a flexible working environment, buy-in with alternative languages is pretty good and I’m willing to learn something new. Go is a relatively new language, having only been around (publicly) for just over three years, but quite nicely fits my current needs. I won’t go into it technically, as it is all over the interwebs, but I find it relatively easy to read (even for a newbie), and similarly easy to write.

However, I find myself in the same situation I was almost two years ago: it will take some effort to stop the now familiar knee-jerk reaction – this time towards Ruby – and establish the new habit of using Go wherever possible. I’ve just finished up a recent small spare-time project which utilised Ruby so I have free rein to indulge in Go at every possible opportunity. It is scary, but also very exciting – just as it was declaring my intention to use only Ruby almost two years ago.

That’s not to say I’m going to use Go exclusively – I still have to finish up reading (and working) through Seven Languages in Seven Weeks. My intention is not to become a polyglot (I think that’s a bit beyond my capabilities), but I’d at least like to be reasonably proficient in at least one language that solves a given set of problems well. I found that niche with Ruby, and now I am hoping to find that niche with Go. If you haven’t tried it, I thoroughly recommend it.

Personal off-site backups

by Oliver on Saturday, December 29th, 2012.

Unlike many, I’m actually a good boy and do backups of my personal data (for which I can mostly thank my obsessive-compulsive side). However, up until now I’ve been remiss in my duties to also take these backups off-site in case of fire, theft, acts of god or gods etc. Without a tape system or rotation of hard drives (not to mention an actual “off-site” site to store them), this ends up being a little tricky to pull off.

Some of my coworkers and colleagues make use of various online backup services, a lot of which are full-service offerings with a custom client or fixed workflow for performing the backups. At least one person I know backs up (or used to) to Amazon S3 directly; but even in the cheapest of their regions, the cost is significant for what could remain an effectively cold backup. It may be somewhat easier to swallow now that they have recently reduced their pricing across the board.

Glacier is a really interesting offering from Amazon that I’ve been playing with a bit recently, and while its price point is squarely aimed at businesses who want to back up really large amounts of data, it also makes a lot of sense for personal backups. Initially the interface was somewhat similar to what you would expect from a tape system – collect your files together as a vaguely linear archive and upload it with some checksum information. I was considering writing a small backup tool that would make backing up to Glacier reasonably simple but didn’t quite get around to it in time.

Fortunately for me, waiting paid off as they recently added support for transitioning S3 objects to Glacier automatically. This means you get to use the regular S3 interface for uploading and downloading individual objects/files, but allow the automatic archival mechanism to move them into Glacier for long-term storage. This actually makes the task of performing cost-effective remote backups ridiculously trivial but I still wrote a small tool to automate it a little bit.

Hence, glacier_backup. It just uses a bit of Ruby, the Amazon Ruby SDK (which is a very nice library, incidentally), ActiveRecord and progressbar. Basically, it just traverses directories you configure it with and uploads any readable file there to S3, after setting up a bucket of your choosing and setting a policy to transition all objects to Glacier immediately. Some metadata is stored locally using ActiveRecord, not because it is necessary (you can store a wealth of metadata on S3 objects themselves), but each S3 request costs something, so it’s helpful to avoid making requests if it is not necessary.
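The traversal and the "skip what we already know about" step can be sketched roughly as follows. This is not the actual glacier_backup code: the helper name files_to_upload is invented, a plain Hash stands in for the ActiveRecord metadata store, and the S3 upload itself is left out.

```ruby
require 'find'
require 'digest'

# Hypothetical helper: walk the configured directories and return
# [path, checksum] pairs for readable files that are new or have changed
# since the checksum recorded in `seen` (path => checksum).
def files_to_upload(directories, seen)
  pending = []
  directories.each do |dir|
    Find.find(dir) do |path|
      # Only regular files we are allowed to read are candidates.
      next unless File.file?(path) && File.readable?(path)

      checksum = Digest::SHA256.file(path).hexdigest
      # Consulting local metadata avoids paying for a request to S3.
      next if seen[path] == checksum

      pending << [path, checksum]
    end
  end
  pending
end
```

After each successful upload the caller would record the checksum in the local store, so that the next run makes no requests at all for unchanged files.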

It’s not an amazing bit of code but it gets the job done, and it is somewhat satisfying to see the progress bar flying past as it archives my personal files up to the cloud. Give it a try, if you have a need for remote backups. Pull requests or features/issues are of course welcome, and I hope you find it useful!

On Service SDKs and Language Support

by Oliver on Wednesday, November 21st, 2012.

As I’ve previously mentioned, I’ve been doing a lot of work recently with various aspects of AWS on a daily basis (or close to it). My primary language these days is still Ruby, but I’ve been labouring through the excellent Seven Languages in Seven Weeks book in the hope I can broaden my horizons somewhat. I’m fairly comfortable with Python, somewhat familiar with JavaScript now after playing with NodeJS and I have a cursory ability still in C/C++ and Java but it has been over 10 years since I’ve done anything significant in any of those languages.

Suffice to say, I’m far from being a polyglot, but I know my current limitations. Go has been increasingly noticeable on my radar and I am starting to familiarise myself with it, but this has led me to a small realisation. When service providers (like Amazon in this case) are providing SDK support they typically will be catering to their largest consumer base. Internally they largely use Java and that shows by their first-class support for that language and toolchain.

Using the example of Elastic Beanstalk and the language support it provides, you can quite easily determine their current (or recent) priorities. Java came first, with .NET and PHP following. Python came about half-way through this year and Ruby was only recently added. Their general-purpose SDKs are somewhat more limiting, only supporting Java, .NET, PHP and Ruby (outside of mobile platform support). These are reasonable, if middle-of-the-road options.

Today I was attempting to run some code against the Ruby SDK, using JRuby. The amount of work it has to do is significant and parallelisable, and doesn’t exactly fit Ruby’s poor native support (at least in MRI) for true concurrency. I’m not going to gain anything by rewriting in PHP, cannot consider .NET and Java is just not going to be a good use of my time. I feel like there is an impedance mismatch between this set of languages and the scale of what AWS supports.

You are supposed to be scaling up to large amounts of computing and storage to best take advantage of what AWS offers. Similarly, you best make use of the platform by highly parallelising your workload. The only vaguely relevant language from this point of view is Java, but it’s just not a desirable general-purpose language for many of us, especially if we want to enjoy low-friction development as so many newer languages provide.

To be more specific – languages like Go, Erlang (or perhaps more relevant, Elixir), Scala etc offer fantastic concurrency and more attractive development experiences but these are not going to be supported by the official SDKs. It makes perfect sense from the point of view of the size of the developer base, but from the point of view of picking the right tool for the job it doesn’t. Perhaps in a few years this paradigm of highly parallel computing will have gained momentum enough that these languages move to the mainstream (ok, Heroku supports Scala already) and we start to see more standard SDK support for them.

Amazon S3 object deletions and Multi-Factor Authentication

by Oliver on Sunday, October 7th, 2012.

I’ve been using S3 a lot in the last couple of months, and with the Amazon SDK for Ruby it really is dead simple to work with (as well as all of the other AWS services the SDK supports currently). So simple in fact, that you could quite easily delete all of your objects with very little work indeed. I did some benchmarks and found that (with batch operations) it took around 3 minutes to delete ~75000 files in about a terabyte. Single threaded.

Parallelize that workload and you could drop everything in your S3 buckets within a matter of minutes for just about any number of objects. Needless to say, if a hacker gets your credentials an extraordinary amount of damage can be done very easily and in a very short amount of time. Given there is often a several hour lag in accesses being logged, you’ll probably not find out about such accesses until long after the fact. Another potential cause of deletions is of course human error (and this is generally way more probable). In both cases there is something you can do about it.

S3 buckets have supported versioning for well over two years now, and if you use SVN, Git, or some other version control system then you’ll already understand how it works. The access methods of plain objects and their versions do differ slightly but the principal ideas are the same (object access methods generally operate on only the latest, non-deleted version). With versioning you can already protect yourself against accidental deletion, since you can revert to the last non-deleted version at any time.

However there is nothing preventing you from deleting all versions of a file, and with it all traces that that file ever existed. This is an explicit departure from the analogy with source versioning systems, as any object with versions still present will continue to cost you real money (even if the latest version is a delete marker). So, you can add Multi-Factor Authentication to your API access to S3 and secure these version deletion operations.

This has existed in the web API for some time but I recently had a commit merged into the official SDK that allows you to enable MFA Delete on a bucket, and there is another one in flight which will allow you to actually use the multi-factor tokens in individual delete requests. The usage is slightly interesting so I thought I’d demonstrate how it is done in Ruby, and some thoughts on its potential use cases. If you want to use it now, you’ll have to pull down my branch (until the pull request is merged).

Enabling MFA

I won’t go into details about acquiring the actual MFA device as it is covered in sufficient detail in the official documentation but suffice it to say that you can buy an actual hardware TOTP token, or use Amazon’s or Google’s “virtual” MFA applications for iPhone or Android. Setting them up and associating them with an account is also fairly straightforward (as long as you are using the AWS console; the command line IAM tools are another matter altogether).

Setting up MFA Delete on your bucket is actually quite trivial:

require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(:access_key_id => 'XXXX', :secret_access_key => 'XXXX')
bucket = s3.buckets['my-test-bucket']
# :mfa is the MFA device serial number (an ARN), a space, then the current token
bucket.enable_versioning(:mfa_delete => 'Enable', :mfa => 'arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456')

Behind the scenes, this doesn’t do much different to enabling versioning without MFA. It adds a new element to the XML request which requests that MFA Delete be enabled, and adds a header containing the MFA device serial number and current token number. Importantly (and this may trip you up if you have started using IAM access controls), only the owner of a bucket can enable/disable MFA Delete. In the case of a “standard” account and delegated IAM accounts under it, this will be the “standard” account (even if one of the sub-accounts was used to create the bucket).
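To make the shape of that request a little more concrete, here is a sketch that builds the body and header by hand. It is based on the S3 REST API for setting a bucket's versioning configuration rather than on the SDK's internals, and the helper name mfa_delete_request is made up for illustration.

```ruby
# Hypothetical helper: returns the extra header and the XML body for a
# "set bucket versioning" request that also toggles MFA Delete.
def mfa_delete_request(serial, token, enable = true)
  mfa_delete = enable ? 'Enabled' : 'Disabled'
  body = <<~XML
    <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Status>Enabled</Status>
      <MfaDelete>#{mfa_delete}</MfaDelete>
    </VersioningConfiguration>
  XML

  # The header value is the device serial number, a space, then the
  # current token code.
  { :headers => { 'x-amz-mfa' => "#{serial} #{token}" }, :body => body }
end

request = mfa_delete_request('arn:aws:iam::123456789012:mfa/root-account-mfa-device', '123456')
puts request[:headers]['x-amz-mfa']
puts request[:body]
```

The request still has to be signed with the bucket owner's credentials; the MFA header is an additional check, not a replacement for normal authentication.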

Version Deletion with MFA

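With MFA Delete enabled, it is still possible to delete objects without a token (this just adds a delete marker as the latest version), but not to delete individual versions. Version deletion looks much the same as before but requires the serial/token to be passed in: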

require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(:access_key_id => 'XXXX', :secret_access_key => 'XXXX')
bucket = s3.buckets['my-test-bucket']
# Permanently delete one specific version, passing "serial token" as before
bucket.versions['itHPX6m8na_sog0cAtkgP3QITEE8v5ij'].delete(:mfa => 'arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456')

As mentioned above there are some limitations to this (as you’ve probably guessed):

  • Being a TOTP system, tokens can be used only once. That means you can delete a single version with a single token, no more. Given that on Google Authenticator and Gemalto physical TOTP devices a token is generated once every 30 seconds it may take up to a minute to completely eradicate all traces of an object that was deleted previously (original version + delete marker).
  • Following on from this, it is almost impossible to consider doing large numbers of deletions. There is a batch object deletion method inside of AWS::S3::ObjectCollection but this is not integrated with any of the MFA Delete mechanisms. Even then, you can only perform batches of 1000 deletions at a time.
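To see why the token is tied to a 30-second window, here is a small sketch of how a TOTP code is derived. It assumes the standard RFC 6238 parameters (HMAC-SHA1, 30-second step, 6 digits) and a raw byte-string secret; real devices usually hand you the secret Base32-encoded, and decoding that is left out here. The function name totp is just for illustration.

```ruby
require 'openssl'

# Sketch of RFC 6238 TOTP: the code depends only on the shared secret and on
# which `step`-second window the given Unix time falls into.
def totp(secret, time = Time.now.to_i, step: 30, digits: 6)
  # The window number, packed as a 64-bit big-endian counter.
  counter = [time / step].pack('Q>')
  hmac = OpenSSL::HMAC.digest('SHA1', secret, counter)

  # Dynamic truncation: the low nibble of the last byte picks a 4-byte slice.
  offset = hmac[-1].ord & 0x0f
  code = hmac[offset, 4].unpack1('N') & 0x7fffffff

  format("%0#{digits}d", code % 10**digits)
end
```

With the RFC 6238 test secret '12345678901234567890' and Unix time 59 this gives "287082". Any other time in the same 30-second window gives the same code, which is why a second version deletion has to wait for the next window.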

As it stands, I’m not sure how practical it is. MFA is an inherently human-oriented process as it involves something you have rather than something you are or something you know (both of which are reasonably easily transcribed once into a computer). Given the access medium is an API designed for rapid, lightweight use there seems to be an impedance mismatch. Still, with some implementation work to get the batch deletions working it would probably serve a lot of use cases.

Are you using MFA Delete (through any of the native APIs or other language SDKs, or even 3rd-party apps)? I would love to hear about other people’s experiences with it – leave your comments below.

RCov with RSpec > 2.6.0 not running tests

by Oliver on Friday, July 6th, 2012.

Just a quick note since this tripped me up yesterday. If you take a look at the changelog of RSpec between 2.6.0 and 2.6.1 you’ll see that it now “intelligently” senses when being used with RCov and requires the autorun file (which basically just sets the tests to start automatically after they have been set up).

At some point for one of our simple apps I upgraded all of the Gems in use to the latest version, which resulted in RSpec and its components moving to 2.10.x versions. The tests still ran as expected, but what I didn’t notice was the code coverage dropping significantly. I partially blame this on not checking the output of Jenkins frequently enough or making a dashboard of it on our central display, and partially on increasing code coverage. We had pretty poor coverage before, and after the refactor that led to the upgrade of various Gems we actually had a lot more coverage. The net result of added tests but RCov not running properly was that the coverage metrics didn’t change very much in the summary view.

Regardless, without require 'rspec/autorun' present in our Rakefile or spec_helper.rb I’m not entirely sure at first glance why the tests were running successfully before through RCov. As with a lot of issues, there’s just not enough time to debug a working system 🙂 For now, adding require 'rspec/autorun' to our spec_helper.rb, which is required by all spec files, has resolved the problem and our code coverage is back up to reasonable levels!

More on Bundler and RPMs

by Oliver on Friday, June 22nd, 2012.

So I was going to post a comment on the original blog post which I linked to from here but Facebook connect was broken and I don’t feel like setting up yet another account </1stworldproblems>… but there was a slight development.

I attempted to apply the same methodology we had already followed with the first app to another app, so that this one was also packaged using Bundler and RPM. Needing to confirm that all was well before I committed the changes, I did some testing in a CentOS virtual machine in Vagrant. As expected, with a deployment bundle of a decent few gems the package size came out at around 15MB. I committed the changes and the RPM produced by the Jenkins build job was 50MB. Why?

Initially, I suspected subtle differences in Bundler gem versions, library path differences etc but these ended up being dead-ends. What was happening, however, was that the gems were being installed into apprepo/vendor/ruby/1.8, including the excluded groups. I am assuming this is a necessity for the tests and other build-time checks to run, but I certainly didn’t want them to be packaged with the RPM which can rely on just the gem cache.

As it turns out, Bundler has some “smart” code around user permissions – specifically around what commands you can run through sudo. A standard Vagrant box will have unrestricted sudo access for the vagrant user, so it can install gems anywhere. Bundler uses this fact to its advantage and will install them into the standard /usr/lib64/ruby/gems/1.8/gems/ path. Hence, when it comes time to package up the gems as an RPM, these files are not in the app build path and the RPM stays a slim 15MB.

In our build pipeline which uses a standard user account on a fairly normal CentOS install, the jenkins user has no such permissions and thus has no option but to install them into the vendor directory along with the other Bundler artifacts. The solution was simply to exclude this directory from being packaged, although I’m still not entirely sure why we didn’t hit this problem the first time around. Nevertheless, bearing in mind these few gotchas, we now have a system in place that makes it a snap to add more gems and maintain a well-packaged and stable application from development to production.

Bundler, gems and RPMs

by Oliver on Wednesday, June 6th, 2012.

Recently I was working with one of our valued Thoughtworkers on an application we were trying to not only develop in a sane way, but package and deploy to production with just as much sanity. The status quo seems to favour bundler on the development side, but RPMs on the production side (if you judge these decisions based on what developers and ops folk prefer, generally).

After a reasonable amount of WTFing, we actually managed to get it working reasonably well. If you have Ruby apps with gem dependencies and want to develop and push to production with equal ease I suggest you read the blog post on the subject by Philip Potter here: http://rhebus.posterous.com/rpm-ruby-and-bundler

The Duck Always Bites Twice

by Oliver on Tuesday, May 8th, 2012.

These days I’m noticing myself saying more and more frequently that Duck Typing is great, except when it’s not.

An amusing issue that briefly cropped up this afternoon was when we failed to correctly negotiate a data structure inside of a Rake task. Consider the following basic task:

desc "a test task"
task :test, :glob do |t,args|
  if args[:glob].nil?
    args[:glob] = 'some default value'
  end
  puts args[:glob]
end


What output would you expect if you ran rake test right now? If you said nil you’d be right! That’s odd, I wonder what is going on here?

...
puts args
...


Some debugging code later… what is the output? That’s right, it’s an empty hash – {}.

You could forgive us for thinking it might behave as one. Anyway, needless to say we then tried args.class and it turns out to be a Rake::TaskArguments, which evidently decides to make the arguments immutable but in such a way that you never know about it.

What usually happens?

$ irb
irb(main):001:0> class Foo
irb(main):002:1> attr_reader :bar
irb(main):003:1> def initialize(value)
irb(main):004:2> @bar = value
irb(main):005:2> end
irb(main):006:1> end
=> nil
irb(main):007:0> f = Foo.new(5)
=> #<Foo:0x... @bar=5>
irb(main):008:0> f.bar
=> 5
irb(main):009:0> f.bar = 6
NoMethodError: undefined method `bar=' for #<Foo:0x... @bar=5>
	from (irb):9

If you’ve seen the WAT video then you know what’s coming next:

    def method_missing(sym, *args, &block)
      lookup(sym.to_sym)
    end

...

    protected
    
    def lookup(name)
      if @hash.has_key?(name)
        @hash[name]
      elsif ENV.has_key?(name.to_s)
        ENV[name.to_s]
      elsif ENV.has_key?(name.to_s.upcase)
        ENV[name.to_s.upcase]
      elsif @parent
        @parent.lookup(name)
      end
    end


To be fair, this is actually kinda cool. Not only can you do something like args.glob you can also do args[:pwd] or args.term or args.USERNAME.

Unfortunately it also lets you do completely unexpected things, as in the first example: the assignment args[:glob] = ... is handily translated into a call to the method :[]= (which I like to call the Cookie Monster symbol). That method doesn’t exist, so method_missing handles it, returns nothing and throws away the value you attempted to assign. The additional value we supplied was accepted but not used, unlike the typical situation shown in the irb session, where calling a missing setter raises a NoMethodError.
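The effect is easy to reproduce outside of Rake. The class below is a made-up, stripped-down stand-in for Rake::TaskArguments (not its real implementation) that defines [] and a catch-all method_missing, but no []=:

```ruby
# Hypothetical stand-in for illustration only.
class LenientArguments
  def initialize(hash)
    @hash = hash
  end

  # Reads work as you would expect from a Hash.
  def [](name)
    @hash[name.to_sym]
  end

  # Every unknown method, including :[]=, ends up here. Extra arguments
  # (such as the value being assigned) are accepted and then ignored.
  def method_missing(sym, *args, &block)
    @hash[sym.to_sym]
  end

  def respond_to_missing?(sym, include_private = false)
    true
  end
end

args = LenientArguments.new(:pwd => '/tmp')

# Dispatched as args.[]=(:glob, '...'), which only method_missing handles.
args[:glob] = 'some default value'

puts args[:glob].inspect # the assignment was silently dropped
puts args.pwd            # method-style access still works
```

Defining an explicit []= that raises, or freezing the object, would at least turn the silent no-op into an error you can see.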
