Service SDKs and Language Support Part 2

by Oliver on Sunday, January 20th, 2013.

As I wrote previously, I’ve found a mismatch between the goals of large cloud services like Amazon Web Services and the languages they support, one that conflicts with the aim of building highly concurrent and parallelised workflows.

Of course the obvious follow-up to that post (embarrassingly obvious, given how much I’ve been mentioning Go recently) is to point out that Google’s App Engine is doing this right by supporting Go as a first-class language, even providing an SDK for several platforms.

I haven’t had a chance to use App Engine so far, but I’d like to in the future. Unfortunately, Google’s suite of services is not nearly as rich as what AWS provides right now, but I’m sure they are working hard on achieving feature parity in order to pull more customers over from AWS.



On Service SDKs and Language Support

by Oliver on Wednesday, November 21st, 2012.

As I’ve previously mentioned, I’ve been doing a lot of work recently with various aspects of AWS on a daily basis (or close to it). My primary language these days is still Ruby, but I’ve been labouring through the excellent Seven Languages in Seven Weeks book in the hope that I can broaden my horizons somewhat. I’m fairly comfortable with Python, somewhat familiar with JavaScript now after playing with NodeJS, and I still have a cursory ability in C/C++ and Java, but it has been over 10 years since I’ve done anything significant in any of those languages.

Suffice it to say, I’m far from being a polyglot, but I know my current limitations. Go has been increasingly noticeable on my radar and I am starting to familiarise myself with it, but this has led me to a small realisation. When service providers (like Amazon in this case) provide SDK support, they typically cater to their largest consumer base. Internally they largely use Java, and that shows in their first-class support for that language and toolchain.

Using the example of Elastic Beanstalk and the languages it supports, you can quite easily determine their current (or recent) priorities. Java came first, with .NET and PHP following. Python arrived about half-way through this year, and Ruby was only recently added. Their general-purpose SDKs are somewhat more limited, supporting only Java, .NET, PHP and Ruby (outside of mobile platform support). These are reasonable, if middle-of-the-road, options.

Today I was attempting to run some code against the Ruby SDK, using JRuby. The amount of work it has to do is significant and parallelisable, and sits poorly with Ruby’s weak native support (at least in MRI) for true concurrency. I’m not going to gain anything by rewriting in PHP, cannot consider .NET, and Java is just not going to be a good use of my time. I feel like there is an impedance mismatch between this set of languages and the scale of what AWS supports.

You are supposed to be scaling up to large amounts of computing and storage to best take advantage of what AWS offers. Similarly, you make best use of the platform by highly parallelising your workload. The only vaguely relevant language from this point of view is Java, but it’s just not a desirable general-purpose language for many of us, especially if we want to enjoy the low-friction development that so many newer languages provide.

To be more specific: languages like Go, Erlang (or, perhaps more relevantly, Elixir), Scala and so on offer fantastic concurrency and more attractive development experiences, but these are not going to be supported by the official SDKs. It makes perfect sense from the point of view of the size of the developer base, but not from the point of view of picking the right tool for the job. Perhaps in a few years this paradigm of highly parallel computing will have gained enough momentum that these languages move into the mainstream (OK, Heroku supports Scala already) and we start to see more standard SDK support for them.



Amazon S3 object deletions and Multi-Factor Authentication

by Oliver on Sunday, October 7th, 2012.

I’ve been using S3 a lot in the last couple of months, and with the Amazon SDK for Ruby it really is dead simple to work with (as are all of the other AWS services the SDK currently supports). So simple, in fact, that you could quite easily delete all of your objects with very little work indeed. I did some benchmarks and found that (with batch operations) it took around 3 minutes to delete ~75000 files totalling about a terabyte. Single threaded.
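
For a rough sense of scale, the back-of-the-envelope arithmetic behind that benchmark looks like this (the figures are the ones quoted above; the 1000-keys-per-request cap is S3’s batch delete limit):

```ruby
# ~75,000 objects deleted in ~3 minutes, single threaded, using
# batch deletes of up to 1000 keys per request.
objects   = 75_000
batch_max = 1_000
seconds   = 3 * 60

requests        = (objects.to_f / batch_max).ceil  # 75 batch requests
objects_per_sec = objects / seconds                # ~416 objects per second

puts "#{requests} requests, ~#{objects_per_sec} objects/second"
```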

Parallelise that workload and you could drop everything in your S3 buckets within a matter of minutes, for just about any number of objects. Needless to say, if a hacker gets your credentials an extraordinary amount of damage can be done very easily and in a very short amount of time. Given there is often a several-hour lag before accesses are logged, you’ll probably not find out about them until long after the fact. Another potential cause of deletions is, of course, human error (and this is generally far more probable). In both cases there is something you can do about it.

S3 buckets have supported versioning for well over two years now, and if you use SVN, Git, or some other version control system then you’ll already understand how it works. The access methods for plain objects and their versions differ slightly, but the principal ideas are the same (object access methods generally operate on only the latest, non-deleted version). With versioning you can already protect yourself against accidental deletion, since you can revert to the last non-deleted version at any time.

However, there is nothing preventing you from deleting all versions of a file, and with them all traces that the file ever existed. This is an explicit departure from the analogy with source versioning systems, as any object with versions still present will continue to cost you real money (even if the latest version is a delete marker). To guard against this, you can add Multi-Factor Authentication to your API access to S3 and secure these version-deletion operations.

This has existed in the web API for some time, but I recently had a commit merged into the official SDK that allows you to enable MFA Delete on a bucket, and there is another one in flight which will allow you to actually use the multi-factor tokens in individual delete requests. The usage is slightly interesting, so I thought I’d demonstrate how it is done in Ruby, along with some thoughts on its potential use cases. If you want to use it now, you’ll have to pull down my branch (until the pull request is merged).

Enabling MFA

I won’t go into details about acquiring the actual MFA device as this is covered in sufficient detail in the official documentation, but suffice it to say that you can buy an actual hardware TOTP token, or use Amazon’s or Google’s “virtual” MFA applications for iPhone or Android. Setting them up and associating them with an account is also fairly straightforward (as long as you are using the AWS console; the command-line IAM tools are another matter altogether).

Setting up MFA Delete on your bucket is actually quite trivial:

require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(:access_key_id => 'XXXX', :secret_access_key => 'XXXX')
bucket = s3.buckets['my-test-bucket']
bucket.enable_versioning(:mfa_delete => 'Enable', :mfa => 'arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456')

Behind the scenes, this doesn’t do much differently from enabling versioning without MFA. It adds a new element to the XML request asking for MFA Delete to be enabled, and adds a header containing the MFA device serial number and current token number. Importantly (and this may trip you up if you have started using IAM access controls), only the owner of a bucket can enable or disable MFA Delete. In the case of a “standard” account with delegated IAM accounts under it, this will be the “standard” account (even if one of the sub-accounts was used to create the bucket).
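
To make that concrete, the request the SDK ends up sending looks roughly like this (a sketch based on S3’s PUT Bucket versioning API; the serial and token values are the same placeholders used above):

```
PUT /?versioning HTTP/1.1
Host: my-test-bucket.s3.amazonaws.com
x-amz-mfa: arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456

<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Status>Enabled</Status>
  <MfaDelete>Enabled</MfaDelete>
</VersioningConfiguration>
```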

Version Deletion with MFA

Now, it is still possible to delete objects, but not versions. Version deletion looks much the same, but requires the serial/token to be passed in if MFA Delete is enabled:

require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(:access_key_id => 'XXXX', :secret_access_key => 'XXXX')
bucket = s3.buckets['my-test-bucket']
bucket.versions['itHPX6m8na_sog0cAtkgP3QITEE8v5ij'].delete(:mfa => 'arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456')

As mentioned above there are some limitations to this (as you’ve probably guessed):

  • Being a TOTP system, tokens can be used only once. That means you can delete a single version with a single token, and no more. Given that Google Authenticator and Gemalto physical TOTP devices generate a token once every 30 seconds, it may take up to a minute to completely eradicate all traces of an object that was deleted previously (original version + delete marker).
  • Following on from this, it is almost impossible to consider doing large numbers of deletions. There is a batch object deletion method inside AWS::S3::ObjectCollection, but it is not integrated with any of the MFA Delete mechanisms. Even then, you can only perform batches of 1000 deletions at a time.
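
To illustrate the scale of that limitation, here is a small sketch (pure Ruby, with no AWS calls; the deletions themselves are left out, and the version IDs are made up) contrasting the batch API’s 1000-key limit with the one-token-per-deletion constraint of MFA Delete, using the 30-second token interval mentioned above:

```ruby
# Hypothetical list of version IDs to eradicate.
version_ids = (1..2_500).map { |i| "version-id-#{i}" }

# Without MFA: the batch API takes up to 1000 keys per request.
batches = version_ids.each_slice(1_000).to_a
puts "#{batches.size} batch requests without MFA Delete"    # 3 requests

# With MFA Delete: one TOTP token per deletion, one token every ~30 seconds.
worst_case_seconds = version_ids.size * 30
puts format('~%.1f hours of token entry with MFA Delete',
            worst_case_seconds / 3600.0)                    # ~20.8 hours
```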

As it stands, I’m not sure how practical it is. MFA is an inherently human-oriented process, as it involves something you have rather than something you are or something you know (both of which are easily transcribed into a computer once). Given that the access medium is an API designed for rapid, lightweight use, there seems to be an impedance mismatch. Still, with some implementation work to get batch deletions working, it could still serve a lot of use cases.

Are you using MFA Delete (through any of the native APIs or other language SDKs, or even 3rd-party apps)? I would love to hear about other people’s experiences with it; leave your comments below.

