UPnP, IPv6 and updater scripts

by Oliver on Saturday, December 21st, 2013.

I’ve written a couple of times in the past about my use of IPv6 at home. Sadly the state of native IPv6 for consumers has not improved much, so I’m still left without native connectivity to my router (to be fair, I have assumed nothing has changed, but it is possible Deutsche Telekom is now offering it without making any fanfare about its arrival).

So I still have the humble Hurricane Electric tunnel running as I’ve previously written about. Unfortunately the tunnel config at the HE end needs to be updated whenever your public IP changes (which it does almost every day), and I’ve only ever managed this by hacking up the DynDNS support in my router. This also meant that I couldn’t use any actual DynDNS updating in conjunction with the hack. For a time I had some kind of DynDNS client running on my HTPC but that also seemed somewhat unreliable.

Irritated with the poor state of this system, and looking to do a little bit of programming in Go, I set about building a small program that does the following:

  • Retrieves the current WAN IP from the router using Universal Plug and Play (UPnP).
  • Calls the Tunnelbroker API to update the configuration with the current WAN IP.
  • Profit!

Prior to this I had only an extremely cursory knowledge of UPnP, i.e. some technology in the router, vaguely associated with Microsoft, that introduces security holes into your network and is best left disabled! It is actually a very rich system of protocols that facilitates automation, integration of a wide variety of different devices and evented reactions to system changes. You can read through this guide to understanding UPnP, which explains the intentions behind it and a little of a (now slightly dated) vision of the future electronic home, despite being written back in 2000.

My purpose is much simpler – grab the WAN interface IP address. Sure, I could do this with curl hitting one of the many “what is my IP”-style websites, but that requires actually going out onto the internet and making a request to some random site when my router already knows the address. It seems far more logical to retrieve it from there directly! Fortunately this is dead simple with UPnP, once you understand the general command flow and protocols/requests to use. Briefly, the exchange looks like this:

  • Send multicast UDP discovery message to the network. The message is basically an HTTP request.
  • Router responds with search results for each service it provides, via unicast UDP to the host that sent the discovery message. The responses again are basically HTTP.
  • Send an HTTP request over TCP this time to the router’s control endpoint with a SOAP/XML request asking for the IP address.
  • Router sends the HTTP response back with XML containing the IP address.

I can happily say that this works, and you can browse the code here. Pull requests and issues welcome. Making the subsequent HTTP request to the Tunnelbroker API is relatively straightforward, after the UPnP gymnastics.
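The Tunnelbroker update itself then amounts to a single GET. A minimal sketch – the endpoint and parameter names here follow HE’s DynDNS-style update interface as I understand it, and all the credential values are placeholders, so treat the details as assumptions and check your tunnel details page:

```go
package main

import (
	"fmt"
	"net/url"
)

// updateURL builds the DynDNS-style update request for Tunnelbroker.
// The username, update key and tunnel ID are hypothetical values; the
// parameter names are an assumption based on HE's update interface.
func updateURL(user, updateKey, tunnelID, wanIP string) string {
	q := url.Values{}
	q.Set("username", user)
	q.Set("password", updateKey)
	q.Set("hostname", tunnelID)
	q.Set("myip", wanIP)
	return "https://ipv4.tunnelbroker.net/nic/update?" + q.Encode()
}

func main() {
	// In the real program wanIP comes from the UPnP query; a plain
	// http.Get on this URL performs the update.
	fmt.Println(updateURL("oliver", "secret-update-key", "123456", "203.0.113.7"))
}
```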

In this implementation I just make a single control request to the router, and get a single response back, but I mentioned earlier that a core feature of UPnP is evented responses to system changes. The overview document I linked to above mentions such things as a program running on your computer that responds to events from the printer advising it is out of paper, or that its physical location has changed, but the possibilities here are really as limitless as the devices that can support UPnP. In this case, it is possible for the router to update subscribers about a new IP address once it has changed (which sadly I haven’t yet implemented).

So in summary, UPnP is a surprisingly useful technology that deserves looking into. If you have a use for my tunnel updater program, I’d love to hear any feedback on it.


Can’t create new network sockets? Maybe it isn’t user limits…

by Oliver on Thursday, February 28th, 2013.

I’ve been doing a lot more programming in Go recently, mostly because it has awesome concurrency primitives but also because it is generally a pretty amazing language. Unlike other languages which have threads, fibres or event-driven frameworks to achieve good concurrency, Go manages to avoid all of these but still remain readable. You can also reason about its behaviour very effectively due to how easily understandable and straightforward concepts like channels and goroutines are.

But enough about Go (for the moment). Recently I found the need to quickly duplicate the contents of one Amazon S3 bucket to another. This would not be a problem, were it not for the fact that the bucket contained several million objects. Fortunately, there are two factors which make this not so daunting:

  1. S3 scales better than your application ever can, so you can throw as many requests at it as you like.
  2. You can copy objects between buckets very easily with a PUT request combined with a special header indicating the object you want copied (you don’t need to physically GET then PUT the data).
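The copy request in point 2 is just an ordinary PUT on the destination key carrying the x-amz-copy-source header. A sketch of building it (authentication headers omitted; bucket and key names are made up):

```go
package main

import (
	"fmt"
	"net/http"
)

// copyRequest builds the S3 server-side copy: a PUT to the destination
// bucket and key, with the x-amz-copy-source header naming the source
// object. No object data travels through the client at all.
func copyRequest(srcBucket, dstBucket, key string) (*http.Request, error) {
	url := fmt.Sprintf("https://%s.s3.amazonaws.com/%s", dstBucket, key)
	req, err := http.NewRequest("PUT", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("x-amz-copy-source", "/"+srcBucket+"/"+key)
	return req, nil
}

func main() {
	req, err := copyRequest("old-bucket", "new-bucket", "00001234")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("x-amz-copy-source"))
}
```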

A perfect job for a Go program! The keys of the objects are in a consistent format, so we can divide the keyspace by prefixes and spread the workload amongst several goroutines. For example, if your objects are named 00000000 through to 99999999 using only numerical characters, you could quite easily split this into 10 segments of 10 million keys each. Using the bucket GET method you can retrieve up to 1000 keys in a batch, filtered by prefix. Even if a 10-million-key segment doesn’t contain that many actual objects, the only things that matter are that you start and finish in the right places (the beginning and end of the segment) and continue making batch requests until you have all of the keys in that part of the keyspace.
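That segmentation is simple to express. A sketch, using the 8-digit numeric key format from the example above:

```go
package main

import "fmt"

// segment is one slice of the keyspace: a fetcher goroutine starts
// listing at Start and keeps making batch requests until keys pass End.
type segment struct{ Start, End string }

// splitKeyspace divides the 8-digit numeric keyspace into n equal
// segments (this simple version assumes n divides 100000000 evenly).
func splitKeyspace(n int) []segment {
	const total = 100000000
	step := total / n
	segs := make([]segment, 0, n)
	for i := 0; i < n; i++ {
		segs = append(segs, segment{
			Start: fmt.Sprintf("%08d", i*step),
			End:   fmt.Sprintf("%08d", (i+1)*step-1),
		})
	}
	return segs
}

func main() {
	// Each segment would be handed to its own fetcher goroutine, which
	// issues bucket GETs (up to 1000 keys per batch, continuing from a
	// marker) until it reaches the end of its segment.
	for _, s := range splitKeyspace(10) {
		fmt.Println(s.Start, "-", s.End)
	}
}
```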

So now we have a mechanism for rapidly retrieving all of the keys. For millions of objects this will still take some time, but you have divided the work amongst several goroutines so it will be that much faster. For comparison, the Amazon Ruby SDK uses the same REST requests under the hood when using the bucket iterator bucket.each { |obj| … } but only serially – there is no division of work.

Now to copy all of our objects we just need to take each key returned by the bucket GET batches, and send off one PUT request for each. This introduces a much slower process – one GET request yields up to 1000 keys, but then we need to perform 1000 PUTs to copy them. The PUTs also take quite a long time each, as the S3 backend has to physically copy the data between buckets – for large objects this can still take some time.

Let’s use some more concurrency, and have a pool of 100 goroutines waiting to process the batch of 1000 keys just fetched. A recent discussion on the golang-nuts group yielded some good suggestions from others in the Go community and resulted in this code:

It’s not a lot of code, which makes me think it is reasonably idiomatic and correct Go. Better yet, it has the potential to scale out to truly tremendous numbers of workers. You may notice that each of the workers also uses the same http.Client, and this is intentional – internally the http.Client makes some optimisations around connection reuse so that you aren’t susceptible to the performance penalty of socket creation and TCP handshakes for every request. Generally this works pretty well.

Let’s think about system limits now. Say we want to make our PUT copy operations really fast, and use 100 goroutines for these operations. With just 10 fetcher goroutines that means we now have 1000 goroutines vying for attention from the http.Client connection handling. Even if the fetchers are idle, if we have all of the copier workers running at the same time, we might require 1000 concurrent TCP connections. With a default user limit of 1024 open file handles (e.g. on Ubuntu 12.04) this means we are dangerously close to exceeding that limit.

Head http://mybucket.s3.amazonaws.com:80/: lookup mybucket.s3.amazonaws.com: no such host

When you see an error like the above pop up in your program’s output, it almost seems a certainty that you have exceeded these limits… and you’d be right! For now… Initially these were the errors I was getting, and while it was somewhat mysterious that I would see so many of them (literally one for each failed request), apparently some additional sockets are required for name lookups (even if locally cached). I’m still looking for a reference for this, so if you know of it please let me know in the comments.

This resulted in a second snippet of Go code to check my user limits:

Using syscall.Getrusage in conjunction with syscall.Getrlimit would allow you to fairly dynamically scale your program to use just as much of the system resources as it has access to, but not overstep these boundaries. But remember what I said about using http.Client before? The net/http package documentation says “Clients should be reused instead of created as needed” and “Clients are safe for concurrent use by multiple goroutines”, and both of these are indeed accurate. The unexpected side-effect is that, unfortunately, the usage of TCP connections is now fairly opaque to us. Thus our understanding of current system resource usage is fundamentally detached from how we use http.Client. This will become important in just a moment.

So, having raised my ulimits far beyond what I expected I actually needed (this was to be the only program running on my test EC2 instance anyway), I re-ran the program and faced another error:

Error: dial tcp cannot assign requested address

What the… I thought I had dealt with user limits? I didn’t initially find the direct cause of this, and assumed I still hadn’t properly dealt with the user limits issue. I found a few group discussion threads dealing with http.Client connection reuse, socket lifetimes and related topics, and I first tried a few different versions of Go, suspecting it was a bug fixed in the source tip (more or less analogous to HEAD on origin/master in Git, if you mainly use that VCS). Unfortunately this yielded no fix and no additional insights.

I had been monitoring open file handles of the process during runtime and noticed it had never gone over about 150 concurrent connections. Using netstat on the other hand, showed that there were a significant number of connections in the TIME_WAIT state. This socket state is used by the kernel to leave a trace of the connection around in case there are duplicate packets on the network waiting to arrive (among other things). In this state the socket is actually detached from the process that created it, but waiting for kernel cleanup – therefore it actually doesn’t count as an open file handle anymore, but that doesn’t mean it can’t cause problems!

In this case I was connecting to Amazon S3 from a single IP address – the only one configured on the EC2 instance. S3 itself has a number of IP addresses on both East and West coasts, rotated automatically through DNS-based load-balancing mechanisms. However, at any given moment you will resolve a single IP address and probably use that for a small period of time before querying DNS again and perhaps getting another IP. So we can basically say we have one IP contacting another IP – and this is where the problem lies.

When an IPv4 network socket is created, there are five basic elements the kernel uses to make it unique among all others on the system:

protocol; local IPv4 address : local IPv4 port <-> remote IPv4 address : remote IPv4 port

Given roughly 2^31 possibilities for the local IP (classes A, B and C together cover most of the 32-bit space), the same for the remote IP and 2^16 for each of the local and remote ports (assuming we can use even the privileged ports < 1024 if we use the root account), that gives us about 2^94 different combinations of numbers and thus the theoretical number of IPv4 TCP sockets a single system could keep track of. That’s a whole lot! Now consider that we have a single local IP on the instance, we have (for some small amount of time) a single remote IP for Amazon S3, and we are reaching it only over port 80 – now three of our variables are reduced to a single possibility and we only have the local port range to make use of.

Worse still, the default setting (on my machine at least) of the local port range available to non-root users was only 32768-61000, which reduced my available local ports to less than half of the total range. After watching the output of netstat and grepping for TIME_WAIT sockets, it was evident that I was burning through these 30,000-odd local ports within a matter of seconds. When there are no remaining local port numbers to be used, the kernel simply fails to create a network socket for the program and returns an error as in the above message – cannot assign requested address.
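You can check that range yourself; a small sketch that reads the Linux sysctl (falling back to the default values for illustration when /proc isn’t available):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parsePortRange parses the two numbers in
// /proc/sys/net/ipv4/ip_local_port_range ("32768\t61000" by default on
// many kernels) and returns how many ephemeral ports that allows.
func parsePortRange(s string) (low, high, count int) {
	fmt.Sscanf(strings.TrimSpace(s), "%d %d", &low, &high)
	return low, high, high - low + 1
}

func main() {
	data, err := os.ReadFile("/proc/sys/net/ipv4/ip_local_port_range")
	if err != nil {
		// Not on Linux (or /proc unavailable); fall back to the
		// common default range for illustration.
		data = []byte("32768\t61000")
	}
	low, high, n := parsePortRange(string(data))
	fmt.Printf("ephemeral ports %d-%d: %d usable local ports\n", low, high, n)
}
```

With the default range that is only 28,233 usable local ports – consistent with exhausting them in seconds at a few thousand short-lived connections per second.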

Armed with this knowledge, there are a couple of kernel tunings you can make. tcp_tw_reuse and tcp_tw_recycle both affect when the kernel will reclaim sockets in the TIME_WAIT state, but practically neither seemed to have much effect. Another setting, tcp_max_tw_buckets, sets a limit on the total number of TIME_WAIT sockets and actively kills them off once the count exceeds this limit. All three of these parameters look and sound slightly dangerous, and given how little effect they had I was loath to use them and call the problem solved. After all, if the program was killing the connections and leaving them for the kernel to clean up, it didn’t sound like http.Client was doing a very good job of reusing connections automatically.

Incidentally, Go does support automatic reuse of connections in TIME_WAIT with the SO_REUSEADDR socket option, but this only applies to listening sockets (i.e. servers).

Unfortunately that just about brought me to the end of my inspiration, but a co-worker pointed me in the direction of http.Transport’s MaxIdleConnsPerHost parameter, which I was only vaguely aware of, having skimmed the source of that package in the previous couple of days while desperately searching for clues. The default value here is two (2), which seems reasonable for most applications but is evidently terrible when your application has large bursts of requests rather than a constant flow. I believe that internally, the transport creates as many connections as required, the requests are processed and closed, and then all of those connections (bar two) are terminated again, left in the TIME_WAIT state for the kernel to deal with. Just a few cycles of this need to repeat before you have built up tens of thousands of sockets in this state.

Altering the value of MaxIdleConnsPerHost to around 250 immediately removed the problem, and I didn’t see any sockets in the TIME_WAIT state while I was monitoring the program. Shortly thereafter the program stopped functioning entirely – I believe because my instance was blacklisted by AWS for sending too many requests to S3 in a short period of time. Scalability achieved!

If there are any lessons in this, I guess it is that you still often need to be aware of what is happening at the lowest levels of the system even if your programming language or application has abstracted enough of the details away for you not to have to worry about them. Even knowing that there was an idle connection limit of two would not have given away the whole picture of the forces at play here. Go is still my favourite language at the moment and I was glad that the fix was relatively simple, and I still have a very understandable codebase with excellent performance characteristics. However, whenever the network and remote services with variable performance characteristics are involved, any problem can take on large complexity.


Service SDKs and Language Support Part 2

by Oliver on Sunday, January 20th, 2013.

As I wrote previously, I found that the mismatch between the goals of large cloud services like Amazon Web Services and the languages they support conflicts slightly with the notion of building highly concurrent and parallelised workflows.

Of course the obvious followup to that post (even embarrassingly obvious since I’ve been copiously mentioning Go so much recently) is to point out that Google’s App Engine is doing this right by supporting Go as a first-class language, even getting an SDK provided for several platforms.

I haven’t had a chance to use App Engine so far, but I’d like to in future. Unfortunately, Google’s suite of services is not nearly as rich as that provided in AWS right now but I’m sure they are working hard on achieving feature parity in order to pull more customers over from AWS.


Another Personal Evolution – From Ruby to Go

by Oliver on Friday, January 4th, 2013.

Almost two years ago now, I wrote a post about how I was fed up with resorting to shell scripting as my knee-jerk reaction to computer problems. At the time, I had been attacking any problem that required more than a couple of commands at the prompt by writing a shell (usually BASH) script and hit major limitations that I really should have been solving with a legitimate programming language. I resolved to only resort to Ruby or Python and in that goal I’ve actually been very successful (although I’ve ended up using Ruby around 90% of the time and Python only 10% of the time, which I wish was a little more evenly distributed).

Now I feel as if there is another evolution happening which I need to apply myself to. As a side-effect of the kind of work I’ve been doing, Ruby is just not cutting it. I love the flexibility of it (even despite the numerous ways you can shoot yourself in the foot), and there are some really great libraries like the AWS Ruby SDK which I’ve been using a lot lately. However, when you start wanting to do highly parallelised or concurrent tasks (and this is an excellent talk on the subject), it all starts getting a bit painful. I dabbled in event-based programming last year with NodeJS but found the spaghetti callbacks a bit mind-bending. Similarly with Ruby and EventMachine the code can be less than perfectly understandable. Goliath makes the task somewhat easier (if you are writing a web-service), and em-synchrony follows a similar pattern with Ruby Fibers but they all fall down if you need to use any libraries which don’t make use of asynchronous IO. I briefly looked at Python’s Twisted framework but didn’t find it much better (although that may be an unfair statement, as I didn’t spend much time on it).

I tried a different approach recently and attempted to use the quite awesome JRuby and solve the problem with native threads and the power of the JVM, but hit similar problems with libraries just not working in JRuby. This seems to be a common problem still, unfortunately. The overall result is having no clear option from a Ruby point of view when attempting to make a high-performance application that is also readable and understandable. It’s a bit of a blanket statement, granted, and if I had more constraints on my development environment I might have persisted with one of the options above (there are certainly workarounds to most of the problems I’ve experienced).

Fortunately for me, I have a flexible working environment, buy-in for alternative languages is pretty good, and I’m willing to learn something new. Go is a relatively new language, having only been around (publicly) for just over three years, but it quite nicely fits my current needs. I won’t go into it technically, as that is all over the interwebs, but I find it relatively easy to read (even for a newbie), and similarly easy to write.

However, I find myself in the same situation I was almost two years ago: it will take some effort to stop the now familiar knee-jerk reaction – this time towards Ruby – and establish the new habit in using Go wherever possible. I’ve just finished up a recent small spare-time project which utilised Ruby so I have free rein to indulge in Go at every possible opportunity. It is scary, but also very exciting – just as it was declaring my intention to use only Ruby almost two years ago.

That’s not to say I’m going to use Go exclusively – I still have to finish up reading (and working) through Seven Languages in Seven Weeks. My intention is not to become a polyglot (I think that’s a bit beyond my capabilities), but I’d at least like to be reasonably proficient in at least one language that solves a given set of problems well. I found that niche with Ruby, and now I am hoping to find that niche with Go. If you haven’t tried it, I thoroughly recommend it.


On Service SDKs and Language Support

by Oliver on Wednesday, November 21st, 2012.

As I’ve previously mentioned, I’ve been doing a lot of work recently with various aspects of AWS on a daily basis (or close to it). My primary language these days is still Ruby, but I’ve been labouring through the excellent Seven Languages in Seven Weeks book in the hope I can broaden my horizons somewhat. I’m fairly comfortable with Python, somewhat familiar with Javascript now after playing with NodeJS and I have a cursory ability still in C/C++ and Java but it has been over 10 years since I’ve done anything significant in any of those languages.

Suffice to say, I’m far from being a polyglot, but I know my current limitations. Go has been increasingly noticeable on my radar and I am starting to familiarise myself with it, but this has led me to a small realisation. When service providers (like Amazon in this case) provide SDK support, they typically cater to their largest consumer base. Internally they largely use Java, and it shows in their first-class support for that language and toolchain.

Using the example of Elastic Beanstalk and the language support it provides, you can quite easily determine their current (or recent) priorities. Java came first, with .NET and PHP following. Python came about half-way through this year and Ruby was only recently added. Their general-purpose SDKs are somewhat more limiting, only supporting Java, .NET, PHP and Ruby (outside of mobile platform support). These are reasonable, if middle-of-the-road options.

Today I was attempting to run some code against the Ruby SDK, using JRuby. The amount of work it has to do is significant and parallelisable – a poor fit for Ruby’s weak native support (at least in MRI) for true concurrency. I’m not going to gain anything by rewriting in PHP, cannot consider .NET, and Java is just not going to be a good use of my time. I feel like there is an impedance mismatch between this set of languages and the scale of what AWS supports.

You are supposed to scale up to large amounts of computing and storage to best take advantage of what AWS offers. Similarly, you make the best use of the platform by highly parallelising your workload. The only vaguely relevant language from this point of view is Java, but it’s just not a desirable general-purpose language for many of us, especially if we want to enjoy the low-friction development that so many newer languages provide.

To be more specific – languages like Go, Erlang (or perhaps more relevant, Elixir), Scala etc offer fantastic concurrency and more attractive development experiences but these are not going to be supported by the official SDKs. It makes perfect sense from the point of view of the size of the developer base, but from the point of view of picking the right tool for the job it doesn’t. Perhaps in a few years this paradigm of highly parallel computing will have gained momentum enough that these languages move to the mainstream (ok, Heroku supports Scala already) and we start to see more standard SDK support for them.
