Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We currently reside in Sydney after almost a decade in Berlin, have three kids, and have far too little time to really justify having a blog.
Before you get any ideas about the topic of this post, it's not a rant about OpenResty at all - it's actually a great piece of software and in my opinion a fantastic integration between a capable webserver (Nginx) and a light-weight programming language (Lua). The combination makes it ideal for high-performance delivery of traffic with some light business logic thrown in as well. We use it at my current employer precisely for that reason. Building your own equivalent from scratch in another language (even one that is high-performance) is probably not a great idea, and other options like Kong were not available when we started using OpenResty.
However, it has proven to have its challenges, which we've discovered over the last couple of months working with it. I'd like to write briefly about a couple that might be of wider interest.
Firstly though, if you work with languages other than Lua on a regular basis, one thing that might trip you up often is the small syntactic differences. The NOT EQUALS operator in Lua is ~= which in most other languages might be used for something like Regular Expression matching. Not only does it not look like !=, it also contains ~ which gets your mind thinking about string matching. We missed one configuration error this way recently:
if ngx.var.ROUTING_ENABLED ~= "t" then return end
In my haste, I had set ROUTING_ENABLED "true" in the Nginx configuration and assumed it would be working. My mind had evidently scanned the above Lua code, seen the tilde, and determined I just needed a string that started with t. This required a bit of a code hunt, and thanks to a sharp team-mate we tracked it down.
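To make the difference concrete, here is a minimal sketch in plain Lua (no Nginx needed). The function names routing_disabled and starts_with_t are made up for illustration and are not from our codebase; the first mirrors the comparison above, the second is what I had wrongly read into the tilde.

```lua
-- Hypothetical helpers, for illustration only.

local function routing_disabled(flag)
  -- Same comparison as in the snippet above: true unless flag is exactly "t".
  return flag ~= "t"
end

local function starts_with_t(flag)
  -- What I assumed the tilde meant: a pattern match anchored at the start.
  return type(flag) == "string" and string.find(flag, "^t") ~= nil
end

print(routing_disabled("true"))  -- true: "true" is not equal to "t"
print(routing_disabled("t"))     -- false
print(starts_with_t("true"))     -- true
```

So with "true" in the configuration the guard still returned early, exactly as written and not as I had read it.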
I won't go too much into testing philosophy and bore you to death, but I am at least a proponent of testing your software. I like to do TDD where possible (although I don't strictly adhere to it) but I feel that the testing pyramid should be observed.
This is difficult to do when using OpenResty due to the presence of ngx.null. For performance and signalling reasons, much of the time when you interact with objects coming from the Nginx part of OpenResty or other common libraries, you will receive a value or ngx.null rather than the built-in Lua nil value. This means you have to pull in the ngx library, which requires Nginx to be installed (and ideally running).
It would certainly be possible to create wrappers around any code returning this value, but in practice this becomes a choice of which is more cumbersome: wrappers around everything, or accepting that most tests will now be integration tests through Nginx. We have only a few genuinely isolated unit tests and quite a few integration tests that involve the ngx or other libraries. Potentially this means more time to run tests and a more cumbersome environment for testing simple changes.
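For what it's worth, the wrapper approach might look something like the sketch below. It is not what we run in production; make_normalizer and fake_null are hypothetical names. The idea is to pass the null sentinel in as a parameter, so that a plain Lua unit test can substitute any unique value for ngx.null.

```lua
-- Hypothetical helper, not part of OpenResty.
-- The sentinel is injected so the code can be tested without Nginx.
local function make_normalizer(null_sentinel)
  return function(value)
    if value == null_sentinel then
      return nil
    end
    return value
  end
end

-- Inside OpenResty you would write: local normalize = make_normalizer(ngx.null)
-- In a plain Lua test, any unique value can stand in for ngx.null:
local fake_null = {}
local normalize = make_normalizer(fake_null)

print(normalize(fake_null))  -- nil
print(normalize("cached"))   -- cached
```

The catch remains the one described above: every call site that can return ngx.null has to go through the wrapper, which is where the cumbersome part comes in.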
Unfortunately the coupling of a largely configuration-driven engine (Nginx) with your own code means that there is a lot of behaviour you cannot easily determine ahead of runtime. When things do go wrong, you can't exactly trace through Nginx's code to figure out what is going wrong. I won't go as far as saying it is non-deterministic (because it isn't) but the differences in behaviour between environments can feel that way.
We call ngx.log copiously through the codebase, but a disadvantage of this is that it only outputs which line you called that specific function from. Since we often want to correlate several useful things together (e.g. which server or location block we entered through, a request ID, a log severity level etc.) it is useful to write a small logging module to wrap up all of these concerns. However, then you rather unhelpfully end up with log messages like this:
2018/06/22 16:06:52 [error] 8#0: *53242 [lua] logging.lua:17: ngx_log(): [err] close, request ID: xxxx, client: x.x.x.x, server: xxxx, request: "GET / HTTP/1.1", host: "xxxx", referrer: "https://xxxx"
Yes, I know I am calling logging.lua on line 17 - that's the logging module! Paired with very terse errors (sometimes not entirely expected, since we occasionally log errors directly from external modules), this ends up making some of these errors extremely hard to track down. I unsuccessfully tried to track down the above error ("close") in a local test environment and a pre-staging cloud environment, before deciding to do something about the logging.
To cut to the chase, here's the solution:
local srcFile = debug.getinfo(3).source
local srcLine = debug.getinfo(3).currentline
local logmsg = string.format('[%s] %s in %s:%d, request ID: %s', prefix, msg, srcFile, srcLine, requestID)
ngx.log(ngx.ERR, logmsg)
The debug.getinfo function returns a table containing information about a function on the call stack when passed a stack level as an integer. In this case, we want to go up to the third level in the stack: 1 is the current function, 2 is another function that abstracts log levels, and 3 is the function actually calling the logging module. Now we have the actual call site in the business logic code that indicates where the problem might be.
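To show the three stack levels in context, here is a cut-down sketch of such a logging module that runs in plain Lua. The names new_logger, sink and business_logic are hypothetical; sink stands in for ngx.log so the example does not need Nginx, and the request ID is left out for brevity.

```lua
-- Hypothetical logging module; `sink` stands in for ngx.log.
local function new_logger(sink)
  local logger = {}

  -- Stack levels seen from here: 1 = log(), 2 = the severity wrapper,
  -- 3 = the business code that called the wrapper.
  local function log(prefix, msg)
    local info = debug.getinfo(3, "Sl")  -- "S" = source fields, "l" = current line
    sink(string.format("[%s] %s in %s:%d",
                       prefix, msg, info.short_src, info.currentline))
  end

  function logger.err(msg)
    -- Deliberately not `return log(...)`: a tail call would drop this frame
    -- from the stack and shift every level by one.
    log("err", msg)
  end

  return logger
end

-- Capture the output instead of writing to the Nginx error log.
local captured
local logger = new_logger(function(line) captured = line end)

local function business_logic()
  logger.err("close")
end

business_logic()
print(captured)
```

The tail-call comment is the part worth remembering: if the wrapper at level 2 returns the inner call directly, its frame disappears and level 3 no longer points at the business logic.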
This led to the discovery (or re-discovery) of another issue.
In hindsight this is not too unexpected, when comparing with other languages, but again due to the tight coupling with Nginx and the need to inter-operate with its way of doing things, you can end up with some strange results.
The above logging improvement indicated we were attempting to connect to a statsd endpoint with a hostname of empty string (literally ""). We actually had some code which should have caught this possibility:
STATSD_HOST = ngx.var.STATSD_HOST or "127.0.0.1"
However, here you run into a problem due to how OpenResty treats variables, usefully documented in the lua-nginx-module documentation for ngx.var.VARIABLE. The key sentence is this one:
Undefined NGINX variables are evaluated to nil while uninitialized (but defined) NGINX variables are evaluated to an empty Lua string.
This might seem straightforward, but it entirely depends on how you have formed your configuration. If you have no mention whatsoever of a variable (defined with set VAR VALUE syntax) in your configuration files, it will be nil. If you define a variable in some places of your configuration, but not all, it may either be a value or the empty string.
In our case, as we were migrating configuration from one system to another, what we could previously rely on to be undefined was now defined in some places but not others. The end result is somewhat similar to what you might see in Ruby: nil is evaluated as falsy and the empty string is evaluated as truthy. Thus, our cleverly constructed configuration default value as set above won't actually work, because "" or "127.0.0.1" evaluates to "".
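One way to make the default robust against both cases is a small helper along these lines. This is a sketch only; value_or_default is a hypothetical name, not something provided by OpenResty.

```lua
-- Hypothetical helper: treat both nil (undefined variable) and ""
-- (defined but uninitialised variable) as "not set".
local function value_or_default(value, default)
  if value == nil or value == "" then
    return default
  end
  return value
end

print(value_or_default(nil, "127.0.0.1"))         -- 127.0.0.1
print(value_or_default("", "127.0.0.1"))          -- 127.0.0.1
print(value_or_default("10.0.0.5", "127.0.0.1"))  -- 10.0.0.5
```

The statsd line would then read STATSD_HOST = value_or_default(ngx.var.STATSD_HOST, "127.0.0.1").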
How might you catch this, and when? We could build a more robust set of configuration defaults into the Lua code, log out when a config value is not set properly (or even kill the server, although I'm not sure if this is possible). We run into the inevitable and constant problem of checking configuration from one system (e.g. Chef or Kubernetes config maps), in order to validate that it will really work at runtime. Visions of vast Cucumber BDD tests flash through my mind, and I reconsider diving into that rabbit hole.
Unfortunately with software like Nginx, your configuration often is quite large and in a format you can't really control. Templating only shifts the burden of getting values into that fixed format, but not really ensuring that the whole thing makes sense and will work. We could create our own higher-level language that compiles to Nginx configuration, but I just don't have enough time on this planet for that.
In any case, now all of these problems have been solved (for the moment), and our OpenResty processes are humming away serving traffic happily until the next big change. Have any similar experiences with OpenResty? Leave me a comment below.