Understanding Upstart’s logging options

by Oliver on Sunday, May 18th, 2014.

I have a couple of services I own that run in EC2 that, sadly, still lack any log aggregation. In-house we have our own system for this, that technically can also run in EC2 but for a variety of reasons I consider it overkill for my use case in EC2. Basically, the solution I am after is to do the minimum amount of work to see these logs aggregated to an S3 bucket, making them available for optional processing at a later time with Elastic MapReduce, if we ever have to do any analysis. This rarely happens, so the current solution of a distributed SSH command grepping the active machines’ logs is mostly sufficient. If the logs you are interested in happened to be on a machine that was terminated as a result of an AutoScale group scale down action, well, that’s what the log aggregation would solve!

The actual services running on these instances tend to run through Upstart. Notwithstanding the result of the recent Init Wars in the Debian community, I find Upstart to be an acceptable Init system, and certainly is easy enough to configure – drop a conf file in a reasonably minimal format into /etc/init and you are done. In recent enough versions (1.4 and later) you can even have Upstart take care of logging to a file for you, and by extension (since it drops a logrotate fragment onto your system) it also handles logrotation. If you are to read the official documentation for Upstart, however, you’d be forgiven for thinking the logging mechanism is somewhat magical as not many details are given away.

Why am I interested in this? I had to refresh my memory about this, but of course logrotation only happens at most once per day (it is run from cron.daily). For my services running in AutoScale groups, some instances last less than 24 hours, and so I would prefer to aggregate my logs at a minimum every hour. Without knowing the exact relationship between Upstart and logrotate, it is hard to say whether this would be possible. My intention is to have some other small cron job run every hour to pull the log out of use by Upstart (however that may be taking place), add timestamp and machine information to the filename, compress it, and upload it to an S3 bucket. Where to start?

Going back to the original development mailing list post from October 2011 gives some details. Like the documentation alludes to, Upstart creates a pseudo-terminal and connects the slave to the child process and the master to itself. This happens in init/job_process.c, which is worth a read to understand what is going on. Upstart is indeed taking on all handling of the logging rather than delegating to an external process. Let’s confirm that our process is really not writing the log itself:

# service testprogram start
testprogram start/running, process 27671
# lsof -n -p 27671 | tail -n4
testprogr 27671 root    0u   CHR    1,3      0t0   4759 /dev/null
testprogr 27671 root    1u   CHR  136,1      0t0      4 /dev/pts/1
testprogr 27671 root    2u   CHR  136,1      0t0      4 /dev/pts/1
testprogr 27671 root  255r   REG  252,0      341 275311 /usr/local/bin/testprogram

OK, it’s definitely connected to the pseudo-terminal for standard output and standard error. Is the init process connected to the log file?

# lsof -p 1 | grep testprogram
init      1 root   11w   REG              252,0 3086869601  10521 /var/log/upstart/testprogram.log

Great! So we know now, if we want to figure out log rotation semantics, we don’t have to worry about our own process – only Upstart itself. So how do Upstart and logrotate cooperate? The logrotate fragment gives very little away:

/var/log/upstart/*.log {
        rotate 7

It is not doing anything special to signal to Upstart that logs have been rotated, for example in a prerotate or postrotate script. Having a look into init/log.c you can see that an Upstart Job contains a Log object which knows about the console log file that has been used for this particular service. It keeps the file-handle open so that any new data coming in from the service can just be written straight out to the file. There is some additional intelligence around out-of-disk conditions, and some buffering taken care of by the NIH utility library but these are orthogonal to the issue of log rotation.

When some input comes in, Upstart attempts to flush the data out to disk and in the process checks the status of the file-handle it already has open (with an fstat system call). If the fstat result indicates that there are no links to the file (i.e. it has been deleted but it is still open), Upstart will open a new file with the same name and flush the log data out to that file. Hence, when logrotate rotates the file to a new name, nothing happens. But since the Upstart logrotate config fragment contains the directive compress, it will invoke gzip which will delete the original file once the new compressed file has been completely written out. Upstart detects the deletion, opens a new file and continues logging. Mystery solved!

What does this look like in action? I made a very simple shell script which outputs a couple of phrases and a line number in a tight loop to test the rotation behaviour and the lossiness of Upstart’s logging, as well as how much data is lost between the gzip compression completion and opening a new file. Initially I also added a bunch of signal traps to the script to see if Upstart was doing anything funky there (sending a hangup signal for example), but since the program itself does no logging, it doesn’t need any signalling to be aware of the logging mechanism:

#!/bin/bash -e
let i=1
while true; do
        echo "${i} The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure."
        let i=i+1

Since there are no sleeps in between loops, this runs as fast as the data can be accepted on the pseudo-terminal. Evidently, this also causes quite some load on Upstart itself:

27754 root      20   0 17856 1484 1236 R 46.3  0.1   0:03.62 testprogram
    1 root      20   0 24380 2220 1296 S 45.6  0.1  15:36.35 init

You can see what Upstart is doing with strace:

select(15, [3 5 6 7 8 9 12 14], [], [7 8 9 12], NULL) = 1 (in [14])
read(14, "\r\n1403895 The quick brown fox ju"..., 8192) = 552
read(14, "\r\n1403899 The quick brown fox ju"..., 7640) = 138
read(14, 0x7f1027c3fbf2, 7502)          = -1 EAGAIN (Resource temporarily unavailable)
fstat(11, {st_mode=S_IFREG|0640, st_size=192626266, ...}) = 0
write(11, "\r\n1403895 The quick brown fox ju"..., 690) = 690
read(3, 0x7fff293210ff, 1)              = -1 EAGAIN (Resource temporarily unavailable)

So, it waits on the select call to indicate that one of its jobs has some output, makes some reads to collect that output from the pseudo-terminal, does the fstat on the file and then writes the buffered data out in one call. What happens when we delete the file in between writes?

select(15, [3 5 6 7 8 9 12 14], [], [7 8 9 12], NULL) = 1 (in [14])
read(14, "4 The quick brown fox jumped ove"..., 8192) = 132
read(14, 0x7f1027c3f9c4, 8060)          = -1 EAGAIN (Resource temporarily unavailable)
fstat(10, {st_dev=makedev(252, 0), st_ino=10521, st_mode=S_IFREG|0640, st_nlink=0, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=264, st_atime=2014/05/18-20:33:32, st_mtime=2014/05/18-20:33:42, st_ctime=2014/05/18-20:33:48}) = 0
close(10)                               = 0
umask(0117)                             = 0117
open("/var/log/upstart/testprogram.log", O_WRONLY|O_CREAT|O_APPEND|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC, 0740) = 10
write(10, "4 The quick brown fox jumped ove"..., 132) = 132
read(3, 0x7fff293210ff, 1)              = -1 EAGAIN (Resource temporarily unavailable)

This time I ran strace with the -v parameter to completely expand the system call output. I slowed the output down to one line per ten seconds, and then deleted the file right after one line had been flushed to disk. You can see that the fstat returns st_nlink=0, which then causes Upstart to close the filehandle, set the umask and then open the file by name again before writing the buffered log line out to the file. Great, that makes sense, but does it work as well when there is a high volume of data coming in?

Removing the sleep in the script, I let the log build up to about 2GB before doing anything. For reference, it was writing about 3MB/s to the log which is not a huge amount, but certainly enough for a single service even if behaving badly. To simulate realistic rotation conditions, I simply invoked gzip on the log file to allow it to compress and then delete the original file. Taking a look at the end of the compressed file and the beginning of the new file (where we would expect to see some losses) this is the result:

# zcat testprogram.log.gz | tail -n1
18693551 The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure.#
# head -n2 testprogram.log

18693552 The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure.

So the interesting thing here is that somehow the newline character is dropped off the end of the first file and ends up at the start of the second file, but most importantly no lines have been missed and in fact we have all bytes intact between the rotated file and the current file. I’m not positive this doesn’t break down under higher rates of logging, and of course Upstart was very heavily loaded under just this amount of logging but for my purposes which can tolerate some losses it certainly suffices. Another worrying issue would be the high rate of fstat calls (one per log flush) which don’t help the performance of Upstart’s logging, however for low logging rates it shouldn’t have much impact.

So the next step is to actually put this into practice, which involves writing a prerotate script that ignores the log files of the services I am interested in (so they don’t get rotated by both logrotate and my own log handling), then write a bit of glue code to compress and upload them to S3 hourly. It may seem like a lot more effort but I’m really trying to avoid another solution that requires external services to manage the logs (e.g. remote syslog, fluentd, Logstash connected to ElasticSearch etc). If this current idea fails or turns out to be more effort than antipated, well, I guess I’ll go down the logging service path instead.

Tags: , , , ,

Sunday, May 18th, 2014 Tech 5 Comments

Cloudformation and the data transformation nightmare

by Oliver on Friday, March 7th, 2014.

The background to this story is that I spent the bulk of one week recently working on getting a prototype service deployed with AWS CloudFormation, and the experience was still reasonably painful. My team has other services deployed with CloudFormation, which is working perfectly fine (now) but I had hoped there would be some improvements available since the last time we went through the process.

The Components

Firstly I’ll describe in a sentence what CloudFormation does, for those who aren’t familiar with it, and then list the components which need to go into making something deployable by CloudFormation. CloudFormation itself allows you to describe an entire collection of resources required to make a service runnable and accessible – for example web servers, database servers, a loadbalancer and perhaps a Memcached cluster.

You might consider the descriptive language it requires somewhat akin to the current generation of configuration management languages, but far more restrictive and minimal. It also tends to occupy itself with what happens on initial provisioning rather than keeping the whole “stack” consistent over time – tasks which are more or less delegated to other services such as CloudWatch + AutoScaling etc. The fundamental input is the CloudFormation template which requires you to provide the following:

  • Descriptive (for humans) metadata about the stack.
  • A list of all resources (basically other AWS services) required.
  • Static configuration parameters for all of those resources.
  • “Dynamic” parameters which can be overridden at stack creation or update time, which can either be a literal value or based on a map lookup.
  • EC2 instance user data – mostly taking the form of a shell script used to bootstrap your service when the instances boot up.

The input is encoded in JSON – not a bad encoding in that it is fairly easy to read, but tends to be fairly easy to break by human editing with missing quotes or commas. Depending on your JSON parser of choice, it can be very hard to find the breaking change when that happens.

Static configuration data is mixed with dynamic configuration data (parameters that can come from external sources, or mappings that depend on other inputs) which complicates the understanding of the template, but also vastly enriches its functionality. Here, we already have two possible origins of data coming into the system from sources other than the template itself – parameters passed on the command line to a tool, or read from a configuration file (local, stored in revision control, stored in S3, etc) to be merged in with other parameters.

Then there is the user data, which is really my main pain point.

Bootstrapping the Instance

We’ve described the stack, set up a template ranging in complexity from very simple to a very rich description of different environments and software version requirements – but even with a completely running stack we don’t have any instances actually running our software. You might have generated your own AMIs with your software built into them, configured to start on boot – but in this case you have simply moved your complexity from the provisioning stage to the build stage (and building an AMI as complete as this can take time both in preparation and on every build).

There are several methods available to you when delivering bootstrapping instructions to your newly started instances (most of which are described in this document from AWS).

  • User data script in CloudFormation template, verbatim, expressed as shell commands.
  • Externally-templated user data script (e.g. with ERB, Jinja or your engine of choice), rendered and delivered as part of the CloudFormation template. Like the above but allows some customisation for different environments or build versions.
  • Minimal shell bootstrap embedded in the CloudFormation template, pulling in an external script from a network source to continue the heavy lifting.
  • Puppet or Chef (which still requires a minimal shell bootstrap as above to start their own process).
  • CloudInit
  • CloudFormation helper scripts

None of these are the perfect solution, and a choice for one or another often is simply selecting a point on a complexity scale. I won’t go into detail of how each works but will try to express what pained me about each the most.

  • Verbatim shell script: Potential for errors due to how it must be encoded in the JSON template. No choice for customisation based on environment or build versioning. Limited size.
  • Shell script with external templating: Introduces complexity by requiring data to be merged in from external sources. Risk of the external sources not being available or providing correct data at bootstrap time. Difficulty in tracking exactly which data went into a given build, and maintaining that historical timeline.
  • Minimal bootstrapping to initialize another script: Same risks as above, with shifted complexity from CloudFormation template to another file (or files) which may have further templating. It is also now more difficult to pass data between the initial bootstrap and secondary scripts.
  • Puppet or Chef: Much more complexity. You really need to have ongoing configuration management a requirement to make this a worthwhile proposition.
  • CloudInit: Requires learning what is basically a scripting “shorthand” form. It requires you to write your instructions, then encode the data as a separate step and make the encoded data available to the CloudFormation process. Passing custom data around dependent on environment or build version is again more difficult. Complexity is shifted to the build stage. Since the data is still passed in the user data section of the template, it must adhere to the same size limitations.
  • CloudFormation helper scripts: Only available on Amazon Linux, unless you are prepared to use something like heat-cfntools, which you must install using Pip. This dependency makes using the tools on something like Ubuntu Linux more complex again. This was actually one of the more tempting options for me when considering them recently, but the simplicity of the helper scripts and ease of expression in the template comes at the price of them being fairly inflexible – not quite flexible enough for my software deployment.

I’ll now briefly describe another part of the problem which also has an impact on the choice of bootstrapping methodology, before getting to a more complete conclusion.

Service Lifetime Management

If you have been paying attention, you would have read at least something recently about the wars raging over the decision to replace Init in Debian. I won’t provide any links – Google has more than enough fodder on the subject. It’s a topic I’ve rarely concerned myself with (if you’ve been using RHEL or CentOS for any reasonable amount of time, any system other than the standard SysV-style Init has barely been a blip on the radar), but nevertheless I’m aware that the current choice is SystemD, for better or worse.

Be that as it may, a bunch of my services are running on Ubuntu and hence use Upstart (which I’ve rarely had problems with, despite my reading on the topic now showing that this seems to be rare). Attempting to integrate Upstart-based control of a service with the CloudFormation user-data or other bootstrapping mechanisms comes with its own challenges:

  • Amazon Linux comes with a horribly outdated version of Upstart (0.6 if I recall correctly) which lacks many of the options making Upstart desirable. Among them is file-based logging (this is largely why I decided against Amazon Linux, despite having CloudFormation helper scripts available).
  • Expressing the startup parameters of the service correctly often requires a large amount of quoting. Expressing correct quoting from another shell script, within JSON is at the very least an error-prone exercise.
  • I’d like to run my service as a non-root user. Upstart supposedly supports user services (which you can even configure from a script inside the user directory), however these are disabled by default and the documentation for enabling them again is extremely poor. Enabling them would also add complexity to the bootstrapping process. This means handling the dropping of privileges within the Upstart service script itself.
  • Delegating runtime control of your service to a system service such as Upstart introduces a complexity in passing data from the bootstrapping process once again. Since the your service is now being executed from Upstart it inherits nothing from your bootstrap process directly, unlike if you simply forked a process from the bootstrap script which would allow for inheritance of environment variables etc (which may or may not be a good thing, but it can at least be convenient).

Despite these points, Upstart still allows you to provide a very minimal startup script for your service (mine was about three lines – compare that to your typical init script), and some conveniences such as automatic logging. It’s still not perfectly smooth deploying a service unless you want to package everything up in a deb/rpm earlier in your build pipeline (and then still deal with the issue of full configuration of your service dependent on the environment and versions at hand) but to be fair, that’s not the fault or purpose of Upstart.

What did our implementation look like?

An initial implementation of ours had the vast majority of the work being done using Ruby scripts, some shell, the Amazon Ruby SDK and ERB templating. Specifically, you would call a shell script with some parameters (this could be done either manually or from our build system), that would call a Ruby script with those parameters and perhaps some other data divined from the filesystem, and the Ruby script would render the ERB template and make calls using the AWS Ruby SDK to S3 (build artifacts and more CloudFormation parameters stored in files) and CloudFormation to create or update the stack.

There were a lot of moving parts, data coming from a bunch of different places and not a consistent language or distribution of work between components. Adding to the distribution of configuration data amongst several different sources, passing this data between components at runtime as command line parameters, environment variables or embedded in JSON templates is also awkward at best.

How I manage my sanity

So, I’m pretty good at complaining about things, but what does my solution look like? One distinct advantage I have is that I tend to use Golang to write software, and hence end up with a single binary artifact from which the service is run. Most of the time there are no or few external supporting files so there is very little to ship and configure. But nevertheless…

  • Keep your data simple as possible. Build runtime secrets (and as much static “configuration”) into binaries as possible (you can use a trick like this with Golang) and keep the binaries secure.
  • Don’t use data in S3 or stored elsewhere to populate parameters. Use mappings, switch based on a pseudo-parameter, e.g. AWS::StackName to make it super simple and obvious where the data is coming from.
  • If you can tolerate Amazon Linux, and the cost of flexibility, do use the CloudFormation helper scripts.
  • There’s a temptation to make your build tooling run on multiple environments – especially if your software runs on Linux in AWS, and you use a Mac (or worse, Windows) as your working environment. Make the most common things automated completely for Linux and run as much of them in a build tool like Jenkins. Try to make the other things as platform agnostic as possible (Ruby, Python or Java are all fairly good and have Amazon SDKs).
  • Try to avoid the temptation of templating inside of templates!

When I next need to make some changes to this deployment system, I think I’ll be attempting to remove as much of the Ruby code as possible and relying solely on the Amazon Python CLI. This will allow us to remove the Ruby SDK requirement (which, unless you are a developer, has very little utility in the command-line realm) in favour of the Python CLI which can be used for many other things in addition to its integration in our deployment script. It’s also a lot lighter-weight than the previous JVM-based CLI utilities. I’ve already reduced the data sources down a lot, but these could be potentially reduced further, or if not, work to ensure that there are suitable mechanisms for tracing data transformations through the system so we know where data is coming from, and when it has changed.

These are just a few realisations after the latest round of bootstrapping refactoring I had to perform recently. What have your own experiences been like in bootstrapping nodes and keeping your sanity?

Tags: , , , , ,

Friday, March 7th, 2014 Tech 2 Comments