adrift on a cosmic ocean

Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.

Understanding Upstart's logging options

Posted by Oliver on the 18th of May, 2014 in category Tech
Tagged with: console, ec2, logging, s3, upstart

I have a couple of services I own that run in EC2 which, sadly, still lack any log aggregation. In-house we have our own system for this, which technically can also run in EC2, but for a variety of reasons I consider it overkill for my use case there. Basically, the solution I am after is to do the minimum amount of work to see these logs aggregated to an S3 bucket, making them available for optional processing at a later time with Elastic MapReduce, if we ever have to do any analysis. This rarely happens, so the current solution of a distributed SSH command grepping the active machines' logs is mostly sufficient. If the logs you are interested in happen to be on a machine that was terminated as a result of an AutoScale group scale-down action, well, that's what the log aggregation would solve!

The actual services running on these instances tend to run through Upstart. Notwithstanding the result of the recent Init Wars in the Debian community, I find Upstart to be an acceptable init system, and it is certainly easy enough to configure - drop a conf file in a reasonably minimal format into /etc/init and you are done. In recent enough versions (1.4 and later) you can even have Upstart take care of logging to a file for you, and by extension (since it drops a logrotate fragment onto your system) it also handles log rotation. If you were to read the official documentation for Upstart, however, you'd be forgiven for thinking the logging mechanism is somewhat magical, as not many details are given away.
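For reference, the kind of job configuration in question really is minimal. This is a sketch of a hypothetical /etc/init/testprogram.conf (the job name and script path are assumptions, chosen to match the examples further down):

```
# /etc/init/testprogram.conf -- hypothetical Upstart job definition
description "test program that writes to stdout"

start on runlevel [2345]
stop on runlevel [!2345]

# Send the job's stdout/stderr to /var/log/upstart/testprogram.log
# (the "console log" stanza is available from Upstart 1.4 onwards)
console log

exec /usr/local/bin/testprogram
```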

Why am I interested in this? I had to refresh my memory about this, but of course log rotation only happens at most once per day (it is run from cron.daily). For my services running in AutoScale groups, some instances last less than 24 hours, so I would prefer to aggregate my logs at a minimum every hour. Without knowing the exact relationship between Upstart and logrotate, it is hard to say whether this would be possible. My intention is to have some other small cron job run every hour to pull the log out of use by Upstart (however that may be taking place), add timestamp and machine information to the filename, compress it, and upload it to an S3 bucket. Where to start?
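The hourly job I have in mind might look something like the following rough sketch. The bucket name, log path and the use of the AWS CLI are all assumptions here, not a final implementation:

```shell
#!/bin/sh
# Hypothetical hourly cron job: compress the live log (mirroring what
# logrotate's "compress" step would do), then ship the result to S3.
LOG="${1:-/var/log/upstart/testprogram.log}"
BUCKET="s3://my-log-bucket"            # hypothetical bucket name

if [ -f "$LOG" ]; then
    # gzip deletes the original once the compressed copy is written out;
    # as explored below, Upstart then reopens a fresh file under the
    # original name on its next write.
    gzip "$LOG"

    # Tag the archive with hostname and timestamp so files from
    # different instances don't collide in the bucket.
    OUT="/tmp/$(basename "$LOG" .log).$(hostname -s).$(date -u +%Y%m%d%H%M%S).log.gz"
    mv "${LOG}.gz" "$OUT"

    # Upload and clean up (skipped here if the AWS CLI is not installed).
    { command -v aws >/dev/null 2>&1 && aws s3 cp "$OUT" "$BUCKET/" && rm -f "$OUT"; } || true
fi
```

Whether this can be done safely out from under Upstart is exactly the question the rest of this post digs into.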

Going back to the original development mailing list post from October 2011 gives some details. As the documentation alludes to, Upstart creates a pseudo-terminal and connects the slave to the child process and the master to itself. This happens in init/job_process.c, which is worth a read to understand what is going on. Upstart is indeed taking on all handling of the logging rather than delegating to an external process. Let's confirm that our process is really not writing the log itself:

# service testprogram start
testprogram start/running, process 27671
# lsof -n -p 27671 | tail -n4
testprogr 27671 root    0u   CHR    1,3      0t0   4759 /dev/null
testprogr 27671 root    1u   CHR  136,1      0t0      4 /dev/pts/1
testprogr 27671 root    2u   CHR  136,1      0t0      4 /dev/pts/1
testprogr 27671 root  255r   REG  252,0      341 275311 /usr/local/bin/testprogram

OK, it's definitely connected to the pseudo-terminal for standard output and standard error. Is the init process connected to the log file?

# lsof -p 1 | grep testprogram
init      1 root   11w   REG              252,0 3086869601  10521 /var/log/upstart/testprogram.log

Great! So we know now, if we want to figure out log rotation semantics, we don't have to worry about our own process - only Upstart itself. So how do Upstart and logrotate cooperate? The logrotate fragment gives very little away:

/var/log/upstart/*.log {
        daily
        missingok
        rotate 7
        compress
        notifempty
        nocreate
}

It is not doing anything special to signal to Upstart that logs have been rotated, for example in a prerotate or postrotate script. Having a look into init/log.c, you can see that an Upstart Job contains a Log object which knows about the console log file that has been used for this particular service. It keeps the file-handle open so that any new data coming in from the service can just be written straight out to the file. There is some additional intelligence around out-of-disk conditions, and some buffering taken care of by the NIH utility library, but these are orthogonal to the issue of log rotation.

When some input comes in, Upstart attempts to flush the data out to disk and in the process checks the status of the file-handle it already has open (with an fstat system call). If the fstat result indicates that there are no links to the file (i.e. it has been deleted but it is still open), Upstart will open a new file with the same name and flush the log data out to that file. Hence, when logrotate rotates the file to a new name, nothing happens. But since the Upstart logrotate config fragment contains the directive compress, it will invoke gzip which will delete the original file once the new compressed file has been completely written out. Upstart detects the deletion, opens a new file and continues logging. Mystery solved!
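This condition is easy to reproduce outside of Upstart. Here is a small sketch of the same check from a shell - Linux-specific, since it relies on /proc and GNU stat:

```shell
#!/bin/bash
# Hold a file open, delete it, and watch the link count drop to zero --
# the same condition Upstart's fstat check looks for before reopening.
f=$(mktemp)
exec 3>"$f"                                   # keep a writable handle open
echo "before unlink: $(stat -c %h "$f") link(s)"
rm "$f"                                       # delete while the fd stays open
# Stat the still-open inode through /proc; st_nlink is now 0, just as
# Upstart sees after logrotate's gzip removes the original file.
echo "after unlink: $(stat -L -c %h "/proc/$$/fd/3") link(s)"
exec 3>&-
```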

What does this look like in action? I made a very simple shell script which outputs a couple of phrases and a line number in a tight loop to test the rotation behaviour and the lossiness of Upstart's logging, as well as how much data is lost between the gzip compression completion and opening a new file. Initially I also added a bunch of signal traps to the script to see if Upstart was doing anything funky there (sending a hangup signal for example), but since the program itself does no logging, it doesn't need any signalling to be aware of the logging mechanism:

#!/bin/bash -e
let i=1
while true; do
        echo "${i} The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure."
        let i=i+1
done

Since there are no sleeps in between loops, this runs as fast as the data can be accepted on the pseudo-terminal. Evidently, this also causes quite some load on Upstart itself:

27754 root      20   0 17856 1484 1236 R 46.3  0.1   0:03.62 testprogram
    1 root      20   0 24380 2220 1296 S 45.6  0.1  15:36.35 init

You can see what Upstart is doing with strace:

select(15, [3 5 6 7 8 9 12 14], [], [7 8 9 12], NULL) = 1 (in [14])
read(14, "rn1403895 The quick brown fox ju"..., 8192) = 552
read(14, "rn1403899 The quick brown fox ju"..., 7640) = 138
read(14, 0x7f1027c3fbf2, 7502)          = -1 EAGAIN (Resource temporarily unavailable)
fstat(11, {st_mode=S_IFREG|0640, st_size=192626266, ...}) = 0
write(11, "rn1403895 The quick brown fox ju"..., 690) = 690
read(3, 0x7fff293210ff, 1)              = -1 EAGAIN (Resource temporarily unavailable)

So, it waits on the select call to indicate that one of its jobs has some output, makes some reads to collect that output from the pseudo-terminal, does the fstat on the file and then writes the buffered data out in one call. What happens when we delete the file in between writes?

select(15, [3 5 6 7 8 9 12 14], [], [7 8 9 12], NULL) = 1 (in [14])
read(14, "4 The quick brown fox jumped ove"..., 8192) = 132
read(14, 0x7f1027c3f9c4, 8060)          = -1 EAGAIN (Resource temporarily unavailable)
fstat(10, {st_dev=makedev(252, 0), st_ino=10521, st_mode=S_IFREG|0640, st_nlink=0, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=264, st_atime=2014/05/18-20:33:32, st_mtime=2014/05/18-20:33:42, st_ctime=2014/05/18-20:33:48}) = 0
close(10)                               = 0
umask(0117)                             = 0117
open("/var/log/upstart/testprogram.log", O_WRONLY|O_CREAT|O_APPEND|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC, 0740) = 10
write(10, "4 The quick brown fox jumped ove"..., 132) = 132
read(3, 0x7fff293210ff, 1)              = -1 EAGAIN (Resource temporarily unavailable)

This time I ran strace with the -v parameter to completely expand the system call output. I slowed the output down to one line per ten seconds, and then deleted the file right after one line had been flushed to disk. You can see that the fstat returns st_nlink=0, which then causes Upstart to close the filehandle, set the umask and then open the file by name again before writing the buffered log line out to the file. Great, that makes sense, but does it work as well when there is a high volume of data coming in?

Removing the sleep in the script, I let the log build up to about 2GB before doing anything. For reference, it was writing about 3MB/s to the log which is not a huge amount, but certainly enough for a single service even if behaving badly. To simulate realistic rotation conditions, I simply invoked gzip on the log file to allow it to compress and then delete the original file. Taking a look at the end of the compressed file and the beginning of the new file (where we would expect to see some losses) this is the result:

# zcat testprogram.log.gz | tail -n1
18693551 The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure.#
# head -n2 testprogram.log

18693552 The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure.

So the interesting thing here is that somehow the newline character is dropped off the end of the first file and ends up at the start of the second file, but most importantly no lines have been missed, and in fact we have all bytes intact between the rotated file and the current file. I'm not positive this doesn't break down under higher rates of logging, and of course Upstart was very heavily loaded under just this amount of logging, but for my purposes, which can tolerate some losses, it certainly suffices. Another worrying issue would be the high rate of fstat calls (one per log flush), which don't help the performance of Upstart's logging; however, for low logging rates it shouldn't have much impact.

So the next step is to actually put this into practice, which involves writing a prerotate script that excludes the log files of the services I am interested in (so they don't get rotated by both logrotate and my own log handling), then writing a bit of glue code to compress and upload them to S3 hourly. It may seem like a lot more effort, but I'm really trying to avoid another solution that requires external services to manage the logs (e.g. remote syslog, fluentd, Logstash connected to ElasticSearch etc.). If this current idea fails or turns out to be more effort than anticipated, well, I guess I'll go down the logging service path instead.
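One hypothetical shape for that exclusion, relying on logrotate passing the log's absolute path as the first argument to a per-log prerotate script, and on a failing prerotate skipping rotation for that file - so treat this as a sketch to verify against your logrotate version, not a tested config:

```
/var/log/upstart/*.log {
        daily
        rotate 7
        compress
        prerotate
                # Skip logs we aggregate ourselves (hypothetical service name);
                # a non-zero exit here should prevent rotation of this file.
                case "$1" in
                        */testprogram.log) exit 1 ;;
                esac
        endscript
}
```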

© 2010-2018 Oliver Hookins and Angela Collins