
Understanding Upstart’s logging options

by Oliver on Sunday, May 18th, 2014.

I own a couple of services that run in EC2 and, sadly, still lack any log aggregation. In-house we have our own system for this, which technically can also run in EC2, but for a variety of reasons I consider it overkill for this use case. Basically, the solution I am after is the minimum amount of work needed to see these logs aggregated into an S3 bucket, making them available for optional processing later with Elastic MapReduce if we ever have to do any analysis. That rarely happens, so the current solution of a distributed SSH command grepping the active machines’ logs is mostly sufficient. If the logs you are interested in happen to be on a machine that was terminated by an AutoScale group scale-down action, well, that’s exactly what log aggregation would solve!

The actual services running on these instances tend to run under Upstart. Notwithstanding the result of the recent Init Wars in the Debian community, I find Upstart to be an acceptable init system, and it is certainly easy enough to configure – drop a conf file in a reasonably minimal format into /etc/init and you are done. In recent enough versions (1.4 and later) you can even have Upstart take care of logging to a file for you, and by extension (since it drops a logrotate fragment onto your system) it also handles log rotation. If you read the official documentation for Upstart, however, you’d be forgiven for thinking the logging mechanism is somewhat magical, as not many details are given away.
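For reference, a job definition can be as small as the following sketch (the job name and path match the test program used later in this post, and the run levels are just an example); the console log stanza is what asks Upstart 1.4 and later to capture the job’s standard output and error in /var/log/upstart/<job>.log:

description "testprogram"

start on runlevel [2345]
stop on runlevel [!2345]
respawn

# Ask Upstart to capture stdout/stderr in /var/log/upstart/testprogram.log
console log

exec /usr/local/bin/testprogram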

Why am I interested in this? I had to refresh my memory on the details, but of course log rotation only happens at most once per day (it is run from cron.daily). For my services running in AutoScale groups, some instances last less than 24 hours, so I would prefer to aggregate my logs at least every hour. Without knowing the exact relationship between Upstart and logrotate, it is hard to say whether this is possible. My intention is to have another small cron job run every hour to pull the log out of use by Upstart (however that may be taking place), add timestamp and machine information to the filename, compress it, and upload it to an S3 bucket. Where to start?

Going back to the original development mailing list post from October 2011 gives some details. As the documentation alludes to, Upstart creates a pseudo-terminal, connecting the slave end to the child process and the master end to itself. This happens in init/job_process.c, which is worth a read to understand what is going on. Upstart is indeed taking on all handling of the logging itself rather than delegating to an external process. Let’s confirm that our process is really not writing the log:

# service testprogram start
testprogram start/running, process 27671
# lsof -n -p 27671 | tail -n4
testprogr 27671 root    0u   CHR    1,3      0t0   4759 /dev/null
testprogr 27671 root    1u   CHR  136,1      0t0      4 /dev/pts/1
testprogr 27671 root    2u   CHR  136,1      0t0      4 /dev/pts/1
testprogr 27671 root  255r   REG  252,0      341 275311 /usr/local/bin/testprogram


OK, it’s definitely connected to the pseudo-terminal for standard output and standard error. Is the init process connected to the log file?

# lsof -p 1 | grep testprogram
init      1 root   11w   REG              252,0 3086869601  10521 /var/log/upstart/testprogram.log


Great! So we now know that if we want to figure out the log rotation semantics, we don’t have to worry about our own process – only Upstart itself. So how do Upstart and logrotate cooperate? The logrotate fragment gives very little away:

/var/log/upstart/*.log {
        daily
        missingok
        rotate 7
        compress
        notifempty
        nocreate
}


It is not doing anything special to signal to Upstart that logs have been rotated, for example in a prerotate or postrotate script. Having a look into init/log.c, you can see that an Upstart Job contains a Log object which knows about the console log file that has been used for this particular service. It keeps the file handle open so that any new data coming in from the service can be written straight out to the file. There is some additional intelligence around out-of-disk conditions, and some buffering taken care of by the NIH utility library, but these are orthogonal to the issue of log rotation.

When some input comes in, Upstart attempts to flush the data out to disk, and in the process checks the status of the file handle it already has open (with an fstat system call). If the fstat result indicates that there are no links to the file (i.e. it has been deleted but is still open), Upstart opens a new file with the same name and flushes the log data out to that file. Hence, when logrotate renames the file, nothing happens. But since Upstart’s logrotate config fragment contains the compress directive, logrotate invokes gzip, which deletes the original file once the new compressed file has been completely written out. Upstart detects the deletion, opens a new file and continues logging. Mystery solved!
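The same trick is easy to express in a few lines. Here is a minimal sketch of the idea in Go (an illustration of the technique only, not Upstart’s actual C code, and the path is just an example): fstat the open descriptor before each write, and reopen the file by name if the inode has no links left.

package main

// Illustration of the reopen-on-unlink technique described above, not
// Upstart's actual implementation (which lives in init/log.c). Before each
// write we fstat the open descriptor; if st_nlink is 0 the file has been
// deleted (for example by gzip after rotation), so we reopen it by name
// and carry on writing.

import (
	"log"
	"os"
	"syscall"
)

func openLog(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0640)
}

// writeLine writes one line, reopening the log first if its inode has no
// remaining directory entries. It returns the (possibly new) file handle.
func writeLine(f *os.File, path string, line []byte) (*os.File, error) {
	var st syscall.Stat_t
	if err := syscall.Fstat(int(f.Fd()), &st); err != nil {
		return f, err
	}
	if st.Nlink == 0 {
		f.Close()
		nf, err := openLog(path)
		if err != nil {
			return f, err
		}
		f = nf
	}
	_, err := f.Write(line)
	return f, err
}

func main() {
	path := "/tmp/testprogram.log" // example path, not the real Upstart log
	f, err := openLog(path)
	if err != nil {
		log.Fatal(err)
	}
	f, err = writeLine(f, path, []byte("hello\n"))
	if err != nil {
		log.Fatal(err)
	}
	f.Close()
}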

What does this look like in action? I made a very simple shell script which outputs a couple of phrases and a line number in a tight loop, to test the rotation behaviour and how lossy Upstart’s logging is – in particular, how much data is lost between gzip finishing the compression and Upstart opening the new file. Initially I also added a bunch of signal traps to the script to see if Upstart was doing anything funky there (sending a hangup signal, for example), but since the program itself does none of the logging, it doesn’t need any signalling to be aware of the logging mechanism:

#!/bin/bash -e
let i=1
while true; do
        echo "${i} The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure."
        let i=i+1
done


Since there is no sleep between loop iterations, this runs as fast as the data can be accepted on the pseudo-terminal. Evidently, this also puts quite a lot of load on Upstart itself:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27754 root      20   0 17856 1484 1236 R 46.3  0.1   0:03.62 testprogram
    1 root      20   0 24380 2220 1296 S 45.6  0.1  15:36.35 init


You can see what Upstart is doing with strace:

select(15, [3 5 6 7 8 9 12 14], [], [7 8 9 12], NULL) = 1 (in [14])
read(14, "\r\n1403895 The quick brown fox ju"..., 8192) = 552
read(14, "\r\n1403899 The quick brown fox ju"..., 7640) = 138
read(14, 0x7f1027c3fbf2, 7502)          = -1 EAGAIN (Resource temporarily unavailable)
fstat(11, {st_mode=S_IFREG|0640, st_size=192626266, ...}) = 0
write(11, "\r\n1403895 The quick brown fox ju"..., 690) = 690
read(3, 0x7fff293210ff, 1)              = -1 EAGAIN (Resource temporarily unavailable)
waitid(P_ALL, 0, {}, WNOHANG|WEXITED|WSTOPPED|WCONTINUED, NULL) = 0


So, it waits on the select call to indicate that one of its jobs has some output, makes some reads to collect that output from the pseudo-terminal, does the fstat on the file and then writes the buffered data out in one call. What happens when we delete the file in between writes?

select(15, [3 5 6 7 8 9 12 14], [], [7 8 9 12], NULL) = 1 (in [14])
read(14, "4 The quick brown fox jumped ove"..., 8192) = 132
read(14, 0x7f1027c3f9c4, 8060)          = -1 EAGAIN (Resource temporarily unavailable)
fstat(10, {st_dev=makedev(252, 0), st_ino=10521, st_mode=S_IFREG|0640, st_nlink=0, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=264, st_atime=2014/05/18-20:33:32, st_mtime=2014/05/18-20:33:42, st_ctime=2014/05/18-20:33:48}) = 0
close(10)                               = 0
umask(0117)                             = 0117
open("/var/log/upstart/testprogram.log", O_WRONLY|O_CREAT|O_APPEND|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC, 0740) = 10
write(10, "4 The quick brown fox jumped ove"..., 132) = 132
read(3, 0x7fff293210ff, 1)              = -1 EAGAIN (Resource temporarily unavailable)
waitid(P_ALL, 0, {}, WNOHANG|WEXITED|WSTOPPED|WCONTINUED, NULL) = 0


This time I ran strace with the -v parameter to completely expand the system call output. I also slowed the script down to one line every ten seconds (by adding a sleep), and then deleted the log file right after a line had been flushed to disk. You can see that the fstat returns st_nlink=0, which causes Upstart to close the file handle, set the umask and then open the file by name again before writing the buffered log line out to it. Great, that makes sense, but does it hold up when there is a high volume of data coming in?

Removing the sleep from the script again, I let the log build up to about 2GB before doing anything. For reference, it was writing about 3MB/s to the log, which is not a huge amount but certainly enough for a single service, even one behaving badly. To simulate realistic rotation conditions, I simply invoked gzip on the log file, letting it compress and then delete the original file. Looking at the end of the compressed file and the beginning of the new file (where we would expect to see any losses), this is the result:

# zcat testprogram.log.gz | tail -n1
18693551 The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure.#
# head -n2 testprogram.log

18693552 The quick brown fox jumped over the lazy dog. She sells seashells by the sea shore. The shells she sells are seashells I'm sure.


The interesting thing here is that the newline character is dropped from the end of the first file and ends up at the start of the second file, but most importantly no lines have been missed – in fact every byte is intact between the rotated file and the current file. I’m not positive this doesn’t break down under higher rates of logging, and of course Upstart was already very heavily loaded under just this amount of logging, but for my purposes, which can tolerate some losses, it certainly suffices. Another worry is the high rate of fstat calls (one per log flush), which doesn’t help the performance of Upstart’s logging; however, for low logging rates it shouldn’t have much impact.

So the next step is to actually put this into practice, which involves writing a prerotate script that excludes the log files of the services I am interested in (so they don’t get rotated by both logrotate and my own log handling), and then writing a bit of glue code to compress the logs and upload them to S3 hourly. It may seem like a lot more effort, but I’m really trying to avoid another solution that requires external services to manage the logs (e.g. remote syslog, fluentd, Logstash connected to ElasticSearch, etc.). If this idea fails or turns out to be more effort than anticipated, well, I guess I’ll go down the logging service path instead.
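That glue code doesn’t exist yet, but a minimal sketch of the hourly step, leaning on the reopen-on-unlink behaviour established above, might look something like this (uploadToS3 is a hypothetical placeholder rather than a real SDK call, and the log path is the test service from earlier):

package main

// Sketch of the hourly rotate-and-ship step, not actual production glue code.
// Renaming the live log does not disturb Upstart (it keeps writing to the
// same inode); deleting the uncompressed copy after gzipping it drops
// st_nlink to 0, so Upstart's next fstat notices and it reopens
// /var/log/upstart/<job>.log by name.

import (
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

// uploadToS3 is a hypothetical placeholder; push the compressed file to a
// bucket with whatever S3 client or CLI tool is available on the instance.
func uploadToS3(path string) error {
	return nil
}

func rotateAndShip(logPath, host string) error {
	stamp := time.Now().UTC().Format("2006-01-02T15")
	rotated := fmt.Sprintf("%s.%s.%s", logPath, host, stamp)

	// Renaming leaves Upstart writing to the same inode under the new name.
	if err := os.Rename(logPath, rotated); err != nil {
		return err
	}

	in, err := os.Open(rotated)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(rotated + ".gz")
	if err != nil {
		return err
	}
	defer out.Close()

	zw := gzip.NewWriter(out)
	if _, err := io.Copy(zw, in); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}

	// Deleting the uncompressed file is what triggers Upstart to reopen the
	// original log path on its next write.
	if err := os.Remove(rotated); err != nil {
		return err
	}
	return uploadToS3(rotated + ".gz")
}

func main() {
	host, _ := os.Hostname()
	if err := rotateAndShip("/var/log/upstart/testprogram.log", host); err != nil {
		log.Fatal(err)
	}
}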


Can’t create new network sockets? Maybe it isn’t user limits…

by Oliver on Thursday, February 28th, 2013.

I’ve been doing a lot more programming in Go recently, mostly because it has awesome concurrency primitives but also because it is generally a pretty amazing language. Unlike other languages, which use threads, fibres or event-driven frameworks to achieve good concurrency, Go manages to avoid all of these while remaining readable. You can also reason about its behaviour very effectively, because concepts like channels and goroutines are straightforward and easy to understand.

But enough about Go (for the moment). Recently I found the need to quickly duplicate the contents of one Amazon S3 bucket into another. This would not be a problem, were it not for the fact that the bucket contained several million objects. Fortunately, there are two factors which make this less daunting:

  1. S3 scales better than your application ever can, so you can throw as many requests at it as you like.
  2. You can copy objects between buckets very easily with a PUT request combined with a special header indicating the object you want copied (you don’t need to physically GET then PUT the data).

A perfect job for a Go program! The keys of the objects are in a consistent format, so we can split up the keyspace by prefix and divide the workload amongst several goroutines. For example, if your objects are named 00000000 through to 99999999 using only numerical characters, you could quite easily split this into 10 segments of 10 million keys. Using the bucket GET method you can retrieve up to 1000 keys per request, filtered by prefix. Even if you split into 10-million-key segments and there aren’t actually that many objects, the only things that matter are that you start and finish in the right places (the beginning and end of the segment) and continue making batch requests until you have all of the keys in that part of the keyspace.
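Since keys come back in lexicographic order, walking one segment just means starting the listing marker at the beginning of the segment and stopping once a returned key reaches the start of the next one. A rough sketch of that loop (listPage is a hypothetical stand-in for the signed bucket GET request, not a real API):

package main

// Sketch of walking one segment of the keyspace via marker-based pagination.
// listPage is a hypothetical helper standing in for the S3 bucket GET (list)
// request: it would return up to `max` keys after `marker`, plus a flag
// saying whether the listing was truncated.

import (
	"fmt"
	"net/http"
)

func listPage(client *http.Client, marker string, max int) (keys []string, truncated bool, err error) {
	// ... signed GET request with marker and max-keys parameters goes here ...
	return nil, false, nil
}

// listSegment sends every key in [start, end) to out.
func listSegment(client *http.Client, start, end string, out chan<- string) error {
	marker := start
	for {
		keys, truncated, err := listPage(client, marker, 1000)
		if err != nil {
			return err
		}
		for _, k := range keys {
			if k >= end {
				return nil // we have walked past the end of our segment
			}
			out <- k
			marker = k
		}
		if !truncated {
			return nil // no more keys in the bucket
		}
	}
}

func main() {
	out := make(chan string, 1000)
	go func() {
		for k := range out {
			fmt.Println(k)
		}
	}()
	client := &http.Client{}
	if err := listSegment(client, "00000000", "10000000", out); err != nil {
		fmt.Println(err)
	}
	close(out)
}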

So now we have a mechanism for rapidly retrieving all of the keys. For millions of objects this will still take some time, but you have divided the work amongst several goroutines so it will be that much faster. For comparison, the Amazon Ruby SDK uses the same REST requests under the hood when using the bucket iterator bucket.each { |obj| … } but only serially – there is no division of work.

Now, to copy all of our objects we just need to take each key returned by the bucket GET batches and send off one PUT request for each one. This introduces a much slower process – one GET request results in up to 1000 keys, but then we need to perform 1000 PUTs to copy them. Each PUT also takes quite a while, as the S3 backend has to physically copy the data between buckets – for large objects this can be significant.

Let’s use some more concurrency, and have a pool of 100 goroutines waiting to process each batch of 1000 keys just fetched. A recent discussion on the golang-nuts group yielded some good suggestions from others in the Go community, and resulted in this code:
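That code isn’t reproduced here, but a minimal sketch of the structure described – ten fetchers, one per key prefix, each feeding its own pool of 100 copiers, all sharing one http.Client – might look like the following, with keysForSegment and copyObject as hypothetical placeholders for the S3 list and PUT-copy requests:

package main

// A sketch of the fetcher/copier structure rather than the original code.
// keysForSegment and copyObject are hypothetical placeholders for the S3
// bucket listing and PUT-copy (x-amz-copy-source) requests.

import (
	"log"
	"net/http"
	"sync"
)

// keysForSegment would page through the bucket listing for one prefix,
// 1000 keys at a time, pushing every key onto the channel.
func keysForSegment(client *http.Client, prefix string, keys chan<- string) {
	// ... batch listing requests go here ...
}

// copyObject would issue a PUT to the destination bucket with the
// x-amz-copy-source header naming the source object.
func copyObject(client *http.Client, key string) error {
	// ... signed PUT copy request goes here ...
	return nil
}

func main() {
	client := &http.Client{} // shared so that connections can be reused
	prefixes := []string{"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}

	var fetchers sync.WaitGroup
	for _, p := range prefixes {
		fetchers.Add(1)
		go func(prefix string) {
			defer fetchers.Done()

			keys := make(chan string, 1000)

			// 100 copiers per fetcher, draining keys as they arrive.
			var copiers sync.WaitGroup
			for i := 0; i < 100; i++ {
				copiers.Add(1)
				go func() {
					defer copiers.Done()
					for key := range keys {
						if err := copyObject(client, key); err != nil {
							log.Printf("copy %s: %v", key, err)
						}
					}
				}()
			}

			keysForSegment(client, prefix, keys)
			close(keys) // no more keys for this segment
			copiers.Wait()
		}(p)
	}
	fetchers.Wait()
}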


It’s not a lot of code, which makes me think it is reasonably idiomatic and correct Go. Better yet, it has the potential to scale out to truly tremendous numbers of workers. You may notice that each of the workers also uses the same http.Client, and this is intentional – internally the http.Client makes some optimisations around connection reuse so that you aren’t hit with the performance penalty of socket creation and TCP handshakes for every request. Generally this works pretty well.

Let’s think about system limits now. Say we want to make our PUT copy operations really fast, and use 100 goroutines for these operations. With just 10 fetcher goroutines that means we now have 1000 goroutines vying for attention from the http.Client connection handling. Even if the fetchers are idle, if we have all of the copier workers running at the same time, we might require 1000 concurrent TCP connections. With a default user limit of 1024 open file handles (e.g. on Ubuntu 12.04) this means we are dangerously close to exceeding that limit.

Head http://mybucket.s3.amazonaws.com:80/: lookup mybucket.s3.amazonaws.com: no such host

When you see an error like the above pop up in your program’s output, it almost seems a certainty that you have exceeded these limits… and you’d be right! For now… These were the errors I was getting initially, and while it was somewhat mysterious that I would see so many of them (literally one for each failed request), apparently some additional sockets are required for name lookups (even if they are locally cached). I’m still looking for a reference for this, so if you know of one please let me know in the comments.

This resulted in a second snippet of Go code to check my user limits:
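That snippet isn’t shown here, but checking the limits boils down to a call to syscall.Getrlimit; a minimal sketch:

package main

// Print the soft and hard limits on open file descriptors for the current
// process. A sketch of the kind of check described, not the original snippet.

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open files: soft limit %d, hard limit %d\n", lim.Cur, lim.Max)
}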

Using syscall.Getrusage in conjunction with syscall.Getrlimit would allow you to dynamically scale your program to use as much of the system’s resources as it has access to without overstepping those boundaries. But remember what I said about using http.Client before? The net/http package documentation says that “Clients should be reused instead of created as needed” and that “Clients are safe for concurrent use by multiple goroutines”, and both of these are indeed accurate. The unexpected side-effect is that, unfortunately, the usage of TCP connections is now fairly opaque to us. Thus our understanding of current system resource usage is fundamentally detached from how we use http.Client. This will become important in just a moment.

So, having raised my ulimits far beyond what I expected I actually needed (this was to be the only program running on my test EC2 instance anyway), I re-ran the program and faced another error:

Error: dial tcp 207.171.163.142:80: cannot assign requested address

What the… I thought I had dealt with user limits? I didn’t initially find the direct cause of this, thinking I hadn’t properly dealt with the user limits issue. I found a few group discussion threads dealing with http.Client connection reuse, socket lifetimes and related topics, and I first tried a few different versions of Go, suspecting it was a bug fixed at the source tip (more or less analogous to HEAD on origin/master in Git, if you mainly use that VCS). Unfortunately this yielded no fix and no additional insights.

I had been monitoring the process’s open file handles at runtime and noticed it never went over about 150 concurrent connections. Using netstat, on the other hand, showed that there were a significant number of connections in the TIME_WAIT state. This socket state is used by the kernel to leave a trace of the connection around in case there are duplicate packets still on the network waiting to arrive (among other things). In this state the socket is actually detached from the process that created it and is just waiting for kernel cleanup – therefore it no longer counts as an open file handle, but that doesn’t mean it can’t cause problems!

In this case I was connecting to Amazon S3 from a single IP address – the only one configured on the EC2 instance. S3 itself has a number of IP addresses on both East and West coasts, rotated automatically through DNS-based load-balancing mechanisms. However, at any given moment you will resolve a single IP address and probably use that for a small period of time before querying DNS again and perhaps getting another IP. So we can basically say we have one IP contacting another IP – and this is where the problem lies.

When an IPv4 network socket is created, there are five basic elements the kernel uses to make it unique among all others on the system:

protocol; local IPv4 address : local IPv4 port <-> remote IPv4 address : remote IPv4 port

Given roughly 2^27 possibilities for the local IP (classes A, B and C), the same for the remote IP, and 2^16 for each of the local and remote ports (assuming we can use the privileged ports below 1024 if we run as root), that gives us about 2^86 different combinations, and thus the theoretical number of IPv4 TCP sockets a single system could keep track of. That’s a whole lot! Now consider that we have a single local IP on the instance, we have (for some small window of time) a single remote IP for Amazon S3, and we are reaching it only over port 80 – three of our variables are reduced to a single possibility, and we only have the local port range left to make use of.

Worse still, the default local port range for ephemeral ports (on my machine at least) was only 32768-61000, which reduced the available local ports to less than half of the total range. Watching the output of netstat and grepping for TIME_WAIT sockets, it was evident that I was using up these roughly 28,000 local ports within a matter of seconds. When there are no local port numbers left to use, the kernel simply fails to create a network socket for the program and returns an error as in the above message – cannot assign requested address.

Armed with this knowledge, there are a couple of kernel tunings you can make. The tcp_tw_reuse and tcp_tw_recycle settings both affect when the kernel will reclaim sockets in the TIME_WAIT state, but in practice they didn’t seem to have much effect. Another setting, tcp_max_tw_buckets, sets a limit on the total number of TIME_WAIT sockets and actively kills them off once the count exceeds that limit. All three of these parameters look and sound slightly dangerous, and since they had not had much effect anyway, I was loath to use them and call the problem solved. After all, if the program was closing the connections and leaving them for the kernel to clean up, it didn’t sound like http.Client was doing a very good job of reusing connections automatically.

Incidentally, Go does support reuse of addresses still in the TIME_WAIT state via the SO_REUSEADDR socket option, but this only applies to listening sockets (i.e. servers).

Unfortunately that brought me to about the end of my inspiration, but a co-worker pointed me in the direction of http.Transport’s MaxIdleConnsPerHost parameter, which I was only vaguely aware of from having skimmed the source of that package in the previous couple of days, desperately searching for clues. The default value is two (2), which seems reasonable for most applications, but is evidently terrible when your application has large bursts of requests rather than a constant flow. I believe that internally, the transport creates as many connections as required, the requests are processed and closed, and then all but two of those connections are terminated again, left in the TIME_WAIT state for the kernel to deal with. Just a few cycles of this need to repeat before you have built up tens of thousands of sockets in this state.
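Setting it is just a matter of constructing the shared client with an explicit http.Transport; a minimal sketch (the URL is only a placeholder, and the value of 250 is simply what ended up working here, as described next):

package main

// Construct the shared http.Client with a larger idle-connection cap per
// host, so bursts of requests reuse connections instead of churning through
// local ports.

import (
	"fmt"
	"net/http"
)

func main() {
	transport := &http.Transport{MaxIdleConnsPerHost: 250}
	client := &http.Client{Transport: transport}

	resp, err := client.Get("http://example.com/") // placeholder URL
	if err != nil {
		fmt.Println(err)
		return
	}
	resp.Body.Close()
	fmt.Println(resp.Status)
}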

Altering the value of MaxIdleConnsPerHost to around 250 immediately removed the problem, and I didn’t see any sockets in the TIME_WAIT state while I was monitoring the program. Shortly thereafter the program stopped functioning altogether, I believe because my instance was blacklisted by AWS for sending too many requests to S3 in a short period of time. Scalability achieved!

If there are any lessons in this, I guess it is that you still often need to be aware of what is happening at the lowest levels of the system, even if your programming language or application has abstracted enough of the details away for you not to have to worry about them. Even knowing that there was an idle connection limit of two would not have given away the whole picture of the forces at play here. Go is still my favourite language at the moment and I was glad that the fix was relatively simple, and I still have a very understandable codebase with excellent performance characteristics. However, whenever the network and remote services with variable performance characteristics are involved, any problem can become surprisingly complex.


Personal off-site backups

by Oliver on Saturday, December 29th, 2012.

Unlike many, I’m actually a good boy and do backups of my personal data (for which I can mostly thank my obsessive-compulsive side). However, up until now I’ve been remiss in my duties to also take these backups off-site in case of fire, theft, acts of god or gods etc. Without a tape system or rotation of hard drives (not to mention an actual “off-site” site to store them), this ends up being a little tricky to pull off.

Some of my coworkers and colleagues make use of various online backup services, a lot of which are full-service offerings with a custom client or fixed workflow for performing the backups. At least one person I know backs up (or used to) to Amazon S3 directly; but even in the cheapest of their regions, the cost is significant for what could remain an effectively cold backup. It may be somewhat easier to swallow now that they have recently reduced their pricing across the board.

Glacier is a really interesting offering from Amazon that I’ve been playing with a bit recently, and while its price point is squarely aimed at businesses who want to back up really large amounts of data, it also makes a lot of sense for personal backups. Initially the interface was somewhat similar to what you would expect from a tape system – collect your files together as a vaguely linear archive and upload it with some checksum information. I was considering writing a small backup tool that would make backing up to Glacier reasonably simple but didn’t quite get around to it in time.

Fortunately for me, waiting paid off as they recently added support for transitioning S3 objects to Glacier automatically. This means you get to use the regular S3 interface for uploading and downloading individual objects/files, but allow the automatic archival mechanism to move them into Glacier for long-term storage. This actually makes the task of performing cost-effective remote backups ridiculously trivial but I still wrote a small tool to automate it a little bit.

Hence, glacier_backup. It uses a bit of Ruby, the Amazon Ruby SDK (which is a very nice library, incidentally), ActiveRecord and progressbar. Basically, it traverses the directories you configure it with and uploads any readable file there to S3, after setting up a bucket of your choosing with a lifecycle policy that transitions all objects to Glacier immediately. Some metadata is stored locally using ActiveRecord – not because it is necessary (you can store a wealth of metadata on S3 objects themselves), but because each S3 request costs something, so it’s helpful to avoid making requests that aren’t needed.

It’s not an amazing bit of code but it gets the job done, and it is somewhat satisfying to see the progress bar flying past as it archives my personal files up to the cloud. Give it a try, if you have a need for remote backups. Pull requests or features/issues are of course welcome, and I hope you find it useful!


Amazon S3 object deletions and Multi-Factor Authentication

by Oliver on Sunday, October 7th, 2012.

I’ve been using S3 a lot in the last couple of months, and with the Amazon SDK for Ruby it really is dead simple to work with (as is the case with all of the other AWS services the SDK currently supports). So simple, in fact, that you could quite easily delete all of your objects with very little work indeed. I did some benchmarks and found that (with batch operations) it took around 3 minutes to delete ~75,000 files totalling about a terabyte. Single threaded.

Parallelize that workload and you could drop everything in your S3 buckets within a matter of minutes for just about any number of objects. Needless to say, if a hacker gets your credentials an extraordinary amount of damage can be done very easily and in a very short amount of time. Given there is often a several hour lag in accesses being logged, you’ll probably not find out about such accesses until long after the fact. Another potential cause of deletions is of course human error (and this is generally way more probable). In both cases there is something you can do about it.

S3 buckets have supported versioning for well over two years now, and if you use SVN, Git or some other version control system then you’ll already understand how it works. The access methods for plain objects and for their versions do differ slightly, but the principal ideas are the same (object access methods generally operate only on the latest, non-deleted version). With versioning you can already protect yourself against accidental deletion, since you can revert to the last non-deleted version at any time.

However, there is nothing preventing you from deleting all versions of a file, and with them all traces that that file ever existed. This is an explicit departure from the analogy with source versioning systems, as any object with versions still present will continue to cost you real money (even if the latest version is a delete marker). So, you can add Multi-Factor Authentication to your API access to S3 and secure these version deletion operations.

This has existed in the web API for some time, but I recently had a commit merged into the official SDK that allows you to enable MFA Delete on a bucket, and there is another one in flight which will allow you to actually use the multi-factor tokens in individual delete requests. The usage is slightly interesting, so I thought I’d demonstrate how it is done in Ruby and share some thoughts on its potential use cases. If you want to use it now, you’ll have to pull down my branch (until the pull request is merged).

Enabling MFA

I won’t go into detail about acquiring the actual MFA device, as it is covered sufficiently in the official documentation, but suffice it to say that you can buy an actual hardware TOTP token, or use Amazon’s or Google’s “virtual” MFA applications for iPhone or Android. Setting them up and associating them with an account is also fairly straightforward (as long as you are using the AWS console; the command-line IAM tools are another matter altogether).

Setting up MFA Delete on your bucket is actually quite trivial:

require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(:access_key_id => 'XXXX', :secret_access_key => 'XXXX')
bucket = s3.buckets['my-test-bucket']
bucket.enable_versioning(:mfa_delete => 'Enable', :mfa => 'arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456')

Behind the scenes, this doesn’t do much differently from enabling versioning without MFA. It adds a new element to the XML request asking that MFA Delete be enabled, and adds a header containing the MFA device serial number and the current token. Importantly (and this may trip you up if you have started using IAM access controls), only the owner of a bucket can enable or disable MFA Delete. In the case of a “standard” account with delegated IAM accounts under it, this will be the “standard” account (even if one of the sub-accounts was used to create the bucket).

Version Deletion with MFA

Now, deleting objects (which merely creates a delete marker) is still possible without MFA, but deleting versions is not. Version deletion looks much the same as before, but requires the device serial and token to be passed in when MFA Delete is enabled:

require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(:access_key_id => 'XXXX', :secret_access_key => 'XXXX')
bucket = s3.buckets['my-test-bucket']
bucket.versions['itHPX6m8na_sog0cAtkgP3QITEE8v5ij'].delete(:mfa => 'arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456')

As mentioned above there are some limitations to this (as you’ve probably guessed):

  • Since this is a TOTP system, each token can be used only once. That means you can delete a single version with a single token, no more. Given that Google Authenticator and Gemalto physical TOTP devices generate a token once every 30 seconds, it may take up to a minute to completely eradicate all traces of an object that was previously deleted (original version + delete marker).
  • Following on from this, it is almost impossible to consider doing large numbers of deletions. There is a batch object deletion method inside of AWS::S3::ObjectCollection but this is not integrated with any of the MFA Delete mechanisms. Even then, you can only perform batches of 1000 deletions at a time.

As it stands, I’m not sure how practical it is. MFA involves an inherently human-oriented process, as it involves something you have rather than something you are or something you know (both of which are reasonably easily transcribed into a computer once). Given that the access medium is an API designed for rapid, lightweight use, there seems to be an impedance mismatch. Still, with some implementation work to get batch deletions working, it would probably serve a lot of use cases.

Are you using MFA Delete (through any of the native APIs or other language SDKs, or even 3rd-party apps)? I would love to hear about other people’s experiences with it – leave your comments below.
