adrift on a cosmic ocean

Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.

AWS AutoScaling group size metrics (or lack thereof)

Posted by Oliver on the 17th of January, 2015 in category Tech
Tagged with: autoscaling, aws, cloudwatch, metrics

One metric notably missing from CloudWatch is the AutoScaling group size, both current and historical - in other words, how many nodes are in the cluster. I've worked around this by querying the current and desired cluster sizes through the regular EC2 APIs and logging them to Graphite. However, that only gives you values from the moment you start collecting - nothing from the past, which regular CloudWatch metrics do provide (up to 2 weeks back).
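For illustration, a collection script of that kind could look roughly like the sketch below. It is not the script we actually ran: it assumes boto3 and the Auto Scaling DescribeAutoScalingGroups call, Graphite's plaintext protocol on port 2003, and a made-up metric prefix; graphite_lines and fetch_and_send are hypothetical names.

```python
import socket
import time


def graphite_lines(response, prefix="aws.autoscaling", now=None):
    """Turn a DescribeAutoScalingGroups response into Graphite plaintext lines."""
    now = int(time.time() if now is None else now)
    lines = []
    for group in response["AutoScalingGroups"]:
        # Graphite uses dots as path separators, so replace them in the name.
        name = group["AutoScalingGroupName"].replace(".", "_")
        # "Instances" lists every instance attached to the group,
        # whatever its lifecycle state.
        current = len(group["Instances"])
        desired = group["DesiredCapacity"]
        lines.append(f"{prefix}.{name}.current {current} {now}")
        lines.append(f"{prefix}.{name}.desired {desired} {now}")
    return lines


def fetch_and_send(group_name, graphite_host, graphite_port=2003):
    """Query one group and push both sizes to Graphite (needs boto3 and credentials)."""
    import boto3  # imported lazily so graphite_lines() works without it

    client = boto3.client("autoscaling")
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    payload = "\n".join(graphite_lines(response)) + "\n"
    with socket.create_connection((graphite_host, graphite_port)) as sock:
        sock.sendall(payload.encode("ascii"))
```

Run from cron, something like this records the sizes from that point onwards, which is exactly the limitation: there is no way to backfill.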

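My colleague Sean came up with a nice workaround - using the SampleCount statistic of the CPUUtilization metric in the AWS/EC2 namespace, filtered by the AutoScalingGroupName dimension. Each instance in the group reports one sample per monitoring interval, so the sample count equals the number of instances. Here's an example, using the (Python-based) AWS CLI: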

$ aws cloudwatch get-metric-statistics --dimensions Name=AutoScalingGroupName,Value=XXXXXXXXProdCluster1-XXXXXXXX --metric-name CPUUtilization --namespace AWS/EC2 --period 60 --statistics SampleCount --start-time 2015-01-17T00:00:00 --end-time 2015-01-17T00:05:00
{
    "Datapoints": [
        {
            "SampleCount": 69.0,
            "Timestamp": "2015-01-17T00:00:00Z",
            "Unit": "Percent"
        },
        {
            "SampleCount": 69.0,
            "Timestamp": "2015-01-17T00:01:00Z",
            "Unit": "Percent"
        },
        {
            "SampleCount": 69.0,
            "Timestamp": "2015-01-17T00:03:00Z",
            "Unit": "Percent"
        },
        {
            "SampleCount": 69.0,
            "Timestamp": "2015-01-17T00:02:00Z",
            "Unit": "Percent"
        },
        {
            "SampleCount": 67.0,
            "Timestamp": "2015-01-17T00:04:00Z",
            "Unit": "Percent"
        }
    ],
    "Label": "CPUUtilization"
}
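Note that the datapoints above are not returned in time order (00:03 comes before 00:02), so they need sorting before you graph or store them. A minimal sketch of that, assuming the CLI output is captured as a string and the timestamps have exactly the format shown above; size_history is a hypothetical helper name:

```python
import json
from datetime import datetime


def size_history(cli_output):
    """Return (timestamp, instance_count) pairs sorted by time."""
    datapoints = json.loads(cli_output)["Datapoints"]
    history = []
    for point in datapoints:
        # Timestamps are assumed to look like 2015-01-17T00:00:00Z.
        ts = datetime.strptime(point["Timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        # SampleCount arrives as a float (69.0); the group size is a whole number.
        history.append((ts, int(point["SampleCount"])))
    return sorted(history)
```

This only holds when --period matches the sampling interval of the instances, as in the command above.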

Some things to note:

  • Ignore the units - it's not a percentage!
  • You will need to set the --period parameter to match the metric sampling interval of the EC2 instances in the AutoScaling group - with regular (basic) monitoring that is one sample every 5 minutes (300 seconds), with detailed monitoring it is one sample every minute (60 seconds).
  • This also means that if you request coarser data points for historical data, you'll need to do some division - e.g. with --period 3600 each instance contributes 12 samples (regular monitoring) or 60 samples (detailed monitoring) per data point, so divide the resulting sample count by that number before you store it.
  • Going via CloudWatch in this way means you can see your cluster size history for the last two weeks, just like any other CloudWatch metric!
  • Unfortunately this approach does not capture the desired cluster size, so you will lose that metric. In practice I haven't really needed both desired and actual cluster size metrics.
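
The division mentioned in the notes above can be written down as a small helper. This is only a sketch; instances_from_samples and its parameter names are made up for illustration:

```python
def instances_from_samples(sample_count, period, sampling_interval):
    """Convert a SampleCount datapoint into an average instance count.

    period            -- the --period passed to CloudWatch, in seconds
    sampling_interval -- 300 for regular monitoring, 60 for detailed
    """
    if period % sampling_interval != 0:
        raise ValueError("period must be a multiple of the sampling interval")
    # Each instance contributes this many samples to one datapoint.
    samples_per_instance = period // sampling_interval
    return sample_count / samples_per_instance


# One hour of data from 69 instances:
print(instances_from_samples(828, 3600, 300))   # regular monitoring
print(instances_from_samples(4140, 3600, 60))   # detailed monitoring
```

If the group was resized during the period, the result is the average size over that period and may not be a whole number.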

We'll start using this almost immediately, as we can remove one crufty metric collection script in the process. Hope it also helps some of you out there in AWS land!

© 2010-2018 Oliver Hookins and Angela Collins