Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We currently reside in Sydney after almost a decade in Berlin, have three kids, and have far too little time to really justify having a blog.
Well, I almost made it two years between posts. Life, changing jobs, mental health, and the general state of the world will do that. I actually have a few blog posts saved up to write, and it's a bit of a race between me penning them here and $work getting a technical blog site running (since these kinds of posts invariably start from a $work problem).
Anyhoo, let's start with the first thing - nobody likes having private subnets in AWS. Having to separate your workloads into public and private subnets seems like a good idea - you have isolation, put the right resources (e.g. load balancers vs instances) into the right places and route to them properly. Only the things that need to be on the internet are, and the other things are not.
Sadly, NAT gateways and their associated costs get in the way. For all of your instances in a private subnet, any traffic you want to go to the internet must go via a NAT gateway, and you pay for each gateway by the hour and by the GB of traffic. If you are clever, you remember to provision VPC endpoints for things like S3 and ECR so that traffic does not go via the NAT gateway, but for many purposes we are not so lucky and still end up paying a lot. If you divide up your network into subnets that are too small, you end up with a lot of NAT gateways and you suffer even more. At least one blog post has been written about this phenomenon.
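To make the "by the hour and by the GB" point concrete, here is a minimal sketch of the arithmetic. The rates are placeholders I made up for illustration, not quoted AWS pricing, and the function name and the 730-hour month are assumptions too.

```python
def nat_gateway_monthly_cost(gateways, gb_processed, hourly_rate, per_gb_rate,
                             hours_per_month=730):
    """Rough monthly bill: a fixed hourly part per gateway plus a per-GB part."""
    fixed = gateways * hours_per_month * hourly_rate
    variable = gb_processed * per_gb_rate
    return fixed + variable


# Placeholder rates, NOT real pricing; check the pricing page for your region.
HOURLY_RATE = 0.05
PER_GB_RATE = 0.05

one_gateway = nat_gateway_monthly_cost(1, 10_000, HOURLY_RATE, PER_GB_RATE)
three_gateways = nat_gateway_monthly_cost(3, 10_000, HOURLY_RATE, PER_GB_RATE)
print(f"1 gateway: {one_gateway:.2f}, 3 gateways: {three_gateways:.2f}")
```

The fixed part scales with the number of gateways, while the per-GB part is paid regardless of how many gateways the traffic is spread across.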
Anecdotally, since last year at least (2021, a decade ago) I've been reading about people doing away with private subnets and just provisioning public subnets for their workloads. You have at least two mechanisms at your disposal for keeping things secure - not automatically provisioning public IPs for instances (although then they can't talk to the internet) and using security groups with strict ingress rules. I haven't yet found any blog posts discussing this as a blessed architectural pattern, but people are definitely using it in production so I figured it was worth attempting.
In my infinite wisdom I provisioned an EKS cluster with only public subnets, to test this theory. How could there be any caveats?!? Load balancers are in the public subnets, worker nodes are in the public subnets... there are only a few subnets, they are very large (in order to make use of the AWS VPC CNI and the IP addresses it will use for pods) and we just have our internet gateway - no NAT gateways. Of course the security groups are set up sensibly - worker nodes can communicate out but only the control plane can talk to them, on limited ports. Load balancers can talk to worker nodes, everything is fine.
Until this week. We discovered a use case where we need one workload to talk to another via the Ingress ELB, which is internet-facing. These ELBs, as well as the control plane, are only reachable from our staff VPN so nothing else can get in - not even the worker nodes. After completely misunderstanding the situation and the fix, and rubber ducky-ing it with several different people, I realised the solution is just to add another security group to the ELB, with an ingress rule allowing the worker node security group to talk to it over HTTPS. Simple, right?
Except it doesn't work. Remove the restrictions entirely (including those for the VPN) and it works fine; put them back and it's very not fine. Checking the Reachability Analyzer (my first time using it) indicated that the path was clear - it reported no problems. I checked the VPC flow logs, and indeed could see that each of the flows was rejected. The path should look like this:
Pod ENI -> worker node public IP (shared across all pods) -> worker security group -> "Internet" -> ELB public IP -> ELB security group -> worker node private IP -> worker security group -> Pod ENI
It was presumably getting rejected at the ELB security group, since loosening the rules there got the packets flowing and putting the "correct" rules back stopped them again. You'll note I scare-quoted "Internet", because AWS will often talk about your traffic going over the internet when, for example, you use S3 from EC2 without a VPC endpoint. I have no doubt that the traffic stays within the AWS network, but it gets closer to their border at least. And they want to scare you into using VPC endpoints.
An earlier theory I had was that when the traffic gets onto the "Internet", the security group "annotations" (my own term here) on the packets are lost, and the rules I've put in place don't work. As it turns out, that's correct. I wish I could point you to the docs where it says this, but I couldn't find it and had to find out via support ticket. So, we can't predict what the worker nodes' public IPs will be, and can't add them to the security group rules (although you could no doubt do some horrible automation using Lambdas), and the security groups we already have in place can't be used to recognise the legitimate traffic.
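To illustrate the behaviour described above, here is a toy model of the rule evaluation. It is only a sketch of my understanding, not how AWS implements anything: the `Instance`, `IngressRule` and `allowed` names are hypothetical, the addresses are documentation-range placeholders, and the key assumption (flagged in the code) is that a rule referencing a security group only matches the private addresses of that group's members.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Instance:
    private_ip: str
    public_ip: str
    security_groups: frozenset


@dataclass
class IngressRule:
    port: int
    source_sg: Optional[str] = None    # allow members of this security group
    source_cidr: Optional[str] = None  # or allow this CIDR range


def allowed(rules, source_ip, port, instances):
    """Return True if any rule admits a packet arriving from source_ip."""
    for rule in rules:
        if rule.port != port:
            continue
        if rule.source_cidr and ip_address(source_ip) in ip_network(rule.source_cidr):
            return True
        if rule.source_sg:
            # Assumption: a group reference matches only the private
            # addresses of the instances that belong to that group.
            members = {i.private_ip for i in instances
                       if rule.source_sg in i.security_groups}
            if source_ip in members:
                return True
    return False


worker = Instance("10.0.1.23", "198.51.100.7", frozenset({"sg-workers"}))
elb_rules = [
    IngressRule(443, source_cidr="192.0.2.0/24"),  # staff VPN range (placeholder)
    IngressRule(443, source_sg="sg-workers"),      # the rule that "should" work
]

# Traffic that stays inside the VPC keeps the worker's private source address.
print(allowed(elb_rules, worker.private_ip, 443, [worker]))
# Traffic that leaves via the internet gateway arrives from the public address.
print(allowed(elb_rules, worker.public_ip, 443, [worker]))
```

In this model the second check fails because nothing in the ELB's rules knows about the worker's public IP, which matches the rejected flows in the flow logs.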
It really does seem like this is a concrete caveat to the "only public subnets" model. It would be great if there was a bit more written about this design pattern to raise awareness of the trade-offs (or perhaps AWS could just stop charging for NAT gateways). For us I think we need to reconsider in this particular case, and go back to the split public/private subnet model in order to get some static NAT gateway elastic IPs to allow through security groups - which I expect to fill my Monday for the coming week 😒
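As a rough sketch of what "allow the NAT gateway elastic IPs through security groups" could look like, the helper below builds an ingress permission list in the shape that boto3's `authorize_security_group_ingress` takes for its `IpPermissions` argument. The helper name and the addresses are hypothetical (the IPs come from a documentation range), and it only handles IPv4.

```python
from ipaddress import IPv4Address


def nat_ingress_permissions(nat_eips, port=443):
    """Build one TCP ingress permission allowing each NAT elastic IP as a /32."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            # IPv4Address() raises ValueError on anything that is not a valid IPv4 address.
            {"CidrIp": f"{IPv4Address(ip)}/32",
             "Description": "NAT gateway egress address"}
            for ip in nat_eips
        ],
    }]


# Placeholder elastic IPs, one per NAT gateway.
permissions = nat_ingress_permissions(["203.0.113.10", "203.0.113.11"])
print(permissions)
```

Assuming boto3 and credentials are available, the result could be passed along the lines of `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=permissions)`, where the group ID is that of the ELB's security group.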