I can see why you might not want to, though. Application Load Balancers charge by usage and can get somewhat expensive (comparable to running another EC2 instance or two). They can also have cold-start performance issues in specific circumstances (sudden traffic spikes).
That doesn't solve the problem of the hostname on the EC2 instance itself being the same across all instances, which makes it harder to see which logs came from which hosts.
It also doesn't solve the problem of letting you look at logs and then quickly SSH into a single machine in the ASG.
Install a log agent like fluentd on the machine. Have it inject the host IP and other contextual metadata into the logs, then forward them to your central log system?
When you see the error message in your logs, you get the internal IP and can SSH in.
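The metadata half of that is small; a rough sketch (the agent config itself varies by tool, and this assumes IMDSv1-style metadata access):

    # Sketch: pull identifying metadata from the instance metadata service so a
    # log agent (fluentd, Filebeat, etc.) can stamp it onto every record.
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
    # Write them somewhere the agent's record-enrichment config can read them.
    cat > /etc/default/log-context <<EOF
    INSTANCE_ID=$INSTANCE_ID
    LOCAL_IP=$LOCAL_IP
    EOF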
Persistent internal IPs/hostnames also mean you are not treating hosts as ephemeral. It's always good in the cloud to get things to a point where you can just blow away instances and they auto-recreate. It's even possible with traditional services requiring persistent storage: put the storage on a separate volume and have the instance startup scripts discover available volumes and attach them as required.
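That volume-discovery step can stay pretty short; a sketch, assuming data volumes carry a made-up Role=data tag and the instance role and CLI region are already set up:

    # Hypothetical startup-script sketch: find an available, tagged data volume
    # in this AZ and attach it. Tag key/value ("Role=data") is just an example.
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
    VOLUME_ID=$(aws ec2 describe-volumes \
      --filters "Name=tag:Role,Values=data" \
                "Name=availability-zone,Values=$AZ" \
                "Name=status,Values=available" \
      --query 'Volumes[0].VolumeId' --output text)
    aws ec2 attach-volume --volume-id "$VOLUME_ID" \
      --instance-id "$INSTANCE_ID" --device /dev/xvdf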
what you need is probably something like https://github.com/adhocteam/ec2ssh (I never used it, but I have built similar ones) -- and then you tag the log entries with instance id.
so you can do "ec2ssh i-0017c8b3"
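A home-grown version of that is only a few lines; a sketch (not the linked tool itself, and the SSH user is an assumption):

    # Minimal "ec2ssh"-style helper: resolve an instance ID to its private IP,
    # then SSH to it.
    ec2ssh() {
      local ip
      ip=$(aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text)
      ssh "ec2-user@$ip"   # user name depends on the AMI
    }
    # usage: ec2ssh i-0017c8b3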
imho: hacking around debugging tools is better (mostly because it's more reliable) than hacking around production configurations (one problem you will see is that frequently changing Route53 records is subject to API rate limits).
We absolutely put an ELB/ALB in front of these ASGs as well. The post mentions a few use cases where unique hostnames with internal Route53 records are helpful for us.
Can't this be solved by using IP addresses for hostnames? This can be part of the bootstrap script (which ASG/Launch Configuration already supports via UserData [1]).
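Something like this as part of UserData would do it (a sketch, assuming a systemd-based AMI and IMDSv1-style metadata access):

    #!/bin/bash
    # Sketch of a UserData bootstrap step: derive the hostname from the private IP.
    ip=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
    hostnamectl set-hostname "ip-${ip//./-}"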
What I can't understand is:
If your logs are in ELK and your metrics are in Prometheus/Grafana, why do you need SSH access? Sounds like that's a good problem to solve.
SSH access is a last resort, but it can be necessary in certain cases, for example if our log forwarding breaks. SSH is also just one example; it can be helpful to curl endpoints on the host directly without hitting the ELB/ALB.
The post actually provides the user_data script we use.
I already do this from my configuration management system on my instances, but one thing I don't have that I'd love Route53 to support is handling in-addr.arpa zones for my IP addresses, so I can get reverse IP lookups for my VPC networks without having to run my own resolver.
I wrote and regularly use a Docker app that adds and removes instances from Route53, similar to their Terraform solution. Similar idea, different implementation.
I brought up that point since I think most developers prefer the user experience of Lambda/Kubernetes where they don't have to manage individual instances in Auto Scaling Groups. They certainly are not 'outdated' for our use cases, and especially not for those responsible for running the underlying infrastructure (when running Kubernetes nodes).
Why should an instance created by an ASG have a hostname? These are cattle, not pets. I use Serilog for logging with an EC2 enricher that automatically adds the instance ID and the IP address.
Since Serilog does structured logging, I can use either an ElasticSearch or Mongo sink and do complex queries.
If I routinely need to log into an instance to troubleshoot, I need to be capturing data and sending it to a central logging system.
I haven't had to manage SSH keys in a long time ;)
With this I just have a bash function for my various environments (e.g. dev = dssm) to which I pass the instance ID that's giving me issues, if I really need to log into the server.
And I'm dropped into a shell. SSM Session Manager is far from perfect, but it gets the job done, is fully auditable, gets logged (including commands run), and best of all works with SAML IAM profiles right out of the gate. No more sharing keys, no more managing keys, it's great!
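The function itself is tiny; roughly (a sketch, assuming the AWS CLI with the Session Manager plugin installed and a profile named "dev"):

    # Sketch of such a per-environment helper; the profile name is an assumption.
    dssm() {
      aws ssm start-session --profile dev --target "$1"
    }
    # usage: dssm i-0123456789abcdef0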
That’s the second part. If I’m troubleshooting by logging into EC2 instances, there is something wrong with my logging infrastructure. That’s actually the larger issue.
SSH access is absolutely a last resort, but can be necessary in certain cases (like when Filebeat breaks...). Turning SSH off completely (i.e. "No SSH") is certainly better for security and something we may pursue.
I mentioned in another comment here that SSH is just one example; we can also easily hit endpoints with curl via hostname.
Also mentioned in the post: other tools (like Grafana dashboards) expect unique hostnames.
Of course there are other ways both using AWS and third party services. Centralized logging is a solved problem.
AWS isn’t going to run out of disk space any time soon. You could also use a lifecycle policy to delete old logs or move them to lower-cost storage, depending on your retention policy.
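If the logs land in CloudWatch Logs, for instance, retention is a one-liner (the log group name here is a placeholder); the move-to-cheaper-storage case is the usual S3 lifecycle rule:

    # Sketch: cap CloudWatch Logs retention at 30 days.
    aws logs put-retention-policy \
      --log-group-name /my-app/production \
      --retention-in-days 30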
I’m not saying that I have never had to log on to a VM to troubleshoot, but that’s a sign of the need of better logging.
And if my logging infrastructure isn’t good, how pray tell will I troubleshoot my programs running on Lambda or Fargate?
It’s not a problem at all with Lambda or Fargate. Logging can be as simple as printing to the console, and the output goes to CloudWatch.
It’s the same concept. If your troubleshooting at any point involves needing to log in to an EC2 instance, you might as well have a few bespoke servers called “Web01” and “Web02”. You’re just using ASGs to create pets at scale. We run an ASG in production that scales from 2 to 30 instances based on the number of messages in a queue, Lambdas running all the time, some Fargate tasks, etc. It would be a nightmare to troubleshoot all of those processes without centralized, queryable logs.
In my experience, Fargate isn't very commonly used and Lambda is used for only relatively simple things.
And that experience is representative of the entire AWS ecosystem?
I agree, I wouldn't want it any other way nowadays, but back then I had to migrate a lot of legacy systems to AWS under pressure.
For one part, we had a legacy service needing to connect to the services in the ASG, and the best way to implement it was with round-robin DNS. So the Lambda would update a DNS record containing all the ASG host IPs.
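The Lambda itself is long gone, but the core of it amounted to an UPSERT of a single A record holding every instance IP; the equivalent AWS CLI call looks roughly like this (zone ID, record name, and IPs are placeholders):

    # One A record with multiple values = round-robin DNS.
    aws route53 change-resource-record-sets \
      --hosted-zone-id Z123EXAMPLE \
      --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "legacy-backend.internal.example.com",
            "Type": "A",
            "TTL": 60,
            "ResourceRecords": [
              {"Value": "10.0.1.11"},
              {"Value": "10.0.2.12"}
            ]
          }
        }]
      }'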
Also, because we had some semi-stateful legacy instances that were basically lifted and shifted to AWS, but I wanted to have them in an ASG to keep our environment similar until we could refactor them into real cattle.
I don't remember exactly. We did use ELBs for all other services. So it was either cost, or it had to do with MX record restrictions, in that you're not allowed to use CNAMEs in MX records.
Or a way to get you more familiar with tagging, or the various queries and filters on different api results. It's annoying at first, but it leads to less reliance on the console and more effective scripting. (Instead of naming my instances I just made a script which looks up the instance I want and outputs the IP and username, and put that in an SSH config)
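Such a lookup can be as small as this (a sketch; the tag key/value and SSH user are placeholders):

    # Sketch: look up a running instance by tag and print its private IP.
    aws ec2 describe-instances \
      --filters "Name=tag:Service,Values=web" \
                "Name=instance-state-name,Values=running" \
      --query 'Reservations[].Instances[].PrivateIpAddress' \
      --output text
    # feed the result into an ssh_config Host entry, e.g. HostName + User ec2-user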