EC2 Instance Connect

ranman · on June 28, 2019

I made a little twitter thread here (shameless ploy for followers) (I work for AWS): https://twitter.com/jrhunt/status/1144402767890436096

* Works on amzn linux 2 - installed by default on newer versions

* otherwise: $ sudo yum install ec2-instance-connect

* The SSH public keys are only available for one-time use for 60 seconds in the instance metadata.

* you can send up your own SSH keys with `aws ec2-instance-connect send-ssh-public-key`

* cloudtrail logs connections for auditing

* doesn't support tag based auth but it's on the roadmap

* plans to enable it in popular linux distros in addition to amzn linux 2
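To make the "one-time use for 60 seconds" bullet concrete, here is a toy Python model of those semantics. It is only a sketch of the behaviour as described above, not how the instance metadata service is actually implemented; the class name, the injectable clock and the example key are all made up for illustration.

```python
import time


class OneTimeKeyStore:
    """Toy model: a pushed key can be read once, and only within a TTL."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry can be tested
        self._keys = {}      # (instance_id, os_user) -> (public_key, expiry)

    def push(self, instance_id, os_user, public_key):
        """Store a key and start its validity window."""
        expiry = self._clock() + self._ttl
        self._keys[(instance_id, os_user)] = (public_key, expiry)

    def consume(self, instance_id, os_user):
        """Return the key if it is still valid; remove it either way."""
        entry = self._keys.pop((instance_id, os_user), None)
        if entry is None:
            return None
        public_key, expiry = entry
        if self._clock() >= expiry:
            return None
        return public_key
```

The point of the model is that a leaked public key slot is not a standing credential: once it has been read, or once the window passes, it is gone.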

Install local client:

$ aws s3 cp s3://ec2-instance-connect/cli/ec2instanceconnectcli-latest.tar.gz .

$ pip install ec2instanceconnectcli-latest.tar.gz

$ mssh instanceid
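If you would rather script the flow than use mssh, here is a rough Python sketch that only builds the two command lines involved: push a public key, then ssh in before the 60 seconds run out. The instance ID, availability zone, host and key paths are placeholders, the function names are made up, and nothing is executed.

```python
import shlex


def send_key_command(instance_id, availability_zone, os_user, public_key_path):
    """argv for pushing a public key to one instance via the AWS CLI."""
    return [
        "aws", "ec2-instance-connect", "send-ssh-public-key",
        "--instance-id", instance_id,
        "--availability-zone", availability_zone,
        "--instance-os-user", os_user,
        "--ssh-public-key", f"file://{public_key_path}",
    ]


def ssh_command(private_key_path, os_user, host):
    """argv for a plain ssh login with the matching private key."""
    return ["ssh", "-i", private_key_path, f"{os_user}@{host}"]


if __name__ == "__main__":
    # Placeholder values; print the commands instead of running them.
    print(shlex.join(send_key_command(
        "i-0123456789abcdef0", "us-east-1a", "ec2-user", "/tmp/temp_key.pub")))
    print(shlex.join(ssh_command("/tmp/temp_key", "ec2-user", "203.0.113.10")))
```

Passing the lists to `subprocess.run` would execute them; the second command has to follow the first within the validity window.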

iconara · on June 28, 2019

Great to hear that tag based auth is coming. I'm at a loss about how to use it without something like that. It looks like you either have to handle each instance individually (which makes no sense where AWS has been pushing auto scaling and spot instances for a decade – instances are ephemeral in our world), or have one rule that applies to everything in the account. To me, being able to limit access to groups of instances is a required feature.
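To illustrate the two options described here (list instances one by one, or allow everything in the account), a minimal Python sketch that builds the IAM policy document for each case. The region, account ID and instance IDs are placeholders and the helper name is made up; the action and the `ec2:osuser` condition key are the ones the service uses for `SendSSHPublicKey`.

```python
import json


def instance_connect_policy(region, account_id, os_user, instance_ids=None):
    """Build a policy allowing key pushes to listed instances, or to all."""
    if instance_ids:
        resources = [
            f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}"
            for instance_id in instance_ids
        ]
    else:
        # No list given: one rule that applies to every instance.
        resources = [f"arn:aws:ec2:{region}:{account_id}:instance/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "ec2-instance-connect:SendSSHPublicKey",
            "Resource": resources,
            # Restrict which OS user the pushed key may log in as.
            "Condition": {"StringEquals": {"ec2:osuser": os_user}},
        }],
    }


if __name__ == "__main__":
    policy = instance_connect_policy(
        "us-east-1", "123456789012", "ec2-user", ["i-0123456789abcdef0"])
    print(json.dumps(policy, indent=2))
```

Neither variant fits auto scaling well: the per-instance list goes stale as instances come and go, and the wildcard is all or nothing.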

forty · on June 28, 2019

Is there an easy way to integrate that with ansible style scripts (where it needs to ssh to many instances at once)? Or is it planned?

jonesetc · on June 28, 2019

should be a pretty simple little connection plugin: https://docs.ansible.com/ansible/latest/dev_guide/developing...

hitpointdrew · on June 28, 2019

>otherwise: $ sudo yum install ec2-instance-connect

I assume that "sudo apt install ec2-instance-connect" will also work?

KaiserPro · on June 28, 2019

Only if it's pushed to the correct repo. AL2 has Amazon-specific stuff in its repos; Ubuntu gets it too, eventually.

bogomipz · on June 28, 2019

Thanks for the bullet points. Can you explain what "tag based auth" is and how it will work?

sandGorgon · on June 28, 2019

About time. This is the other thing that GCP does so well and I was puzzled that AWS still couldn't do - just add more than one key to an EC2 instance through the API (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-inst...).

There are tons of support questions about "how can I add multiple SSH keys to my EC2 instances".

Now if only AWS brings in "projects". That's the last usability edge that GCP has.

SmirkingRevenge · on June 28, 2019

I really miss projects and folders (current firm is AWS, previous was GCP). I find GCP more usable on a number of other fronts though.

Whenever there's a service that maps to the other, I just always seem to find the GCP service easier/faster to learn and use effectively. Bigquery, stackdriver, pubsub, dataproc, compute, load balancer, et al. Getting stuff done with those is miles easier, in my experience than the comparable AWS offerings, at least if you don't already have extensive experience with one over the other.

_wmd · on June 28, 2019

Note the GCP equivalent requires some permanently running crapware inside your VM, the OpenSSH hook EC2 are using is much simpler

sandGorgon · on June 28, 2019

GCP open sourced the code here - https://github.com/GoogleCloudPlatform/compute-image-package...

is the AWS code opensource ?

cthalupa · on June 28, 2019

It is.

Client: https://github.com/aws/aws-ec2-instance-connect-cli

Server: https://github.com/aws/aws-ec2-instance-connect-config

bdcravens · on June 28, 2019

I’m puzzled. Many of the comments here seem focused on browser-based ssh which isn’t new, or even the most significant thing here. Using IAM instead of passing around .pem files feels like a huge improvement.

ilogik · on June 28, 2019

that also isn't new: https://github.com/widdix/aws-ec2-ssh

This will look up your username in AWS IAM, and if it has the right permissions, it creates an account and copies the public ssh key associated with that user.

jmb12686 · on June 28, 2019

Google Compute Engine has had this functionality for years (at least the browser based SSH). Furthermore, Google's free Cloud Shell feature is fantastic.

Over the years, AWS has put their focus entirely on "Enterprise" customer functionality as opposed to "developer friendly" capabilities.

choppaface · on June 28, 2019

gcloud has had it, but as of at least a month ago there are terrible races. You can create a machine, log in with normal ssh, log in with cloud shell, and then the next normal ssh login will fail because cloud shell will modify the machine’s ssh keys.

I once found a race in the UI, reported it with complete repro instructions, and then Google made me have to do a 30 minute hangout meeting where I had to repro the bug in front of the Google engineer. The call resulted in two different tickets... we found another bug.

So gcloud might have some nicer features, but the engineers building the web app seem to have some basic misunderstandings about concurrency. The BigQuery UI (new one not the old one) is similarly riddled with bugs.

EugeneOZ · on June 28, 2019

I experienced this bug also and it was quite surprising how the idea of overwriting authorized_keys file could pass sanity filters. It's really wild.
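A small Python sketch of why overwriting is so surprising compared with merging. This is not Google's code, just an illustration with made-up key strings and function names.

```python
def overwrite_keys(existing_lines, managed_keys):
    """Replace the whole file: anything added by another tool is lost."""
    return list(managed_keys)


def merge_keys(existing_lines, managed_keys):
    """Keep existing entries and append only keys that are not present yet."""
    merged = [line for line in existing_lines if line.strip()]
    seen = set(merged)
    for key in managed_keys:
        if key not in seen:
            merged.append(key)
            seen.add(key)
    return merged


if __name__ == "__main__":
    existing = ["ssh-ed25519 AAAAexample1 me@laptop"]
    managed = ["ssh-rsa AAAAexample2 cloud-shell"]
    print(overwrite_keys(existing, managed))  # laptop key is gone
    print(merge_keys(existing, managed))      # both keys survive
```

Merging alone does not fix a race between two writers; that would also need a lock or an atomic rename of the file, which is left out here.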

sieabahlpark · on June 28, 2019

Makes sense when you consider the people who get jobs at Google only know how to solve Leetcode problems.

akhilcacharya · on June 28, 2019

EC2 has had browser SSH for ages too! They're just improving it.

https://docs.aws.amazon.com/quickstarts/latest/vmlaunch/step...

(But... yeah, at one time it was an applet)

procrastitron · on June 28, 2019

It’s not just the browser based SSH that GCE has had for years.

The way they describe tying SSH keys to IAM roles very closely matches the SSH key management that GCE had when it launched back in 2012.

drewda · on June 28, 2019

Yes, I quite like this functionality in the "gcloud" CLI: https://cloud.google.com/sdk/gcloud/reference/compute/ssh

FWIW, it sounds like one advantage of the new AWS service is that it will provision a new SSH key each time you connect. Whereas, I _think_ the GCP one provisions one key per machine.

jkaplowitz · on June 28, 2019

Google's browser-based method uses a new key each time, set to expire from the server-side authorized_keys file in a short time (minutes). I strongly suspect, but haven't checked, that this is also true for their Cloud Console mobile app's SSH feature.

You're mostly right about gcloud, with some nuances for multi-user or network-homedir environments among other exceptions.

Gcloud does however have one of the snazziest possible hacks for resetting and securely communicating Windows account passwords:

https://cloud.google.com/compute/docs/instances/windows/auto...

peteretep · on June 28, 2019

And that’s why I’m willing to give them my money. Google’s cloud offering needs to be considered a hobby like all their non-search activity

heybrendan · on June 28, 2019

Huh? A hobby?

Transcript from Alphabet Q1 2019 Earnings Call; April 29, 2019 [1]

> Sundar Pichai, CEO Google:

> We are also deeply committed to becoming the most customer-centric cloud provider for enterprise customers, and making it easier for companies to do business with us thanks to new contracting, pricing, and more. Today, 9 of the world's 10 largest media companies, 7 of the 10 largest retailers, and more than half of the 10 largest companies in manufacturing, financial services, communications, and software use Google Cloud.

> Some of the companies that we announced at Next included: The American Cancer Society and McKesson in Healthcare; Media and Entertainment companies like USA Today and Viacom; Consumer Packaged Goods brands like Unilever; Manufacturing and Industrial companies like Samsung and UPS; and Public Sector organizations like Australia Post.

> Finally, to support our customers' growth, we also announced the addition of two new Cloud regions in Seoul and Salt Lake City, which we plan to open in 2020. These new Cloud regions will build on our current footprint of 19 Cloud regions and 58 data centers around the world.

This doesn't seem like just a hobby to Google.

[1] https://abc.xyz/investor/static/pdf/2019_Q1_Earnings_Transcr...

apsdsm · on June 28, 2019

To be fair, given their history of shutting things down, it’s easy to see anything that isn’t selling ads as a hobby for google. Even if that’s not the case, it can be hard to shake that gut-level feeling. I suspect that’ll be the hardest thing for most people to get over when considering a google stack.

peteretep · on June 28, 2019

Meh, I care about actions and not random words

9nGQluzmnq3M · on June 28, 2019

> our current footprint of 19 Cloud regions and 58 data centers around the world

Do you understand how expensive this action is?

toast0 · on June 28, 2019

How many of those data centers were already there for their advertising business?

cwp · on June 28, 2019

Every now and then I consider using GCP for a new project, because their tech is obviously better. I never do though, because every time, I stumble across somebody with a horror story.

You know how it goes. They built their business on Google's platform, and it was a dream until some AI detected a pattern of activity it didn't like and they were excommunicated. The app was shut down, the website stopped getting traffic, and all the money they had charged customers via Google Pay was frozen. No appeals, emails go to /dev/null and after a month of campaigning on social media they finally get an email from an intern saying that after review, they won't be changing the automated decision.

No thanks.

jmb12686 · on June 28, 2019

Fair enough, I agree Enterprise workloads are better off relying on Enterprise class products.

izacus · on June 28, 2019

Last I checked, Cloud is pretty much the only other thing besides Ads that makes any money at Google :)

jtwaleson · on June 28, 2019

This is bad news for ScaleFT, which provided this service via bastion servers (although not IAM based).

Rackspace managed AWS environments use this for high compliance systems.

The problems it solves are a) that login attempts are logged on a separate system for compliance and b) user management is handled in a centralized way. Both are handled with EC2 Instance Connect.

forty · on June 28, 2019

Another competitor to this is hashicorp vault, which does both certificate based ssh and AWS authentication. (I find the certificate based approach better though)

alexanderdmitri · on June 28, 2019

They could present value if their product works across cloud-providers (not familiar with their business model, but IAM is generally regarded as one of the biggest ways you can get locked into AWS).

cesnja · on June 28, 2019

There's also SSM Session Manager [1]. Not exactly ssh, but you get mostly the same features with ssh access completely disabled and the whole session being logged to S3 bucket or some log aggregation service.

[1] https://docs.aws.amazon.com/systems-manager/latest/userguide...

rukenshia · on June 28, 2019

I'm a bit confused about this one.. we have been using SSM Session Manager for quite some time now and this looks like it does the same. We also export all logs during the session with SSM and you can see which user initiated the session. What am I missing here?

snorkel · on June 28, 2019

For dev environments SSH is essential, but in production environments I 100% agree with using SSM Session Manager instead of SSH. Getting terminal access to a production server is sometimes necessary but it ought to be temporary access, all actions are logged, and treated as an exception situation rather than routine. SSM session manager provides all that without requiring SSH keys and SSH firewall rules in production.

gauravphoenix · on June 28, 2019

SSM session manager is basically an HTTP wrapper over a shell. You have to use a browser for SSM, which mostly works until it doesn't. I sometimes had trouble copy-pasting into it.

This new service is basically a managed SSH so things like port forwarding etc will work. With SSM you can't do port forwarding etc because it is not SSH aware.

CSDude · on June 28, 2019

But this needs to expose SSH? SSM is great (although it's not fast enough) because it eliminates our jumpers

gauravphoenix · on June 28, 2019

Yeah but then a lot of people have use cases for SSH. The solution is targeted towards replacing jump boxes.

otterley · on June 28, 2019

It's absolutely possible (and supported!) to connect to your instance via SSM without using a browser:

https://docs.aws.amazon.com/systems-manager/latest/userguide...
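Roughly, the non-browser route is a single AWS CLI call plus the separately installed Session Manager plugin. A small Python sketch that builds the command and checks for the plugin binary; the instance ID is a placeholder, the function names are made up, and nothing is executed.

```python
import shlex
import shutil


def start_session_command(instance_id):
    """argv for opening an interactive SSM session from a terminal."""
    return ["aws", "ssm", "start-session", "--target", instance_id]


def plugin_installed():
    """True if the Session Manager plugin binary is on PATH."""
    return shutil.which("session-manager-plugin") is not None


if __name__ == "__main__":
    if not plugin_installed():
        print("session-manager-plugin not found on PATH")
    print(shlex.join(start_session_command("i-0123456789abcdef0")))
```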

gauravphoenix · on June 28, 2019

That's good! I hope someday the AWS CLI will come bundled with "the Session Manager plugin".

crankylinuxuser · on June 28, 2019

TBH I'm still waiting for console serial access via IAM. All my bare metal machines have that. And it's absolutely essential when you bork networking.

And as much it pains me to say it, Azure has that feature.

cthalupa · on June 28, 2019

I really really really try to not need serial console access to my machines. I try to only rarely need SSH access.

But when you've got some sort of bug or issue that you're not getting any metrics out of, no logs recorded, no kernel crash dump, nothing sent over netconsole, nothing showing up on the instance console screenshot... Sometimes serial console is what you need.

But, for the borked networking case, I'd recommend not modifying your networking on live instances. Make your changes on a test instance, figure out what works, and add it to your configuration management ;)

different_sort · on June 28, 2019

I'm honestly a little disappointed here. I feel like this is not fully baked, but it is so close.

Unlike SSM, Instance connect goes direct over SSH - so you either need to be inside of your AWS network, on a bastion host that can route to your AWS network, or use a public IP address.

It would be great if they combined this functionality with the HTTP wrapping capability so that I do not need to expose SSH/route to SSH ports in any way but can also use IAM policy to control which unix user a given IAM principal can land in the host as (Example use case would be I would only want a certain class of user to land as a user with sudo/root access).

This is still valuable to my use case, and we'll go ahead with it using the bastion approach most likely until they hopefully integrate this with their HTTP SSH wrapper.
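For reference, the bastion approach could look something like this from the client side, using OpenSSH's `-J` (ProxyJump) option. This is a sketch with placeholder hosts, users and paths and a made-up function name; how you authenticate to the bastion itself (agent, ssh config) is left out, and nothing is executed.

```python
import shlex


def ssh_via_bastion(bastion_user, bastion_host, os_user, private_ip, private_key_path):
    """argv for reaching a private instance through a jump host."""
    return [
        "ssh",
        "-i", private_key_path,                 # key that was pushed for the target
        "-J", f"{bastion_user}@{bastion_host}",  # hop through the bastion first
        f"{os_user}@{private_ip}",
    ]


if __name__ == "__main__":
    print(shlex.join(ssh_via_bastion(
        "ec2-user", "bastion.example.com", "ec2-user", "10.0.1.23", "/tmp/temp_key")))
```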

unixhero · on June 28, 2019

Great new feature!

But sigh I just built a PKI infrastructure provisioning system using a gigantic shell script, maintenance user with sudo permissions and ssh access where a master node would command a fleet of slave nodes.

I guess all of my work was for naught since this seems to cover some of my needs for user and ssh key provisioning.

Oh well, it'll work elsewhere on all other clouds. And I guess I should release it publicly, it's just not pretty enough yet. Every time I do, gremlins come out of the bushes complaining that the code isn't elegant enough for them.

serpix · on June 28, 2019

This happens so often in DevOps that a good rule of thumb is to think long and hard before doing any hand-rolling.

Today's dev ops stacks move so fast nobody is an expert on a single stack for longer than a week.

hitpointdrew · on June 28, 2019

>I guess all of my work was for naught since this seems to cover some of my needs for user and ssh key provisioning.

I just finished setting up JumpCloud to manage SSH keys and logins on all my AWS instances.

Oh well.

jimktrains2 · on June 28, 2019

Do you still need to create users manually on each machine? There have also been many tools out there to pull the ssh key from IAM and use it via AuthorizedKeysCommand previously, but my problem is always creating the user accounts, especially if you don't want to keep a separate list in LDAP/Kerberos (or similar, like Active Directory).
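For context, sshd's AuthorizedKeysCommand runs a program with the login name and treats whatever it prints as authorized_keys lines. A minimal Python sketch of such a program; the in-memory `KEY_DIRECTORY` is a hypothetical stand-in for a lookup against IAM or another identity provider, and the names and keys are made up.

```python
import sys

# Hypothetical stand-in for a real lookup (IAM, LDAP, an identity provider).
KEY_DIRECTORY = {
    "alice": ["ssh-ed25519 AAAAexamplekey1 alice@laptop"],
}


def keys_for(username, directory=KEY_DIRECTORY):
    """Return authorized_keys-style output for one user ('' if unknown)."""
    return "".join(f"{key}\n" for key in directory.get(username, []))


if __name__ == "__main__":
    # sshd passes the login name as an argument (e.g. %u in sshd_config).
    if len(sys.argv) == 2:
        sys.stdout.write(keys_for(sys.argv[1]))
```

Note that this only answers "which keys may log in as this user"; the Unix account still has to exist already, which is exactly the part that stays manual.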

ricksebak · on June 28, 2019

Looks to me like this would have all users use the default ec2-user or ubuntu user accounts.

jimktrains2 · on June 28, 2019

That's what I thought it was saying too. That's a mess from a compliance and best-practices point of view :(

Or am I missing something and this would follow the PCI DSS?

lenova · on June 28, 2019

This is what I'm wondering as well. Does the fact that everything is logged by what an IAM user does work as compliance, or are individual user accounts on the operating system still required?

the_duke · on June 28, 2019

Well, it was about time...

IAM based auth was long overdue.

Pulletwee12549 · on June 28, 2019

I hope this helps all those people that somehow manage to lock themselves out of their instances.

ravedave5 · on June 28, 2019

This is great! Giant step using IAM over SSH keys.

SteveNuts · on June 28, 2019

Not complaining, but wow that took a long time. Even Linode has had a web-based shell for many years.

Either way, this will be very nice to have.

dbaggerman · on June 28, 2019

AWS already had a web based shell via SSM. This allows native SSH connections using credentials tied to AWS IAM users.

fortran77 · on June 28, 2019

Interesting, but I connect via ssh from Windows 10 PowerShell. I wonder why this isn't a standard use-case. I suppose I can get it to work as long as it's "openssh" or something compatible

itzsasi · on July 4, 2019

How do I connect to the instance from a Windows machine using ec2-instance-connect? Is that possible?

msoad · on June 28, 2019

is this also xterm.js based? the WebGL based engine will make it even better!

dahfizz · on June 28, 2019

Not gunna lie, it makes me sad that we need some huge, fancy graphics engine just to emulate a 25 year old technology. Why are people so obsessed with their browser? No matter how much JS you layer on, it'll never be as fast as a terminal.

argd678 · on June 28, 2019

I had a VT102 for many years, it wasn’t fast.

0xbadcafebee · on June 28, 2019

I mean, this is definitely cool, but we should also try to stop using ssh so much. There's a long list of reasons why using ssh leads to bad things (but not an anti-pattern - I wish people would stop using that phrase to mean anything that sometimes leads to bad things) so I just hope this functionality doesn't exacerbate its use.

davexunit · on June 28, 2019

I am tired of hearing about why I shouldn't use SSH. SSH is fantastic and I always want it enabled on every single server so when things go wrong I can debug. I've had AWS "solutions architects" puzzled about why I'm still using SSH but they can never justify any other solution. First they tell me that I should just log everything we'd want to look at. I do log everything I think would be useful, but unforeseen things happen and it is really handy to have shell access to the misbehaving server. Then they suggest I use Systems Manager to perform server updates but I have no need for that because I use an immutable deployment model.

Managing server access in a multi-account organization is a real issue, though. I currently manage 11 AWS accounts and the best solution I've implemented so far is extending NSS and configuring sshd to query our identity provider (Okta) for user/group information and SSH public keys. Each type of server is configured to permit access to a subset of Okta groups. For example, members of the DevOps group get full privileges, anyone else that has a use-case for using SSH (like in a QA environment) gets some form of limited access. With this in place, I can grant/revoke privileges and manage developer keys all from one central location.

0xbadcafebee · on June 28, 2019

I hear you, but every time I ask "what are you doing that you need SSH?", the answer comes down to "well I have these crappy random applications and I don't have enough visibility into the system or the apps." For those cases, I think SSH is a crutch that keeps the system from maturing.

Another way to think about it is, if you're an SRE, you want to eliminate toil, and an interactive SSH terminal is toil.

davexunit · on June 28, 2019

We'll just have to agree to disagree. I just don't buy it.

adamrt · on June 28, 2019

Do you mind sharing the long list?

zxcmx · on June 28, 2019

Not O.P, but:

- Encourages you to go to instances and check stuff rather than improve monitoring / health checks.

- Can do quick fixes on a few boxes rather than re-running the deploy. Great! But terrible when the person who knows how to do that is away.

- Tailing log files rather than centralised log management for all the things.

- Trying things out / quickly checking something in production rather than being rigorous about keeping test / staging in sync with prod.

The “problem” is ssh is such a great affordance (until you have tons and tons of instances and you can’t do anything by hand anymore) that it means you don’t need to fix internal processes and tools around deployment, configuration and monitoring.

If there’s no workaround you feel the pain and will be forced to set things up right, usually with benefits to security and repeatability.

As is often the case, the best thing about ssh (in terms of managing infra instances) is also the worst thing.

With that said at very small scales it might be overkill to automate all the things so sure, fill in the gaps with ssh and a wiki page.

KaiserPro · on June 28, 2019

> Encourages you to go to instances and check stuff rather than improve monitoring / health checks.

I don't think it does, well it doesn't when you have > 40 machines anyway. Plus it doesn't give the ability to compare and contrast simply. (graphs are _awesome_)

> Tailing log files rather than centralised log management for all the things.

Yes, I tend to agree. But proper centralised logging is either exceptionally hard, or a hefty splunk tax. That also encourages people to derive graphs from logs, which is arse about face. Graphs first, logs when you are desperate.

> Trying things out / quickly checking something in production

I can see this, but normally one would expect people to not have general access to prod, if they are going to do that...

tomcam · on June 28, 2019

Ugh, I'm a new full stack guy. Could you enlighten me about the evils of SSH?

0xbadcafebee · on June 28, 2019

SSH is wonderful. The problems are more what it enables, and what it lacks (or isn't designed to do).

For example, managing ssh keys for an individual is gloriously simple, but managing them for a large organization is a huge headache. You want to use ssh certificates, but even those are implemented in a weird way, and really you should use an SSO system for auth. (This makes that easier/better, so, yay?)

When people start sshing into production servers, they end up making local changes. They focus more on the "pets" aspect of managing systems rather than as "cattle". They have to install a litany of extra software to diagnose and troubleshoot bugs, rather than expose system metrics and tightly control the app environment and its operation.

Remote access to production app servers is basically a backdoor waiting to happen, and may violate corporate security policies. When you have local user access to a Linux host, it's almost guaranteed you can privesc to root.

Finally, almost everyone I have ever seen will either force-ignore/auto-accept host key changes, or just accept them blindly (because IPs and hostnames may change, there may be multiple environments you haven't logged in to, etc.). This completely defeats the purpose behind MITM protection, which is the main intent of using SSH, though these days its other features may arguably be more of a reason to use it.

And for the tech hipsters out there: "it isn't serverless!!"
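On the host key point: instead of accepting blindly, you can compare the fingerprint ssh shows against one obtained over a channel you already trust (for example the instance's console output, assuming the image prints host key fingerprints at boot). A Python sketch of computing the OpenSSH-style SHA256 fingerprint from a public key line; the function names and the key material in the usage example are made up.

```python
import base64
import hashlib
import hmac


def sha256_fingerprint(public_key_line):
    """Fingerprint of a line like 'ssh-ed25519 AAAA... comment'."""
    blob_b64 = public_key_line.split()[1]
    blob = base64.b64decode(blob_b64)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints unpadded base64 after the 'SHA256:' prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_matches(public_key_line, expected_fingerprint):
    """Compare a presented host key against a trusted fingerprint."""
    return hmac.compare_digest(
        sha256_fingerprint(public_key_line), expected_fingerprint)


if __name__ == "__main__":
    # Not a real key: the blob is just placeholder bytes.
    fake_blob = base64.b64encode(b"placeholder host key blob").decode("ascii")
    line = f"ssh-ed25519 {fake_blob} host.example.com"
    print(sha256_fingerprint(line))
```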

cthalupa · on June 28, 2019

>They have to install a litany of extra software to diagnose and troubleshoot bugs, rather than expose system metrics and tightly control the app environment and its operation.

There's a lot of things that require more than easily exported system metrics and logs to troubleshoot.

While I've played around with using PCP's perf plugin to try and remotely do things with perf, generate flamegraphs, etc., it doesn't work nearly as well as just SSH'ing into the thing and running perf directly, especially if the perf data file is going to be large. I don't see how you could do serious performance engineering work without SSH access.

But, I think I'm nitpicking here, because I generally agree that there should be very little to no reason to login to servers via SSH day to day.

jameshart · on June 28, 2019

In the ‘pets vs cattle’ analogy, it totally makes sense that even with cattle, occasionally you bring one in for a checkup by a vet to see if you can detect any problems that might affect the herd. SSH into a production box to check everything is working as expected and take some readings. Sure.

On the other hand, I tend to lean more towards a ‘wild animals’ model, where, sure, you can tranquilize one and bring it in to look over, but once it’s got the smell of humans on it, it’s doomed if you let it back out in the wild again.

Once you ssh into a production box, it is forever tainted. Sure, poke around in it, install some perf tools to run some diagnostics, learn what you can about its behavior. But then, rather than putting it back into the wild to serve traffic, out of mercy, you should destroy it and replace it with a clean instance.

cthalupa · on June 28, 2019

I don't disagree with this, but, at the same time, if I haven't made any actual changes to the application, I'll generally not worry about manually taking it out of service, because autoscaling will be getting rid of it soon enough anyway.

0xbadcafebee · on June 28, 2019

As an alternative, I try to implement solutions that can install and run software out of band and still get what you need without a persistent connection and opening up security groups (example: AWS SSM Run Command).
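A rough Python sketch of what a Run Command call looks like. The request builder is plain Python; `send` assumes boto3 is installed and credentials are configured, and the instance ID, command and function names are placeholders.

```python
def run_command_request(instance_ids, commands, comment=""):
    """Build keyword arguments for the SSM SendCommand API."""
    request = {
        "InstanceIds": list(instance_ids),
        "DocumentName": "AWS-RunShellScript",  # AWS-managed shell document
        "Parameters": {"commands": list(commands)},
    }
    if comment:
        request["Comment"] = comment
    return request


def send(request):
    """Submit the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    ssm = boto3.client("ssm")
    response = ssm.send_command(**request)
    return response["Command"]["CommandId"]


if __name__ == "__main__":
    print(run_command_request(["i-0123456789abcdef0"], ["uptime"], "ad-hoc check"))
```

The command runs through the SSM agent on the instance, so no inbound SSH port or security group change is involved.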

cthalupa · on June 28, 2019

I make extensive use of Run Command (And Automations!) but they're not a replacement for every use case, and very much not a replacement for the specific use case you're replying to.

tomcam · on June 28, 2019

> almost everyone I have ever seen will either force-ignore/auto-accept host key changes,

Thanks for the super informative answer. About the quoted portion... yeah! I assume it's my responsibility to do something... like manually check the host IP or something? What is the recommended practice to deal with this situation?

crehn · on June 28, 2019

Nothing bad with SSH per se, but building your infrastructure in a way that makes ad-hoc remote changes unnecessary is something to strive for. For anything but small deployments, automation, immutability and reproducibility will keep you sane. Less moving parts, things don’t suddenly change, easy to audit, easy to rollback, etc.