FWIW, I've been using the aws command-line tool by Tim Kay for over 6 years. It's only Perl and cURL: a single self-contained script with no other dependencies. It's been rock solid in production that whole time.
> The syntax looks quite the same, but Amazon's awscli Python installer has loads of dependencies. I'll have to see if it's worth switching.
Why are the dependencies a problem? By combining a handful of smaller, focused modules that each do something well, you can end up with something better than if you were to re-invent the wheel for every need.
AWS and the Python dev team are doing a heck of a job on botocore, and have cranked up the pace of improvement in the last 6 months. This CLI reaching "official" status guarantees (at least until further notice) that it will see updates and fixes, and it's likely to get early, or at least earlier, support for new AWS services.
`pip install awscli` just installed 26 other modules besides awscli. Now I feel a little obliged to go check out those 26, as well, to see what they are.
I agree about not re-inventing the wheel. But the amount of stuff installed is definitely a factor to consider when choosing between two seemingly equivalent tools.
> `pip install awscli` just installed 26 other modules besides awscli. Now I feel a little obliged to go check out those 26, as well, to see what they are.
So? Use a virtualenv and stop worrying. Those 26 dependencies will be separately updated and maintained; who knows what warts are sitting in the monolithic Perl script.
With boto, I battled for years trying to avoid any dependencies. But that has a lot of negative side effects, too. One of the great things about Python is the amazing variety and quality of libraries available. We decided to embrace that with AWS CLI. We have 10 direct dependencies. Four of those are our own packages that we decided to split to allow maximum reuse. Then there are fundamental things like requests, six, docutils. The rest are things that, we think, improve the experience. Virtualenv is an awesome way to manage this. I highly recommend it.
I feel like it's a valid concern for someone to want to know what's on their system. Sure virtualenv is a great solution, and I'm sure the person you're responding to knows about it. But there is a place for skepticism in required dependencies, and perhaps the OP parses Perl better than Python.
In a pre-pip/rubygems, auto-installer-free world, this would have been one package with the 26 dependencies embedded as modules inside it. You wouldn't have known they were there unless you looked at the source code. You also wouldn't be able to update them independently; you'd get updates whenever a new release of the original thing shipped with embedded updates.
Would this make you feel more like you knew what was installed on your system? Would you have felt the need to look at the source code, including the source code for any embedded dependencies, in, say, a 'vendor' or 'lib/dist' directory or something?
What are the pluses and minuses of each approach? There are some either way. After considering them, do you still have a problem with the 'new way' of doing things, where a program installs its dependencies explicitly via pip (or rubygems in Ruby) into a separate place in the file system, versus embedded/bundled dependencies?
The issue is that they can't release them all compiled into one thing. Instead the dependencies have to pollute the rest of the operating system. This is one of those things that Java got right and virtually everything else except npm (local node_modules) got wrong out of the gate.
I'd love to have a simple way to package python apps that depend on other python and native libraries without having to install things separately.
> The issue is that they can't release them all compiled into one thing. Instead the dependencies have to pollute the rest of the operating system.
Bullshit. Python supports having modules installed into local locations (see virtualenv).
Just run

    virtualenv ~/.local/lib/aws
    ~/.local/lib/aws/bin/pip install awscli
    ln -s ~/.local/lib/aws/bin/aws ~/.local/bin/aws

and put ~/.local/bin into your PATH.
If you don't want the pip dependencies to "pollute the rest of the system," you can just use pip install with the --root option.
Java's CLASSPATH causes enormous pain for end-users. Just read the Hadoop mailing list. The fact that Java doesn't have a sane default for where to put anything or how to manage dependencies is a huge flaw.
Back when I was doing AWS, I just used the C binaries (I forget what they were called) to transfer things to or from S3. I just wanted to avoid installing hundreds of megs of dependencies. We paid money to transfer our AMIs around, after all! Still, a more full-featured tool will no doubt come in handy in some scenarios.
If you are using CLASSPATH for Java, you are doing it wrong. Shipping a single jar that bundles every class you need is really the only way to do this reliably, and it means only a single file has to be passed around. You can even include native libraries in it. Hadoop is a nightmare, I agree. They are doing it wrong.
Hadoop is a framework, not a library, so user applications need to link against the jars they need. That implies using CLASSPATH to locate them. Whether or not there is 1 jar or 100, the fact that there's no standard place to install jars in Java is a problem.
Hadoop jars are hundreds of megabytes, and we have multiple daemons. Duplicating all those jars in each daemon would multiply the size of the installation many times over. That's also a nontrivial amount of memory to be giving up because jars can no longer be shared in the page cache.
Some of these problems could be mitigated by making Hadoop a library rather than a framework (as Google's MR is), or by pruning unnecessary dependencies.
Most of these issues could be addressed by actually modularizing the core of Hadoop, some of which has been done in the latest code. Many things could also be provided at runtime by the system, with only the interfaces required in the jars that customers depend on, making their jars backwards compatible and more robust.

BTW, suppose you didn't want to bundle everything into one jar but also didn't want a classpath: you can use the Class-Path attribute in a jar's META-INF/MANIFEST.MF to pull in those jars automatically, as long as they sit in a well-defined place relative to the host jar. Redesign with the requirement that end users never have to worry about CLASSPATH and you will find that there are solutions.
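To make that concrete, here's a rough sketch of such a manifest (jar and class names are placeholders, not anything from a real project):

    Class-Path: lib/dep-one.jar lib/dep-two.jar
    Main-Class: com.example.Tool

With that in the host jar's META-INF/MANIFEST.MF, `java -jar tool.jar` resolves the listed jars relative to tool.jar's location, and the end user never touches CLASSPATH.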
I do sympathize that something akin to the maven repository and dependency mechanism hasn't been integrated into the JDK. I was on the module JSR and continually pushed them to do something like that but it turns out IBM would rather have OSGI standardized and so it deadlocked. Maybe something will come in JDK 9.
Well, I work on Hadoop. I don't know what you mean by "modularizing the core." There was an abortive attempt a few years ago to split HDFS, MapReduce, and common off into separate source code repositories. At some point it became clear that this was not going to work (I wasn't contributing to the project at the time, so I don't have more perspective than that).
Right now, we have several Maven subprojects. Maven does seem to enforce dependency ordering-- you cannot depend on HDFS code in common, for example. So it's "modular" in that sense. But you certainly never could run HDFS without the code in hadoop-common.
None of this really has much to do with CLASSPATH. Well, I guess it means that the common jars are shared between potentially many daemons. Dependencies get a lot more complicated than that, but that's just one example.
Really, the bottom line here is that there should be reasonable, sane conventions for where things are installed on the system. This is a lesson that old UNIX people knew well. There are even conventions for how to install multiple different versions of C/C++ shared libraries at the same time, and a tool for finding out what depends on what (ldd). Java's CLASSPATH mechanism itself is just a version of LD_LIBRARY_PATH, which also has a very well-justified bad reputation.
I don't know of anyone who actually uses OSGI. I think it might be one of those technologies that just kind of passed some kind of complexity singularity and imploded on itself, like CORBA. But I have no direct experience with it, so maybe that is unfair.
I like what Golang is doing with build systems and dependency management. They still lack the equivalent of shared libraries, though. Hopefully, when they do implement that feature, they'll learn from the lessons of the past.
Actually, the ec2-api-tools have been available for years: http://aws.amazon.com/developertools/351 - I just didn't like the command interface they provide. Haven't had time to check out the new CLI yet, but it looks much better at a quick glance.
This latest release marks a milestone in the transition from the old Java-based tools to the new Python ones.
Mitch Garnaat[1], who built and maintained boto over the years, was picked up by Amazon last year and has since been building out botocore[2], which the aws-cli[3] tool uses under the hood.
Thank you for this no-nonsense layout of the situation. =) Can you elaborate more on how this FOSS Github-er was picked up by Amazon? It sounds like a story that many here would like to replicate!
Whilst this is a much better solution than the existing slow tools Amazon provided, I don't understand why this is being reported as if command-line tools for AWS are a new thing.
Command-line tools used to be your only option for managing AWS and Amazon always create their API and shell tools before the Console.
It's unifying the commands. Just this morning I've been struggling with ascli (the autoscaling CLI), which has its own method for listing AWS keys that doesn't match the way other tools do it.
It's probably 'news' because AWS is at risk of losing mindshare on HN as more competitors target this community - Rackspace just started giving everyone free credit, Linode gave everyone big free upgrades, DigitalOcean are just killing it, and then there's Docker etc threatening to reduce them all to interchangeable dumb infrastructure.
Given the number of major, established services running on AWS that go down in perfect sync every time there's an AWS outage, it seems unlikely that the service will fall out of HN's collective mind for quite some time to come. ;)
Interesting - I must have had an older version and not played with it. The version I had did not have 's3 sync', and the help files were a long list of ec2 commands and some boilerplate headings.
Reinstalled and now they have shorter, saner commands and real help pages... and they also changed the pager from 'less' (my default pager) to the crappy 'more'... because (apparently) you never want to go back a page in a help file. More detailed content, but it's harder to review. Odd.
Well, GP isn't just imagining things. The pager for this is at most the equivalent of crap-ass "more" right out of the box, on systems where all other cli executables have better pagers.
$ PAGER=/usr/bin/less aws help
...does seem to do the right thing, but why does this program require it when no other program does?
I think the only thing "new" here is aws-cli hitting version 1.0.0. It's been the recommended aws command line tool for quite a while now. I've personally used it for the last 6 months.
Long-time user of boto[1] here. It has been the go-to library for hooking your Python code into AWS and has a fairly active following on GitHub[2].
One API point that I've found lacking in boto is a "sync" command for S3. Take a source directory and a target bucket and push up the differences, à la rsync; that's the dream. Boto gives you the ability to push/get S3 resources, but I've had to write my own sync logic.
So, the first thing I went digging into was the S3 interface of the new CLI, and to my surprise, they've put a direct sync command on the interface[3], huzzah! Their implementation is a little wacky, though. Instead of using computed hashes, they rely on a combination of file modtimes and file sizes. Weird.
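For anyone curious, the hand-rolled version looks roughly like this with boto (a sketch only: upload-only, no deletes, and a stricter check would compare a local MD5 against the key's etag):

    # Rough sketch of hand-rolled S3 sync with boto 2 (upload-only, no deletes).
    import os
    import boto

    def sync_up(local_dir, bucket_name):
        bucket = boto.connect_s3().get_bucket(bucket_name)   # creds from env/boto config
        remote = dict((k.name, k) for k in bucket.list())    # key name -> Key object

        for root, _, files in os.walk(local_dir):
            for name in files:
                path = os.path.join(root, name)
                key_name = os.path.relpath(path, local_dir).replace(os.sep, '/')
                key = remote.get(key_name)
                # Upload if the object is missing or its size differs;
                # comparing a local MD5 to key.etag would be the stricter check.
                if key is None or key.size != os.path.getsize(path):
                    bucket.new_key(key_name).set_contents_from_filename(path)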
Anyways, glad to see AWS is investing in a consistent interface to make managing their services easier.
That is good news. I too wrote a sync layer to sit on top of boto for a previous project. My use case is a little different in that I sync from S3 to Rackspace Cloud Files as a backup. I just use the file name (object name) as the key, because I know that files never change (though they are added and removed). I build a complete object listing of S3 and a complete object listing of CF, diff them, and then sync.
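The core of it is just set arithmetic, something like this (boto for the S3 side; the CF listing comes from whichever Rackspace client you use, so it's just an iterable of names here):

    # Sketch of the listing-and-diff approach (object names only, since the
    # files themselves never change once written).
    import boto

    def list_s3_names(bucket_name):
        bucket = boto.connect_s3().get_bucket(bucket_name)
        return set(key.name for key in bucket.list())

    def plan_sync(s3_bucket_name, cf_names):
        src = list_s3_names(s3_bucket_name)
        dst = set(cf_names)          # names already backed up to Cloud Files
        to_copy = src - dst          # in S3 but not yet in CF
        to_delete = dst - src        # removed from S3, still lingering in CF
        return to_copy, to_delete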
One disappointing issue is that the listing process on CF is an order of magnitude faster than on S3.
CF: real 2m7.628s
S3: real 14m15.680s
Keep in mind that this is all being run from an EC2 box, so really, S3 should win hands down.
The rsync command uses a combination of file modtimes and file sizes as its default algorithm. It's very fast and efficient. I agree, though, that like rsync, it would be good to add a --checksum option to the s3 sync command in AWS CLI. Feel free to create an issue on our GitHub site https://github.com/aws/aws-cli so we can track that.
Good to hear this feedback. I work for AWS; I will pass this to the team.
Feel free to shoot me an email: simone attt amazon do0tcom if you have more comments.
* Why does each one of them use different parameters for the same stuff? WHATEVER_URL could be REGION (WHATEVER = EC2, ELB, ETC, ...). One uses a config file for ACCESS_KEY_ID, another one wants an environment variable. Plus they use different names for common stuff.
* Why are the command-line arguments named inconsistently across these tools? --whatever, --what-ever
* Why don't they fail on command error? Right now this only happens if there's a configuration problem - if you send a command and there's a problem (like S3 access denied), it still returns 0.
* Why don't they provide synchronous commands? Right now I have to do the polling myself (a rough sketch of what I mean is at the end of this comment). Super annoying.
Anyway, I've been using the ones included in Amazon Linux - I hope they were the latest version. If the new version fixes this problem, feel free to correct me :)
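To illustrate the kind of polling I mean, here's a sketch (boto used purely as an example, waiting for an instance to come up; details simplified):

    # Manual polling the tools force on you: wait for an instance to be 'running'.
    import time
    import boto.ec2

    def wait_until_running(instance_id, region='us-east-1', delay=5):
        conn = boto.ec2.connect_to_region(region)
        instance = conn.get_all_instances(instance_ids=[instance_id])[0].instances[0]
        while instance.state != 'running':
            time.sleep(delay)
            instance.update()    # re-fetch state from the API
        return instance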
More startups should realize that they could increase developer adoption of their products if they also published shell-script interfaces to them. In fact, your startup should really start off as command line accessible and add the GUI after.
There have been CLI tools for some parts of AWS since the beginning. This release replaces a set of disparate tools (each with their own installation instructions and configuration issues) with a unified set.
> In fact, your startup should really start off as command line accessible...
I would argue that startups should always start at the API level and work upwards. At least that's what we did with AWS.
Starting off from the API seems like the only sensible way to approach things. It exposes core design issues early on, and if done right, eliminates a wide source of security issues that might be introduced by, say, the front-end developers.
Right, in this case it actually makes sense because you will already be technically proficient if you are using something like AWS. But in other cases, the GUI can be the main interface, and a CLI to back it up is pretty awesome; I would happily pay some provider extra bucks for it.
I'm the author of glacier-cli (github.com/basak/glacier-cli). I'd be happy to see it move into aws-cli. If anyone wants to do this, please get in touch to coordinate.
As a semi-technical small business user of AWS I really wonder why so few resources seem to be put to the AWS GUI. For example, why is autoscaling not part of the GUI/AWS console? It's one of the most basic functions and it beggars belief that in this day and age it is done through the command line only. I'd rather not have to pay extra for Ylastic or some other hack when I'm sure Amazon can throw together even the most basic UI in a few weeks.
The new CLI looks like a massive improvement over the old one, but ironically I will probably still prefer boto + ipython because of the very robust autocompletion.
If anybody is after similar s3 functionality within python, I wrote a small python wrapper around boto a while back that does similar parallel/multipart upload to s3, and a bunch of other AWS stuff: https://github.com/djrobstep/motorboto
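For reference, the sequential skeleton of a boto 2 multipart upload looks roughly like this (the parallelization of parts is what the wrapper adds; names and chunk size are illustrative only):

    # Bare-bones multipart upload with boto 2 (sequential, no parallelism).
    from cStringIO import StringIO
    import boto

    CHUNK = 50 * 1024 * 1024   # 50 MB parts (S3's minimum part size is 5 MB)

    def multipart_upload(bucket_name, key_name, path):
        bucket = boto.connect_s3().get_bucket(bucket_name)
        mp = bucket.initiate_multipart_upload(key_name)
        try:
            with open(path, 'rb') as f:
                part = 1
                while True:
                    data = f.read(CHUNK)
                    if not data:
                        break
                    mp.upload_part_from_file(StringIO(data), part_num=part)
                    part += 1
            mp.complete_upload()
        except Exception:
            mp.cancel_upload()   # don't leave orphaned parts accruing charges
            raise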
I've been using a combination of aws-cli and the Java-based tools (as some features aren't supported by aws-cli yet). I've been patiently waiting for CloudFront support - no luck yet! Also, I'm switching my new projects to botocore, but it lacks any documentation and that's been really painful!
I've been using s3cmd for a while now. Am I correctly seeing that the new CLI tools, or at least the s3 portion, are pretty much a replacement for that?
http://timkay.com/aws/
The syntax looks quite the same, but Amazon's awscli Python installer has loads of dependencies. I'll have to see if it's worth switching.
Anyone already know if Amazon's new CLI thing has any big advantages over Tim Kay's Perl aws?