AWS CodeArtifact: A fully managed software artifact repository service (amazon.com)
167 points by rawrenstein on June 11, 2020 | 86 comments



This is a fairly obvious service that has been missing for a while; nice to see them provide a solution.

Most dependency management tools have some kind of hacky support for using S3 directly.

Full fledged artifact management tools like Artifactory and Nexus support S3 backed storage.

Interesting to see that the pricing is approximately double that of S3, for what I imagine is not much more than a thin layer on top of it.


Considering the price of Nexus and Artifactory, this is way cheaper for a SaaS offering with SLAs. I imagine Artifactory is really going to have to up its product offering or at least lower its entry prices.


GitHub already released its package repo last year (and has since purchased npm). If anything, I imagine that had Artifactory pretty scared, more so than this. If your company already uses GitHub, it's a hard sell to explain why you'd need something like Artifactory over the GitHub package repo.


And since I've been trying GitHub Actions, I don't know why you would need Artifactory, Nexus, or this AWS service anymore. GitHub offers private repositories, releases, project pages, and CI/CD through Actions, and Microsoft offers plenty of deployment options on Azure with AKS or plain Azure Compute.



Meta: I’ve vouched for your killed comment. I suspect you may be shadowbanned.


Thanks for letting me know; I will reach out to the HN email and ask them why. I suspect it is because of some comments where I got negative karma.


Looks like a kind mod un-shadowbanned you. Welcome to the land of the living!


despite appearances i'm a very casual HN reader and all this talk of shadowbanning makes me kinda nervous tbh. hope i haven't done anything to displease the powers that be.


You are fine. New people with low karma are most at risk. Once you are a little established, you have to do something very upsetting to get shadowbanned, or be consistently unpleasant. Once established, a few controversial posts with negative karma should not be a problem.

Avoid criticizing HN staff or related companies. Gentle / kind disagreement is fine, but err on the side of keeping it private.


> Interesting to see that the pricing is approximately double that of S3, for what I imagine is not much more than a thin layer on top of it.

There's a lot of necessary complexity in the backing platform. Encrypted package blobs are stored in S3 but there are a bunch of other distributed systems for doing things like package metadata tracking and indexing, upstream repository management, encryption, auditing, access control, package manager front-ends, etc... that are not immediately obvious and add cost. The platform that backs CodeArtifact is far from what I'd call a thin layer on top of S3. There is also a team of humans that operate and expand the platform.

Source: I led the technical design for the product as well as a chunk of the implementation, but left the team around mid-2018.


To add to your list of Artifactory and Nexus, Pulp[1] is also a cool project in this space, and is fully open source.

Honestly, the fact that they only support JavaScript, Python, and Java is pretty bare-bones compared to what the others on the above list support, and again, as you say, for a fairly high price.

1: https://pulpproject.org/


We have used S3 successfully several times. You can create a Maven repository, use it as an RPM repo, and cover many other artifact-hosting use cases. I am not sure what functionality is missing that cannot be implemented on top of S3 and requires CodeArtifact.
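To illustrate how simple the S3 route is, here's a minimal boto3 sketch of publishing a built artifact into a bucket laid out like a Maven repo (the bucket name and key layout are made up for the example):

```python
import boto3

s3 = boto3.client("s3")

# Push a built jar into a bucket organised like a Maven repository.
# The bucket name and key layout here are illustrative only.
s3.upload_file(
    Filename="target/my-lib-1.2.3.jar",
    Bucket="example-artifact-bucket",
    Key="releases/com/example/my-lib/1.2.3/my-lib-1.2.3.jar",
)
```

The catch, as the reply below points out, is that the package managers themselves still need to speak S3 (or be fronted by something that does) to publish and resolve against it.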


For Maven, pushing artifacts via the correct mvn deploy:deploy-file requires an S3 wagon (transport layer) to actually make the S3 calls. For bigger orgs, having everyone use a wagon is a non-starter.

All I'm seeing this do is provide the proper HTTP endpoints so you don't need the wagon. Is it worth ~2x the price? No, but it's better than the other enterprise-y solutions.


I see; I used it only for a small org. Maybe those companies can pay the 2x penalty.


> Interesting to see that the pricing is approximately double that of S3, for what I imagine is not much more than a thin layer on top of it.

Haven’t looked carefully, but is there a difference in the guarantees it provides? Might be a performance or SLA difference.


It looks like the SLAs are about the same (https://aws.amazon.com/s3/sla/ and https://aws.amazon.com/codeartifact/sla/). I haven't seen any documentation on performance guarantees for either service, but I'm skeptical this will perform any better than S3.


The login credentials expire after 12 hours (or less)[1], just like with their Docker registry (ECR). That makes it pretty annoying to use, especially on developer laptops.

GCP has a similar offering[2]. And GitHub[3].

[1] https://docs.aws.amazon.com/codeartifact/latest/ug/python-co...

[2] https://cloud.google.com/artifact-registry

[3] https://github.com/features/packages


I could not disagree more re. the expiring credentials. It is a bad practice to have credentials that never expire, especially on developer laptops, especially credentials of this nature. Developers frequently store this stuff in plain text in their home directory or as environment variables. That's a huge security risk! This service manages the process of generating and expiring credentials automatically, which is awesome.


This service is for code artifacts. What credentials do the developers use to access source code? Do those expire?

It is common for developers to use Git to store source code, in a hosted service like GitHub. It is common to use SSH keys to access Git. Frequently those SSH keys are generated without passphrases. Those are non-expiring credentials stored on disk. If HTTPS is used to access Git, it will likely be with non-expiring credentials.

I'm not saying short-lived credentials are bad, not at all. I'm pointing out how this service differs from similar services, requiring a change in workflow, which might be annoying to some people. Not everyone is operating under the same threat model.


Your source code may reference a shared library at a specific version from a trusted source to build. This trusted source is CodeArtifact.

The short-lived passwords are a non-issue and a good thing. Your dependency resolver should handle fetching the new password, and most orgs I’ve worked at had scripts dealing with short-lived passwords/IAM.


> Your dependency resolver should handle fetching the new password

According to AWS's documentation, none of the supported dependency resolvers will fetch the new password[1][2][3].

If they were capable of automatically fetching the new password without human intervention, it would mean they have credentials for generating credentials. If this isn't on an EC2 instance (where an IAM role can be used), that means there are long-lived credentials (probably written to disk) used to generate short-lived credentials.

This would be the case if you are using a hosted CI service that doesn't run on your own EC2 instances. You would probably be providing an AWS key and secret, which would then be used to generate the short-lived credentials. But the key and secret won't be short-lived, and will have at least the same access as the short-lived credentials (probably more access).
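To make the "credentials that generate credentials" point concrete, this is roughly what that step looks like with boto3 (a sketch; the domain name and account ID are placeholders):

```python
import boto3

# Whatever AWS credentials are in scope here (an IAM role on EC2, or a
# long-lived key/secret on a hosted CI runner) authorize this call; the
# token it returns is the short-lived credential the package manager uses.
codeartifact = boto3.client("codeartifact")

response = codeartifact.get_authorization_token(
    domain="example-domain",      # placeholder domain name
    domainOwner="111122223333",   # placeholder account ID
    durationSeconds=43200,        # 12 hours, the maximum
)

token = response["authorizationToken"]
expires_at = response["expiration"]
```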

> Your source code may reference a shared library at a specific version from a trusted source to build. This trusted source is CodeArtifact.

HTTPS is what forms the trust between you and the artifact repository. Short-lived passwords don't do anything to ensure you are talking to the real trusted source. They may make it so the artifact repository can better trust you are who you say you are, but I don't see what that has to do with safely getting a specific version of a library.

[1] https://docs.aws.amazon.com/codeartifact/latest/ug/python-co...

[2] https://docs.aws.amazon.com/codeartifact/latest/ug/npm-auth....

[3] https://docs.aws.amazon.com/codeartifact/latest/ug/env-var.h...


> that means there are long-lived credentials (probably written to disk) used to generate short-lived credentials.

In terms of local development experience, most mature organizations will still require MFA for these "long-lived" credentials at a minimum of once per day, and lock them down to particular IP addresses, before they can be used to get the temporary credentials.[1]

> This would be the case if you are using a hosted CI service that doesn't run on your own EC2 instances.

Typically you'd want to see third-party platforms leveraging IAM cross-account roles these days to fix the problem of them having static credentials. Granted, many of them are still using an AWS access key and secret.

This is still not a "solved" area though, and a point of concern I wish would get more aggressively addressed by AWS.

[1] https://github.com/trek10inc/awsume, https://github.com/99designs/aws-vault, and a few other tools make this much easier to deal with locally.


> I could not disagree more re. the expiring credentials. It is a bad practice to have credentials that never expire, especially on developer laptops, especially credentials of this nature.

For the specific use case of a developer box and the Docker registry, resetting the credentials every 12 hours doesn't, on its own, offer any more security than not resetting them.

The reason is that when you try to log in to ECR after the token has expired, the way you authenticate again is to run a specific aws CLI command that generates a docker login command. After you run that, you're authenticated for another 12 hours.

If your box were compromised, all the attacker would have to do is run that aws command and now they are authenticated.

Also, due to how the aws CLI works, you end up storing your AWS credentials in plain text in ~/.aws/credentials, and they are not re-rolled unless the developer requests it. Ultimately they are the real means of Docker registry access.


Those credentials sitting in ~/.aws/credentials should also expire after 12 hours. There are plenty of tools out there to automate this process, so you just log in with Okta or a similar tool in your CLI and you're done (bonus: they also make switching between accounts a lot easier).

With the tools we have available, there is absolutely no reason we should be creating long-lived AWS keys. It's a major security risk if those keys ever get out.


> Developers frequently store this stuff in plain text in their home directory or as environment variables

If you care about the security of these artifacts, why is their home directory (or their full disk) not encrypted? If they have access to the repository, they probably have artifacts downloaded on their laptops, so if the laptop is compromised, the artifacts are compromised anyway.

Edit:

Not saying temporary credentials are bad. But the reasons you gave seem a little suspect to me. A better reason is that you don't have to worry about invalidating the credentials when an employee stops working for you.


The problem isn't encryption; let's assume everyone has full-disk encryption turned on, so someone who steals your laptop can't access your data.

The problem is that your home directory is accessible to a ton of apps on your computer, and you have no idea what each of them is doing with that access. You also have no idea if any of them can be / are being exploited. The most recent case is Zoom: if that server they had running on localhost, responding to anyone on the laptop, had file-system access APIs (which would be reasonable if Zoom had offered file sharing on calls), an attacker would have been able to read all your credentials.


If you have an app on your computer that is controlled remotely, you have _massive_ issues. Creds are stored for SSH, the browser, and probably heaps of other things too. If this is a serious security concern within your threat model, you should be auditing every single package or isolating (Docker, VMs, bare metal if you're super tin-foiled); anything short of that is fake security.


>Creds are stored for SSH, browser, probably heaps of other things too.

And ideally these credentials should have similar controls applied around them as well (only temporary, using passwords to unlock the SSH keys, etc). If you don't have that, that's your choice, but just because some of your credentials lack security controls is not a reason for other credentials to lack security controls, too.

> you should be auditing every single package or isolating (docker, vms, Bare metal if you’re super tin foiled), anything short of that is fake security.

Which is exactly the reason that many orgs do specifically audit every package and disallow unapproved software. But again, even if some of your desktop apps are allowed unaudited, that is not reason to lessen your security elsewhere.


There’s a very limited set of scenarios where local file read isn’t accompanied by enough write/exec privilege to inject a keylogger. Sure, there might be some cases where the control would prevent abuse, but they’re limited. IMO, time and money should be invested in other security first, unless you’re already close to an absolutely secure environment. In most cases I’ve seen, there are gaping holes while crazy amounts of time and money are spent securing something that doesn’t actually improve overall security much, or at all.


In that case, the rogue app would have access to your temporary credentials anyway...


Yes, and of course that is bad, but it is not as bad as a rogue app having access to credentials that never expire.


You should have a shell alias to rapidly top up your auth token, just like with the Docker registry (ECR). Short-lived tokens are best practice, and a 12-hour TTL is reasonable. That’s no more than two auths in a day as a dev.


And every developer needs to have that alias. And all automation needs to be changed to call that command before trying to use pip, or mvn, or whatever. It sucks. No other hosted artifact repository does this.


It’s roughly a dozen lines of bash (error handling and all) that can be checked into your project’s repo, speaking as someone who has had to maintain dev tooling for an org that used ECR. It’s not onerous at all, either on devs or on your build and deployment pipelines/runners.
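Not the bash the parent describes, but the same idea as a Python sketch: cache the token and only refresh it when it's close to expiry (the domain name, cache approach, and environment variable are assumptions for illustration):

```python
import datetime
import os

import boto3


def codeartifact_token(domain="example-domain",
                       min_remaining=datetime.timedelta(minutes=15)):
    """Return a cached CodeArtifact token, refreshing it when near expiry."""
    cached = codeartifact_token._cache
    now = datetime.datetime.now(datetime.timezone.utc)
    if cached and cached["expiration"] - now > min_remaining:
        return cached["authorizationToken"]
    client = boto3.client("codeartifact")
    cached = client.get_authorization_token(domain=domain, durationSeconds=43200)
    codeartifact_token._cache = cached
    return cached["authorizationToken"]


codeartifact_token._cache = None

# Package managers can then pick the token up from the environment,
# e.g. via a variable like CODEARTIFACT_AUTH_TOKEN referenced in tool configs.
os.environ["CODEARTIFACT_AUTH_TOKEN"] = codeartifact_token()
```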


If only devs had a way to share code with one another...


Can’t imagine any serious tech environment still allowing non-temporary creds. If they do, good luck when the security audit happens.


Why so? You just log in once a day at the beginning of your workday. I don't think you'll work a 12-hour day, so that should be good for the entire day.


Is it just me, or is this missing plain artifacts - those that are not packaged for a specific tool? I'm thinking of plain binaries and resources required for things like DB build tools and automated testing tools - just files, really. How do I publish a tarball to this, for example?

Also, the lack of NuGet support is a major issue.


I think CodeArtifact loses value when you aren't using a package manager; the benefit is an API-compatible service with various controls and audits built on top.

Out of curiosity, what would you want from this service for the "plain binary" use-case when S3 already exists?


I think mainly the ease of having security dealt with around who can access, etc., really. Of course you can just upload files and serve them over HTTP, but I'd like something that's as easy to set up and use as Nexus for these files, and something that forces a structure for how they are organised. Stops arguments and people doing whatever they want.


>> I think mainly the ease of having security dealt with around who can access, etc., really. Of course you can just upload files and serve them over HTTP,

This is where S3 really shines. You can give developers access through group membership while servers use instance profiles. We have implemented fine-grained access control for the S3 repos that works really well. Of course, you access the content via HTTPS.
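One related pattern, hedged because it's not necessarily what the parent's setup uses: when someone outside those IAM groups needs a single artifact, a presigned URL grants time-limited, read-only access without widening bucket permissions (bucket and key are made up):

```python
import boto3

s3 = boto3.client("s3")

# A presigned GET URL that expires after an hour; anyone holding the URL
# can fetch just this one object, with no other bucket access.
url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "example-artifact-bucket",
        "Key": "releases/my-lib/1.2.3/my-lib-1.2.3.tar.gz",
    },
    ExpiresIn=3600,
)
print(url)
```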


Fair enough. I dislike the idea of disparate systems, where one type of the same thing is stored on a different system from another type of the same thing.

IAM applies to the AWS repo as well, doesn't it? I guess it wouldn't be so bad then.


It’s nice having the metadata around the push available, versus raw blobs in S3.


Objects in S3 can have custom metadata associated with them. Look at the returned data for the HeadObject call.[0]

It's not advertised in the documentation, but HeadObject(Bucket, Key)['Metadata'] is a neat dictionary of custom values.

0: https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObje...
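For example, with boto3 (bucket, key, and metadata values are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Attach custom metadata when uploading an artifact...
with open("dist/my-lib-1.2.3.tar.gz", "rb") as artifact:
    s3.put_object(
        Bucket="example-artifact-bucket",
        Key="releases/my-lib/1.2.3/my-lib-1.2.3.tar.gz",
        Body=artifact,
        Metadata={"git-sha": "abc1234", "built-by": "ci"},
    )

# ...and read it back later via HeadObject.
head = s3.head_object(
    Bucket="example-artifact-bucket",
    Key="releases/my-lib/1.2.3/my-lib-1.2.3.tar.gz",
)
print(head["Metadata"])  # {'git-sha': 'abc1234', 'built-by': 'ci'}
```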


S3 supports metadata (see https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-o...).

Perhaps I don't fully understand what you're saying, though, as your comment isn't entirely clear to me.



It’s frustrating not to see more system package management (deb, rpm) from these new services (GitHub and GitLab, for instance).

Are others not packaging their code in intermediate packages before packing them into containers?


What's the purpose of intermediate packages if you're already using containers?


A very large C++/Python/CUDA application that is packed into various different images (squashfs images, but functionally the same).

We end up having a lot of libraries that are shared across multiple images.


Would it not be easier to just pack into different base images? Docker is very efficient with reusing these layers.


Intermediate packages permit you to choose different deployment situations later, with minimal additional cost now. Tying everything to Docker images ties you to Docker and removes your ability to transition to other systems. It may not be worth the cost now, but as soon as you want to deploy on more than one platform it can become critical to maintaining momentum (vice having to hand tailor deployment for each new environment).


We've been going that direction. Packages integrate better into multiple use cases (e.g. VM images, containers). Running a properly signed apt repo is easy these days, so why not?

For people that disagree with this model: where do you think the software comes from when you apt/apk install things inside your Dockerfile?


Most people don't need to do that. You can build things you need as part of the image build. No need to set up a deb or rpm package unless you're also installing it that way somewhere else.


We use JFrog. One Jenkins job builds our code into a .deb and pushes it there. Another job builds the VM image, which is then deployed once testing passes.


That sounds like double work.


I'd really like to see more support added (Ruby, etc.). It could be a great alternative to Artifactory.


No C#/Nuget support? Really?


AWS products always take an MVP approach; the rest is driven by customer feedback on the roadmap. CodeGuru/CodeProfiler/X-Ray similarly launched with limited language support that they've built out over time.

Whenever I see a product announcement like this missing something I need to use it, I immediately ping our Technical Account Manager to get the vote up for a particular enhancement.


sounds surprisingly manual. has AWS not tried to formalize some sort of feature voting system?


Some products have started doing public GitHub “roadmaps”. They use GitHub issues to get more accessible public feedback, but who knows how that gets processed internally.


The back-end is largely package-type agnostic, and the package manager front-ends are pluggable. I'd look for AWS to expand package manager support in the near future; NuGet was on the list along with a few other popular package managers. There's a whole lot of functionality in the platform they didn't yet expose or have finished for the launch. I'd keep an eye on this as they move forward.

Source: I led the technical design for the product as well as a chunk of the implementation, but left the team mid-2018. I don't have any specific insight into their plans, not that I could really share them even if I did.


That is strange; I wonder if that's coming later, but I didn't see anything to that effect. I'd also have liked to see Docker image support (despite ECR) and raw binaries too.


My guess (purely a guess, though) is that this covers a good proportion of the platforms AWS uses internally, and that this service will expand to other, less internally used ecosystems in response to customer demand.


Weird when you can just do it using S3 for 50% of the price.

https://github.com/emgarten/sleet


Static feeds are much slower than ones that use a real server.


Any reason why? Could we not make it faster?


It’s been a while since I tried a static feed, but basically the NuGet client had to read the directory structure to find all of the packages and versions, instead of using an API where the server had everything indexed already.


You know it's an AWS service when you look at it and go "Huh, it's only 2x the price of S3, what a bargain!"


2x the price of S3 is very cheap.


It dedupes artifacts (according to the Twitch demo today), so the actual cost would likely be much less than S3 unless you're doing a solo project.


No deb, RPM, or NuGet. Half a product, really. As annoying and expensive as Nexus and Artifactory are, at least they're more fully featured.


Seems like a direct competitor to Artifactory and Nexus. I wonder if it is profitable for them to create an inferior alternative to fully fledged artifact managers, or if they are doing this for product-completeness of AWS.


I'd wait a few years for it to be ready; AWS developer tools are really crude. Last year I had to build a Lambda just to be able to spit out multiple output artifacts in CodePipeline.


Appears to support Ivy/Gradle/Maven, npm/Yarn, and pip/twine only.


What is wrong with S3?


I don't get it.

The git server you use supports artifacts already. You could also just put all of your artifacts in an S3 bucket if you needed somewhere to put them, which is exactly what this is, but more expensive. I don't understand when this would save you money or simplify devops.


It’s not “exactly what this is”. Every time AWS or Azure or GCP releases a service, there are droves of people on HN decrying it as “just <something I’m familiar with>”, without bothering to understand whether that’s actually true. It’s not.

Skim the docs and you will see it is not “just S3”.


Yep. I've worked on a solution that "just uses S3". It is not trivial.


It can run in a VPC without direct internet access. For the average developer this isn’t usually an issue, but in highly secure corporate environments this helps a lot. You can’t just do pip install X in such situations. Even the S3 proxy solutions often require jumping through many hoops with the security Jedi council before you can use any packages there.

A lot of people won’t find this useful but for some it’s a big blessing.


For Python at least, fetching something from Git is far slower than fetching it from PyPI.


The benefit is being able to keep your existing maven/npm/pip workflows as well as use the same workflow for both internal and public dependencies.


I still don't see what's different. I can configure pip to look at my git server, so that all I have to do is `pip install my_thing` and it will automatically download all public and private deps. I don't know what you mean by "workflow" in this context but this is just about as simple as can be.


You’re not the target user here. In highly secure environments you can’t just “pip install your-thing”.


Looks like you’re assuming you have some kind of access to any part of the internet you please. I envy you because most tools just work in this case.

Not so on enterprise networks.


what git server is that?



