Rust crate rg typosquatting/redirect to ripgrep (github.com/burntsushi)
87 points by super_linear on Sept 4, 2023 | 35 comments



Note that this problem of squatting (like many other security problems) is mostly a consequence of unmanaged repositories where developers publish themselves (like crates.io here, npm, PyPI or the various "app stores"). Well-tended, community-organized repositories, like those of most Linux distributions, separate out the role of package maintainer. This provides a much-needed buffer between users and developers, who regularly have conflicting interests, security-, support- and integration-wise.

See ddevault's two very clear explanations of this issue: https://drewdevault.com/2019/12/09/Developers-shouldnt-distr... and https://drewdevault.com/2021/09/27/Let-distros-do-their-job....


You seem to be right in large part; however, other ecosystems and package managers don't have the same kinds of problems to that degree, since their package names are namespaced.

I'm not sure if this would solve this particular issue, it actually might, but name squatting in general seems to be far less of an issue with other package repositories.


I expect to hear about the fallout of crates.io's decision to not use namespaces and to have no procedures for claiming names for several years to come. It's not like they weren't warned this would happen, or that there was no prior experience with exactly this kind of issue (just ask the Maven team).


Is the rationale documented somewhere? I have a hard time understanding why this decision was made, especially in the face of prior art. To me it seems like a non-tradeoff.


I don't think there's an answer in one place, but various reasons have been given in many namespacing megathreads:

• Some members think squatting is not a problem, let people take all names, and then people will invent creative ones, like nokogiri in Ruby.

• Adding of namespaces only moves the squatting problem from squatting crate names to squatting namespaces. People like nice namespaces too. What if someone grabs an official-looking namespace like "aws", and what if that's a legit project?

• Using usernames for namespaces makes typosquatting even worse, because many usernames are odd and hard to remember correctly (would you remember digits in winapi's owner handle? Is it BurnSushi, BurntSushi, BurnedSushi?)

• crates.io relies on GitHub for identities, but GitHub usernames are not permanent. Crate names must be permanent. Letting the two out of sync creates new problems.

• Crate names map to Rust identifiers, and there's a bikeshed about separators and ambiguities.

• There's already a ton of non-namespaced crates, which must be supported. Having both namespaces and non-namespaces creates another bunch of bikeshed problems.

• Having anti-squatting policy is laborious to enforce, and handling of disputes was a terrible drain of resources for npm.

• crates.io is understaffed and can't deal with this right now.

• people also proposed different approaches, like UUIDs/hashes/git URLs. There's a current RFC to use existing crates as namespace prefixes for projects.
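To make the username-typosquatting bullet concrete, here is a minimal sketch of how a registry might flag names that are within a small edit distance of an existing one. This is purely illustrative: the function names and the threshold of 2 are assumptions, and nothing here reflects what crates.io actually does.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two names, ignoring case."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(
                prev[j] + 1,         # delete ca
                cur[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = cur
    return prev[-1]


def near_misses(name: str, existing: list[str], max_distance: int = 2) -> list[str]:
    """Existing names that are close to, but not the same as, `name`.

    `max_distance` is an arbitrary, hypothetical threshold.
    """
    return [
        e for e in existing
        if e.lower() != name.lower() and edit_distance(name, e) <= max_distance
    ]
```

With the handles from the bullet above, `near_misses("BurnSushi", ["BurntSushi", "BurnedSushi", "serde"])` flags both Sushi variants, which is the confusion the bullet is worried about.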


I think this covers it fairly well but wanted to clarify one item

> There's a current RFC to use existing crates as namespace prefixes for projects.

This is not intended as general-purpose namespacing but to allow semi-open (Rust) namespaces, and it should only be used if it makes sense in the code itself.

EDIT: Another one:

> crates.io relies on GitHub for identities, but GitHub usernames are not permanent. Crate names must be permanent. Letting the two out of sync creates new problems.

GitHub is also an implementation detail and they don't want to couple features to github.


This being a gordian knot is exactly why it shouldn't exist (or more precisely it shouldn't be such an important/official/recommended thing).

Providing a unified build system and tooling for rust "crates", with cargo etc., is all fine, but if you're not prepared to run a registry then don't. It is almost purely a political and people-handling issue. The only value a registry like crates.io provides is to have shorter human readable names instead of git repo urls. Possibly together with the ability to run some analytics on what some people write in rust. This is purely cosmetic and is a WWW anti-pattern (urls have this job).

All these tools (pip, cargo, npm) support git urls just fine. Just don't upload to the registry and make people use your existing git url, everyone will be better off and it already works.

edit: also iirc, crates.io names and rust identifiers by which you import the library are separate (although usually similar), and the identifiers can be altered when specifying a dependency in Cargo.toml. The registry and crate name do not serve any essential functionality.


> The only value a registry like crates.io provides is to have shorter human readable names instead of git repo urls. Possibly together with the ability to run some analytics on what some people write in rust. This is purely cosmetic and is a WWW anti-pattern (urls have this job).

This is not quite true. There are some assurances around availability that crates.io provides that self-hosting wouldn't. It won't publish two crates with the same version, it won't completely delete a crate version (a yanked crate-version is still available if you have a Cargo.lock file), it won't retroactively change versions, and it won't completely go away if a domain expires. URLs in their ideal are great. In practice they are brittle.
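The guarantees listed above can be modeled in a few lines. This is a toy in-memory sketch, not how crates.io is implemented; the class and method names are made up for illustration.

```python
class VersionExists(Exception):
    """Raised when a (name, version) pair has already been published."""


class ToyRegistry:
    """Toy model: versions are immutable, and yanking hides but never deletes."""

    def __init__(self):
        self._artifacts = {}  # (name, version) -> bytes
        self._yanked = set()  # (name, version) pairs hidden from new resolution

    def publish(self, name: str, version: str, data: bytes) -> None:
        key = (name, version)
        if key in self._artifacts:
            # No republishing or retroactive change of an existing version.
            raise VersionExists(f"{name} {version} is already published")
        self._artifacts[key] = data

    def yank(self, name: str, version: str) -> None:
        key = (name, version)
        if key not in self._artifacts:
            raise KeyError(key)
        self._yanked.add(key)  # the artifact itself is kept

    def resolvable_versions(self, name: str) -> list[str]:
        """Versions offered to fresh dependency resolution (plain string sort, no semver)."""
        return sorted(
            v for (n, v) in self._artifacts
            if n == name and (n, v) not in self._yanked
        )

    def fetch(self, name: str, version: str) -> bytes:
        """Exact fetch, as a build with an existing lockfile would do; yanked versions still work."""
        return self._artifacts[(name, version)]
```

The point of the model is that `yank` and `fetch` are deliberately asymmetric: a yanked version disappears from `resolvable_versions` but stays retrievable for anyone who already pinned it.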


I'm not talking about self-hosting, i'm just saying that most people will have their source already on some stable location (typically a hosted forge). I'm not so sure why crates.io should be any different than the typical source host: i don't trust you, so i'll run my own thing, but now people have to trust me and we're back at square one.

All in all your points are mostly valid (i don't fully understand all your arguments about versions as integrity should be handled by git) but again referring to my first message, they are best addressed by a community repository: these kinds of guarantees can only be provided by a trusted entity, and there is no reason in the world everybody writing rust should be trusting the same entity (with such lightweight management). It can only fail in some ways.


> All in all your points are mostly valid (i don't fully understand all your arguments about versions as integrity should be handled by git)

If you depend on github.com/owner/foo@1.0.0, there's nothing stopping owner from force pushing to foo to override an existing version. The way it can be mitigated is if your dependency declaration contains a hash, so Cargo.toml files would be closer to foo = { url = "github.com/owner/foo", tag = "1.0.0", hash ="$HASH$" }.
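A minimal sketch of the check such a pinned hash would enable, assuming SHA-256 over the fetched source bytes. The `hash` key above is hypothetical, and so are the function names here; this only shows that a force-pushed tag would no longer match the pin.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the fetched source bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_hex: str) -> bool:
    """True only if `data` hashes to the digest recorded in the dependency declaration."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(data), pinned_hex.lower())
```

If the owner rewrites the `1.0.0` tag, the bytes change, the digest changes, and `verify_pinned` returns `False` instead of silently building different code.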

> there is no reason in the world everybody writing rust should be trusting the same entity

Note that you can today depend on git repos, local paths, or on alternative crate repositories. In practice the only people doing so are either applications (for the former) or large air-gapped organizations (for the latter).


> Note that you can today depend on git repos [..]

Yes! That's what i'm doing and my whole argumentation is that this should be the encouraged default. Because cargo is already well designed and crates.io (or any custom cargo registry) only brings marginal value to it. Let people organize their infrastructure as they wish, it shouldn't be the business of a language implementation or build system.


Thanks for assembling this list, I'll reply to each.

> Some members think squatting is not a problem, let people take all names, and then people will invent creative ones, like nokogiri in Ruby.

I think we can agree that time has proven this one to be wrong (predictably).

> Adding of namespaces only moves the squatting problem from squatting crate names to squatting namespaces.

This is why Maven uses two schemes, io.github.username (and similar) in case of Git hosting services, or proven domain ownership via DNS TXT records. There has been zero drama in many years of having this scheme. [0]

> Using usernames for namespaces makes typosquatting even worse

Yes this is not a good idea.

> crates.io relies on GitHub for identities, but GitHub usernames are not permanent

A "published artifacts may never be unpublished" rule should solve that, plus protecting a namespace (groupId) once it has been used to upload an artifact by credentials.

> Crate names map to Rust identifiers

TBH I don't get that one. It seems this is more like a technical question of how to mangle symbols?

> There's already a ton of non-namespaced crates

Maven had the same issue; it is not really a problem, just confusing because for a while there will be duplicates (could be solved by redirects).

> Having anti-squatting policy is laborious to enforce

Understandable. I think using the Maven scheme, it should be fully automatable, but I'm not sure.

> crates.io is understaffed

Understandable. I suspect corporations could be interested in having this managed properly, as this is a security issue. Possible revenue stream via managed services (Maven does this)?

> people also proposed different approaches, like UUIDs/hashes/git URLs

Haven't looked at this in detail, seems inferior because mnemonics are important, and Git URLs alone are too limited.

[0] https://central.sonatype.org/publish/#individual-projects-op...
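For readers unfamiliar with the Maven-style scheme referred to above, here is a rough sketch of the two ideas: a namespace derived from a reversed domain name, and ownership shown by a token placed in a DNS TXT record. The token format and function names are assumptions for illustration, not Sonatype's actual procedure, and the TXT records are passed in as plain strings rather than looked up over the network.

```python
def reverse_domain(domain: str) -> str:
    """Map a domain to a reversed-label namespace, e.g. example.com -> com.example."""
    labels = [p for p in domain.strip().lower().rstrip(".").split(".") if p]
    return ".".join(reversed(labels))


def txt_proves_ownership(txt_records: list[str], token: str) -> bool:
    """True if any TXT record equals the (hypothetical) verification token."""
    return any(record.strip().strip('"') == token for record in txt_records)


def may_claim(namespace: str, domain: str, txt_records: list[str], token: str) -> bool:
    """A namespace may be claimed only for a matching domain with a valid TXT token."""
    return namespace == reverse_domain(domain) and txt_proves_ownership(txt_records, token)
```

Note that `reverse_domain("username.github.io")` gives `io.github.username`, which is the shape of the hosting-service namespaces mentioned above.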


> Adding of namespaces only moves the squatting problem from squatting crate names to squatting namespaces. People like nice namespaces too. What if someone grabs an official-looking namespace like "aws", and what if that's a legit project?

Solved using domain ownership via TXT record for example.

> Using usernames for namespaces makes typosquatting even worse, because many usernames are odd and hard to remember correctly (would you remember digits in winapi's owner handle? Is it BurnSushi, BurntSushi, BurnedSushi?)

Fact is that people don't write their package dependencies by memory, they usually go to the readme of the project and use the instructions there.


> Solved using domain ownership

This is not a silver bullet. Domain ownership can lapse, can change. When a project changes owners, the previous owners usually want to keep their domain. So this creates an extra layer of difficulty: a non-permanent identity attached to permanent identifiers.

> Fact is that people don't write their package dependencies by memory

They absolutely do in Cargo. `cargo add serde; cargo add tokio`.


> This is not a silver bullet. Domain ownership can lapse, can change.

Domains rarely change, and when they do you can redirect or alias the namespace. It works in the real world, and has done so for over a decade. It's a package repo, not a nuclear waste storage facility.


https://samsieber.tech/posts/2020/09/registry-structure-infl... is not "official" but attempts to sum things up, and includes a bunch of links to RFCs and other discussions with comments from crates.io team members.


IMO package developers shouldn't be the ones managing package repositories (and duplicating ICANN's job where naming is concerned).

Package management is an inherently political activity, and encouraging centralized package registries is, in my opinion, a bad decision in the long run. The security issues are one important problem, but sooner or later, somebody will have to deal with the "what do we do about excellent packages that many people lean on, but that come from a racist and fascist transphobe" problem. There's no good solution here, the choice is between potentially breaking a major part of your ecosystem or angering a violent pitchfork mob on social media, potentially with many corporations who are major ecosystem participants behind their backs. There are also other issues related to trademark infringement, patents, DMCA handling (and packages which are illegal to host in your jurisdiction but perfectly legal in the one of their developers), and important financial contributors bullying you into taking actions that serve their interests.

A better way is to follow the lead of Go (or at least pre-module Go) and use git repositories (but not necessarily with repository-based import paths) instead of your own package registry.


The Go modules ecosystem doesn't suffer from the squatting problem because they chose not to create a new vacant namespace, and the corresponding rush to fill it.

They easily could have. pkg.go.dev could be like npm. It's not a question of cost; Google is paying for the infrastructure.

It seems that language creators generally get this false impression that if they are the one to create the new namespace, then it will be high quality, and the best packages will get the short de-facto names. Maybe a few of the packages they wrote themselves can get some of the first names.

That's never what happens. The wise solution is just to use DNS. We already have names, people pay for them, there is infrastructure for selling them, there is an auditable certificate system. A new package namespace won't have any of that.


This issue has been pointed out so many times, it's clear it just doesn't matter, really, to anyone on the Cargo team. Meanwhile, years after this criticism was first offered, the problem remains, only more entrenched.


Man, HN really just can't handle harsh truths. Do I need to go dig up 6 year old issues about this? Or the half dozen times it's come up on HN over the last 5 years?

I love Rust, I'm a big fanboy, but we need to stop pretending there aren't glaring blind spots.


I've definitely found `https://pypi.org/project/bs4/` useful - in Python if you want to use BeautifulSoup (a common package for parsing and manipulating HTML), you import it with `from bs4 import BeautifulSoup`, but you install it with `pip3 install beautifulsoup4`.

In this case, the `bs4` package actually directly installs what you need, though I agree with the arguments in the article why this might not be ideal.

It would be nice if the committees that deal with the language itself could also look after things like this as it's hard to say objectively (main package needs x installs/month?) when something is squatting and when it is useful, but I think a 'common sense' approach goes pretty far.


I've found this useful several times, and wish that `fd-find` did the same thing. It's not an unreasonable thing to do, IMO, under the appropriate circumstances.


Interesting read. Thanks for sharing!

This was created a year ago and Crates.io haven't taken it down so I assume they're ok with it.


It's not surprising. crates.io's stance on squatting or namespacing is to put their fingers in their ears and hope you stop asking them.

https://crates.io/policies#squatting

Eg https://crates.io/users/swmon has squatted 100 empty crates since 2017.


No, their stance is to create RFC#3463[0], setting out a policy for handling name squatting and other issues. What they won't do is act unilaterally without a policy and accountability.

[0]: https://github.com/Turbo87/rust-rfcs/blob/crates-io-policy-u...


I'll believe it when they actually do something, such as delete the garbage crates from my second link. These "policies" and "discussions" have been going on since 2014. Note that even your link is not an accepted RFC; it's still being reviewed ( https://github.com/rust-lang/rfcs/pull/3463 )


You're moving the goalposts. The point is that a written RFC, and one that is under review, is more than "put their fingers in their ears and hope you stop asking them."


They should probably provide an official way of doing this with manual review.


Indeed, but unfortunately the crates.io team is chronically understaffed.


The same strategy is employed by PyTorch. If you do "pip install PyTorch", like I've done many times, it just tells you to "pip install torch" instead. To be even more confusing, though, the Anaconda package is actually named "PyTorch".


rg's a rusty ag. To install ag, you usually have to guess something like "ag-the-silver-searcher". Not easy.


Why guess when there are installation instructions for various platforms on the README at https://github.com/ggreer/the_silver_searcher#installing?

Also, although it may not be easy to remember, is this really a problem in practice given the installation count in most contexts is one? If there's a context where it's installed regularly, that's a one-time addition to an install script, Dockerfile, etc. in my experience. Do you have a situation that isn't amenable to that?


I prefer Go’s imports via Git


What happens if someone takes down that git repo? Or modifies a commit? Sounds like another left-pad waiting to happen. Crates.io doesn't take down packages - even the yanked ones.


You can pin the version/commit (default). Won’t cause anything to break unless you explicitly run go get -u.

Left pad situation is a concern yes.



