Two malicious Python libraries caught stealing SSH and GPG keys (zdnet.com)
624 points by choult on Dec 4, 2019 | 317 comments



I don't know what the solution is, but it feels like this is a much bigger issue and we need some rethinking of how OSes work by default. Apple has taken some steps, it seems, in the last two macOS updates, where it blocks access to certain folders for lots of executables until the user specifically grants that permission. Unfortunately, for things like Python the permission is granted to the Terminal app, so once given, all programs running under the terminal inherit it.

Microsoft has started adding short-lived VMs. No idea if that's good. Both MS and Apple offer their app stores with more locked-down experiences, though I'm sad they conflate app security and app markets.

Basically any time I run any software, every time I run "make" or "npm install" or "pip install" or download a game on Steam etc., I'm having to trust thousands of strangers not to steal my keys, my photos, my docs, etc...

I think you should be in control of your machine but IMO it's time to default to locked down instead of defaulting to open.


> my keys, my photos, my docs

You listed it, the problem is personal data.

If an executable nukes my computer it's a pain, but if it doesn't have access to my data it's not that severe. For a long time we conflated the two: being hacked or getting a virus meant downtime, and you were screwed if you didn't have backups, but it stopped there.

But what really changed the threat level in recent years is how much personal information is on computers, and classic security models were not designed with that in mind. There is no « personal data » flag, just a permission saying a file belongs to a user. That worked fine for executables and working documents, but personal data is a different thing.


The problem is convenience vs. security, and we all know that even knowledgeable users will often sacrifice the latter for the former.

Technically on many OSs you already have dozens of ways to achieve what you're saying. You could spawn a VM, use a different user, use some container framework, use SELinux, etc... The problem is that usability is generally terrible. Or maybe not terrible, but bad enough that many people will start mashing the "allow" button indiscriminately when they're in a hurry and want the thing to Just Work. The only thing I know for sure how to do with SELinux without having to look up a guide or manual is how to disable it.

Look at the modern Linux desktop for instance: you need admin privileges to configure a printer, but your sensitive private files are just lying in your home directory with user privileges. That makes no sense.

This is not really a technical problem - that was solved when we invented the MMU - it's a UI problem.


I agree it can be inconvenient but don't iOS and Android kind of show that for most users the experience can be just fine? For us devs we can still either open our systems or find better ways but I'd prefer the default to be more closed.

I'd also prefer the default to be VM-ish. Right now I have 275 projects in my /home/me/src folder. Every one of them has at least a build process that executes code I have mostly not reviewed. Many of them execute more code after being built. Some of them executed code when their dependencies were installed. Maybe I'm just not used to it, but I'd really like it if the process for isolating all 275 of those projects were somehow easier. Or maybe I'd prefer they were all isolated by default and that we could work on ways of making it easy for them to do the things they really need to do and hard/discouraged to do anything that would also be a security issue.

Maybe people have suggestions. I don't want an actual VM if it means I have to install 275 copies of OSes and 275 sets of semi-global dependencies (like I don't want to have to install Python 3.8 200 times), but I do want all 275 of those projects to be unable to read my private keys or my photos, or to write outside designated areas.


The problem with existing OSs is that if you don't make it the default and force everybody to adapt you end up with a half-assed solution with bad support that just gets in the way. I tried to use firejail to sandbox critical apps (mainly my browser, but also the closed source Spotify client for instance) but eventually gave up because it would just break too often.

On top of that, the one app I'd like to sandbox more is obviously the web browser, but since nowadays web browsers are like a full OS on top of another it's really hard to grant fine-grained permissions unless you want one profile per website. The browser needs access to the filesystem if you want to upload and download files, the graphics card if you want good performance and WebGL, the sound API if you want audio, the ability to fullscreen if you want to be able to play games and videos that way, the printer if you want to print stuff, the USB stack if you want to use security tokens, etc...


You could use some lightweight rootless containers: I wrapped bubblewrap in a small Python program that parses a YAML configuration (which mirrors my host's rootfs and provides "topical" homes: node, python, scientific stuff), but there is also something like toolbox: https://github.com/containers/toolbox/blob/master/README.md

Note that you also need to secure your X server, especially when sharing your network namespace... (as it might be configured to authenticate your uid on (anonymous) sockets)
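Roughly, the wrapper boils down to building a bwrap command line. A minimal sketch of the idea (not my actual tool): the config format, the "sandbox.yaml" name and the profile layout are made up for illustration, it assumes bubblewrap and PyYAML are installed, and the ro-binds assume a merged-/usr distro.

    # sandbox.py - minimal sketch, see the caveats above
    import subprocess
    import sys

    import yaml  # pip install pyyaml

    def run_sandboxed(profile_name, command, config_path="sandbox.yaml"):
        with open(config_path) as f:
            profile = yaml.safe_load(f)[profile_name]

        args = [
            "bwrap",
            "--unshare-all",              # fresh namespaces for everything...
            "--share-net",                # ...except the network, which most builds want
            "--die-with-parent",
            "--ro-bind", "/usr", "/usr",  # mirror the host rootfs read-only
            "--ro-bind", "/etc", "/etc",
            "--symlink", "usr/bin", "/bin",
            "--symlink", "usr/lib", "/lib",
            "--proc", "/proc",
            "--dev", "/dev",
            "--tmpfs", "/tmp",
            # a "topical" home (e.g. ~/homes/python) instead of the real one,
            # so ~/.ssh and ~/.gnupg simply don't exist inside the sandbox
            "--bind", profile["home"], "/home/sandbox",
            "--setenv", "HOME", "/home/sandbox",
            "--chdir", "/home/sandbox",
        ]
        return subprocess.call(args + list(command))

    if __name__ == "__main__":
        # e.g.: python sandbox.py python pip install requests
        sys.exit(run_sandboxed(sys.argv[1], sys.argv[2:]))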


Containers (and virtualisation) are helpful, but not bulletproof. If your threat model is already malicious code with ill intent, you probably want higher grades of privsep / sandboxing.

E.g.,

https://www.twistlock.com/labs-blog/escaping-docker-containe...

https://www.exploit-db.com/exploits/46978

Full virtualisation does far better, though I believe there've been exploits there as well. Yes, from 2015:

https://threatpost.com/xen-patches-7-year-old-vm-escape-hype...


Docker has had its fair share of breakouts because the default configuration doesn't use user namespaces (one of the most significant security isolation features in Linux) and runs as root. It's not indicative of how secure a properly set up container is -- I don't remember the last time there was an LXC or LXD breakout (which use user namespaces by default).

Source: I've found a fair few Docker breakouts. I also maintain runc, which is the lower-level container runtime component (where most of the breakouts are found).


Don't user namespaces have significant security issues themselves?


There have been security bugs involving allowing unprivileged user namespaces, but that doesn't matter at all in this case -- seccomp is used by effectively all container runtimes to block things like CLONE_NEWUSER inside containers.


Some effort towards more secure containers: https://gvisor.dev/


I sort of agree, but I think that user interfaces need an underlying conceptual model that is familiar to users separately from the actual UI. Otherwise any purely presentational change implies arbitrary new rules that no one can remember.

One conceptual model that has worked to some degree on mobile platforms is the idea that programs not users have permissions.

But mobile platforms are overdoing it in a way that makes data centric work impractical.

I'm not quite sure how to overcome these issues, but I feel that making the distinction between permissions assigned to users vs permissions assigned to programs more explicit and more visible could take us a step further.


I think this is the most sensible approach, but I'm not aware of any desktop operating systems trying it. It seems like you'd need permissions for the user to access their own files, the program to access its own files, and then a set of combinatory permissions: Sally has given Foo permission to access Bar.txt, for example.

Standard unix permissions could technically do this by creating a bunch of ghost accounts pairing the two, but it gets messy fast. Something closer to access lists makes more sense? I dunno. This is a curious problem, and my ideas are half-baked. Is there any activity in this space outside of mobile?


> Look at the modern Linux desktop for instance, you need admin privileges to configure a printer

That hasn't been true for an amount of time long enough to put it firmly outside "modern."


cupsd asks for my admin credentials when I try to add a new printer, but admittedly that might not be the preferred way for people who run DEs like GNOME or KDE; I just tend to prefer a more barebones experience.

But that's not really my point anyway, the point is that the cupsd process runs sandboxed with its own permissions while my text editor, my password manager and my web browser all share the same UID. If there's a security vulnerability in Firefox and some attacker manages to hack my printer it's pretty bad. If they hack my password manager it's really, really bad.


> the point is that the cupsd process runs sandboxed with its own permissions while my text editor, my password manager and my web browser all share the same UID. If there's a security vulnerability in Firefox and some attacker manages to hack my printer it's pretty bad. If they hack my password manager it's really, really bad.

That's true, but what it primarily suggests to me is that Firefox should be sandboxed. It seems possible that the main reason cupsd runs in a sandbox is that it's a source of attacks on you, not a target. (Why is it pretty bad if someone else hacks your printer?)

Firefox is a plentiful source of attacks on you. Your password manager isn't -- it already knows all your passwords; you trust it by necessity.

The password manager is a high-value target, and might (does) deserve extra protections for that reason, but the parallel in your example runs between cups and Firefox, not cups and the password manager.


Yes, because otherwise one user could configure a fake printer and see what others are printing. Although for a one-user system this doesn't make much sense.


Right, that's the problem IMO: the usual way processes are segregated on Linux harks back to the days of UNIX mainframes shared across dozens of users. It still makes some sense on servers where you want to isolate, say, the database from the web server from postfix from sshd, but it's almost entirely inadequate for single-user desktop computers.

On my desktop root doesn't really matter; if my user gets compromised it's already game over. I have a lot more to lose if my browser gets backdoored than if my /etc/shadow gets leaked.


Yes, but if we run every application under a different user (which is a sane idea, Android does it and I use it for several apps) then we come back to the problem of not allowing program X to reconfigure a printer. So I think that we still need a confirmation dialog, but without a password.


Maybe there should be a trust chain for everything.


I don't buy it.

I heard the same complaints when we went from Apple II to Macintosh, from 68K to PPC, from Finder to MultiFinder, from OS 9 to OS X, and so on. At each stage, the CPU and RAM became less of a free-for-all and more of a system where you were only allowed certain operations. (Yeah, protected memory is terrible for hacking! Sorry.)

And yet, somehow, we survived. Security got better, and usability (generally) got much better in other areas to compensate. Life is better all around when you can tell the computer accurately what you mean, instead of relying on your ability to jump across layers willy-nilly.

You're describing possible solutions using current technologies. None of those (except perhaps "users", in some form) are inherent to the design of a security model. We only have them because they were a convenient way to implement our current security model on top of the operating systems we've got now. At some point, we aren't going to be able to solve all our problems by adding more layers to a 1972 design. I can think of many security models which could offer better usability than SELinux or "mashing the 'allow' button".


Not really. You can have security without giving up much UX in any system where security was present in the design phase, rather than retrofitted into the implementation. None of the operating systems we use today were designed with security in mind, because in the era they were created there was no need for it; half of the problems we know today simply did not exist. SELinux is great but extremely hard to get right.


Secure but inconvenient has an immediate and visible downside, whereas insecure convenience has delayed and nebulous downsides. Convenience will always win when appealing to the lowest common denominator, but maybe we can have both.


This is literally what SELinux does, and has been able to do for years. We don't need "default block lists" - we need solid SELinux policy in all distros.


"Solid SELinux policy" is really the hard part there.

When I pip install paramiko, I do, in fact, want it to have access to my SSH keys. When I pip install ansible, I want ansible to be able to shell out to OpenSSH to use my keys. If I write custom Python code that calls gpg, I want that custom code to be able to load libraries that I've pip installed without the gpg subprocess being blocked from loading my keys. If I have a backup client in Python, and I tell it to back up my entire home directory, I want it backing up my entire home directory including private keys.

If I wanted an OS where I couldn't install arbitrary code and have it get to all my files for my own good, I'd use iOS. (I do, in fact, use iOS on my phone because I sometimes want this. But when I'm writing Python code, I don't.)

SELinux has been able to solve the problem of "if a policy says X can't get to Y, prevent X from getting to Y" for years. Regular UNIX permissions have been doing the same for decades. (Yes, SELinux and regular UNIX permissions take a different approach / let you write the policy differently, but that's the problem they're fundamentally solving; given a clear description of who to deny access to, deny this access.) Neither SELinux nor UNIX permissions nor anything else has solved the problem of "Actually, in this circumstance I mean for X to get to Y, but in that circumstance I don't, and this is obvious to a human but there's no clear programmable distinction between the cases."

To be clear - I think there is potentially something of a hybrid approach between the status quo and what newer OSes do. For instance, imagine if each virtualenv were its own sandboxed environment (which could be "SELinux context" or could just be "UNIX user account") and so if you're writing code in one project, things you pip install have access to that code but not your whole user account. I'm just saying that SELinux hasn't magically solved this problem because all it provides is tools you could use to solve it, not a solution itself.
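As a rough sketch of what that hybrid could look like with nothing more than stock UNIX accounts: the "myapp-sandbox" user and the paths below are hypothetical, and it assumes sudo is set up to allow switching to that account and that the venv is owned by it.

    # Sketch only - per-project throwaway account instead of your own UID.
    import subprocess

    def pip_install(package, venv_path, sandbox_user="myapp-sandbox"):
        """Run pip inside the project's virtualenv as the project's own UID, so
        install-time code can see the venv (owned by that user) but not your
        real home directory, ~/.ssh or ~/.gnupg."""
        return subprocess.call([
            "sudo", "-u", sandbox_user,
            "env", "HOME=" + venv_path,        # point HOME away from your own
            venv_path + "/bin/pip", "install", package,
        ])

    # pip_install("requests", "/srv/projects/myapp/venv")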


> When I pip install paramiko, I do, in fact, want it to have access to my SSH keys

Do you? Wouldn't it be better if Paramiko was obliged to access your keys via the agent? Then we could secure the agent (there's more work to be done here anyway) and also it fixes problems where Paramiko wants to do something that the agent could facilitate (and so works with plain ssh) but Paramiko doesn't know about yet, like using a FIDO device to get a key.

One of the obvious things the agent could do on a workstation is mention that your keys are being used. Imagine you run a job, which reaches out via Paramiko to fifteen servers, you see a brief notification saying the agent signed 15 logins for Paramiko. That makes sense. An hour later, reading the TPS reports, the agent notifies you again, Paramiko just signed another login. Huh? Now you're triggered to investigate what happened and why instead of it just silently happening and you read a press piece in a month about how a new version of Paramiko is off-siting your keys because bad guys broke into a GitHub repo or whatever.
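For what it's worth, the client-side plumbing for this mostly exists already; Paramiko can ask ssh-agent to sign for it instead of reading key files. A minimal sketch (host and user names are placeholders; the missing piece is the agent-side notification, not this):

    # Rough sketch: Paramiko authenticating through ssh-agent rather than
    # reading key files itself.
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

    # allow_agent=True asks the running ssh-agent to sign the authentication
    # challenge; look_for_keys=False stops Paramiko reading ~/.ssh/id_* at all.
    client.connect("server.example.com", username="deploy",
                   allow_agent=True, look_for_keys=False)

    stdin, stdout, stderr = client.exec_command("uptime")
    print(stdout.read().decode())
    client.close()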


> Wouldn't it be better if Paramiko was obliged to access your keys via the agent?

That's just kicking the can down the road. You get the same exact problem, but with the agent permissions rather than the actual keys.

The problem is "I installed paramiko and I know what it does and I want it to access my SSH credentials, but I don't want evillib123 to access my SSH credentials even though I have installed it as well and I think I know what it does, but I am mistaken" and the distinction between the two cases above is in the intent and judging intent is hard.

> One of the obvious things the agent could do on a workstation is mention that your keys are being used

This has been tried many times. Windows UAC is one of the more ubiquitous and notorious examples. If everything starts sending you notifications you stop paying attention to them. That is what happened to UAC, it would notify users about important things, software installs, system setting changes, admin accesses etc. but it was doing it too much and most people would just click through without actually reading what the notification is about. And the reason it was doing it too much is because it cannot judge intent. It cannot tell the difference between me installing an application deliberately and me installing an application because I was tricked into it somehow.


> You get the same exact problem, but with the agent permissions rather than the actual keys.

This is also importantly wrong in a subtle way. If evillib123 steals a SSH private key, that key isn't private any more and my only option is to revoke the key and make a new one. Nothing else works, they have the key and can authenticate as me as often as they like whenever they like until that key's access is revoked.

But if they only have permission to get the agent to do operations their ability to authenticate is limited by the agent. If they lose access to the agent they can't authenticate any more. That would happen if I uninstall their malware, or if the agent locks access of its own accord (e.g. it's common to auto-lock after suspending a machine or locking the screen) or if the machine is just switched off altogether.


This is a good point. Agent still does not solve the "I thought I meant to do it, but I didn't really mean to" problem, but I agree that it does take a step towards minimizing the damage.


It's a bit like intents on Android where you also don't get direct access to (some) resources. I like that line of thinking very much.


UAC interrupts you, which is bad. UAC thinks that a thing happening is so important you need to acknowledge it. Everybody's going to learn to click past.

I'm talking about notifications not interruptions. At most a toast message, much more likely just a small indicator lights up. Not a big deal - when you'd expect it.

Think about the turn indicator on the dashboard of your car. When you indicate one way or the other a lamp illuminates, on and off, sympathetic to (and in older models directly run by the same relays as) the external turn lamps. But it doesn't ask you to confirm. "Are you really turning Left? Yes/ No" and since you're expecting it you hardly notice. But, imagine you're on the highway and suddenly that lamp illuminates for no apparent reason. That's weird right? You might be too busy to do anything about it immediately, but you'd now be concerned that perhaps there's a problem. Good!

That's what I'm talking about. Yes, out of a million users whose key got abused, maybe 90% of them weren't looking at the screen when it happened, and 90% of those left were too busy or didn't understand why it was strange, and 90% of those who noticed never actually investigated, and 90% of those who investigated gave up without notifying anyone about this weird phenomenon... you've still got a hundred users complaining about the problem.

It's not a solution, but it's a warning system.


People act like UAC leads to banner blindness but I don't think that really holds up.

In the mobile space you get prompts for soooo many things, and loads of people see "ask for location data" and say no when they think it shouldn't be used! The system works!


> and loads of people see "ask for location data" and say no

Do you have any non-anecdotal evidence for this?


plural of anecdote is data :)

I see loads of articles about people talking around permissions. Much, much more than for tools on desktop computers. I believe that the higher visibility makes it much more likely that people notice.

Of course the hypothetical "don't care" person won't notice.... but definitionally they won't ever notice!

I think it's fairly undisputed that the little lights on webcams that are on when the camera is enabled have totally worked, and the location service blue bar on iOS has worked well too IMO.


Abusing the ssh-agent doesn't lead to the private key being stolen, which is a really nice difference.


> Then we could secure the agent (there's more work to be done here anyway)

That's the problem, though. How do you secure the agent? How do you make sure that the program talking to the agent is doing something good and not evil with the request?

Yes, there is some defense-in-depth advantage to making this change, but the thing you're trying to solve here is that you can pip install thing X and have thing X run ssh with your credentials if thing X isn't evil, and you want to automatically determine whether it's evil.

> Imagine you run a job, which reaches out via Paramiko to fifteen servers, you see a brief notification saying the agent signed 15 logins for Paramiko. That makes sense. An hour later, reading the TPS reports, the agent notifies you again, Paramiko just signed another login. Huh?

That seems like it defeats only the most naive malware. Why wouldn't the malicious Python module sit around and wait for you to make a legitimate SSH connection? Would you notice if your agent signed 16 connections instead of 15? (What if it made one of the requests time out so it kept it at 15 notifications?)

Remember that the problem you're trying to solve is to prevent arbitrary code from being evil. This is basically equivalent to the antivirus problem, and there's a long history of appealing-sounding, naive, and ultimately useless solutions to the antivirus problem.


You've decided upon a very broad and likely impossible to solve problem, whereas I'm focused on a narrower problem.

There is relatively little incentive to just "be evil". But much more incentive for certain specific intents that are evil, and so if we can make those trickier we get most of the benefit without solving the impossible problem.

This happens elsewhere in society. We put a bunch of effort into deterring car theft, but crooks could also steal mailboxes, or shrubs from your garden, or garbage. They mostly don't though because there's no incentive - in your city chances are you can find somebody who'll take a stolen car off your hands for cash, but good luck finding anybody who can pay you for a dozen stolen rose bushes.

Likewise I doubt that there's a healthy market for "Sometimes you might with no prior notice get to SSH into a target machine". Even the raw SSH private keys being stolen here are a pretty niche product, I think the actual authentication privilege itself, rather than the raw keys, is so much harder to exploit profitably that it won't sell.

That doesn't mean nobody would do this, but it makes it into a targeted attack. Think "organised gang break into one family home to kidnap a bank manager as part of a scheme to get into the vault" not "burglars break into homes across the city to steal jewellery". We don't fix the problem, but we do greatly mitigate the impact on most of the population.


This is why I don't think this is an OS problem. I think it's a developer mindset problem.

Dependencies are bad.

Every single dependency in your code is a liability, a security loophole, a potential legal risk, and a time sink.

Every dependency needs to be audited and evaluated, and all the changes on every update reviewed. Otherwise who knows what got injected into your code?

Evaluating each dependency for potential risk is important. How much coding time is this saving? Would it, in fact, be quicker to just write that code yourself? Can you vendor it and cut out features you don't need? How many other people use this? Is it supported by its original maintainer? Does it have a deep-pocketed maintainer that could be an alternative target for legal claims?

Mostly, people don't do that and just "import antigravity" without wondering if there's a free lunch in there...


I strongly disagree that this isn't an OS problem.

s/dependency/application/g in your comment. Dependencies are just applications that are controlled through code rather than via a mouse/keyboard. They're not special.

I run a minimal Arch setup at home for my development machine, partially for security/reliability reasons -- less software means fewer chances for something to go wrong. But this is a band-aid fix. A minimal Arch setup that forgoes modern niceties like a graphical file browser is not a general-purpose solution to software security.

When someone comes to me and says that an app is stealing their iOS contacts behind their back, my response isn't, "well, it's your own fault for installing apps in the first place. Apps are bad." My response is to say that iOS apps shouldn't have access to contacts without explicit permission.

The same is true of language dependencies. Both users and developers need the ability to run untrusted code. The emphasis on "try your hardest not to install anything" is (very temporarily) good advice, but it's ultimately counterproductive and harmful if it distracts us from solving the root issues.


I actually agree with you.

But until we can provide a form of static analysis that can tell you whether a dependency is malicious or not, we're stuck either manually auditing them, or not using them.

There's very little to choose between a user coming to you saying "I ran a bad application" and "I ran a bad application and clicked on the allow button because I had no way of knowing it was a bad application and I have to allow all applications". Users are notorious for defeating access permissions. Implementing this same bad solution on developers isn't going to work.


At the risk of sounding like someone who wants to spark a language fight (which I genuinely don't) this is why I love Go. The standard library is so good that I rarely need to bring in any third-party dependencies, and the few I do use are extremely well-known with many eyes on their code.


I've heard more and more sysadmins liking Go for this reason.


Same. Go's attitude to dependencies and its standard library helped convince me of what a problem this is elsewhere.


That sounds like a boil the ocean solution. We're never going to get all developers to be perfect, and besides there are evil devs as well so the solution has to be elsewhere.


Well, where, exactly?

The solution most people seem to be talking about is sandboxing imports off into containers (sandboxes, whatever - these will end up as containers) so that they can have their access to sensitive data and APIs controlled. These aren't "code dependencies" any more, these are "runtime services". It implicitly conforms to "dependencies are bad" by forcing all dependencies to be external services. But it doesn't allow you to actually import known-good dependencies from trusted sources.

And specifically granting access permissions to code has always worked before, right? I mean, people never just click "allow" all the time so they're not bothered by security dialogs, do they? Why are we talking about implementing such a proven-bad solution yet again?


> And specifically granting access permissions to code has always worked before, right? I mean, people never just click "allow" all the time so they're not bothered by security dialogs, do they? Why are we talking about implementing such a proven-bad solution yet again?

To be clear, is your argument that it's too hard for us to teach people to avoid granting unnecessary permissions, but not too hard for us to teach users not to install any software in the first place?

Educating users about permissions is hard, convincing users not to download anything is impossible.


My argument is that user behaviour proves that this solution isn't actually a solution. It shifts the blame, but it doesn't solve the problem.

Developers will just allow the bad code access to the things it says it needs, because it says it needs them. Meanwhile we have another sandbox layer to deal with, which isn't good.

We need to reduce the proliferation of dependencies, and only use them for important things, to reduce the attack surface. And we need to tighten up the package managers so typosquatting and duplication of interfaces is flagged (if not banned), and we need some kind of static analysis that flags what capabilities a library uses. And I'm sure there's lots more ways of solving it that I can't think of here.
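The typosquatting flagging in particular doesn't need anything fancy; even a name-distance check at upload or install time would catch some of it. A toy sketch (the popular-package list is a placeholder; a real index would compare against its own download-ranked names):

    # Toy sketch of flagging lookalike package names with a distance check.
    import difflib

    POPULAR = ["requests", "urllib3", "numpy", "dateutil", "jellyfish", "django"]

    def typosquat_warnings(name, cutoff=0.85):
        """Return popular packages whose names are suspiciously close to `name`."""
        if name in POPULAR:
            return []
        return difflib.get_close_matches(name, POPULAR, n=3, cutoff=cutoff)

    # typosquat_warnings("jeIlyfish") -> ["jellyfish"] (the capital-I lookalike
    # from the article). Prefix-style squats such as "python3-dateutil" score
    # below the cutoff, so they'd need an extra rule (e.g. a substring check).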


> We need to reduce the proliferation of dependencies, and only use them for important things

What you're proposing here is infinitely harder than teaching users to be responsible with permissions. If you can't teach a developer not to grant code access to everything it asks for, you are not going to be able to teach them to install fewer dependencies. It just won't happen, it's completely unrealistic.

A lot of the solutions you're proposing have significant downsides, or they don't scale. Static analysis is great, but doesn't work in highly dynamic languages like Python, Javascript, and Lisp. It also can't handle ambiguously malicious behavior, like device fingerprinting. Static analysis is just a worse version of sandboxing with more holes and more guesswork. Manual reviews don't scale at all -- they're even more unrealistic of a solution than trusting developers to be frugal about the code they install. Tightening package names is nice, but again, not a silver bullet. Sometimes official libraries with official names go bad as well. We have a lot of solutions like this that we can observe in the wild, and they don't really work very well. Google Play still has malware, even though Google says they review apps and remove fraudulent submissions.

On the other hand, we actually have pretty good evidence that sandboxing at least helps -- namely, iOS and the web. Sandboxing isn't perfect, it's a very complicated UX/UI problem that I consider still somewhat unsolved. But, iOS is making decent progress here. Their recent permission reminder system periodically asks users if they want to continue granting permissions to an app -- that's really smart design. The web has also been making excellent progress for a long time. The web has a lot of flaws, but it is a gold standard for user-accessible sandboxing. Nobody thinks twice about clicking on a random link in Twitter, because they don't have to. There's obviously still a lot that needs to improve, but if the primary concern we had about malicious packages on PyPi was that they might mine bitcoin in the background, that would be a very large improvement over stealing SSH keys.

The reason sandboxing is so good is specifically because it shifts blame. Shifting blame is great. With the current situation, I need to audit the code and do research for every single app I install on my PC -- I have to decide whether the author is trustworthy. If the author isn't trustworthy, there's nothing I can do other than avoid their app entirely. This is complicated because trust isn't binary. So I can't just separate authors into "good" and "bad" categories, I have to grade them on a curve.

I do this. It's exhausting. A system where I manage permissions instead of granting each codebase a binary "trusted" label would be a massive improvement to my life, and it's crazy to me that people are in effect saying that we should keep dependencies terrible and exhausting for everyone just because the solution won't help users who are already going to ignore safeguards and install malware anyway.

Imagine if when multiuser systems were first proposed for Unix, somebody said, "yeah, but everyone's just going to grant sudo willy-nilly or share passwords, so why even separate accounts? Instead, we should encourage network admins to minimize the number of people with access to a remote system to just one or two." The current NodeJS sandboxing proposals would mean that when I import a library, I can globally restrict its permissions and its dependencies' permissions in something like 3 lines of code -- the whole thing is completely under my control. The alternative is I spend hours trying to figure out if it's safe to import. How is that better?


Because a dependency isn't a service. You're talking about dependencies as if they're standalone services that you consume. I think that's probably the predominant attitude at the moment, so sandboxing dependencies to turn them into (effectively) standalone services that you consume might work.

But I don't use dependencies like that. I'm mostly just importing useful functions from a library. Having to sandbox that function away from the rest of my code is not going to work. I'll end up copy/pasting the code into my project to avoid that.


When we talk about sandboxing dependencies, we're talking about sandboxing at an API level, not an OS level -- in some languages (particularly memory-unsafe languages) that's difficult, but in general the intention isn't to put dependencies in a separate process; it's to restrict access to dangerous APIs like network requests.

Sandboxing might be something like, "I'm importing a function, and I'm going to define a scope, and within that scope, it will have access to these methods, and nothing else." Imagine the following pseudo-code in a fictional, statically typed, rigid language.

  import std from 'std';
  import disk from 'io';
  import request from 'http';

  //This dependency (and its sub-dependencies) can
  //only call methods in the std library, nothing else.
  //I can call special_sort anywhere I want and I *know*
  //it can't make network requests or access the disk.
  //All it can do is call a few core std libraries.
  import (std){ special_sort } from 'shady_special_sort';

  function save (data) {
    disk.write('output.log', data);
  }

  function safe_save (data) {
    if (!valid(data)) { return false; }
    save(data);
  }

  function main () {
    //An on-the-fly sandbox -- access to safe_save and request.
    (request, safe_save){
      save('my_malware_payload'); //compile-time error
      disk.write('output.log', 'my_malware_payload'); //compile-time error
      safe_save('my_malware_payload'); //allowed
    }
  }

We're not treating our dependencies or even our inline code as a service here -- we're not loading the code into a separate process or forcing ourselves to go through a connector to call into the API. We're just defining a static constraint that will stop our program from compiling if the code tries to do something we don't want, it's no different than a type-check.

The difference between this and pure static analysis is that static analysis isn't something that's built into the language, and static analysis tries to guess intent. Static analysis says, "that looks shifty, let's alert someone." A language-level sandbox says, "I don't care about the intent, you have access to X and that's it."

Even in a dynamic language like JS, when people talk about stuff like the Realms proposal[0][1], they're talking about a system that's a lot closer to the above than they are about creating standalone services that would live in their own processes or threads.

This kind of style of thinking about security lends itself particularly well to functional languages and functional coding styles, but there's no reason it can't also work with more traditional class-based approaches as well -- you just have to be more careful about what you're passing around and what has access to what objects.

  class Dangerous () {
    unsafe_write (data) {
       //unvalidated disk access
    }
  }

  class Safe () {
    public Dangerous ref = new Dangerous();
    safe_write (data) {
       validate(data);
       ref.unsafe_write();
    }
  }

  function main () {
    Dangerous instance = new Dangerous();

    (instance){
      //I've just accidentally given my sandbox
      //access to `unsafe_write` because I left
      //a property public.
    }
  }

Even with that concern, worrying about my own references is still way, way easier than worrying about an entire, separate codebase that I can't control.

[0]: https://github.com/tc39/proposal-realms

[1]: https://gist.github.com/dherman/7568885


Ideally, though, we wouldn't all have to reinvent the wheel for the n+1th time. The great power of software is that something can be written once and used over and over again, unlike the way each building needs to be built from the ground up, each dinner has to be cooked from its ingredients every day, etc.

To give up this kind of modularity and reliance on other software engineers' work would be throwing the baby out with the bathwater.

Sure you need to apply judgement about whether a library seems legit, but the other end of the spectrum is the not-invented-here attitude, which is also bad.


The other question to ask yourself is if you want the dependency as a visible external thing, or do you want it cut & pasted into your code?

Just saying that "dependencies are bad" means people are more likely to cut and paste that algorithm or bit of code into their application rather than taking it from some sort of package. In that case you also do not know that it is a dependency, and you do not get any updates or bug fixes for it either.

Have to be careful about those unintended consequences there.


Dependencies are great. Critical infrastructure with little to no security engineering effort is the problem.


I agree with you: we need developers who take responsibility for their publications, review and test their codebase and all its dependencies, proper identification of "real" published code (integrity checks), and also the ability to opt in to placing trust in different maintainers.


The value-add for SELinux is that the security boundary is no longer the user. Prior to SELinux a process running as `bob` is allowed to access anything that Bob himself can access.

It at least pushes the boundary to I want to allow X program to access Y instead of I want to allow X user to access Y.

Using a user account per-app only really works elegantly on single user systems where there are a small number of apps.

bob-firefox, bob-vim, alice-evolution, alice-calculator,... would be a nightmare to maintain compared to being able to apply policy to the program itself.


> bob-firefox, bob-vim, alice-evolution, alice-calculator,... would be a nightmare to maintain compared to being able to apply policy to the program itself.

This is how Android works, btw. Each app has its own UID.

And the standard SELinux policies don't solve this problem, anyway - when there are a small number of apps, you can give them each a context, but maintaining one for hundreds or thousands of apps is a nightmare. For web serving itself, quoting from https://linux.die.net/man/8/httpd_selinux :

    The following process types are defined for httpd:

    httpd_cvs_script_t, httpd_rotatelogs_t, httpd_bugzilla_script_t, httpd_smokeping_cgi_script_t, httpd_nagios_script_t, httpd_dirsrvadmin_script_t, httpd_suexec_t, httpd_php_t, httpd_w3c_validator_script_t, httpd_user_script_t, httpd_awstats_script_t, httpd_apcupsd_cgi_script_t, httpd_nutups_cgi_script_t, httpd_munin_script_t, httpd_openshift_script_t, httpd_sys_script_t, httpd_dspam_script_t, httpd_prewikka_script_t, httpd_git_script_t, httpd_unconfined_script_t, httpd_t, httpd_helper_t, httpd_squid_script_t, httpd_cobbler_script_t, httpd_mediawiki_script_t

SELinux works elegantly on a very lightly configured system where there are a small number of apps and you got them all from the distro.


> SELinux has been able to solve the problem of "if a policy says X can't get to Y, prevent X from getting to Y" for years. Regular UNIX permissions have been doing the same for decades. (Yes, SELinux and regular UNIX permissions take a different approach / let you write the policy differently, but that's the problem they're fundamentally solving; given a clear description of who to deny access to, deny this access.) Neither SELinux nor UNIX permissions nor anything else has solved the problem of "Actually, in this circumstance I mean for X to get to Y, but in that circumstance I don't, and this is obvious to a human but there's no clear programmable distinction between the cases."

I don't think even your examples are obvious to everyone. For example, if paramiko is installed as a dependency of something else, it's not clear that you want to grant it access to your keys. Further, you might want to grant access to some but not all keys. There are many nuances that are unique to the particular use case which I don't think are obvious. However, that doesn't explain the inability of tools to allow us to describe these relationships.


Is it possible the OS should be designed so only the OS can read the actual private key? Then, at least at some level, apps don't ever need to see the key; they just need permission to authenticate.


As described in a sibling thread, while that contains the damage, that doesn't actually solve the problem - you're still giving malicious code permission to authenticate.


An alternative solution is just sandboxing of user processes ala containers. Something kind of like what Qubes does: https://www.qubes-os.org/

With a bit of work, the tech behind that could be made to work reasonably well in a general purpose linux distribution like say Fedora.


Shoving everything into namespaces doesn't solve everything because at the end of the day those 'containers' are still regular 'ole processes running in the root namespace under UIDs that are valid in the root namespace.


Qubes instances are native virtualized via hardware-assisted virtualization, so there shouldn't be any ability for instances to access root UID processes unless the hardware virtualization solution's security fails (Intel's cache/branch prediction attacks).


That's a problem with Linux not taking container security seriously. There are other, more secure, container implementations in different kernels. Containers are a good abstraction, but Linux does them poorly.


Without absolving us all of our responsibility to do better, to create systems that minimize the potential damage by bad actors, "I want to be able to download and execute random code off the internet and have it access my file system, my cameras, my network, my everything, but only in ways that don't hurt me", has always been kind of a pipe dream.

It's also kind of hilarious when the person downloading and executing random code off the internet isn't the naive user, the pointy haired boss, the stereotypical grandmother, the dumb tween, but the "sophisticated", computer-savvy, net-wise programmers.


If you are installing paramiko or ansible you know what you are doing. It should be the end user's conscious choice to give them access. You should not be prevented from giving that access, but that access also should not be granted without the end user knowing it.

You don't want to install "some library, from somewhere" and have it automatically get access to everything on your machine.

I also agree with all the people commenting that this is a solved problem in technical terms. If someone installs random stuff it is like crossing the street with your eyes closed: you might not get hit by a car, but the chances are much higher than if you take your time and look at what you are doing.


The assumption that all users know what all processes, tools, or facilities are doing, at all times, and at all instances, has proved false far too many times.

You could argue that the user should know what they're doing, but then, drivers shouldn't crash cars, and pilots shouldn't crash aircraft.

Numerous elements of this problem are simply hard, perhaps impossible to resolve. If the problem is what Neal Stephenson called metaphor shear in "In the Beginning Was the Command Line", then the fundamental problem isn't technical, but that people generally are operating under a false mental model of what computers are, what they can do, and how they behave.

Yes, "all models are false, some are useful". The utility of this one may be past its sell-by date.


That approach has two problems. First is that access isn't fine-grained enough - you often have to grant access to far more than you intend. Second is that there's no way to know why an app is asking for access, or to be certain what it's going to do with that access.


One nice thing about a virtualenv is that you get a copy of the Python interpreter in there, so it is in fact a separate executable running, for the purposes of hanging the policies off of.


Depending on the platform, virtualenv defaults to creating a symlink to the Python executable. You can override it with --copies, but then you have a new problem: updating the interpreter in all virtualenvs when a new Python release comes out.


On Ubuntu 16.04, the default behaviour is definitely to copy it, and there are no tricks with hardlinks or anything else:

    $ virtualenv foo
    Running virtualenv with interpreter /usr/bin/python2
    New python executable in /home/administrator/foo/bin/python2
    Also creating executable in /home/administrator/foo/bin/python
    Installing setuptools, pkg_resources, pip, wheel...done.

    $ ls -la foo/bin
    total 3464
    drwxrwxr-x 2 administrator administrator    4096 Dec  4 11:06 .
    drwxrwxr-x 7 administrator administrator    4096 Dec  4 11:06 ..
    -rw-rw-r-- 1 administrator administrator    2082 Dec  4 11:06 activate
    -rw-rw-r-- 1 administrator administrator    1024 Dec  4 11:06 activate.csh
    -rw-rw-r-- 1 administrator administrator    2222 Dec  4 11:06 activate.fish
    -rw-rw-r-- 1 administrator administrator    1137 Dec  4 11:06 activate_this.py
    -rwxrwxr-x 1 administrator administrator     252 Dec  4 11:06 easy_install
    -rwxrwxr-x 1 administrator administrator     252 Dec  4 11:06 easy_install-2.7
    -rwxrwxr-x 1 administrator administrator     239 Dec  4 11:06 pip
    -rwxrwxr-x 1 administrator administrator     239 Dec  4 11:06 pip2
    -rwxrwxr-x 1 administrator administrator     239 Dec  4 11:06 pip2.7
    lrwxrwxrwx 1 administrator administrator       7 Dec  4 11:06 python -> python2
    -rwxrwxr-x 1 administrator administrator 3492656 Dec  4 11:06 python2
    lrwxrwxrwx 1 administrator administrator       7 Dec  4 11:06 python2.7 -> python2
    -rwxrwxr-x 1 administrator administrator    2341 Dec  4 11:06 python-config
    -rwxrwxr-x 1 administrator administrator     230 Dec  4 11:06 wheel
This is with virtualenv 15.0.1.


Sorry for slow response, I don't check back here often enough.

You're right about virtualenv. I don't really use that anymore; the venv module added in 3.3 gets the job done. And that does default to symlinks for POSIX. https://github.com/python/cpython/blob/3.8/Lib/venv/__init__...
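For reference, a small illustration with the standard-library venv module (directory names are arbitrary):

    import venv

    # On POSIX, "python3 -m venv env" symlinks bin/python by default; passing
    # symlinks=False (or --copies on the command line) copies the interpreter
    # instead, giving each venv a distinct executable to hang policy off of,
    # at the cost of refreshing every venv when the interpreter is upgraded.
    venv.create("env-symlinked", symlinks=True, with_pip=True)  # bin/python -> system python
    venv.create("env-copied", symlinks=False, with_pip=True)    # bin/python is a real copy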


There's a fair amount of other solutions to this too.

Things today:

- Protective monitoring. If it's stealing your key, it's presumably sending it somewhere, so some alerting should be going off when clients are calling something external.

- Peer review. Maybe it's about time we question whether we should actually be running so many random modules without actually vendoring and reading the code.

- Subscribing to security lists, so the people who actually do the above stop you from using dirty modules.

- Yubikeys, etc.

Things in the future:

- New home directory model. I believe the systemd crew is looking into this, so I imagine it'll be hated, even if it does help solve similar issues.

- Newer distro models such as fedora silverblue where every process is isolated. I believe this currently mounts your home directory though, so maybe wouldn't work.

- New FS layout. Putting .ssh keys in something like .secret/ssh would likely make it easier to not mount secrets into isolated processes and help MACs too.


Monitoring isn't really a solution, it just lets you know you need to make new keys and gives you a point in time to figure out if they were used after being compromised


In the way OP suggests, it is; it's kind of like ZoneAlarm (I have no clue if that still exists; I haven't used Windows for a long time): it will block a process before it sends anything and ask the user for permission to send something to X.com. It will do that with every change of X.com. You can allow your process everything, of course, but it was convenient enough to block most everything evil while letting through only what I wanted, without giving my dev (or any other) processes permission to send to ..


You're right that monitoring doesn't stop your key from being stolen, but it might stop your key from being used, or stop the vulnerability from being used again.


Monitoring is not a 'solution', but it's quite effective. The real issue is that instrumentation is quite weak, so detecting this attack isn't trivial in most typical monitoring tools.

Monitoring is also something that requires expertise, and therefore is something only companies are really going to benefit from. Individual users are not equipped to do this.


That's kind of like saying modern healthcare and vaccines aren't really a solution since someone will invariably catch the flu every time a new virus hits, isn't it?


I like the idea to solve this on the software side. Be it by peer reviews or by some sort of social network, similar to GitHub, but for security reviews of software. Then you could decide to only install libraries that have a certain number of "good" points on the reviewer network, for example.


We're already crowd-sourcing this when you run things like npm audit, and with GitHub stars, Docker downloads, etc.

The problem is a tragedy of the commons, where we expect everyone else to do this for us and nobody does. This is why we need to rely on our own developers to actually do the reviewing, and why we have the monitoring to identify when that fails.

The more ways you attack the problem, the less likely you're going to be completely owned by a failure. Obviously sticking to a small number of generally trusted popular packages will likely make it easier to establish trust than using 1500 of them.

Obviously all this applies to other layers, such as Linux, your cloud provider, your hardware, your CI/CD, your secrets manager.


One of the reasons I've stuck with Fedora is the fact that it does have SELinux turned on by default, and takes active steps to make sure that the policies are up to date. It can occasionally suck when something gets borked and one of my programs starts throwing SELinux errors after an update, but I think the security trade-off is worth it.

It becomes much more manageable when you go all-in on SELinux from the outset, rather than trying to bolt it on after-the-fact.


SELinux cannot show a popup asking whether you allow program X to do Y, or whether you want to deceive it and provide fake data. So it is not very useful by itself.


I would add, SELinux policies distributed with each app, and iptables outbound rules using the owner module, as a partial mitigating control. One shortcoming of iptables / ipsets is the lack of DNS lookups that obey TTL, but that can be solved with some helper code. There may even be a xt module by now that does this, I have not checked.


If I were installing a custom kernel, would adding SELinux limit risk, or would a compromised kernel simply laugh at it?


This is currently only implemented in FreeBSD? No other distro uses this?


SELinux is part of Linux. FreeBSD has its own mandatory access control framework.


For this particular case: if you have GnuPG or SSH private keys, do not store them on-disk. Use a hardware token, such as a gnuk token, or a token with a secure element if you also want resistance against physical key exfiltration. A gnuk-based hardware token can be had in a nice format for under 30 Euro [1] or you could buy a STM32F103-based microcontroller for a few bucks and flash gnuk [2] if you like DIY. If you are a company, invest the 50 Euro in hardware tokens for your employees, one compromised SSH private key means that all the machines that use the public key are suspect.

> I don't know what the solution is, but it feels like this is a much bigger issue and we need some rethinking of how OSes work by default. Apple has taken some steps, it seems, in the last two macOS updates, where it blocks access to certain folders for lots of executables until the user specifically grants that permission. Unfortunately, for things like Python the permission is granted to the Terminal app, so once given, all programs running under the terminal inherit it.

I fully agree. It seems that the UNIX model is not very compatible with the macOS permission system, but one could imagine defining multiple types of shell sessions, each with its own set of permissions.

[1] https://www.nitrokey.com/ [2] https://salsa.debian.org/gnuk-team/gnuk/gnuk


yubikeys can store GPG keys and emulate SSH keys via gpg-agent. More exciting: OpenSSL recently added full U2F/Fido support, but it might take a bit until that lands in all distros https://www.undeadly.org/cgi?action=article;sid=201911150648...


> OpenSSL

I know what you meant to type but, for the benefit of everyone else here, he meant OpenSSH.

(Normally I wouldn't comment just to correct a typo as the intention is usually obvious but this is a bit different.)


You're obviously absolutely correct. Thanks for the correction.


> yubikeys can store GPG keys and emulate SSH keys via gpg-agent.

Indeed. Same for gnuk keys. I currently use a gnuk key (Nitrokey Start) with Ed25519 and have GPG set up as my SSH agent. Though I hope to switch to a YubiKey soon, so that I don't need a separate key for U2F/Fido2.


I'm really looking forward to the U2F support - it's built into openssh and allows multiple ssh-keys tied to a single yubikey. The gnupg stack has become more stable, but it's always been the problematic part of the setup.

The other advantage is that you can use the cheap U2F/Fido yubikeys without GPG/applet support.


Hardware tokens usually have three slots for keys. That's enough for a full set of signing, encryption and authentication keys. It is not wise to keep the primary certification key there, though. Hardware tokens can fail or be lost or stolen. Subkeys are expendable but the primary key is important enough for special treatment.

The best way to keep it offline is to make a paper backup with the paperkey tool and store it in a safe:

http://www.jabberwocky.com/software/paperkey/

In addition to this, it's a good idea to QR encode the key. QR codes are quickly and easily restored with a laptop camera and they support even 4096 bit RSA keys.
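For example, the encoding side can be a few lines of Python (a sketch assuming the third-party "qrcode" package; the paperkey invocation and file names are just placeholders):

    # Sketch of the encoding step, assuming "pip install qrcode[pil]" and a
    # text export from paperkey, e.g.:
    #   gpg --export-secret-key KEYID | paperkey --output secret.paperkey
    # A very large key may not fit in one code (qrcode raises
    # DataOverflowError) and would need to be split across several codes.
    import qrcode

    with open("secret.paperkey") as f:
        data = f.read()

    img = qrcode.make(data)           # builds a QR code sized to the payload
    img.save("secret-paperkey.png")   # print this alongside the paper copy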

The zbarcam program can be used for the decoding side, but current versions have binary data decoding problems and aren't easy to interface with. I sent some patches that fix these problems, but they haven't been reviewed yet:

https://github.com/mchehab/zbar/pull/64

https://github.com/mchehab/zbar/pull/60


I never understood the point of hardware tokens. If the machine it's plugged into is compromised, the attacker can still use it to sign whatever they want. If the attacker has persistence on your machine, it's as good as stealing the key file. You're basically counting on discovering the infection before the attacker can use it, or on the attacker being unsophisticated so that all they do is copy your .asc files (although you could also protect against this by putting your key files in a weird location). Depending on how proficient the attacker is and how vigilant you are, it could be a while before it's discovered. When you do discover it, you still have to revoke/rotate your keys since you have no idea what they could have been used for. So you end up not saving any effort at all.


Yubikeys for example can require a touch on the token to activate it. A dedicated attacker could still trick you into activating the token by waiting for you to do something where you'd need to activate it, but it significantly raises the bar.

They also entirely prevent various compromises:

* Theft or loss of the laptop does not mean loss of token.

* Even if the token gets lost, cracking it will be hard. A dedicated attacker might be able to do it, but a 6 digit pin with 3 tries is hard to guess. Decapping and convincing a yubikey to reveal the secret key is likely possible, but nontrivial.

* Attacks where the attacker can read files do not turn into a compromise of key material.

* Even a compromise that allows code execution would require a sophisticated attacker to pivot: you'd need to figure out how to make good use of the access gained either in a fully automated fashion or be online when the victim has their token plugged in. You cannot collect the key and later figure out what to do. This pretty much rules out attacks such as the one we're discussing here.

So while they do not protect against a full persistent compromise, there are quite a few cases that they do protect against.


>* Theft or loss of the laptop does not mean loss of token.

Doesn't seem relevant when you probably have FDE enabled.

>* Even if the token gets lost, cracking it will be hard. A dedicated attacker might be able to do it, but a 6 digit pin with 3 tries is hard to guess. Decapping and convincing a yubikey to reveal the secret key is likely possible, but nontrivial.

To be fair, if you weren't using a token and were just storing the password-protected keyfile on your FDE-protected computer, there's nothing to "lose" either.

>* Attacks where the attacker can read files do not turn into a compromise of key material.

Only if the keyfile isn't password protected.

>* Even a compromise that allows code execution would require a sophisticated attacker to pivot: you'd need to figure out how to make good use of the access gained either in a fully automated fashion or be online when the victim has their token plugged in. You cannot collect the key and later figure out what to do. This pretty much rules out attacks such as the one we're discussing here.

Keyfiles are already harder to monetize than other information you can steal off a computer. There's no market for id_rsa/.asc files, but there is for credit card numbers, personal info, and bank/email logins. You have to put in the legwork to make money off them (e.g. logging into each server and checking what's on it or whether it can be used to pivot elsewhere, seeing who your contacts are to see whether they can be duped using a signed email, or checking whether you're a maintainer for a software project and using your key to sign a malicious update). Therefore, it's safe to assume that attackers interested in your key files are also sophisticated enough to perform the pivot.


> > * Theft or loss of the laptop does not mean loss of token.
> Doesn't seem relevant when you probably have FDE enabled.

You still need to regard the key as compromised - it's no longer under your control and you have no idea what a potential attacker would try. Most FDE does not lock the disk when the computer goes to sleep, so the attacker can now try to break in via Firewire, ...

While with a physical token, as long as the token is in your possession, the key is entirely under your control (unless you have a backup on your computer, which kind of goes against the idea)

> Therefore, it's safe to assume that attackers interested in your key files are also sophisticated enough to perform the pivot.

Like they didn't even attempt in this case? This seems to be targeting Python developers. Now, with a Python developer's SSH keyfile and GPG keyfile (if I manage to unlock it), I could do quite a bit of damage. For example, try it on GitHub: it's trivial to associate an SSH key with a GitHub account - the info is public.


>Most FDE does not lock the disk when the computer goes to sleep, so the attacker can now try to break in via Firewire, ...

Fair point, but this is a very untypical threat model. Basically it protects you against targeted physical attacks. Targeted, because your average laptop thief isn't going to be pulling off DMA attacks. I certainly have not heard of it occurring (targeted or untargeted) in the wild.

>This seems to be targeting Python developers. Now, with a Python developer's SSH keyfile and GPG keyfile (if I manage to unlock it),

There lies the problem. If you used a reasonably secure password (ideally from a password manager), your keys would be as secure as they would be stored on a token. This wasn't an attack that only tokens could mitigate. A free password manager would do just as well.


Yubikey owner. You have to physically insert the USB key and then interact with it (touch it) to do signing/auth/etc. If you hijacked the signing touch step, you'd notice that your signature wasn't generated by the touch and would know you're compromised. The key isn't plugged into the computer except when you're using it, and if you're in Qubes you have VM isolation for the USB ports. It narrows down the attack surface a lot, and makes anyone not using one a more attractive target.


>Yubikey owner. You have to physically insert the USB key and then interact with it (touch it) to do signing/auth/etc. If you hijacked the signing touch step, you'd notice that your signature wasn't generated by the touch and would know you're compromised.

as mentioned by a sibling comment this can be worked around by social engineering. some ideas:

* simulating software/hardware/connection error, forcing the victim to retry. bonus points if you only start doing it after the victim installs an update to gpg

* in cases where you know the signature doesn't have to be valid, substitute a legitimate signing request with your payload and return a fake signature for the legitimate request. For instance, you coax the victim into sending a signed GPG email. You know that nobody will check the signature except for you, so you detect that case, use the opportunity to sign your payload, and return a fake signature to the email program.

* my favorite: causing gpg to fail (thereby forcing the victim to retry) by injecting typos into his terminal when he's invoking gpg from the terminal


I use Yubikeys for signing multiple times a day, yet I must still enter my PIN every time (and the same goes for my "real" smartcards as well).


I think a hardware chip would solve ‘crown jewels’ theft (eg I get rekt but at least my SSH key isn’t stolen for good), but there isn’t much you can’t do to my host if you have root, persistence and patience.

Yubikeys are silly and designed to sell rather than solve a problem effectively, IMO.


As much as I would like very much to have an open source hardware token, there’s a major downside to the nitrokey start and other STM32F103 gnuk devices. The microcontroller itself is not hardened at all in hardware. It would not be remotely infeasible to extract secrets from its hardware using relatively available tools. This is in contrast to a true smart card or a yubikey, which have tamper evidence.


For a long time, the security model of OSes has been, "only trusted code should run."

That model doesn't work. Code affects too much of our life and is used in too many scenarios for a binary "trusted" boolean to be feasible for most people.

Offices and houses have locks inside of them as well as outside. If I invite someone into my office, they don't immediately have the key to my server room. But (while I agree that software bloat is a problem) you'll still find plenty of people after these issues who argue that having too many packages is the real security issue, and really all this comes down to is vetting our repositories better, or forcing everything to be signed, or whatever.

In reality, the long-term solution is that we have to start taking native sandboxing seriously -- embracing efforts like Wayland, Flatpak, SE Linux, JS Realms, and turning on secure sandboxing systems by default. The problem isn't NPM or PyPi, it's Node/Python sandboxing. The problem isn't allowing arbitrary browser extensions, it's per-domain extension sandboxing. The problem isn't third-party scripts, it's browser fingerprinting.

It is impossible to scale a trustworthy repository to the size of NPM, or PyPi, or the Apple store, or AUR, or the Chrome web store. There is not, and is never going to be, a trustworthy repository of that scale.

In the meantime, because most platforms don't have serious sandboxing controls turned on by default, you just have to reduce your dependencies and install less software. But that band-aid fix will get less and less useful over time, because more and more of your life will depend on software from more and more diverse sources, and it will be impossible for you to vet everything. Being conservative about dependencies and code you run is an extremely temporary fix that is not going to work in the future. But people treat it like it's the obvious solution and like we don't have any need to address the fact that most consumer-grade OSes, platforms, and runtimes are simply crap at security.


When I first realized that any and all code that I execute, has read/write permissions to most of my filesystem, it blew my mind. The OS grants every process its own unique virtual-memory-space, specifically to prevent malicious/accidental interference with other processes. It seems like the file-system really should operate on a similar principle as well. Every application should run in a sandboxed environment by default, with exceptions being granted by the user for specific applications that actually do need access to the entire file system.


This is already possible in Linux with mount namespaces, and used by (for example) systemd to block access to /home by services if so configured by the user.


I wonder if there is a Linux distro out there that works like the OP wanted out-of-the-box, with userland processes sandboxed by default, and providing a slick interface to grant access to areas of the filesystem when wanted?

There are so many distros with little differentiation - I'd think something like this would be quite unique (unless it already exists, and I don't know it!)


Do you mean something like Fedora silverblue?



> Unfortunately for things like python the permission is granted to the Terminal app so once given, all programs running under the terminal inherit the permissions.

Would Python permissions even be enough though? All it takes is one legitimate Python application wanting your Photos (let's say some Python photo manager app you wrote) and now all Python libraries get access.

Unfortunately, I think Apple's direction may be correct. It feels hugely inconvenient, but an end goal of all processes being signed and explicitly allowed certain things seems useful.

At the very least, Python should perhaps never get access to anything beyond the dev folder. Then any real use of Python applications would have to be properly baked out into processes that the OS can manage permissions for. This goes for all languages; Python is just an example here.

This is all off the top of my head, so I could be way off base. But it seems logical to me in the moment.


> Unfortunately, I think Apple's direction may be correct. It feels hugely inconvenient, but an end goal of all processes being signed and explicitly allowed certain things seems useful.

Apple is operating a racket. There's no need for signing (and developers buying expensive certificates), for example; you could instead have a dedicated "permissions agent" checking executable hashes against an online service and granting them as many permissions as desired. So users could (1) pick and choose their "permissions providers" (not necessarily Apple - could also be e.g. GNUpermissions or WikiPermissions or whatever), (2) users could modify permissions (e.g. the provider defaults to web access but I want to deny it for this specific executable), and (3) all (even unsigned) programs could run, but with minimal permissions by default (i.e. sandboxed).


Wasn't this the promise of containers? "python" should have access to the whole system that it is run on, but instead of running random python scripts you download from the internet on your base system that has all your personal data in it, you run it in a container that only has the specific files that the script needs access to.


The no setuid shell scripts rule on Linux was an early attempt to deal with this.

https://unix.stackexchange.com/questions/364/allow-setuid-on...


Well it does get worse. Lots of apps/programs run in python. I want `python goodprogram.py` to have access to certain things but not `python badprogram.py`


Yep, and Apple, Google and Microsoft aren't helping. Particularly bad these days is the idea of granting apps permissions; for example, if I want to deploy a new Google Drive application for some whiz-bang thing, I almost certainly need to "Grant read and write for all documents." Well, I don't want to do that, so I don't use any Google Drive add-ons ever. Is there a way to restrict privileges by folder? Nope! Same is true for Microsoft Office 365. Apple is trying, but still a long way to go.


> Is there a way to restrict privileges by folder? Nope!

Playing devil's advocate... I'm sure most folks here appreciate that providing a granular level of permissions/restrictions can often lead to a less secure environment!

Users (of pretty much every skill level) quickly become overwhelmed and give up.

I recall advocating that a large client group manage their own permissions in Sharepoint. It was a painful and futile exercise! Though really not much different than IT admins trying to manage the security of a large organization using Active Directory.


Yes, it's important to have great tools for helping users with this. Just like how Google Drive goes to great lengths to show which folders are shared and give the user good access to those policies.

All I want is for an application to be subject to policies just like other users are. Is that so hard? And honestly, to provide all this tooling for being careful about who you share files with, while providing none of it for being careful about which applications I share data with, is a little bit disingenuous about the relationship between people and applications. People write and operate applications, so when I am sharing my data with applications, I am sharing it with people too. I just don't get to find out who, or have any specific control over that.


You could probably do something with access control lists and setuid/setgid. Take a set of programs that all should have the same set of accessible folders and the same set of inaccessible folders, and make them setuid to the same UID, and then add the appropriate entries for that UID to the access control lists for all those folders.

The usual interfaces for manipulating ACLs are not very friendly so you'd want to have some kind of tool to set this thing up. Managing the ACLs, and managing the program grouping UIDs, and making it so things don't get regularly broken by package managers that don't know about this system is probably going to be challenging.

What I'd like to see is someone make an access control system for Unix based off of the ideas from the old DEC PDP-10 operating system TOPS-10 file daemon system. TOPS-10 originally just had owner/group based access. Later, they added the file daemon access control system. The way that worked is simple:

1. If an access was allowed by the owner/group system, it went through normally.

2. If an access was NOT allowed by the owner/group system, and the new "use file daemon" flag had not been set by the caller, the access was denied.

3. If, on the other hand, the "use file daemon" flag was set, the OS sent a message to the file daemon describing the desired access, and asking the file daemon if it should be allowed or not. Whether the OS allowed the access or not was determined by the file daemon.

In what follows I'm probably getting some details wrong, but the general idea is right.

The way file daemon made its decision was by consulting a rules file containing access rules. The rules file was named ACCESS.USR. I don't remember if file daemon looked for that in the same directory as the file it was being asked about, in the home directory of the owner of the file, or what.

The key idea is that the access rules for a file were in ACCESS.USR, NOT in metadata of the file itself. ACCESS.USR allowed for wildcards, so one rule in there could specify the access rights for a whole class of files--including files that did not yet exist.

My recollection is that the rule matching could be based on the name of the file someone is trying to access (full or partial, with wildcards), the user and group that is trying for access (I believe wildcards were allowed here, too), and what program was being used.

So let's say you wrote a game, and it wanted to keep a high score file. You could make an entry in ACCESS.USR that said anyone was allowed write access to SCORES.TXT if they were running EMPIRE.EXE. (Picking EMPIRE for my example in the hopes of summoning Walter Bright, since he probably remembers more details than I do about file daemon). Then just make sure to set the permissions on SCORES.TXT so that access by anyone other than yourself will fail, and make sure that EMPIRE.EXE sets the "use file daemon" flag when it tries to write the score file.

This fits in well with the way users think about things. For example, let's say I've got public and private key files. My public key files have .PUB extensions and my private key files have .KEY extensions. It is easy to put a rule in ACCESS.USR that denies all others access to my .KEY files. If I do something that creates a new .KEY file, I don't have to make sure whatever tool created it set the ACLs...I just have to make sure I use a .KEY extension.

It's a lot easier to develop and maintain a naming convention that reflects my security requirements and stick to that than it is to manage per file ACLs that reflect my requirements.
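
Purely to make the idea concrete (this is not how TOPS-10 implemented it, just a toy Python sketch with made-up rule entries): access is decided by wildcard patterns on the file name, the requesting user and the requesting program, rather than by per-file metadata.

    import fnmatch

    # (file pattern, program pattern, user pattern, allowed modes)
    RULES = [
        ("*.KEY",      "*",          "me", "rw"),   # only I may touch private keys
        ("*.KEY",      "*",          "*",  ""),     # everyone else: nothing
        ("SCORES.TXT", "EMPIRE.EXE", "*",  "rw"),   # the game may update the score file
    ]

    def allowed(path, program, user, mode):
        """Return True if the first matching rule grants the requested mode."""
        for fpat, ppat, upat, modes in RULES:
            if (fnmatch.fnmatch(path, fpat)
                    and fnmatch.fnmatch(program, ppat)
                    and fnmatch.fnmatch(user, upat)):
                return mode in modes
        return False  # no rule matched: deny

    print(allowed("MASTER.KEY", "PIP.EXE", "guest", "r"))     # False
    print(allowed("SCORES.TXT", "EMPIRE.EXE", "guest", "w"))  # True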


Qubes OS already exists! It's been perfectly usable for more than a decade now. It can't be recommended enough: https://www.qubes-os.org/

Sadly "modern security awareness" doesn't seem to really amount to anything, and existing solutions go unused. %90 of this "malware library" problem too would have been avoided if package repositories just required all packages to be signed with keys on hardware dongles. Ruby and python and some others at least have the excuse of inertia, but it's pretty cynical of everything else to not require signing already, when the added friction is nothing relative to the effort of writing software worth publishing.


> 90% of this "malware library" problem too would have been avoided if package repositories just required all packages to be signed with keys on hardware dongles.

I'm with you about requiring signatures, and you can get around the FUD about packages getting abandoned because of developers losing keys by implementing something like TUF[1] (because of delegations in the targets role), but I don't really see how you can enforce dongle usage. That is, how can the repository administrators tell the difference between a signature from a key on a hardware dongle and a signature from a key on somebody's windows laptop? You'd need an IRL auditing process, which just isn't feasible for most open source packages.

[1] https://theupdateframework.github.io/


I don't have any particular definitive solution in mind. Attestation is possible (e.g. https://developers.yubico.com/PGP/Attestation.html), but just telling people to do it right should go pretty far, especially if the packaging software doesn't cater to circumventing the policy.


I don't think that's a good solution.

A) it adds cost to what was previously free for many people

B) it does nothing to deter bad actors

C) many software packages would be abandoned because their dongle was lost or destroyed


Why? If I sign a package, that's proof I endorse the contents... but how do you know I'm not a malware author?


You can add Android to the list: any app which is given storage access can access any of your files there. So if you download sensitive.pdf from email and it sits in Downloads, theoretically any of the dozen apps with storage access permission can siphon it off to their servers.

Android Q has 'scoped storage access' to prevent this, plus individual folder access permissions; but it is optional and will be enforced only in the next major Android version, which by Android update standards will take another 5-6 years to get widely adopted.


I suspect the solution is a combination of selinux (apparmor?) and curated code repositories. I don't think sandboxing alone is a solution as it doesn't alleviate concerns with zero days.


Maybe the future is in fully sandboxed/containerized apps, where each action outside of sandbox/container is confirmed by the user.


That is a super-broken future. I mean, that's working... sort of OK for cell-phone-style use, where the vast amount of use is single-service, casual human-computer interaction.

It is horrible for any reasonable professional workflow. Even if you get permissions lined up correctly once, when you change your workflow, shit breaks. And more subtly, because permissions accrete through reactive interactive dialogs, folks become less secure over time - past permission "paths" enable capabilities that the user no longer even remembers granting.

More formal ways of defining security policy of course work better (I deal with that sort of thing for a living), but expecting normal people to be able to understand how to, e.g., use `restorecon` or the moral equivalent is a nonstarter.

Personally, I'm headed back to Linux when I need to replace the Macbook I'm typing on. I'm no longer the target MacOS user, and I'm not going to rely on a machine I don't control.


Alert fatigue -> user accepts everything


Is there a better solution though? I agree that's a super real thing, but having no security at all (as is the case now, basically) is infinitely worse.

We would need to change some things, but I'd love to know if some random lib I'm using is trying to do things my application shouldn't. Or if something I installed, say Docker, is trying to access my photos or ssh keys.


That was my experience of SELinux.


See Windows Vista for reference.


This was my first thought too!

It was also my first thought after I recently upgraded MacOS, and was bombarded with multiple permission dialogs whenever I did anything.


True, but this just makes the point more. Because after that was 7, with lessons learned. Gotta start somewhere


When Docker was first introduced this is exactly the niche I thought it was going to fill. After I got elbows-deep into it I saw that wasn't the case.

I still think something like it could evolve to fill that role.


On FreeBSD there have been jails since the year 2000.


Yeah I think about that each time my boss talks about how new and great docker and kubernetes are. It was old hat on FreeBSD 4.0 mate...


Jails, yes; k8s, no.

But I do have to say, the more Canonical and Redhat attempt to "innovate", the more I appreciate FreeBSD.


OpenBSD has a really novel concept of using unveil and pledge to contain processes into restricted-service operating modes. And being OpenBSD, these security features are actually turned on by default for many binaries. See discussion in https://news.ycombinator.com/item?id=17277067


macOS has this already.


Does it? On OpenBSD, Chrome cannot read my ~/.ssh directory, can the same guarantee be made on Mac?


Yes, you can write custom sandbox profiles and apply them to yourself using sandbox_init(3).


Good to know, but I don't need to write anything on OpenBSD. The protection is just enabled by default. That is more comforting to me as a user.


Certain programs have default protections applied to them, such as most App Store apps.



You know you should not run as root and you should not install all kinds of crap on your computer.

Every time you walk on the street you have to trust 1000s of strangers they are not going to murder you or rob you. You also know you should not go into dark alleys alone at night.

This is just how life works.


The first step is to reject the fashionable "wild-west" style of package repositories and realize that maintained repositories came about for a reason. Then you can worry about rearchitecting your OS for sandboxes up the wazoo.


Some "easy" tools:

1. Ban libraries that have names that might be confused with others (so no I where an l should be, etc).

2. Only use libraries that have been around for over 1 year (you're relying on the community to debug them).

I think with these two alone you might get rid of most issues. The problem is that someone might buy an old, trusted name and then inject malicious code. I don't know of a technical way to police changes of ownership that would be useful and not too much work. Alternatively a nefarious player might release useful code, bide their time to build a reputation, and then inject malicious code.

Short Life VMs are a good idea but sometimes you want them all to talk to each other.


The problem is forks, though. Anybody can create a copy, and the problem is that e.g. on GitHub there are tons of library clones that aren't even marked as forks, because they have been cloned and then imported as a new library for whatever reason by the new maintainer. I've often had substantial trouble finding the original library or the most well-maintained fork.

Large repos like Github should do an automated similarity search and prominently display potential older (as in creation date) versions of the same library even if they are not forked from the original repo directly.


> Ban libraries that have names that might be confused with others (so no I where an l should be, etc).

I agree that this is some obvious low-hanging fruit. Given that the onus is currently on devs, an actionable solution today is to use a font which makes i, I, l, and 1 easily distinguishable (obviously not fool-proof), or to bake a check for malevolently named dependencies into your linter or a plugin.
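
A rough sketch of that kind of check, using only the standard library; the POPULAR set and the homoglyph table below are placeholders (a real check would pull the top PyPI package names), so treat it as an illustration rather than a finished linter:

    import difflib

    POPULAR = {"jellyfish", "python-dateutil", "requests", "numpy"}  # placeholder list
    HOMOGLYPHS = str.maketrans({"I": "l", "1": "l", "0": "o"})       # deliberately tiny map

    def lookalike(name, threshold=0.85):
        """Return the popular package this name imitates, or None."""
        if name.lower() in POPULAR:
            return None                      # it really is the well-known package
        normalized = name.translate(HOMOGLYPHS).lower()
        if normalized in POPULAR:
            return normalized                # homoglyph squat, e.g. jeIlyfish
        hits = difflib.get_close_matches(normalized, sorted(POPULAR), n=1, cutoff=threshold)
        return hits[0] if hits else None

    with open("requirements.txt") as f:      # naive parsing, good enough for a sketch
        for line in f:
            dep = line.split("==")[0].strip()
            target = lookalike(dep) if dep else None
            if target:
                print(f"WARNING: '{dep}' looks a lot like '{target}'")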


Just have package managers run as a separate user in a separate namespace? I don't get why we let packages do crazy shit on our system as a fully privileged user.

Look to browsers, imo. 'Extensions', which are very analogous to packages, have to package a manifest, users must explicitly ack the permissions, and they have restricted access to the system.


Most packaging systems are designed with the expectation that they will be installing arbitrary things that can interact with one another in arbitrary ways that should not have to be defined in advance for a given package.

The result has been a very hands-off approach. Package management systems for user-land libraries, like pip and so on, are extra guilty of being hands-off. And who wants their dev tools telling them what they can and can't do?


Yeah I think it's totally insane. If your package requires a system dependency, like a binary, it should simply fail to install and let you know that you should go get that dependency.

It should not, instead, require root privileges to add that to your system. Or, we should be providing virtualization like containers so that it can do 'root-y things' safely.


I've been having a happy time putting together a BuildStream project (https://buildstream.build/). It's meant for integrating packages into a larger system, for instance Flatpak runtimes and Linux system images, but it happens to do a really good job of this because every component is built in its own sandbox - with zero internet access, because they're going for reproducible builds. (Go to a commit from two years ago, build it, and you should get the same output you did back then).

This means you have to obsessively specify dependencies, including exactly which files to fetch from the internet and which system components are required. And it also means some lovely security wins. For instance, if everyone lost their minds and decided that we should wrap all of our Python packages with BuildStream metadata and build everything that way, a build-time attack would be extremely difficult :)

Of course, said library could do whatever evil it wishes for end users, but I guess the interesting lesson here is that the steps required for sandboxing can have genuine benefits outside of security, as well.


It sounds like you're describing snap packages. Is that accurate?

That mostly works for whole applications. I'm not sure how to apply it to libraries for developers though. Especially ones that would get installed in userspace, like these python packages.


I'm not super familiar with snap, my weak understanding is that it makes for a very poor isolation boundary.


Yup, in large part because users expect things to work across the boundary.


Yeah, but not just the package manager. As a developer or build machine you are just the first potential target, and potentially not even the most attractive one. If you've got a dependency on a malicious package that gets embedded into your code, there are near-limitless potential downstream harms waiting after the code is deployed.


There's a few different threat models, and different mitigations.

We have attackers who:

a) Want to run code on package installation

b) Want to run code on application execution

Restricting packages on installation helps with (a)

I disagree that developers are not attractive targets. If I were to target a typical tech company I would absolutely go for a developer. I'd wait for them to SSH to production, hijack their connection, and start moving around. That's a hell of a lot better than if I attack HR and have to start moving laterally in a corporate environment to escalate.

As for backdooring production services or applications, a la (b), it's a more complex and bespoke problem to solve, I think. Package managers themselves are not suited to solve this problem; it is up to the OS to restrict programs more by default, like mobile OSes and browsers do.


This sounds kind of like what Microsoft did with UAC in Windows Vista. As I recall, it was a terrible experience and everybody hated it. Seems like the trouble is that the huge installed base of applications have no accommodations for anything like that, which makes it awful to try to impose stricter permission requirements now.

Even imagining how is kind of tough. How do you let some Python scripts, but not others, access certain directories, like the ones with your SSH keys? It would have to be built into Python, which may mean major changes to tons of packages. Ditto Ruby, NodeJS, Perl, PHP, and every other interpreted language out there. And how do you develop compiled applications in such an env? Suppose every new build would have to be signed with the expected permissions. But how would you let one internal package access a dir, but not some other one? More internal permission systems I guess?


I absolutely agree with you. An OS should consist of many read-only folders like /kernel, /etc, /bin that can only be changed by an update process (which requires a reboot), plus folders where you grant access to specific applications, e.g. /bin/ssh gets read-only access to .ssh/id_rsa. A Python process should not have read access to this folder by default at all. Another option is to move all credential access into a service that applications reach over a port, using tokens with different privileges. Current operating system implementations are pretty horrid, with security retrofitted all over. I think we are ready for a new OS era.


Why is it that we accept that any piece of code can just randomly reach into the file system and connect to any server?

Why isn't the file system an object you pass around instead?


The phrase you're looking for to poke into a search engine is "capability-based security": https://en.wikipedia.org/wiki/Capability-based_security

It's kind of a long story, which I'm not intimately familiar with, but it seems to boil down to: it's more effort than we're willing to spend on rebooting our entire computing infrastructure. (Lots of things that could improve computing have that problem. "Rebooting our entire computing infrastructure" is a boil-the-ocean problem now.)


Getting it into a mainstream OS is a huge undertaking - but doing it within the confines of a new programming language is straightforward. Haskell mostly does it, at least with the Safe Haskell implementation.

In an object oriented setting, it's simply a matter of not exposing the filesystem (etc.) as globals, but instead as an object that is passed around explicitly, starting from main(). There are a few upcoming languages that do this.
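
A minimal Python sketch of that shape (the names are invented, and nothing here is enforced by the language itself - a real capability system would make the ambient open() unavailable to library code):

    import os

    class ScopedFS:
        """A handle that can only open files below one directory."""
        def __init__(self, root):
            self._root = os.path.realpath(root)

        def _resolve(self, relpath):
            full = os.path.realpath(os.path.join(self._root, relpath))
            if not full.startswith(self._root + os.sep):
                raise PermissionError(f"{relpath} escapes {self._root}")
            return full

        def open(self, relpath, mode="r"):
            return open(self._resolve(relpath), mode)

    def parse_dates(fs):                      # a "library" that never sees ~/.ssh
        with fs.open("dates.txt") as f:
            return f.read().splitlines()

    if __name__ == "__main__":
        project_fs = ScopedFS("./data")       # main() decides what authority to hand out
        print(parse_dates(project_fs))        # fs.open("../../.ssh/id_rsa") would raise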


I'm not sure to what extent "rebooting our entire computing infrastructure" is necessary. Current OS efforts with capability security include seL4, Genode, and Google's Fuchsia, but Cambridge's Capsicum "hybrid" system adds capability primitives to the POSIX API. (Their CHERI capability hardware recently made the news with funding.)


As with many other concepts, you can create your own virtual machine or whathaveyou (in the sense of "C virtual machine" or "JVM", not a full stand-alone VM necessarily) that can implement whatever you like, but every time you have to reach outside of that, you end up back in the old world, and that limits your ability to truly enforce whatever new security thing you want to enforce. e.g., you can wrap all access to external executables in your code all you like, but once you allow access to bash or Python or something, your new capabilities-based security system is now gone and you're back in the world of Unix.

In a true top-to-bottom capabilities based system, that would not be true. You could hand out access to a shell along with a certain set of capabilities and be much less concerned about what will happen in that shell. Hypothetically, implemented correctly, with CPU & RAM-based capabilities, you could safely hand out shell access to random people on the internet and be sure they won't do anything wrong. That is not the case today, if you just have your own little VM world.

If you try to extend your VM, you find yourself re-implementing more and more of the world. Creating even a simple GUI, for instance, is a huuuuuge undertaking for a very small group of interested people. The baseline of expected functionality in a new language environment is going up every year. It's hard to get over the initial hump.

On the flip side, if you do it from the OS level, and you truly implement this new model and don't have a generic "give up and just go back to UNIX/Windows/etc" callout, you have the problem that while you have an OS, you can't do anything in it, no matter how much better it may theoretically be. You can't get anyone to work in it, because there's no value there. See Plan 9 for an example of how this goes down.

Getting something like a capabilities-based OS going is basically "rebooting the computing world" because it's not just a matter of getting "an OS", it's a matter of getting an OS, a windowing environment with some usable GUI, a browser capable of browsing the "real" web and not just some 1995-esque subset, some kind of terminal, a whackload of libraries for programmers to use, a whole bunch of apps I'm not even thinking of here, and more before anyone will even give you the time of day in the real world, and without making it out to the "real world" all your security is pointless. It's very difficult to create a business plan where that makes any sort of sense; no matter how glorious the payoff may be in 20-30 years, simple time-based discounting of value makes it very difficult to be rational to commit the vast amounts of effort it would take today to start getting there. Those research projects are great, and I value them, and wouldn't ask them to stop, but they are far, far less than 1% of what would be required to make them truly useful. Nobody has crossed that point in the last 20 years, even without trying to bring a novel security mechanism to everybody. The closest thing to a new OS we've gotten are the mobile OSes which are still fundamentally just competently-managed UNIX systems under the hood. What capabilities appear to be there are still just bashed on top of the UNIX permissions model, not the true, granular capabilities of those research projects.


I agree with this as well. I feel like the OS should require apps to specify which domains they'll connect to, and they should be allowed to connect only to those domains. Yes, you should be able to specify all domains, like if you're making Firefox, but otherwise it should be looked on with huge suspicion to list more than a couple. As it is, any app/install script/build can scan my local network, find out my SSIDs, tell what devices are connected to my LAN, which ports are open, what software is on some of those ports, and of course exploit any known vulnerabilities.

I have a guest WiFi partly for that reason so when guest come over their apps hopefully can't hack stuff on my non-guest network. Unfortunately every new game/app/dep I install on my phone/pc/mac/tv/appletv/ps4/switch can still do that and while I might trust the app devs I don't trust the library devs, especially the analytics libs compiled into every game.


Flatpak offers sandboxing, but I think it's mostly meant for applications, not for development tools (where you often need to access your keys anyways).


Windows has had locks for specific things for many years. It also created a new model with sandboxed runtimes for Windows Store apps, but those are not very popular.

Yarn, a JS package manager, tried to forbid any code execution during installs. That may be a good idea. At least show a warning or require an extra flag before executing anything.


In the old days, I used ZoneAlarm to alert me to / block, by default, any app that was trying to connect to any network.

Does anyone know similar solution for today's win10, Linux, Android environment?

Win10's firewall is OK for blocking when configured manually. But it lacks "alert" functionality for new app/connection requests.


> Microsoft has started adding short life VMs. No idea if that's good.

The lightweight sandbox VMs seem to be pretty useful so far. It's still so new that I've not seen it used outside of blog posts yet, but that's the nature of cutting edge features only "just" released. (It was released with 1903 in May, which was delayed several months and a lot of businesses still are hesitant to install it.)

Windows 10 has had Controlled Folder Access (aka Ransomware Protection) available as an option for a while as well, which acts like Apple's system of forcing apps to need additional permissions to user directories. It's not quite as fine-grained as Apple's yet, and it is certainly not on by default.

(I turned it on for interesting paranoia reasons on my gaming desktop, and it's been fascinating to watch what gets blocked. Though my paranoia led me to adding Steam folders to the Controlled Folders list, which has especially made it a permissions whack-a-mole because Game Developers are children and games access random folders all the time, use five EXEs where one probably should have been enough, use random EXEs in TEMP folders to call EXEs in Steam folders, games not installed by Steam sometimes try to access Steam folders, NVidia wants to touch everything, etc.)

> Both MS and Apple offer their App stores with more locked down experiences though I'm sad they conflate app security and app markets.

Windows 10 has had sideloading on by default for years now, and supports Win32 apps inside APPX/MSIX. Win32 apps by default aren't nearly as sandboxed sadly (because it is hard to guarantee Win32 apps work as expected when sandboxed), but some sandboxing is better than none. The biggest issue seems a lot less to do with "app market" (you don't have to publish to the Microsoft Store, and MSIX even supports auto-updating for sideloaded apps vaguely "ClickOnce-style") and a lot more to do with "please rebuild your installer to something more modern", which is a hurdle a lot of developers don't want to overcome. (Because installers are terrible and once one is built who wants to rebuild it. Because MSIX support for Windows 7 is still "Beta" with the April end-of-support for Windows 7 looming over everything, and some Enterprises clutching their Windows 7 purses as the new XP...) Who wants extra app security when it means a lot of development effort to repackage your app? (Even if it can be semi-automated from your existing installer.)


SandFS could be used for lightweight custom sandboxing of your $HOME dir. It uses eBPF to let you insert custom checks. Check https://lwn.net/Articles/803890/


We need library level isolation, not just app level isolation (which desktop OSes don't even bother with).

Fortunately there's some amazing work on supporting library isolation in WASM. There was a really good blog post about it recently.


When I see any software updating its dependencies, or an app store, I think the security battle is lost. It's very easy for any government intelligence agency to seize an abandoned piece of software and start using it to distribute spyware.



Running everything in containers seems to make sense.


Use SELinux


This is spot-on. In a perfect world every major OS would have proper, granular mandatory access control enabled by default and applications would come with a profile specifying precisely which resources they require – at least regarding the more critical stuff like keys and cookies – with attempts to access anything else triggering an optional notification. Hopefully macOS will become more granular that way and Apple will continue pushing and improving what they began with Catalina.

Meanwhile, in a less than perfect world there's XFENCE [0], previously known as LittleFlocker. It's basically LittleSnitch for files. It was originally developed by Jonathan Zdziarski and later sold to F-Secure.

The challenge is to set it up in such a way that the level of interaction is kept at a minimum while still providing some level of protection.

I might write a detailed blog post / howto about it, but meanwhile here's the TL;DR if someone wants to try this blacklist/greylist approach:

1. Set an 'Allow any app – rwc' rule for /Users to override the default 'Watch – rw' rule there, which would otherwise result in a ton of popups. This does not override the more specific watch rules for some critical resources like loginitems, etc.

2. Add watch rules for additional critical resources, like ~/.gnupg, ~/.ssh, ~/bin, possible password manager directories, Firefox/Chrome directories to prevent cookie extraction, etc.

3. Temporarily add a watch rwc rule for ~/, thus overriding the Allow rule for /Users.

4. Run any network connected software with a potentially large attack surface like browsers, torrent clients, vpn clients, etc. and give them the required permissions to your home directory using the popups. Make sure to put them through their paces in terms of file system access to cover all possible use cases.

5. When they are usable without any more popups, remove the temporary watch rule and add 'Deny rwc to /Users' rules for each one, thus overriding the general /Allow rule we created above. An application-specific watch rule would be nice here instead, but sadly that doesn't seem to be possible – watch rules apply to all applications.

Execute steps 3–5 for any other untrusted software you might want to install/run.

When combined with LittleSnitch to catch possible attempts at data extraction, this reduces the risk of rogue applications extracting/damaging critical data and limits the potential damage of possible RCE vulnerabilities in network connected software. And it does this with a minimum of interaction – after the initial setup phase.

I've been running LittleFlocker/XFENCE for a couple of years now and the setup described above for maybe a year, and it works like a charm, currently on Mojave, previously High Sierra, all the way back to El Capitan, if memory serves.

A whitelist approach would of course be more secure, but that's way too stressful and distracting for me.

[0] https://community.f-secure.com/t5/Home-Security/XFENCE-beta-...


This is part of why I install my Python dependencies from downstream Linux distro repos. I never use virtualenv. If a distro is missing a package I need, it's a simple process to put it together, and the additional steps and checks built into the process stop close to 100% of these issues. Getting a human here also lets you do things like patch out telemetry or other anti-features.

Software repositories without a human review process are a bloody stupid idea.


> Software repositories without a human review process are a bloody stupid idea.

What review process? Do people actually review/audit the code? Usually they don't. All it tells you is that at least one person thought it looked "okay enough" to package, based on unclear criteria. It's most certainly not a "review process".


Which, honestly, would have been quite sufficient in this case.

I can't remember ever having seen anything like this in Debian, for instance.


Indeed. Plus this whole "install the whole python ecosystem for each thing you want to use" is insane.


In science this is pretty important so we can control versions and reproduce a result again as well.


There are other, more efficient ways of handling dependency version conflicts than having an isolated env where each module is downloaded specifically for that env. For example, it doesn't make sense that if I have two virtualenvs that use the exact same module (and version), it's downloaded and stored twice on my machine.


That's one of the issues that conda (https://docs.conda.io/en/latest/) solves by design. When using conda environments you get hard links whenever possible. Improvements to venv, making it a built-in module, and projects designed to simplify dependencies have made conda less attractive in comparison, but it's still a solid way to have Python. Still, it doesn't really solve the fundamental issue with Python packages you mentioned, unfortunately.


For real reproducibility you want to go a step further with a virtual machine or something. Freezing your Python dependencies won't shield you from changes in your C standard library or differences in vector extensions or whatever. In practice, most code shouldn't be so fragile if dependencies are reasonably well behaved... Otherwise how can you even trust that the dependency is giving a reasonable answer?


I'm a bit torn and often err on the side of installing into --user or virtualenv. What often happened is I would toy around with a project, install a few libraries, then abandon it. Months/years later I would hit an incompatibility and I wouldn't know which packages were necessary for things I use everyday and which were because of that abandoned package. Similarly, when revisiting that abandoned project I wouldn't know what a good known set of libraries were. Installing stuff globally also makes it really difficult to know what dependencies were needed to run it on a different machine.

This wasn't just for Python libraries, but also with OS packages (especially non-sanctioned RPMs). I've been very selective about what I install globally, and usually first reach for a VM or other virtual environment.

One big counter to this is when I want to make it a simple commandline tool to run. I have to enter a Virtualenv just to run my fancier `ls` command?


My solution for this problem is apk's virtual packages. I'll do something like this:

    apk add -t .myproject-deps py3-foobar py3-foobaz
This creates a fake ".myproject-deps" package which depends on py3-foobar and py3-foobaz, then I can just uninstall ".myproject-deps" later and it nukes the rest. `head /etc/apk/world` is generally sufficient to get a list of projects I've forgotten about whenever I feel like some spring cleaning.


So they caught the guy using a cliched I-vs-l typosquatting scheme and lazily writing the malicious code in Python; we can presume they haven't caught the guy who took the trouble to put their malicious code in a pre-compiled C extension.

Reminds me of the fraudulent scientific papers that get caught using really dumb fakes (e.g., microscopy pictures that are copies of one another re-zoomed and rotated); we catch the dumb ones, but presumably not all malicious actors are dumb, so there must be a lot more fraudulent work out there.

For that matter, isn't there a reasonable systematic way to catch out typosquatters simply based on text analysis? Any library name that's a short edit distance from a popular library should have been carefully reviewed from the start; there's no excuse for "jeilyfish" to have lasted more than a couple of days.


> For that matter, isn't there a reasonable systematic way to catch out typosquatters simply based on text analysis?

You could probably write one using Python & jellyfish.


I'm extremely disappointed to see that you didn't suggest using jeIlyfish instead.


nicely done


From jellyfish’s pypi page: “ a library for doing approximate and phonetic matching of strings.”

Huh! Hadn’t realized. That adds an ironic spin to the whole thing.


This will keep happening, and not only will SSH and GPG keys be the target, but any interesting data will be stolen.

And the problem is much larger than these typosquatting attacks. Abandoned GitHub projects taken over by malicious users, rogue Maven/npm/PyPI/what-have-you repositories, hacked accounts on any website that is used for distributing programs, feature branches in open source projects that are automatically built on CI servers inside corporate networks - the possibilities to grab data and send it somewhere on the internet are endless.

One security measure that has somehow fallen out of fashion over the last few years is, at least on application servers, to disallow any outgoing network traffic, especially to the internet (at least, any cloud environment I see nowadays allows it by default). This would largely prevent these sorts of attacks from actually sending anything out, but would also prevent XXE attacks, prevent reverse connections to an attacker host from being set up, make SSRF attacks harder to verify, and so on.

I strongly recommend whitelisting only the network traffic that your application actually needs.


How would this work for a public facing API? Or an API that serves a SPA?

I'm interested in this approach


objectified has it already, but to reiterate: you can block outbound traffic initiated on a host without blocking outbound traffic that is a response to externally initiated traffic. This is, for example, what haproxy, iptables, and AWS security group outbound rules do.

I'm deliberately avoiding the term "connection" above because new UDP-first protocols require slightly different handling to determine who initiated what, but most routing/firewall software can deny-initiated-outbound for those protocols as well.


I'm not sure I understand your question correctly, but I'm talking specifically about outbound network traffic. Your API's application servers (where such evil libraries could be deployed) should not be able to have any network connectivity towards the internet. So on that server, you should not be able to do even `curl www.google.com` for example.


GP was asking how you would allow APIs to respond to requests if you are blocking outbound traffic.

I’m assuming if you open a connection for a sync request you’d be fine. What about an async request? I’d imagine a scenario where your API needs to do some processing first, connect to another internal system, and then respond async to the outside system.


Download numbers for the last half year can be found at pypistats.org.

* python3-dateutil has 271 downloads from non-mirrors in the last month[1]

* jeIlyfish has only 106 downloads from non-mirrors in the last month[2]

[1]:https://pypistats.org/packages/python3-dateutil

[2]: https://pypistats.org/packages/jeilyfish


I'm assuming that by "only" you mean there's limited impact. However, if the malicious package steals user keys, the harm can spread to packages that may have received way more downloads.


In the end I think the solution to these issues will be something like what's promised by the Bytecode Alliance[0]. The idea is you give each package its own WASM sandbox with granular control over its permissions.

That solution also has the benefit of allowing you to call a package from any language from your language of choice.

I highly recommend reading the article introducing the idea; it's very convincing:

[0] https://bytecodealliance.org/articles/announcing-the-bytecod...


I agree with this. They don’t mention this explicitly in the article, but it has a capability-based security model, which is something I think we desperately need in our OSes. (They do link to a paper about it that mentions this.)

There are a few other such systems that look interesting; Agoric is working on one for JavaScript, Google has a kernel patch set that adds capability support to Linux, and Christopher Lemmer Webber is working on a similar system on top of Racket called Spritely Goblins. I’m excited about all of them though, because it feels like this kind of security model is starting to gain public awareness!


There is also https://xtclang.blogspot.com/

Java has started going down this route with the new module system and Lookup objects; however, this is mainly for restricted field/method/constructor access. I do hope we will see something similar for file and network I/O (random memory access is less of an issue in Java).

I do think we are going to see a lot more of this in the future.


Hear, hear on capability systems, but they seem of limited use confined to specific language implementations, as opposed to the whole system. I wonder what's the Google kernel patch, and how it compares with Capsicum. It's rather tragic to gain public awareness so long after KeyKOS et al...


The kernel patchset I was referring to is https://github.com/google/capsicum-linux - it's a Linux version of Capsicum. Though now that I look at it more closely, it appears to no longer be maintained. :(

That said, the Bytecode Alliance stuff appears to be multi-language, so that's neat! I could see that making WASM runtimes pretty useful even outside the web.


Package management and curation are the Achilles heel of open source. Abandoned packages, typo and letter substitutions, maliciously crafted pull requests and so on are all going to go up in frequency until the environment is hardened enough that the bulk of these attempts fail. That's a long way to go, and the number of capable maintainers and curators is relatively small.

Some environments (Python, Node) are more susceptible to this sort of trickery than others.


You know, allowing people to upload libraries with names easily confused with existing ones is almost reckless.

There is a bunch of stuff PyPi could do short of full curation that would make this much harder.

A better solution is for languages to provide a kind of "module sandboxing" where modules need to declare the capabilities they need, and the runtime prevents them from accessing anything else.

In this case, the module would have needed to request the ability to make an outgoing TCP/IP connection - and that request should raise red flags for a date parsing utility.


I wonder if employing some string metric threshold would make sense. It would be interesting to measure the dispersion of names across pypi along some chosen metric. https://en.wikipedia.org/wiki/String_metric
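
For what it's worth, a quick sketch of that measurement using (fittingly) jellyfish's Damerau-Levenshtein distance; the EXISTING list is a stand-in for a real dump of PyPI package names and the threshold is a guess:

    import jellyfish  # pip install jellyfish - the real one, mind the spelling

    EXISTING = ["jellyfish", "python-dateutil", "requests", "urllib3"]  # placeholder

    def too_close(candidate, existing=EXISTING, max_distance=2):
        """Names within max_distance edits of an existing package (but not identical)."""
        candidate = candidate.lower()
        return [
            name for name in existing
            if 0 < jellyfish.damerau_levenshtein_distance(candidate, name) <= max_distance
        ]

    print(too_close("jeIlyfish"))         # ['jellyfish']
    print(too_close("python3-dateutil"))  # ['python-dateutil']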


Sandboxing in the language runtime is an interesting idea. I wonder if that could be offloaded to the OS/kernel instead so that every language doesn't have to reinvent the wheel. Eventually then hardware could handle it to reduce overhead.

Eg, declare parts of the compiled/interpreted code with specific privileges. Kind of like how memory regions can be NX.


MyPy requests permission to write data to the hard drive. MyPy writes malicious payload to a file and then execs it. MyPy never wrote a damn thing to a socket, netcat did through a bash/ash shell. (Windows named pipes may be a little trickier to work with but can yield similar results)

Sandboxing libraries specifically seems like a fools errand.


Writing to the HDD would be considered a suspicious permission for most modules - like opening a network socket or patching another module's namespace.

I don't know exactly how it would work. Maybe modules that request these permissions get extra scrutiny, or users can specify "levels" that different modules can run at. If you specify MyPy to run at the least-priv level, it will fail to install/load if it requests greater capabilities.

I can't really think of any other way around the problem. It is a problem for all other scripting languages too, especially Node.js.


"pip" has a usability problem. It should do a lot more at preventing this kind of thing. When using pip, it's not easy to tell information like the release date, how many versions have been released, and so on.

Since such info is available from PyPI API, I wrote my own "pypisearch" script to sort by latest release date and include number of releases to weed out packages that seem useful but are old or rarely released. I should probably integrate PGP signing info too into it.
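Until then, a minimal sketch against PyPI's JSON endpoint (https://pypi.org/pypi/<name>/json) looks roughly like this; treat the exact field names as an assumption on my part:

    import json
    import urllib.request

    def release_summary(package):
        """Fetch the release count and most recent upload time for a PyPI package."""
        url = f"https://pypi.org/pypi/{package}/json"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        releases = data.get("releases", {})
        upload_times = [
            f["upload_time"]
            for files in releases.values()
            for f in files
            if "upload_time" in f
        ]
        return {
            "package": package,
            "release_count": len(releases),
            "latest_upload": max(upload_times) if upload_times else None,
        }

    print(release_summary("python-dateutil"))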


Is the code public? I'd love to use something like that.


It isn't. I'll make it public and announce it here by Friday.


Cool. Feel free to ping me by replying to this when you do.


> The first is "python3-dateutil," which imitated the popular "dateutil" library. The second is "jeIlyfish" (the first L is an I), which mimicked the "jellyfish" library.


It isn't helping that the dateutil library is actually "python-dateutil". This confused me this weekend when I wanted to pip install dateutil and it did not work.


I don't get it. Who would type "pip install jeilyfish" by mistake?


I can only see this working where someone would copy and paste the package name.

EDIT: another vector I saw mentioned in another comment: you pull in what appears to be a 'valid' dependency, and jeIlyfish is listed as a dependency of that package; looks legit so you proceed.


I suppose that could happen in a malicious tutorial or comment/post with the snippet, like in a StackOverflow answer.


The attacker would need to leave more footprints to do this, but yes. It is common for people to pipe up with "I wrote a thing that does this" and I imagine that results in people picking up odd packages.

I think an experienced programmer probably would be less likely to do this, but perhaps a junior programmer working on a system that no one wants to support anymore introduces a "bad" module.


Put yourself in the attacker's shoes. Your goal is to spread this to as many machines as possible. The best and easiest way to do that is to add your library as a transitive dependency. What better way to infect people than to get everyone who ran `pip install numpy`? As for getting it in, I'd push it to older projects as part of a "styling cleanup" PR, because there's so much noise in the diff anyway. Imagine a PR to a project adding a transitive dependency on python3-dateutil. Most people would merge without looking twice, especially if you add some scary "this deprecates py2 support" note to the PR.


You don't have to type the name of a package to install it. There are GUI package managers that install with a click. See Anaconda Navigator, for example.


These people did:

Downloads last day: 13
Downloads last week: 103
Downloads last month: 119

Check https://pypistats.org/packages/jeilyfish you won't believe your own eyes.


Regrettably some people use fonts in their terminals and IDEs which do not make the difference between uppercase I and lowercase l obvious.


Wish they listed out how many installs.


At pypistats.org, download numbers for the last half year can be found:

* python3-dateutil had 271 downloads from non-mirrors in the last month [1]

* jeIlyfish had only 106 downloads from non-mirrors in the last month [2]

[1]:https://pypistats.org/packages/python3-dateutil

[2]: https://pypistats.org/packages/jeilyfish

https://news.ycombinator.com/item?id=21702973



These two have been caught. How many haven't yet been caught?

Traditional Unix file permissions are pretty much a joke for the way developer computers get used (one user does everything). Real process sandboxing is needed.


And it's not just the developer machine that's at risk. Even if you protect your own system, you will still be shipping malware to your users, who may be vulnerable. And you'll be lending your credibility to the malicious libraries you distribute.


Not everything needs to run as root.


    $ pkgman install hip2019pretty-ls
    $ su sandboxeduser prettyls
... nobody does that. Sure daemons might run as different user(s), but on ~100% of developer machines, the logged in user launches a shell as themself, and runs programs in that shell as themself.


Nobody can make you use account isolation, but you should.


Ok, but your private keys are stored in your home directory. No root access required.


For sure, but does applying a password on pub-priv key creation encrypt it? Haven't tried it myself yet but may later when I'm back at my computer.


Yes, ssh-keygen will ask if you want to protect the key with a passphrase. You can verify this by opening the file in a text editor: the contents are encrypted with that passphrase.


But they are encrypted and so are of no use to anyone.


It's all running as john


I get the python3-dateutil because you might think it's an updated version of the standard library.

But how does jellyfish with a different char for L work? Someone would need to copy and paste it. But if they go to pypi, it won't have many installs.

Unless they started writing tutorials with:

"okay now just pip install X"


I think the key here is that jeIlyfish had malicious code in it, and the fake python3-dateutil imported jeIlyfish. Even if you were examining the source for malicious code, you might not notice the difference between jellyfish and jeIlyfish, I guess.


The python3-dateutil package listed jeIlyfish as a dependency, so I'm guessing that upon installing the fake dateutil the user would see something like "installing dependency: jeIlyfish", which, depending on your font, might at most prompt them to google "python3 what is jellyfish" and land on a concise description of the real jellyfish on PyPI.

Then the fake dateutil calls the code in jeIlyfish.


Copy and pasting, even via word of mouth (rather than, say, a blog post or highly-google-ranked tutorial) is a surprisingly viral propagation vector.

And god help us if a malicious install command was posted, even if only for a few minutes before being edited, to a help or forum site like StackOverflow or Reddit.


Just an idea: in autocomplete scenarios, jeIlyfish comes alphabetically before jellyfish, while looking like it. That could hijack an install. I don't know if and where autocomplete can be used with PIP, though.


>But if they go to pypi, it won't have many installs

If people are using number of installs as a safety metric, can't blackhats game that by having a bot install the package many times?


If I had to guess, I'd say that the jeIlyfish package is just there to wrap the malicious code, since jeIlyfish is a dependency of the compromised python3-dateutil package.


PyCharm, Stallion, Anaconda Navigator, etc...

You don’t need to copy and paste it, or go to pypi, since there are commonly used ways to install packages by clicking on the name.


The article points out the GitLab account olgired2017 (https://gitlab.com/olgired2017). It would be interesting if GitLab shared that account's active-session details, like IP address, browser, and date and time, from their site logs (profile/active_sessions).


I would consider this to be a huge breach of privacy.


I wonder if the users who were affected by this malware think something similar.


As a general principle, Western legal systems don't let the victims determine the punishment for a misdeed. Are you suggesting this is not a good thing?


I wonder what gave you the idea that I was implying that.

Gitlab is of course free to disclose any information they like about those abusing their platform.


What's the best information source for me to follow to keep up to date on these kinds of library vulnerabilities? I would make a feed of the homepages for all the libraries I know I use, but that won't help me with the libraries I use without knowing.


"bandit" (available in pypi) is a nice static analysis tool - I don't remember if it is able to recurse into dependencies though

[safety](https://pyup.io/safety/) is a commercial product that monitors your dependencies for this kind of shenanigans

LGTM.com seemed to be working in this area - Semmle was acquired by github/microsoft


we made a tool that automates that process - https://trustd.dev

it will analyse open-source packages as you install them and tell you of any vulnerabilities before they are even on your system...

meaning it will detect problems in the libraries you aren't thinking about.


I saw a recorded conference talk where they mentioned Snyk as a way to keep your eye on package vulns, but have never used it myself


If you want a concrete hardening step to avoid this attack, try using a hardware PIV/CAC device (e.g. a Yubikey) as the only copy of your private keys.

This is very easy to set up on macOS High Sierra or later (https://support.apple.com/en-us/HT208372):

1. Generate the key: https://developers.yubico.com/yubico-piv-tool/Actions/key_ge...

2. Use "ssh-keygen -D /usr/lib/ssh-keychain.dylib" to extract the public key fingerprint to put in your authorizes keys list.

3. Add this line to your SSH config file to tell the client to attempt to log in using the key on your device: "PKCS11Provider=/usr/lib/ssh-keychain.dylib"

On Windows, Putty-CAC supports this and can reportedly be used with Git: https://piv.idmanagement.gov/engineering/ssh/#ssh-using-putt...


Does anyone know of a way of running something like “little snitch” on Heroku so that you can whitelist outgoing connections?

That seems like a possible way of mitigating this type of issue.


I have been working on a Node.js library (very much a work in progress, and progressing slowly due to lack of time) that integrates libseccomp so it can be used programmatically inside Node.js, whether at the library or the application level. At the moment it's kind of an experimental idea for me, as it can be quite tricky to get everything right and in the correct order without the kernel killing the process by mistake, but I think it will help mitigate at least some of the cases in the category this issue falls into. I believe libseccomp already has official bindings for Python, as well as third-party bindings for other languages. This kind of work has been very successful in OpenBSD with pledge, and I think it has been overlooked in dynamic programming languages and on Linux in general.


It would be good to get some feedback on this instead of just anonymous downvotes.

I'm thinking the way it would work is that, at the application level, you'd open sockets and other file descriptors and then lock everything down, so if some malicious library tried to read a file or spawn a process, the application would either be killed or get an error back, depending on the application logic and whether or not it can handle the error. I would advocate this kind of approach because it doesn't need any external hardening, i.e. you get the benefits whether or not you're inside a container, and whether or not the application is being run on a distro with SELinux or AppArmor; it's basically built in. I may be missing some big thing here, but like I say, it would be good to get some feedback.
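For anyone curious what the Python side of this looks like, here's a minimal sketch assuming the official libseccomp Python bindings (the `seccomp` module) on Linux. The syscall list is illustrative, not complete, and a real filter needs care around threads, DNS resolution, and so on:

    import errno
    import seccomp  # official libseccomp Python bindings (Linux only)

    def lock_down():
        """Call after all legitimate sockets/files are open.

        Allow-by-default filter: outbound connects and process spawning then
        fail with EPERM instead of the kernel killing the process.
        """
        f = seccomp.SyscallFilter(defaction=seccomp.ALLOW)
        for syscall in ("connect", "execve", "fork", "vfork"):
            f.add_rule(seccomp.ERRNO(errno.EPERM), syscall)
        f.load()

    if __name__ == "__main__":
        lock_down()
        import socket
        try:
            socket.socket().connect(("203.0.113.1", 80))
        except PermissionError as exc:  # EPERM surfaces as PermissionError
            print("blocked:", exc)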


> "jeIlyfish" (with an upper case I) and "python3-dateutil" (not "dateutil").

Libraries should take lessons from writing safety-critical code. If you identify libraries visually by name, the main problems are:

* easily misread characters like 1 (one) and l (lower case L), 0 and O, 2 and Z, 5 and S, or n and h.

* identifier names that differ only by one or a few characters, especially if they are long.

It's possible to enforce a set of rules that keep identifier names visually distinguishable, and to use a string-distance measure to check every newly added library name against existing ones.
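A small sketch of the "visually distinguishable" check; the confusable-character map below is illustrative and nowhere near exhaustive:

    # Illustrative (not exhaustive) map of glyphs that read alike in many fonts.
    CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o", "O": "o", "5": "s", "2": "z"})

    def skeleton(name):
        """Reduce a package name to a rough 'visual skeleton' for comparison."""
        return name.translate(CONFUSABLES).lower()

    def visually_collides(new_name, existing):
        """Existing names that look identical to new_name once confusables are folded."""
        target = skeleton(new_name)
        return [e for e in existing if e != new_name and skeleton(e) == target]

    print(visually_collides("jeIlyfish", ["jellyfish", "requests"]))  # ['jellyfish']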


There's a bit more to it as "dateutil" is actually installed via "pip install python-dateutil", not simply "pip install dateutil". If someone was to see "python3-dateutil", there's every chance they think it's the same module but with Python3 compatibility.


Ironically jellyfish is the one library which can help you with that task: doing approximate and phonetic matching of strings.


Anyone here not encrypting their private keys?

Also, the known_hosts file is a double-edged sword. It's pretty sensitive in combination with a private key.


You shouldn't just encrypt the keys. If malicious code is running on a machine, it could just as well eavesdrop on keystrokes. This is made even easier by the fact that a large share of users run X11, which is inherently insecure since all applications can read keystrokes, mouse events, and do screen grabs.

Get a hardware token. If you have a hardware token with a pin, they could extract your PIN by reading keystrokes and use the token on your machine. But once you yank the token out of the USB port, that's the end of it. Even better is a hardware token that requires physical confirmation for operations.


Modern SSH versions only store hashes of domain names in the known_hosts file exactly for this reason.


Hmm, you're right, but HashKnownHosts in openssh defaults to no, still.


I haven't personally seen that enabled by default.


Does anyone know where this was officially announced? I don't see anything at https://mail.python.org/archives/list/security-announce@pyth...


One thing I've always thought would be a good idea is a tool (either local or part of the pip/other package manager download process) that greps and prints out all URLs and IP addresses within the code, including common encodings. Additionally, any lines that use transfer protocols (like HTTP requests) should be highlighted too, since IPs/URLs can be encoded. Any HTTP request to a suspiciously encoded URL, for example, could raise flags.

The official library itself could have a "urls" file listing the URLs that are expected, so anything that doesn't match can be questioned.

Whilst this won't solve the issue 100%, it raises the difficulty barrier to implement outgoing network calls.
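A rough sketch of such a scanner; the patterns below are illustrative and only catch plain-text literals plus long base64-looking blobs:

    import re
    import sys
    from pathlib import Path

    URL_RE = re.compile(rb"https?://[^\s'\"]+")
    IP_RE = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")
    # Long base64-looking literals are worth a second look, since URLs/IPs are
    # often hidden inside encoded blobs.
    BLOB_RE = re.compile(rb"[A-Za-z0-9+/=]{40,}")

    def scan(package_dir):
        """Print plain-text URLs, IPs, and long encoded-looking literals in .py files."""
        for path in Path(package_dir).rglob("*.py"):
            data = path.read_bytes()
            for label, pattern in (("url", URL_RE), ("ip", IP_RE), ("blob?", BLOB_RE)):
                for match in pattern.finditer(data):
                    print(f"{path}: {label}: {match.group().decode(errors='replace')[:80]}")

    if __name__ == "__main__":
        scan(sys.argv[1] if len(sys.argv) > 1 else ".")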


> it raises the difficulty barrier to implement outgoing network calls.

Not very much. You just obfuscate your code until this tool doesn't notice anything untoward, and then upload it.


Highly obfuscated code would raise suspicions, as it has in similar cases found in NPM packages.

E.g. in Python, obfuscators I've come across tend to replace characters with non-Latin Unicode chars, which should raise flags when found in predominantly Latin-based source code.


Only if a person is looking at it.

If the only thing looking at it is a machine, then you can keep iterating until the machine doesn't notice anything.


I agree, it's nowhere near bulletproof, but it's about raising barriers, as well as updating the tool once workarounds are found. I don't see an easy solution to this issue, but in most of the cases I've seen to date (including the ones in this article), a simple URL scan would have caught them, let alone more complex methods.


Could Python be retrofitted with import flags, much like OpenBSD's pledge? E.g. you could import a library but not permit file I/O or network access. That would stop this kind of attack, and it would be a reasonable restriction for many libraries.
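Not per-import, but CPython 3.8+ audit hooks (PEP 578) give a coarse, process-wide approximation of pledge; here's a sketch that blocks socket connections once setup is done. It only constrains code that runs after the hook is installed, so it's a mitigation rather than a sandbox:

    import sys

    def pledge_no_network():
        """Best-effort, process-wide 'pledge': socket connects raise from here on.

        Uses CPython audit hooks (PEP 578, Python 3.8+); hooks cannot be
        removed once added.
        """
        def hook(event, args):
            if event == "socket.connect":
                raise RuntimeError(f"network access blocked: {args}")
        sys.addaudithook(hook)

    if __name__ == "__main__":
        import datetime  # trusted imports / setup happen first
        pledge_no_network()
        import socket
        try:
            socket.create_connection(("203.0.113.1", 80), timeout=1)
        except RuntimeError as exc:
            print("blocked:", exc)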


To search your own projects for the malicious libraries:

pip3 freeze | grep -i jeIlyfish

pip3 freeze | grep -i python3-dateutil



Another insidious exploit is to hijack the maintainer's package manager account and push the code directly there, bypassing the repository altogether. It doesn't rely on you installing a new package since the hijacked package in question is already a dependency of yours.

I got into a habit of checking both the package manager and the project's repository before updating any given dependency.

https://news.ycombinator.com/item?id=20377136


Would sticking to the OS upstream package manager be a safer option compared to installing from pypi directly?

How often does something like this happen with packages in the CentOS, epel, or Debian repositories?


I think it would be quite a bit safer, and you are already trusting your distribution, but those libraries tend to be very old/stale versions, and the selection is limited.


Ideally programming languages should include capability-based access control, so that a random library that is supposed to do X, can't do Y.

Until then, we need to vet our dependencies. Check out https://github.com/crev-dev/cargo-crev/tree/master/cargo-cre... for a distributed review system we're working on.


I am curious when the hackers will start stealing ~/.kube/config. In a default on-premise Kubernetes install, a token or admin certificate is just lying there, unprotected, and adding a passphrase to the cert is not supported. Some are using an OAuth identity provider or other mechanisms, but that unnecessarily complicates the setup, and smaller k8s clusters could be stolen this way...


I don't see a solution other than every application being launched in a separate container, exposing only what the user explicitly gives access to, similar to how mobile applications require permissions. We have the technology, e.g. cgroups & unshare on Linux; what's missing is something that plumbs all these brittle pieces into a secure application launcher made for desktop (rather than server/cloud) usage.


Best practice: Don't use pip search && pip install. Search for the project site and copy & paste the pip install instructions.


We just recently launched a free tool that helps Python developers prevent exactly this type of issue.

Feel free to check it out: https://trustd.dev

We work preventatively, so as you download packages, the tool will analyse them and tell you about any issues found.

We’re looking to collaborate with devs to work out what features should be next.


First question is what the hell does this mean:

> As you install open-source packages, trustd will scan them and provide you with instant feedback on any problems.

What kind of scanning? Algorithmic? Based on human review? If we're outsourcing trust to you, I'd want to know a lot more.

And "we use Slack instead of a dashboard" doesn't sound terribly appealing. I'd want a dashboard and a range of notification options (for me email > Slack. Others may differ)


might not have explained it the best I could have haha..

It means that as you pull packages in from NPM et al., the analysis goes to work, telling you of any known vulnerabilities or any license non-compliance.

With regards to Slack, we are hearing that a lot, it isn't the best mechanism for providing this feedback, and we are working on alternatives now, including email.

Happy to answer any more questions on here or reach out jake@418sec.com


From the article... The libraries were "jeIlyfish" (with an upper case I) and "python3-dateutil" (not "dateutil"). Both libraries were close spellings of the real libraries.

The lesson for developers is double check your imports! Spelling does count and there are lots of similarly named libraries (most are not malicious thankfully)


Agreed, although I think there's more than just a comparing spelling issue here. "dateutil" via pip is installed using "pip install python-dateutil". I can easily see someone thinking "python3-dateutil" is simply a Py3 compatible version of "python-dateutil". The "python3-dateutil" module imported "jeIlyfish" so my guess is that the creator banked more on people installing the fake dateutil library than directly downloading jeIlyfish.


Terrifying but understandable. We need better ways to stop this but I'm not sure how...

In the meantime, store your private keys on a security device like a YubiKey. I use it simultaneously for all signing, encrypting, and authenticating (SSH as well as PAM to my workstations).

Be sure to set a strong PIN on the device. And have a backup!


Does PyPI offer package notarization and make that observable in the lockfile or the installation logs? Or offer optimized SEO for notarized packages over those not notarized in package search? If that’s not there, and I don’t see it as part of the PyPA roadmap, it might be a good first step to take.


Don't install packages with large numbers of dependencies (for me, this is more than 2).

Don't install packages you haven't at least visited the website for, and preferably only ones you couldn't have written yourself.

Libraries and package managers help us work together, they’re not excuses for not thinking.


Placing the burden of responsibility for security on the end user is not the way to go about this - at least, not if you want people to actually use your product / language.

Package managers do not only exist as a convenience. They should also provide guarantees about their packages, or at the very least some level of moderation.


You could make that argument about repositories, not package managers (which are just software).

Some do! (Main OS repos (not community ones) do.) But the reality is that this takes man hours, so repositories have been set up without those guarantees in the name of efficiency and expectation of responsibility.

I don’t think these are built because people “want users for their language” but because they were needed. This whole “you shouldn’t do that because it scares users away” thing tends to result in terrible software IMO.


This shows the inadequacy of thinking “open source makes all bugs shallow”.


On the contrary, this wouldn't even need to be hidden if it was closed source. It was caught because it's open source.


Possibly, but some of the solutions proposed, e.g., monitoring of network activity, would work either way.

It concerns me that one of these sat out there for a year.


But it's not like malicious activities could only involve the network. Also, it's possible to obfuscate network activity and hide such things among legitimate traffic.

> It concerns me that one of these sat out there for a year.

Certainly it being open source doesn't guarantee that someone will notice such things, but it raises the probabilities. It could have been like that longer if it were closed source.


The server to which these libraries upload stolen keys, 68.183.212.246, is with Digital Ocean. As of Thu Dec 5 12:50:10 UTC 2019, it's still up with ssh open.

Nice job, Digital Ocean!


The solution to this is WASM, where you compile a cross-platform library like this and, due to the sandboxing, it isn't able to arbitrarily access memory outside its package.


bruh moment:

    $ python -m pip list | grep date
    python-dateutil               2.8.0
Yeah, this is a different library, but that was creepy anyway.


Some of the permission comments reminded me of deno[0].

[0] https://deno.land/


There are probably several open-source projects you can download today that include unknown malicious code, but they will be discovered eventually. Proprietary software, on the other hand, can keep malicious code for its entire life of relevancy. And in fact, it's rare for proprietary software not to have malicious code these days, with personal data being sent to servers, ads being delivered, and installers bundling third-party products and plugins.


Complete false equivalency. Exploiting user data deliberately, when you technically mention it in a thousand page legal document, is completely different from distributing malware because your dev machines got infected.


This is why the Node ecosystem is plagued: many apps require hundreds of unvetted libraries.


Alarming, given that PyPI is treated as trusted by many, I think (even if it shouldn't be).


What is trusted then, the Anaconda base repo?


I don't know the answer frankly.

The serious devs in actual dev shops I've asked answered with something like:

>We do code reviews on the libraries we use. Basically if someone wants a library they're responsible for checking it

Me: Isn't that a shtload of work with new versions etc

>Yeah so we tend to lag behind official versions quite a bit

It sounded like they host local mirrors of some sort with just the vetted code. Though I think vetted here is a quick glance over for shady sht rather than true security vetting


Maybe SSH keys and GPG keys need to be protected from access by anything local unless given permission? Right now they are just files sitting there that can be read by any standard user process, right?


On macOS you can easily throw your SSH keys into the secure Keychain (not that I do this...).


It should be the default that they are protected, though, on all OSes. We should not automatically trust locally installed apps anymore. They should be sandboxed by default, like on Android, and they should ask for permissions as they need them.

Windows and Linux need to get with the times.

My dev env should be sandboxed like everything else. Git can have SSH permissions, but not every random tool from pip or the cesspool that is npm.



Is it possible to find out where that IP is located?


Can the person responsible be found?


Why should a Python program have access to SSH keys? Popular Linux distributions, unlike proprietary systems like Android or iOS, cannot protect the user's data from malicious programs run by the user.

Also, in popular Linux distributions programs can read unique hardware identifiers like the MAC address or HDD serial number, and read the browser's history and cookies. These valuable data are not protected by Linux.


Android isn't proprietary.

On the other hand, Windows and macOS, which, as far as I am aware, also have this problem, are proprietary.

Hence, I don't see why an argument like "unlike proprietary systems" is justified.


Windows and macOS have been pushing for sandboxes for quite a while as well, exactly to prevent this kind of behavior.

Currently macOS is more aggressive than Windows in this area, with Apple now requiring notarization for all software, which you can still bypass, but you need to explicitly allow it.


Apple does not require notarization for all software; it's just enabled by default and checked on all applications downloaded from the internet and opened through Launch Services.


Which is already more than any other FOSS UNIX.

As for macOS, I am sure that it will come, as these features have slowly been added release after release.


Mac has started prompting users when programs access the file system. I hope they implement a HUGE warning if that directory is ~/.ssh.

> These valuable data are not protected by Linux.

That’s extremely scary to me.



