Show HN: Porting OpenBSD Pledge() to Linux (justine.lol)
571 points by jart on July 14, 2022 | 128 comments



Thanks Justine, you're doing the Lord's work here.

One of the best introductions I've seen to pledge() is Kristaps Dzonsons' writeup, which you can find here [1]. The whole website is also a guide on how to write webapps in C. That may sound like a crazy idea to people who have written it off as everything from an elaborate joke to a security nightmare that nobody should ever consider [2], but for people like me who end up having to write that kind of code in constrained execution environments it's been extremely enlightening, and I really appreciate knowing how to do it more securely. If pledge is something I can access on Linux, it will really help me out going forward. I hope to create some cool things with it!

[1] https://learnbchs.org/pledge.html

[2] https://learnbchs.org/


Awesome blog post! Writing web apps in C? That doesn't sound crazy to me. If someone told me they were doing that, then I'd just assume they're trying to compete with Google Search for the title of the world's lowest-latency high-performance scalable website. If pledge() makes you feel less guilty about going for the gold, then I'd say that's a good thing. You might also be interested in Cosmopolitan Libc's ASAN and UBSAN support. It does things like print memory diagrams with backtraces with malloc origin tracing if you do something like overrun a buffer or use a piece of memory after it's freed. ASAN has been one of the most important tools that Google used to find security issues in Chrome. So I put a lot of work into implementing greenfield support for it in Cosmo. In fact, ASAN is so important, that even languages like Rust need to use it, since it makes the unsafe keyword safe! So please try Cosmo's implementation and let me know what you think. I believe Cosmo has the highest quality ASAN implementation that's available to the open source community.


Yeah it's pretty much your best option (C or C++) when writing web apps for things like routers or for game consoles for example, which is the kind of use case I'm talking about. Your work here makes doing that so much easier and safer, which is really important considering how many routers are being exploited to be malicious these days.

It might take a little while to catch on in those use cases but I've been waiting for someone to nail the implementation for a while. It's hardly a surprise that you of all people got it down, I've been admiring your work for quite some time.

So how about those blog posts in Mayan Hieroglyphics?


> Yeah it's pretty much your best option (C or C++) when writing web apps for things like routers or for game consoles for example,

Why? This library obviously assumes you have an operating system running, so it's not that constrained of an environment.


Great question - you'd be surprised what kind of resource constraints these run under. They rule out the execution overhead of scripted languages, whether the device is focused on routing things securely or the web code is squeezed in next to a 3D engine, and there's also the question of what the manufacturers of game consoles actually allow you to run. For routers, they're running low-power processors like MIPS with very small memory and storage, so you want to squeeze every bit of performance out of them, and if you can make the web application portion a few kilobytes rather than megabytes, this means a lot to the bottom line of the company building these things. As Justine (someone who works for this company) also mentioned, if you're trying to build a competitor to Google you're probably doing it in a stack like that to stay competitive. It's why you can get a new router for like $15 but your Raspberry Pi with a full Debian stack on it can't handle heavy networking like VPN. It's also why your average C/C++ developer is making a lot more money than your average joe writing stuff in scripted languages who's scared of these things.

PlayStation's SDKs in particular are a strictly C/C++ (and I think C# via Xamarin) environment; you're not going to be making games in something like Rust or Go for that platform. Lua is an exception - see what Justine does with redbean [1], it's very lean and meant for this kind of embedded use, but something with its own full networking stack built into it like Python? Forget about that if you want to spare your users the trouble. It may compile, some may use it in their internal scripting engines, but it's just a huge "why would you do that" to me and everyone else I personally know involved in doing that kind of development. If we're talking about security here as well: if you know the stack you wrote from top to bottom, with very small (easier to audit), well-written libs, it's a lot more reliable to write your own, provided you're confident enough in these lower-level environments and use tooling for auditing (ASAN and UBSAN are real game changers as well), like what Justine mentioned here, than to put up with some bug down the line in some interpreter you didn't write; issuing patches for every single user just to get the interpreter's implementation secure again is what you end up doing.

[1] https://redbean.dev/


Google search is a bunch of C++ code by circumstance, I'm sure it could be a bunch of Java and do just as well. A lot of Google's other properties are Java-based, so yeah. I haven't found a major difference in compensation between competent people who know different languages. Compensation bands at most companies are almost always tied to experience and role rather than language skills.


Google's use of Java came at enormous cost and aggravation of legal settlements to the tune of nearly 9 billion dollars on the line, however. [1] They won, but it would have had ugly implications for all of the rest of software if they had not. That was eleven years of legal wrangling where we were all wondering what the devil the outcome would be and for a while it was pretty ugly. The ultimate supreme court decision came as a surprise to many. Would you rather in their position ever do that again, especially in light of their new pay for play yearly subscription licensing scheme per computer for usage? [2] It's extremely limiting and costly for any organization to be tied to Oracle's product at this point.

As for salaries when looking at the data I stand somewhat corrected, you're probably right on compensation based on experience, too. It looks like the top three averages on the latest survey I can quickly find are Rust, Go, and Scala. [3]

[1] https://en.m.wikipedia.org/wiki/Google_LLC_v._Oracle_America....

[2] https://www.theregister.com/2022/03/22/oracle_starts_to_incl...

[3] https://www.zdnet.com/article/developer-jobs-and-programming...


> Google's use of Java came at enormous cost and aggravation of legal settlements to the tune of nearly 9 billion dollars on the line, however.

Google's use of Java in Android. I can't claim to know the exact details of how the company reacted to the lawsuit internally, but I will point suggestively at the fact that Google is still running most of their Java code in production and continues to write more, and their support for Java on Android stagnated at Java 8-ish and Kotlin is now heavily promoted as the language to write software for that platform.


Thanks for explaining. Please note I don't work for Google. I did four years ago.


I'll remember that in the future, apologies


> In fact, ASAN is so important, that even languages like Rust need to use it

Yes…

> since it makes the unsafe keyword safe!

No. It's a debugging tool to help, but saying it makes the unsafe safe is a somewhat dangerous hyperbole.


[1] was discussed here:

Why pledge(2) or, how I learned to love web application sandboxing - https://news.ycombinator.com/item?id=13037442 - Nov 2016 (73 comments)


Oh god yes please. If pledge() makes it into the mainline kernel, we OpenBSD-type people seem less weird when submitting patches for some syscall "nobody uses".

Many OpenBSD ports contain pledge() support. Steal all you can. OpenBSD does "hoisting" in many userland programs, meaning all the open calls and such take place at the very start, but the input isn't actually parsed until after we have locked ourselves in with pledge() and unveil(). I hope this also means many non-base programs are redesigned in this manner.
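The hoisting pattern described above can be sketched in C as follows. This is only an illustration under stated assumptions: `restrict_self` and `count_lines_hoisted` are hypothetical names, and the `pledge("stdio", NULL)` call is only compiled on OpenBSD; on other systems the sketch uses a placeholder where a platform-specific sandbox call would go.

```c
#include <stdio.h>
#ifdef __OpenBSD__
#include <unistd.h>
#endif

/* Drop privileges once every resource is open. On OpenBSD this is a real
   pledge(2) call; elsewhere it is a placeholder so the sketch compiles. */
static int restrict_self(void) {
#ifdef __OpenBSD__
  return pledge("stdio", NULL);
#else
  return 0; /* assumption: substitute your platform's sandbox call here */
#endif
}

/* Hypothetical helper: counts newline characters in the file at `path`.
   Returns -1 if the file can't be opened or the sandbox can't be entered. */
long count_lines_hoisted(const char *path) {
  FILE *f = fopen(path, "r"); /* 1. hoisted: open before sandboxing */
  if (!f) return -1;
  if (restrict_self() == -1) { /* 2. lock ourselves in */
    fclose(f);
    return -1;
  }
  long lines = 0; /* 3. parse untrusted bytes with reduced privileges */
  for (int c; (c = fgetc(f)) != EOF;) lines += (c == '\n');
  fclose(f);
  return lines;
}
```

The point of the ordering is that the code most likely to contain an exploitable bug (the parser) only ever runs after the process has given up the ability to open further files.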


I love this [philosophy,mindset,culture].


I too would love pledge() in Linux. But as I pretty much implied in this comment thread (https://news.ycombinator.com/item?id=32097649) the project here won't get us there, and it does need to be part of the kernel.

Of course the problem is that it needs some cooperation between libc and the kernel. That's why it's easier for OpenBSD to provide it.


> Of course the problem is that it needs some cooperation between libc and the kernel.

You may have missed that our new pledge() impl is hosted in the Cosmopolitan Libc repository. So we've got cooperation. Therefore it'd help to clarify when you say this project won't get us there, that you're only speaking for glibc. pledge.com works fine with everything else like Musl and Cosmo Libc. It even works with glibc most of the time. One can't reasonably expect cosmo to need to support glibc before being able to support itself. We've learned today that folks want this. So the Glibc dev team should be doing more to help us. What I fear is that they're focused on different things.


Like I said: it needs cooperation with both kernel and libc.

You have libc, which does mean it's in sync with dynamically compiled binaries against your libc, with the other drawbacks I mentioned too.

And you need kernel maintainer cooperation too to "bring pledge to Linux".


> That's why it's easier for OpenBSD to provide it.

That and they invented it.


If it were just that then porting it to Linux would not have taken this long. OpenBSD were able to copy things from Linux (e.g. some grsecurity/PAX stuff), and vice versa. But in this case the org model is getting in the way.

OpenBSD is able to provide these things because they have a different organizational model.

Or to put another way: Linux is shipping the org chart.

OpenBSD can do some other interesting things because they don't have the mantra of "we don't break user space". Neither is wrong, just different. High level strategies like these affect commercial success and also the abilities to ship technical solutions.

Or on the topic of the article here: Yes, if you replace libc, restrict to just one arch, and put many other restrictions on userspace, then under those circumstances your only "opponent" is the kernel, and you can implement something partially like pledge().

But really, that means that you solve the multiparty problem by kicking out everyone who's in your way, who you can't get rid of (namely the kernel).


My observation is that you like the sound of your own voice.


We don't do that here.

You may want to familiarize yourself with the Hacker News guidelines: https://news.ycombinator.com/newsguidelines.html


Hi jart,

I noticed that you're missing the sendfd promise. Is there a reason for that omission? Also you've added a number of extensions that perhaps should be documented as such.

Also there's a pretty significant undocumented difference between your implementation and OpenBSD, in that promises are inherited across exec in yours, and not on OpenBSD.

OpenBSD experimented with that by specifying the second argument as execpromises, but discovered that it was extremely difficult if not impossible to write meaningful execpromises in the parent for potentially unrelated programs, it's also happening too early before dynamic linking, and before the program has had an opportunity to "initialize" or "do privileged stuff" and call pledge(2) later itself with more accurate knowledge of what to self-restrict. As such, no calls with execpromises set to anything but NULL/0 exist today. I'm curious how to reconcile those differences.


sendfd is something we can add. Note that seccomp has limited visibility into recvmsg / sendmsg args because bpf can't dereference syscall arg pointers. I mention some of this in the caveats section. As for execpromises, it's documented in the Cosmopolitan Libc source code, but that got lost in translation when I copied the docs to the website. I just updated the site. I also appreciate the eyeballs. Something like this deserves critical examination. There's so much breadth to the modern system call interface and pledge. I started working on this about a month ago. With the support of folks like yourself, I think we'll have something really nice that will benefit Linux users! I'll definitely be doing more to make the C API as compatible with OpenBSD as possible.


> I'll definitely be doing more to make the C API as compatible with OpenBSD as possible.

One suggestion I might add, it would be worth trying to compile any of OpenBSD's privilege separated network daemons on Linux (w/ Cosmopolitan Libc, or others). While you may have intended to use this facility primarily for your own APE Binaries, I suspect you'll find that despite your intentions to make this compatible with the C API definition of pledge(2), in practice, your implementation is incompatible with privsep/privdrop software, for which pledge was designed. It was never intended for application "sandboxing".


pledge() wasn't intended for our awesome sandboxing tool? Well that just goes to show how brilliant the OpenBSD developers are, that folks like myself are finding great uses for their ideas and design that they didn't intend. We might not be able to live up to OpenBSD's model given the way Linux is, but I do believe we're going to have a better and more secure Linux thanks to the influence of OpenBSD.


The idea of a "pledge" utility isn't a novel one, it is my understanding that one was intentionally not provided.


OpenBSD pledges aren’t normally used for child processes, they’re more for “okay this program has super well-defined needs, let’s make sure (in its own source code) that it won’t be able to do anything else”. So a wrapper program wouldn’t really be normal usage.


> Note that seccomp has limited visibility into recvmsg / sendmsg args because bpf can't dereference syscall arg pointers.

BPF programs attached to syscalls (via kprobe or fentry) can read arguments via helpers (bpf_probe_read_{user,kernel}). Seccomp uses "classic BPF" which has no concept of helpers or calls.
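To make the limitation concrete, here is a minimal sketch of a classic-BPF seccomp filter. It assumes x86-64 Linux, and the names `kSocketFilter`, `socket_filter_prog` and `install_socket_filter` are made up for illustration; this is not the filter Cosmopolitan generates. The filter can compare the scalar `domain` argument of socket() because its value sits directly in `struct seccomp_data`. For sendmsg() or recvmsg() the same slot would only hold the address of a `struct msghdr`, which classic BPF has no way to follow.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>

/* Allow every syscall except socket() with a domain other than AF_UNIX. */
static struct sock_filter kSocketFilter[] = {
    /* 0: refuse to run on an unexpected architecture */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* 3: anything that isn't socket() is allowed */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* 6: args[0] is a plain integer (low 32 bits on little-endian) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_UNIX, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* 9: any other socket domain fails with EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
};

/* Wrap the instruction array in the structure the kernel expects. */
struct sock_fprog socket_filter_prog(void) {
  struct sock_fprog prog = {
      .len = sizeof(kSocketFilter) / sizeof(kSocketFilter[0]),
      .filter = kSocketFilter,
  };
  return prog;
}

/* Install the filter on the calling thread; no root needed thanks to
   PR_SET_NO_NEW_PRIVS. Returns 0 on success, -1 on error. */
int install_socket_filter(void) {
  struct sock_fprog prog = socket_filter_prog();
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) return -1;
  return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Once installed, the filter cannot be removed and is inherited by child processes, so a program experimenting with it would typically call `install_socket_filter()` in a forked child.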


> Note that seccomp has limited visibility into recvmsg / sendmsg args because bpf can't dereference syscall arg pointers.

I guess landlock can't help you here since it is still mostly about filesystem access right now, but maybe someday? It looks like "minimal network access control" is on the long term roadmap: https://landlock.io/


There is an ongoing work to support network access-control: https://lore.kernel.org/all/20220621082313.3330667-1-konstan...


this is one of the more lovely replies i've seen on HN


Great work!

>.. So how do we get it that simple on Linux? I believe the answer is to find someone with enough free time to figure out how to use SECCOMP BPF to implement pledge.

> There's been a few devs in the past who've tried this. I'm not going to name names, because most of these projects were never completed.

I guess I am also one of those. I am giving it a shot with my WIP sandboxing library, which aims at making sandboxing easier for applications in general: https://github.com/quitesimpleorg/exile.h. It also aims to fix the "file system blind spot" mentioned in the article, by using Landlock and Namespaces/chroot.

Though I am calling my attempt "vows" instead of "pledge" to avoid misunderstandings. At the end of the day, pledge() cannot be pledge() on Linux, due to limitations which the article also mentions.

Nevertheless, as has already been mentioned in this thread, mine, like all attempts, also suffers from the fact that one has to keep up constantly with kernel releases, and all software must be recompiled from time to time against new library releases. This is a suboptimal situation. Secondly, there are system calls which currently cannot be filtered with seccomp BPF, such as openat2() and clone3() and so on.

Therefore, at this time you cannot have pledge() on Linux properly. So I am putting it on hold until deep argument inspection lands.

Overall, my experience led me to believe that in order to have a true, practical pledge() on Linux, it must ultimately be implemented in the kernel.


Thanks for your work!

As someone else who's banged their head against seccomp and given up (put on hold) I have to say that you're missing one roadblock though. It's not enough that the kernel gets pledge(), but libc needs to cooperate too.

E.g. as I found in https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h... the first printf() you do will do a newfstatat() syscall.

So really there's no way for user space to know which syscalls will be called, just based on common sense. libc can call anything and everything.

And this is why I have less hope for a real pledge() on Linux.


apropos of recognizing your name, I want to say thanks for your blog post on ssh certificates. I relied on it a ton when I was writing a host and user ca 6+ years ago.


> There's been a few devs in the past who've tried this. I'm not going to name names, because most of these projects were never completed.

Oh, I'll just name myself (see code link below), I'm not ashamed that this didn't complete.

There are more fundamental reasons why these projects don't complete, and they apply to the project linked above as well:

- Maintaining lists of allowed syscalls across kernel versions and architectures is unreasonable to do, and it's a moving target. An allowlist-based approach might break when doing a kernel or glibc upgrade, which breaks the API backwards compatibility that the kernel otherwise guarantees.
- Core parts of glibc are doing initialization work on demand, which complicates matters (should ideally be done before enabling the sandbox, but this is not generally possible)
- Some pledge() promises can't be mapped correctly; examples:
  - the "dns" promise in cosmopolitan/pledge.c just permits all outgoing UDP connections, which is more than what OpenBSD permits, IIRC
  - the "exec" promise cannot elevate privileges again, as it does in OpenBSD
  - glibc name lookups as they are used for DNS are loading shared libraries on first use (libnss), and it's impossible to tell in advance what these will do unless you control the execution environment
- The pledge API itself is a moving target as well (compare https://man.openbsd.org/OpenBSD-6.0/pledge.2 and https://man.openbsd.org/OpenBSD-7.1/pledge.2)
- In OpenBSD, pledge was always path-aware, until they moved that part into the unveil() call. This is not possible on Linux with seccomp-bpf. (Well, yes yes, with signals that do syscall trampolines and very very tight control over the runtime environment, or with sidecar processes to do the supervision, but both of these require a lot more dedication and can't be just mapped to a call to "pledge()" on Linux.)

FWIW, my own (abandoned) implementation is at: https://github.com/gnoack/seccomp-scopes/blob/master/pledge....

IMHO the better approach on Linux is going to be Landlock (https://landlock.io) in the future. I'd encourage you to look into it.


Here's a landlock wrapper for Firefox: https://github.com/62726164/misc/tree/main/go/landlock/firef...

It's more restrictive than Firejail and is not suid.


> - the "exec" promise cannot elevate privileges again, as it does in OpenBSD

I agree with all of this fwiw, but I actually was very glad to see this. I find it really weird that the openbsd pledge lets you just escalate out of it.


> I find it really weird that the openbsd pledge lets you just escalate out of it.

It's only weird because the normal M.O. with facilities in other environments is to sandbox specific, high-exposure applications, which are typically run in isolation, such as system daemons or end-user GUI applications. The semantics of pledge and unveil were fine-tuned from OpenBSD developers systematically sandboxing almost the entire base system--most command-line utilities in OpenBSD are pledged, as are all daemons.

Inheritance is only a "problem" if you're exec'ing other programs, but the vast majority of programs, even command-line utilities, never need to exec, at least not during runtime (as opposed to startup). Most programs using pledge drop the ability to exec entirely (i.e. don't specify the "exec" pledge), so inheritance is irrelevant as there could never be anything to inherit. (Note that upon the first call to pledge, you lose any capabilities you don't explicitly request.) But if you do need to exec other programs, then inheritance can quickly become a problem--counter-productive, even, as workarounds typically involve the parent program preserving permissions which it might not need directly. (Note that this is also why pledge and unveil are superior APIs to relying on the invoker--e.g. systemd--dropping permissions as only the program itself knows best which permissions it needs and when.)

OpenBSD is developed as a comprehensive system, and it's in this context that you need to understand pledge and unveil. On OpenBSD, if program A exec's program B, program B should be making use of pledge and unveil itself, and if it's not then that can and will be fixed. Unlike in the cat-herding Linux ecosystem, OpenBSD developers have little reason for a security model that prioritizes the ability for program A to implement workarounds for deficiencies in program B; their habit is fixing program B or refactoring A and B so they can work better together.
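The narrowing and inheritance semantics discussed above can be captured in a toy userspace model. Everything in it is hypothetical (the `promise` bits, `proc_model`, `model_pledge`, `model_exec`); it neither calls nor reimplements the real pledge(2). It only encodes three rules from this thread: the first call drops everything not requested, later calls may only narrow (OpenBSD reports EPERM otherwise), and exec either starts the new image unpledged (OpenBSD with NULL execpromises) or carries the restrictions over (seccomp-style).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical promise bits, for illustration only. */
enum promise { P_STDIO = 1, P_RPATH = 2, P_WPATH = 4, P_EXEC = 8 };

struct proc_model {
  bool pledged;      /* false until the first pledge call */
  unsigned promises; /* meaningful only when pledged is true */
};

/* The first call keeps only what is requested; later calls may only
   narrow the set. Asking for more fails with EPERM. */
int model_pledge(struct proc_model *p, unsigned requested) {
  if (p->pledged && (requested & ~p->promises)) {
    errno = EPERM;
    return -1;
  }
  p->pledged = true;
  p->promises = requested;
  return 0;
}

/* Models exec. Returns -1 if the parent pledged away exec (the real
   OpenBSD kernel would kill the process rather than return an error).
   inherit == false: the new image starts unpledged (OpenBSD style).
   inherit == true:  the new image keeps the restrictions (seccomp style). */
int model_exec(const struct proc_model *parent, bool inherit,
               struct proc_model *child) {
  if (parent->pledged && !(parent->promises & P_EXEC)) return -1;
  if (inherit) {
    *child = *parent;
  } else {
    child->pledged = false;
    child->promises = 0;
  }
  return 0;
}
```

In this model the disagreement in the thread comes down to the `inherit` flag: with `false`, program B is expected to pledge itself after it starts; with `true`, program A has to predict what B will need.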


Landlock suffers from the same limitations as seccomp when it comes to pledge-emulation: you can't opt out of inheritance.


That's a good thing.


No. Having inheritance a default is fine.

Not having an option to opt out when your program voluntarily tries to secure itself is terrible because it means fewer programs can use it to secure themselves. A classic example would be an ssh server. You want to factor it into several components, one that talks to the network and a privileged (root) part that spawns the child processes with arbitrary uids. The latter part can reasonably pledge away a lot of its privileges but its children must run unrestricted (but often on unprivileged users).


This is a good example that really helps to explain things. SECCOMP's model of monotonically decreasing permissions seems the most intuitive. However with something like an SSH server I imagine the great fear is a remote exec compromise. pledge() fixes that, since after you call pledge, you can't create new PROT_EXEC memory. Plus OpenBSD enforces a W^X invariant so we can say for certain nothing like a pre-existing executable stack gets grandfathered in. So what they're doing seems reasonable to me. It's an added feature that our Linux polyfill can't offer. But that doesn't mean our pledge() doesn't work as advertised. We simply offer a subset of behaviors. It's not a disjoint paradigm that some people here have made it out to be.


All that's required to emulate "no inheritance" is a broker.


Yes but we can't always assume a broker is installed. For instance I distribute a lot of foss software. I don't want to have to ask my users to install and adopt these other tools in order to use my code. I don't want to have to ask people to do things like run my security daemon as root. What I like about this new pledge.com program is it doesn't need root. I could just incorporate it into a build process or automated tooling for instance, and it would just work, and wouldn't inconvenience anyone. The only thing it needs is a Linux Kernel from the last 12 years.


You can package the broker yourself and launch it before you pledge.

Keep in mind that if you can:

1. Write to a file
2. Make that file executable (or write to an already-executable file, or an executable that provides scripting/code exec natively)

And you have the `exec` capability, the sandbox is trivial to bypass in the face of code execution in the pledge'd process.


Yes that's true. It's something that I considered. I was reluctant at first to restrict changing the executable bit since OpenBSD lets this happen:

    int main(int argc, char *argv[]) {
      CHECK_NE(-1, pledge("stdio rpath wpath cpath exec", 0));
      CHECK_NE(-1, open("/tmp/doge.exe", O_CREAT | O_WRONLY | O_TRUNC, 0755));
      return 0;
    }

But if other people are noticing it too, then I think I might change the tool to simply disallow creating new executables, since that makes more sense to me. But naturally it would make even more sense for OpenBSD.


The bypass is so significant that I kind of wouldn't bother. To me I think that it's probably best to just assume that this type of sandboxing is not capable of resisting an attacker with code execution, which is fine. It could instead be for things like path traversal attacks in web servers, or other design flaws that would allow "tricking" the application into performing actions you don't want.

I mean it's probably a good idea to close the trivial version of the bypass by disallowing setting exec on files (although you need to check the path because you may want to set it on a directory), but if you can execute `chmod`, write to the i-node directly, write to any other executable, write to your own executable, etc, that's just a full bypass.


But that's the issue. In the example I gave we already have a broker (the root process spawning ssh shells). But we want to restrict the broker too to make it more difficult to exploit. To do that we need to pledge without inheritance.


I wouldn't call sshd a broker, at least, it's not exclusively one.


No that's a linux thing. The OpenBSD way is to do things that could be abused later early and then drop privs to prevent abuse. How is some program going to know what other programs you use will need? How is the dynamic linker of the later program supposed to deal?


> The OpenBSD way is to do things that could be abused later early and then drop privs to prevent abuse.

That is the approach if you can't regain privileges. In the OpenBSD way it is literally the opposite of what you propose.

> How is some program going to know what other programs you use will need?

It doesn't? Same issue on linux/openbsd in that regard, with some aspects a bit different due to libc.

> How is the dynamic linker of the later program supposed to deal?

It obviously doesn't, again, same thing on Linux and OpenBSD. It also won't help you to be able to regain privileges via exec if the issue is a shared library.

Allowing for privileges to not propagate is a cool feature that allows you to shell out to a binary that you can't write to (otherwise it's pointless) and have that binary manage its own permissions. That's sometimes important, and Apparmor supports that sort of thing (with child profiles).

But it's also a big footgun that can easily allow a full sandbox bypass.

You can implement the child profile concept easily enough with a privileged broker anyway.


Oh yeah, is this not a clear bypass? Write an executable file to disk and then... execute it. As far as I know that is all that's required to completely bypass pledge?

I assume/ hope this is something you can control with better filesystem controls?


Once you've pledged away exec() access this doesn't seem like an issue?


Of course, but if you've pledged away exec inheritance is obviously not a problem at all.


I guess the TL;DR is: This is a cool project, don't get me wrong :)

I just feel that it's difficult to do in Linux's heterogeneous environment where everyone uses their own kernel configuration and libc variant... the reason is not just the difficult C API (with BPF in it...) but it's also the surprises and weak guarantees in the environment where these programs run.

We should at some point be able to do unprivileged sandboxing, but seccomp-bpf may not be the way to do this at scale.


It was mentioned in the Discord that instead of doing `curl something | sh` you could do `curl something | ./pledge.com sh`. That looks like an interesting way to see what all those install scripts really want to get access to.


Very nice! I'm a fan of OpenBSD and pledge(). I've had some success on Linux with libseccomp[0] which means you don't have to deal with BPF directly, but pledge() is obviously much much easier.

0. https://github.com/seccomp/libseccomp


I'm just going to dump some other links to pledge just for others that are interested. Here's some presentations on attempts at natively implementing pledge in Linux (YouTube's auto-translate does a decent job) [1][2].

The topic of a pledged process starting other processes un-pledged often comes up (and already has done in the comments here). I'd recommend checking out this section of Theo de Raadt's presentation that explains why this is [3].

As mentioned in the article the nice thing of pledge on OpenBSD is the integration of the pledge interface with the reality of underlying system. So as one example a program can pledge only dns and say not have filesystem access, but really under the covers it can read /etc/resolv.conf.

[1] https://www.youtube.com/watch?v=uXgxMDglxVM

[2] https://www.youtube.com/watch?v=PK7gETZURx0

[3] https://youtu.be/Er44ur7wkXQ?t=1497


If you find this post interesting, and would like to follow (or take part in) the discussions that happen around these new developments, I highly suggest you pop into the redbean discord! It covers more than just the scope of redbean and the community is really blossoming

https://discord.gg/mvkhxRaW


This is great

If you like I can run pledge against my more complicated app to see if it complains. I did it on a simple app and it seemed to not like how I used memory map (MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE). I also do MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_STACK | MAP_GROWSDOWN which might be weird to pledge too


MAP_POPULATE is intentionally disallowed currently by the Cosmo Libc pledge() implementation, because it's Linux-specific, I've rarely seen it used, and I was erring on the side of caution since I wasn't sure how much prefaulting reads could be considered a borderline privileged operation. If you remove that flag, then mmap() should work fine.
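For code that wants the prefaulting effect without passing MAP_POPULATE, one portable option is to create a plain anonymous mapping and touch each page by hand. The sketch below assumes nothing Cosmopolitan-specific; `map_prefaulted` is a hypothetical helper name, and whether manual touching matches MAP_POPULATE's performance for a given workload would need measuring.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` bytes of private anonymous memory and write one byte per
   page so the kernel backs every page up front. Returns NULL on failure. */
void *map_prefaulted(size_t size) {
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;
  long page = sysconf(_SC_PAGESIZE);
  volatile unsigned char *bytes = p;
  for (size_t i = 0; i < size; i += (size_t)page) {
    bytes[i] = 0; /* the write fault allocates the page */
  }
  return p;
}
```

The caller releases the region with `munmap(p, size)` as usual.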

MAP_STACK can be used correctly with Cosmo by saying MAP_STACK | MAP_ANONYMOUS. That's all you need. You can also call Cosmo's _mapstack() function. Cosmo is somewhat unique in that it polyfills the platform-specific underlying flags automatically when you do that. So you shouldn't include MAP_GROWSDOWN because we chose to adopt the FreeBSD behavior which we polyfill on Linux using MAP_GROWSDOWN. See our docs for further details. https://github.com/jart/cosmopolitan/blob/0a589add4167c1b587...


I see. I should check again because things can change, but my app is performance sensitive and when I last checked it seemed to give me a boost since I was writing to the memory so quickly.

One of the reasons why I use MAP_GROWSDOWN is because it gives me a page guard. Your link mentions/implements a page guard. Well done

I tried my complicated app for fun removing the MAP_POPULATE flag. I get an assert because it seems like mmap didn't align the pointer to 4K. I don't need this to work I was just playing around with it


As you wish. I've just pushed a change removing the restriction on MAP_POPULATE. https://github.com/jart/cosmopolitan/commit/ccd057a85daf0d2d... Now that we have -n (maximum niceness flag) I doubt anyone will object. If your app is public and you're building it with Cosmo, then I look forward to the opportunity to see it someday. Please stay in touch. You're also welcome to join our Discord community! https://discord.gg/mw3j3sa2


Having tried this myself, I don't believe in this implementation. The list of syscalls is not only dynamic across kernel versions, but even differs between architectures.

You REALLY need a committed team to maintain this, and likely the ability to fix things in an /etc/pledge.conf or something, as syscalls are added and changed.

The hard part is not what jart did, but to keep it working.


The Cosmopolitan Libc project has a very modest scope that isn't nearly as big as you suppose. We only support x86-64. We're not trying or claiming to be the ones who will carry the pledge() burden for the whole Linux world like glibc and Musl too, even though (so far) our implementation seems to do a reasonable job at that.

As for the moving target of the Linux system call ABI, we mostly use a whitelist model so I don't see why it should matter, unless your concern is that we keep up with the latest new features. My humble opinion is there are very few system calls Linux has introduced in the last ten years that I personally care about using. Other people might care about things like io_uring and statx() but I tend to stick with the classic calls.

Keep in mind, Cosmopolitan Libc binaries are designed to run on six operating systems. I'm probably the only person on earth who's written a libc for that many platforms. Here's what I learned. If you look at https://github.com/jart/cosmopolitan/blob/master/libc/sysv/s... you can see that there's a point in history where consensus between systems drops off. It probably happened sometime around the year 2000, and since then Linux has mostly just gone its own way as far as UNIX systems are concerned. There's only been a select few system calls introduced in the last twenty years that every single system was quick to adopt, e.g. getrandom(), pipe2(), openat(), fstatat(), etc.

So I really don't think it's as complex as folks here are making it out to be. It's simply a question of focus. We aren't solving every problem. But the few things we are doing, we do very well.


I'll admit to not knowing this context around it, but I guess yes if you only support binaries linked against your own libc (and only dynamically), and only on one architecture, then it should work.

But as you say, it's quite limiting for userspace trying to use things that maybe your libc is not aware of.

Compare how OpenBSD is actually disallowing syscalls outside of their libc. That's the level of control that's needed for it to work, and I don't think Linux lets you do that.

My point is: No, this does not "port pledge() to Linux". I'm sure you agree that it has many, many asterisks, to the point that the title built up a hope it can't live up to.

I wish it did, but I don't see it.

> We aren't solving every problem.

It's a good effort. But the problem isn't the narrow scope. The problem is "Porting OpenBSD pledge() to Linux", which I don't think is an accurate description of what this does.


> Compare how OpenBSD is actually disallowing syscalls outside of their libc.

And Cosmopolitan Libc actually disallows syscalls to the OpenBSD Libc whilst running on OpenBSD. If this surprises you then you've misunderstood the intent behind msyscall(), which Cosmopolitan's pledge() implements on Linux too. See the "Syscall Origin Verification" section of the blog post https://justine.lol/pledge/#msyscall The basic idea is you can choose whatever set of system call wrappers you want, put them into one memory location, and then the kernel will check the RIP register to make sure that SYSCALL is only being used from those addresses. Their choice to start doing this is kind of funny because it turns C libraries into a game of Highlander. There can be only one.


The only libc in the address space is a very attractive ROP target :)


Syscall origin verification is orthogonal to pledge and is not really even a security feature at all without strong CFI. The general consensus right now is that strong CFI is kind of a mythical unicorn that doesn't exist. pledge remains useful even though this is true, so syscall origin verification is not a prerequisite.


Yeah I said "compare" just as an illustration of the control that OpenBSD can and does exercise over the interface between user space and kernel space.


Yeah, that is true. IMO seccomp is kind of not useful unless you own the libc.


Could the CLI tool be combined with something like firejail [1] to solve the filesystem blindspot issue?

[1] https://firejail.wordpress.com/


I think there have to be more than 7000 OpenBSD users. The link to that claim was not responding when I clicked it.


It's a joking reference to an ancient copypasta troll/meme alleging that bsd is dying, cf the phrase "netcraft confirms it" and troll comments on slashdot 20+ years ago. :)


Of course, now I remember, those comments would cite fake sources estimating usage numbers, and probably some of them claimed to have data from de Raadt.


Yes, that statistic is from 20 years ago (11/02)


I was an OpenBSD user then. What are the odds I was 1 out of 7000?


It seems to be some copypasta meme from 2003


...maybe that means there are only 6,999 users now... I KEED...


> Pledging causes most system calls to become unavailable. Your system call policy is enforced by the kernel, which means it can propagate across execve() if permitted.

My understanding is that promises aren't inherited across execve. So that's an incompatibility with openbsd's pledge. And a pretty important one imo because it makes it more difficult to factor out privileged subprocesses (e.g. one doing network things, the other accessing filesystems).

> File system access is a blind spot. OpenBSD solves this with another famous system call called unveil()

That could probably be approximated with user + mount namespaces, a tmpfs and bind mounts. Basically what containers do. But that might suffer from the same process inheritance problems if unveil is BSD-specific; the manpage is unclear about unveil's exec behavior.


I was introduced to pledge via SerenityOS and it really is amazing


And here's Andreas porting Justine's pledge utility to SerenityOS after having seen this very post: https://youtu.be/T6YkQF6ohoA


Wow nice! The system call has been in Serenity for a long time I think


Very neat! Unfortunately can't get the `curl` example to work no matter what I do (on Arch Linux).

    $ pledge.com -p 'stdio rpath wpath cpath dpath flock tty recvfd fattr inet unix dns proc thread id exec' curl http://justine.lol/hello.txt
    curl: (6) getaddrinfo() thread failed to start

I tried following the Troubleshooting section and looking through the strace output, but I'm not sure what I'm looking for. I see a few EPERMs for calls whose purpose I don't know: rseq, set_robust_list, and sysinfo, to name a few.


It works fine for me if Curl is built with Musl Libc. I can see you're using a very cutting edge glibc. I tried reproducing this on Debian 10. The only calls that got EPERM'd were set_robust_list() and prlimit64(). I recompiled pledge.com by adding those, and Curl is still failing for reasons that aren't obvious. I've encountered issues like this in the past. Personally, I've always solved this by keeping a healthy distance from Glibc by using things like Musl and Cosmo. However I want to see Glibc users benefitting from our work too! So I'd welcome contributions from anyone who can help us do that.


That could mean that the clone3 system call fails with EPERM instead of ENOSYS. Suppressing system call implementations with ENOSYS is generally safer because it just looks like an older kernel, while EPERM is a regular error code for some system calls.

Put differently, ENOSYS tells userspace that the system call isn't implemented and you need to use some fallback code. EPERM means that the operation was denied by policy. But in that case, it might not be a good idea to approximate it by using different system calls because it might circumvent the intended security policy.
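
To make the ENOSYS suggestion concrete, here is a rough sketch of a SECCOMP BPF filter that makes clone3 look unimplemented while allowing everything else. This is not how pledge.com builds its filter (that one is described in this thread as mostly a whitelist); it only illustrates the errno choice. The function name is made up, the fallback syscall number 435 is the x86-64 value, and a real filter should also validate `seccomp_data.arch`, which is left out here to keep it short.

```c
#include <assert.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435 /* x86-64 number, for older kernel headers */
#endif

/* The errno is carried in the low 16 bits of the filter's return value. */
static_assert(ENOSYS <= SECCOMP_RET_DATA, "errno must fit in the data field");

/* Hypothetical helper: installs a filter under which clone3 fails with
   ENOSYS, as if the kernel predated it. Returns 0 on success, -1 on error.
   The filter is irreversible for this process and its children. */
int make_clone3_look_unimplemented(void) {
  struct sock_filter filter[] = {
      /* Load the system call number. */
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
      /* If it is clone3, fall through; otherwise skip one instruction. */
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone3, 0, 1),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ENOSYS),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
      .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
      .filter = filter,
  };
  /* Required so an unprivileged process may install a filter. */
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) return -1;
  if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == -1) return -1;
  return 0;
}
```

With a filter like this in place, a libc that probes for clone3 should see ENOSYS and take its older clone() path, which is the fallback behavior described above.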


Excellent work Justine! You consistently create interesting high quality projects and it’s turned me into a fan :)

Keep it up! I’ll be upping my sponsorship in the hopes that you can continue doing this full time.


Thank you!


Well done, Justine! This is absolutely fracking BRILLIANT. Thank you for this, especially a useable CLI utility! I'm going to have a lot of fun unmasking bastards with this thing :-)



It seems you can disable[0] networking with `unshare --map-root-user --net`.

[0] I don't think it's meant to disable it, I just know too little about namespaces to appreciate how you would unshare a new network namespace and have it keep working.


I don't think Linux will get pledge() for a long time, and this definitely won't do it in the general case either.

More specific critique of seccomp here: https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h...


I can see it is a very simple and easy way to restrict stuff but it is also very basic (allow or deny file write ops anywhere).

There is AppArmor which is much more flexible and it is not as complicated and annoying to use as Selinux.

Disabling the internet access is just one line per network:

  /usr/bin/prog { 
    deny network inet,
    deny network inet6
  }

I don't really see the need for yet another way to restrict stuff. I think it makes more sense to create more tools to work with AppArmor, make writing profiles easier, instead of reinventing another security wheel.


Looking at https://unix.stackexchange.com/questions/135115/apparmor-pro... wouldn't raw sockets still be allowed under your blacklist policy? What about bluetooth and other radio interfaces? If an AppArmor evangelist can let so much through accidentally then that's a red flag. The thing that confuses me the most about AppArmor is it appears to be part of Linux, but I have no idea where its API exists. For example, there's nothing in the system calls table that says AppArmor. I tried running `aa-exec ls` under strace and I don't see any system calls that appear to be doing sandboxing. I tried putting your suggested config in an ls.apparmor file and passing that along as `aa-exec -p ls.apparmor ls` but it didn't work, since I think it requires you put profiles in /etc as root. That kind of defeats the whole point of us wanting a non-root sandboxing tool.


I am just its (mostly happy) user. I wrote that rule on my phone so you are right, it is not complete. On the other hand raw sockets require CAP_NET_RAW capability which is often assigned to root only so running a capability-untreated binary as an unprivileged user should not allow any raw socket ops (ping often uses file capabilities or setuid root).

AFAIK it requires root to load/reload profiles. And that is fine for me, my use-case is hardening of services running on my server.

For ad-hoc restriction of untrusted software you can already use stuff like FireJail https://firejail.wordpress.com/

But I agree that software developers know their software best, so they should be the ones writing the rules and ideally configuring them depending on automake/configure paths (i.e. a different PREFIX; this software/profile separation is annoying). But pledge() looks too old and inflexible for such a job IMO. Most software needs file ops but doesn't need to write everywhere.


> Firejail

I'm sure it's great but it requires setuid privileges. If it needs root it isn't ad-hoc.

> But I agree that software developers know their software the best so they should be the ones writing the rules

Exactly! You get it. pledge() is basically an App Store permissions model in spirit. It's curated and, like Android / Apple devs, the developer is thinking about what permissions they'll need to ask for at each step of writing their program. Not needing root is an important aspect of enabling that. The good news is that with SECCOMP BPF and Landlock I think we finally have a comprehensive solution for building the perfect unprivileged sandbox.


> Your pledge command imposes some perfectly reasonable resource quotas on programs by default, to prevent that from happening. By default, unless you tune the flags, a program is allowed to use only 4gb of memory and, if you've permitted it to fork off new processes, then it won't be able to spawn more of them at the same time than twice your number of CPUs. That way your sandbox won't compromise the stability of your machine.

4gb and perfectly reasonable are two different things. I'm not sure if there is any sensible default memory limit but if there is one it should probably scale with available memory instead of just hardcoding some magic amount that "ought to be enough for everyone".


I've updated the tool to make the default virtual memory limit equal to the total amount of physical ram. https://github.com/jart/cosmopolitan/commit/7f966de48987f79a... I hope this addresses your concern. No default limit is going to be perfect. Some use cases might want a lot of overcommit for sparse scientific computing. Others might have a ton of RAM, but it's because they want to run thousands of automated programs, rather than one program that dominates the system. I think this is a better limit that's going to help make sure no one runs into any surprises, while still helping to protect the system from an unintentional memory bomb.

New binary available at https://justine.lol/pledge/
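
For readers curious what a default like that can look like in code, here is a small sketch. It assumes the limit in question is `RLIMIT_AS` (the address space limit) and uses `sysconf(_SC_PHYS_PAGES)`, which is available on Linux; the function names are invented for this example and the linked commit may do it differently.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: total physical RAM in bytes, or 0 if unknown. */
uint64_t physical_ram_bytes(void) {
  long pages = sysconf(_SC_PHYS_PAGES);
  long pagesize = sysconf(_SC_PAGESIZE);
  if (pages <= 0 || pagesize <= 0) return 0;
  return (uint64_t)pages * (uint64_t)pagesize;
}

/* Hypothetical helper: lowers the soft virtual memory limit to the amount
   of physical RAM. It never raises a limit that is already stricter.
   Returns 0 on success, -1 on error. */
int limit_virtual_memory_to_physical_ram(void) {
  uint64_t ram = physical_ram_bytes();
  struct rlimit rl;
  rlim_t want;
  if (ram == 0) return -1;
  if (getrlimit(RLIMIT_AS, &rl) == -1) return -1;
  want = (rlim_t)ram;
  if (want > rl.rlim_max) want = rl.rlim_max; /* stay under the hard limit */
  if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= want) return 0;
  assert(want <= rl.rlim_max); /* setrlimit() would fail with EINVAL otherwise */
  rl.rlim_cur = want;
  return setrlimit(RLIMIT_AS, &rl);
}
```

A command runner would call something like this right before execve(), since resource limits are inherited by the program it launches.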


For the niceness part, I wrote a small helper [1] quite some time ago, and it really is handy in some situations.

[1] https://codeberg.org/post-factum/litter


It’s stuff like this that really, really entice me to give openbsd as a daily driver a shot.


You could be user 7001.


i am now user 7001. :-)

spun up a vm and am googling a lot


jart, you are a crazy person.

Nicely done and if we're ever at the same conference link this comment to me and I'll buy your next drink.


I understand why pledge is useful when a trusted program is trying to lower its own privilege to limit the blast radius.

However, I never really quite understood the idea of limiting the permissions of untrusted programs and hoping they won't pwn you anyway. If you remove network access and allow file system access, what's preventing the program from invoking itself in your .bashrc? (Or probably 10s of other ways)


For the laymen among us, this[0] StackExchange answer has a good explanation of what pledge does.

This seems like a pretty cool feature to have in Linux.

[0] https://unix.stackexchange.com/questions/410056/what-is-open...


How does this approach for sandboxing compare to the bubblewrap that uses namespaces? https://github.com/containers/bubblewrap


Great work! Should we reverse "exec" and "execnative" because the first is more APE oriented (maybe "execape"?)? Just thinking from the perspective of adapting to more Linux use cases.


The pledge.com command always enables execnative because it needs to be able to call execve() after pledge() is called. It's one of the weaknesses of using pledge() as a command runner rather than as a C API library. In terms of the C API, that's up for debate, because our implementation of that is primarily intended to cater to Cosmopolitan Libc's way of doing things, where APE is the preferred binary format. It wouldn't feel right to call it exec if it can't execute our own binaries! However, we still do the responsible thing by providing a way to opt out and only support the native ELF format.


If you liked this you might like [0].

0: https://www.youtube.com/watch?v=-a5hLBuW6tY


Damn, I was looking for pledge on Linux not too long ago, but I think I might go with cloudflare's sandbox or bwrap instead.


Does anyone have a good primer on pledge, how it works, syntax, etc?



Many thanks :)


It's so great, and nobody is porting this to the other BSDs?


FreeBSD already has capsicum which unfortunately is more complicated than pledge/unveil:

https://wiki.freebsd.org/Capsicum


It’s not really more complicated; it’s just that Capsicum implements an actual security model instead of a random hodgepodge, like with seccomp or pledge, and that means one has to fit the application into that model.


Unfortunately the Linux port was never incorporated and is apparently now abandoned: https://github.com/google/capsicum-linux Then, if you're serious about capabilities, as you should be more-or-less, you might want Genode (posted here fairly recently) or something else, where they're not grafted in.


oh my god. the plebs are coming.

using strings for 16 categories is certainly not "elite". bits would be fine.


pledge(2)'s predecessor used bits, it was inflexible: https://flak.tedunangst.com/post/string-interfaces


not inflexible, they just didn't like prefixes. and since Theo doesn't care about performance, he just used strings.
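
As a side note on the strings versus bits argument, the two are not mutually exclusive: a string interface can be turned into bits with a few lines of parsing. The sketch below is purely illustrative and is not OpenBSD's code. The promise names are a subset of the real ones, but the table, the bit positions, and the `parse_promises` name are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of promise names; bit i corresponds to entry i. */
static const char *const kPromiseNames[] = {
    "stdio", "rpath", "wpath", "cpath", "inet", "dns", "proc", "exec",
};
#define PROMISE_COUNT (sizeof(kPromiseNames) / sizeof(kPromiseNames[0]))

/* Hypothetical helper: parses a space-separated promise string into a
   bitmask. Returns 0 and stores the mask on success. Returns -1 and leaves
   *out_mask untouched if any word is not a known promise. */
int parse_promises(const char *promises, uint32_t *out_mask) {
  uint32_t mask = 0;
  const char *p = promises;
  assert(promises != NULL && out_mask != NULL);
  while (*p) {
    size_t len, i;
    p += strspn(p, " ");   /* skip separators */
    len = strcspn(p, " "); /* length of the next word */
    if (len == 0) break;   /* only trailing spaces were left */
    for (i = 0; i < PROMISE_COUNT; i++) {
      if (strlen(kPromiseNames[i]) == len &&
          memcmp(kPromiseNames[i], p, len) == 0) {
        mask |= UINT32_C(1) << i;
        break;
      }
    }
    if (i == PROMISE_COUNT) return -1; /* unknown promise name */
    p += len;
  }
  *out_mask = mask;
  return 0;
}
```

The parsing cost is paid once per pledge() call, and new promise names can be added to the table without changing the function signature or breaking existing callers.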


Wait until you see how macOS does sandboxing.


Isn't the rant about chroot kind of addressed by filesystem namespaces in Linux?


pledge() is not chroot-like. unveil() is, kinda. But pledge() is much cooler.


I didn't say it was. There's a long section about chroot() under "Caveats."


Ah, sorry I misunderstood.

Addressed, yeah, but I would not say solved for the general case of all the namespaces.

"Just put me in a (sand)box" is actually really tricky with namespaces, and depends on if you started off as root or not.

More on using namespaces to drop privs: https://blog.habets.se/2022/03/Dropping-privileges.html (another backburner project)

It's early morning so I may be wrong, but my testing seems to show that actually yes you can still fchdir() your way out of a file system namespace.


You're right, it doesn't address the file descriptor leak, only the root restriction (well, user namespaces address that).

But that isn't really an issue with chroot (or namespaces). It's (1) that CLOEXEC is opt-in, not opt-out, and (2) that you need this poll hack to enumerate open file descriptors.


Sorry, but the elitism of OpenBSD is a huge red flag for me. It is the person who is confident they know it all who is the most dangerous, especially when they are the most skilled. Arrogance is something to be ashamed of but also very dangerous, and humility is something to be valued (but also a most redeeming feature).


How is this related to pledge? Even if you hate everyone that works on OpenBSD, software not being allowed to make system calls it doesn't need to make is a good security practice. And with a Linux implementation of it, you don't need to involve yourself with OpenBSD. Steal the good parts, basically.



