Awesome! It looks like they have some tests in the directory -- I'd be curious whether they're able to re-use musl's test suite some. I had an experiment a while ago to incrementally port musl to rust while keeping the test suite passing: https://github.com/anp/rusl. Haven't put any time into it in a while, but I think matching musl's program-facing behavior is still a good idea.
Is there a good guide on how to make libraries with C bindings in rust? I've been learning it for the past few days but without much of a use case. Fast libraries with a C API in a higher level language would be very useful though.
It's mostly straightforward: you just define your functions with #[no_mangle] + extern fn ...
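Roughly like this (a minimal sketch; the function name and crate setup are mine, not from any particular project):

    // Compiled with crate-type = ["cdylib"] (or "staticlib"), this exports an
    // unmangled symbol that C callers can declare as: int32_t add(int32_t, int32_t);
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b
    }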
The Rust docs cover it in some detail [1]. FWIW you can do the same with JNI and have direct Java->Rust bindings. If you want to get really clever you can even parse the Rust AST (using syntex) to auto-generate the Java classes/headers (there may be libs to do this now, I haven't looked in a while).
What's nice about the C FFI is you can pretty much embed Rust anywhere, I've used it in C/C++, Java, C# and Python pretty easily.
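For the JNI route, a rough sketch of what the exported side can look like (class and package names invented, and the env/class parameters simplified to raw pointers rather than proper JNIEnv/jclass types):

    use std::os::raw::c_void;

    // Callable from Java as a `native` method on a (made-up) com.example.Native class.
    // JNI fixes the Java_<package>_<class>_<method> symbol name and the "system"
    // calling convention; the env/class arguments are left as opaque pointers here.
    #[no_mangle]
    pub extern "system" fn Java_com_example_Native_add(
        _env: *mut c_void,
        _class: *mut c_void,
        a: i32,
        b: i32,
    ) -> i32 {
        a + b
    }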
I wrote on a similar topic [1], about writing system libraries in Rust with a C API and bindings to other languages.
Which covers a bit of this, plus some of the sharp edges of FFI. Most of the trickier parts of it are covered in the FFI omnibus [2]. Also worth checking out is the cbindgen tool [3].
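If I recall cbindgen's build-script API correctly, the header generation can be as small as this (the crate layout and output path here are assumptions):

    // build.rs — ask cbindgen to emit a C header for the crate's extern "C" items.
    // Assumes cbindgen is listed under [build-dependencies].
    fn main() {
        let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
        cbindgen::Builder::new()
            .with_crate(crate_dir)
            .generate()
            .expect("cbindgen failed to generate bindings")
            .write_to_file("include/mylib.h");
    }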
If I understand correctly, that will eventually allow 100% pure Rust executables, and these executables will be easily redistributable and work on any distro (only the latter is currently possible, with the musl target).
> these executables will be easily redistributable and work on any distro
While this sounds amazing and I'm sure is very useful in various situations, I think encouraging a culture of distributing software in such a fashion is an incredibly bad idea. This is a big issue I have with the Go culture. Downloading a statically linked binary from some site and just sticking it somewhere is extremely bad practice to encourage. The user will never get prompted to update it, vulnerabilities and bugs in the vendored libraries will never be addressed... the list of issues is endless.
Distributions are not really about distribution, they're about maintenance.
There is cargo-deb [0] which is, IMHO, a far superior way to distribute Rust software. I've been hoping for something similar for Mac/PC distribution (.pkg and .msi...stuff like homebrew is not a substitute for proper packages). I've been tempted to work on it myself, but haven't found the time. It's not particularly hard to do if you're only scripting the various platform-specific commands, but making something that will output artifacts for all platforms is much harder.
I think the old Windows model of installers polluting the registry and disk drive in random places was a very bad idea and it's time to move on from .exe and .msi.
It reminds me of why the protobuf compiler has never received any functionality for generating Python code with redefined library roots [1]: Google has a monorepo, therefore they never needed it. It is especially amusing when it is later explained [2]:
> Overall, your directory structure appears to be more complicated
> than what we generally do at Google (which is the environment where
> all this behavior was designed/evolved). At Google we generally have
> a single directory structure for all source files, including protos.
Hi there, those are my words you quoted above. I just wanted to emphasize that we would help open-source users out on this if we could. This isn't a case of "we didn't need it, so we never implemented it." The problem is that no one has proposed a solution that is compatible with our need to do parallel compilation inside Google.
The only proposal I saw involved removing the 1:1 file correspondence between .proto input and _pb2.py output. I can understand why people would want this. But inside Google, we compile all our .proto files in parallel, and our whole build system depends on being able to parallelize things in this way.
If you have a solution to this problem that doesn't break any of our existing use cases, we're all ears.
Yet, people find static Go and Rust binaries a useful feature, so clearly there is some notable problem with the Linux ecosystem that static linking addresses.
In particular there's tension between running enterprise distros on servers and getting stuff deployed when the system libs are very old.
It's not even about being old, but having different versions.
Take libcurl and OpenSSL for example. Some distros use OpenSSL 1.0.2, some default to OpenSSL 1.1.0, in some cases libcurl depends on 1.0.2, but the system in general prefers OpenSSL 1.1.0. Which one should you use in your application for crypto?
Obviously, the sane decision is to provide a separate build for each distro which uses the libraries bundled with the distro. Or you can just link everything statically (and then security updates of OpenSSL become your problem).
Then the QA people and management also weigh in: they want to support multiple distros but do not want to support multiple packages (as those are different and require separate QA effort), and users would find it hard to install the appropriate package for their distro anyway.
So... it's more convenient to just link statically.
Static linking used to be how everyone deployed software. The argument for dynamic linking was mainly "we can save a lot of disk space and memory" (which it didn't, once you account for how UNIX worked back then).
Of course nowadays you also use it to quickly patch security holes in applications without having to relink them.
IMO the proper way would be to simply relink the binary with patched libraries as default and only use dynamic linking where it's necessary.
Statically linked binaries work far better in my experience. I can run (some) Go programs on ancient 2.3 Linux kernels (last I tested, at least) without complaint, but any modern Firefox will likely refuse, since the libraries in that sort of distro are ancient.
I count myself as part of the young generation (21); static linking is out of the ordinary for me, since it is rarely done for the software I run (outside my Rust and Go development).
But I still think it's how software should be deployed, it should be the norm.
> Yet, people find static Go and Rust binaries a useful feature
People find junk food convenient, but encouraging people to form a habit based on it is no favour to them. Both situations prioritize instant gratification; the difference is that with downloading random binaries, you really only need one of them to lead to trouble for it not to have been worth it.
> some notable problem with the Linux ecosystem
How so? Care to name a platform which has "solved" this problem? They're all either "go and download a random binary from a random site and plonk it somewhere/run an installer that does god-knows-what" or lead you to installing "apps" that are so isolated from one another that general-purpose productivity on such a platform is almost impossible.
> I think encouraging a culture of distributing software in such a fashion is an incredibly bad idea
It isn't obvious to me how that is dramatically different from traditional package managers as far as the update process goes. Upgrading dependencies seems like a generally active and involved process with most package managers I've seen.
pip/apt/npm/go-get/glide/yum/nixos etc. all require you to actively discover and upgrade your dependencies, and I've never been prompted to upgrade a package by any of these programs (a few do tell you if you actively run special subcommands on the CLI, e.g. apt list --upgradable). Unattended-upgrades might be close, but you can really only enable that for security releases, and most package managers don't have the resources to set up special distros (and fewer still backport security fixes).
So is the pain really just not having a quick upgrade cli and is that dramatically different from going to a github page and getting the new URL for a new binary? Would something as simple as writing a script to list and download versions of binaries from github releases make this a non issue?
They don't require you to discover - they all have a clear and easy option to go and find updates automatically. Most of the system-level ones also have various schedulers & notifiers (both CLI and GUI) which are normally on by default for common distributions.
More people study and use glibc, so I guess they find more bugs. On the other hand glibc has a larger code base. Anyway it's difficult to compare bug counts across projects.
Hm, to those who downvoted me: read some older glibc bug reports.
For example the one about sscanf("foo", "%10s", strptr) returning 1. Hint: According to the standard, it should return 0.
Musl does it right, which recently broke some of my test cases. But the glibc bug has been known and unfixed for years, maybe a decade...
While better security is a goal of relibc it is not actually one of the primary goals.
Most existing libc implementations are not particularly portable across different kernels, and especially not to microkernels such as Redox OS. The goal is to have a standard library that is easier for the Redox OS devs to develop in (obviously, Rust makes sense here), and not so tied down to one kernel like musl is for example.
There are important reasons why Rust should continue to use glibc by default, but the Rust compiler already supports musl libc on Linux, so glibc is not required.
It’s a mechanism in glibc for loading dynamic libraries at runtime (based on a config file) to handle DNS and some other things. Anything statically linked can’t support this, which can cause issues with some setups.
A friend of mine found a bug in memmove over Christmas. Rust would have caught it (signed/unsigned issue), but the bug was in the SSE2 assembler, so it would still exist if they allowed optimised asm in the Rust version.
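To illustrate the signed/unsigned point (a contrived sketch, not the actual memmove bug): the mixed arithmetic that C converts silently simply doesn't compile in Rust until you spell the cast out.

    fn index_from_end(len: usize, back: isize) -> usize {
        // len + back        // error[E0308]: mismatched types (usize vs isize)
        // The conversion has to be written out, which is where the review happens:
        (len as isize + back) as usize
    }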
"I’m doing a (free) operating system (just a hobby, won’t be big and
professional like gnu) for 386(486) AT clones. This has been brewing
since april, and is starting to get ready. I’d like any feedback on
things people like/dislike in minix, as my OS resembles it somewhat
(same physical layout of the file-system (due to practical reasons)
among other things).
I’ve currently ported bash(1.08) and gcc(1.40), and things seem to work.
This implies that I’ll get something practical within a few months, and
I’d like to know what features most people would want. Any suggestions
are welcome, but I won’t promise I’ll implement them :)"
I wonder how they are going to handle testing. The tests directory seems pretty bare; surely there are some libc tests (written in C, I guess) that can be directly imported?
Gonna go on a tangent here. I'm always curious about stuff like this. Everyone and their mother wants to replace C, but what's people's answer to it being a part of POSIX? I mean it's literally a part of the standards family, and no one seems to address this. I can never take aspiring C-replacements seriously because no one comes out and openly says, "This is how we replace C."
They're all really just claims that C is bad, and we shouldn't use it. Okay, well how do you replace it? Or was that never the goal to begin with and you just want me to use this language?
We're at the point now where it would be useful to be able to express ABIs and FFI in a language-agnostic manner without resorting to C. C lacks the capability to describe fairly widespread functionality: things like fixed-size integer types [0], pointer lifetimes, arrays with lengths, functions with bound environments, variable arguments [1]. Keeping distinctions between binary strings and textual strings would be useful for languages, and tracking how to deallocate pointers would be beneficial as well. An ABI that allows systems with different GCs to pass GC information to each other would be amazing, if challenging.
Unfortunately, we're sort of in a catch-22. The multilingual interoperation has to go through C, because that's the only standard that exists. The people who write the standards look at the situation and see no reason to go beyond C because no one's proposing an alternative.
[0] Sort of. There is <stdint.h>, but C tends to fundamentally think in terms of char/short/int/long/long long, and there have been issues with trying to work out what [u]int64_t maps to.
[1] Not in C's broken sense of "there are more arguments after this point, but you get to guess how many." Rather, I'm thinking of a more Java sense of "here's the list of extra arguments the caller gave me."
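To make the arrays-with-lengths point concrete: over the C ABI the length travels as a separate parameter and the callee has to reassemble the pair by hand, on trust. A hedged Rust sketch (function name invented):

    use std::slice;

    // What a C header would spell as: size_t sum_bytes(const uint8_t *data, size_t len);
    #[no_mangle]
    pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> usize {
        if data.is_null() {
            return 0;
        }
        // Nothing in the ABI ties `len` to `data`; the caller's promise is all we get.
        let bytes = slice::from_raw_parts(data, len);
        bytes.iter().map(|&b| b as usize).sum()
    }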
I mostly agree. The caveat is that we can still express things like pointer lifetimes and even arrays-with-lengths in English. E.g. libc is full of arrays-with-lengths, implemented as separate params.
Don't get me wrong: I think it is great that more powerful languages automate and check those things. But it is also right for ABIs to be two-steps behind so that they don't tie themselves to a single language.
And here is the real problem with ABIs and C -- C is a single language. Ages ago C was a decent lingua franca, because C programs translated to machine code in pretty easy to guess ways, and the job of an ABI was just to formalise those guesses.
But nowadays we have aggressive optimisers, expansive definitions of undefined behaviour, clever parameter passing conventions, abstract "memory models" defining concurrency rules and all kinds of other stuff that make C something more than just an obvious abstraction of real-world machines.
> C lacks the capability to describe fairly widespread functionality: things like fixed-size integer types [0], pointer lifetimes, arrays with lengths, functions with bound environments, variable arguments [1]. Keeping distinctions between binary strings and textual strings would be useful for languages, and tracking how to deallocate pointers would be beneficial as well. An ABI that allows systems with different GCs to pass GC information to each other would be amazing, if challenging.
Adding some of those features would mean that every language has to support them, and that includes C if you want to gain any traction. Part of the reason C is the interop standard is that it lacks those features; the fewer features you have, the easier it is for the rest of the world to interoperate with you.
C is the interop standard on OSes written in C, mostly following UNIX ideas.
It isn't the interop standard on OSes that have decided to follow another path, namely Windows (COM/.NET), Android (Java), ChromeOS (JavaScript), IBM z (Language Environment), IBM i (ILE), Unisys ClearPath (MCP).
What sort of properties would you consider a "textual string" to have that a "binary string" would lack?
Offhand I'd assume you mean some of the following, but I'm curious if /my/ assumption about your needs is correct and complete. (Also, note that some of these may be implied by values elsewhere in the specification, it isn't strictly a data structure.)
* flag:isValidated
* targetEncoding
* flag:isNormalized
* targetNormalization
* lengthMemory (not including the implicit size of the type's metadata)
* lengthCodepoints (distinct encoding elements)
* length (some word meaning complete display elements, but not the actual width, height, etc.)
Do you also need actual random access to individual codepoints/"full characters"? Or is the assumption that if such a scan is in progress the entire sequence or major parts of it are being fully-scanned?
What if, within a given function call*, the underlying storage could be referenced with zero-copying and substrings made? (On exiting that context they'd be copied if retained.)
It might be useful to have library iterators that split on, say, a single given character, or into segments up to a maximum memory size of N (scanning backwards for the last "full character", or at least a complete rune).
Of course, I believe that, by /default/ a program should perform /binary/ interaction with files including standard in/out/err. If there is a process for converting such binary input/output it should assume and produce data in UTF-8 (in NFC normalization) (when converting to/from "text" types) as a default, which should be easy to globally over-ride or set specifically for a given open file (and change mid use).
Of course I don't normally need to care about the encoding of data directly; libraries with parsers and writers typically do that as well as other encoding/decoding for that format for me. Everything else tends to be copy / compare and do not change, or making filenames (mostly append bits of "should all be UTF-8").
Actual, hard-core manipulation of encoded streams seems far more like a composition-editing issue (a specific editor where default input dialogs are insufficient) or something that underlying libraries need to worry about.
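For what it's worth, those iterator patterns mostly fall out of Rust's standard str API; here is a rough sketch of splitting on a character and backing a byte-limited chunk up to a complete code point (the chunking helper is my own invention):

    fn chunk_end(s: &str, max_bytes: usize) -> usize {
        // Walk backwards from the byte limit to the nearest UTF-8 sequence start.
        let mut end = max_bytes.min(s.len());
        while !s.is_char_boundary(end) {
            end -= 1;
        }
        end
    }

    fn main() {
        let line = "naïve,text";
        // Splitting on a single character is just an iterator over subslices.
        let fields: Vec<&str> = line.split(',').collect();
        assert_eq!(fields, ["naïve", "text"]);
        // Chunk to at most 3 bytes without cutting the 2-byte 'ï' in half.
        assert_eq!(&line[..chunk_end(line, 3)], "na");
    }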
In practice, use cases don't justify strings precalculating their number of code points or grapheme clusters. Pixel width isn't a property of a string alone, since it depends on font and the container to render it in (for line breaks).
So a string should know its number of code units. Validity information should be part of the type system, not runtime flags; i.e. you have a different type for arbitrary bytes and for valid UTF-8. (See IDNA for why guaranteeing a particular normalization across infrastructure over time is problematic.)
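That split between arbitrary bytes and known-valid UTF-8 is essentially what Rust's standard library does; a small sketch:

    fn main() {
        let raw: Vec<u8> = vec![0x66, 0x6f, 0x6f, 0xff]; // arbitrary bytes, no promises
        // Crossing into the text type forces validation exactly once, at the boundary.
        match String::from_utf8(raw) {
            Ok(text) => println!("valid text: {text}"),
            Err(e) => println!("not UTF-8: {e}"),
        }
        // A String/&str can only be built from valid UTF-8 (short of unsafe),
        // so no "is_validated" flag has to travel with the value afterwards.
    }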
Having thought about it, you're probably right. That properly belongs on a different kind of wrapper around this data structure or ones like it. Maybe in that /specific/ and /rare/ use case it even makes sense to store the underlying data in a wide data structure; but my supposition is that handling it as a series of fragments (each with a length in memory and in code points) would be the answer. If it is addressed in such a way, replacement and other editing operations seem likely.
> What sort of properties would you consider a "textual string" to have that a "binary string" would lack?
Being a unicode stream correctly reified using a specified (or specific) encoding.
So:
* encoding possibly (the ABI could also specify it)
* in-memory length probably (could be a stream but that would require some sort of stream/iterator support at the ABI level), though you'd probably use the length in code units and match that to the encoding for in-memory length (if the encoding is not hard-coded)
* validated... in the sense that the stream could be garbage which may or may not match the encoding above? No, this would be intrinsic.
* normalised, no
* length in codepoints, no
* length in grapheme clusters, no
> Do you also need actual random access to individual codepoints/"full characters"?
I agree with everything EXCEPT for the validation being intrinsic.
If the validation is /required/ after every operation then a lot of processing must happen on the underlying storage after EVERY operation to ensure it is still valid.
If the validation is deferred (such as until programmer request, or possibly when emitted out of the program) then those checks are also combined into one final operation; hence the reason for tracking 'is this validated'.
Similar logic also applies to normalization (which opens options for optimizations in comparisons among other things).
> If the validation is /required/ after every operation
The only thing that is required is that the data be valid, anything else would be UB. I don't know of any properly implemented text processing instruction which would make valid text invalid, and thus there are almost no operations which would require validation.
The only point at which you'd need validation (aside from bytes to text) is when you're trying to play tricky bugger and do text processing via non-text instruction for performance gains.
> a lot of processing must happen on the underlying storage after EVERY operation to ensure it is still valid.
Only if your storage is insane and does not understand the concept of text, e.g. MySQL or files. And then yes, I would certainly want whatever garbage these output to be checked every damn time before somebody tells me "yeah it's text, just trust me". In fact that's more or less how every language with an actual concept of text separate from bytes/arbitrary data treats files.
> If the validation is deferred
If the validation is deferred you need to check at runtime before each text-processing instruction if it's working on valid text or not, and if that is put in the hand of developers (to avoid paying the cost for every instruction)… we know how "just check them" null pointers end up.
Replacing C in part is valuable. You don't have to have a plan to replace all C code in an entire stack to get measurable improvements (performance, readability, barrier to entry, modularity, features, and yes security) from the work.
POSIX is an ideal. It's true that POSIX is defined in terms of the C API instead of any ABI - Rust libstd used to have a tiny bit of code I wrote to deal with the fact that Android's libc exposes a couple of signal-handling functions as inline functions in <signal.h>, which is perfectly fine as POSIX goes but annoying if you're not writing in C. But in practice you can successfully bind to POSIX implementations from non-C languages and only need workarounds like this a few times, and in the same way, you can write a POSIX-fulfilling set of header files that interface to a non-C implementation.
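As a small illustration (assuming the libc crate as a dependency): the POSIX prototype is C, but nothing stops a non-C caller from binding to the same symbol.

    // POSIX specifies getpid() as a C function; the libc crate just
    // redeclares the prototype so Rust can call the same symbol directly.
    fn main() {
        let pid = unsafe { libc::getpid() };
        println!("my pid is {pid}");
    }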
And once you replace every component... you've replaced C.
Are you saying that the only people who care about POSIX are those saddled with legacy systems that they can't get rid of? That actually sounds about right to me.
They aren't the only ones, but there's some truth there. POSIX, and other similar uniform standards, are very much a "you don't know what you've got 'til it's gone" type of thing.
> Anyone who ships software to multiple operating systems?
You can do that without direct dependencies to POSIX.
I don't care how .NET, Java, Go, Free Pascal, Ruby, Python, Perl, Swift, D, C++... implement their runtimes, unless I actually need to look under the hood.
There's such a strong connection between the idea of "programming language runtime" and "operating system" (resource management, I/O management, task scheduling, package management, upgrade management, lifecycle management, version control, configuration management, user interface, etc.) that it makes sense both to
- think of a PL as an OS (Smalltalk, Erlang, JVM)
- think of an OS as a PL
The latter leads to the idea that there might be a "main language" for an OS in the same way there's a "main language" for Smalltalk, Erlang, and JVM runtimes (Smalltalk, Erlang, and Java, respectively), and "auxiliary languages" that fit into the same runtimes (Java/Self/etc, Elixir/LFE/etc, and Scala/Clojure/Kotlin/etc, respectively).
If you know this language, you can change any part of the OS. I think that's a valuable asset. For example, if you're using OpenBSD and know C, you can hack anything from the kernel to the smtp daemon. If your OS is written in 10 languages, it's much harder to learn them all.
It never is. Even ancient unix had shell and (maybe more to your point) yacc. Yacc still generates C files, but that's more of an implementation detail. The C _interop_ is key.
Well, of course! I was talking about why yacc is a second language for (early and contemporary) unix. Its C interop for (early and contemporary) unix is what I was stressing.
I think you're splitting hairs. If someone is making an OS they claim is "Unix-like", then it isn't unreasonable for someone to expect that the code that compiles and runs on AIX, HP-UX, MacOS, Solaris, and OpenBSD will also compile and run on this new "Unix-like" OS with minimal changes. If they're not going for POSIX compliance, or even getting reasonably close thereto, they're going to have a hard time being "Unix-like" in practice, certifications be damned.
So long as you can export using standard x86 conventions (cdecl, stdcall, syscall, etc.) there is no need to replace it. Even a managed (.Net, Java, Go, etc.) POSIX could interact with C, provided it can import and export C calls.
This also means that the C/C++ compiler, which depends on these standard imports and exports, could theoretically run on a pure Rust POSIX.
What's the issue? So what if POSIX is specified using C? That's just an API, and Rust can use it. The ABI is not specified by POSIX, naturally, but the platform does specify an ABI (or enough that you can derive an ABI for an API).
It is still difficult to get some people to believe C is a problem at all. Until the majority of the industry (and especially OS vendors) agree that we should have a replacement systems programming language we can't begin the discussion on what that replacement should look like.
There are many recent projects titled "xxx in Rust" or "xxx in Go" that rewrite existing software. They seem to be built not because there is a problem with the existing software, but without any clear reason.
For learning a new language, that can be good practice. But if you hope such a project will grow successfully, I think there is no hope.
Once you've built this easy way out, I fear it will turn into a crutch that stops new, pure Rust apps from being written. Do you really want a C library in Redox?
There is tons of C software out there that depends on libc that is never going to be rewritten in Rust. Also think about interpreted languages like Python, Ruby, etc. I imagine they all depend on libc. The only real way for Redox to gain adoption is through cross platform compatibility.