The Windows malloc() implementation from MSVCRT is slow

TonyTrapp · on July 2, 2022

Not that it helps here, but Microsoft never considered the MSVCRT that ships with Windows to be public API. This is not the "Windows allocator", this is the (very) old MSVC runtime library's allocator. Of course that doesn't keep anyone from using this library because it's present on any Windows system, unlike the newer MSVC versions' runtime library. Using the allocator from a later MSVC's runtime library would provide much better results, as would writing a custom allocator on top of Windows' heap implementation.

MSVCRT basically just exists for backwards compatibility. It's impossible to improve this library at this point.

FreakLegion · on July 2, 2022

There's also UCRT, which ships with the OS since Windows 10. The logic of this rant was a real head-scratcher. If you must blame one side, it's LLVM. Fragmentation of C runtimes is annoying but inescapable. Glibc for example isn't any better.

mananaysiempre · on July 2, 2022

> Fragmentation of C runtimes is annoying but inescapable. Glibc for example isn't any better.

Glibc very much is better.

There cannot be more than a single version of a (tightly coupled) ld.so+libc.so pair in a given address space any more than there can be more than a single KERNEL32 version in a given address space, and given that some system services are exclusively accessible via dynamic linking (accelerated graphics, for one), this forces essentially everything on a conventional Linux desktop to use a single version of a single C runtime (whether Glibc or Musl). No need to guess whether you need CRTDLL or MSVCRT or MSVCR70 or MSVCR71 or ... or UCRT (plus which compiler-version-specific overlay?): you need a sufficiently recent libc.so.6.

I cannot say I like this design (in some respects I actually like the Windows one more), but it does force Glibc to have very strong back-compat mechanisms and guarantees; e.g. it includes a bug-compatible version of memcpy() for old binaries that depended on it working byte by byte. As far as I’m aware, this applies from the point GNU stopped pretending Linux did not exist and the “Linux libc” fork died, that is 1998 or thereabouts.

(There are a myriad reasons why old Linux binaries won’t run on a new machine, but Glibc isn’t one of them.)

This is not to say that Glibc is perfect. Actually building a backwards-compatible binary that references old symbol versions is a gigantic pain: you either have to use a whole old environment, apply symbol versioning hacks on top of a new one and pray the resulting chimera works, or patch antique Glibc sources for new compilers and build a cross toolchain. But if you already have an old binary, it is better.

p_l · on July 2, 2022

Glibc is major source of "binary from X years ago doesn't work" in my experience, to the point that if I want to run stuff older than let's say 5 years I start by getting a docker container - or a specially taylored set of libs starting with dynamic linker itself. Somehow proprietary OpenGL drivers still worked, it was glibc related files that caused crashes.

Thanks to symbol versioning I never know if a binary will work unless I can dismiss it early by versioning causing it to not link at all.

And given that glibc backward compatibility for all practical purposes is shorter than 10 years, having to support centos can get you a lot of those issues

mananaysiempre · on July 3, 2022

> Given that glibc backward compatibility for all practical purposes is shorter than 10 years

As far as I’m able to tell[1], Glibc has never removed a versioned symbol as long as symbol versioning has existed, so the intended backwards compatibility is “forever”. No binary built against a(n unpatched) Glibc should fail to load on a later Glibc due to an absent Glibc symbol version. (Of course, if you play non-version-aware dynamic linker and dlsym() not dlvsym() things in libc, you lose; if you bundle an “updated” part of libc with the application[2], you lose; etc., so this may be true but still not entirely foolproof. And given how many things there are in libc, if you’re going to lose it’s probably going to be there.)

[1] E.g. https://sourceware.org/git/?p=glibc.git;a=history;f=stdlib/V...

[2] https://news.ycombinator.com/item?id=29479903

benibela · on July 3, 2022

Glibc 2.34 broke FreePascal. It calls some unversioned symbol for initialization that was removed

password4321 · on July 3, 2022

For example: https://github.com/microsoft/vscode-remote-release/issues/10...

kelnos · on July 3, 2022

This is the opposite problem of what's being talked about. Binaries built against old versions of glibc should run just fine against newer versions. This is about a binary built against a newer version of glibc that doesn't run on an older one. This is common, and super annoying. There are ways to build to eliminate this problem, but they all feel like hacks, or involve a lot of extra work.

(Hacks such as https://github.com/wheybags/glibc_version_header -- which apparently does work very well, but still feels like an annoying hoop that should be unnecessary to jump through. I wish glibc's shipped headers actually could support this out of the box so you could set a preprocessor define like `-DGLIBC_TARGET_ABI=2.12` and it would just work.)

mananaysiempre · on July 3, 2022

So a vscode thingie uploads a newer binary to an older host, tries to run it there, and fails? Because the people who built said binary did not care to make it backwards compatible (or better yet, statically linked)?

... Duh?

(I’m not an expert in VSCode DRM, to put it mildly, so I might be misinterpreting this discussion, but that’s what it looks like to me. Also, isn’t it referencing GLIBCXX i.e. libstdc++, which not even a part of Glibc?)

password4321 · on July 3, 2022

> referencing GLIBCXX i.e. libstdc++, which not even a part of Glibc

You're right, so not an example of a glibc failure, but rather another standard library. Thanks!

captainmuon · on July 3, 2022

> There cannot be more than a single version of a (tightly coupled) ld.so+libc.so pair in a given address space

That's an odd restriction, and it is related to the fact that all symbols live in one global namespace in a process. It's annoying if you are trying to build something like a plugin system, or if you are using a dynamic language which by definition loads all libraries dynamically. This is also the reason that you cannot mix Glib (GTK+) versions in a process.

I think you should be able to dlopen some library, or heck just load some machine code, and be able to run it, just take care to only pass POD over the boundary and never `free` stuff you didn't `malloc`.

FreakLegion · on July 3, 2022

> Actually building a backwards-compatible binary that references old symbol versions is a gigantic pain: you either have to use a whole old environment, apply symbol versioning hacks on top of a new one and pray the resulting chimera works, or patch antique Glibc sources for new compilers and build a cross toolchain.

This is the example I gave downthread. It's an everyday problem for a lot of developers targeting Linux, whereas targeting decades-old Windows versions is a breeze.

Glibc isn't better, it just has different problems.

seba_dos1 · on July 3, 2022

Targeting old GNU/Linux is a breeze too. Containers make compiling stuff for old distros super easy. I even use them to crosscompile stuff for Windows, macOS, Android etc. too, since I got tired of having to set up development environments on new machines.

ChrisSD · on July 2, 2022

The UCRT has even been present since Windows 7, if users keep up with updates. Or if applications bundle the UCRT installer with their own.

josephcsible · on July 3, 2022

> Or if applications bundle the UCRT installer with their own.

But the UCRT is proprietary, so there are often legal issues with doing so.

ChrisSD · on July 3, 2022

There is an official Universal CRT Redistributable.

rurban · on July 3, 2022

And java ships it, and installs it into its own java bindir, not the global system dir.

So you have two competing copies, which leads to race conditions (different "global" locks!) and nice cashes, until you can analyze it with DrMemory or the kernel debugger. Nobody does that.

leajkinUnk · on July 2, 2022

Could you elaborate why Glibc isn't any better?

I remember some funny problems with Glibc, like, 20 years ago, but it's been invisible to me (as a user) since then. You get a new Glibc, old binaries still work, it's fine.

FreakLegion · on July 2, 2022

Just like with Windows the challenges affect developers rather than users.

> You get a new Glibc, old binaries still work, it's fine.

Indeed, but when you need to build for an older glibc it's not so simple. This is a common use case, since e.g. AWS's environments are on glibc 2.26.

Ideally you'd like to build for all targets, including older systems, from a single, modern environment (this is trivial in Windows) -- and you can do some gymnastics to make that happen[1] -- but in practice it's easier to just manage different build environments for different targets. This is partly why building Linux wheels is so convoluted for Python[2].

Hardly a world-ending problem, but my point is simply that C runtimes are a pain everywhere.

1. https://stackoverflow.com/questions/2856438/how-can-i-link-t...

2. https://github.com/pypa/manylinux

josephcsible · on July 3, 2022

> Ideally you'd like to build for all targets, including older systems, from a single, modern environment (this is trivial in Windows)

https://github.com/sjmulder/netwake does what you're talking about, but it does a lot of gymnastics to make it work, and it also needs to use MinGW rather than MSVC for that to be the case.

FreakLegion · on July 3, 2022

I was thinking specifically of how easy it is to build for Windows 2000 even today. Going back to 3.11 is pretty cool!

usrn · on July 2, 2022

I'm pretty sure I've run into binaries breaking on new versions of Glibc but maybe it's because the architecture or calling convention changed. I've never really gotten the sense that GNU cares much about binary compatibility (which makes sense, they argue that sharing binaries is mostly counter productive.)

cryptonector · on July 3, 2022

Eh, no, this is strictly a Windows problem. On Windows every DLL can be statically linked each with its very private copy of the MSVCRT, which means -for example- that you'd better never ever pass one DLL's malloc()'ed memory pointers to another DLL's free().

On Unix systems (and Linux) this sort of thing can only ever happen if you have a statically-linked application that is linked with libdl and then it dlopen()s some ELF -- then you need the application's C library to be truly heroic. (Solaris stopped supporting that insanity in Solaris 10, though glibc apparently still supports it, I think?)

pjmlp · on July 3, 2022

AIX, which happens to be a UNIX, has similar dynamic linking model.

Then we have the surviving mainframes and embedded RTOS, which aren't UNIXes either.

Shadonototra · on July 3, 2022

what has LLVM to do with software development on windows?

because developing for windows is cancer that people fall back to LLVM and others

it's all on microsoft for not cleaning their mess

even apple offers a better story than the platform for "developers, developers, developers, developers"

a shame, a well deserved shame, a hall of shame

jart · on July 2, 2022

It's effectively mandatory. Microsoft provides about twelve different C Runtimes. But if you're building something like an open source library, you can't link two different C runtimes where you might accidentally malloc() memory with one and then free() with the other. If you want to be able to pass pointers around your dynamic link libraries, you have to link the one C runtime everyone else uses, which is MSVCRT. Also worth mentioning that on Windows 10 last time I checked ADVAPI32 links MSVCRT. So it's pretty much impossible to not link.

TonyTrapp · on July 2, 2022

It isn't mandatory. I have never actively linked against MSVCRT on Windows. From my experience it's mostly software that isn't built with Visual Studio that uses MSVCRT, or software that that takes extreme care of its binary size (e.g. 64k intros). MSVCRT is not even an up-to-date C runtime library. You wouldn't be able to use it for writing software requiring C11 library features without implementing them somewhere on top of it.

It's true that you cannot just happily pass pointers around and expect someone else to be able to safely delete your pointer - but that is why any serious library with a C interface provides its own function to free objects you obtained from the library. Saying that this is impossible without MSVCRT implies that every software needs to be built with it, which is not even remotely the case. If I wanted, I could build all the C libraries I use with LLVM and still link against them in my application compiled with the latest MSVC runtime or UCRT.

The much bigger problem is mixing C++ runtimes in the same piece of software, there you effectively must guarantee that each library uses the same runtime, or chaos ensues.

kazinator · on July 2, 2022

If you're writing in C++ on Windows, expose only COM (or at least COM-style) interface called through virtual functions on an object pointer. Then you can use whatever C++ run-time you want, internally. What you don't want is the other library calling C++ functions by name. Like you pass it some ostream object and it calls ostream::put or whatever, where that symbolically resolves to the wrong one.

p_l · on July 3, 2022

It's also why all, or nearly all, new API on windows are done through COM.

It also means that C++ or other runtimes don't pollute your ABI and make it annoyingly hard to access features from any random code.

kazinator · on July 3, 2022

Another reasonable choice for some components would be to have a purely C api; everything extern "C", and only PODs (plain old datastructures) in the arguments.

olvy0 · on July 3, 2022

Exactly. Using a pure C interface escapes COM's requirement to register every component (regsvr32) and its overblown "GUID for everything" model, and its baroque way of creating components (somewhat alleviated by various macros and templates, but still). Making an object oriented interface is slightly cumbersome, but you can do it by sending back and forth a cookie (void* or int) of your object as the first parameter.

This will also make your interface accessible to other languages, if the need arises, since the C ABI is a de-facto stable ABI on all platforms (disregarding library objects and issues).

Another alternative if you want to stick with C++: make up your own lightweight COM! I've done this successfully in my own code at work, it works great, and has been working for 8 years now.

This method allows people to write C++ components against my app without using COM and without having the exact same compiler versions and having symbol problems. It may seem like a lot of work but it really isn't.

1) Expose a pure virtual interface IFoo, that doesn't include any implementation. Once it is released, never touch it, only create new versions which inherit from it, IFoo2, IFoo3 etc.

2) Expose an extern "C" CreateFoo() that will return FooImpl : IFoo. Also a matching DeleteFoo(IFoo).

3) For all structs / classes you need to receive/send from IFoo methods, create another interface. This includes all C++ classes, including std::string, hide it behind IMyString or something. This is a minor inconvenience but it sidesteps all ABI incompatibilities.

4) Have all interfaces inherit from some IBaseXX which has a GetVersion() method A component which uses this component could call this, and if it returns e.g. 2, then it can safely cast IFoo to IFoo2* and use IFoo2 methods. Else it can return an error message or use something from IFoo*

This relies on the fact that C++ vtable layout is essentially an ABI that will never change, at least under Windows, since the whole of COM relies on this and MS will never change it. Anything other than vtable layout in an object is subject to change, so the trick is to only have pure virtual interfaces.

I have no idea if this trick will also work on Linux, I don't know how stable GCC / Clang's vtable layout is from version to version, but I suspect it will.

This was taken from a CodeProject article I read a few years back, but I can't find it anymore... the closest I can find is [0] (DynObj - C++ Cross Platform Plugin Objects), but it is more complicated than what I read back then. I didn't use it, I wrote my own as I outlined above. Like I said, it isn't really that complicated.

[0] https://www.codeproject.com/Articles/20648/DynObj-C-Cross-Pl...

cesarb · on July 3, 2022

> Using a pure C interface escapes COM's requirement to register every component (regsvr32) [...] Another alternative if you want to stick with C++: make up your own lightweight COM

Or you can combine both: use a pure C interface which returns COM objects. That way, you can keep using COM without having to register anything with regsvr32 or similar.

> I have no idea if this trick will also work on Linux, I don't know how stable GCC / Clang's vtable layout is from version to version

Recent (as in, since around the turn of the millennium) GCC and clang on Linux use a standard vtable layout, defined by the "Itanium ABI" (originally created for Intel's IA-64, but like UEFI and GPT, became a standard across all architectures), so it's also very stable.

p_l · on July 4, 2022

Itanium ABI is somewhat stable. As in there exists a chance that two binary files compiled with exact same set of libraries and headers will be compatible, which is something that will get broken the moment two files were compiled with different switches in GCC that do not technically change things at Itanium ABI levels, but cause incompatibilities in various standard types.

olvy0 · on July 3, 2022

Ah! Yeah, I've completely forgotten that the Itanium ABI also guarantees a standard vtable layout. Thanks!

Were starting to convert my app from a Windows-only app to a cross platform one, so this makes me happy that my "COM lite" scheme would also be stable on linux.

As for returning pure COM objects through a C interface: Yeah, I've considered that, but like I wrote in my comment below to pjmlp, I don't like COM personally, and I want to simplify writing the components' code as much as possible, since the team which writes them isn't a team of software developers but rather in a supporting role. They do know basic C++/C# but don't know COM.

Also getting rid of COM would be the first thing done for cross platform support, anyway.

p_l · on July 4, 2022

Due to experience trying to link some code, especially C++, I'd probably rather start by implementing a simplified COM system first rather than dealing with extern C

pjmlp · on July 3, 2022

Not kept up with the times?

Registration free COM exists for several years now.

olvy0 · on July 3, 2022

I was aware of registration free COM, and I did consider it, in fact I tried it out several times during the last decade.

However in my experience it was always hard to get working properly, and it was always poorly documented. I'm talking primarily about the XML assembly manifest, which is very easy to get wrong.

In fact I remember vaguely the only complete documentation & example I could find at the time was an article in MSDN magazine, but now I can't find it, only bits and pieces of information scattered around, nothing in MSDN except a vague article. Most references I can find are for consuming dotnet COM objects and I also need to consume components written in C++. So the situation has gotten worse, documentation-wise.

Another couple of points:

1) Personally I don't want to use COM at all, I think it's too complex. I think it's really a brilliant idea but wrapped with over-engineered, idiosyncratic idioms and API. I tried to distill it to the minimum required by me for my "COM lite" above.

2) I'm not the one creating those components, I'm in charge of the consuming application. The people creating those components (in house) are essentially my support people, they're not really professional developers, and get confused with anything slightly more complex than plain C++ code. None of them has ever learned anything about COM, I'm the only one who knows all about it. Meaning I have to support them when things go wrong with COM registration or when they compile their code and it doesn't work. So I'm on a mission to get rid of all COM dependencies in my application, and replace them either with plain DLLs with C API, or (in one specific case) with my "COM-Lite" I outlined above.

josephcsible · on July 4, 2022

Wasn't there also a hacky way to achieve the same goal as registration-free COM even before it existed?

p_l · on July 4, 2022

For in-process DLL-based servers, you could try to implement CoCreateClass (IIRC) manually and bypass GUID-to-server lookup and go directly for loading a DLL and calling the method for getting the object from it.

pjmlp · on July 4, 2022

Probably, although I tend to stay away of such solutions, as they eventually turn into headaches to sort out in some machine, where the hack doesn't work as expected.

kazinator · on July 2, 2022

C applications targeting Windows must provide their own C library with malloc and free (if they are using the "hosted implementation" features of C).

MSVCRT.DLL isn't the library "everyone" uses; just Microsoft programs, and some misguided freeware built with MinGW.

Even if ADVAPI32.DLL uses MSVCRT.DLL, it's not going to mistakenly call the malloc that you provide in your application; Windows DLL's don't even have that sort of global symbol resolution power.

I would be very surprised if any public API in ADVAPI32 returns a pointer that the application is required to directly free, or accept a pointer that the application must malloc. If that were the case, you'd have to attach to MSVCRT.DLL with LoadLibrary, look up those functions with GetProcAddress and call them that way.

Windows has non-malloc allocators for sharing memory that way among DLL's: the "Heap API" in KERNEL32. One component can HeapAlloc something which another can HeapFree: they have to agree on the same heap handle, though. You can use GetProcessHeap to get the default heap for the process.

It may be that the MSVCRT.DLL malloc uses this; or else it's based on VirtualAlloc directly.

cesarb · on July 2, 2022

> MSVCRT.DLL isn't the library "everyone" uses; just Microsoft programs, and some misguided freeware built with MinGW.

There's a third set of users: programs built with old enough versions of the Microsoft compiler. Before Microsoft decided that every single version of its C compiler should use a different C runtime (and much later they changed their mind again), all Microsoft C and C++ compilers linked their output with MSVCRT.DLL. In fact, that's probably the reason MinGW chose to use MSVCRT.DLL: to increase compatibility with the Microsoft compilers, by using the same C runtime.

kazinator · on July 3, 2022

MinGW chose MSVCRT.DLL because it meets the definition of "System Library" referred to in the GNU Public License. There is a special exception that GPLed programs can be linked to a proprietary, closed-source component if it is a system library; i.e. something always installed and always present on a certain type of system to which that program is ported. Without that, you couldn't have GNU utilities as replacements for Unix utilities on a proprietary Unix, linking with its libc.

mananaysiempre · on July 2, 2022

> Before Microsoft decided that every single version of its C compiler should use a different C runtime (and much later they changed their mind again).

It is... not as simple as that[1]. MSVCRT itself is the fifth version of a Win32 C runtime, after CRTDLL (used to ship with the OS but no longer), MSVCRT10, MSVCRT20, and MSVCRT40. It’s just that both the toolchains linking with MSVCRT and the OSes shipping it (and nothing newer) endured for a very long time while Microsoft went on a collective .NET piligrimage.

Of course, “NT OS/2” started the same year ANSI C was ratified (and Win32 only a couple of years later), so some degree of flailing around was inevitable. But the MSVCRT Garden of Eden story is not true.

[1] http://www.malsmith.net/blog/visual-c-visual-history/

plonk · on July 2, 2022

Our programs ship their DLL dependencies in their own installer anyway, like most others on Windows. Just ship your FOSS library with a CMake configuration and let the users build it with whatever runtime they want.

garaetjjte · on July 2, 2022

>but Microsoft never considered the MSVCRT that ships with Windows to be public API

It was in the past. At first msvcrt.dll was the runtime library used up to Visual C++ 6. Later, VC++ moved to their own separate dlls, but you could still link with system msvcrt.dll using corresponding DDK/WDK up to Windows 7.

I'm also not sure that this is just ancient library left for compatibility, some system components still link to it, and msvcrt.dll itself seems to link with UCRT libraries.

TonyTrapp · on July 2, 2022

> It was in the past. At first msvcrt.dll was the runtime library used up to Visual C++ 6.

At that time it was already a big mess, because at first it was the runtime library of Visual C++ 4 in fact! The gory details are here: https://devblogs.microsoft.com/oldnewthing/20140411-00/?p=12...

> some system components still link to it

Some system components themselves are very much ancient and unmaintained and only exist for backwards compatibility as well.

garaetjjte · on July 2, 2022

Ancient or not, I don't think it really matters for allocation performance: malloc in both msvcrt.dll and ucrtbase.dll after some indirection ends up calling RtlAllocateHeap in ntdll.dll

TonyTrapp · on July 3, 2022

It does, because modern allocators do a lot of house keeping on top of the system heaps, e.g. to quickly reuse identically-sized memory chunks that were just freed.

jasomill · on July 3, 2022

For posterity, here's a link to a version of that page with the original comments section intact:

https://web.archive.org/web/20160410051513/https://blogs.msd...

ComputerGuru · on July 2, 2022

I don’t think msvcrt is exposed to link against in the DDK anymore. I maintain this, with the caveat that you really need to know what you’re doing: https://github.com/neosmart/msvcrt.lib

Sesse__ · on July 2, 2022

Win32 has an allocator (HeapAlloc), and it is similarly slow and low-concurrent. Even if you enable the newer stuff like LFH.

bjourne · on July 2, 2022

Well... Who told you to link to MSVCRT (the one in System32)? Not Microsoft that's for sure. New software is supposed to link to the Visual Studio C runtime it was compiled with and then ship that library alongside the application itself. Even if you don't compile with VS you can distribute the runtime library (freely downloadable from some page on microsoft.com). Ostensibly, that library contains an efficient malloc. If you willingly link to the MSVCRT Microsoft for over a decade has stated is deprecated and should be avoided you are shooting yourself in the foot.

"Windows is not a Microsoft Visual C/C++ Run-Time delivery channel" https://devblogs.microsoft.com/oldnewthing/20140411-00/

sterlind · on July 2, 2022

there's a weird licensing thing for the VC runtime where you can't redistribute the dll alongside your code unless you have a special license. instead, you have to install it as an MSI. I have no idea why they created that restriction. I even work for MS and it baffles me.

londons_explore · on July 2, 2022

At one point MS was very keen for people to use MSI's rather than building custom installers that jam things into system32. Perhaps that was why.

saagarjha · on July 3, 2022

I don’t usually bug people about this but you mean “MSIs”, not “MSI’s”. They mean different things in this context.

Dylan16807 · on July 2, 2022

But if you're using it for yourself you wouldn't want to put it in system32.

chunkyks · on July 3, 2022

There was a looong period of time where a lot of developers felt entirely free to drop whatever they wanted into c:\Windows, System, System32, whatever. It's a big part of what led to windows getting a reputation for being poorly behaved/crashy/slow over time [leading to a lot of "have you tried reinstalling windows?" recommendations]

Dylan16807 · on July 3, 2022

Yes but I think that's separate from a newer installer using one of these newer runtimes.

legalcorrection · on July 3, 2022

It’s so the system can update the runtime when security vulnerabilities and such are discovered.

naikrovek · on July 2, 2022

probably so it gets a consistent installation location that is protected by the operating system. read the Raymond Chen blog post linked here a few times to see why not doing this has bitten Microsoft in the past.

userbinator · on July 3, 2022

Even if you don't compile with VS you can distribute the runtime library

...which is about two orders of magnitude bigger than the applications I write, so that's already a huge DO NOT WANT. Don't even get me started on the retarded UCRT mess that newer VS seems to force you to use...

mananaysiempre · on July 2, 2022

Isn’t every (release-mode) Microsoft CRT malloc a thin wrapper around KERNEL32!HeapAlloc (thus eventually NTDLL!RtlAllocateHeap)? CRTDLL did it, MSVCRT did it, UCRTBASE still seems to be doing it.

Tringi · on July 3, 2022

Yes, it is.

spatulon · on July 2, 2022

I suspect they are actually talking about the modern/redistributable C/C++ runtime, not the old msvcrt.dll.

Microsoft refer to the modern libraries as the "Microsoft C and C++ (MSVC) runtime libraries", so shortening that to MSVCRT doesn't seem unreasonable.

nice_byte · on July 3, 2022

i'm not sure which part of the post gave you the impression that it's linking against the one in system32. the author even says:

> it's statically linking with the CRT shipped with Visual Studio

that is, unless you know for a fact that they're wrong and clang uses the msvcrt from system32. in which case this seems like clang's fault?

bjourne · on July 3, 2022

He updated the text after the article hit HN: https://web.archive.org/web/20220702095817/https://erikmcclu...

The C runtime shipped in Windows is called MSVCRT.DLL and the ones shipped in Visual Studio are called MSVCRxxx.DLL, where xxx is Visual Studio's version number. If he statically linked to MSVCRxxx.DLL (MSVCRxxx.LIB actually) then what version did he link to? The performance of malloc() differs between versions.

Clang doesn't ship its own C/C++ runtime and certainly can link to MSVCRT.DLL. That is how legacy applications are built.

rayiner · on July 2, 2022

I wonder how much of this is the development culture at MS. https://www.theregister.com/2022/05/10/jeffrey_snover_said_m... (“When I was doing the prototype for what became PowerShell, a friend cautioned me saying that was the sort of thing that got people fired.”)

In that environment I can imagine nobody wants to be on the hook for messing with something fundamental like malloc().

The complete trash fire that is O365 and Teams—for some reason the new Outlook kicks you out to a web app just to manage your todos—suggests to me that Microsoft may be suffering from a development culture that’s more focused on people protecting fiefdoms than delivering the best product. I saw this with Nortel before it went under. It was so sclerotic that they would outsource software development for their own products to third party development shops because there was too much internal politics to execute them in house.

sterlind · on July 2, 2022

(I work for MS, though in core Azure rather than Office or Windows.)

I think that PowerShell story was how old MS worked, back in the days of stack ranking, hatred of Linux and the Longhorn fiasco. things inside the company are a lot more functional now. I saw internal politics drama at my first position, but once I moved everything was chill, and experimentation and contributing across team boundaries was actively encouraged and rewarded.

I suspect Office suffers from a ton of technical debt, along with being architecturally amorphous and dating from a pre-cloud era. as for Windows, the amount of breakage I see in the betas suggests they're not afraid of making deep changes, it's probably that MSVCRT is a living fossil and has to support old programs monkeypatching the guts of malloc or something.

lkfjasdlkjfsad · on July 2, 2022

any idea what the hell is going on with Teams?

why can't i simply scroll up in my own conversations? let alone search them. the sticky sludge of communication in something as simple as chat has cost me hours since i was forced to use teams. outlook search is so superior to teams i'd easily prefer to have lync back. this one thing absolutely cripples communication. there are a list of other very basic issues that make communicating code blocks frustrating. i see new app features here and there, i saw some feature added the other day which won't help anyone. i just don't understand the prioritization of issues

i don't expect a direct answer to this, although i hope to read an explanation one day

EDIT: i removed content from this comment that was missing the point

lkfjasdlkjfsad · on July 3, 2022

downvote me all you want. maybe the quality of my comment was bad.

in my opinion, the team or leader who is responsible for prioritizing issues in Teams needs a major adjustment. their flaws are brazen and affecting all of us who are forced to use Teams for communication.

bluedino · on July 2, 2022

Why don't the arrow keys work half the time in the chat input box?

stagger87 · on July 2, 2022

PEBKAC, you can scroll and search chats in teams.

lkfjasdlkjfsad · on July 2, 2022

> you can scroll

outrageously slowly, and i'm not talking about a couple HTTP requests and a database query slow

> and search

search is not practical in my opinion. i'd go as far as saying it's unusable. i can _find individual messages_, but there are times (often, i might add!) where context or jumping to that message aren't even options. context is often the reason you search for messages in the first place. if i'm alone here, sure, PEBKAC

DSMan195276 · on July 3, 2022

I actually think it's fair to say the chat search in Teams is basically unusable. You can search them, sure, but it only finds single chat messages with no way to go to that message to read the context around it. So, if the chat message you find has the word you wanted but not the information, then you're basically out of luck unless you keep guessing words and find the message with what you wanted.

I think I understand why it works this way too. If you search _channels_, then the search will show the messages with that word but link you to the entire conversation that message appears in. But for chats, each individual message is basically treated like it's own conversation and thus search only displays the single message with no context.

A quick Google suggests I'm not alone in this criticism, and it's been a problem for even longer than I've been using Teams (a year or two): https://techcommunity.microsoft.com/t5/microsoft-teams/teams...

stagger87 · on July 3, 2022

I'm able to search chats on my phone and PC and go to that message when I search, and get all the context around it. Even messages that are several years old. I just tested it. Maybe it's just your Teams settings?

DSMan195276 · on July 3, 2022

I figured I would go check and it seems you're somewhat right, for newer messages (a few weeks) it does link me straight to the chat message so I can view the messages around it. Older messages from Ex. last year still have the behavior in the screenshot of the issue I linked - 'Go to message' simply takes me to a screen with only that message displayed and no others, with no way to actually get to that message in the chat. Obviously I don't have an easy way to tell yet if it's fixed for all messages going forward, or if it only works for recent messages, but either way it's some progress. Presumably in time I'll be able to tell.

So I donno, perhaps it's something funky with our Teams deployment, but this is such a basic feature that I have a hard time understanding how it could ever be implemented this way. Certainly no other chat service I've used has had this kind of problem.

rayiner · on July 2, 2022

It was interesting the see them switch back to Win32 after all of these greenfield alternatives that quickly died. (WPF, WinRT, etc.) Makes you wonder what was going on during that time. Contrast Apple which has been with Cocoa which is an evolution of Next Step.

wvenable · on July 3, 2022

Apple wanted to force Cocoa on everyone but had to delay their OS an entire year to build Carbon so that developers would actually port their Mac apps to OS X.

assttoasstmgr · on July 3, 2022

A number of Apple's original core apps (like iTunes, AppleWorks, etc.) were Carbonized apps.

kccqzy · on July 3, 2022

Finder was Carbon until 10.6. It's probably a message to show developers that Apple was serious about Carbon.

pjmlp · on July 3, 2022

Kind of, Windows 11 WinRT components are still based on UWP, as WinUI 3.0 and WinAppSDK aren't yet up to the job of replacing it.

WinRT hasn't died, that is what WinUI 3.0/WinAppSDK is all about, making that COM infrastructure available on the Win32 side, even though their progress is quite slow.

I think it will take them 2 years still to reach feature parity with UWP features.

ComputerGuru · on July 3, 2022

You have it backwards, UWP is based on WinRT (which is built on top of the COM underpinnnings) and not the other way around.

pjmlp · on July 3, 2022

Having used it since Windows 8 days, I know pretty much how it all goes.

Nowadays there are two WinRT models, the original underlying UWP that grew out of the UAP / WinRT evolution introduced in Windows 8.1, to simplify what was originally split across phones, tablets and desktop.

And now the WinRT implementation on top of Win32, started as Project Reunion, rebranded as WinAppSDK alongside WinUI 3.0.

ComputerGuru · on July 4, 2022

I’m not trying to be particularly pedantic but that still doesn’t make WinAppSdk built on UWP; it’s mainly an expanded and cleaned-up collection of first-party cross-language wrappers/bridges/ffi to WinRT to hide the COM underpinnings plus unify some of the disparate Win32 vs WinRT APIs.

As you know, WinRT predates UWP. UWP as tech isn’t strictly defined but it includes things that are out of the scope of WinRT itself and aren’t available via WinAppSDK even now that UWP is finally, officially dead.

pjmlp · on July 3, 2022

Looking how things have shaped through the years with WinRT, UWP, WinUI and related projects, at least WinDev seems to still be living in the past culture while trying to pretend to actually have changed.

The rate PMs change, how they lack understanding that we are fed up with rewrites, how they seem to believe we would still care and hope that in a couple of years WinUI 3.0 will actually be usable, how bugs in public repos accumulate,....

munch117 · on July 2, 2022

You shouldn't read too much into the PowerShell story. Creating your own programming language is in most cases a frivolous vanity project. Spending company resources on your own frivolous vanity projects is the sort of thing that can get you fired.

sterlind · on July 2, 2022

! I disagree.

CMD badly needed replacing. MS needed a new shell language. A functional company would connect people with a passion for X with the resources to achieve X, if X has a chance of helping the company.

Windows Terminal and WSL show how far MS has come from the PS days.

(Disclaimer: I work for MS)

13of40 · on July 2, 2022

The way I remember it, the need for a new shell language for system administration was something that lots of people in Windows Server were trying to solve. Ballmer talked about it, we had a push to add a handful of new command line tools (like tasklist.exe I think) that you could use under CMD, and there was a proof of concept where MMC could be used to output some kind of macro language when users did things in the UI. PowerShell was the thing that eventually won, and I think it was largely because it stood on the shoulders of .Net so had a ton of capability right out of the gate. (And TBH, I think it's a little bit weird that we have this mythos today where Snover sat down at his computer one morning and invented it out of thin air, when even the v1 feature team had something like 30 engineers and PMs on it.)

skissane · on July 3, 2022

> CMD badly needed replacing

CMD is nasty but there are lots of little ways in which it could have been improved. For example, provide an option to disable that useless "Terminate batch job (Y/N)?" prompt.

I wish Microsoft would open-source CMD.EXE. I dislike how slow PowerShell is (especially its startup). Maybe nobody at Microsoft cares about CMD.EXE enough to fix those long-standing little annoyances like the above, but if it was open-source other people out there would.

Also, I wonder why nobody ever seemed to have thought of integrating CSCRIPT into CMD, so that you could have seamlessly mixed VBScript (or other WSH languages) into batch files.

londons_explore · on July 2, 2022

I think the smart move would have been to make an official port of bash...

jrockway · on July 2, 2022

Some experimentation in that space is probably a good thing. Bash is familiar, but it's far from perfect. In terms of executing programs, it gets the critical functionality down pretty well; readline-based editing, variables, aliases, pipes, IO redirects, background process management, etc.

In terms of being a programming language, I always regret when I use it. I recently had a minor heart attack when I realized that our CI was green when tests were failing; deep down in some shell script, someone forgot to turn on "pipefail" and the shell linter to check for that was misconfigured, and so the CI command that piped output to a logfile failed, but the log file was written OK, so "exit 0".

In terms of interactive poking around at a computer, I never really liked the UNIX philosophy here. I run a terminal emulator, which connects to a remote server over SSH, which is running a terminal multiplexer, which is running bash, which then runs my programs. None of these things know anything about each other. The UNIX way!!! The end result is total jank and some useful features are completely impossible to implement. The various shells running under the multiplexer overwrite each other's history. The terminal multiplexer can't give the terminal emulator its scrollback buffer. The shell history is specific to the machine that the shell is running on, not the machine that the terminal is running on. Echoing the character you just typed involves two TCP round trips! It's so bad, guys.

For that reason, I totally see the desire to re-engineer this space. There are a lot of improvements to be mad. Powershell is an interesting attempt. It doesn't solve any of my problems, though, and I personally don't enjoy using it. It's verbose and unergonomic, and still not a good programming language for making stuff happen. Windows is missing the glue ecosystem of things like "grep", "sed", "curl", etc., which make matters worse. (Powershell provides some of those things as cmdlets, but the "curl" one opens up IE to make you click something, and weird stuff like that.) It's nice that someone tried to make a new thing. I personally think it's worse than everything else out there.

TL;DR: "we've always used bash" leaves a lot to be desired. It's fine. But if someone says they can do better, I completely agree.

skissane · on July 3, 2022

> The various shells running under the multiplexer overwrite each other's history.

GitHub seems to have umpteen different variations on "store the shell history in a SQLite database": https://github.com/thenewwazoo/bash-history-sqlite https://github.com/trengrj/recent https://github.com/andmarios/bashistdb https://github.com/digitalist/bash_database_history https://github.com/quitesimpleorg/hs9001 https://github.com/hinzundcode/termmon https://github.com/fvichot/hisql https://github.com/bcsgh/bash_history_db https://github.com/jcsalterego/historian

This seems a popular enough idea, I wonder why it hasn't just added to Readline/libedit/etc? (Would the maintainers of those packages agree to add such an idea to them?)

> The shell history is specific to the machine that the shell is running on, not the machine that the terminal is running on

Once you've got the idea of shell history in sqlite – why not have a (per-user) daemon which exposes that history, some RPC protocol over a Unix domain socket, with the name of the socket in an environment variable? Then you can forward that socket over SSH? Again, something that could go in readline/libedit/etc, if their maintainers could agree to it.

> The terminal multiplexer can't give the terminal emulator its scrollback buffer.

Someone needs to come up with a new protocol here, instead of just pretending we are all still using physical VT series terminals over serial lines? The biggest problem with introducing any such protocol would be getting everyone to agree to adopt it. One could make patches for a few different terminal multiplexers and emulators, but would the maintainers accept them?

jrockway · on July 3, 2022

My thought is to burn it all down. It's my very-background side project to do this; I have also seen a few startups trying it out. (My theory is that developers won't pay for tools, so I wouldn't start a company to do it. A few people use IntelliJ or whatever, that's about it. They're not going to buy a commercial ssh daemon and terminal emulator.)

I will say that one should be wary of combining 3d graphics and font rendering and designing a programming language in the same project, at least if you want to get it done in less than 10 years. (I did talk myself out of writing a text editor, at least. Not touching that one.)

Like I said, very backgroundy. I don't have a good programming language spec. I do have text and UI elements that can be rendered at 360fps. (Yup, I have a 360Hz monitor. If you do things right, the lack of latency is frightening. It's like that "perfectly level" floor from Rick & Morty. But it is hard work to render a UI 360 times a second, especially when you're not using C++. Sigh!)

skissane · on July 4, 2022

> My thought is to burn it all down

I have the exact same thought. I've been working on it, on-and-off, for years. I remember starting out when our son was a baby, and he's 9 now, and I still haven't got very far. Plus, now we've got two kids, and they are older, I have far less time to muck around with this stuff than I used to.

What I've realised after a while – "burn it all down" is unlikely to produce much results. I mean, if it is all just for fun (which is kind of all it is for me now), it doesn't really matter. But if one is hoping to make an impact in the real world, small incremental improvements building on pre-existing work are far more likely to do that than starting it all over from scratch, as tempting as that is.

> I will say that one should be wary of combining 3d graphics and font rendering and designing a programming language in the same project, at least if you want to get it done in less than 10 years. (I did talk myself out of writing a text editor, at least. Not touching that one.)

My priorities are somewhat different from yours. Own programming language? Yup, been through a few of those (every now and again I get fed up with it all and restart from scratch). 3D graphics? Well, I planned on having 2D graphics, never got very far with that, didn't even think of 3D. Mostly have just stuck to text mode, at one point I had my own incomplete clone of curses written in a mixture of my own programming language and Java (this was before I decided to abandon the JVM as a substrate and rewrite my own language in C–I started redoing my curses clone in C–why didn't I just use ncurses or whatever?–but it is very unfinished, never got anywhere near as far as my Java attempt did), also HTML to render in the browser.

But a text editor? Yeah, did that. Also, somewhat bizarrely, the text editor I wrote was a line editor (as in ed, ex, edlin, etc). Never actually used it that much for editing text. I meant to write a full-screen interface for it too (retracing the historical evolution from ex to vi), just have never got around to it.

> I don't have a good programming language spec

One thing I learned years ago – unless you love writing specs, they are a bit of a waste of time for this kind of stuff. Write your language first, create the spec later. Even for serious languages like Java, that's actually what happened (from what I understand)–the implementation of the language and the runtime was already several years old before they started writing the specs for them.

hughw · on July 2, 2022

Maybe that's how we got WSL

munch117 · on July 3, 2022

I didn't say that this project in particular was a vanity project. Just that when someone comes to you with a project and says "I'm going to solve our problems by using this new programming language that I'm in the process of inventing", then some skepticism is not unwarranted.

In hindsight, this particular one may be the best thing since sliced bread. But that's survivorship bias.

assttoasstmgr · on July 3, 2022

> people protecting fiefdoms than delivering the best product.

Apparently you have never worked at any company with more than 50 employees because that's literally how every large company works. Career-obsessed managers who can't see the forest for the trees not giving a single shit about the overall product as long as their goals were met. They're off to the next promotion before all hell breaks loose.

barrkel · on July 2, 2022

Windows doesn't have a malloc. The API isn't libc like conventional Unix and shared libraries on Windows don't generally expect to be able to mutually allocate one another's memory. Msvcrt as shipped is effectively a compatibility library and a dependency for people who want to ship a small exe.

qsdf38100 · on July 2, 2022

Note that Windows has HeapAlloc and HeapFree, which provide all the functionality to trivially implement malloc and free.

The C runtime is doing exactly that, except it adds a bit of bookkeeping on top of it IIRC. And in debug builds it adds support for tracking allocations.

barrkel · on July 3, 2022

VirtualAlloc is a better base for a custom memory allocator. It's closer to mmap + mprotect in functionality.

There's also CoTaskMemAlloc (aka IMalloc::Alloc). And COM automation has a bunch of methods which allocate memory for dynamically sized data, which could be abused for memory allocation - SafeArrayCreate, SysAllocString.

astrange · on July 3, 2022

It sounds like they've invented 15 different ways to allocate memory rather than have malloc. What did they do that for?

(Not to say malloc is a perfect API, it’s definitely oversimplified, but they probably didn’t solve any of its problems.)

barrkel · on July 3, 2022

The latter three are for RPC / interop / automation scenarios, simplifying the programming model for things like Visual Basic.

HeapAlloc (and legacy routines GlobalAlloc and LocalAlloc which wrap it) is mostly a relic of 16-bit Windows.

VirtualAlloc is the one that matters for language runtimes on Windows since Win32 API in the 90s, and it's designed to allocate slabs which are suballocated by more sophisticated code.

qsdf38100 · on July 3, 2022

malloc is just a C api, it’s not a syscall, and Linux is no different. malloc/free on Linux is probably using mmap under the hood, and doing some bookkeeping to decide when to decommit memory to give it back to the OS.

astrange · on July 4, 2022

When did I say anything about syscalls or Linux?

qsdf38100 · on July 4, 2022

You said "they invented 15 ways instead of having malloc". By "they" you mean Microsoft right?

Windows does have malloc as a C api for programs using the C runtime library. Same as everywhere else.

Then, at the OS api level, there are indeed several memory management functions. But you usually don’t need them. Except if you are writing a custom memory allocator for instance. Also same as everywhere else.

So saying Windows has X memory management functions instead of malloc is incorrect.

evmar · on July 2, 2022

The other inaccuracies in this article have already been covered. I noticed there was also a weird rant about mimalloc in there ("For some insane reason, mimalloc is not shipped in Visual Studio").

My understanding is mimalloc is basically a one-person project[1] from an MSR researcher in support of his research programming languages. It sounds like it's pretty nice, but I also wouldn't expect it to be somehow pushed as the default choice for Windows allocators.

[1]: https://github.com/microsoft/mimalloc/graphs/contributors

bcbrown · on July 2, 2022

Seeing someone refer to any piece of software technology as a "trash fire" makes it harder for me to view them as credible. It's unnecessarily divisive and insulting, and it means it's unlikely they will have any appreciation of the tradeoffs present during initial design and implementation.

dang · on July 2, 2022

We've replaced the baity wording with more representative language from the article, in keeping with the HN guideline: "Please use the original title, unless it is misleading or linkbait; don't editorialize."

https://news.ycombinator.com/newsguidelines.html

trollied · on July 3, 2022

The Factorio team were looking at a performance bug recently & tracked it down to similar: https://forums.factorio.com/viewtopic.php?f=7&t=102388

https://developercommunity.visualstudio.com/t/mallocfree-dra...

InfiniteRand · on July 3, 2022

The thread on the second link gives some clue as to why things are the way they are

eska · on July 3, 2022

So Microsoft changed the malloc behavior for UWP apps, but not desktop apps. In other words they saw it as problematic enough to change it but then say it’s not a bug for the other case. Schizophrenic.

DHowett · on July 2, 2022

I'm curious whether the "new"(ish) segment heap would address some of the author's issues.

It's poorly documented, so I can't find a reference explaining what it is on MSDN save for a snippet on the page about the app manifests[1]. There's some better third-party "documentation"[2] that gets into some specifics of how it works, but even that is light on the real-world operational details that would be helpful here.

Chrome tried it out and found[3] it to be less than suitable due to its increased CPU cost, which might presage what Erik would see if they enabled it.

[1] https://docs.microsoft.com/en-us/windows/win32/sbscs/applica...

[2] (PDF warning) https://www.blackhat.com/docs/us-16/materials/us-16-Yason-Wi...

[3] https://bugs.chromium.org/p/chromium/issues/detail?id=110228...

MarkSweep · on July 2, 2022

The one other piece of “documentation” that I know of is this blog post:

https://blogs.windows.com/windowsexperience/2020/05/27/whats...

It mentions that the segment heap is used by default for UWP apps and reduces memory usage of Edge.

denkshom · on July 2, 2022

This rant was rather devoid of relevant technical detail.

I mean, why exactly is the malloc of the compatibility msvcrt so slow compared to newer allocators? What is it doing?

An analysis of that would have been some actual content of interest.

chrisseaton · on July 2, 2022

So why is it a trash fire? It's just slow? Or is there something else wrong with it? I thought the author was going to say it did something insane or was buggy somehow.

Someone · on July 2, 2022

Also, is it slow because it’s badly implemented, or is it better than other mallocs in some other respect? Maybe, dating from decades ago, it’s better in the memory usage front?

softwaredoug · on July 2, 2022

My knowledge is like 10 years old - For a long time, Microsoft's stl implementation was based on their licensning of dinkumware's STL (https://www.dinkumware.com/). Not something maintained in house. It seemed to work OK'ish - giving lowest common denominator functionality. However, it was pretty easy to create higher performing specialized data structures for your use case then what seemed like simple uses of dinkumware STL.

garaetjjte · on July 2, 2022

malloc is not related to STL. But about it, big issue with Microsoft STL is that it is atrociously slow on debug builds.

Sesse__ · on July 2, 2022

Just wait until you try to use it from multiple threads at the same time!

eps · on July 3, 2022

Not sure what's your usage was exactly, but Heap API works reallly well in this context.

So much so that beating it with a custom allocator is a real challenge.

Sesse__ · on July 3, 2022

I had a system that was sped up by 30%+ on Windows by switching from HeapAlloc to jemalloc. Profiling showed that HeapAlloc was largly stuck in a single giant lock. (This was on Windows Server 2016, IIRC.) And that wasn't even that allocation-heavy in the large scale of it; most of memory was done through arena allocations, but a few larger buffers were not.

shaggie76 · on July 2, 2022

I wonder if he was running with the debugger attached; we also saw atrocious performance with MSVCRT malloc until we set _NO_DEBUG_HEAP=1 in our environment.

moonchild · on July 2, 2022

> it basically represents control flow as a gigantic DAG

Control flow is not a DAG.

spatulon · on July 2, 2022

You're not wrong.

I guess they're just trying to say that LLVM's control-flow graph is implemented as individually heap-allocated objects for nodes, and pointers for edges. (I haven't looked at the LLVM code, but that sounds plausible).

Even if those allocations are fast on Linux/Mac, I wonder whether there are other downsides of that representation, for example in terms of performance issues from cache misses when walking the graph. Could you do better, e.g. with a bump allocator instead of malloc? But who knows, maybe graph algorithms are just inherently cache-unfriendly, no matter the representation.

3836293648 · on July 2, 2022

Pretty sure they mean the AST is a DAG

tick_tock_tick · on July 2, 2022

For it to be a DAG you'd have the solve the halting program wouldn't you?

remram · on July 2, 2022

No. Knowing whether programs end in finite time (dependent on input) doesn't mean all programs end or that all programs end in constant time.

The halting problem is also not considered unsolved (though P=NP is unsolved).

pshirshov · on July 2, 2022

Well, why?

AshamedCaptain · on July 2, 2022

Directed _Acyclic_ Graph. Control flow graph has loops.

pshirshov · on July 3, 2022

This guy builds a compiler. I guess he added some limitations into his model so his CF can be represented as a DAG. I have a compiler which represents its CF as a DAG.

somerando7 · on July 2, 2022

> I was taught that to allocate memory was to summon death itself to ruin your performance. A single call to malloc() during any frame is likely to render your game unplayable. Any sort of allocations that needed to happen with any regularity required writing a custom, purpose-built allocator, usually either a fixed-size block allocator using a freelist, or a greedy allocator freed after the level ended.

Where do people get their opinions from? It seems like opinions now spread like memes - someone you respect/has done something in the world says it, you repeat it without verifying any of their points. It seems like gamedev has the highest "C++ bad and we should all program in C" commmunity out there.

If you want a good malloc impl just use tcmalloc or jemalloc and be done with it

Taniwha · on July 3, 2022

I'm a sometimes real-time programmer (not games - sound, video and cable/satellite crypto) - malloc(), even in linux is an anathema to real-time coding (because deep in the malloc libraries are mutexes that can cause priority inversion) - if you want to avoid the sorts of heisenbugs that occur once a week and cause weird sound burbles you don't malloc on the fly - instead you pre-alloc from non-real-time code and run your own buffer lists

astrange · on July 3, 2022

Mutexes shouldn't be able to cause priority inversion, there's enough info there to resolve the inversion unless the scheduler doesn't care to - i.e. you know the priority of every thread waiting on it. I guess I don’t know how the Linux scheduler works though.

But it's not safe to do anything with unbounded time on a realtime thread, and malloc takes unbounded time. You should also mlock() any large pieces of memory you're using, or at least touch them first, to avoid swapins.

Taniwha · on July 3, 2022

if you have to wait on a mutex to get access to shared resource (like the book keeping inside your malloc's heap) then you have to wait in order to make progress - and if the thread that's holding it is at a lower priority and is pre-empted by something lower than you but higher than them then you can't make progress (unless your mutex gives the thread holding it a temporary priority boost when a higher priority thread contests for the mutex)

(this is not so much an issue with linux but with your threading library)

I'm completely in agreement that you shouldn't be mallocing, that was kind of my point - if you just got a key change from the cable stream and you can't get it decoded within your small number of millisecond window before the on-the-wire crypto changes you're screwed (I chased one of these once that only happened once a month when you paid your cable bill .....)

astrange · on July 3, 2022

> (this is not so much an issue with linux but with your threading library)

If your threading library isn't capable of handling priority inheritance then it's probably Linux's fault for making it not easy enough to do that. This is a serious issue on AMP (aka big.little) processors, if everything has waits on the slow cores with no inheritance then everything will be slow.

morelisp · on July 2, 2022

Aside from the performance implications being very real (even today, the best first step to micro-optimize is usually to kill/merge/right-size as many allocations as possible), up through ~2015 the dominant consoles still had very little memory and no easy way to compact it. Every single non-deterministic malloc was a small step towards death by fragmentation. (And every deterministic malloc would see major performance gains with no usability loss if converted to e.g. a per-frame bump allocator, so in practice any malloc you were doing was non-deterministic.)

charles_kaw · on July 2, 2022

If this person was taught game dev any time before about 2005, that would have still been relevant knowledge. Doing a large malloc or causing paging could have slaughtered game execution, especially during streaming.

>If you want a good malloc impl just use tcmalloc or jemalloc and be done with it

This wasn't applicable until relatively recently.

jcelerier · on July 2, 2022

> Doing a large malloc or causing paging could have slaughtered game execution, especially during streaming.

... it still does ? I had a case a year or so ago (on then-latest Linux / GCC / etc.) where a very sporadic allocation of 40-something bytes (very exactly, inserting a couple of int64 in an unordered_map at the wrong time) in a real-time thread was enough to go from "ok" to "unuseable"

charles_kaw · on July 3, 2022

i suppose so.

modern engines generally have a memory handler, which means that mallocs are usually coached in some type of asset management. you are also discouraged from extending working memory of the scene suddenly. When I was doing gamedev, even then, there was no reason to big malloc because everything was already done for you with good guardrails

jcelerier · on July 3, 2022

I mean, if it's a custom memory handler, pool allocator, etc. it's not what people generally mean by malloc, which is the call to the libc function

syntheweave · on July 3, 2022

If you go way back into the archives of the blog's author, probably about ten years now, you will find another memory-related rant on how multisampled VST instrument plugins should be simple and "just" need mmap.

I did, in fact, call him out on that. I did not know exactly how those plugins worked then(though I have a much better idea now) but I already knew that it couldn't be so easy. The actual VST devs I shared it with concurred.

But it looks like he's simply learned more ways of blaming his tools since then.

TonyTrapp · on July 2, 2022

As always there is some truth to it - the problem of the MSVCRT malloc described in this blog article is the living proof of that - but these days it's definitely not a rule that will be true in 100% of cases. Modern allocators are really fast.

forrestthewoods · on July 2, 2022

Strong agree. I recently wrote a semi-popular blog post about this. https://www.forrestthewoods.com/blog/benchmarking-malloc-wit...

It's interesting that LLVM is suffering so horrifically using default malloc. I really wish the author did a deeper investigation into why exactly.

dang · on July 2, 2022

Discussed here:

Benchmarking Malloc with Doom 3 - https://news.ycombinator.com/item?id=31631352 - June 2022 (30 comments)

KerrAvon · on July 2, 2022

Has everyone forgotten that Unix is the common ancestor of Linux and every other Unixlike? I’m seeing an uptick of people writing nonsensical comments like “this was written for Linux (or Mac OS X, which implements POSIX and is therefore really Linux in drag)”.

jchw · on July 2, 2022

No... That's why they had the parenthetical. The problem is, your computer probably doesn't boot the common ancestor. If you're writing UNIX-like stuff, most likely it boots macOS or Linux. If you're cool maybe it's one of the other modern BSD variants aside macOS. In practice there's a pretty low probability that your code also runs on all POSIX-compliant operating systems, and more honest/experienced people often don't kid themselves into thinking that they're seriously targeting that. Even if you believe it, you probably have some dependency somewhere that doesn't care, like Qt for example. Saying something like "Linux (or macOS, which is similar)" is a realization that you're significantly more likely to be targeting both Linux and macOS than you are to even test on BSD. And to solidify that point, note that lots of modern CI platforms don't even have great BSD support to begin with.

Of course, there is a semantic point here. macOS nominally really is UNIX, except for when someone finds out it's not actually POSIX compliant due to a bug somewhere every year or so. Still, it IS UNIX. But what people mostly run with that capability, is stuff that mostly targets Linux. So... yeah.

Of course it is true that some people really think macOS is actually Linux, but that misunderstanding is quite old by this point.

addendum: I feel like I haven't really done a good job putting my point across. What I'm really saying is, I believe most developers targeting macOS or Linux today only care about POSIX or UNIX insofar as they result in similarities between macOS and Linux. That macOS is truly UNIX makes little difference; if it happened to differ in some way, developers would happily adjust to handle it, just like they do for Linux which definitely isn't UNIX.

copperx · on July 2, 2022

Apparently yes, because all I ever hear is "macOS is like Linux" and even "macOS is really Linux behind the scenes" from less enlightened people.

naniwaduni · on July 2, 2022

Well, a pretty big part of the point of Linux is that it's not a Unix-descendant, just a Unix-clone.

pcl · on July 2, 2022

What’s the distinction between those two?

dwheeler · on July 2, 2022

"Unix" implies paying annual fees for use of the "Unix" trademark, and/or at least direct descent from the original Unix code.

According to: https://kb.iu.edu/d/agat "To use the Unix trademark, an operating system vendor must pay a licensing fee and annual trademark royalties to The Open Group. Officially licensed Unix operating systems (and their vendors) include macOS (Apple), Solaris (Oracle), AIX (IBM), IRIX (SGI), and HP-UX (Hewlett-Packard). Operating systems that behave like Unix systems and provide similar utilities but do not conform to Unix specification or are not licensed by The Open Group are commonly known as Unix-like systems."

Many will include the *BSDs as a Unix, because their code does directly descend from the original Unix code. But Linux distros generally do not meet either definition of "Unix".

GabrielTFS · on July 3, 2022

It should be noted that Linux generally does actually conform to UNIX standards, though, and that two actual Linux distributions (Inspur K-UX and EulerOS) have in the past obtained UNIX certification. While this doesn't make all Linux distributions UNIX certified, it puts a rather large dent in the claim that they cannot be qualified as UNIX because of some claimed divergence from the standards.

(It also seems odd from my perspective to call exactly only those two Linux distributions "UNIX" unless you're essentially using it as a legal qualification and not a practical one)

dwheeler · on July 4, 2022

> While this doesn't make all Linux distributions UNIX certified, it puts a rather large dent in the claim that they cannot be qualified as UNIX because of some claimed divergence from the standards.

No one is claiming it can't be a Unix. But as you noted, Linux distributions normally do not meet the legal criteria, nor are they descended from one that did.

Legally Unix is a trademark and has a very specific legal meaning. If you don't mean that legal meaning, then it is clearer if you use another term. The usual term is "Unux-like"; that is the widely-used term and it has been for decades.

A rose by any other name may smell as sweet, but calling it a different word risks confusion.

pjmlp · on July 3, 2022

Only if their code for the certification is fully available upstream.

asgeir · on July 2, 2022

Part of it is whether the code can be traced back to original AT&T code. Which would be true for e.g. BSD variants (which includes MacOS). https://i.redd.it/kgv4ckmz3zb51.jpg

Another part is the trademark and certification fee. https://www.opengroup.org/openbrand/Brandfees.htm

sterlind · on July 2, 2022

Darwin forked BSD, and BSD is a fork of the original Unix source. Linux is a fresh implementation of POSIX, and doesn't directly inherit any code from Unix.

Linux is as much Unix as WSL1 was Linux - i.e. not at all, just clones.

avgcorrection · on July 2, 2022

I just call Linx+Mac+Bsds Unix (not “Unix-like” and certainly not that “*nix” nonsense). I don’t respect Unix enough to be perfectly precise with it.

pjmlp · on July 2, 2022

There is no Windows malloc(). Only UNIXes have the C API as part of the OS API.

spc476 · on July 2, 2022

malloc() is defined by the C Standard. If you want to claim your compiler is ANSI or ISO certified, you need to support malloc() (as well as the rest of the C Standard library).

pjmlp · on July 3, 2022

Quite right, except we are then talking about compilers and not OS APIs.

UNIXes are the only OSes were there is an overlapping between OS APIs and libc due to C's origin.

jart · on July 2, 2022

malloc() isn't part of the Linux API which provides mmap().

plorkyeran · on July 2, 2022

Libc being just a library is indeed one of the ways that Linux is unlike Unix.

leajkinUnk · on July 2, 2022

What do you mean by "Unix"? Are you talking about some specific Unix version, or is there something in the POSIX spec that says that libc isn't a library?

naniwaduni · on July 2, 2022

It's not that libc is supposed to not be a library, but those functions are the POSIX-defined interfaces to the OS. Linux is unusual in that it defines its stable interfaces in terms of the syscall ABI, enabling different implementations of the libc that can work semi-reliably across kernel versions.

pjmlp · on July 2, 2022

Since we are getting pedantic, Linux isn't a UNIX.

CyberDildonics · on July 2, 2022

It's absurd you would call someone pedantic for saying malloc is in a library on linux after trying to say that malloc in a library on windows.

pjmlp · on July 3, 2022

[flagged]

CyberDildonics · on July 3, 2022

Why are you calling me your "dear". Don't ever talk to me that way again.

chrisseaton · on July 2, 2022

> Linux isn't a UNIX

I think this isn't quite right - I think some distributions are actually certified as UNIX.

https://www.opengroup.org/openbrand/register/

jart · on July 2, 2022

I thought it was. Some distro (I forget the name) paid to be certified with The Open Group.

fguerraz · on July 2, 2022

"Don't use spinlocks in user-land."

eska · on July 3, 2022

He only did as a workaround for a performance issue in the mutex.

oddity · on July 2, 2022

If you're depending on the performance of malloc, you're either using the language incorrectly or using the wrong language. There is no such thing as a general purpose anything when you care about performance, there's only good enough. If you are 1) determined to stick with malloc and 2) want something predictable and better, then you are necessarily on the market for one of the alternatives to the system malloc anyway.

mwcampbell · on July 2, 2022

The whole point of the article, though, was that the system malloc was good enough on Linux and Darwin.

oddity · on July 2, 2022

This misses the point of my comment. When you put faith in malloc, you're putting hope in a lot of heuristics that may or may not degenerate for your particular workload. Windows is an outlier with how bad it is, but that should largely be irrelevant because the code should have already been insulated from the system allocator anyway.

An over-dependence on malloc is one of the first places I look when optimizing old C++ codebases, even on Linux and Darwin. Degradation on Linux + macOS is still there, but more insidious because the default is so good that simple apps don't see it.

dzaima · on July 2, 2022

Except that I'd guess that there is no "good" case in the case for MSVCRT's malloc. You shouldn't assume malloc is free, but you should also be able to assume it won't be horrifyingly slow. Just as much as you should be able to rely on "x*y" not compiling to an addition loop over 0..y (which might indeed be very fast when y is 0).

Yes, this unfortunately isn't the reality MSVCRT is in, but it is quite a reasonable expectation.

oddity · on July 2, 2022

It's unreasonable to assume that an stdlib must be designed around performance to any capacity. For most software, the priorities for the stdlib are 1) existing, 2) being bug/vulnerability free, and likely, in the Windows case given Microsoft's tradition, 3) being functionally identical to the version they shipped originally. Linux and macOS have much more flexibility to choose a different set of priorities (the former, through ecosystem competition and the latter through a willingness to break applications and a dependence on malloc for objc), so it's not at all a fair comparison. The fact that malloc doesn't return null all the time is a miracle enough for many embedded platforms, for example, so it's not exclusively a Windows concern. Environments emphasizing security in particular might be even slower.

Multiplication is not a great argument... There's a long history of hardware that doesn't have multipliers. Would I complain about that hardware being bad? No, because I'd take a step back and ask what their priorities were and accept that different hardware has different priorities so I should be prepared to not depend on them. Same thing with standard libraries. You can't always assume the default allocator smiles kindly on your application.

dzaima · on July 2, 2022

I don't see a reason for the stdlib to be considered in a different way from the base language is all I'm saying. For most C programmers, the distinction between the stdlib and the base language isn't even a consideration. Thinking most software doesn't heavily rely on malloc (and the rest of the stdlib) being fast is stupid.

Even on hardware without a multiplier you'd do a shift-based version, with log_2(max_value) iterations. What's unreasonable is "for (int i = 0; i < y; i++) res+= x;". If there truly were no way to do a shift, then, sure, I'd accept the loop; but I definitely would be pretty mad at a compiler if it generated a loop for multiplication on x86_64. And I think it's reasonable to be mad at the stdlib being purposefully outdated too (even if there is a (bad) reason for it).

oddity · on July 3, 2022

C and C++ are some of the few languages where the spec goes out of its way to not depend on an allocator, for good reason, and this is well after you've accounted for the majority of the code that, hopefully, doesn't need to do memory allocation at all. The fact that many programmers don't care is an indication that most code in most C or C++ software is not written with performance in mind. And that's (sometimes) fine. LLVM has a good ecosystem reason to use C++, for example, and it's well known in the compiler space that LLVM is not fast. Less recently, for a long time C and C++ were considered high level languages, meaning lots of software was written in it without consideration of performance. But criticizing the default implementation's performance absent a discussion of its priorities when you have all the power to not be bottlenecked in it anyway is just silly.

dzaima · on July 3, 2022

The fact that you should avoid allocation when possible has absolutely nothing to do with how fast allocation should be when you need it. And code not written with performance in mind should still be as fast as reasonably possible by default.

I would assume that quite a few people actually trying to write fast code would just assume that malloc, being provided to you by your OS, would be in the best position to know how to be fast. Certainly microsoft has the resources to optimize the two most frequently invoked functions in most C/C++ codebases, at least more than you yourself would.

MSVCRT being stuck with the current extremely slow thing, even if there are truly good reasonable reasons, is still a horrible situation to be in.

Dylan16807 · on July 3, 2022

> If there truly were no way to do a shift, then, sure, I'd accept the loop

Not even then. You can just use an addition instead of a shift.

Dylan16807 · on July 3, 2022

> It's unreasonable to assume that an stdlib must be designed around performance to any capacity.

To any capacity? That's insane.

It's not a reference implementation to show you what the correct results should be. It's the standard. The default.

jeffbee · on July 2, 2022

There isn't really a "system malloc on Linux". Many distributions come with the GNU allocator based on ptmalloc2, but there is no particular reason that a distro could not come out of the box with any other allocator. The world's most widespread Linux distribution uses LLVM's Scudo allocator. Alpine Linux comes with musl's (unbelievably slow) allocator, although it is possible to rebuild it with mimalloc.