The Horror in the Standard Library

bluejekyll · on May 6, 2017

OMG, as I was reading this I thought, "man, this reminds me of a bug I ran into with std::string back in 2000", A few sentences later, and this is also about std::string and the STL.

Mine was different though, after tracking down a memory leak that was happening with the creation of just new empty string, I discovered in the stdlib that there was a shared pointer to the empty string with a reference count of how many locations were using it (ironic that this was intended to save allocations). It turned out this was on Intel and we had what was rare at the time, a multi-processor system. It turned out that the std::string empty string reference count was just doing a vanilla ++, no locking, nothing, variable not marked volatile, nothing.

A few emails with a guy in Australia, a little inline assembly to call a new atomic increment on the counter, and the bug was fixed. That took two weeks to track down, mostly because it didn't even cross my mind that it wasn't in my code.

From that point on, I realized you can't trust libraries blindly, even one of the most used and broadly adopted ones out there.

bostik · on May 6, 2017

> you can't trust libraries blindly, even one of the most used and broadly adopted ones

There is a corollary to development and debugging. When things break in mysterious ways, we tend to go through a familiar song and dance. As experience, skills and even personal networks grow, we can find ourselves diving ever further in the following chain.

1. "It must be in my code." -- hours of debugging

2. "Okay, it must be somewhere in our codebase." -- days of intense debugging and code spelunking

3. "It HAS TO be in the third party libraries" -- days of issue tracker excavations and never-before-enabled profiling runs

4. "It can't possibly be in stdlib..." -- more of the same, but now profiling the core runtime libraries

5. "Please let this not be a compiler bug" -- you become intensely familiar with mailing list archives

6. "I will not debug drivers. I will not debug drivers. I will not debug drivers."

7. "I don't even know anyone who could help me figure out the kernel innards."

8. "NOBODY understands filesystems!"

9. "What do you mean 'firmware edge case'?"

And the final stage, the one I have witnessed only one person ever achieve:

10. "Where is my chip lab grade oscilloscope?"

Apart from bullheadedness, this chain also highlights another trait of a good developer. Humility.

uep · on May 6, 2017

From my experience this is normal for embedded development; particularly for consumer electronics. Part of the reason some developers in this space have to wear so many hats is that the pace in consumer electronics is unforgiving. I don't think my current employer is unusual either.

Hopefully, the number of frameworks at the top, and the size of your individual programs are relatively small (so that 1-3 aren't nightmares by themselves).

In my experience, 4-5 are seldomly the problem (thanks Linaro!). I suspect the ratio of C to C++ is significantly larger in embedded systems though.

In general, PowerPC/MIPS/ARM toolchains and drivers are not as mature as x86/AMD64. 6-8 tend to occur because CPU vendors usually have their own "blessed" toolchains and BSPs that have diverged from their upstream projects. Fortunately, this means that it's often the case that someone else has already fixed the problem. It's just as often that a driver has not been tested for your use-case since the last time that particular driver's infrastructure was refactored inside the kernel. Or... you wrote the driver and made the mistake (or it might be something from 9/10).

9-10 happen because we're often using hardware that is new and has not had all of its errata discovered yet.

When products need to ship, we're regularly going through this stack. I've seen every one of these, even in just the last 4 years.

mighty_atomic_c · on May 6, 2017

Can confirm. I had a trippy experience where I had on one monitor some RTL+simulation for our chip up for view, on another I had the PCB schematic I had helped design, and on my third I had the GUI and embedded toolchain development environments up, and on my desk I had an oscilloscope measuring that PCB running that firmware. It was basically rolling through the list and really fun!

rbanffy · on May 7, 2017

Indeed. I too had once the dubious pleasure of having an oscilloscope on my desk, between two computers and a prototype.

kabdib · on May 6, 2017

I've done the oscilloscope thing, though it was only a vanilla couple-of-hundred Mhz scope on some pins that were bugging me, and not the full deal: "Gang way, we're cracking the lid of that thing and going in." That sounds exciting, I'd love to see it.

Once, I used a chunk of ice to cool down a chip, and that made it work. The hardware guys were unimpressed. But hey, they've got cans of Chill and they use them a lot, and this software guy took a while to realize the reason the board worked in the morning and was dead by lunch, and worked for a little while again after lunch, was temperature related.

There were some devs who tracked down a nasty bug in a processor's TLB. I only heard about that one, wish I had been there. I only had to deal with the fallout in the hypervisor. Note: If you have to spend 20ms hunting down and killing lies with all interrupts turned off and everything basically stopped in its tracks, you are no longer a real-time operating system.

bostik · on May 6, 2017

Heh. It could be telling that I had to look up the expansion for TLB. CPU cache implementation... holy crap.

My ex-coworker has done the vanilla scope thing too and has a 400MHz scope at home. For some reason people like this are not too uncommon in Finnish oldskool[tm] IT scene. I remember how he isolated a latency and concurrency bug to an expensive interrupt handler. Rewriting isolated parts of core kernel code to make a really tricky problem go away was one of his more hardcore skills.

I'm not even near his level. My own experience is limited to slightly nibbling the edges of file system and block cache behaviour. It's a brave person who dares dive into that code. Not me.

But I do know one person who regularly works with decapped chips. He works for a company who do extremely low-level hardware investigations. Now that's hardcore.

kabdib · on May 6, 2017

Cache bugs are one of the fun ones. You think you're losing your mind and the people around you would probably agree. A couple of weeks go by, your spouse is ready to fire you, your boss wants to divorce you, and every waking moment is full of race conditions. Four-way stops on the drive to work are a source of stress and you punch buttons in the elevator and worry about firmware bugs. Then you get to your desk and there's the setup, a laughably small board for all the trouble it's made, and it's time for single combat, Sherlock Holmes style.

When you find the problem it's usually a blinding flash of realization that illuminates a tiny, eensy bit of code that you tweak and make right in a couple of minutes. Invariably the mistake was pretty stupid. The glory moment is over quickly because you know all the test cases will pass and that you've just nailed another one.

You've got bragging rights during one lunch, but that's it. It's off to more mundane bugs in the mortal world, and you feel a little sad.

I need to do hardware again.

nikanj · on May 8, 2017

Your writing reminds me of https://www.usenix.org/system/files/1311_05-08_mickens.pdf

steve_gh · on May 6, 2017

I remember a 3G network signalling simulation I worked on back in about 2002. We ran it on a rack of custom servers. The CPU load was pretty hellish, and the only way we could get it to run reliably without segfaults was to install gaming cooling systems and underclock the CPUs ... ran like a charm then!

marcosdumay · on May 6, 2017

At the late 90's and earlier 2000's it was commonplace to fix OS panics by opening the computer and pointing a fan at it.

I would try it even before going for some harder software problems, because it's so easy.

mcguire · on May 6, 2017

Ever squeaked the chips on an Amiga?

marcosdumay · on May 6, 2017

People couldn't import computers into my country at the Amiga's time.

I had a local-made spectrum clone, it didn't overheat, but I lost a multimeter on its power supply.

kosma · on May 6, 2017

> 10. "Where is my chip lab grade oscilloscope?"

11. "Shit, where do I borrow a spectrum analyzer and a set of near field probes? These things cost an arm and a leg!"

Yes, STM32F1 MCUs generate inference that jams GPS receivers. No, it's not documented anywhere.

yzmtf2008 · on May 6, 2017

And here's the documentation for future generations! :P

nickpsecurity · on May 7, 2017

12. "Try our new 7nm fab they said. You'll be ahead of 10nm competition with few issues. Now, gotta call engineers at the fab to see if it's materials or production messing my custom stuff up. (sighs)"

makomk · on May 6, 2017

You'd be surprised how much you can find out just with a $10 TV tuner dongle and a piece of coax with a short section of the outer braid trimmed off at once end.

ethbro · on May 7, 2017

You're describing my college tv antenna. Coax is cheap. An actual digital antenna is like... 50 ramen equivalents.

eltoozero · on May 7, 2017

50 ramen = $10 USD

kosma · on May 10, 2017

RTL-SDR is the Arduino of EMC work. :)

northern_lights · on May 6, 2017

any MCU can "jam" a GPS receiver if the board is laid out improperly or without enough shielding

kosma · on May 10, 2017

A bare, free-floating STM32F103 with literally nothing but a LiPo battery connected with two wires, running the blinky.c demo, will completely jam many GPS receivers when placed next to the antenna.

CiaranMcNulty · on May 6, 2017

11. "Hm maybe I should check my code again... ah there's the bug"

doubleplusgood · on May 6, 2017

"oh, this config shouldn't be linked to /dev/null..."

fnord123 · on May 8, 2017

    if (featureFlags[HN_DEBUG_HIER_FLAGS] = null) {
    /* Who won't this trigger!?!?!?
    */
    }

...oh god, kill me now.

frik · on May 6, 2017

I've done the oscilloscope thing, but it was for IoT stuff - debugging broken a I2C communication with Arduino (8-bit 16MHz ATMega CPU).

The Arduino software stack is not huge, there is no operating system involved. Our application is the only thing that runs on bare slow hardware with very limited memory. But this also makes debugging harder. The IDE is limited, you debug over serial output. You have to reflash the flash-memory after every re-compile, which can take a minute.

Building a IoT system for very specific tasks that has to run reliable for years without interruption, I would still use a 8-bit tiny ATMega CPU (e.g. Arduino), and to control this tiny CPU and do some networking stuff with a control center using a 32bit ARM CPU (e.g. RPi).

garaetjjte · on May 6, 2017

> The IDE is limited, you debug over serial output. You have to reflash the flash-memory after every re-compile

uh, you know that AVRs have debugWIRE (smaller parts) or JTAG (bigger parts)?

rjsw · on May 6, 2017

The furthest that I have got down the list was trying to bring up the first prototype of a board that had been designed with too long traces on the PCB between the SoC and DRAM. If you tried to read a location in memory you got the value of the page table entry for that address rather than the address contents.

kosma · on May 6, 2017

I once had to debug a poorly-designed board where the CPU would lock up if you did a DRAM burst write with at least 3 of the 5 highest bits of the word set (yes, I narrowed the test case down that far). A quick look at the layout confirmed that those traces were routed directly under the crystal oscillator without any form of ground shielding...

(We ended up underclocking the CPU by about 20% because there wasn't enough time for a redesign. Sigh. It's a miracle the thing even worked in the first place...)

kabdib · on May 6, 2017

... then your power supply goes marginal (because it will) and well . . . never been there :-)

fest · on May 6, 2017

I've had the opportunity spend about a week debugging incorrect configuration of SDRAM by BSP team. At first I blamed third party library with no source code available. Then it occurred to me that my initial SDRAM tests were doing word-by-word access. Third party library used memset which was optimised to use DMA for bulk transfers, which failed to write subsequent words in the same transaction.

An easy, one bit change in configuration registers of SDRAM fixed it. A week well spent!

kps · on May 7, 2017

Similar: my new driver crashes the machine. A couple days debugging. Triple-check every register value. All good. It doesn't crash when I single-step! A couple more days debugging. Finally get it: the machine crashes when two ports are enabled close enough in time. Go talk to the hardware guys. “Yeah, we know, power traces are fixed in the next rev.”

ethbro · on May 7, 2017

I feel like "I can reliably make the bug disappear by turning on my debug harness" is a reliable sign that things are about to get weird.

tripzilch · on May 6, 2017

Ooofff, that list made my stomach churn, more stuff of nightmares! All debugging post-mortems of this level should be written in Lovecraftian style.

... it's not widely known, whispers attribute it to a transcription error, unsure when it started, copied through ancient manuscripts, that the Dead Thing that lies dreaming at the bottom of the ocean, is actually named ... C++hulhu

SideburnsOfDoom · on May 6, 2017

I have also seen developers far too keen to blame the library before exhausting the most likely case that the issue is caused by the local code (step 1 and 2), or at any rate is fixable in it.

It's a well-known syndrome. The classic motto for it is "'SELECT' isn't broken" https://blog.codinghorror.com/the-first-rule-of-programming-...

ethbro · on May 7, 2017

This too. For every one time it's the parent article, 99 times it's my code.

Balgair · on May 7, 2017

Regarding #10: Oh Lord, I've been there too many times to count.... One of the more memorable time was with an old timing distribution system. The thing would pretty much just send out clock pulses to networked machines and this cost a lot of money to do properly (very abridged here). This particular one was acting 'funky' and came back in. In testing it, we go really weird behavior. True to your list, I think we went 1, 2, 5, 6 (no drivers, per se), 7 (for about 5 days), 8, 9, 10. At 10, we finally plugged in the o-scope and started debugging the PCB vias and connections themselves. Things were getting really wacky now. The Faraday cage that was the testing room had to be re-grounded, we thought, as the wires themselves were still carrying current even when the power was dis-connected. One of the guys brought in his old hand-held impact hammer to drive a new copper stake into the peeled up linoleum and through the foundation of the building. Still, we got strange results. Like really strange results that, to us, were worthy of a Nobel Prize, as we had thus far proved to ourselves that physics herself was broken inside the lab. For reference: a lot of people worked in there, so having stuff about in all kinds of dis-repair was typical. I remember, long after the pizza had gone cold and the Mt. Dew was flat, I was looking up at the ceiling of the room. I saw an old RF horn hanging from the roof, kinda held together with the connecting wires. 'Hey, if that thing was on, would it do anything?' The other techs' eyes all lit up. Turns out, one of the guys was doing something with the horn for some other test. He had left for an extended backpacking vacation and accidentally still had the thing on. The broadcasting from the horn was adding the small amount of current to all our wires, thereby causing the whole box to go out of whack just enough to cause all the issues. At about 4 am, we finally got the box re-configured, the original problem from the customer solved, and all of it packed up and ready to overnight out to the customer for when the UPS store opened at 7am, about 3 hours from then. The poor guy got back from vacation to that mess of an email inbox and many meetings. It was an honest mistake, and he bought us all 12 packs for the trouble. Still, when you think you have proved that physics is broken, I think that will qualify as step 12.

gone35 · on May 6, 2017

11. "Do we have the IP core for this?"

12. "Where is my electron microscope?"

shabble · on May 6, 2017

..13 "We're gonna need some time on the FIB workstation"[1]

[1] http://www.electronicdesign.com/eda/fib-circuit-edit-becomes...

nickpsecurity · on May 7, 2017

That's the one I should've thought of. I said fab but who trusts them to know what's on it!? Haha.

RX14 · on May 6, 2017

This is the worst thing I've ever found, still not solved: https://github.com/crystal-lang/crystal/issues/4127

lloeki · on May 6, 2017

You must be kidding me. We hit a bug with the mysql2 gem where the client randomly crashes in libmariadbclient (but not libmysqlclient) only on debian (using Arch Linux and OS X for dev, but exact same versions of everything) and only for database names of length 25. And 28, but we cannot reproduce it on the repro docker image we made. And only if there are enough aliases in the query (could be as low as 5 but could need as much as 20+). And only if 'active_record' is require'd, but even when it's not used at all. And never ever under GDB or Valgrind, making it the perfect heisenbug.

That's a lot of stars to align there, but when they do, hell breaks lose just often enough to be sure it's not completely random, and obviously this hit one of our most finicky customers, and only in production because of course "#{customer}_production".size == 28 (and not 25, because nah that'd have been too easy to be able to reproduce the bug right away).

[0] bug: https://github.com/brianmario/mysql2/issues/822

[1] repro: https://github.com/adhoc-gti/mysql2_pointer_bug

nickpsecurity · on May 7, 2017

Regarding No 10:

http://cs.dartmouth.edu/~sergey/cs258/2010/D2T1%20-%20Kris%2...

s_kilk · on May 6, 2017

I've only seen level 5 personally, with a C# compiler bug that would omit totally valid `else` branches.

mikeash · on May 7, 2017

My favorite level 5 was a bug in clang that caused it to occasionally emit incorrect code when calling a vararg function. However, the bug was harmless when combined with clang's vararg function prologue. When calling a vararg function compiled with gcc, the clang bug would cause gcc's prologue to jump to a quasi-random address vaguely nearby and continue execution in the middle of some other function. That was great fun. I wrote it up here:

https://www.mikeash.com/pyblog/friday-qa-2013-06-28-anatomy-...

kyberias · on May 6, 2017

Care to elaborate? Which C# compiler?

s_kilk · on May 6, 2017

The official one, I think it was in .NET 3, but it was a few years ago at an old job, so I'm a bit hazy on the details.

Basically we had a bug where a whole conditional branch was being skipped, and we traced it down to the branch being omitted entirely from the compiled IR.

And no, it was nothing fancy, just something like:

    if (customer.country == "US") {
      doSomething();
    } else {
      doDifferentThing();
    }

The whole `else` branch was simply missing from the compiled program.

If I remember correctly, we got around it by doing something like:

    bool isUsCustomer = customer.country == "US"
    if (isUsCustomer) {
      doSomething();
    } else {
      doDifferentThing();
    }

Anyway, the point is that the compiler fucked up it's handling of if/else statements, but only at that specific part of the code, leading to a few wasted days of effort tracking down the problem.

cesarb · on May 6, 2017

It can get even more "fun" with Java. Your code can start running through an interpreter, then after a while suddenly be transformed by a JIT engine. The interpreter and the JIT engines (there's more than one JIT engine) have different bugs. The optimizations made by the JIT engine can depend on the data which went through your method before the JVM decided to optimize it.

I'm not finding it right now, but I recall seeing a few weeks ago a presentation with several of these sorts of bugs in a recent version of Java (all reported and fixed): after a number of iterations, it suddenly starts returning wrong results.

Pica_soO · on May 6, 2017

Sounds like optimization going haywire, deducing that the statement under question would constantly evaluate to this term. Its valid to optimize a else statement out- if it will never be reached (Dead Codepath Optimizing out). Was there something akin to this in the statement?

s_kilk · on May 6, 2017

> Was there something akin to this in the statement?

It probably was the optimiser at fault, but there wasn't anything special about this conditional, and certainly nothing that _should_ have caused the optimiser to throw away the else branch.

If memory serves right it was comparing a string field of an object to a static string, like `someObject.foo == "some string"`.

kyberias · on May 11, 2017

Sorry, I don't buy it. I've seen countless cases where developers conclude that "compiler has a bug" and it never ultimately did. There are also cases where they never bother to figure it out, change the code a bit and continue with their lives thinking they've found a bug in the compiler. But they didn't.

SideburnsOfDoom · on May 6, 2017

> Anyway, the point is that the compiler fucked up it's handling of if/else statements, but only at that specific part of the code

It would have to be specific. Put it this way: if this was a general bug and the "else" was always omitted, how long would it last before being found and fixed?

Related, if you were to say to me "I found the issue, the compiler isn't correctly handling if/else statements" Then my first thought would be about your medication not about the compiler.

s_kilk · on May 6, 2017

> if you were to say to me "I found the issue, the compiler isn't correctly handling if/else statements" Then my first thought would be about your medication not about the compiler

And yet, it happened :)

And the senior engineers at the company looked at it and confirmed it was a compiler bug. Their best guess was that something about that part of the code was putting the compiler in a funny state, causing it to skip that particular `else` branch.

We reported it to Microsoft, but never heard anything back.

syngrog66 · on May 7, 2017

went through these kinds of stages a few times in my career.

once it led me to discover a leak in a major travel website's purchase flow caused by Java's Thread class, related to thread groups.

most recently, I was writing some Linux auth code in C, reached a point where I could rule out my code, and found a bug in sudo. freaking sudo.

(also related to groups, though the Linux user kind, not Java threads.)

asdfgadsfgasfdg · on May 7, 2017

For devs using higher level languages it is more like:

1. "It must be my code" -- minutes of debugging

2. "It must be in our codebase" -- hours of debugging

3. "Third party library or framework" -- If library use a different library, if framework accept the bug and work around it whilst cursing framework choice.

arglwarg · on May 6, 2017

This one goes to eleven: 11. "On some setups, clients get a corrupted stack when swapped back in from kernel."

(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/745836)

Beltiras · on May 10, 2017

This is the most concise list of debugging layers I have seen. I'm mostly commenting so that I get this into my comment feed and can locate it easier.

bostik · on May 10, 2017

In that case, feel free to bookmark this one: http://bostik.iki.fi/aivoituksia/random/developer-debugging-...

This got more attention that I thought possible, so I decided to pull it out as an item all by itself.

TimJYoung · on May 6, 2017

Spot on. The experience level of a developer is directly proportional to how fast he/she assumes that the problem is in someone else's code. ;-)

Natanael_L · on May 6, 2017

That one depends a lot on who your coworkers are too.

TimJYoung · on May 7, 2017

I was thinking more of the code in the OS, compiler, run-time, etc. I should have made that more clear in my comment.

doubleplusgood · on May 6, 2017

I feel lucky to only have reached item 4; I wouldn't feel confident beyond it, anyway.

marcosdumay · on May 6, 2017

I don't think anybody is ever confident beyond level 1. The worst part is those "why the fuck does everything work when I plug a logical analyzer?" moments.

doubleplusgood · on May 8, 2017

I'm usually well into #3 before I realize I'm in too deep

kazinator · on May 7, 2017

> It turned out that the std::string empty string reference count was just doing a vanilla ++, no locking, nothing, variable not marked volatile, nothing.

The whole implementation is smells of silliness, because there is no need to track how many references there are to a global null string, which need not even be dynamically allocated.

bluejekyll · on May 7, 2017

I know; I've often wondered if this was changed, never went back to look.

daemin · on May 8, 2017

Probably a few years ago, around C++11 time. Where the standard made it not possible to have a Copy-on-Write implementation of string.

faragon · on May 6, 2017

The problem is forgetting that dynamic memory usage is not "free" (as in "gratis" or "cheap"). In fact, using std::string for long-lived server processes doing intensive string processing (e.g. parsing, text processing, etc.) is already known to be suicidal since forever, because of memory fragmentation.

For high load backend processing data, you need at least a soft real-time approach: avoid dynamic memory usage at runtime (use dynamic memory just at process start-up or reconfig, and rely on stack allocation for small stuff, when possible).

I wrote a C library with exactly that purpose [1], in order to work with complex data (strings -UTF8, with many string functions for string processing-, vectors, maps, sets, bit sets) on heap or stack memory, with minimum memory fragmentation and suitable for soft/hard real-time requirements.

[1] https://github.com/faragon/libsrt

stillworks · on May 7, 2017

Maybe not enitrely related, but I had worked on a Java project where any new object creation was prohibited !

Had a VERY VERY hard time unlearning and catching up to that "paradigm" but I now have a much better perspective on "automatic memory management" .

>> The problem is forgetting that dynamic memory usage is not "free"

Totally Agree.

gravypod · on May 6, 2017

Are you still developing this project?

faragon · on May 6, 2017

arunc · on May 6, 2017

I encountered exactly the same issue few years ago in UIDAI in one of our large scale biometric matchers and the resolution was exactly the same. After a week of debugging I found that the libstdc++ allocator was the culprit. I found [1] and confirmed the same, which helped in fixing this issue.

The thing that was more interesting (or sad) was to know that the GCC developers didn't expect the multithreaded applications to be long running.

"Operating systems will reclaim allocated memory at program termination anyway. "

[1] https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator...

CamperBob2 · on May 6, 2017

Notes about deallocation. This allocator does not explicitly release memory. Because of this, memory debugging programs like valgrind or purify may notice leaks: sorry about this inconvenience. Operating systems will reclaim allocated memory at program termination anyway.

Wow. This is worth a Linus Torvalds-level rant. Whoever accepted this code into the source tree needs to be put on GNU's version of a performance improvement plan.

badsectoracula · on May 6, 2017

This is actually a fine thing to do, i don't remember where i read it, but a good analogy for trying to free memory at program exit is like trying to clean the floors and walls of a building right before it is demolished.

This is also why Valgrind separates reachable and unreachable memory and only considers unreachable memory as leaks.

CamperBob2 · on May 6, 2017

How do you know the program is even supposed to terminate? Maybe it's a server, as in this case. Maybe the code will be reused someday as a subfunction or library within a larger application, instead of being launched and terminated directly by the OS. Maybe it's an embedded application in a 24/7 factory somewhere. Maybe it's on its way to the Kuiper Belt. Or maybe it's just supposed to stay up and running for longer than the average Windows 10 update period.

In any case, hiding this sort of behavior in a way that sucks down days of debugging time on the part of one expert programmer after another, after another, after another, is terrible engineering.

xsmasher · on May 6, 2017

Freeing memory at program exit may be unnecessary, but being unable to free memory while the program is running is terrible.

On iOS and android you are expected to free whatever unused memory you can when you are notified of a low memory situation.

TwoBit · on May 6, 2017

It's OK to do, but it makes memory analysis difficult. If your app exits with a lot of allocated memory, it's hard to tell what's a real leak and what's not.

daemin · on May 8, 2017

This behaviour is one of those cache-memory-leaks. Where even though memory is reachable it is still effectively leaked because it's soaked up by some data structure and not released to the rest of the system.

So it's not a traditional leak but because memory usage would continue to grow it causes the long lived process to choke itself and die.

dom0 · on May 7, 2017

A good example is something like doing a "cp -a". To preserve hardlinks you'll need a mapping, trying to free that at exit can and will take time. This was an actual bug in the coreutils.

CamperBob2 · on May 7, 2017

I have to wonder why the C standard library didn't include a Pascal-style mark/release allocator. A naive implementation wouldn't be much faster than free()'ing a bunch of allocations manually, but the general idea offers possibilities for optimization that otherwise aren't available to a conventional heap allocator.

dom0 · on May 8, 2017

Because (m)alloc predates mmap by a decade or so [1], and you expressly don't want stack-like allocation semantics when using malloc (otherwise you'd just put it on a/the stack).

[1] And you cannot have more than one heap without mmap. Without mmap, you only have sbrk = the one heap. On UNIX and those that pretend to be, anyway.

maxlybbert · on May 6, 2017

Raymond Chen often uses this analogy on the blog Old New Thing.

maxlybbert · on May 6, 2017

Years ago I read in the Perl documentation ( http://perldoc.perl.org/perlfaq3.html#How-can-I-free-an-arra... ):

> On most operating systems, memory allocated to a program can never be returned to the system. ... Some operating systems (notably, systems that use mmap(2) for allocating large chunks of memory) can reclaim memory that is no longer used ...

Because you can't return unneeded memory to most operating systems (or because it used to be that you couldn't return unneeded memory, even if that has changed recently), it isn't a surprise that by default GCC's free() and operator delete -- which are meant to be cross platform -- don't try to return that memory. Instead it's all free list management.

I do think it's silly for operators new/delete to have a separate free list from malloc()/free().

gpderetta · on May 6, 2017

>I do think it's silly for operators new/delete to have a separate free list from malloc()/free().

They don't. This is a custom, simple, non-default pool allocator for standard containers (i.e nothing to do with new)

maxlybbert · on May 6, 2017

Thanks for the correction. But it's just as silly to create an allocator for the standard containers, when the allocator consists of little more than free list management, given that malloc/free and new/delete already do that.

It's especially silly for a library (programs may want custom memory management and libraries really shouldn't go out of their way to make that harder), and the fact that it's the standard library doesn't make it less silly.

jwakely · on May 8, 2017

GCC's std::allocator also doesn't do that (not for at least a decade, IIRC). It's a non-default allocator, which nobody is forced to use. It's entirely optional. The default std::allocator just uses new/delete.

xwvvvvwx · on May 6, 2017

Isn't this just a Free List allocator?

https://en.wikipedia.org/wiki/Free_list

lapinrigolo · on May 6, 2017

Make them work on gnome?

staticlibs · on May 6, 2017

Looks like this is just one of possible "non-default" extension allocators, that can be selected during libstdc++ compile time: https://gcc.gnu.org/onlinedocs/libstdc++/manual/memory.html#...

jwakely · on May 8, 2017

Facts and up-to-date documentation, how novel.

alyandon · on May 6, 2017

I myself ran across this same scenario many years ago with a similar amount of hair pulling and eventually concluding that the GNU libstdc++ allocator wasn't reusing memory properly. Unfortunately, I was never able to pare down the application to the point that I had a reproducible test case to report upstream.

GLIBCPP_FORCE_NEW was the solution for the near term and since I was deploying on Solaris boxes I eventually switched to the Sun Forte C++ compiler.

It really bugs me that this problem still exists. :-/

flamedoge · on May 6, 2017

It really bugs me that nobody links bug on bugtracker or filed it. I know I'm asking a lot and being an ass.

jwr · on May 6, 2017

No, that's a reasonable expectation. Unfortunately, developers often don't appreciate bug reports and become defensive. File a bug report and you will be expected to provide a reproducible test case, or your bug will get closed. This isn't always possible, and the reporter might not be able to dedicate the time to do this.

I used to file lots of bug reports, because I know that as a developer, I'd want them. Every bug report is valuable, even if it's not reproducible. It happened to someone, so it surely also happened to 10 other people who did not report it, it's important.

Unfortunately, bug reports often get responses like "cannot reproduce", "please provide more details which is oh, about a day or two of work". Well, not everyone has a day or two to spend on a bug report, especially if you (like me) hit software bugs regularly.

These days I very rarely file bug reports. It just isn't worth the effort, as most developers do not appreciate the bug report. The dominating perception is that bug reports are annoyances that need to be closed ASAP, and if it isn't easily reproducible, it doesn't exist.

So I'm not surprised that nobody opens bugreports for bugs like the one described by the OP. Not easily reproducible? Rarely occurs? It's highly probable that nobody would care.

BTW, I ask users of my software to please do report bugs. Every bug report is a valuable data point, even if it isn't reproducible. And I do appreciate the effort that it takes just to file a bug report.

CJefferson · on May 6, 2017

The problem with libstdc++ is that, in my experience, 90% of bug reports are user error. That means non reproducible bug reports have almost no value.

fish2000 · on May 6, 2017

I know you were probably just throwing out the “90%” statistic, but if you take that as a given, the implication that 10% of the libstdc++ bugs on file are legit is a worrisome notion in and of its own right. I don’t want to be responsible for triaging those bugs (and nor do you, I am guessing; this being why the reports are valueless) but the fact that this is the bug rate in this, a gold-standard library in common, ubiquitous use… well as far as I can see, this is the context in which that the OP’s article should be considered.

… I build all of my C++ projects with Clang and link them against libc++, so I don’t know if I am dodging a very high-caliber bullet (so to speak) or if the other shoe will drop at some point, and I will find myself going down the OP’s rabbit-hole of library-bug investigation.

CJefferson · on May 6, 2017

You are right, it's probably a lot less than that.

The bug quality is higher than you might expect, because you have to register with bugzilla, and most bug reports are with development versions, as bugs are shaken out of new features. There are very few bugs in released versions, and where those bugs exist they are often of the form "stupid type where I have redefined & doesn't compile, while the standard technically days it should", where most users would never got them. Wrong answer or crash bugs in releases are extremely rare, although they could be rarer -- the test suite has less coverage than I would personally like.

saurik · on May 6, 2017

Yeah: people run into issues and somehow feel "obviously everyone else ran into this same issue and it was never fixed so they must not want to fix it" without it ever really occurring to them that maybe almost no no one runs into this problem and no one knows it exists, or even that it could be their code that is broken. We all know "I compiled my code with -O0 and it started working" is almost never due to "there is a bug in the code optimizer", and this one sounds extremely similar. If people think they have found a bug, and they want to feel righteous and haughty over how it never got fixed, they really need to be demonstrating it is actually a known bug (and maybe you were being sarcastic, but I will just state flatly: being bothered when people do not does not make you the ass).

tjalfi · on May 6, 2017

I searched GCC Bugzilla and found one result[0] for GLIBCPP_FORCE_NEW.

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13823

Edited to add the following text.

There is one result[1] for GLIBCXX_FORCE_NEW.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65018

hcs · on May 6, 2017

Default bugzilla search doesn't include resolved, verified, and closed bugs. That said, I only found one slightly relevant, which was resolved invalid, saying the env var should be used: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10183

And that's just due to not freeing the pools, not whatever is the complex problem here.

marvy · on May 6, 2017

I think these days it goes by GLIBCXX_FORCE_NEW

bgd11 · on May 6, 2017

All the technicalities aside the writing style of the author is amazing. I would have never thought that someone can create such an intense narrative with 'malloc' as the main character

dclowd9901 · on May 7, 2017

I thought the metaphor was pretty tortured, but I generally appreciate anything that colors up dry technical writing so much that I give it a pass.

firethief · on May 6, 2017

> Nothing made any sense until we noticed the controller microservice's memory consumption. A service that should be using perhaps a few hundred megabytes at most was using gigabytes and growing... and growing... and growing... and growing...

Not identifying this until many hours after symptoms were impacting users sounds like a pretty big monitoring blind spot.

ZoFreX · on May 6, 2017

Yes, although to be fair blindspots are always obvious in hindsight!

cyphar · on May 6, 2017

Did you report the issue upstream with a patch? The solution to "the standard library is broken" is to fix the standard library, no? It's all free software after all.

aw1621107 · on May 6, 2017

I'm not the post author, unfortunately, so I really have no knowledge of the specifics of what is going on.

It seems like the libstdc++ maintainers are aware of the issue at least, so that's a start. It'd be nice if some of the mentioned discussions/complaints were linked, though, so we could see what has already been said/done.

selimthegrim · on May 6, 2017

The post author has an active handle on HN (api)

an_d_rew · on May 6, 2017

Oh my. What a fun read. Sort of like reading about a horrific car accident. You read it and you think "Oh thank God… That can't possibly be real…"

And then you realize that it is.

api · on May 6, 2017

It's a known issue according to what I've read. It's never been fixed. This may be due to the fact that it's hard to trigger and reproduce.

neilc · on May 6, 2017

Can you link to some of the things you've read? What exactly is the "known issue"?

MaysonL · on May 6, 2017

From the documentation (linked by arunc above)[0] "Notes about deallocation. This allocator does not explicitly release memory. Because of this, memory debugging programs like valgrind or purify may notice leaks: sorry about this inconvenience. Operating systems will reclaim allocated memory at program termination anyway. If sidestepping this kind of noise is desired, there are three options: use an allocator, like new_allocator that releases memory while debugging, use GLIBCXX_FORCE_NEW to bypass the allocator's internal pools, or use a custom pool datum that releases resources on destruction."

[0]https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator...

mjw1007 · on May 6, 2017

That doesn't explain the «Some allocations were much larger than anything ZeroTier should need.» part.

api · on May 6, 2017

Yes it does. The secondary C++ allocator is doing its own pooling on top of the primary allocator.

jwakely · on May 8, 2017

It's not a known issue, but GLIBCPP_FORCE_NEW has had no effect on libstdc++ code for more than a decade, so I wonder which prehistoric version you're using. Even the modern GLIBCXX_FORCE_NEW doesn't do anything for the default std::allocator implementation, which always uses new/delete unconditionally (since 2005).

cyphar · on May 7, 2017

Is there an open bug report? I have read the docs which specifically say "we don't care about leaked memory", but the brokenness being documented is different from it being a "known issue".

And, to be blunt, "it's never been fixed" should be written as "it's not been fixed yet" -- because you have an opportunity to fix it. That's how free software works after all.

jb1991 · on May 6, 2017

> The solution to "the standard library is broken" is to fix the standard library, no? It's all free software after all.

Doesn't the author make that case at the end?

cyphar · on May 6, 2017

They make the case that it's broken with very strong language. They don't say "here is a patch" or even an outline of a patch (or a link to a bug report even), just histrionics.

> GNU libstdc++ is broken. This is pretty unforgiveable. [...] Adding wheels to the wheel is sometimes forgiveable when dealing with closed systems that you can't fix but libstdc++ and glibc are both open source GNU projects.

I think there's a disconnect in the author's mind with regards to how free software projects work. If something is broken, you as a user are empowered to fix it. I mean, the author even went through the trouble of reading the libstdc++ code and figuring out what happened inside new (which is likely enough information to write a preliminary patch). At the very least open a bug report about it (or link to an existing one).

Unlike proprietary software, there are many ways to improve free software and ranting online is rarely one of them. I get that this bug screwed them over big time and that they are angry about it. But converting that emotional energy into something useful would help many people other than themselves.

tl;dr: If you find yourself ranting about "why wasn't X fixed before" in a free software project, it might be helpful to realise that you have the opportunity to be the person who fixes it.

IshKebab · on May 6, 2017

You can certainly send a patch, but there's no guarantee they'll accept it. Some core projects seem to be highly opinionated. Look at GCC's attitude to plugins or good error messages. Or Linux and gr-security. Or Linux and stable driver ABIs.

There are plenty of things that people would like to change but can't because the maintainers disagree. Not saying that is necessarily bad but you are stupid if you think the answer to everything is "well did you write a patch?".

cyphar · on May 7, 2017

> Not saying that is necessarily bad but you are stupid if you think the answer to everything is "well did you write a patch?".

I'm not sure why this tone is necessary. If someone just rants about a problem without even _trying_ to submit a bug report with a proposed patch that strikes me as laziness not foresight.

As a maintainer myself, I am well aware that maintainers will reject code if it disagrees with our view of a project. But how on earth do you expect us to know there is a problem without reporting a bug (the only reason I asked whether the author wrote a patch is because they went through the trouble of debugging the problem so probably are in a good place to write a patch anyway)? And if you decide to write an angry and ranting blog post rather than interact with us, we aren't going to be very nice to you either.

jwakely · on May 8, 2017

> Look at GCC's attitude to plugins or good error messages.

Which is?

A patch isn't necessary for it to be fixed, but a bug report generally is. A blog post linking to unofficial copies of documentation from 2004 doesn't count.

louithethrid · on May 6, 2017

If its broken - it is commented out, or opt in via parameter. It is not shipped - AS IS and then when after weeks and weekends the bug is found, you dont just pose yourself at the wall of the crater - lean down and yell. "Well this is just great, you discovered a cavern. With the sweat of your brow, this could be a nice house one day. Allmost like the one we promised to deliver in the first place."

stephen_g · on May 6, 2017

Things like this is why I was happy to see the LLVM project write their own C++ standard library. libstdc++ has always seemed a bit hacky and fragile to me. It's great to have an option which is a more modern, clean codebase.

Have you tested to see if this works better with LLVM libc++?

saurik · on May 6, 2017

Alternatively, things like this is why it is great when everyone works together to improve one lib rather instead of forming their efforts on two essentially-identical ones. You act like libc++ is somehow simply better and probably doesn't have bugs :/. I have been doing C++ work now for over two decades and let me set that record straight in your head: incredibly basic stuff has been pretty extremely broken in libc++. One horror story that wasted way way too much of my life is that the copy that Mac OS X 10.7 seriously shipped with a build of libc++ where std::streambuf failed to check EOF conditions correctly. Despite most of my projects being compiled simultaneously by numerous versions of both gcc and clang (to target various weird configurations), I seriously don't remember the last time I ran into a bug in gcc and libstdc++... it was pre-2006 for sure... but I continue to run into annoying issues with clang and libc++. The correct way to read "modern" when applied to "codebase" is "untested". And hell: while I am totally willing to believe there is a bug here, this post doesn't have a fix and doesn't even seem to have led to a bug report. This is like saying "I compiled my code using -O0 and it started working, so clearly this is a bug in the optimizer", which we should all know is a dubious statement at best.

tomohawk · on May 6, 2017

Alternatively, having a choice of more than one thing tends to cause competition to kick in, which often results in better quality than a single solution that everyone pretty much has to use.

rurban · on May 6, 2017

The "make malloc faster" part was done over a decade ago with the followup from ptmalloc2 (the official glibc malloc) to ptmalloc3. But it added one word overhead per region, so the libc people never updated it to v3. perf. regression. They rather broke the workarounds they added. And now they are breaking emacs with their new malloc.

consultSKI · on May 6, 2017

>> Most operators in C++, including its memory allocation and deletion operators, can be overloaded.

Have I mentioned lately how much I hate C++?

Great read.

spyder81 · on May 6, 2017

"Then I remembered reading something long ago" is when experienced programmers are worth their weight in gold.

brynet · on May 6, 2017

Interestingly for recent versions of GCC (>=4.0) GLIBCXX_FORCE_NEW is defined for libstdc++, not GLIBCPP_FORCE_NEW.

https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator...

charles-salvia · on May 6, 2017

I'm a bit confused here.

>> Most operators in C++, including its memory allocation and deletion operators, can be overloaded. Indeed this one was.

Okay, well, firstly - the issue here seems to be a problem with the implementation of std::allocator, rather than anything to do with overloading global operator new or delete. Specifically, it sounds like the blog author is talking about one of the GNU libstdc++ extension allocators, like "mt_allocator", which uses thread-local power-of-2 memory pools.[1] These extension allocators are basically drop-in extension implementations of plain std::allocator, and should only really effect the allocation behavior for the STL containers that take Allocator template parameters.

Essentially, libstdc++ tries to provide some flexibility in terms of setting up an allocation strategy for use with STL containers.[2] Basically, in the actual implementation, std::allocator inherits from allocator_base, (a non-standard GNU base class), which can be configured during compilation of libstdc++ to alias one of the extension allocators (like the "mt_allocator" pool allocator, which does not explicitly release memory to the OS, but rather keeps it in a user-space pool until program exit).

However, according to the GNU docs, the default implementation of std::allocator used by libstdc++ is new_allocator [3] - a simple class that the GNU libstdc++ implementation uses to wrap raw calls to global operator new and delete (presumably with no memory pooling.) This allocator is of course often slower than a memory pool, but obviously more predictable in terms of releasing memory back to the OS.

Note also that "mt_allocator" will check if the environment variable GLIBCXX_FORCE_NEW (not GLIBCPP_FORCE_NEW as the author mentions) is set, and if it is, bypass the memory pool and directly use raw ::operator new.

So, it looks like the blog author somehow was getting mt_allocator (or some other multi-threaded pool allocator) as the implementation used by std::allocator, rather than plain old new_allocator. This could have happened if libstdc++ was compiled with the --enable-libstdcxx-allocator=mt flag.

However, apart from explicitly using the mt_allocator as the Allocator parameter with an STL container, or compiling libstdc++ to use it by default, I'm not sure how the blog author is getting a multi-threaded pool allocator implementation of std::allocator by default.

[1] https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/ma...

[2] https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/ma...

[3] https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/ma...

yokaze · on May 6, 2017

Call me confused, too. When I was once following up the allocator in the STL because of some performance issues, it was (more or less) going directly to malloc.

I've searched now the code for GLIBCXX_FORCE_NEW, and it seems it is used in the pool_allocator and the mt_allocator [1].

String uses std::allocator [2].

I agree, the blog entry seems to missing some information to reproduce the issue. It looks to me, that the author was jumping to a conclusion, which confirmed his initial "insight". Not surprising, if you are under pressure and working over the whole weekend and nights. Who hasn't been there.

What bug me, that the standard answer seems quite often, that the whole thing is broken, and we have to switch to a complete different implementation, and/or re-write it from scratch.

[1] https://github.com/gcc-mirror/gcc/search?p=1&q=GLIBCXX_FORCE...

[2] https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-...

djsumdog · on May 6, 2017

I too was curious about what distribution they were using; specifically which GNU libstdc++ packages. It seems like a drill down to that specific distribution's rpm/deb/etc packaging scripts to see their ./configure options would also be important.

bboreham · on May 6, 2017

Post doesn't actually say what was broken, or indeed prove the location of broken-ness. Just that it went away with a different compile option.

Exciting writing, but lacking a point.

x4m · on May 6, 2017

Yeah, the post is just accusation with very indirect and unverifiable evidence.

I'd be nice if author created repro or actually spotted the bug.

Right now it's just a moaning. Quite probably, he's shooting into his leg himself in one of the numerous odd ways.

halayli · on May 6, 2017

this conclusion might be wrong. the code in question while it might not be allocating/freeing memory it might be stumbling on memory blocks and corrupting mem management structures. Turning the flag on might be fixing the issue by mere luck because memory allocations, locations and structures would be different

MichaelBurge · on May 6, 2017

Valgrind would've caught those, wouldn't it? Or, maybe the C++ layer prevents it from catching that since it's application-level, which is at least another reason to remove it.

fnord123 · on May 9, 2017

The tools are good but not magic. It's always possible that they miss something (e.g. when multithreading, the bug might not manifest since the timings are all different when running in valgrind). But this is a red flag:

"Nothing worked. It's leaking but it's not. It's leaking but the debugger says no memory was lost. It's leaking in ways that are dependent on irrelevant changes to the ordering of mundane operations. This can't be happening."

This is a red flag for heap corruption - or multithreading bugs. (Stack corruption is usually a crash and a wrong stack trace). As it's not trivially reproducible, it's probably a multithreading bug. It's also easy to imagine that OP wrote a scripty-input generator in C++ to run through valgrind which gives single threaded inputs. So running under valgrind it won't be detected. So it's never fun to solve and ever since I changed languages, my multithreaded debugging skills have become a bit rusty. Hey-o!

But it would be good if he amended his post to reduce the vitriol aimed at GCC. It's demonstrably false that GCC's default allocator holds a cache. It makes OP look stupid to people in the know; and makes GCC look bad to people not in the know. No one wins.

SFJulie · on May 6, 2017

Memory fragmentation due to dynamic non fixed size data structure and multithreading is an old foe. That may not be fixable in c/c++

Worker A allocates dynamic stuff. Algo take a segment (0+sof(str)(ofA) + n) Work B Allocates to create same kind of data structure (fragment of a JSON) [ofA, OfB] Wk A resume allocating, boundary of [0, ofA] exceeded, no free contiguous space up or down [Ofb, OfC] allocated Wk C enters wants to alloc, but sizeof(string) make it bigger than [0, OfA] so [ofD, ofE] asked .... and the more concurrent workers the more interleaving of memory allocation go on with fragmented memory.

Since malloc are costly the problem known, a complex allocator was created with pools of slab and else, probably having one edge case, very hard to trigger having phD proven really complex heuristic.

CPU power increase, more loads more workers, interleaving comes in, edge case gets triggered.

And C/C++ makes fun of fortran with its fixed size data structures embracing any new arbitrary size arbitrary depth data structure for the convenience of skipping a costly waterfall model before delivering a feature or a change in the data structure and avoiding bike shedding in committees.

Human want to work in a way that is more agile than what computers are under the hood.

Alternative:

Always allocated fixed size memory range for data handling, and make sure it will be enough. When doing REST make sure you have an upper bound, use paging/cursors, which require FSM, have all FP programmers say mutable are bads, sysadmin say that FSM are a pain to handle when HA is required, and CFO saying SLA will not be reached and business model is trashed, and REST fans saying that REST is dead when stateful.

Well REST is a bad idea.

Const-me · on May 7, 2017

Multi-threading ain’t particularly popular on Linux, hence issues like this one in STL.

But it’s fixable. In windows, they’ve implemented the solution in Windows XP (opt-in), and in Vista they use it by default:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

hackits · on May 6, 2017

The good thing about fixed size memory ranges is it decreases debugging time. It also allows deterministic behaviour for your modules.

squidlogic · on May 6, 2017

Amazing write-up. Informative and gripping in its prose.

saurik · on May 6, 2017

This article was an extremely long rant about "something has a if but templates are hard so I have no clue what it is". The author figured out a workaround to the issue, but we still don't know what the bug was, and the strongly worded conclusion that it is a bug in libstdc++ isn't even defended well (as this is similar to concluding "there is a bug in the compiler's code optimizer" from "I compiled my code with -O0 and it started working")... I can't really see calling this "informative" :(.

rhaps0dy · on May 6, 2017

I'm curious, what is usually the cause of code starting to work when it is compiled with -O0?

gvb · on May 6, 2017

* Optimizer bugs. Thankfully these are less common nowadays. It the "good old days" of new compilers this happened quite a bit.

Embedded code:

* Missing "volatiles" which allow the optimizer to optimize out "unused" loads and stores to hardware or multitasking shared variables.

* Race conditions (e.g. unsynchronized access to multitasking shared variables). Making the code run slower changes the access pattern, often times obscuring the bug.

gpderetta · on May 6, 2017

Crap code that hit undefined behaviour every other line (like accessing dead temporaries or freed memory, assuming stack layout, out of bound accesses, etc.).

bostik · on May 6, 2017

I don't have an answer to that, but I have seen - and worked with - several codebases which work fine when compiled with -g, but will crash (in good cases) or behave irrationally (in bad cases) without.

The crashing ones at least are easy. Somewhere a list or variable-argument array is missing the NULL terminator...

tonyarkles · on May 7, 2017

This was on Windows with VC++, but same deal. The code that some cow-orkers had written was copying strings like "LAX" and "ORD" into `char airport[3]` using strcpy(). In debug builds, VC was allocating a whole 32-bit word, but in release builds it was packing everything on the stack. Write to it, and the terminating null ends up overwriting a byte of the next variable on the stack. Urgh. Of course, these were several hundred line functions, so the strcpy and the subsequent use of the trashed variable were a long ways away.

ryl00 · on May 6, 2017

In my experience, it often means variables that are used before they are initialized, or dangling pointers.

ahoka · on May 7, 2017

Yes, I'm pretty sure the bug is not in the c++ library, but in his code.

nickpsecurity · on May 6, 2017

It was a great, gripping write-up. It also corroborated why I told api he was better off using a subset of C or safe language that generated it for software like this. I told him there were tons of ways to analyze or make safe C subsets but almost nothing available that will get similarly great results on C++ code. This was a good example of where its complexity and style of sneaking in abstractions bit him in the ass in a way that might be easier to spot in C, Ada/SPARK, a Wirth language, etc. C++ style is safer on average but highly-robust code is better in restricted, analyzed C if not a safe language.

api · on May 6, 2017

(Original blog author here.)

That's a nice idea, and we've considered "minus minus"ing the ZT core as part of an embedded port. But code like this that shleps a lot of structures around and works with JSON is eye gougingly painful to write in C and the chance of a worse and possibly exploitable memory bug is much higher.

This is the first time we have encountered an actual problem with C++ compilers or runtimes.

nickpsecurity · on May 6, 2017

You don't write those parts in C alone. You use something that shows the C is safe automatically, use tool that generates secure C from specs (eg Nail), and/ use safe language that compiles to C. This way, you get benefits of C ecosystem without risks of totally using C.

TimJYoung · on May 6, 2017

I'm not sure if the other debug tools mentioned offer this, but AQTime Pro:

https://smartbear.com/product/aqtime-pro/overview/

has an allocation profiler that can be used to track down this sort of problem. You can take allocation snapshots while the application is running to see where the allocations are coming from (provided that you can run AQTime Pro against a binary with debug symbols/info).

I'm not affiliated with the company - just a happy customer that has used them for years with Delphi development.

odbol_ · on May 7, 2017

Delphi... Now that's a name I haven't heard in a long time

TimJYoung · on May 7, 2017

It's still kicking. ;-)

tripzilch · on May 6, 2017

Upvoted for the Lovecraft and pulp horror lit references, and starting with "It was a dark and stormy night ..." :-)

Great writing, great read.

jcalvinowens · on May 6, 2017

I don't understand the point of this article... if you think there's a bug in the library, fix it. Don't write a melodramatic blog post lamenting how horrible it is in the hope somebody else will do it for you.

This isn't particle physics, it's code: we don't have to guess, we can look at it and see how it works.

bogomipz · on May 6, 2017

This was a nice write up, however I didn't follow how memory fragmentation was related to a memory leak. Can someone explain? I understand that alternate memory allocators would help with the fragmentation issue but how does the choice of allocators affect memory leakage?

Joeri · on May 6, 2017

When you have a rapid sequence of large and small allocations, your memory starts looking like swiss cheese. To fit a large allocation you need a contiguous block of free memory, but all the available blocks are too small, so it gets placed at the end where the free ram starts. Then another allocation is put right after it, the big chunk gets freed, and a small allocation is put in its place. Now instead of one big block of free memory you have a slightly smaller block, which is slightly too small to fit a large allocation, which has to be put at the end, and...

I ran into this issue once debugging a performance issue with a CAD file parser. It made a lot of copies of large and small chunks, and some CAD files would cause catastrophic fragmentation. Switching out the allocator for a smarter one fixed the problem. A few versions of delphi later they put that allocator in by default, so that now it is not prone to fragmentation anymore.

grandinj · on May 6, 2017

I'm guessing the cpp thing is a holdover from the days when the glibc maintainer was less than entirely helpful. There has been actual improvements in glibc in this area lately so hopefully these kinds of hacks will slowly go away.

jwakely · on May 8, 2017

The pooling behaviour of the libstdc++ std::allocator was only the default from 2004 until late 2005, so it went away more than a decade ago.

odbol_ · on May 7, 2017

Is C++ now going the way of PHP, where to have an actual working program you have to disable all the defaults in some mysterious but crucial ritual?

dummy323 · on May 10, 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80658

selimthegrim · on May 6, 2017

James Mickens, move over. There's a new sheriff in town.

jonnycomputer · on May 8, 2017

the idea that we should always blame ourselves first has merit. but frankly some bugs, just p e r s i s ttt.

like this one, with fputcsv in PHP. https://bugs.php.net/bug.php?id=43225

Safety1stClyde · on May 6, 2017

It was only yesterday that I was reading another discussion from hacker news about problems with Gnu C library.

https://news.ycombinator.com/item?id=14271305

jwakely · on May 8, 2017

That's a different library.

logicallee · on May 6, 2017

People forget that C++ is just a tool, like a screwdriver or a hammer. A good carpenter knows when it's time to take a metallurgy class and resmelt his hammer, because its composition is not correct for being a hammer.

johnfn · on May 6, 2017

If occasionally when you try to use a hammer to hit a nail, the hammer swings the other way and hits you, that tool has violated some essential assumptions about how tools work, and the "it's just a tool" reasoning might not apply as clearly any more. :)

cortesoft · on May 6, 2017

I am pretty sure there has never been a carpenter who resmelted his hammer.

theoh · on May 6, 2017

I'm not sure "resmelting" is even a thing. I am sure that the GP is joking.

logicallee · on May 6, 2017

Guys, yes! I was joking. I was directly riffing on the common claim that C++ is "just a tool". This exact phrase, in quotes, has over a hundred thousand hits on Google[1], many of them immediately comparing it to a hammer or a screwdriver. The usual context is that C++ isn't really "unsafe" -- it's just a tool, how you use it is up to you.

In the case of the standard library being broken, it is like a tool that is broken. Resmelting might not be a word but the idea of the hammer not being cast correctly and needing to be re-cast is ridiculous. In essence I was making fun of the "there's nothing wrong with C++, just like there's nothing wrong with a hammer" common claim. I was taking it to an extreme. (I thought it was funny.)

[1] https://www.google.com/search?q=%22C%2B%2B+is+just+a+tool%22

cortesoft · on May 6, 2017

I have no idea why I thought you were serious. Poe's law?

fasquoika · on May 6, 2017

It's not a thing. Smelting means purifying from ore by melting. Unless you turn your hammer back into iron ore, there's no way to "resmelt" it

louithethrid · on May 6, 2017

If you melt metall down again, you get a very brittle, carbon reduced new version, whos properties depend upon the crystallization temperaturcurve?

twic · on May 6, 2017

It seems it is:

http://www.khm.uio.no/english/research/projects/langeid/proj...

Sniffnoy · on May 6, 2017

I think that is the joke...

Ono-Sendai · on May 6, 2017

So where's the bug report with repro test code?

Pica_soO · on May 7, 2017

One thing, not even mentioned in the article is the component-scouting & profiling-phase, which completely failed. You do not go all in with a project on a crucial library, that you did not profile with the real workload.

One small prototype, never run under full load, with mock up classes- not even the size of future classes, mock up operations(not even close to the real workload) and sometimes not even the final db-type attached. Yeah, hard to see the future, but why not drive the test-setup to the limits and go from their?

Instead the whole frankenservice is run once for ten minutes and declared by all involved worthy to bare the load of the project.

Here is to lousy component scouting and then blaming it on the architect.

tbodt · on May 6, 2017

Maybe it'll get fixed now that a post saying "libc++ is broken" got hackernewsed

StephanTLavavej · on May 6, 2017

libstdc++ and libc++ are different libraries; this post is talking about libstdc++. They're like elephants and elephant seals.

mcguire · on May 6, 2017

"They're like elephants and elephant seals."

Yes, yes they are.

tbodt · on May 6, 2017

ris · on May 6, 2017

The correct response is to file a bug report, not write a clickbait-y article.

mtanski · on May 6, 2017

Yeah malloc() is pretty terrible in glibc by modern standards. For some workloads it just can't keep up and ends up fragmenting space in such a way that memory can't be returned to the OS (and thus be used for the page cache) and you end up in this performance spiral.

I always deploy C++ server on jemalloc. Been doing it for years and while there's been occasional hicks up when updating it has provided much more predictable performance.

kartD · on May 6, 2017

Actually from my understanding, it's libstdc++'s allocator that is causing the issue, not malloc.

mtanski · on May 6, 2017

A big reason the small object optimization exists in libstdc++ containers is because system malloc() is not fast enough.

We're not talking about another optimization (small object / locality) as his issue was caused by libstdc++ alloc pools which would not need to exist in the first place if system malloc was better. So libstdc++ reinvents end up reinventing the wheel poorly.

As the author mentioned, when he disabled the optimization behavior GLIBCPP_FORCE_NEW he ended up burning more CPU via system (glibc) malloc(). Once he added jemalloc on top of GLIBCPP_FORCE_NEW, this pretty much evened out with previous behavior runtime performance.

The conclusion towards the end of article: > The right answer to "malloc is slow" is to make it faster.

jwakely · on May 8, 2017

By default libstdc++ stopped using the pooling allocator in 2005: https://gcc.gnu.org/r106665

That's one year after the ancient, bitrotted, unofficial copy of the libstdc++ documentation that the blog post links to, but still ancient history.

api · on May 6, 2017

This is correct. Glibc malloc works fine, though jemalloc is faster in highly multithreaded code and seems to be slightly more memory efficient.

throwaway2048 · on May 6, 2017

This has nothing to do with glibc malloc()

rurban · on May 6, 2017

It has. It was never updated to ptmalloc3

nly · on May 6, 2017

Actually it is C's malloc and free that is "broken". malloc() takes a size parameter, but free() doesn't. This imbalance means it can never be maximally efficient. Whatever GNU stdlibc++ is doing is probably, on balance, a net win for most programs.

It's not exactly roses in C++ either of course. You can do better than the standard library facilities. Andrei Alexandrescu gave a great, entertaining, and technically elegant talk on memory allocation in C and C++ at Cppcon 2015 that is well worth watching

https://www.youtube.com/watch?v=LIb3L4vKZ7U

kabdib · on May 6, 2017

Most allocators will be able to pretty efficiently recover the size of the block you are freeing. And you can count on developers not getting the size right, for that to be a common error, and for everyone to cobble together their own, wildly different and probably slower ways of tracking sizes. So it doesn't really help.

malloc/free aren't a great API, but for other reasons (namely, that you want things like multiple arenas, good control over synchronization, decent debugging and introspection, leak-tracking, tagging for figuring out what a block really is when things get smashed, block enumeration, small block pools, placement for cache alignment, and . . . you get the idea).

nly · on May 6, 2017

> you can count on developers not getting the size right, for that to be a common error, and for everyone to cobble together their own, wildly different and probably slower ways of tracking size

You have to do this anyway. You either know the size of the thing you allocated the memory for or, if it's a block, you need to keep track of the size for bounds checking purposes.

There are no circumstances in which you call malloc() in which you don't need to know the keep the size in your application.

int_19h · on May 6, 2017

It's not just about C. In C++, for example, you might be deleting a base-typed pointer to a derived instance, that can be one of several options that you don't know at compile time.

nly · on May 6, 2017

Doesn't matter, since you call the destructor for the derived class through the vtable. The function you end up calling knows the size of your object and its exact layout. The size at this point is compiled statically in to your program.

int_19h · on May 6, 2017

The destructor does not deallocate memory in C++, since it is called just the same for objects allocated in other ways.

It's also not a given that such an object even has a vtable. It's perfectly legal for it to not have one, and the memory is still supposed to be deallocated in full (only the base class dtor is invoked then, but if derived class additions are trivial, it's not necessarily a problem).

Now, yes, you could add a separate vtable slot for the deallocation function. Or just store the object size directly in the type info (that's usually attached to the vtable). But this is really just a way to optimize size storage for objects that already have a word utilized for shared type descriptor like a vtable.

kabdib · on May 7, 2017

Hmm:

1) There doesn't have to be a vtable.

2) If there is a vtable available, it's the wrong one (look carefully at the treatment of vtable pointers in destructors when inheritance is involved).

2) The object has been completely destroyed before the deallocator is called, and it's unclear whether the vtable pointer is available (the standard doesn't seem to make this guarantee). In any event, the deallocator is statically determined and cannot be a virtual call.

I will note that vtables are just an implementation detail, and that you can successfully implement virtual calls with other mechanisms, which are also not required (by the C++ standard) to remember object sizes. So you can replace the concept 'vtable' with 'abstract mechanism by which the set of members appropriate for this class is determined' -- use token-based dispatch, for example -- and still have a compliant implementation.

[I helped write a few object runtimes in the late 80s and early 90s -- and man, C++ gets gnarly -- and I've shipped 8 to 10 allocators of various types in commercial products over the years]

kabdib · on May 6, 2017

Knowing a size and being forced to retain a size are different things. Here's a trivial example (imagine a system to process compressed audio data):

(1) receive packet: compute actual size, allocate buffer, expand data into it, start DMA to a speaker or something

(2) DMA-done: free the buffer

There, I didn't need the size on the free. There are many, many many similar cases, in fact these cases probably dominate.

nly · on May 6, 2017

Both your speaker driver and your application code still needed to keep track of the buffer size. calling free(size) on your buffer would require no extra overhead

charles-salvia · on May 6, 2017

The C++ delete[] operator doesn't take a size parameter either. This is neither here nor there, and unrelated to the problem the blog post is talking about.