Loving the progression here. Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle. A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.
And maybe, in a decade or so, the man page for these functions will list their algorithmic complexity! That was the most interesting takeaway from this article, for me at least. I have only seen one or two libraries that actually list this in their documentation.
It's easy to forget that the original C standards were largely codifying existing practice during an era when using gets() [1] was existing practice. The world wasn't quite ready for Ada, I guess. Best-laid plans of mice and men etc. etc..
Also, keep an eye out for "amortized" complexity. This does have a legitimately rigorous definition, but for latency-bound paths it can practically amount to "O(whatever), except for the particular invocations that are far, far worse under unspecified conditions".
It's also easy to forget that C was competing mainly with assembly, while C++ competed with managed languages. The early C programmer ethos, especially among library authors, was much more along the lines of "look at the generated object code if you want to know what it's doing" while modern practice leans more towards "read the documentation for complexity guarantees". I'm not saying that worse documentation leads to better programmers, but I'm not not saying that either. Practices change, standards change.
Good documentation and inspecting the compiled bytecode are both good ways of finding out about performance characteristics of certain features. The problem starts when people rely on assumptions ("sscanf should be fast because it's widely used") or performance folklore ("localizing every function you'll ever use makes your Lua code faster"), because those tend to either be completely wrong or lack very important context.
I live in js land, and the barrier between “folklore” and “documentation” is extremely thin. Especially since V8 may introduce changes at any time that affect performance characteristics of js.
I’d respond with “well if performance matters it shouldn’t be in js” except for all the shite being written in js these days, with js being the hammer that makes everything else look like a nail.
You can write very fast JS code. When carefully written it can have Java-like performance[2]. It is just very hard in practice, since most of the ecosystem is optimized for developer productivity.
When performance matters, write your own code and carefully benchmark everything. You can see this working for TypeScript and VSCode[3].
It makes me chuckle when hash maps are stated to have O(1) insertion. Which is true, with respect to the number of items in the map, assuming the map doesn't need resizing and there isn't a hash collision... but it's generally not true with respect to the key length. (I think most implementations are O(l·n) for inserting n items, where l is the length of the key, assuming the hash function is O(l) - the _amortised_ runtime per insertion would be O(l).)
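For illustration, here's a minimal sketch of a typical string hash (FNV-1a, not any particular library's implementation) - the per-byte loop is where that O(l) factor comes from:

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a: one xor and one multiply per byte of the key, so just
     * computing the bucket index costs O(l) in the key length before
     * the table is even touched. */
    static uint64_t fnv1a(const char *key, size_t len)
    {
        uint64_t h = 0xcbf29ce484222325ULL;    /* FNV offset basis */
        for (size_t i = 0; i < len; i++) {
            h ^= (unsigned char)key[i];
            h *= 0x100000001b3ULL;             /* FNV prime */
        }
        return h;
    }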
I wrote my own version of a part of a very popular Java scientific tool, and my version runs about 50 times faster. Their mistake? They had a hashCode() implementation on the objects they were using as keys for HashMaps that iterated through all of the voluminous content of that object. And there was no point - they could have used IdentityHashMaps instead with the same result. I pointed this out to them, and they still haven't fixed it.
I'm guessing GP means the complexity guarantee sidesteps the complexity of the hashing function. It probably doesn't matter all that much in typical case - I'm guessing 80-90% of hash map use is with very short strings.
And the analysis of hashmaps is not such a well-written guarantee -- as you resize, you need a bigger hash function output to reach all possible buckets. A bigger hash function output, assuming you have to keep the avalanche effect to keep output well-scrambled, requires more computations.
Short strings, long strings; they're going to use the same key length. Calculating the key may take longer for the long string, if you're basing the hash on the contents of the string[1], but the key won't end up being a different size. The md5 of a 3-byte string is 16 bytes and the md5 of a 40GB string is also 16 bytes.
[1] Not typical. e.g. Java takes the hash key of an object to be its address in memory, which doesn't require looking at the contents.
Calculating the key may take longer for the long string
Right, that’s exactly what they are warning about.
Not typical. e.g. Java takes the hash key of an object to be its address in memory
No, that’s just the base implementation in Object (and arguably it was a bad idea). All useful “value type” classes will override it with a real hash of the content, including String.
There are some cases in Java where you do want to use IDs instead of values as your map keys, but they’re rare.
> All useful “value type” classes will override it with a real hash of the content
Well, this is necessary for a lot of sensible things you'd want to do with non-numeric value types as hash keys...
> including String
...except String is something of an intermediate case. There are loads of use cases where what you're really using is a set of constant strings, not variables that contain arbitrary character data. In that case, you should intern the strings, resulting in non-"value type" keywords where the only thing you care about for equality is whether two keywords do or don't have the same machine address.
I don't actually know how Java handles this, but I had the vague idea that two equal String literals will in fact share their machine address. And String is specifically set up to accommodate this; Strings are immutable, so in theory it could easily be the case that any two equal Strings must share their machine address, even if you got them from user input.
Java does intern string literals and constants, but you can’t rely on reference equality unless you intern every string you create at runtime by formatting or decoding, and it isn’t specified whether that creates strong references that will never be GC’d.
Yes, Strings are immutable, so they only calculate their hashCode once, then cache it. However, you need to explicitly intern them with String.intern() if you want to avoid multiple copies of the same String.
> Strings are immutable, so in theory it could easily be the case that any two equal Strings must share their machine address, even if you got them from user input.
Hey, and now you have two problems: String hashing and finding all strings which are equal to each other in memory
Well, no, the whole point of this discussion is that solving the second problem means the first problem never comes up.
And this isn't exactly some exotic approach; how often do you think people write Hashes in Ruby where the keys they use are all symbols? It's so common that there's dedicated syntax for it.
It's as old as Lisp, but there's a reason symbols exist separately from strings - they're used differently. Strings are frequently transformed, symbols almost never are. String are frequently taken from end-user input, symbols very rarely. Strings sometimes are very large, symbol names are almost universally very short.
The problem is, interning is an expensive operation. It means adding to an ever growing database of strings, but first checking if the string isn't already there. You don't want to do that every time you change case or flip a letter in a string, or use it to access a hash table. I'm not saying it can't be done, but I honestly have no idea how to implement sane, generic, automatic interning of strings. I feel more comfortable having a symbol type, and control over turning strings into symbols.
I definitely agree that uninterned strings are important. All I'm really trying to say down here is that there are many cases where you have a hash table which uses strings as keys (as an implementation detail), when (conceptually) it wants to be using symbols.
(And on a less fundamental level, the particular Java String class is less string-like and more symbol-like than most string types, and this appears to have been done intentionally.)
> Everything is O(1) if N is constant, including log(N), N^2, 2^N, N!, etc.
Not even close. 2^k is not O(1) by virtue of N being constant. Only 2^N.
This has been covered above. It is more common to consider the complexity of hash table operations in terms of the number of operations, or the size of the table; the size of the key is very often constant. These are different variables; the constant size of the key does not trivialize the complexity of inserting N items each with a constant key size.
Here, the relevant key is the output of the hash function though -- that's what you need to increase in order to ensure you can reach all buckets. And that (k) must increase with the table size. So it is not constant and depends on n (table size).
I remember a proof in CLRS which first developed a function that was bounded above by 5 for all conceivable input ("a very quickly-growing function and its very slowly-growing inverse"), and then substituted the constant 4 or 5 into a complexity calculation in place of that function, giving a result which was "only" correct for all conceivable input.
The same approach applies to key length requirements for hash tables with arbitrarily large backing stores. They do not grow as slowly as the CLRS log* function, but they grow so slowly that there are easily identifiable sharp limits on how large they can be -- an easy example is that a hash table cannot use more memory than the hardware offers no matter how the software is written. A backing store with 1TB of addressable bytes cannot need the key to be more than 40 bits long.
On a different note, by "table size" in my earlier comment I meant to refer to the number of entries in the table, not the capacity of the backing store. It seems like you might be using the same word for a different concept?
>The same approach applies to key length requirements for hash tables with arbitrarily large backing stores. They do not grow as slowly as the CLRS log* function, but they grow so slowly that there are easily identifiable sharp limits on how large they can be -- an easy example is that a hash table cannot use more memory than the hardware offers no matter how the software is written. A backing store with 1TB of addressable bytes cannot need the key to be more than 40 bits long.
So? That's still putting a bound on table size, which makes it in-practice constant, but doesn't make the algorithm O(1), because you can never get such a result by bounding n, for the reasons the GGP gave -- that's cheating.
Your complexity bound has to be written on the assumption that n (number of elements to store in hashtable) increases without bound. Assuming you will never use more that Y bytes of data is not valid.
>On a different note, by "table size" in my earlier comment I meant to refer to the number of entries in the table, not the capacity of the backing store. It seems like you might be using the same word for a different concept?
No, I was using table size exactly as you, to mean the number of elements stored. Is there a reason my comments only made sense under a different definition? If not, be charitable. (And avoid using obscure terms.)
> No, I was using table size exactly as you, to mean the number of elements stored. Is there a reason my comments only made sense under a different definition? If not, be charitable. (And avoid using obscure terms.)
I interpreted your comment to refer to the size of the backing store, because that is fundamentally what a hash key needs to be able to address.
I didn't mean to say that, if you were using it that way, you were doing anything wrong, only that there appeared to be a mismatch.
>I interpreted your comment to refer to the size of the backing store, because that is fundamentally what a hash key needs to be able to address.
Under the assumption (upthread) of constant resizing as elements are added, the distinction is irrelevant. The more elements you have in the table, the more elements you need to address, and the more possible outputs your hash function needs to have.
And the needed size of the backing store scales with the number of elements you want to store anyway.
>I didn't mean to say that, if you were using it that way, you were doing anything wrong, only that there appeared to be a mismatch.
Why bring up something like that if it doesn't translate into something relevant to the discussion e.g. to show my point to be in error?
Incidentally, the person replying to you in that thread incorrectly stated that comparison is O(logN) on the number of bits. The most common comparison function, lexicographic comparison, is actually O(1) average case given random inputs of arbitrary length.
But, isn't the key length a constant and we are back to O(1)? Ok, in theory you could exhaust all possible keys of a certain length and proceed with longer keys. It would give us what? O(ln(n))?
His point is, if you use Moby Dick as the key, it's going to take longer to hash that than a three letter string. Hashing isn't O(1) if the key has variable size.
...I fully plan to use "O(whatever)". Not sure for what.
But, yes. (Naive) quicksort's amortized complexity being O(n log n), but it's O(n^2) on already sorted data, is all I ever needed to learn to take away that lesson. When sorting already sorted data is worse than sorting randomized data, it's a quick realization that "amortized cost" = "read the fine print".
Or triple, or quadruple. Or even (IIRC) "increase by 50%" (but, I would need to sit down and do the actual math on that). But, doubling a number is cheap and more conservative than quadrupling (the next "cheap" multiplier).
Also, already sorted data.. in reverse order. If it's already sorted in the right order, quicksort takes linear time. This is an important difference - data you use might indeed often be appropriately sorted, but in practice will seldom be sorted in reverse order.
On the contrary: very common UI pattern to have a data grid that sorts by a particular column when you click the header, then reverses that sort order when you click the header again. So for a user to sort by date, descending, they click the header, causing an ascending sort, then click it again, causing a descending one.
Often such a grid will be quite well abstracted from its data source - it might be executing a remote query to return data in the new order every time - but I bet there are some examples out there that are backed by a local dataset and carry out an actual sort operation when you hit the header... and fall into a quicksort worst case if the user clicks the same header twice in a row.
Yes; random pivot selection is nlogn (unless you are very, very, statistically impossibly, unlucky. Or using very short arrays where it doesn't matter anyway).
But I'm pretty sure data sorted in either direction (i.e., 'reversed' or not, ascending or descending), and taking a pivot from either end, is n^2. It doesn't have to be reversed; everything unsorted always ends up on one side or the other of the pivot, with each recursive step being just one less comparison than the step prior, meaning it has N-1 + N-2 + ... + 1 comparisons regardless of which way the array is sorted, or N(N-1)/2 comparisons total (Gauss' formula, but starting at one less than the total number of items N, since that's the number of comparisons each step), which is O(N^2). There is no case where it's linear time, unless you first iterate across the array to select the first out-of-place element as the pivot (which may be a reasonable optimization, but can also be made to apply regardless of which direction the array is sorted).
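To make that concrete, here's a minimal textbook sketch (Lomuto partition, last element as pivot - not any library's implementation). On input that's already sorted in either direction, every partition leaves essentially everything on one side, giving the N-1 + N-2 + ... + 1 behaviour described above:

    /* Naive quicksort, Lomuto partition, pivot = last element.
     * Sorts a[lo..hi] inclusive. On already-sorted input each call
     * only peels off the pivot, so the work is O(N^2) overall. */
    static void quicksort(int *a, int lo, int hi)
    {
        if (lo >= hi)
            return;
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] <= pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;   /* place the pivot */
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }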
In the standard there are things like "exactly N operations", but I'm not seeing such guarantees for `istream`. There's an explanation of how things should work and I imagine you can derive complexity from it, but I think `istream` is a bit special since you're talking about a wrapper for (potentially) an arbitrary input source.
> Note that some implementations of sscanf involve a call to strlen, which makes their runtime linear on the length of the entire string. This means that if sscanf is called in a loop to repeatedly parse values from the front of a string, your code might run in quadratic time
Good. I'm so happy they put it there. It's a little thing, but such little things - documenting corner cases - can have great benefits.
I have a bad memory for all but most frequently used standard library calls, so I regularly end up refreshing my memory from cppreference.com, and I tend to instinctively scan any notes/remarks sections, as there's often critical information there. So now I can be sure I'll be reminded of this the next time I need to use scanf family.
I don't know if it is required to, but there doesn't really seem to be an upper bound to what glibc's scanf will eat for a %f (e.g. a gigabyte of zeroes followed by "1.5" will still be parsed as 1.5), so for that implementation there certainly isn't a trivial upper bound on the amount of input read and processed for a %f, like you might expect.
Yet another reason to not stringify floats. Just use hexfloats (but beware of C++ standard bugs involving >>) or binary.
Unfortunately "gigabytes of numerical data, but formatted as a text file" is commonplace. For some reason HDF5 is far less popular than it ought to be.
But why do strlen() at all? And why are all platforms (Linux, Windows, MacOS) seemingly doing that?
I think you're right that there is no upper bound, but it shouldn't be necessary to do a full strlen() if you instead scan incrementally. You could go char by char until the pattern '%f' is fulfilled and then return. That would solve the issue at its root -- and who knows how many programs would suddenly get faster...
I'm reading this source for the first time, but I guess to not break anything one could introduce a new type of FILE* stringbuffer, let's say in 'strops_incr.c', that works incrementally, reading one char at a time from the underlying string and skipping the strlen()...
Would be awesome cool if GTA online would be loading faster under wine than on windows :-)
How would it help you knowing that it's O(n)? It needs to read all the characters of the float. Problem is that it's needlessly reading characters even after the float
You're joking, but now I'm thinking about the XML we parse at work and the library we're using to do it. We parse a lot of it, but I've always had this vague feeling that it takes a bit too long (given the codebase is C++).
The XML library we use is rather well-known, so if someone found a bug like this there, I'd suspect a general improvement of performance across the board in the entire industry. Efficient Market Hypothesis tells me it's unlikely the library has this problem, but then again, so I thought about AAA videogames, and then GTA Online thing came out.
Any sufficiently-complex library code likely has plenty of problems, often unavoidably so (e.g. trade-offs between best performance and edge cases). Whether they have been found or not is a function of many, many factors.
> Efficient Market Hypothesis
I've lived long enough to be very sceptical about that sort of thing. Markets tend to be efficient in aggregate, maybe, but on the single case they can fail quite brutally. Look at how "dramatic" bugs are overlooked even in critical pieces of infrastructure like openssl, for years and years; maybe it happens less for openssl than most equivalent libraries, but it still happens.
Also, once the "market" for standards moves on, network effects make it very hard to have any meaningful competition. I mean, who writes XML parsers nowadays? Whichever XML lib was winning when JSON "happened" is now likely to stay in control of that particular segment; and the likelihood that top developers will keep reviewing it, falls off a cliff. Sprinkle a bit of cargo-cultism on top, and "efficient markets" become almost a cruel joke.
There's a variant / corollary of the Efficient Market Hypothesis here, though.
Let's say the GP's XML library has The GTA Bug, i.e. it uses a quadratic-performance loop when parsing. The bug will go undiscovered until any one consumer of the library a) sees enough performance impact to care, b) has the expertise to profile their application and finds that the library is at fault, and c) reports the problem back to the library owner so that it can be fixed. This combination might be unlikely but since only one consumer has to have all those properties, the probability scales inversely with the number of library users.
It's possible. I've personally reduced the time spent reading a huge XML file on startup by at least 10x in the application I was in charge of, by avoiding the library dependency and writing custom code. Having a lot of experience with that kind of code and with performance issues, it was quite a fast change with no negative effects.
The prehistory of that was simple: up to some point the amount of data stored was reasonably small. Then from some point on the amount of data grew significantly (a few orders of magnitude), and the startup times became very unpleasant.
There's a lot that is going on when loading huge XML files. As an example, don't forget all the possible Unicode conversions, all the possible allocations of the elements in the handling code, just to be discarded etc.
I don't suggest everybody doing it "just because" but if some specific use is known to have very specific assumptions and it is in the "hot path" and really dominates (profile first!) and it is known that only a small subset of all XML possibilities will ever be used it can be justified to avoid the heavy libraries. For example, in that specific case, I knew that the XML is practically always only read and written by the application, or by somebody who knew what he was doing, and not something that somebody random in some random form would regularly provide from the outside, and I knew that my change surely won't break anything for years to come, as I knew for sure that that part of application was not the "hot spot" of expected future changes.
So it was a win-win. Immensely faster application startup, which is something that improved everybody's work, while preserving the "readability" of that file for the infrequent manual editing or control (and easy diff).
I'm reminded of a 2008 article, Why is D/Tango so fast at parsing XML? [0]
One of the main factors seems to be that a lot of XML parser libraries, even the high-profile ones, did a lot of unnecessary copy operations. D's language features made it easy and safe to avoid unnecessary copying.
If you have a lot of nesting in the XML, and it is formatted for human reading (i.e. indented), you may want to consider not doing that. We had a project where we were creating human-readable versions of the XML (mostly for developer convenience) and then parsing it. When we stopped adding all the extra white space the parsing speed increased a couple of orders of magnitude. (The downside was we no longer had built in coffee breaks in our development process.)
That's interesting. I can't think of a mechanism why this would give so much of a performance boost, though - rejecting extra whitespace should be just a matter of a simple forward scan against a small set of characters, shouldn't it?
(Or maybe in your case something was running strlen() a lot during parsing, and just the difference in file size explains the boost?)
What about parsing that XML upfront, serialising to some binary format (e.g. CBOR, maybe with nlohmann's JSON library, or Cap'n Proto) and shipping the binary file?
Would be cool if we could that, but as things stand, enough various people want to occasionally look at these files, in environments where they can't just install specialized tooling and are using notepad.exe (or Notepad++ if already available), that we keep it text.
I like binary formats, but we can't afford the increased complexity around supporting a custom binary format, so I'm not pushing for changes here.
I did investigate replacing our pile of XML files with an SQLite database, which would give us fast and efficient format, and allow to use existing SQLite database viewers, or hit the file with trivial scripts, so we'd have no complexity supporting a dedicated tool. However, the data model we use would need such a large overhaul (and related retraining) that we tabled this proposal for now.
Seriously the most I have laughed in like 6 months. Which probably says a lot more about me than this joke. I know that jokes aren't really welcome on HN, and I generally really like this policy. But just had to mention this was just ... what I needed to read right now.
IMO, while I really don't come to HN to find dial-a-joke, or joke-of-the-day, I think some humor is essential in modern life.
Since we're talking about Matt Keeter, you will find he has a great sense of humor if you read his website or interact with him. Some of his jokes are ROTFL funny, but subtle.
>> So many years ago when I first took over the iOS string functions, I found that like 30% of boot time in debug builds was in strstr. <<
>> Needless to say, we did not fix that issue by writing a more efficient strstr. Removed the parser and then removed strstr from the environment where it had been in use =) <<
> ...Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle.
My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.
Clearly there's a contended lock buried deep. Something non-obvious.
I'm certain everything these days has dozens of hidden quadratics and contended locks.
Which is one of the reasons I'm excited for stuff like structured concurrency (Java's Project Loom) and retoolings like systemd becoming the norm.
Ages ago I worked on kitchen sink app that had a very finicky startup. Any breeze would break it. Much consternation by mgmt. Apparently if we only clapped louder, Tinkerbell would fly. I couldn't take it any more. LARPing as a bulldozer, I replumbed the innards, changing from something like initd to be more like systemd with some lazy loading for good measure. Voila!
Back to GTA. The failure here is the product owner didn't specify a max load time, and then hold the team to it. Devs will mostly do the work that's expected of them. If load time wasn't measured (and managed), no one is going to bother with expunging sscanfs.
> My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.
Yesterday my MBP kernel panicked because my keyboard was low on battery and the bluetooth connection kept dropping. There's something weird with MacOS where peripherals seem to really not be well isolated from the core OS runtime.
Oh peripherals on newer Macs are somehow very hit or miss. I have a very difficult time with external monitors, especially from sleep. My MBP 16" would just loop between initializing and failing to initialize, until I unplug, wait, and re-plug again. Or I have to press the `Extend` option instead of the `Mirror` option that I use. The older 2015 MBP would just connect fine.
Blog author here! Thanks to HN for warning me about sscanf at exactly the right time – within a day of me trying to load some ASCII STLs and noticing it was slow...
Linked deep in the Twitter replies [1], there's an open glibc issue about this, dating back to 2014:
IMO the lack of a complexity requirement is a bug in the C standard. And really it’s a bug in the implementation(s?) too. If it can be done in O(1), shame on library authors for doing it in O(n). If you want programmers to trust library authors, don’t do this to us. Maybe std::from_chars FTW?
This is not a complexity issue with the function. The function is linear in the input, as it should be. The problem is that the implementation does more work than it needs to (it doesn't need the length of the string). It should be linear up to the end of parsing, not the end of the string. The complexity in this case comes from the loops calling it.
Shouldn’t we just come clean and admit to ourselves that there is no such thing as the C standard? There is a collection of loosely related languages that look similar and that collectively we call C, but really they’re all completely different and share almost no interoperability or common characteristics. And those standards that do exist provide almost no ability to reason about your code including things like ordering of statements.
> ISO-C11 specifies 203 circumstances that cause undefined behaviors.
203 is enough to make almost every line of code questionable. The result of this is that looking at a simple 3 line C program and being asked whether the program terminates is undecidable without knowing which compiler was used.
Null dereference for example is undefined behavior, and could cause a termination or not, depending on the implementation, even if it is known to be standards conforming to C11.
> 203 is enough to make almost every line of code questionable. The result of this is that looking at a simple 3 line C program and being asked whether the program terminates is undecidable without knowing which compiler was used.
This is hyperbole to the point of being nonsensical.
> Null dereference for example is undefined behavior, and could cause a termination or not, depending on the implementation, even if it is known to be standards conforming to C11.
This sentence doesn't make any sense. If your C code has UB, it is wrong. The behavior of particular environments around certain UB is irrelevant to standards-conforming code, because standards-conforming code doesn't have UB.
> This is hyperbole to the point of being nonsensical.
I think you can only say this if you've never had aggressive compiler optimizations introduce security issues into perfectly reasonable-looking code.
Quiz, what's wrong with the following code?
    int buflen, untrusted;
    char buf[MAX];

    /* `untrusted` comes from an untrusted source */
    if (buflen + untrusted > MAX) {
        return -EINVAL;
    }
The answer of course is that integer overflow is undefined; so if buflen + untrusted is greater than INT_MAX, the compiler is allowed to do absolutely anything it wants; and making sure it's only allowed to do something sensible turns out to be extremely difficult.
EDIT For instance, in an earlier age, people might have done something like this:
    if (buflen + untrusted > MAX || buflen + untrusted < buflen)
But the second clause relies on overflow. The compiler is perfectly justified in saying, "Well, overflow is UB anyway, so if it happens, I'm allowed to not do anything; so I'll just make this code more efficient by removing that check entirely."
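One overflow-safe way to write that check is to rearrange it so no addition happens at all - a sketch, reusing the (hypothetical) names from the quiz above and assuming buflen has already been validated to be in [0, MAX]:

    /* Reject the untrusted length without ever computing buflen + untrusted,
     * so there is no signed addition that can overflow. Assumes
     * 0 <= buflen <= MAX has already been established. */
    if (untrusted < 0 || untrusted > MAX - buflen) {
        return -EINVAL;
    }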
This goes against the very notion of UB. If some code was wrong, the standard would say it is not allowed and it would result in a compile error, or at least a runtime error. As it is, the language standards choose to leave it open, almost as if to concede that the standard can’t cover every base. UB isn’t wrong, almost by definition. It’s just implementation-specific, and that’s my point. We don’t have an overarching C language, we have a hundred or so C dialects.
One problem here is that correct code relies on valid inputs in order to avoid UB -- Undefined behaviour is a runtime property of a running program, rather than (necessarily) a static property of an isolated unit of code.
In this way, UB is essentially the converse of Rust's `unsafe` -- we must assume that our caller won't pass in values that would trigger undefined behaviour, and we don't necessarily have the local context to be able to tell at runtime whether our behaviour is well-defined or not.
There definitely are instances where local checks can avoid UB, but it's also perfectly possible to write a correct program where a change in one module causes UB to manifest via different module -- use after free is a classic here. So we can have two modules which in isolation couldn't be said to have any bugs, but which still exhibit UB when they interact with each other.
And that's before we start getting into the processing of untrusted input.
A C compiler -- and especially the optimiser -- assumes[1] that the conditions for provoking UB won't occur, while the Rust compiler (activate RESF[0]) mostly has defined behaviour that's either the same as common C compilers would give for a local UB case[2] in practice or have enough available context to prove that the UB case genuinely doesn't happen.
[1] Proof by appeal to authority: I was a compiler engineer, back in the day.
[2] Signed integer wrap-around is the classic here: C assumes it can't happen, Rust assumes it might but is much less likely to encounter code where there's a question about it happening.
I always though that code with UB is wrong, and UB allows implementation to deal with it on its own way (it is allowed to ignore it, stop program, corrupt memory, delete hard drive contents...).
So if your code has UB then it is wrong, one thing not specified in standard is exact consequences of that.
(yes, in some hacks one may rely on UB behaving in some way in some circumstances - it will be a hack)
Suppose it is wrong, though; that implies a good chunk of C code out there is wrong code. Yet it compiles and people are using it, which means that their code does not conform to the standard. Just as wrong math isn’t math at all, wrong C is not C. People are therefore writing code whose runtime characteristics are not defined by any standard. Thus it is not actually C, it’s whatever compiler they’re using’s language.
A working and usable program typically contains wrong code of various kinds.
Nontrivial bug-free programs are an extreme rarity.
> wrong C is not C
buggy C is still C, if on discovering undefined behavior people treat it as a bug - then it is just C program with some bugs in it.
If on discovering undefined behavior people treat it as acceptable - "on my compiler it does XYZ, therefore I will knowingly do ABC" - then it is becoming something else.
It's not really a bug if it works the way it was intended by the developer. It just exists in a world outside the law, makes its own laws based on what works, a renegade program. Most people don't read the C standard or care what it says (and it costs money, so it's almost as if reading it is discouraged), so it seems very likely the default human behavior is just to use this UB.
I still think undefined behavior is the wrong choice here. It should have been implementation-defined, like what happens if you bit shift a negative integer to the right. They could pick two's complement or trap on overflow or whatever is most convenient on their platform, but not just assume it will never happen.
There is always a fix for your own code but that’s not the problem. The issue is all the millions of lines of code in the wild that are intended to be compiled without that option.
There is a standard, sure. But there are also a lot of compilers out there and I would bet that all but a few has either a "this compiles c11 except for [list of unimplemented features]" caveat or non-standard extensions.
> But there are also a lot of compilers out there and I would bet that all but a few has either a "this compiles c11 except for [list of unimplemented features]" caveat or non-standard extensions.
Your statement is broadly reasonable. Here's the GP:
> Shouldn’t we just come clean and admit to ourselves that there is no such thing as the C standard? There is a collection of loosely related languages that look similar and that collectively we call C, but really they’re all completely different and share almost no interoperability or common characteristics. And those standards that do exist provide almost no ability to reason about your code including things like ordering of statements.
It's just a string of incorrect statements, or at best, extreme hyperbole.
1. There is a C standard.
2. There's only one C language. Implementations differ, but not "completely," and are mostly interoperable. They all have in common the standard language, of course.
3. The C standard provides a fairly strong mental model for reasoning about code. Especially in later revisions, but even in C99. "Almost no ability to reason about" is false.
If you think C is fragmented or difficult to reason about, let me introduce you to Python (Python3, Jython, PyPy), Java (Oracle, OpenJDK, Dalvik), C# (Mono, .NET Core, .NET on Windows...), C++ (if you think implementing C99-C18 is hard, check out the stdlib required by modern versions of C++), etc.
You are right of course. I was thinking specifically about the early PIC Microchip compilers for "C" where the weird banked memory and Harvard RISC architecture made the "C" you wrote for those essentially non-portable even to other 8-bit micros. I think the Microchip marketing was very carefully trumpeting C but not claiming to follow any version of the standard though.
And, of course, the community around PICs was almost uniformly on board with writing in assembly anyway.
In Java/C#/Python, you have a standard, a dominant implementation which is compliant to that standard. Few more independent/niche implementations which may not be 100% compliant. Do we have a similar implementation in C?
I think you're maybe overstating the degree of homogeneity in Java/C#/Python. E.g., CPython3 cannot run Python2 code at all, and there are a lot of platform-specific modules and behaviors in the stdlib ("import os"). I don't think Python has any real single standard to the level of detail that C does, although I may be mistaken.
For C, on approximately the same platforms, to approximately the same degree: either GCC or Clang would be the corresponding standard-compliant implementation.
CPython is looked to as the canonical Python. IronPython and PyPy are all modeled after it and aim to behave as closely to it as possible, while CPython is considered the gold standard. Comparing CPython3 and CPython2 is orthogonal to that; one is not trying to implement or emulate the other. You have similar situations with Mono C# imitating Microsoft C# and IronRuby imitating Ruby. If there is a difference, it is considered a deviation from Ruby which is the reference implementation.
Thank you for introducing me to the concept/term Ascetic programming. Not sure how widely used it is, but I find it more fitting for what I try to do than minimalistic or KISS.
Also, it is great to see someone write
> I noticed that ASCII STL loading was really quite slow.
> From startup to showing the window, it took over 1.8 seconds!
I always find pleasure seeing projects which highlight just how fast modern computers really are.
Re-read my comment and to be clear, the quote and the last paragraph are not related. The last sentence was meant to refer to the Erizo project as a nice single purpose high performing tool, not as a comment to the bug that made it slow.
printf and scanf match nicely with their format specifiers, so the serialization and deserialization can be maintained nicely in lockstep.
To avoid the quadratic strlen overhead you can simply use fmemopen(3), which makes the temporary sscanf FILE object explicit and persistent for the whole parse, and needs just one strlen call.
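A minimal sketch of that approach (POSIX fmemopen; the function name, `data` buffer, and `emit` callback are placeholders for illustration):

    #define _POSIX_C_SOURCE 200809L   /* for fmemopen */
    #include <stdio.h>
    #include <string.h>

    /* Parse whitespace-separated floats from a NUL-terminated buffer.
     * strlen runs exactly once; the memory stream then tracks its own
     * position, so repeated fscanf calls never rescan the buffer. */
    static void parse_floats(const char *data, void (*emit)(float))
    {
        FILE *stream = fmemopen((void *)data, strlen(data), "r");
        if (!stream)
            return;
        float f;
        while (fscanf(stream, "%f", &f) == 1)
            emit(f);
        fclose(stream);
    }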
I think the really embarrassing part for Rockstar is that they didn't bother to investigate what took 5+ minutes to load in their star product, a simple profiling would've made the issue obvious. So either they knew and they didn't care, or they didn't know and they didn't care.
That being said both for GTA and for TFA the issue is a very similar sscanf call:
    sscanf(data, "%f", &f);
I already posted a similar comment in the GTA story but I really want to emphasize it: scanf is almost never the right tool for the job, and it's definitely not the right tool in this situation. Just use strtof. That's literally what it's for. String to float. There. Done.
Scanf is crappy and if it were up to me would've been deprecated a while ago. I can sort of see using it for a quick one-off "script", for instance to parse user input, but seeing it in the middle of a program will always raise a huge red flag for me.
Use strtok_r if you need to split a string, then parse every entry individually. It's more robust, more flexible (you can parse custom types and formats that way) and allows for much better error handling and diagnostics. And of course it's also usually vastly faster.
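As a rough illustration of the strtof approach (a sketch, not the exact code from either article - the names here are made up):

    #include <stdlib.h>

    /* Parse up to `max` floats off the front of `s`. strtof reports where
     * it stopped via `end`, so each iteration only touches the characters
     * of one number - no hidden strlen over the whole buffer, and a loop
     * over the buffer stays linear instead of quadratic. */
    static size_t parse_floats(const char *s, float *out, size_t max)
    {
        size_t n = 0;
        while (n < max) {
            char *end;
            float v = strtof(s, &end);
            if (end == s)        /* no number here: stop (or report an error) */
                break;
            out[n++] = v;
            s = end;             /* resume right after the parsed number */
        }
        return n;
    }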
Scanf is an antipattern in my opinion. I literally never use it and I'm better off for it. The last time I interviewed for a C coder position I managed to answer the full C test quizz except for the one question regarding scanf. That's how much I don't use it.
I think it's even worse for developers who come from higher level languages and (reasonably) expect to be able to deserialize data easily. You simply can't do that in C, the type system and general philosophy of the language won't let you, but scanf may convey the illusion that it's sort of possible. Don't believe its lies.
I was thinking about it the other day when reading the original article, and this was the only plausible (and defensible) cause for it not being addressed:
When GTA online was released 7 years ago in 2013, the list of DLC items was probably much shorter, and grew over time. The performance issue is aggravated quadratically with list length. The list growth was probably bell-curve shaped over the lifetime of the game.
This has an interesting dynamic when it comes to perceived performance:
In the beginning, on consoles and PCs - it was already a pretty long load time, but would have been 90s or so on an average gaming PC (I remember this from the early days playing it, on a modest gaming PC with an FX-8150 cpu). This is long, but tolerable for a game of this size. I'm certain that early complaints that it was sluggish to load were profiled and looked at, and at the time it wasn't a 4 minute ordeal to load the json and probably represented a fraction of the CPU time it takes today - not standing out as obviously as in OPs guerilla profiling. Devs put a pin in it and say "this is netcode related, it is what it is"
Over time, the list gets longer, the loading time takes more cycles, BUT, PCs are getting progressively faster year over year as well, with many of those improvements happening at the instruction-level - optimizing for things like, surprise, string scanning. Two console generations are released since, masking the problem on that side. For comparison sake, I just checked and I can load GTA online in about 75s on my Ryzen 3900x. This cpu is probably 4-6x faster in single core performance than the 8150 for most workloads. Again, it's slow but tolerable and by this time it's "yeah GTA online is just a big game and takes a while to load, it's always been that way". Complacency is the enemy of improvement, and things that regress slowly over time are hard for us to notice in general.
Don't take this as a "this is fine" comment, but instead the only reasonable justification I can think of as to why it might have flown under the radar all these years.
I think 'embarrassing' is too strong a word. AAA game development is rushed; the pressure is to ship. Something has to give. This is a user facing issue, but one that doesn't actually affect the gameplay. Assuming they had -time- to profile the load process, given that low a priority, seems extremely optimistic.
>AAA game development is rushed; the pressure is to ship.
I'd be more understanding if GTA Online hadn't already shipped its first version in October of 2013. Surely there would've been some time after shipping the first version to profile the game.
But I should note that once you ship a product in this space there is a heavy emphasis on not breaking much. Changes are for the next milestone (seasons, service packs, new features). There's very rarely any emphasis on "fixing" something because it could introduce even more bugs and Producers prefer sitting on a stack of known issues than addressing them with more unknown ones. Since known issues have a known cost.
Until it gets so bad that you have to make health patches. We made such patches (and referred to them internally as "Sanity" patches).
Sure. I'd be embarrassed if they didn't have the issue on their backlog ("Load times are high"). But the priority seems low, and the actual effort and viability of a fix seems unknown. Speaking as an engineering manager, that is very much going to be a "if you have spare time" ticket. Now, I also try to ensure people have spare time to investigate stuff like that, but that's me, and I don't work in game dev. I can easily see another manager, especially one in game dev (where what keeps players coming back is new content and features, not reduced load times) prioritizing other tickets ahead.
(disclaimer: I'm not in game development and only read about this)
Usually different staff rolls on and off at different times of product development and post-release lifecycle. I understand that most programmers would have been rolled off a while before launch. You early on have people build or adjust the engine and tooling, but later on you don't need most of them anymore and things come down to creating content.
In other areas, software development is perpetual. You don't hit some milestone at which 90% of developers are moved to a different project or laid off and folks with a different skill set are added.
Usually in software development you have different people over time, because of individual churn, not because you are changing the role mix
Well, it doesn't affect the gameplay if the player starts the game once and never closes it. But for anybody who wants to hop on for a quick bit of fun, it's a notable barrier. There are definitely games I've stopped playing because it takes too much time to launch the thing.
I wouldn't have said anything if the game was released one month ago, but GTA V is almost 8 years old now and it's been ported to several generations of hardware (IIRC they've even announced "next gen" ports to release this year). The online function is still maintained and makes them a lot of money. I also do think that it affects the gameplay because these loading times are genuinely terrible. A 30-second loading screen is a nuisance, a 5+ minute loading screen just makes me want not to play the game.
I think that Rockstar deserves some blame here, especially since this problem might well be a consequence of their notoriously bad development practices.
I tend to agree. When you are rushing things get missed. Also if it was a problem from the beginning you just might not think its an issue (its just how long it takes) .
One philosophy I heard in my days of programming (not sure how I remembered this but it's still out there):
Make it work, make it right, make it fast.
-- Kent Beck
Rockstar has virtually endless resources and the game has been out for many years. For years, they didn't reduce the extremely long load times? Not only embarrassing, but it shows deep incompetence and lack of respect for the craft and for end users.
scanf and printf have complementary format specifiers, which can make maintaining serialization and parsing of regular data a breeze...
the proper remedy is to simply wrap the string to parse with fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.
Cool trick, thanks for sharing. I don't get why there isn't a suitable snscanf function that takes the buffer length as an argument and returns the number of bytes parsed?
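Something along those lines could be cobbled together from the same trick - purely a hypothetical sketch (the name snscanf and the `consumed` out-parameter are made up, not a standard API), built on POSIX fmemopen plus vfscanf:

    #include <stdarg.h>
    #include <stdio.h>

    /* Scan at most `len` bytes of `buf`, report how many bytes were
     * consumed via `consumed`, and return what vfscanf returns. */
    static int snscanf(const char *buf, size_t len, size_t *consumed,
                       const char *fmt, ...)
    {
        FILE *stream = fmemopen((void *)buf, len, "r");
        if (!stream)
            return EOF;
        va_list ap;
        va_start(ap, fmt);
        int matched = vfscanf(stream, fmt, ap);
        va_end(ap);
        long pos = ftell(stream);       /* bytes consumed so far */
        if (consumed)
            *consumed = (pos < 0) ? 0 : (size_t)pos;
        fclose(stream);
        return matched;
    }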
Not excusing this but there are likely a few mitigating factors here.
* Tight deadlines result in shipping code that's barely tested and may have resulted in minimal code reviews on it.
* The original article mentioned how the ids were always unique. It may have been intended to load content from multiple sources or to allow patching of content on disk (or repurposed entirely from a different game). Or it could well be an oversight/over-engineering.
* It may even be a general purpose json parser from another project that had never been tested with data of this size until after launch.
* It probably wasn't always this bad. Likely when the game launched the loading times were much more reasonable as the amount of in-app-purchases was an order of magnitude smaller.
Typically most of the IAPs will be added much later, so much of the profiling work would have been done with this code having a much smaller json block.
When the game was shipped, the dev team will likely have shrunk significantly as the bulk of the team moved to a new project, leaving a smaller team focused more on the content itself, and the engine team that would likely deal with and spot stuff like this will probably have had their attention elsewhere.
Don't work for R*, have shipped many high budget titles though including live services.
Agreed. My first instinct was the same: *scanf is never the right tool for pretty much any job.
I learned this 20+ years ago. As far as I'm concerned it should have been considered deprecated along with gets; it was considered dangerous in the early 90s and probably before. Not sure why people are still using it in the 2000s+.
The Go standard library is pretty good, but unfortunately, it includes a scanf clone, so every once in a while you see a poor new developer posting to help forums trying to get it to work properly and you have to break it to them that they're using the wrong tool for basically any job.
There's a bigger potential problem with the *scanf() functions than performance. They are inherently unsafe for reading numeric input.
For example, if you do something like this:
    int n;
    sscanf("9999999999999999999999999", "%d", &n);
the behavior is undefined. As the C standard says:
> ... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
You can control the appropriate type by writing the call directly, but you can't guarantee that the result can be represented unless you have control over the input.
Remember that in C "undefined behavior" doesn't mean that your program will fail, or will crash, or will tell you there was a problem. It means that a conforming implementation can do literally anything. In the worst case, it will do what you expect until it fails at the most inconvenient possible moment.
Now most implementations will probably do something sane, like setting a floating-point object to infinity or an integer object to some arbitrary value, but the language doesn't guarantee anything.
If you want to write safe code, you can extract a substring that represents a number and pass it to one of the strto*() functions, which do have well defined behavior on overflow. (But I couldn't tell you exactly what that behavior is without looking it up.)
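For the record, the strto* behaviour on overflow is to return the type's minimum or maximum and set errno to ERANGE. A minimal sketch of overflow-aware integer parsing built on that (the function name is made up):

    #include <errno.h>
    #include <stdlib.h>

    /* Parse a long from `s` in base 10, reporting failure instead of
     * silently overflowing. Returns 0 on success, -1 on error. */
    static int parse_long(const char *s, long *out)
    {
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s)            /* no digits were consumed */
            return -1;
        if (errno == ERANGE)     /* value clamped to LONG_MIN/LONG_MAX */
            return -1;
        *out = v;
        return 0;
    }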
> Remember that in C "undefined behavior" [...] means that a conforming implementation can do literally anything. In the worst case, it will do what what you expect until it fails at the most inconvenient possible moment.
Actually, the worst case possibility is that your program will become Skynet, enslave humanity for 10000 years, collapse all stars in the universe into black holes, and significantly accelerate processes such as the heat death of the universe.
You’re misreading the standard a bit I think. It’s saying undefined behavior comes from the format string (which you should control, and which compilers commonly warn about if it’s not a literal) not matching the types of the variables you pass it. This is kind of obvious when you think about it. Variadic C functions lose type information, so the format string is the source of that.
The “out-of-range” issue just means that the library isn’t going to mandate every implementation of this function is guaranteeing to provide the same overflow behavior (some might stop when you saturate, others might stop at the end of digits input and overflow, others might detect the overflow and saturate).
The Linux man page is clearer here IMO:
> If the number of conversion specifications in format exceeds the number of pointer arguments, the results are undefined. If the number of pointer arguments exceeds the number of conversion specifications, then the excess pointer arguments are evaluated, but are otherwise ignored.
That’s the only spot the word “undefined” appears and doesn’t discuss overflow. My general impression is that the “undefined” problem largely only applies to language operations or user input causing a library to perform such an undefined behavior. Older C functions with older documentation may be using “undefined” with a less strict meaning to also cover “implementation-defined”. The “undefined behavior” brouhaha came up in the past 5-10 years only when compilers actually started leveraging it breaking a lot of assumptions.
Wow, the problems are so deep that no human should ever use sscanf again. It's just as bad as gets() because there is no way for the programmer to deal with the error case.
Implementations can define behavior for undefined behavior. The difference to implementation-defined behavior is that for the latter implementations MUST define some behavior (from the set of options specified by the standard), whereas for undefined behavior they don’t need to.
If an implementation has defined some behavior for sscanf undefined behavior, and then the standard defines a different behavior, then the existing implementation would become nonconforming, and an updated version of the implementation would be not backwards compatible with the existing one. That’s why such changes to the standard can be problematic.
I am writing an app for iOS in Swift and I have an array of structs with some 70,000 elements or thereabouts and for some bizarre reason the compiler uses so much memory if I define it as such directly in the source, that I run out of memory. So instead as a workaround for now I am storing the data as a JSON string that I parse at runtime. It’s very sad, but it’s the only option I had because I have a ton of other code to write too for this app and cannot afford to spend the time to even make a binary format for this data.
But I don’t understand why the Swift compiler decides to use so much RAM when compiling it in the first place. The string representation of the data itself is only ~3 MB. But when I tried to declare the data as an array of structs in Swift directly, it uses gigabytes of memory when I try to compile it, which causes the system to start swapping, and then the disk space runs out because I only have about ~20 GB of free space on the disk, so then the system can’t swap any more and is out of RAM as well.
And my struct is very simple; it’s just
    struct Bazinga: Identifiable, Codable {
        let id: Int32
        let name: String
    }
And before I had to turn to JSON it used to be only Identifiable even. So it’s like one of the simplest possible structs, and the 70,000 items of data only a few MB when written in the source. Yet more GB of memory is needed to compile an array of these structs than I have RAM, and even exceeds the amount of disk space I have that it can swap to. It’s super weird to me that this is even a problem, and it’s insane how many GB of memory it consumes trying to compile my code.
I don't know why you're running into that issue, but...
> It’s very sad, but it’s the only option I had because I have a ton of other code to write too for this app and cannot afford to spend the time to even make a binary format for this data.
You should look into Flatbuffers (https://google.github.io/flatbuffers/flatbuffers_guide_use_s...). It's a tool that can generate an API for reading/writing binary data based on a schema file where you design the layout (similar to protocol buffers). The data is ready to read, so you don't have to do any parsing at all, AND the compiler includes a feature to convert JSON files into binary that matches your given schema.
It won't solve your compiler woes, but it will help you avoid having to store and parse JSON, and it's a tiny dependency.
It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.
Absent that, of course we can potentially read the source code and find out, but I think for the most part we tend to operate based on an informed assumption about what we imagine the algorithmic complexity of a given operation would be. Inevitably, sometimes the assumption is incorrect.
There's no way to develop software without making assumptions, some of which inevitably turn out to be incorrect, so I don't think there is any great shame in having that happen, in itself. But better docs could help us make better assumptions with less effort, at least.
Keeping track of algorithmic complexity would be nice as a language and/or static analysis feature. If you wanted to be exact or do it for a language with complex metaprogramming I assume it would be a nightmare to implement. Absent those complications and especially if you always reduced it to O(1), O(n), O(log(n)), etc it might not even be that difficult given the potential advantages.
The difficulty here is "define n". And I don't mean that facetiously. You have a string parsing lib. It is, for reasons, quadratic over the number of strings parsed, and linear per string.
This is overall n^3, but that's meaningless because there actually isn't just one n. So, more m^2 * n. That means you can't reduce it to anything, because you want to keep both components. (Because, say, you know it will only ever be called with a single string).
But then, in the next app, this gets called and reinitialized once per file. And the routine handling files, for reasons beyond our ken, is (n lg n). We're now at k * log(k) * m^2 * n.
And so, over any sufficiently long call chain, "what is n" is the overriding question - string length, number of strings, number of files? Not "how complex is the algorithm", because you want to optimize for what's relevant to your use case.
It would be a huge step to simply follow the call tree and report the depth of nested loops for each branch. You could then check what N is at each level.
The trick is knowing where the nested loops are since they can be spread across functions.
I had a function that scaled as N^2, but it was creating a list of that size as well. Then it called a function to remove duplicates from that list. That function was quadratic in its input, which meant the whole thing was actually N^4. And now that I think of it, those loops were not nested... I rewrote the first part to not create duplicates and deleted the quadratic deduplication. Now it's N^2, but it has to be.
I guess you're right. Keeping track of it all is required for the information to be meaningful enough. Still seems doable to me, assuming the functions are pure.
Here's another crazy idea: keeping track of this while taking into consideration aggressive compiler optimizations.
There is no general solution to deciding whether a program is O(n^k).[1] So your static analysis will either fail to produce an answer for some programs, report a wrong bound, or report a ridiculous overestimate.
So? Static analysis doesn't need to always produce an answer, only produce an answer most of the time. The question isn't whether you can do it in general for all inputs (this is not possible for basically anything you would want to know), it's whether you can do it enough of the time on the kind of code which people actually write.
It's not hard to implement the construction in the proof. Generally you'll encounter problems in the wild in any interpreter. Similarly you can encode many open mathematical problems into simple programs where finding runtime bounds is equal to solving the problem. The Collatz Conjecture for example.
personally, I think I wouldn't even bother to check the algorithmic complexity of every external function I call. I'd just use the logical choice (like sscanf) and only consider optimising if things started to slow down and profiling the application highlighted it as a bottleneck.
I personally would, if it was listed in documentation. Doing stuff and profiling later is the right general approach to performance optimization. But what's better is not doing stupid mistakes in the first place, if they are trivial to avoid. To achieve that, you need to know the complexity guarantees of functions and data structures - or at least their ballpark (like, "this could be O(n) or perhaps O(n logn), definitely not worse").
This is where setting the guarantees and documenting them is useful - it allows people to trivially avoid making these performance mistakes. Prevention is better than cure, in that - as GTA Online case demonstrates - in the latter stage of product development, people may not bother fixing performance anymore.
It might still not help in the case of sscanf. The documentation would specify that it’s O(N) in the size of the input string, just what one would expect without deeper thought. The problem is not O(N), the problem is that N is the complete input string, not just the part being parsed. The documentation would have to include a big fat warning about that.
Part of the issue in my mind is that big O complexity values don't (can't?) tell you the point of inflection, if they are nonlinear. Sscanf could be O(N^10) (does the rule about ignoring the coefficient also apply to the exponent?) but if it only starts to hurt your application at the point of 10tb string reads then it's still unimportant.
I do agree that people often take "don't optimise early" as licence to not make optimisations up front that will clearly be needed (for example, implementing the buffer as a rope in a text editor), but I don't think this is the case here unless you test the function with reasonable future file sizes (something you should be doing anyway) and Sscanf profiles as a bottleneck.
I agree. And I'd personally probably trip over it too, if I used C functions much (I use C++ standard library equivalents, and I'm going to recheck string parsing code in our project now, because we may have that same problem, just with different API!). strlen() in a loop is something that can sneak up on you by virtue of having too many layers of abstraction - scanf() family being one of them.
But what I'm saying is, documenting big O complexity is useful, even if imperfect (and if the function has "tricky" complexity, the conditions where it gets bad should be documented too!).
> Sscanf could be O(N^10) (does the rule about ignoring the coefficient also apply to the exponent?) but if it only starts to hurt your application at the point of 10tb string reads then it's still unimportant.
Sure, but then applications grow, functions get reused. Iff it's trivial to spot and replace O(N^10) Sscanf with a O(n) alternative, I'd want to know the complexity and do the replacement immediately - otherwise, you may discover two years later that the company is investing employee-years into horizontally scaling your application as it's slow under workload, where fixing that one thing would let it run that same workload on a laptop, on a single core.
(This is not a joke, there are "big data" stories like this.)
> But what I'm saying is, documenting big O complexity is useful, even if imperfect (and if the function has "tricky" complexity, the conditions where it gets bad should be documented too!).
Ah yes I totally agree on this point, it should be documented for those who are more cautious/mathematical in their engineering efforts. I'm just not one of those people! I prefer to focus on the structural behaviour of a team (if it's my responsibility) to ensure that cases like:
> two years later that the company is investing employee-years into horizontally scaling your application as it's slow under workload
Resolve correctly - if a team is set up to respond healthily to production slowdowns (of course the equation is different for software with a slow/nonexistent update loop) then IMO you don't need to invest as heavily into up-front optimisation and can instead invest into features, allowing (sample-based for busy paths) profiling in production to notify you when optimisation is warranted.
At the end of the day, up front optimisation is an exchange of time with your future self/colleagues. It's worth it in some cases, not in others - but knowing where the balance in a tradeoff lies in your circumstance is a good chunk of all engineering decisions!
The important part of the documentation would be that N is the length of the entire string, not just the subset of data that needs to be processed. The actual complexity isn't relevant in this case.
Yes, I absolutely think profiling and then only optimizing the actual problems is always a sound choice.
I don't check the docs for every library function I use. I'm just saying, it wouldn't hurt if, when you do read the docs for standard library functions, the algorithmic complexity was mentioned in passing.
In principle, that sounds good. But then it can happen that you profiled when N=1000 and it seems fine. Then a few years later (like in GTA), N has grown to 63,000 and it's no longer fine. It seems unlikely the developer will go back and profile it again.
Also, I think the original Windows Update algorithm for figuring out which updates you needed to download started out fine, but 20 years later it turns out it's quadratic and now there are tens of thousands of updates, so it becomes almost impossible to install XP SP2 from a CD and have it update itself.
But you'll lose so much time doing that! Realizing there's a bug and investigating it is a huge amount of work compared to never writing it in the first place.
But if you know the performance of an algorithm up front, you don't have to spend any time optimizing it in the first place. You just know what to do, because you know the performance.
For instance: suppose you are building a CRUD app on a SQL database. Do you (a) add indexes for important queries as you go? or (b) ignore indexes and later profile and see what queries are slow. No, of course you just make the indexes in the first place. Having to do the latter would mean that instead of having a fast app out of the gate, you have an app that gets slower over time and requires additional dev time to debug and improve. Profiling and fixing performance problems is a massive waste of everyone's time if the problem could have been dodged when the code was being written.
It's different if the optimization is significant engineering effort. Then, yes, put them off till it's needed. But most aren't, in my experience: most optimizations are totally simple, in hindsight, and the code should have been written that way in the first place.
Of course you index hot columns up front in that case, but I think where we disagree is that you want to generalise "optimise up front" into a rule, do or don't; I consider whether it's applicable in the circumstance. C programs tend to use a lot of system calls, and are also usually easily rapidly testable with large data. So rather than profile every individual std function I call, I'll just profile the very resource intensive paths with different scales of data and see if anything pops off. If R* had profiled their JSON parser with a 1gb file, they would've found this bug.
I don't disagree unilaterally with "optimise up front"; I disagree with unilateralism.
I mean, that's my point too. There's a camp of people who will say "don't prematurely optimize! profile and tune the hotspots later" as a blanket rule and I think that's dumb. And I thought you were espousing that.
The line between bug fixing and faffing around is context based, and there are efficient and inefficient ways to both fix bugs, and faff around. Profiling every stdlib function is probably both inefficient and faffing around unless your circumstances dictate it's a worthwhile and effective (there aren't better alternatives to reach the goal) effort.
There are many implementations of the C standard library and they may not have the same complexity. In this case the code has the right complexity (linear) but with respect to the wrong n (string length instead of parse length).
> It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.
Isn’t the point of standard library functions to define the contract and not implementation constraints? In other words, if algorithmic complexity is a significant concern, perhaps a standard library function is not a suitable choice.
This is a cool idea... I'm sure it's been done before automatically in code, but I'd love to see this in my editor. It sounds similar to the bundle-size extension that will annotate how big a package you're importing is in node.
> standard library functions to include algorithmic complexity as part of the standard documentation.
it would be acceptable to have a graph of the input size vs time, and this graph could be autogenerated using a testbed that is also used by unit testing! two birds one stone!
I don't think we could come up with a standard x axis bounds for such a graph, since n=1000, or 1,000,000 may not be zoomed out enough to showcase the behavior approaching infinity.
I don't get the heat of this topic. Yes, they wrote some very slow code because it's easy to shoot yourself in the foot with scanf. It's nothing new that most software could be heavily optimized just by benchmarking the slow parts. There is no reason for this shit storm other than to feel better than other developers.
The real problem is that they shipped a game with a loading screen that takes minutes and never looked into whether they could optimize it. THAT is the real shit show!
Exactly. The problem isn't the bug itself or the developers who introduced it. The problem is they simply didn't care enough about their billion dollar game to fix the problem over seven years after release until someone got mad enough to reverse engineer the game, figure out why it was so slow and fix it on their behalf.
People will always make mistakes but it's how they deal with them that matters. Gotta have enough pride in one's work to be bothered when it performs badly. Having a user fix their billion dollar project for them just because they couldn't be bothered is just embarrassing.
Yeah, but I think it puts to test the usual mantra of "write now, profile later, and optimize the bottlenecks". As reality repeatedly shows, developers often don't progress past the "write now" part, even if performance issues are crippling and annoying users to no end.
We can and should do better.
What this topic also is, is a reminder that with null-terminated strings and enough abstraction layers, you can easily make an O(n^2) or O(n^3) out of something you thought was O(n). I technically knew this (there was a popular article about it many years ago, I think by Joel Spolsky, but I can't find it now), but I didn't bother to look out for it in my current project. Thanks to this debacle, I'll do a performance review looking for "accidental quadratics", and I'm guessing many other developers will do it too. So it's a win!
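To make the pattern concrete, here is a minimal sketch in C of the shape I'll be grepping for (assuming a libc whose sscanf does a strlen-style pass over its input, as several apparently do; use() is a hypothetical stand-in for whatever consumes the value):

#include <stdio.h>

void use(double v);                      /* hypothetical consumer */

void parse_all(const char *big_buffer)   /* e.g. a multi-megabyte blob */
{
    const char *p = big_buffer;
    double value;
    int consumed;
    /* Each sscanf call may scan to the end of big_buffer just to find
       the '\0', so O(n) numbers * O(n) scans = O(n^2) overall. */
    while (sscanf(p, " %lf%n", &value, &consumed) == 1) {
        use(value);
        p += consumed;
    }
}

Each individual call looks perfectly linear; it's only the loop around it that turns the hidden length scan into a quadratic.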
Forget pride in your work, this is literally costing rockstar money. It may be hard to estimate, but I’m sure it was in the hundreds of dollars a day category at some point. Shaving milliseconds off of loading time has real effects in the web world and so at a previous job we made sure to evangelize that fact to management and did what we could to tie it to revenue as well as other KPI in order to get this kind of thing prioritized. I’m just surprised there isn’t a PM at rockstar pushing hard to reduce load times.
I think it's fairly notable that functionality that has been around for so long, and has been implemented so many times, is as poorly implemented as this.
Usually you can count on the C std lib to be very optimized. Many std functions like memcpy are even intrinsics in compilers, and that means they are literally faster than is possible to write in C, since someone has gone in and hand-optimized the assembly.
Thing is that they didn't ship it that way.
Back when it came out the loading screens were "fast". Things just grew out of proportion with the exponential increase of new items in the online mode.
And that's ok. But it seems no one in management approves benchmarking it now that loading is taking several minutes (!).
I am not blaming the developers, they have a lot to do every day. Maybe someone even wanted to try to fix it. But that it's still like this clearly shows that management doesn't care and that they are completely ok with a loading screen taking longer than brewing a new cup of coffee.
"it will open a 97 MB binary STL file in about 165 milliseconds flat, on a 2013 Macbook Pro. This is blinding fast."
This actually sounds incredibly slow, that's nearly 1/5th of an entire second. What can it possibly be doing? :)
In case anyone else was wondering, I followed the link and clicked the description and this is actually based on the time to the first frame being rendered - not just the time to load the file (which should be essentially infinitely fast). So it's actually more impressive than it sounds.
From my experience creating loaders for 3D formats (FBX, glTF, LWO) it's not loading the file that takes a long time, it's parsing the data in the file and converting it to a suitable format for rendering in OpenGL. In practice, most people use the terms "parsing" and "loading" interchangeably, or "loading" means "reading + parsing file".
There can be a lot of processing involved (looking at you FBX) or less (glTF, probably STL) but there's still going to be at least decompressing the binary data and copying it into buffers then uploading those to the GPU. So, without knowing how STL binary is specified, parsing 97mb in 165ms seems reasonable.
1. Indexing the mesh. STL files don't contain meshes, they instead have a triangle soup. Indexed meshes are more efficient for rendering, they save VRAM bandwidth and vertex shaders.
2. Computing normals. STL files have per-triangle normals (can be complete garbage because most software ignores them), for smooth surfaces you want per-vertex normals. Computing them well (like I did there https://github.com/Const-me/Vrmac#3d-gpu-abstraction-layer ) is slow and complicated.
Touché – after all, disk-to-RAM is hundreds of MB/s, and faster if it's cached!
In practice, I'm racing mesh loading against "how long does the OS take to give you an OpenGL context", which is rarely below 160 ms (longer if it has to switch from integrated to discrete GPU).
If you do anything with a file where you actually have to touch every byte (e.g. parsing), it is pretty impressive to get anything to go faster than 100 MB/s. 500 MB/s is blindingly fast, IMHO.
Yeah, parsing text with a state machine is slow. Parsing, say, HTTP at that speed would be impressive without SIMD. But this is a binary file full of fixed sized structures, hence my confusion.
Anyway, the answer is there, it's actually measuring the performance of sending the mesh to openGL, to the GPU, and getting a frame rendered.
When there is nothing to do. With an array of fixed-size structures, all you need to know is how many there are, and then you can increment a pointer past any number of those objects; that pointer increment itself can probably be compiled away to nothing.
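Binary STL is a good illustration. As I understand the format (not quoting a spec here), it's an 80-byte header, a uint32 triangle count, then one 50-byte record per triangle, so "loading" really is just pointer arithmetic:

#include <stdint.h>

#pragma pack(push, 1)            /* compiler-specific, but keeps sizeof at 50 */
typedef struct {
    float    normal[3];
    float    vertex[3][3];
    uint16_t attribute_byte_count;
} stl_triangle;
#pragma pack(pop)

/* Given the loaded/mapped file contents in `data`, triangle i is just
   (const stl_triangle *)(data + 80 + 4) + i  -- no per-byte parsing. */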
It depends on exactly what you're measuring, you see. If you are loading data in order to do something, then the "do something" part will incur 100% of all the costs in that case. So when you subtract the "do something" part to just be left with the "load it" part, you can end up with a meaningless zero-to-do/doing-it-infinitely-fast kind of a result.
So then, what would you measure? Just peak RAM throughput? You can get that from the data-sheets :)
I used to work for people that processed emails and loaded them into databases with perl scripts. One day someone asked me if I could help, because the script they were running on a batch of emails was inexplicably freezing or running out of memory, I forget the exact details.
There were maybe a few thousand or tens of thousands of emails, and so, I came to look at the issue with my usual attitude which is that if it isn't running instantly on a GHz computer of any kind, something must be horrifically wrong. Not to mention running out of memory.
It turned out that essentially every email was cc'ed to several thousand people; you know, the thread went to a whole company or something. The file size itself wasn't huge in the scheme of things, but our perl script (written by someone much more elevated than me, with a master's degree) read the whole file at once into memory, and expanded the headers to perl hashes, multiplying the size.
But there was no reason the whole thing had to be in memory. People learn to program in CS 101 these days as if memory is infinite, I guess, because gigabytes might as well be infinity; but if you multiply a certain moderate overhead by tens of thousands, and that by tens of thousands again, suddenly all your gigabytes are gone.
Another thing I remember, was when I was given a T-SQL report that typically ran on a few thousand documents, and was asked to run it on all of the millions on all our databases. It was hundreds or thousands of times too slow, but it was running a SQL statement in a procedural loop per document and it could be turned into a single statement.
So my experience has basically taught me that if something is within a factor of two of optimal, it's good enough, but an incredible amount of code, regardless of high or low level, is way worse than that.
But I've never gotten much pay or glory for fixing this sort of thing. Sometimes I daydream about being a high paid consultant who goes and turns three nested loops into two, or two into one.
There is a whole lot of low hanging fruit in the world. When I am new at a job, if I don't find several improvements of an order of magnitude or two, I am impressed.
You can fault the docs but what's the problem with the API? Why should you be surprised that accessing a collection with 'nil' index is a runtime error? What else could it be?
A simple fix in the doc seems to solve the confusion:
"Returns an index that is the specified distance from the given index, unless that distance is beyond a given limiting index [in which case it returns nil]".
It does say "returns an index ... unless ..". So yeah, a bit too terse. But what is the issue with the API?
// /!\ Warning! Do not use in production! /!\
let s = "Swift"
if let i = s.index(s.startIndex, offsetBy: 5, limitedBy: s.endIndex) {
print(s[i])
}
"What does it print? Nothing, you hope?"
Seriously? 'print(s[nil])' should print nothing? How about 'let c = s[nil]'. Should that just silently pass? That is the runtime error, btw. It won't even get to print(). (Valid to question the entirely non-informative error, however.)
If s.index() returned nil, the "if" would test false and the s[i] would not be reached.
The problem is that it returns non-nil, but the _limit_ is broken in this case: using s.endIndex as a limit means you can get non-nil but bogus indices returned. And yes, this is the fault of the docs for using a broken last arg to the API, but there's no really clean way to use this API as designed, afaict. At least not if you want to limit to "end of string" as opposed to "some index into the string that I already know to be valid".
The index is not bogus, the API is working as designed. The example provided shows the use of the index, which I understand can be confusing because the index returned may not always be valid for this, but the index is decidedly valid. FWIW, since String is a BidirectionalCollection, this code works for what you are probably trying to do:
let s = "Swift"
if !s.isEmpty,
let index = s.index(s.startIndex, offsetBy: 5, limitedBy: s.index(before: s.endIndex)) {
print(s[index])
}
I am sure the equivalent code in other languages, barring Python, is going to be similarly verbose.
The problem is that `s.index` with an `offset` equal to `limitedBy` returns non-nil index, rather than nil, but that index is invalid (out of bounds) and causes the program to blow up...
let s = "Swift"
print(s.index(s.startIndex, offsetBy: 4, limitedBy: s.endIndex))
print(s.index(s.startIndex, offsetBy: 5, limitedBy: s.endIndex))
print(s.index(s.startIndex, offsetBy: 6, limitedBy: s.endIndex))
outputs:
Optional(Swift.String.Index(_rawBits: 262401))
Optional(Swift.String.Index(_rawBits: 327681)) <- this is unexpected
nil
For me the moral of the story is: do not use (whatever)scanf() for anything other than toy programs. In most cases implementing your own tokenizer (for both of these cases of reading numbers, that means str(c)spn() to get the length of the candidate token and then strto-something()) is significantly easier than reasoning about what scanf() really does (even ignoring accidentally quadratic implementation details) and whether that can be adapted to your use case.
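For illustration, a minimal sketch of that tokenize-then-convert approach (next_double is my own hypothetical helper; it ignores overflow/ERANGE handling, and strtod has its own locale caveats):

#include <stdlib.h>
#include <string.h>

/* Pull the next whitespace/comma-separated number out of *p, advancing *p.
   Returns 1 on success, 0 at end of input or on a malformed token. */
static int next_double(const char **p, double *out)
{
    const char *s = *p + strspn(*p, " \t\r\n,");   /* skip separators          */
    size_t len = strcspn(s, " \t\r\n,");           /* candidate token length   */
    if (len == 0)
        return 0;
    char *end;
    double v = strtod(s, &end);
    if (end != s + len)                            /* token wasn't all numeric */
        return 0;
    *p = end;
    *out = v;
    return 1;
}

Note that strspn/strcspn only scan up to the next separator, so calling this in a loop over a large buffer stays linear overall.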
Yes. But any data format you read, particularly any plaintext data format you read, is essentially interpreting or compiling a DSL. On a typical job, people are writing compilers much more often than they think!
It is a general term for the process of breaking a string into "tokens" which have a sort of meaning. Definitely a common task in compilers, but not limited to it.
I would argue the reverse - there is a higher chance of these accidentally quadratic problems with more abstraction layers, convenience functions, and advanced language syntax. But I agree we shouldn't write parsing in C, but for other reasons :)
I say there's an equal chance, and it's equally easy to fix, but a high level language provides fewer distractions and temptations to work on small optimizations which don't matter. Getting in the weeds with the details is how you lose sight of the big picture. And allocate your time poorly.
My biggest issue with C strings is that by requiring a zero terminator and a single char const *, it forces a lot of copies (when truncating) and calls to strlen (the caller doesn't know the length).
Had it been a char const */size_t pair or a pair of char const * for first/last it would be both safer and faster. I prefer the latter as a pair of pointers doesn't require updating both when iterating with the first ptr.
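Something like this sketch, just to show the shape I mean (my own naming, nothing standard):

#include <stddef.h>

/* Non-owning view: a pair of pointers [first, last), no terminator needed. */
struct strview {
    const char *first;
    const char *last;
};

static size_t sv_len(struct strview v) { return (size_t)(v.last - v.first); }

/* "Truncate" from the front without a copy: only first moves, last stays. */
static struct strview sv_drop(struct strview v, size_t n)
{
    if (n > sv_len(v)) n = sv_len(v);
    v.first += n;
    return v;
}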
Wouldn't this be an argument to go in the opposite direction? If you are using high level functionality that you don't know the implementation details of, you are running the risk of unintended consequences.
I am a C programmer who has implemented string to number parsing for this very reason. I know exactly what it does and how fast it is.
If you do use code you didn't write, the chance of a standard library being poorly implemented is probably lower than for most other libraries, so picking a non-standard lib as a guarantee against bad performance seems misguided.
I think it goes both ways in that you either go full low level and write yourself everything (for questionable benefits), or you use a (possibly higher level) language with sane standard library, but the important thing is the quality of said library.
I find that writing everything yourself, especially simple things like a text-to-integer parser, is very valuable because it takes very little time and it levels up your understanding of the system. I'm starting to believe that you rarely understand something until you have implemented it. Therefore implementation is the best way to learn.
I’m totally with you here, it is a really invaluable learning tool. But similarly to science with “standing on the shoulders of giants”, we would not be anywhere if everyone started everything from scratch. Like, it’s okayish to reimplement a vector or something, but even a sort gets harder (especially if you want one that is performant both on a few-element list and on longer ones). And algorithms are just one thing: will you also dive into the totally foreign world of e.g. audio processing?
Basically the only silver bullet for productivity increase is shared code. We just have to place higher emphasis on software correctness/quality instead of the functionality churn in certain parts of the software world.
I've taken it a step further (or three). I'm building a whole computer on the principle[1] (one possible way of looking at it) of avoiding parsing as far as possible.
Mu takes the dwm[2] principle of avoid-config-files/modify-sources-to-reconfigure to the limit. The idea is that there are at any given time only 3 languages in the computer:
1. A self-hosted notation for a subset of 32-bit x86 machine code
2. A memory-safe statement-oriented language where most statements map 1:1 to machine-code instructions. Written in level 1 above.
3. A high-level interpreted language.
The vision is to fix 1 and 2 but allow 3 to fork wantonly. If you want a Lisp for your HLL, make a fork. If you want a Python-like syntax, make a fork. But, and this is the important part, in any given computer/repo there is only ever one language. Only one non-trivial parser. (Levels 1 and 2 have extremely uniform syntax.)
As best I can tell, the #1 way to avoid the need to run a fuzzer is to avoid writing parsers. Just say no.
Hi, levels 1 and 2 look really cool, but I may not understand the point of only having these languages “running” on a computer? Both 2 and 3 are interpreted and the interpreter is the area you want to minimize?
What about a program written in 3 that can compile to either 1 or 2? Why would it hurt anything to have a different language somehow made possible to run here?
I'm not sure I follow your question, but the idea is that the computer the end user receives has a single opinionated language for programming it. Kinda like Basic for microcomputers. The end user is of course welcome to build their own languages. That is encouraged! But multiple languages make the computer more difficult for others to comprehend. My goal is to convince you that, all things being equal, a computer with fewer languages is in your best interest. Less code, fewer bugs, fewer security holes.
(Level 2 is not interpreted, it's compiled. Skipping level 2 would be totally in line with my principle above. But it would be more painful, I think. You'd basically be programming in machine code.)
My question is regarding why a single-language codebase would be easier to comprehend and have fewer bugs and security holes. In terms of a single program it makes sense, but I seldom read the source code of a library I depend on, for example - if it has a good public API, it could be written in anything for all I care.
Not trying to dismiss the idea at all, just I don’t yet see “the light”, so to say.
Yeah, these are good questions and to be fair your questions are shared by many.
My basic worldview is that we need to have 100-1000x more people reading and auditing open source. The original value of open source was in the ability of people to read the source. If we don't use that ability then we don't really get the benefit.
The world today focuses on APIs and ignores implementation. I would argue that's the biggest source of problems today, with security holes, data breaches, user-hostile UX and so on.
If you accept that being able to read sources matters, hopefully it makes sense why reducing the number of languages matters. Every new language you add is more code to read, another language to learn and become proficient in, a new source of gotchas and subtleties to spend 10 years learning. Another set of tools, another set of moving parts that might not build on your computer because of some subtle version mismatch.
It's a hard problem. So let's make it easier for ourselves by relying on fewer languages, and being more thoughtful about the dependencies we introduce into our projects.
Thanks for the answer! I totally agree on the not enough people read source code part — unfortunately I believe it is not only a language “barrier” thing. I mean, even in a language I know by heart, I probably could not make sense of some complex part of the linux kernel, because I lack both the underlying technical knowledge of some hardware interface and the context of the code. And especially the latter cannot be overcome with code alone, even with good comments. It needs good documentation, which should give a basic understanding, and on top of that we can build the code for the fine detail. Of course it is a noble goal to try to decrease the surface area of the required knowledge, so props to you!
What’s your proposed solution to the problem with low and high level languages? Is the level 3 language a higher level one? Because I’m not sure there could exist one language to rule them all, because of the inherent difference between the two domains.
Yeah, level 3 will be a HLL. It just doesn't matter too much which one it is, or that it "rules them all". A single reasonably high-level language X is in practice superior to a basket of high-level languages, even if some of the languages in the basket are individually higher-level than X.
You're absolutely right that languages are only part of the problem. Beyond the language choice, Mu provides guardrails to help you pick up the underlying technical knowledge by just trying things and seeing informative error messages (often failing tests) in response. That's the hope, anyway.
Right now the first HLL is still in progress. I spent some time with a postfix-based language before changing my mind. Now I'm working on a Lisp-based HLL. So I'm not dogmatic about what the HLL should be, and in time there will probably be multiple options in separate forks/repos.
Wrong conclusion. The problem is the broken C string library in POSIX. Don't use it! Wrong design. Zero-termination is too fragile and the cause of the evil.
Rather use buffers with known lengths, and for strings you need to know the string (=unicode) rules. Nobody does that. Know libunistring. I have my own, because libunistring is too slow, but know it.
For my string libraries I rather follow the STL, with ranges/views and boehmgc core. const vs dynamic strings. So I will not step into the accidental strlen and buffer-overflow trap.
E.g. For input buffers know if they are zero-terminated and const. With the GTA post I pointed out the libfuzzer design flaw, giving you an ASCII input buffer which is not zero-terminated. Even strtol/strtod cannot be used then. You need to copy the buffer, terminate it, and then you can use the broken string libc. Not talking about sscanf, which I usually use only as sscanf_s if available. Or _snscanf/_snscanf_s. Microsoft does many things wrong, but its libc is far superior to glibc, bsd or musl. musl is better than glibc, but also lacks in this regard.
Don't roll your own parser? How the hell would you get anything done? Unless you don't count regular expressions or something, I can't imagine somehow avoiding problems requiring parsers, especially on any unix-based system.
There are a lot of tasks that only need to work with existing, commonplace file formats with existing high-quality parsers (e.g., JSON, XML, sqlite, ...).
SQLite I can grant you, but I'm starting to get a bit worried about certain XML parser popular in C/C++ land. Might do a swing at it with a profiler later today.
This is a sign that writing (trivial) parsers is a core competency for you. However, that doesn't mean that writing all parsers is your core competency. Especially not for an industry standard like JSON that should have plenty of libraries that take care of the problem.
Writing parsers is the same level of complexity as interview FAANG level questions. Any decent developer should be able to write a parser from scratch.
Any decent developers can write a parser. Most can't (or at least don't have enough time before they need to get back to whatever their main job is) to write a fast, secure and robust one for a complex grammar.
I didn’t follow the original story or comments about GTA, but based on the description in this article, I wouldn’t be surprised that this sort of problem could happen to any coder of any experience level and I wouldn’t give them any grief, but I would be surprised that the problem would be live in production for a very long time without ever having been profiled. Surely seeing JSON parsing taking more than 70% of the time would have made it onto someone’s radar?
I would bet that a lot of games receive a lot less development effort after release. Most of the devs probably got moved on to something else (sequel, or maybe a completely different title).
GTA Online is a live game that pulls in over half a billion USD per year in revenue. They release new content on a regular basis. It's actively developed, they just don't care (my guess would be that defects were filed against this on a regular basis and fixing it was never prioritized)
The problem was reported in the comp.lang.c newsgroup on Usenet in 2002. This very discussion mentions the GNU C library bug report that has been around since 2014. It was reported in RapidYAML in 2020. This problem has been noticed in production, several times over, and yet still lives to this day.
You'd be surprised. In the companies I worked for so far, it's usually my radar that bleeped over ridiculously inefficient code that was present for years in the codebase. That is, some developers would be aware something is off with performance, but they didn't bother to do anything about it. Hell, sometimes even management would know users complain about performance, but the feedback didn't percolate down to developers.
Sure, I get priorities and "good enough". But really, about half of the order-of-magnitude wins in performance I achieved were on things you could find and fix in few hours if you bothered to look. The other half tends to be unfortunate architectural decisions that may take a few days to fix, so I get why those don't get done.
The premise of the post is wrong, from what I read here and on reddit the people were rightfully complaining about the lack of reaction of Rockstar, not the initial bug that could happen to most people.
It is a good opportunity to mention that you should not use strtod/strtol either if you can help it, since they are impacted by locale. Exactly what to use instead is a bit of a tough nut to crack; you could extract musl’s floatscan code, or implement the Clinger algorithm yourself. Or, of course, use programming languages that have a more reasonable option in the standard library...
I see you are reiterating this point raised in the previous discussion several days ago, but I don't think it is particularly well grounded.
ISO C allows strtod and strtol to accept, other than in the "C" locale, additional "subject sequence forms".
This does not affect programming language implementations which extract specific token patterns from an input stream, which either are, or are transformed into the portable forms supported by these functions.
What the requirement means is that the functions cannot be relied on to reject inputs that are outside of their description. Those inputs could accidentally match some locale-dependent representations.
You must do your own rejecting.
So for instance, if an integer token is a sequence of ASCII digits with an optional + or - sign, ensured by your lexical analyzer's regex, you can process that with strtol without worry about locale-dependent behavior.
Basically, rely on the functions only for conversion, and feed them only the portable inputs.
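For example, a sketch of that "reject first, convert second" discipline (parse_int_token is my own hypothetical helper; overflow/ERANGE handling is omitted):

#include <ctype.h>
#include <stdlib.h>

/* Accept only an optional sign followed by ASCII digits -- the portable
   subject sequence -- so locale-specific extensions can never be hit. */
static int parse_int_token(const char *tok, long *out)
{
    const char *p = tok;
    if (*p == '+' || *p == '-')
        p++;
    if (!isdigit((unsigned char)*p))
        return 0;                        /* reject: not a digit */
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != '\0')
        return 0;                        /* reject: trailing junk */
    *out = strtol(tok, NULL, 10);        /* strtol does conversion only */
    return 1;
}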
I can't really understand what you mean. You can validate yourself that parsing with strtod will break if your system's locale is set to a locale where the decimal separator is a comma ',' instead of a period '.' - as an example, most European locales. Whether or not strtod will try to magically fall back to "C" locale behavior is irrelevant because it is ambiguous. For example, what do you do if you are in Germany and you try to parse 100.001? Is it 100001?
strtod also doesn't guarantee round-trip accuracy that you can achieve when you use Steele & White and Clinger. All in all, I really think it is just not a good idea to use the C standard library for string operations.
Sorry about that; I see the text now in the description of strtod where it is required that the datum is interpreted like a floating-point constant in C, except that instead of the period character, the decimal point character is recognized.
Yikes!
There is a way to fix that, other than popping into the "C" locale, which is to look up that decimal character in the current locale, and substitute it into the string that is being fed to strtod. That's a fairly unsavory piece of code to have to write.
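Something along these lines, for the record (a sketch: assumes the token fits a small buffer and that the locale's decimal point is a single character; strtod_dotted is my own name):

#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Parse a '.'-formatted number regardless of the current locale by
   rewriting '.' to whatever localeconv() says the decimal point is. */
static double strtod_dotted(const char *s)
{
    char buf[64];                             /* one number token, sketch only */
    size_t n = strlen(s);
    if (n >= sizeof buf)
        n = sizeof buf - 1;
    memcpy(buf, s, n);
    buf[n] = '\0';
    char dp = *localeconv()->decimal_point;   /* e.g. ',' under de_DE */
    for (char *p = buf; *p; p++)
        if (*p == '.')
            *p = dp;
    return strtod(buf, NULL);
}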
What locale issues did you find for strol{l,ul,ull}?
> I really think it is just not a good idea to use the C standard library for string operations.
I really despise locale-dependent behavior also, and try to avoid it.
If you control the entire program, you can be sure nothing calls setlocale, but if you are writing middleware code, you can't assume anything.
Also in POSIX environments, the utilities react to localization environment variables; they do call setlocale. And so much stuff depends on it, like for instance operators in regex! [A-Z] does not refer to 26 letters; what [A-Z] denotes in a POSIX regex depends on the collation order of the character set.
There are functions in the C library without locale-specific behaviors, though, like strchr, strpbrk, strcpy, and their wide character counterparts.
> There are functions in the C library without locale-specific behaviors, though, like strchr, strpbrk, strcpy, and their wide character counterparts.
Obviously these string functions are totally fine if you use them correctly, and their implementations should be fairly optimal. However, I do think they put a lot of onus on the programmer to be very careful.
For example, using strncpy to avoid buffer overflows is an obvious trap, since it doesn’t terminate a string if it overflows... strlcpy exists to help deal with the null termination issue, but it still has the failure mode of truncating on overflow, which can obviously lead to security issues if one is not careful. strcpy_s exists in C11 and Microsoft CRT, though I believe Microsoft’s “secure” functions work differently from C11’s. These functions are a bit better because they fail explicitly on truncation and clobber the destination.
OpenBSD arguably has one of the best security track records of all C projects and I still feel wary about their preferred mechanism for string copying and concatenation (strlcpy and strlcat). I feel strlcpy and strlcat are both prone to errors if the programmer is not careful to avoid security and correctness issues caused by truncation, and strlcat has efficiency issues in many non-trivial use cases.
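To make the truncation point concrete, a small sketch (strlcpy lives in <string.h> on the BSDs and in libbsd elsewhere, not in ISO C; copy_path and its caller are hypothetical):

#include <string.h>

/* strlcpy always NUL-terminates, but silently truncates; the caller has
   to remember to check the return value or a shortened name flows on. */
static int copy_path(char *dst, size_t dstsize, const char *src)
{
    size_t needed = strlcpy(dst, src, dstsize);
    if (needed >= dstsize)
        return -1;        /* would have truncated: treat as an error */
    return 0;
}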
While there are obvious cases where dynamically allocated strings of arbitrary length, such as those seen in C++, Rust, Go, etc. can lead to security issues, especially DoS issues, I still feel they are a good foundation to build on because they are less prone to correctness issues that can lead to more serious problems. Whether you are in C or not, you will always need to set limits on inputs to avoid DoS issues (even unintentional ones) so I feel less concerned about the problems that come with strings that grow dynamically.
One of the biggest complaints about prefix-sized strings/Pascal style strings is that you can’t point into the string to get a suffix of the original string. However, in modern programming languages this is alleviated by making not only dynamic strings a primitive, but also string slices. (Even modern C++, with its string_view class.) String slices are even more powerful, since they can specify any range in a string, not just suffixes.
So really locale and strtod are just little microcosms of why I am wary of C string handling. Clearly you can write code using C string functions that is efficient, secure and correct. However, I feel like there are plenty of pitfalls for all three that even experienced programmers have trouble avoiding sometimes. I don’t actually know of a case where locale can break strtol, but it doesn’t matter too much, since anyone can write a decent strtol implementation (as long as they test the edge cases carefully...) strtod though, is not so easy, and I guess that means apps are best off avoiding locales other than C. In a library though, there’s not much you can do about it. At least not without causing thread safety issues :)
In other languages, aside from dynamic strings and string slices, locale-independent string functions are also typically the default. Rust’s f64::from_str, Go’s strconv.ParseFloat, C++’s std::from_chars and so forth. It’s not too surprising since a lot of the decisions made in these languages were specifically made from trying to improve on C pitfalls. I do wish C itself would also consider at least adding some locale-independent string functions for things like strtod in a future standard...
This just makes me think that null-terminated strings are the bad gift that keeps on giving. If we were to design an OS, language, or standard library in 2021 (or even 1999) we probably wouldn't use them, but we're stuck with this relic of a former era.
The thing is, they are even worse for performance than string implementations that store the length: those extra few bytes of memory are much cheaper than having to scan for the size of a string everywhere. For example, copying a string with known length.
Also, C++’s strings even do some clever hacking where, for shorter strings, they store the text itself in the space normally used for the pointer, avoiding a pointer lookup. And this is possible only because of the abstraction.
They were designed when an extra byte or so per string cost you a lot of money. Nowadays, when 99% of the systems anyone will program start at 1MB RAM and 90% probably start at 512MB, they're a liability for almost no benefit.
You’ve got an extra byte either way, the \0 at the end. Which in many cases will make you copy a string, because you can’t just “point” into a string literal and say take n chars from there. Of course I am not that old so I don’t have enough expertise - but seeing that every other language even at the time decided against it is pretty telling.
I think your parent was referring to the cost of storing a 2-byte string length instead of a 1-byte terminator. In the 1970s and 1980s, 2 bytes would likely be the minimum storage needed for the length of a general purpose string implementation. Although there were some language environments (e.g. Pascal) that had counted strings with a max length of 255.
Fair enough; but actually it can be more memory efficient as well because of the better reusability of substrings (in case of null-terminated ones only the end can be reused)
Ok, let’s assume that a 10 MB json source was loaded into a non-null-terminated opaque struct str_t {size_t; pchar;}. You have to parse a number from a position `i’ and you have (double parse_number(str_t)). Next obvious step?
I think your code would be pretty much the same, sscanf, strlen and all. The main differences would be the standard library's implementations of strlen and whatever function you use to read the file into a string in the first place.
With opaque str_t you can’t just json[offset]. Should sscanf take offset with every string (sscanf(fmt, s, off))? Should we copy a slice of json and parse it? Should str_t have zerocopy mirroring ability (s2 = strmirror(s, off, len))? How many of these three are just a snakeoil that changes nothing?
It’s only pretty much the same until you try to write actual code with a new idea in mind.
You can offset your str_t by creating a new str_t that subtracts offs from the length and adds offs to the pchar. There is no need to keep track of the offset separately.
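A sketch of that, sticking with the str_t shape from the comment above (str_drop is my name for it):

#include <stddef.h>

typedef struct { size_t len; const char *ptr; } str_t;

/* Zero-copy "substring from offset": shrink len, advance ptr. */
static str_t str_drop(str_t s, size_t off)
{
    if (off > s.len)
        off = s.len;
    return (str_t){ s.len - off, s.ptr + off };
}

/* parse_number(str_drop(json, i)) then only ever sees the remaining bytes,
   so nothing downstream can accidentally walk the whole buffer. */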
Rust's &str type, or the non-unicode-assuming version &[u8], allow creating (sub-)slices, which probably matches your strmirror function. Except that the same syntax works for all (dynamic/static) length arrays, and even allows custom code to e.g. transparently wrap an SoA[0] representation.
Okay, I see what you're saying now. I haven't worked with C strings in a while. Python uses offset parameters or seek operations in various places, and C++ string streams have an inherent position too (C++ probably has a number of other ways to do it too...).
Yes, I’m aware of it. I’m just tired of these layman’s “oh that’s another reason to ditch C strings”, when it has nothing to do with it. Working with offsets requires handling offsets and lengths, be it explicit ‘off’ and ‘n’ or a string_view. All that is needed in this case in C is snscanf (note the ‘n’), so that it would know its limits a priori, like snprintf does. Sadly that ‘n’ never made it into the standard.
Years ago I was working on a compiler frontend that was capable of reading multiple files into one program representation, as opposed to compilers that compile each input file completely separately. At some point we were trying to compile some project made up of many small files, and our frontend got very slow. However, when we concatenated all those files into one big source file, everything went smoothly.
I investigated. It turned out that we were keeping some sort of global symbol table structure. It was updated after parsing each file, something like this (the original was C++, but this is pseudocode because I can't be bothered):
list<symbol> global_symbol_table;
void update_symbol_table(list<symbol> new_symbols) {
global_symbol_table.add_all(new_symbols);
// Keep sorted so we can do efficient lookups with binary search.
sort(global_symbol_table);
}
For N files, this meant N calls to sort() on a list that grew linearly in N, so having something like O(N^2 log N) complexity overall.
This has a few problems:
1. Even if you want to use a "sorted list" representation, it would suffice to sort the new symbols only (the global table is always sorted by its invariant) and do a merge of the sorted list.
2. But really, you want a set data structure that does the same thing better.
3. But also, could we maybe speed up the lookups in some other way? I looked around for the uses of this global symbol table and found... none. We were keeping data around and updating it in the least efficient manner imaginable, without ever using that data.
I deleted the above function and the global symbol table, and performance was back to expected levels.
Still, folks in the comments section generally agreed: they wouldn't write anything that silly.
Well, if you've never accidentally crashed a system running an unexpectedly (and unnecessarily) "non-performant" piece of code then you're either an absolute genius of a coder, or relatively inexperienced.
I don't think it's a problem that they wrote anything "that silly" (okay - maybe that list/hashmap construct was pretty stupid to write originally). Instead, I think it is that they were satisfied with 6 minute load times for years. They should have been profiling this and tracking it themselves prior to launch, getting customer feedback, etc. Someone should have realized this was a problem and then developers on their team should have taken out the profilers and figured out what was up and how to make it come down.
So many times, in higher level code, it's seeing a foreach loop in a foreach loop, and the nested loop is calling an API or re-running the same database call 5000 times.
Move things around or just use a cache... and instant 1000%+ speedup.
I've seen this too many times to count, often in apps and on sites that are in fairly heavy use.
The answer is often to scale up and pay for 10x more server capacity to handle the load, rather than spend some time optimizing the slow code paths.
A few years back, I was doing some geographic calculations. Basically building a box of lat/lngs and getting all the points within that box.
It was slow. Weirdly slow. I made sure the lat and lng of the records were in the index, but it was still slow.
More testing revealed that the way I was passing the lat/lng into the query was causing those values to be converted to strings, which were converted back to numbers, but caused the index to not be used. This meant the database had to do a lot more work.
Converting the parameters to numbers made sure the index could be used, which led to a nice speed up.
At least that was an accidental conversion and an understandable mistake, I've seen the same with dates stored as strings from devs unaware that dates are just integers with a lot of maths to make them meaningful.
At one place our end-of-month billing was getting slower and slower over the course of months, from one hour out to about twelve, and it had to be babysat and run in batches in order to not bring the whole database to its knees. We couldn't change the mess of legacy classic asp of course, but adding a couple of actual date columns calculated from the string fields on insert/update brought the whole process down to seconds.
Whoa, that's an awesome story. Nuts that the slowdown happened over such a short period, must have been a fair amount of data running through that system. Lots of table scans <shiver>.
In a project I worked on few years ago, we had persistent performance problems, present for a long time before I showed up. One of the first things I did when I joined the team was do some unsolicited statistical profiling and narrow it down to database interactions. Since we were using a custom-written connector to a graph database, we assumed it's the connector, or the JSON-based API it's using.
The issue got on the back burner, but it bugged me for a long time. I was later writing some performance-intensive number crunching code, and ended up writing a hacky tracing profiler[0] to aid my work. Having that available, I took another swing at our database connector issue...
And it turned out the connector was perfectly fine, its overhead was lower than the time the DB typically spent filtering data. The problem was, where I expected a simple operation in our application to do a few DB queries, it did several hundreds of them! Something that, for some reason, wasn't obvious on the statistical profiler, but it stood out on the full trace like a sore thumb.
I cut the run time of most user-facing operations by half by doing a simple tweak to the data connector - it turns out even a tiny unnecessary inefficiency becomes important when it's run a thousand times in a row. But ultimately, we were doing 100x as many queries as we should have, nobody noticed, and fixing it would be a huge architecture rework. We put it on the roadmap, but then the company blew up for unrelated reasons.
I sometimes think that maybe if we tracked and fixed this problem early in the development, our sales would have been better, and the company would have been still alive. For sure, we'd be able to iterate faster.
I maintain my original position that sscanf calculating the entire length of its input is absolutely ridiculous. Are *scanf difficult to use safely, not very robust, and somewhat baroque? Yes. Should sscanf("%f") be a correct (not performance-killing) way of reading floats? Also yes. (Though aside: the OP seems to be reading data from files, so they could have just used fscanf, which has correct performance already.)
Unfortunately, many libcs are guilty of this:
- glibc uses memchr (the trail is convoluted, but ends up at _IO_str_init_static_internal)
- freebsd libc (and thus also the apple and android libcs, as well as those of the other BSDs) use strlen
- uclibc and newlib are the same as freebsd (appear to be copied directly from it)
- Since the original bug was in GTA, which only runs on windows, I must presume msvcrt has the same problem
- musl has the correct behaviour, processing input in 128-byte chunks
- managarm doesn’t strlen but looks broken for unrelated reasons. (Assumes nul byte means eof.) Also has codebloat because of templates.
- serenityos tries to implement fscanf in terms of sscanf, not the other way around! Unfortunately that means it chomps a whole line of input at every call, so it doesn’t even work correctly. Horrifying.
- pdclib has ok performance, but with an interesting implementation: it duplicates code between sscanf and fscanf, though the heavyweight format parsing is shared.
- dietlibc and sortix have the sensible, simple implementation
Reading this article was a surprise for me, I didn't know of this issue at all.
But this is pretty ridiculous. If it's possible to write scanf, which matches chars from a stream, why can't sscanf just do the exact same thing but check for '\0' rather than EOF...
It can, and the people who only check a few well-known open source C library implementations miss that there is quite a range of other C library implementations out there that do this very thing, from P.J. Plauger's through OpenWatcom's and Tru64 Unix's to mine. (-:
I don't know what you mean by that. I pointed out two libcs that do exactly that (that was what I meant by ‘the sensible, simple implementation’; perhaps that wasn't clear enough?) as well as multiple other approaches that also result in correct performance. And the managarm and sortix libcs (for instance) are hardly well known.
also a trivial workaround is to wrap the buffer with a FILE via fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.
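Roughly like this (a sketch; fmemopen is POSIX.1-2008, and the cast is needed because it takes a non-const buffer even in "r" mode):

#include <stdio.h>
#include <string.h>

static void parse_all_floats(const char *buf)
{
    /* one strlen for the whole parse, instead of one per sscanf call */
    FILE *f = fmemopen((void *)buf, strlen(buf), "r");
    if (!f)
        return;
    double v;
    while (fscanf(f, " %lf", &v) == 1) {
        /* ... use v ... */
    }
    fclose(f);
}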
many years ago i slaved away in a low end (5 people) ecommerce web shop with a homebrew php cms. i hated working there, but was too vain to quit. one day our biggest customer by far complained about slow loading speeds and i was set upon the task. it was spaghetti code of the worst sort, but i managed to find the culprit quickly: converting a list of rows (for pages) from the database into the hierarchical menu tree structure; of course it was O(n²). i fixed it and the page generation time went down from 7 seconds to a few milliseconds again.
they didn't let me push the fix back into the main repo because "it only affects this one customer, so we'll only fix it here". luckily i was fired shortly after. fin.
By the way, does anyone know whether red dead redemption online has the same problem? Maybe the issue was fixed for the newer game but they decided not to update gta repo?
C string processing is the other "billion dollar mistake" we are all paying for decades later (in terms of slow programs, time wasted, power used etc).
Given everyone's interest in the topic, can I also share something I wrote under "accidentally quadratic"? I think people might enjoy reading it: https://news.ycombinator.com/item?id=26337913
It turns out that multi-pass algorithms in general can be quite susceptible to this issue, and it can be far from obvious in general.
Could the sscanf bug also be a security issue? Most C strings are null terminated, but I could imagine using sscanf to read outside of bounds due to the null-seeking behavior on a non-null terminated array.
If it is an issue, I think it probably can't be used for more than a DoS or a timing attack. That said, after Meltdown & co., anything is possible.
I've always hated how that's used as an example of recursive programming in early programming education. Invariably, students try a large n, get a stack overflow and conclude that recursion is inefficient, leads to mysterious errors, and must be avoided.
And now your memory usage will grow eye-wateringly large. Instead, convert the algorithm to be iterative or at least tail-recursive and it will be faster than both the naive and memoized versions!
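For the Fibonacci example specifically, the iterative version is just the (a, b) -> (b, a + b) update in a loop. A minimal C sketch, with the caveat that uint64_t overflows past fib(93):

  #include <stdint.h>

  /* Constant stack, no memo table. */
  static uint64_t fib_iter(unsigned n) {
      uint64_t a = 0, b = 1;        /* fib(0), fib(1) */
      while (n--) {
          uint64_t next = a + b;
          a = b;
          b = next;
      }
      return a;
  }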
The "closed-form solution" is slower than standard method. It just uses arbitrary-precision fractional arithmetic (square root, exponentiation, division) instead of arbitrary-precision integer arithmetic (exponentiation of a 2x2 integer matrix).
Not exactly. This is more an artefact of language design.
If you convert everything to continuation passing style, then every function call, tail call, recursion, etc. is just as expensive (and expressive) as a GOTO. This is, incidentally, the main "trick" or lightbulb moment in the classic Cheney on the MTA paper by Henry Baker [1].
Now if we're talking specifically about C, then absolutely! But while this intuition holds for C, it's not always true and doesn't have to be.
I'm thinking of the memory model for each - one mutates some variables (at least some sort of index, unless you're doing an infinite loop) in a single frame, while the other accumulates function calls and stack frames until you bottom out, then return values are "folded" back up.
Semantically, from a caller's perspective, they can achieve the exact same thing, but aside from that they seem radically different to me - is there anything else that could lead us to say that recursion is a form of looping?
> In computer science, a loop is a programming structure that repeats a sequence of instructions until a specific condition is met.
That's the general definition I've always been most aware of, at least. I don't want to claim it is the most common one, since I don't really have numbers, and who is the authority on comp-sci definitions anyway? But I do feel it is at least a somewhat common academic definition of looping.
That would mean that recursion is a form of looping. The distinction between recursion and imperative loops would then be a matter of implementation and of the interface exposed to the programmer. Similarly, as others have said, gotos can also be used to implement such loops.
And in that regard, there are variants of recursion as well, such as tail-call recursion, which has a memory model similar to what you described and differs from an imperative loop only in the programming interface it exposes.
There's no denying that from that definition they are the same. It's just after you've debugged enough loops and recursions you can't help but think they are quite different!
Well, I don't mean they are the same, they're just different kinds of looping constructs.
Like, what you call a loop isn't just "a loop": it's actually a for-loop, or a while-loop, or a for-each loop, or an iterator loop, and similarly recursion is just a recursive loop.
At least that's the common taxonomy I know of. So all of these are loops, and the ones that involve mutation for the condition to kick in are further grouped as imperative loops.
This isn't really related to your question, but I don't think tail calls could help for Fibonacci, since f(n) branches into two calls, f(n-1) and f(n-2), and each of those branches into two more. So it can't be done in constant stack space with naive recursion.
The compiler would either have to memoize, or be extremely clever and start at the base case (0, 1) and then transform the code to use the 2x2 matrix exponentiation. I wouldn't have been surprised if GHC Haskell was that clever, but even with -O2 "print $ fibb 10000" isn't terminating.
That's pretty much what I meant by the matrix exponentiation method - applying the map (a, b) -> (a+b, a) repeatedly. Your function definitely uses tail calls, but I was just trying to say that more than tail call optimization is needed to transform the trivial recursive version.
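For reference, a sketch of that matrix method in C with fixed-width integers (so it overflows quickly, whereas the discussion above assumes bignums): [[1,1],[1,0]]^n holds fib(n) in its off-diagonal entry, computed in O(log n) multiplications.

  #include <stdint.h>

  typedef struct { uint64_t a, b, c, d; } m2;   /* [[a, b], [c, d]] */

  static m2 m2_mul(m2 x, m2 y) {
      return (m2){ x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                   x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d };
  }

  static uint64_t fib_matrix(unsigned n) {
      m2 r = { 1, 0, 0, 1 };      /* identity */
      m2 p = { 1, 1, 1, 0 };      /* [[1, 1], [1, 0]] */
      while (n) {
          if (n & 1)
              r = m2_mul(r, p);   /* square-and-multiply */
          p = m2_mul(p, p);
          n >>= 1;
      }
      return r.b;                 /* off-diagonal entry = fib(n) */
  }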
If you replaced the recursion with loops but implemented the same algorithm you'd just have to manage a stack on the heap. I don't think that would be faster.
At least for Racket, which does have TCO, the for-loop is really a macro for a tail recursive loop. I'm not sure about other languages with more aggressive optimization though.
Strange. I've written some simple parsers over the course of my life. The latest one was to parse some subset of SQL for a home-grown in-memory database.
I do not remember ever using anything even remotely resembling scanf or related functions. It was always: read the next char from some abstracted stream and run a state machine on it.
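For a flavor of that style, here's a toy sketch (made-up names): an unsigned-integer scanner that reads one char at a time and steps a tiny state machine, never looking further into the input than what it consumes.

  #include <stdio.h>

  enum scan_state { SKIP_SPACE, IN_NUMBER };

  static int scan_uint(FILE *in, unsigned long *out) {
      enum scan_state st = SKIP_SPACE;
      unsigned long v = 0;
      int c;
      while ((c = fgetc(in)) != EOF) {
          if (st == SKIP_SPACE) {
              if (c == ' ' || c == '\t' || c == '\n')
                  continue;
              if (c < '0' || c > '9') {
                  ungetc(c, in);   /* not a number at all */
                  return 0;
              }
              st = IN_NUMBER;
              v = (unsigned long)(c - '0');
          } else {
              if (c < '0' || c > '9') {
                  ungetc(c, in);   /* leave the delimiter for the caller */
                  break;
              }
              v = v * 10 + (unsigned long)(c - '0');
          }
      }
      if (st != IN_NUMBER)
          return 0;
      *out = v;
      return 1;
  }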
I think these kinds of things get caught in an organization with technically competent people who can give feedback to the product process. People who can understand from first principles how long something should take, and don't tolerate it taking ten times as long.
It's just plain weird that the library would use a generic length function. Might it have something to do with needlessly accepting pathological NaN representations?
The expected code would do something closer to this:
len = strspn(s, "0123456789INFTYAEBCDXPinftyaebcdxp-+.");
Cool stuff mate! Searching for sscanf on GitHub produces over 19 million results, with over 15 million in C code. I bet there might be a few cases where it could be replaced with something else... :) (not saying it's not a useful function when used correctly)
The reward system of my brain wants me to write clever code to solve the "core" problem. The remaining 80-99.999% (depending on the environment) of the code - authentication & authorization, logging, parsing, I/O, ensuring that the program is "well behaved" - all that is perceived as "boring", "boilerplate", "glue code", which results in churning out code that is not thought through.
So yes, that code probably contains some bugs from the "stupid if you think about it for a second" category.
I think the only way I've avoided any of this is that my academic work primarily follows "write the performant simulation codes in C++ or Fortran, write the analysis in Python" (although others use Matlab et al.), so everything parsing-related has gone through Python, not C++, which of course just has the "float()" type constructor. Generally, like most academics, I own my whole shop, so fortunately no one uses my code other than one or two colleagues occasionally, and when bottlenecks arise I know where and what they are.
I agree with the "it can happen to you" bit and the fact that this is largely the fault of C's poor documentation (e.g. not mentioning algorithmic complexity) and poor design choices when it comes to strings (you get similar issues with buffer overruns when using standard library functions).
That said, the GTA thing was far more noteworthy because apparently the GTA developers hadn't bothered to profile loading times and get to the bottom of why exactly game load was taking such a ridiculously long time.
I think I would argue that both in this case and in the case of GTA, sscanf is actually to blame. Sure, profiling could have detected this, and the workarounds are simple. But sscanf doesn't need to be so slow; a naive implementation of it would not be. So I think it is perfectly fine to assume that sscanf parsing something like a number should take time proportional to the characters it consumes, not to the length of the whole input (obviously "%s" is another matter).
Seems to me that the problem is the use of text formats in general. I understand the need for universality, but all it would take to solve this is for major OS vendors to ship a clean, simple HDF5 editor. I would also like to see a binary UI description format become a W3C standard, because, good luck trying to eliminate all the quadratic strlen calls in web services.
Honestly, I don't know why people store floats in anything other than binary. IEEE 754 (or bfloat16, if you don't want all that precision) is so much more compact and efficient to parse. The amount of work the computer does to convert textual floating point to a binary representation, and the potential for subtle differences/bugs in the encoding, are just... tremendous.
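For the simple case of a flat float array, the dump/load code is almost nothing; the obvious caveat is that raw bytes aren't portable across endianness or exotic float formats. Sketch:

  #include <stdio.h>

  /* Write/read the array as raw IEEE 754 bytes: no text parsing at all. */
  static size_t save_floats(const char *path, const float *v, size_t n) {
      FILE *f = fopen(path, "wb");
      if (!f)
          return 0;
      size_t written = fwrite(v, sizeof *v, n, f);
      fclose(f);
      return written;
  }

  static size_t load_floats(const char *path, float *v, size_t n) {
      FILE *f = fopen(path, "rb");
      if (!f)
          return 0;
      size_t got = fread(v, sizeof *v, n, f);
      fclose(f);
      return got;
  }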
Some of the blame should be placed on the "worse is better" dogma. Often it really just means "my library does whatever was easy to implement". It has its merits, but it's a very leaky abstraction. It's part of why writing good C code is hard: much of the hard stuff is left to the caller, because the library implementation would be more difficult otherwise.
It's impossible to know all the pitfalls, and the author notes that. Metrics (or - ugh - telemetry if it's client-side), as well as automated testing with expectations around performance can go a long way to prevent these issues. Of course, ideally everyone should think about how their code performs with large input, but everyone messes up every now and then.
Quick question: At the top of the parser they define
const char VERTEX_STR[] = "vertex ";
And a few lines in
data += strlen(VERTEX_STR);
Would compilers optimize this out? Seems like an easy win to replace that with a "7" (or a constant or something), although I don't know how much of a win it would be.
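For what it's worth, GCC and Clang will usually fold strlen of a constant string like that at compile time, but if you'd rather not rely on the optimizer, the usual idiom is to let sizeof do it. Sketch, with data standing in for the parser cursor from the snippet above:

  const char VERTEX_STR[] = "vertex ";
  /* sizeof counts the trailing '\0', hence the -1; this is a compile-time
     constant at any optimization level. */
  #define VERTEX_STR_LEN (sizeof VERTEX_STR - 1)

  /* ... */
  data += VERTEX_STR_LEN;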
I think the real moral is that if you're doing something where performance matters even slightly, profile it and see if the time is being spent where you expect it to be spent. If not, investigate and fix as required.
Doesn't matter unless it's a critical path for your application. You can do things in an especially stupid and obnoxious way, as long as they're not on a common path that will affect your users.
There are currently 19 million+ results when searching for code using sscanf on GitHub. I wonder how many of those suffer from the same problem.
Genuinely confused how this guy ever thought capitalism was to blame for GTA's loading issue. As if any other economic system wouldn't have the same result.
I don't think anyone sensible would claim they would never have made this mistake. It's not even surprising that it made it to production. What is surprising is that it led to a massive and easily-diagnosable slowdown that no one bothered to fix for several years.
I don't find it that surprising because there's a huge lack of awareness in the industry when it comes to profiling tools. Quite often I've seen people trying to speed up programs without once profiling to see where the problem is. More than once I've seen the supposed solution (usually caching or distributing in my line of work) actually slow things down. At times it can be almost impossible to get permission to "waste" time on performance profiling because it's not fixing bugs or adding features.
I kind of expected more from game developers, but I doubt the guys shaving microseconds off tight game loops are the same ones writing the asset loading code.
Devs should follow carpentry rules for this: measure twice, cut once.
Or it may have worked fine in a dev environment and it only became an issue once the game was in production for a certain amount of time.
By that point R* developers had moved on to other projects and the artists may have used a different workflow to validate things and never bothered booting up the "real" game to test things.
This. The article misses the point entirely. Either the author already knew that and used it as an opportunity to show off, or has a low level of common sense.
You see it if you have showdead turned on, littering the bottoms of threads. There's not very much of it though; I can't imagine it's very effective. Brand new accounts that post links in their first comments often get automatically shadow-banned, which is why the spam isn't very visible.
Same! In all these years... I am examining it now and, dare I say, it is just basic spam. "It's better than Tinder!" A redirect, but then just a very small, simple page with no JS that looks malicious. Strange community to target. :/
Wouldn't it be better to read the entire file into memory first, and then do the parsing as a second step? It would take more memory, of course, but that's worth it if you're trying to maximize performance.
Wouldn't matter at all, and the function in question is already operating on a string buffer. You might be able to get a minor boost by reading the file in parallel with parsing it, using async IO or threads, but it doesn't seem like the disk is actually the bottleneck here.
It has been a hot minute since I've touched C, so I'm failing to grok the issue here. sscanf is reading a float-formatted string from the data variable into a float variable. How is that also getting the size? What is different about strtof? From the docs it looks like it does something similar, just without the format string.
> sscanf() converts the string you pass in to an _IO_FILE* to make the string look like a "file". This is so the same internal _IO_vfscanf() can be used for both a string and a FILE*.
> However, as part of that conversion, done in a _IO_str_init_static_internal() function, it calls __rawmemchr (ptr, '\0'); essentially a strlen() call, on your input string. This conversion is done on every call to sscanf(), and since your input buffer is rather large, it'll spend a fair amount of time calculating the length of the input string.
and the proper remedy is to use fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.
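To the strtof question upthread: strtof parses one number from the front of the string, stops at the first character that can't be part of it, and reports that position via the end pointer, so a loop like this (hypothetical names) walks a large buffer without ever measuring its total length:

  #include <stdlib.h>

  /* Pull every whitespace-separated float out of a big buffer. strtof only
     examines the characters it consumes; there is no hidden scan for the
     buffer's terminator. */
  static size_t parse_all_floats(const char *buf, float *out, size_t max) {
      size_t n = 0;
      const char *p = buf;
      while (n < max) {
          char *end;
          float v = strtof(p, &end);
          if (end == p)
              break;              /* nothing float-like left */
          out[n++] = v;
          p = end;                /* resume right after the parsed number */
      }
      return n;
  }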
Everybody in this thread seems to be missing a particularly large elephant in this particular room, which is that sscanf() supports scientific notation while strtod() and strtof() do not.
Or at least, they didn't originally support it.
Has this been fixed in the 20+ years since I noticed it and started using sscanf() everywhere instead?