
This just makes me think that null-terminated strings are the bad gift that keeps on giving. If we were to design an OS, language, or standard library in 2021 (or even 1999) we probably wouldn't use them, but we're stuck with this relic of a former era.



The thing is, they are even worse for performance than string implementations that store the length. Those few extra bytes of memory are much cheaper than recomputing the length of a string everywhere it is needed, for example when copying a string whose length is already known.
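To make the copying point concrete, here is a minimal sketch in C. The str_t layout and the names cstr_dup and str_dup are made up for illustration, not taken from any library. The null-terminated copy has to scan for the terminator before it can allocate; the counted copy already knows the size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: a length plus a pointer to the bytes. */
typedef struct {
    size_t len;
    char  *ptr;
} str_t;

/* Null-terminated copy: strlen must walk the whole string first. */
static char *cstr_dup(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s);            /* O(n) scan just to learn the size */
    char *out = malloc(n + 1);
    if (out) memcpy(out, s, n + 1);  /* copy the bytes and the '\0' */
    return out;
}

/* Counted copy: the length is already stored, so no scan is needed. */
static str_t str_dup(str_t s) {
    str_t out = { 0, NULL };
    assert(s.ptr != NULL || s.len == 0);
    out.ptr = malloc(s.len ? s.len : 1);
    if (out.ptr) {
        if (s.len) memcpy(out.ptr, s.ptr, s.len);
        out.len = s.len;
    }
    return out;
}
```

Both versions still copy n bytes; the difference is only the extra pass that strlen makes over the source.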

Also, C++'s strings even do some clever hacking where they store the text of shorter strings inside the string object itself, in the space the pointer would otherwise occupy, avoiding a pointer lookup (the small-string optimization). And this is possible only because of abstraction.
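A rough sketch of that small-string idea, written in C for illustration. The sso_str type, the SSO_CAP value and the function names are all hypothetical; real std::string implementations use different and more compact layouts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SSO_CAP 15  /* assumed inline capacity, chosen for illustration */

/* Short strings live in `small`; longer ones live on the heap. */
typedef struct {
    size_t len;
    union {
        char  small[SSO_CAP + 1];
        char *heap;
    } u;
} sso_str;

static int sso_is_small(const sso_str *s) { return s->len <= SSO_CAP; }

/* Returns 0 on success, -1 if the heap allocation fails. */
static int sso_init(sso_str *s, const char *src, size_t len) {
    assert(s != NULL && src != NULL);
    if (len <= SSO_CAP) {
        memcpy(s->u.small, src, len);   /* no allocation at all */
        s->u.small[len] = '\0';
        s->len = len;
        return 0;
    }
    char *p = malloc(len + 1);
    if (!p) { s->len = 0; s->u.small[0] = '\0'; return -1; }
    memcpy(p, src, len);
    p[len] = '\0';
    s->u.heap = p;
    s->len = len;
    return 0;
}

/* Callers never see which representation is in use. */
static const char *sso_data(const sso_str *s) {
    return sso_is_small(s) ? s->u.small : s->u.heap;
}

static void sso_free(sso_str *s) {
    if (!sso_is_small(s)) free(s->u.heap);
    s->len = 0;
    s->u.small[0] = '\0';
}
```

The callers only ever go through sso_data, which is the abstraction that lets the representation change underneath them.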


They were designed when an extra byte or so per string cost you a lot of money. Nowadays, when 99% of the systems anyone will program start at 1MB RAM and 90% probably start at 512MB, they're a liability for almost no benefit.


You've got an extra byte either way, the \0 at the end. In many cases that forces you to copy a string, because you can't just "point" into a string literal and say "take n chars from there". Of course I am not that old, so I don't have enough expertise, but seeing that every other language, even at the time, decided against it is pretty telling.


I think your parent was referring to the cost of storing a 2-byte string length instead of a 1-byte terminator. In the 1970s and 1980s, 2 bytes would likely be the minimum storage needed for the length of a general purpose string implementation. Although there were some language environments (e.g. Pascal) that had counted strings with a max length of 255.


Fair enough, but it can actually be more memory efficient as well, because substrings are easier to reuse (with null-terminated strings, only a suffix that runs to the end can be shared).
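A small sketch of that substring reuse, assuming a hypothetical non-owning str_view type (the names sv_from_cstr, sv_sub and sv_eq are invented here). Any range of the buffer, not just a tail, can be referenced without copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view: a length and a pointer into someone else's bytes. */
typedef struct {
    size_t      len;
    const char *ptr;
} str_view;

static str_view sv_from_cstr(const char *s) {
    assert(s != NULL);
    str_view v = { strlen(s), s };
    return v;
}

/* View of n bytes starting at off. Nothing is copied or terminated. */
static str_view sv_sub(str_view s, size_t off, size_t n) {
    assert(off <= s.len && n <= s.len - off);
    str_view v = { n, s.ptr + off };
    return v;
}

static int sv_eq(str_view a, str_view b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

For example, sv_sub(sv_from_cstr("key=value;rest"), 4, 5) refers to "value" in the middle of the literal, which a null-terminated string could only express by copying those five bytes and appending a terminator.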


OK, let's assume that a 10 MB JSON source was loaded into a non-null-terminated opaque struct str_t {size_t; pchar;}. You have to parse a number at position `i`, and you have double parse_number(str_t). What's the next obvious step?


You can keep the existing sscanf function and now strlen is O(1) so the bug is gone. Any questions?


I just can’t figure out the exact code.


I think your code would be pretty much the same, sscanf, strlen and all. The main differences would be the standard library's implementations of strlen and whatever function you use to read the file into a string in the first place.


  str_t json = loadfile();
  size_t offset = random();
  sscanf("%d", ?);
With an opaque str_t you can't just write json[offset]. Should sscanf take an offset with every string (sscanf(fmt, s, off))? Should we copy a slice of json and parse it? Should str_t have a zero-copy mirroring ability (s2 = strmirror(s, off, len))? How many of these three are just snake oil that changes nothing?

It’s only pretty much the same until you try to write actual code with a new idea in mind.


You can offset your str_t by creating a new str_t that subtracts offs from the length and adds offs to the pchar. There is no need to keep track of the offset separately.


Rust's &str type, or the non-unicode-assuming version &[u8], allow creating (sub-)slices, which probably matches your strmirror function. Except that the same syntax works for all (dynamic/static) length arrays, and even allows custom code to e.g. transparently wrap an SoA[0] representation.

[0]: https://en.wikipedia.org/wiki/AoS_and_SoA


Well, in C++, it would read:

  int target;
  sscanf(json+offset, "%d", &target);
Where str_t's operator+ would look roughly like:

  str_t str_t::operator+(size_t offset) {
    return str_t{size - offset, ptr + offset};
  }
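(Might look exactly like this if str_t's constructor threw when the size was negative. That requires a signed size type; with an unsigned size_t, an offset larger than size would wrap around instead of going negative.)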


Okay, I see what you're saying now. I haven't worked with C strings in a while. Python uses offset parameters or seek operations in various places, and C++ string streams have an inherent position too (C++ probably has a number of other ways to do it too...).


C++'s std::string_view is essentially your struct. You can check the methods it provides.


Yes, I'm aware of it. I'm just tired of these layman's "oh, that's another reason to ditch C strings" takes when it has nothing to do with it. Working with offsets requires handling offsets and lengths, be it an explicit 'off' and 'n' or a string_view. All that is needed in this case in C is snscanf (note the 'n'), so that it would know its limits a priori, like snprintf does. Sadly, that 'n' never made it into the standard.
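Since snscanf does not exist in standard C, here is a hand-rolled sketch of what a length-bounded integer parse could look like. The name parse_int_n and its signature are hypothetical; it only handles an optional sign followed by decimal digits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical bounded parser: reads at most n bytes from s and never
 * looks for a terminator. Returns the number of bytes consumed, or 0
 * if no integer was found or the value does not fit in an int.
 */
static size_t parse_int_n(const char *s, size_t n, int *out) {
    assert(s != NULL && out != NULL);
    size_t i = 0;
    int neg = 0;

    if (i < n && (s[i] == '-' || s[i] == '+')) {
        neg = (s[i] == '-');
        i++;
    }

    size_t first_digit = i;
    long long acc = 0;
    while (i < n && s[i] >= '0' && s[i] <= '9') {
        acc = acc * 10 + (s[i] - '0');
        if (acc > (long long)INT_MAX + 1) return 0;  /* out of int range */
        i++;
    }

    if (i == first_digit) return 0;          /* no digits at all */
    if (!neg && acc > INT_MAX) return 0;     /* +2147483648 does not fit */

    *out = neg ? (int)-acc : (int)acc;
    return i;
}
```

With an explicit offset and length it is called as parse_int_n(buf + off, len - off, &value), and the work done is bounded by the digits actually read rather than by the size of the whole buffer.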



